Graduate Research Assistant

Performance and Algorithms Research (PAR) Lab at the Lawrence Berkeley National Lab

San Francisco, California, United States

May 2023 - Aug 2023 · 3 mos

I was a graduate research intern with the Performance and Algorithms Research (PAR) Lab from May 2023 to August 2023.

I worked on hybrid communication techniques and on improving the fault tolerance of distributed training for deep learning workflows. I built a hybrid AllReduce and Parameter Server approach to parameter distribution, parameter updates, and collective communication in distributed training, using PyTorch DDP and RPC.

I worked under the guidance of Khaled Ibrahim.