Graduate Research Assistant
Performance and Algorithms Research (PAR) Lab at the Lawrence Berkeley National Lab
San Francisco, California, United States
May 2023 - Aug 2023 · 3 mos
I was a graduate research intern with the Performance and Algorithms Research (PAR) Lab from May 2023 to August 2023.
I worked on hybrid communication techniques and on improving the fault tolerance of distributed training for deep learning workflows. I built a hybrid AllReduce and Parameter Server approach to parameter distribution, parameter updates, and collective communication for distributed training, using PyTorch DistributedDataParallel (DDP) and RPC.
I worked under the guidance of Khaled Ibrahim.