I work on the large-scale deployment of GPT-3, principled approaches to large model training, and theories of infinitely wide neural networks.
Recently, my collaborators and I released Low-Rank Adaptation (LoRA) for large language models, which adapts GPT-3 with 10,000x less storage per task and practically eliminates task-switching latency.
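To make the storage claim concrete, here is a minimal sketch of the low-rank update idea, assuming a PyTorch-style linear layer; the class name LoRALinear and the hyperparameters r and alpha are illustrative, not the released loralib API. Only the small factors A and B are trained and stored per task, while the pretrained weight stays frozen and shared:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative low-rank adapted linear layer: y = xW^T + scale * x(BA)^T."""

    def __init__(self, d_in: int, d_out: int, r: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The update BA has rank at most r, so only 2 * r * d parameters
        # are trained and stored per task instead of the full d x d matrix.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(d_in=1024, d_out=1024, r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params per task: {trainable}")  # 8,192 vs. 1,048,576 frozen
```

Since switching tasks amounts to swapping in a different pair of small A and B matrices on top of the same shared base model, the switch adds essentially no latency.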
In 2020, Greg Yang and I released a paper on a new infinite-width limit that exhibits feature learning (ICML 2021), refuting the myth, suggested by Neural Tangent Kernel theory, that wide models are essentially linear.
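As background, Neural Tangent Kernel theory predicts that, in the standard parametrization, an infinitely wide network trains as its first-order Taylor expansion around initialization, so its internal features never move. A sketch of that linearized picture (notation assumed here, with $\theta_0$ the initial parameters):

$$
f(x;\theta_t)\;\approx\;f(x;\theta_0)+\nabla_\theta f(x;\theta_0)^\top(\theta_t-\theta_0),
$$

i.e. the model stays linear in its parameters throughout training. The limit in our paper is constructed so this approximation breaks down and features genuinely evolve.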
I was a member of the Microsoft Research AI Residency program. I graduated with a Bachelor of Science in Computer Science and Cognitive Science from Johns Hopkins University in 2019.
Latest from Edward Hu
In pursuing the fundamentals of the natural world, scientists have found success approaching discoveries both bottom-up and top-down. Neuroscience is a great example of the former. Spanish anatomist Santiago Ramón y Cajal…