About
I received the B.S. degree in computer science from ShanghaiTech University and the Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology.
During my Ph.D. studies, I focused on the intersection of signal processing and machine learning, aiming to demystify deep learning with signal processing tools such as sparse coding. Additionally, I was the first to introduce graph neural networks (GNNs) to communication and networking (V1, V2), providing comprehensive theoretical analysis and practical guidelines. GNN-based resource allocation and signal processing have since been deployed in numerous base stations.
After graduation, I joined Microsoft. With large language models (LLMs) becoming a key focus for productivity, I shifted my research toward LLMs and large multimodal models (LMMs) to align with industry interests.
First, I concentrated on understanding the inner workings of these models with signal processing tools, with the goal of enhancing both their trustworthiness and performance. We were among the first to:
1. Analyze the emergence of reasoning and planning capabilities within LLMs and the gap between supervised fine-tuning (SFT) and reinforcement learning (RL), applying these insights to real-world agents.
2. Investigate LMMs using sparse coding tools, applying them to reduce hallucinations.
3. Provide theoretical guidelines for Mixture of Experts (MoE) structures, applying them to vision foundation models.
Second, I focused on training systems for LLMs and LMMs. We developed BlockOptimizers, which can fine-tune 8-billion-parameter models on a single RTX 3090 and 70-billion-parameter models on four A100 GPUs.
Together with my amazing colleagues, I applied these techniques to fields such as embodied AI (Habi, Diffusion Veteran) and AI for Science (Omni-DNA, MIMSID, MuDM, GraphormerV2).
In my spare time, I write blogs on AI, mathematics, and physics, which have attracted more than 10,000 followers and favorites:
- Blogs on Triton Programming
- Blogs on Graph Neural Networks
- Blogs on Navier-Stokes Equations
- Blogs on Nonconvex Optimization