Undergraduate Student at Tsinghua University, focusing on reinforcement learning for reasoning-heavy large language models.
I am a third-year B.Eng. student in Automation at Tsinghua University, advised by Prof. Gao Huang in LeapLab. My previous research probed how reinforcement learning with verifiable rewards (RLVR) shapes reasoning behaviour in large language models. I am currently continuing to explore reinforcement learning for reasoning.
2025
National Scholarship of China
2024
National Scholarship of China
Reinforcement Learning for Reasoning LLMs & MLLMs, Embodied AI
Find the essence of reinforcement learning for reasoning and build a universal theory for reasoning.
GPA 3.99/4.00 (Rank 1/172).
Core contributor to the Limit-of-RLVR project.
I am actively looking for collaborations on RL and related topics. Feel free to reach out via email; I'm happy to talk!