Accelerating protein engineering with fitness landscape modeling and reinforcement learning

Haoran Sun; Liang He; Pan Deng; Guoqing Liu; Zhiyu Zhao; Yuliang Jiang; Chuan Cao; Fusong Ju; Lijun Wu; Haiguang Liu; Tao Qin; Tie-Yan Liu

March 2025

bioRxiv

Preprint

Protein engineering holds significant promise for designing proteins with customized functions, yet the vast space of potential mutations, set against limited lab capacity, constrains the discovery of optimal sequences. To address this, we present the µProtein framework, which accelerates protein engineering by combining µFormer, a deep learning model for accurate mutational effect prediction, with µSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using µFormer as an oracle. µProtein leverages single-mutation data to predict optimal sequences with complex, multi-amino-acid mutations through its modeling of epistatic interactions and a multistep search strategy. Beyond achieving state-of-the-art performance on benchmark datasets, µProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase whose activity, measured in wet-lab experiments, surpassed the highest previously known level. These results demonstrate µProtein’s capability to discover impactful mutations across vast protein sequence space, offering a robust, efficient approach for protein optimization.
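
The oracle-guided, multistep search described in the abstract can be illustrated with a small toy sketch. It is not the µProtein code: `toy_oracle` is a hypothetical hand-written scoring function standing in for a learned predictor such as µFormer, `TARGET` is an invented optimum, and plain greedy hill-climbing is swapped in for the reinforcement learning policy that µSearch uses. The sketch only shows how single-residue moves scored by an oracle can accumulate into a multi-point mutant, and how an epistatic (non-additive) term changes which move looks best.

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKWLA"  # hypothetical optimum, invented for this toy example


def toy_oracle(seq: str) -> float:
    """Hypothetical fitness oracle (a stand-in, not muFormer).

    Additive part: one point per position that matches TARGET.
    Epistatic part: a bonus that only appears when positions 0 and 2
    both match, so its effect is not the sum of single-mutation effects.
    """
    additive = sum(1.0 for a, b in zip(seq, TARGET) if a == b)
    epistatic = 2.0 if seq[0] == TARGET[0] and seq[2] == TARGET[2] else 0.0
    return additive + epistatic


def single_mutants(seq: str):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, aa in itertools.product(range(len(seq)), AMINO_ACIDS):
        if seq[i] != aa:
            yield seq[:i] + aa + seq[i + 1:]


def multistep_search(start: str, oracle, max_steps: int = 3):
    """Greedy stand-in for a search policy: apply the best single mutation
    per step, stopping early when no mutation improves the oracle score."""
    current, current_score = start, oracle(start)
    for _ in range(max_steps):
        best = max(single_mutants(current), key=oracle)
        best_score = oracle(best)
        if best_score <= current_score:
            break
        current, current_score = best, best_score
    return current, current_score


if __name__ == "__main__":
    print(multistep_search("AAAAA", toy_oracle, max_steps=3))
```

A greedy walk like this can stall at local optima on a rugged landscape, which is one reason a learned search policy is attractive in practice; the toy landscape here is deliberately easy.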

Publication Downloads

Mu-Protein

June 13, 2025

µProtein is an open-source framework for protein sequence optimization, combining a protein fitness prediction model with reinforcement learning to efficiently explore the mutational landscape. It demonstrates strong generalization across diverse proteins and has been experimentally validated to design high-functioning enzyme variants using only single-mutation data.
