This computational and experimental biologist sees a future in biophysically interpretable machine learning.
“I’ve had some weird ideas along the way, but I had the freedom to try out whatever I wanted, to explore different things.”
Shaoxun Liu fondly remembers living with his grandfather, who was a doctor, during the 2003 outbreak of severe acute respiratory syndrome. Liu describes his grandfather as a warm-hearted person who helped many people in Beijing.
Through him, “I realized that healthcare is the key to better quality of life,” says Liu, who is now a fifth-year PhD candidate in the labs of Harmen Bussemaker and Richard Mann.
Initially, Liu thought he would become a doctor like his grandfather. But biking to high school every day, Liu would pass the Chinese Academy of Agricultural Sciences. One day, he was inspired to knock on the door and see what the scientists were working on.
The researchers welcomed him with open arms and took him on board to select for beneficial and heritable traits among seeds that had been sent into space. Liu remembers it as a fun experience that drew his focus away from practicing medicine and toward studying the mechanisms of development.
Studying biochemistry as an undergraduate at the University of Southern California, Liu was soon drawn in by the popular fields of computer science and artificial intelligence. “That was when I had my first experience of reading a scientific paper out of interest, not for homework,” he says. “It was fun.”
He joined a lab that was doing click chemistry and moving into computational biology by making functional predictions from sequence-level information. There, he developed algorithms and used machine learning frameworks to make predictions from genomic information, which were then translated into ligand-binding properties.
Liu was initially agnostic as to what ligand he worked with: “In my undergrad years, I didn’t really care if I was dealing with DNA, a protein, or an antibody,” he says. “All I cared about was whether we could find a robust and interpretable framework that we could learn from.”
But when Liu met Bussemaker during his interview with the Columbia University Department of Biology, he was drawn to Bussemaker’s work on DNA, which focuses on biophysical modeling of protein-ligand interactions, primarily protein-DNA recognition.
Beyond being the building block of life, “DNA sequences can be easily encoded into a vector,” Liu explains. “That’s very important to building biophysically interpretable machine learning models.”
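To make the idea of encoding DNA as a vector concrete, here is a minimal sketch of one common scheme, one-hot encoding, in which each base becomes four numbers. This is an illustration only, not a description of the method used in the Bussemaker lab; the function name `one_hot_encode` and the A, C, G, T ordering are assumptions chosen for the example.

```python
# Illustrative sketch only: one-hot encoding is one common way to turn a
# DNA sequence into a vector. The names and base order here are assumptions.
BASES = "ACGT"


def one_hot_encode(sequence: str) -> list[int]:
    """Return a flat vector with four entries per base, in A, C, G, T order."""
    vector: list[int] = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        # Exactly one of the four positions is set to 1 for each base.
        vector.extend(1 if base == candidate else 0 for candidate in BASES)
    return vector


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
```

Because every position in the resulting vector corresponds to a specific base at a specific location, a model's weights on that vector can in principle be read back as the contribution of each base, which is one reason such encodings pair naturally with interpretable models.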
Liu committed to Bussemaker’s lab, with one caveat. “Since I come from a wet lab experimental background from my undergrad days, I still hold a strong belief that all predictions have to be benchmarked by wet-lab experiments,” Liu says. Bussemaker supported this idea, so they set up a joint mentorship with Richard Mann so that Liu could test his predictions in test tubes and fruit flies.
Liu would build constructs that he predicted to have a specific function and then gather his own experimental data to validate the prediction. “Experimental validation is the cornerstone of a biologically meaningful algorithm because everything has to boil down to the phenotypes, behaviors, or health benefits or detriments of whatever species I’m working with,” Liu explains.
Of course, experimental and computational biology involve two distinct skillsets. Liu feels strongest in the computational part and pursues the experimental part out of necessity. “I have to see experimental validation data to find a peace of mind that my algorithms are working,” he explains.
Plus, combining the skillsets allows him to make efficient use of downtime. “I set up my experiment in the morning, I get back to my computer and code for the afternoon, and then the gel I’m running will be done, or the sequencing run will finish,” he says. “Then I set up my algorithm to run overnight, and I can look at my experimental data.”
Liu’s recent first-author paper in Nucleic Acids Research built on a machine-learning approach pioneered by the Bussemaker lab and a high-throughput in vitro binding assay pioneered by the Mann lab as part of their long-standing collaboration. The team’s predictive mechanism yielded convincing results for anticipating the DNA-binding specificity of transcription-factor mutants.
The team was even able to make predictions for gain-of-function mutations within the genome. Such mutations are notoriously difficult to predict because the genome not only lacks its usual function but could also have new, disruptive properties.
Today, Liu’s focus has more of an experimental emphasis. Nubbin is a protein in fruit flies that regulates wing development; when it’s knocked out, the fly no longer has wings. Liu has expressed the Nubbin protein in vitro and then probed it in vivo.
“We were able to find two specific positions that should be impactful in maintaining the functions of the two DNA-binding domains of Nubbin,” he explains. He’s working on transferring those mutations into an in vivo system using CRISPR-Cas9 and doing some in vivo DNA binding profiling.
Liu feels lucky to have had two great mentors. “I’ve had some weird ideas along the way, but I had the freedom to try out whatever I wanted, to explore different things,” he says.
All this has led to an enjoyable time at Columbia. “I have a pretty colorful life,” he says. “I don’t know how much of this comes from my mentors, or just Columbia as a whole, but PhD study can be fun and enjoyable in a lot of ways.”
When he’s waiting for his programs and reactions to finish, Liu plays basketball for a couple of hours a few times per week. “When I play, I’m fully relaxed,” he says. “I don’t have to think about experiments.” Not only is basketball a great way to meet students and professors in other fields; it also helps him keep things in perspective. “If I fail in my experiments, at least I can try to get my wins back on the basketball court,” he says. One of those wins is a 2025 intramural championship.
Mentoring other scientists has been an important part of his journey. Liu is proud to be able to share the techniques, attitudes, and communication skills he’s gained with his successors in the lab. “I feel like I’m passing the torch of science,” he says.
After graduating, Liu hopes to secure a postdoctoral position in a computational biophysics or functional genomics lab, using computational models to identify pathological targets and design therapeutic molecules. While his methods may have changed, his motivations have not: “Even now that I’m equipped with more modern and sophisticated knowledge about biology, my core motivation is still providing healthcare to people, just like my grandfather did.”
By Alexandra A. Taylor
