Humans don't understand biology, so how can we possibly build a virtual cell?
Many (fantastic) research organizations have made building a comprehensive simulation of all perturbations against a cell their grand vision. One rarely discussed implication is whether we will need to solve biology before we can produce a real virtual cell. Can we build oracles for systems we don't understand, and may not understand for a long time?
As Kim Branson points out, biology just might not be describable by concise, elegant laws the way physics might be. The most direct description of a biological system might just be the system itself. This implies that even a simple model of a system like a cell, tissue, or organism would require exhaustively simulating every time step. I suppose that with infinite computing power this would be doable, but let's try to stay in the real world.
A practitioner might ask: "Why does this matter? We don't need to simulate an entire system to understand its behaviors under very specific conditions." Here, I will argue that this is probably true! A virtual cell can *just* be a useful collection of tools (perturbation forecasters, cell-state embeddings, molecular designers/simulations, and, yes, humans running clever experiments) rather than a complete digital twin.
We don't necessarily need a theory of the universe for all biology, just a good enough approximation of what happens when we poke it in a very specific way. In some ways, the many recent molecular design models are virtual cells pointed at localized problems, predicting how parts of a cell's system interact with one another. Then, human intuition can (maybe) qualitatively extrapolate what that might mean for the whole system.
The field seems to have moved firmly away from modeling biological systems with explicit equations and towards AI-based models that predict perturbation responses in a more opaque fashion. As we saw in language and vision, with enough data and compute, learning systems tend to beat out human-curated models.
Well then, the best we can do is train piles of models that "behave right" within some envelope of conditions, regardless of whether we can explain them. We are still in the "predict the weather" stage rather than the "simulate the planet" stage, so let's get some good weather predictions to reduce the amount of experimentation required.
———
A few months back, we saw the (sad) news that cutting-edge transformer-based models for cell perturbation prediction were underperforming simple linear models. Certainly, there is lots more to be done! Regardless, I wanted to highlight some interesting developments in the field in the comments.
Image credit: Chan Zuckerberg Initiative CELLxGENE Discover