Predicting the reactions of living cells—huge numbers of genes, proteins, and enzymes, embedded in complex pathways and feedback loops—is a challenging task.
Yet researchers are attempting just that, by building a computer model that predicts the behavior of a single cell of the bacterium Escherichia coli.
The new simulation is the largest of its kind yet, says Ilias Tagkopoulos, professor of computer science at the University of California, Davis. “The number of layers and the amount of data involved are unprecedented.”
The dataset on which the model is based includes, for example, 4,389 profiles of the expression of different genes and proteins across 649 different conditions. Both the dataset, named “Ecomics,” and the integrated model, MOMA (Multi-Omics Model and Analytics), are available for other researchers to use and test.
The model, reported in Nature Communications, could be useful as a fast and inexpensive way to predict how an organism might behave in a specific experiment, Tagkopoulos says. Although no prediction can be as accurate as actually performing the experiment, the model would help scientists formulate hypotheses and design experiments. Applications range from finding the best growth conditions in biotechnology to identifying key pathways for antibiotic and stress resistance.
Collecting and downloading the data took a week, but processing the data into a single dataset took two years of the three-year project. The team built models for four layers, starting with gene expression and working up to activity at the whole-cell level. Then they integrated the layers. They used machine-learning techniques to train the models to predict the behavior of each layer, and ultimately of the cell itself, under different conditions.
Researchers built the model on computer clusters at UC Davis and on supercomputers available through a national network, including “Blue Waters,” one of the world’s most powerful supercomputers, located at the National Center for Supercomputing Applications.
Although E. coli is a well-known organism, we are far from knowing everything about its biochemistry and metabolism, Tagkopoulos says.
“We are exploring a vast space here. Our aim is to create a crystal ball for the bacteria, which can help us decide what is the next experiment we should do to explore this space better.”
Tagkopoulos hopes to begin building similar databases and models for other bacteria, such as Salmonella enterica, a cause of foodborne illness, and Bacillus subtilis. He expects other researchers to draw on the Ecomics database, and hopes to make the MOMA model interface easier for biologists to use.
“We’re living in an amazing era at the intersection of computer science, engineering and biology,” he says. “It’s a very interesting time.”
The work was supported by the US Army Research Office and the National Science Foundation.
Source: UC Davis