Science Engine

Using AI, high-performance computing, and human expertise to solve society’s most challenging problems

Science Engine is harnessing advances in artificial intelligence (AI) and high-performance computing to revolutionize research and development (R&D) processes in industries that are anchored in hard sciences, including biopharma, chemicals, materials, cosmetics, foods, alternative energy and decarbonization.


Our goals

Science Engine has four goals that support and complement each other:

Provide foundational AI models for science that learn over massive multi-modal data, including literature, patents, regulations, scientific forums, molecular assays, protein and crystal geometry, experiment logs, ‘omics results, imagery, 3D scans and sensor signals. These context-rich models are designed for multiple tasks, including prediction and generation.
Provide a system to scale and govern the consumption of AI (including last-mile models added by customers) through multiple experiences, from citizen apps to Python notebooks.
Work in partnership with market leaders in key science-based industries, on durable missions, to solve some of the most important problems currently facing humanity, executing in a way that delivers early value to subject matter experts (SMEs) while driving for longer-term breakthroughs.
Boost the creativity and productivity of SMEs and accelerate the R&D process by making all the world’s learning available, in context, enabling SMEs to add their own expertise and intuition, run “what if” scenarios, and turn their working style and methodology into apps for their teams.

AI as a platform for innovation

Until recently, every enterprise AI challenge required a unique, bottom-up solution. The arrival of the transformer architecture in 2017 changed that, making it possible to create models that could learn context: not just how a word is used in a particular sentence, but what appears in the preceding and succeeding paragraphs, how the emotion changes in the prose, what else that author wrote, how other authors treated similar subjects, and so on.

This allowed the development of foundational models: large language models like GPT-3, language-image models like DALL-E 2, and code-interpretation models like Codex/Copilot. As a result, data scientists no longer need to start from scratch; the focus in AI is now on tuning, adaptation and last-mile models.

Part of the Science Engine effort is delivering foundational models for science, to achieve an effect parallel to the one we are seeing in language, images/videos and code. Science Engine delivers foundational science models along with a system to consume those models and govern and evolve AI-amplified R&D activities and collaborations.

With Science Engine, our goal is to deliver foundational models for commercial sciences—models that transfer-learn the vocabulary of nature across domains. To achieve this goal, we must be able to interpret a broad range of complex information types and modalities: literature, patents, regulations and forums; sketches, charts, tables and schematics; protein, gene, chemical and geochemical assays and reaction and diffusion pathways; cellular and biopsy imagery; multiple ‘omics; EKG, ground-penetrating radar and other sensor signals; 3D and hyperspectral scans; and logs of experiments and pilots.

Yet while foundational models make it easier to build the AI for any given application, they also create a challenge – particularly in an enterprise setting. As the use of AI scales to hundreds of applications, last-mile AI models and data sources, complex dependencies and relationships develop among them. In an enterprise setting, it is not enough to deliver foundational models. We also need to deliver the means to scale the consumption, evolution and governance of AI, including models upstream and downstream of the foundational models, via myriad experiences. Science Engine addresses this challenge.
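To make the dependency problem concrete, here is a minimal sketch in Python of how a registry might track which artifacts sit downstream of a foundational model. It is an illustration under stated assumptions, not a description of how Science Engine is built: the ModelRegistry class, its methods and all artifact names are hypothetical.

```python
from collections import defaultdict, deque


class ModelRegistry:
    """Hypothetical registry of data sources, models and apps (illustration only)."""

    def __init__(self):
        # Maps an artifact name to the artifacts that consume it directly.
        self._consumers = defaultdict(set)

    def register(self, name, depends_on=()):
        """Record that `name` consumes every artifact listed in `depends_on`."""
        self._consumers[name]  # make sure the artifact exists even with no consumers
        for upstream in depends_on:
            self._consumers[upstream].add(name)

    def downstream_of(self, name):
        """Return every artifact affected, directly or transitively, by a change to `name`."""
        affected, queue = set(), deque([name])
        while queue:
            current = queue.popleft()
            for consumer in self._consumers.get(current, ()):
                if consumer not in affected:
                    affected.add(consumer)
                    queue.append(consumer)
        return affected


# All names below are made up for the example.
registry = ModelRegistry()
registry.register("literature-corpus")
registry.register("foundation-model", depends_on=["literature-corpus"])
registry.register("toxicity-last-mile", depends_on=["foundation-model"])
registry.register("screening-app", depends_on=["toxicity-last-mile"])
registry.register("notebook-report", depends_on=["foundation-model"])

print(sorted(registry.downstream_of("foundation-model")))
# ['notebook-report', 'screening-app', 'toxicity-last-mile']
```

Even in this toy setup, updating one foundational model touches a last-mile model, an app and a notebook. With hundreds of such artifacts, answering "what must be re-validated if this model changes?" requires this kind of bookkeeping, which is one reason governance has to be delivered alongside the models.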

Figure: Science Engine delivers foundational science models, along with a system to consume those models and to govern and evolve AI-amplified R&D activities and collaborations.

Our approach

We are working with market leaders in key commercial sciences industries on scenarios such as targeted protein degradation, interpreting multi-omics data to predict cardiovascular disease (CVD), screening generated molecules, formulating cosmetics that are informed by gene expression, using metagenomics for biofuels, applying metal-organic frameworks (MOFs) for carbon capture, and discovering catalysts and reaction pathways that are less energy intensive and produce fewer per- and polyfluoroalkyl substance (PFAS)-like byproducts.

Working across commercial science domains is key – the foundational models allow transfer learning across domains to bootstrap new R&D endeavors where there is a paucity of direct data. An example is the transplant or synthesis of organs, tissues and grafts. Here, there is simply not enough direct and explicit evidence to generalize from spontaneous reporting. As such, we have to dig deeper and understand the phenomena at a scientific level and at multiple resolutions: molecular, protein, cellular, and the individual human’s history and environment.

These are the kinds of scenarios that require us to bring together not just AI and technology, but also market leaders, domain experts and people with the power and influence to effect societal change.