(US & Canada) Jeffrey Reid, VP, Chief Data Officer at Regeneron Genetics Center, speaks with Jack Berkowitz, Chief Data Officer at Securiti, in a video interview about the organization’s way of approaching genomic data, incorporating analytics in company strategy, accepting the data gaps and understanding what adds real value to data, and the role of AI.
Here are some insightful quotes from the interview:
On data strategy: "If we want to build a very large-scale data asset that will inform drug development well into the future, let's build it from scratch so that we can understand, manage and control it."
On building for the future: "The vision of starting a genetic center from greenfield was to commit to cloud computing and reach our destination faster and cheaper than building on-prem."
On their unique approach: "Instead of collecting data haphazardly and hoping for insights, we've built our systems with the purpose of generating high-quality data for analytics."
On the value of collaboration: "The hard part of our strategy is finding partners to work with and bringing resources and expertise to make that data happen."
On expanding data modalities: "We realized that a lot of the value we could add wasn't just about adding scale but about gathering more informative data assets from the samples we already have."
On the role of AI: "Using AI thoughtfully can help extract additional information from various data types, offering greater leverage for discovery."
On measurement challenges: "Measuring something like the state of a brain tumor creates significant challenges, but where we can, we leverage both existing data and AI to expand our insights."
On organizational impact: "We found that our partners were craving insights from the data resource we built, highlighting the value of integrating analytics into the strategy of the company."
Reid introduces the Regeneron Genetics Center as one of the largest genetic sequencing initiatives in the world. The center collaborates with academic institutions, countries, and various groups to gather samples and build datasets that will shape the future of drug development.
Reid adds that he is excited about the success achieved so far and is eager to see how AI will change the field, from automating tasks to designing and deploying proteins to treat critical diseases.
Shedding light on the organization’s way of looking at genomic data, Reid mentions that the vision of Regeneron’s co-founder, George Yancopoulos, and the team back in 2012 was instrumental. He says that Regeneron realized that the traditional approach of piecing together fragmented datasets, such as 1,000 samples from one study and 500 from another, would not work.
Instead, the idea was to build a greenfield genetics center from scratch. The team then committed to the cloud, recognized its cost-effectiveness, and became the first group within Regeneron to operate fully cloud-first as a large-scale sequencing center.
Along the way, Reid says, the team was able to build one of the most coherent and consistent large-scale genetic datasets in the world. It came down to the vision of looking at the problem and deciding to build a large-scale data asset from scratch so that it could be understood, managed, and controlled.
The challenging part of the strategy was finding partners to work with to sequence those samples. It requires bringing resources and expertise to the table to make that data happen, and that is not trivial. Reid therefore appreciates the hard work the team has done over eleven years to deliver it.
Reid adds that the vision of not being overly tied to legacy data, and instead focusing on creating something coherent and scalable, has been invaluable. He says that Regeneron remains committed to integrating all of the data and analyzing it as a whole, rather than breaking it up into chunks.
When asked about his take on having analytics as part of company strategy, Reid states that it has been easy for Regeneron, as the founding team understands the vision. He maintains that it starts with looking at the value analytics provides. It means being intentional about what a data asset that meets future needs should look like, and for them, the case was compelling.
Reid shares the hypothesis behind that case: in people who carry loss-of-function variants, which reduce the activity of the protein produced by a specific gene, the effect closely mimics what happens when an antibody blocks that protein. The hypothesis held up, and the team found numerous protective variants that became solid leads for drug development and helped inform existing programs.
The real success, says Reid, was not in building the data asset but in seeing how eager colleagues at Regeneron were to leverage insights from the data resource. Looking ahead, he acknowledges that there are gaps in the data and other types of data that the team wants to collect. However, Reid states that the one thing he has learned is that the real value added to data is not just about increasing scale. It is not about gathering more samples or people, but about understanding which additional data assets can be gathered from the samples that are already there and which of those will be most informative.
For instance, Reid says that genetics focuses on germline genetics, except in rare cases like cancer, but the team is also interested in understanding someone's current health. Measuring things like protein levels related to immune response can reveal a lot about how the body is reacting to its environment.
That is why the team has been expanding to new types of data, such as protein expression, which allows them to see what the protein levels looked like at the moment the sample was taken.
Furthermore, when it comes to AI, the team has found that extracting information from medical imaging and identifying associations between image-derived variables and genetics has been incredibly useful.
This expansion in data modalities gives more leverage for discovery. It is not just about increasing the scale but broadening the scope and enhancing the quality of data, which allows for a deeper understanding of human disease.
Collecting such data can be challenging, especially in cases like measuring the state of brain tumors, says Reid. However, with AI, it is possible to draw insights from existing imaging data and derive variables that a human analyst would not necessarily see. He says that as long as AI is applied thoughtfully, there is a lot of potential to uncover subtle insights.
In conclusion, Reid says that deploying AI requires deliberate planning, but the tools provide massive opportunities for additional discovery by leveraging existing data, particularly imaging data.
CDO Magazine appreciates Jeffrey Reid for sharing his insights with our global community.