Jeff Elton, PhD; Catherine Richards, PhD, MPH;
and Nathalie Horowicz-Mehler, PhD, MPH
I was recently asked by the head of R&D of a leading biopharmaceutical company why ConcertAI invests so significantly in building and expanding research-ready datasets for multiple solid cancers and hematological malignancies. As we advanced in the conversation, it became increasingly clear to him that these datasets were foundational for current standard-of-care understanding and for early-phase trial planning. It was a far-ranging conversation that addressed some of the very specific methodological approaches that ConcertAI has evolved in recent years, and as such I wanted to memorialize it in one of my regular blogs.
Historically, real-world data (RWD) were derived from medical claims data and, less frequently, from datasets laboriously created from medical practice charts. Medical claims had the value of scale, standardization, and being ‘structured’, that is, machine readable and analyzable. However, in oncology, they had limited value, as medical claims don’t contain critical clinical features such as tumor characteristics, patient response to treatment, or adverse events, which often need to be derived from unstructured fields. As such, oncology RWD evolved slowly until 2010, when new companies and investments took advantage of near full implementation of electronic medical records (EMRs) – unusually specialized to cancer care – thereby making data more accessible.
However, most of the early EMR systems were focused on capturing data for reimbursement, medical coding, scheduling, and other clinical operational workflows. They were not research tools. Even more to the point, and much like the original paper medical charts, key data were in the form of notes and appended documents (e.g., medical imaging interpretation reports and molecular diagnostics reports). So, the critical data informing a view of patient response and adverse events were there, but still not machine readable. As a consequence, organizations wanting to derive insights into the standard of care or do comparative effectiveness analyses contracted for specific datasets to be built to their specifications. Most of these datasets were small, purpose-built, and derived from clinical practice settings accessible to the company performing the work. Most of the analyses were focused on approved medicines and were conducted by Medical Affairs or HEOR teams.
Now, roll forward and add the convergence of several forces: (1) the 21st Century Cures Act, which highlighted the potential utility of RWD in regulatory decisions; (2) the 2018 FDA White Paper laying out an early framework of potential uses; (3) the COVID-19 pandemic and the need to capture data on patient care in an uninterrupted manner after patients moved away, where possible, from in-office visits and from participation in clinical trials; (4) receptivity of US and EU regulatory authorities to RWD analyses that supplement programs in rare or largely untreatable cancers; (5) a desire for trials to be run in settings where the results can be immediately generalized to the patients receiving a new therapy; and (6) three preliminary FDA guidance documents, issued in late 2021, on real-world evidence (RWE) generation in support of regulatory decisions. RWD and the associated RWE are gaining in importance and are seeing increasing focus and funding in clinical development organizations – from first-in-human trials through to multiple parallel phase 2b trials. Novel ‘hybrid’ approaches have emerged in which randomized controlled trials (RCTs) are run in parallel with RWD-derived cohorts held to the same study criteria and rigor, to augment or benchmark trial data with data representing real-world clinical care for a larger, more representative patient population.
But these data are not truly ‘research grade’, are they? EMR-derived data have limitations for sure, and these need to be recognized and carefully managed. But the field has advanced much in the last four years and is continuing to do so. Data are now at scale, are derived from multiple practice settings (e.g., academic, regional health systems, and community), and are governed by rigorous and consistent rules for where and how concepts of response, adverse events, and outcomes can be derived. In most cases, these rules and concepts have been through a peer review process. These data can also be compared to RCT data as a further assurance of the comparability of key concepts.
Now, we can also connect to the source systems for data captured only in notes. For example, medical images can be integrated with EMR data for new reads and interpretations specific to a set of research questions, rather than being limited to the narrow use and shorthand assessment exchanged between the radiologist and medical oncologist (this is not a statement about ‘quality’, only that the original intent may not parallel the new research questions). Similarly, genomic and transcriptomic data are advancing rapidly, with the number of mutations moving from 300 to 700 in 2015 to full exomes today, to cite only one example. With advanced use of encrypted tokens, data can be linked and confederated for the same patient across multiple sources – including the original sources referenced in medical notes and PDF documents in the EMR. Some of these sources are corroborative and others add more depth.
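To make the token-based linkage idea concrete, here is a minimal sketch, not a description of ConcertAI's actual method. It assumes a simple keyed hash (HMAC-SHA256) over normalized identifiers stands in for the ‘encrypted tokens’ mentioned above; the key, field names, and sample records are all hypothetical, and production tokenization schemes are considerably more involved.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret held by a trusted tokenization party (assumption).
SECRET_KEY = b"example-key-held-by-a-trusted-party"


def normalize(value: str) -> str:
    """Uppercase and keep only letters and digits so formatting differences vanish."""
    text = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in text if ch.isalnum()).upper()


def patient_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Derive a de-identified token from identifiers using a keyed hash."""
    message = "|".join([normalize(first), normalize(last), normalize(dob)])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link_by_token(*sources):
    """Merge records from several sources that carry the same patient token."""
    linked = {}
    for source in sources:
        for record in source:
            linked.setdefault(record["token"], {}).update(record["data"])
    return linked


# Hypothetical records: each source holds only a token, never the identifiers.
emr = [{"token": patient_token("Ana", "Diaz", "1960-02-03"),
        "data": {"stage": "IIIA"}}]
genomics = [{"token": patient_token("ana", "DIAZ ", "1960-02-03"),
             "data": {"egfr_mutation": True}}]

linked = link_by_token(emr, genomics)
```

Because both sources derive the token from the same normalized identifiers and the same key, the EMR and genomic records land under one token and can be analyzed together without either dataset carrying the patient's name or date of birth.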
Finally, I want to consider how RWD can inform clinical development needs. Designing clinical trials is complex: a trial must test a novel mechanism or approach while generating clear data on safety and efficacy. Overlaying requirements for generalizability, executability, etc. only increases this complexity. A lack of insight into the current standard of care, patterns of diagnosis, selection of agents, etc. only places more risk on the trial, the interpretation of trial results, and the ultimate utility of the new medicine. Oncology trials in particular tend to be small, restrictive, and stage-specific, with a longer-term strategy to potentially broaden over time through subsequent trials. As such, clarity of the initial focus, creation of a baseline standard-of-care view, and ongoing alignment of outcomes with the trial results may inform the trial design and cohort definitions and enhance interpretations and regulatory assessments.
ConcertAI saw these patterns emerging and saw a great unfulfilled need to provide cancer-specific and pan-tumor views of the current standard of care. The analytic concept was that the entire standard of care should be represented, with each treatment paradigm represented at a scale that supports statistical significance. The variables represented would be meaningful to that standard of care but would also provide views of critical biological markers of response and characteristics of that response – hence their utility for RCT endpoints as well. These datasets are dynamic: they evolve in terms of patient criteria and variables considered, and they are constantly aligned to new therapeutics reshaping the standard of care, targets of significance, diagnostic approaches, and clinical guidelines – in other words, their relevance is assured. They are ever-growing, such that their power improves and they can accommodate the pattern in oncology of narrower subsets of patients being defined for ever more specific treatment approaches.
So, with the cited forces at play transforming how we plan new clinical development programs, where we deploy those programs, how they are going to be assessed by regulators, and the requirement for ongoing data and evidence generation, we see these data as foundational to the Clinical Development Enterprise. Not having this capability would limit the insights, constrain the planning, increase the risks, and diminish the level of regulatory discourse possible. When you further combine this with the move towards data science and AI-centric research and clinical development organizations, the confederation of research-grade clinical data with the enterprise’s own research, translational, and clinical development data is even more compelling and essential.
The concept of ‘foundation’ refers to an underlying basis or substrate from which to build. Research-grade RWD, as part of the total R&D system of biopharma, is exactly that: foundational for clinical development programs from early to late.