1058 - Reproducibility of Statistically Significant Phase III Oncology Trials: An In Silico Meta-Epidemiological Analysis
Presenter(s)

A. D. Sherry1, P. Msaouel2, A. M. Miller1, T. A. Lin1, J. Abi Jaoude3, R. Kouzy1, A. H. Passy1, T. Meirson4, N. Ignatiadis5, Z. R. McCaw6, E. van Zwet7, and E. B. Ludmir8; 1The University of Texas MD Anderson Cancer Center, Houston, TX, 2Department of Translational Molecular Pathology, Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, 3Stanford University School of Medicine, Stanford, CA, 4Davidoff Cancer Center, Petah-Tikva, Israel, 5University of Chicago, Chicago, IL, 6Insitro, South San Francisco, CA, 7Leiden University Medical Center, Leiden, Netherlands, 8Department of Gastrointestinal Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX
Purpose/Objective(s): In phase III oncology research, small P values are used almost universally to justify relative treatment effect superiority. However, the conventional assumption that P values ≤ 0.05 imply reproducible effects has come under recent criticism, as statistically significant clinical trial results are often difficult to replicate in clinical practice. While phase III oncology trials directly inform practice, they are usually not repeated, limiting current understanding of treatment effect reproducibility. Using advanced modeling techniques, we investigated the relationship between P values and reproducibility.
Materials/Methods: We obtained the signal-to-noise ratio distribution in phase III oncology using the primary endpoint summary statistics from 632 two-arm, superiority-design trials enrolling 496,219 patients screened from ClinicalTrials.gov. With this distribution, based on previously published methods, we estimated successful replication probability as the probability that a replicate trial, having the same design, effect size, and standard error, would have a two-sided P ≤ 0.05 and the same effect directionality as the original trial.1 The correct sign probability (the probability that the estimated effect had the same direction as the true effect) was also calculated.2 The 95% CIs for the replication probabilities were computed using the Dvoretzky–Kiefer–Wolfowitz F-localization approach.
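The two probabilities defined above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes, purely for illustration, that the true signal-to-noise ratio (SNR) follows a zero-mean normal distribution with a hypothetical standard deviation `tau`, whereas the study estimated this distribution empirically from the 632 trials. All function names are illustrative. The last helper computes only the half-width of a Dvoretzky–Kiefer–Wolfowitz band for an empirical distribution of n observations, not the full F-localization procedure.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_two_sided_p(p: float) -> float:
    """Absolute z-statistic corresponding to a two-sided P value."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)


def shrinkage(tau: float) -> float:
    """Shrinkage factor s = tau^2 / (1 + tau^2) under the assumed prior."""
    return tau**2 / (1.0 + tau**2)


def replication_probability(z: float, tau: float, alpha: float = 0.05) -> float:
    """P(replicate has two-sided P <= alpha AND the same sign as the original).

    ASSUMPTION (hypothetical, not from the study): SNR ~ N(0, tau^2).
    With z | SNR ~ N(SNR, 1), the posterior is SNR | z ~ N(s*z, s), so an
    identically designed replicate has z_rep | z ~ N(s*z, 1 + s).
    """
    s = shrinkage(tau)
    crit = z_from_two_sided_p(alpha)
    return 1.0 - STD_NORMAL.cdf((crit - s * abs(z)) / sqrt(1.0 + s))


def correct_sign_probability(z: float, tau: float) -> float:
    """P(true effect has the same sign as the estimate), same assumed prior."""
    s = shrinkage(tau)
    return STD_NORMAL.cdf(sqrt(s) * abs(z))


def dkw_half_width(n: int, alpha: float = 0.05) -> float:
    """Half-width of the (1 - alpha) DKW band around an empirical CDF."""
    return sqrt(log(2.0 / alpha) / (2.0 * n))


if __name__ == "__main__":
    tau = 2.0  # hypothetical prior sd, chosen only for illustration
    for p in (0.05, 0.01):
        z = z_from_two_sided_p(p)
        print(p, replication_probability(z, tau), correct_sign_probability(z, tau))
    print(dkw_half_width(632))
```

Under this simplified prior, a nearly flat prior (very large `tau`) gives a replication probability of about 50% at P = 0.05, and any finite `tau` shrinks the estimate toward zero and lowers that probability, while the correct sign probability stays considerably higher. The numbers it prints are not expected to match the reported results, which depend on the empirically estimated distribution.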
Results: Treatment effects at P of 0.05 and 0.01 had mean successful replication probabilities of 43% (95% CI: 32% to 45%) and 60% (95% CI: 50% to 61%), respectively. For effects at P of 0.05, 10-fold simulated increases in sample size were required to achieve a high mean replication probability of 87%. Trials with an overall primary endpoint that led directly to regulatory approval had a median replication probability of 66%. On the other hand, when P ≤ 0.05, the mean probabilities of correct sign were ≥ 97%.
Conclusion: While the direction of observed effects is likely correct when P ≤ 0.05, our study reveals that treatment effects of phase III oncology trials at P of 0.05 and even 0.01 are unlikely to be replicated successfully unless enrollment numbers are inflated well beyond practicality. Given the interpretative value of replication probability, we provide an online calculator, leveraging the signal-to-noise ratio distribution of phase III oncology trials defined by this study, for estimating replication probabilities for individual trials: https://alexandersherry.shinyapps.io/shinyapp_replication_probability/. Alternative approaches towards improving the effective sample size and interpretation of phase III oncology trials are merited.