Background

  • PoS framework of Hampson, Bornkamp, et al. (2022) is well established at Novartis
    • Quantification of program-level risk
    • Goals include: transparency with project- and portfolio-level stakeholders
  • Study-level predictions are based on expert prior elicitation (e.g. Holzhauer et al. (2022)) or on the predictive distribution from a Bayesian meta-analysis model

Today’s talk

  • Two examples where a deeper dive into study-level unknowns was beneficial
    • Quantification of study-level risk
    • Useful for project-level decision-making
  • Tailored modelling to answer the question “how likely is my study to succeed?”

How likely is my pivotal program to succeed?

Program-level PoS approach

Bringing in all relevant information

Steps to a program-level evaluation using the Hampson, Bornkamp, et al. (2022) approach

Program-level PoS approach

Outcome of an assessment

Probability of success to approval (top), and with the additional requirement that the Target Product Profile (TPP) is met (bottom). Paths show P(success) and hurdles show transition rates.

Program-level PoS approach

References

This PoS framework has been described in

  • Hampson, Bornkamp, et al. (2022)
  • Hampson, Holzhauer, et al. (2022)
  • Holzhauer et al. (2022)

and presented many times, including:

  • Hampson and Holzhauer (2021)
  • Lange (2021)
  • Lange (2022)
  • Hampson (2023)

How likely is my trial to succeed?

Flow of efficacy predictions

From a calibrated benchmark prior (for the mean effect)

$$p(\mu) = w_t \,\text{N}(m_t, c) + (1 - w_t)\,\text{N}(m_n, c)$$

… to the posterior distribution (for the mean effect),

$$p(\mu, \tau \mid \{ \hat\theta_i, s_i \}) \propto \prod_i f(\hat\theta_i \mid \theta_i)\, p(\theta_i \mid \mu, \tau) \cdot p(\mu)\, p(\tau)$$

… the meta-analytic predictive (MAP) prior (for a future study mean),

$$p(\theta^* \mid \{ \hat\theta_i, s_i \}) = \int \text{N}(\theta^* \mid \mu, \tau)\, dP(\mu, \tau \mid \{ \hat\theta_i, s_i \})$$

… and the predictive distribution (for a future study outcome),

$$p(\hat\theta^* \mid \{ \hat\theta_i, s_i \}) = \int \text{N}(\hat\theta^* \mid \theta^*, s^*)\, dP(\theta^* \mid \{ \hat\theta_i, s_i \})$$
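A minimal Monte Carlo sketch of this chain in Python is shown below. All inputs (historical estimates, standard errors, prior settings, the half-normal scale for τ) are hypothetical, and simple importance resampling stands in for the MCMC one would actually use; the point is only to trace the same flow from posterior to MAP prior to predictive distribution.

```python
# Minimal sketch of the prediction flow; all numeric inputs are assumed for
# illustration and importance resampling stands in for MCMC.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical historical study estimates and standard errors
theta_hat = np.array([0.30, 0.45, 0.25])
s = np.array([0.15, 0.20, 0.10])
s_star = 0.12                                   # anticipated SE of the future study

# Calibrated benchmark prior for mu: mixture of "target" and "null" components
w_t, m_t, m_n, c = 0.5, 0.4, 0.0, 0.5

M = 200_000
use_target = rng.random(M) < w_t
mu = np.where(use_target, rng.normal(m_t, c, M), rng.normal(m_n, c, M))
tau = np.abs(rng.normal(0.0, 0.25, M))          # half-normal prior on tau (assumed scale)

# Marginal likelihood of each estimate: theta_hat_i | mu, tau ~ N(mu, tau^2 + s_i^2)
var = tau[:, None] ** 2 + s[None, :] ** 2
ll = (-0.5 * (np.log(2 * np.pi * var) + (theta_hat[None, :] - mu[:, None]) ** 2 / var)).sum(axis=1)
w = np.exp(ll - ll.max())
w /= w.sum()

# Resample (mu, tau) from the posterior, then draw the MAP prior and the predictive distribution
idx = rng.choice(M, size=M, p=w)
theta_star = rng.normal(mu[idx], tau[idx])       # MAP prior: mean of a future study
theta_hat_star = rng.normal(theta_star, s_star)  # predictive: estimate from a future study

print("P(future study estimate > 0) =", (theta_hat_star > 0).mean())
```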

Two studies

Study 1

Design: Phase 3b H2H non-inferiority study

flowchart LR
  A[Eligible patients] --> B(1:1 randomize)
  B --> C[Investigational drug]
  B --> D[Competitor drug]
  C --> E(Test non-inferiority)
  D --> E

Timing: two options

Options for timing of the H2H Phase 3b: (A) before the pivotal readout or (B) gated on a positive pivotal result

Study 1: Strategy


What is “success”?

  • Prerequisite: Pivotal success (superiority to SoC)
  • Required: H2H non-inferiority
  • Desired: H2H (numerical) superiority

What are the risks?

  • Commercial impact of failure
  • Challenging: choice of non-inferiority margin
  • Risks magnified for a concurrent Phase 3b due to incomplete information at the design stage

Study 1: Enabling smart risks


Strategic decision makers would need to know:

  • What are plausible head-to-head effects based on what we know now?
  • How much additional risk do we take on by starting the Phase 3b concurrently rather than waiting for a positive Phase 3 readout?

Study 1: Quantifying risks

Inadequacy of power alone

  • Consider a study concept with sample size chosen to ensure 90% power to declare non-inferiority (at a specified margin) given a true head-to-head effect of zero
  • Does this adequately capture the level of risk? NO!
  • Several inputs to this calculation are based on uncertain quantities (a standard calculation of this type is sketched after this list)
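For reference, a normal-approximation version of such a power/sample-size calculation is sketched below (the margin, standard deviation and alpha are illustrative assumptions, not the study's values); the last line shows how sharply power drops if the true head-to-head effect is even slightly unfavourable.

```python
# Illustrative normal-approximation power for a non-inferiority test of a
# difference in means (all numbers assumed; higher outcomes are better).
import numpy as np
from scipy.stats import norm

def ni_power(n_per_arm, margin, true_diff, sd, alpha=0.025):
    """P(declare non-inferiority) when testing H0: diff <= -margin."""
    se = sd * np.sqrt(2.0 / n_per_arm)
    return norm.cdf((true_diff + margin) / se - norm.ppf(1 - alpha))

# Smallest sample size (on a grid) giving ~90% power at a true effect of zero
for n in range(50, 2001, 10):
    if ni_power(n, margin=0.10, true_diff=0.0, sd=0.40) >= 0.90:
        print("n per arm ~", n)
        break

# The same design is badly underpowered if the true effect is slightly unfavourable
print("power at true diff -0.05:", round(ni_power(n, margin=0.10, true_diff=-0.05, sd=0.40), 2))
```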

Study 1: Quantifying risks

Uncertainty about H2H effect



Two versions of contrast-based network meta-analysis

  • Using what is known now (risk of concurrent trial)
  • Using what will be known after pivotal results (risk of gated trial); a toy sketch of the contrast-based logic follows this list
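The toy sketch below illustrates the contrast-based logic only (every distribution is invented and unrelated to the actual NMA outputs or to the probabilities quoted later): the H2H contrast is formed as (investigational vs SoC) minus (competitor vs SoC), and the two timing options differ in how much is known about the investigational-vs-SoC piece.

```python
# Toy contrast-based indirect comparison; all distributions are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
margin = 0.10                                        # assumed non-inferiority margin
comp_vs_soc = rng.normal(0.25, 0.08, 100_000)        # competitor vs SoC, external data

# (A) concurrent: investigational vs SoC only available as a wide predictive distribution
# (B) gated: investigational vs SoC estimated by the (positive) pivotal trial
inv_vs_soc = {"concurrent": rng.normal(0.30, 0.15, 100_000),
              "gated":      rng.normal(0.30, 0.06, 100_000)}

for option, inv in inv_vs_soc.items():
    h2h = inv - comp_vs_soc                          # indirect head-to-head contrast
    print(option, "P(H2H effect better than -margin):", round(np.mean(h2h > -margin), 2))
```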

Study 1: Communicating risks



Option A: go now with current knowledge

| Pivotal outcome (random) | Phase 3b outcome | Total probability |
|---|---|---|
| Success (74%) | Success (83%) | 61% |
| Success (74%) | Failure (17%) | 13% |
| Failure (26%) | Unimportant | 26% |

Option B: await pivotal outcome

| Pivotal outcome (known) | Phase 3b outcome | Total probability |
|---|---|---|
| Success | Success (83%) | 83% |
| Success | Failure (17%) | 17% |
| Failure | Not run | N/A |
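The totals in the Option A table are just products along each path, e.g.

$$P(\text{pivotal success, H2H success}) = 0.74 \times 0.83 \approx 0.61, \qquad P(\text{pivotal success, H2H failure}) = 0.74 \times 0.17 \approx 0.13,$$

whereas under Option B the Phase 3b is only run after a pivotal success, so the 83% / 17% split applies directly.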

Study 2

Impact of nonresponder imputation

Design

flowchart LR
  A[Eligible patients] --> A1[Adult patients]
  A --> A2[Pediatric patients]
  A1 --> B1(1:1 randomize)
  A2 --> B2(1:1 randomize)
  B1 --> D[Investigational]
  B1 --> E[Standard-of-Care]
  B2 --> D
  B2 --> E
  D --> F(Test for superiority)
  E --> F

Key Consideration

  • Nonresponder imputation for patients dropping out due to lack of tolerability
  • Dropout rate anticipated to be higher on the investigational treatment, especially among pediatric patients

Study 2: Strategy


What is “success”?

  • Required: Superiority overall
  • Desired: Superiority in both adults and pediatric patients

What are the risks?

  • Attenuation of treatment effect due to differential dropout rates
  • Higher degree of uncertainty about pediatric effects (pre-pivotal data is from adults)

Study 2: Quantifying risks

Inadequacy of power

Typical presentation: power under various scenarios

| Scenario | Pediatric dropout (Active) | Pediatric dropout (Control) | Adult dropout (Active) | Adult dropout (Control) | Treatment effect among adherents (difference in response rates) | Power for overall comparison |
|---|---|---|---|---|---|---|
| 1 | 10% | 5% | 4% | 1% | 20% | 90% |
| 2 | 10% | 5% | 4% | 1% | 15% | 76% |
| 3 | 15% | 5% | 4% | 1% | 20% | 81% |
| 4 | 15% | 5% | 4% | 1% | 15% | 68% |


Shortcoming

What is the relative plausibility of these scenarios based on what we know?

Study 2: Quantifying risks

Modelling all unknowns

Separate meta-analytic modelling of historical data for (1) the treatment effect among completers and (2) dropout rates, supplemented by elicited beliefs about pediatric dropout, yields (3) a posterior distribution of the power implied under these models
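A minimal sketch of step (3) is shown below, assuming hypothetical posterior/elicited distributions for the unknowns (the Beta and Normal settings are placeholders, not the actual meta-analytic results): each draw of dropout rates and the adherent treatment effect is pushed through nonresponder imputation and a normal-approximation power formula, giving a distribution of power rather than a single number.

```python
# Sketch: propagate uncertainty about dropout and treatment effect into a
# distribution of power; every distribution below is an assumed placeholder.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n_per_arm, n_sims, alpha = 300, 10_000, 0.025

drop_active  = rng.beta(8, 72, n_sims)           # active-arm dropout, ~10% on average
drop_control = rng.beta(4, 76, n_sims)           # control-arm dropout, ~5% on average
p_ctrl_adh   = rng.beta(40, 60, n_sims)          # control response rate among adherents
effect_adh   = rng.normal(0.18, 0.05, n_sims)    # adherent treatment effect (risk difference)

# Nonresponder imputation: dropouts are counted as non-responders
p_active  = (1 - drop_active)  * np.clip(p_ctrl_adh + effect_adh, 0, 1)
p_control = (1 - drop_control) * p_ctrl_adh

# Normal-approximation power of the two-proportion comparison, per draw
se = np.sqrt(p_active * (1 - p_active) / n_per_arm + p_control * (1 - p_control) / n_per_arm)
power = norm.cdf((p_active - p_control) / se - norm.ppf(1 - alpha))

print("mean power:", round(power.mean(), 2))
print("central 50% interval:", np.quantile(power, [0.25, 0.75]).round(2))
```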

Study 2: Communicating risks

Visual display: sensitivity of power

Example of a tornado plot illustrating systematic sensitivity analysis of power. Bounds for unknowns are calibrated based on degree of certainty (central 50% posterior intervals). Diamonds represent values assumed in sample-size calculation.
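As an aside, a bare-bones matplotlib version of such a tornado plot could be built as below; the parameter names, bounds, and assumed values are invented purely to show the construction.

```python
# Bare-bones tornado plot: horizontal bars for power over each unknown's
# central 50% interval, diamonds at the value assumed in the sample-size calc.
# All numbers are illustrative placeholders.
import matplotlib.pyplot as plt

params  = ["Pediatric dropout (active)", "Adult dropout (active)",
           "Adherent treatment effect", "Control response rate"]
low     = [0.70, 0.80, 0.72, 0.78]    # power at lower bound of the interval
high    = [0.92, 0.91, 0.96, 0.93]    # power at upper bound of the interval
assumed = [0.85, 0.88, 0.90, 0.87]    # power at the assumed value

fig, ax = plt.subplots()
for i, (lo, hi, mid) in enumerate(zip(low, high, assumed)):
    ax.barh(i, hi - lo, left=lo, height=0.5, color="steelblue")
    ax.plot(mid, i, marker="D", color="black")
ax.set_yticks(range(len(params)))
ax.set_yticklabels(params)
ax.set_xlabel("Power")
fig.tight_layout()
plt.show()
```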

Conclusions

Themes

  • Understanding, quantifying, and communicating about risks in study design
  • Power alone does not sufficiently characterize risk in these cases
  • A slightly different emphasis than program-level PoS: more granular attention to study-specific considerations
    • Especially necessary outside pivotal Phase 3
    • e.g. in Study 1, the calibrated benchmark prior is not directly relevant to the H2H effect

A PoS philosophy


  • Modelling when there is data
    • Potentially going beyond treatment effect: e.g. standard deviations, control response rates, overdispersion etc.
  • Eliciting to bridge data gaps
  • Sensitivity for controllables
    • Sample size, timing of assessment, estimands
  • Standardization as much as possible
    • Consistent communication AND methods

Acknowledgements

Quantitative PoS team, including

  • Markus Lange
  • Björn Holzhauer
  • Joseph Kahn
  • Alex Przybylski
  • Lou Whitehead
  • Steffen Ballerstedt

And other colleagues

  • Lisa Hampson
  • Jahangir Alam
  • David Ohlssen
  • Andrew Wright
  • Guoqin Su
  • Gen Zhu
  • Haoyi Fu

References

Hampson, Lisa V. 2023. “Role of Bayesian Methods in Evaluating and Communicating Risk.” In BAYES2023 Conference. Utrecht, Netherlands. https://www.bayes-pharma.org/wp-content/uploads/2023/11/22-HAMPSON-Quantitative-Decision-Making-for-drug-development-the-role-of-Bayesian-methods-in-evaluating-and-communicating-risk.pdf.
Hampson, Lisa V., Björn Bornkamp, Björn Holzhauer, Joseph Kahn, Markus R. Lange, Wen-Lin Luo, Giovanni Della Cioppa, Kelvin Stott, and Steffen Ballerstedt. 2022. “Improving the Assessment of the Probability of Success in Late Stage Drug Development.” Pharmaceutical Statistics 21 (2): 439–59. https://doi.org/10.1002/pst.2179.
Hampson, Lisa V., and Björn Holzhauer. 2021. “Strategies for Improving the Assessment of the Probability of Success in Late Stage Drug Development.” In Bayesian Scientific Working Group (BSWG) KOL Series 21. https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2Fwww.bayesianscientific.org%2Fwp-content%2Fuploads%2F2021%2F02%2FDIA_Hampson_Holzhauer_Feb2021.pptx&wdOrigin=BROWSELINK.
Hampson, Lisa V., Björn Holzhauer, Björn Bornkamp, Joseph Kahn, Markus R. Lange, Wen-Lin Luo, Pritibha Singh, Steffen Ballerstedt, and Giovanni Della Cioppa. 2022. “A New Comprehensive Approach to Assess the Probability of Success of Development Programs Before Pivotal Trials.” Clinical Pharmacology & Therapeutics 111 (5): 1050–60. https://doi.org/10.1002/cpt.2488.
Holzhauer, Björn, Lisa V. Hampson, John Paul Gosling, Björn Bornkamp, Joseph Kahn, Markus R. Lange, Wen-Lin Luo, et al. 2022. “Eliciting Judgements about Dependent Quantities of Interest: The SHeffield ELicitation Framework Extension and Copula Methods Illustrated Using an Asthma Case Study.” Pharmaceutical Statistics 21 (5): 1005–21. https://doi.org/10.1002/pst.2212.
Lange, Markus. 2021. “Unraveling a Single Number: Using Graphics to Explain Probability of Success.” In Basel Biometrics Society (BBS) Seminar. https://baselbiometrics.github.io/home/docs/talks/20210308/5_Lange.pdf.
———. 2022. “Strategies for Improving the Assessment of Probability of Success (PoS) in Late Stage Drug Development.” In Joint DSBS/FMS Meeting 2022. Copenhagen, Denmark. https://statistikframjandet.se/wp-content/uploads/2022/12/Markus-Lange-Strategies-for-improving-the-assessment-of-probability-of-success-PoS-in-late-stage-drug-development.pdf.
Neuenschwander, Beat, Gorana Capkun-Niggli, Michael Branson, and David J Spiegelhalter. 2010. “Summarizing Historical Information on Controls in Clinical Trials.” Clinical Trials 7 (1): 5–18. https://doi.org/10.1177/1740774509356002.