Dismantling the Fragility Index: A demonstration of statistical reasoning.
The Fragility Index has been introduced as a complement to the P-value to summarize the statistical strength of evidence for a trial's result. The Fragility Index (FI) is defined in trials with two equal treatment group sizes, with a dichotomous or time-to-event outcome, and is calculated as the minimum number of conversions from nonevent to event in the treatment group needed to shift the P-value from Fisher's exact test over the .05 threshold. As the index lacks a well-defined probability motivation, its interpretation is challenging for consumers. We clarify what the FI may be capturing by separately considering two scenarios: (a) what the FI is capturing mathematically when the probability model is correct and (b) how well the FI captures violations of probability model assumptions. By calculating the posterior probability of a treatment effect, we show that when the probability model is correct, the FI inappropriately penalizes small trials for using fewer events than larger trials to achieve the same significance level. The analysis shows that for experiments conducted without bias, the FI promotes an incorrect intuition of probability, which has not been noted elsewhere and must be dispelled. We illustrate shortcomings of the FI's ability to quantify departures from model assumptions and contextualize the FI concept within current debate around the null hypothesis significance testing paradigm. Altogether, the FI creates more confusion than it resolves and does not promote statistical thinking. We recommend against its use. Instead, sensitivity analyses are recommended to quantify and communicate robustness of trial results.
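The calculation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `fisher_exact_p` and `fragility_index` are hypothetical, the two-sided Fisher P-value is taken as the sum of all table probabilities no larger than the observed one (one common convention), and conversions are applied to the treatment group only, as the definition above states. Other formulations of the FI add events to whichever group has fewer events.

```python
from math import comb


def fisher_exact_p(events_trt, n_trt, events_ctl, n_ctl):
    """Two-sided Fisher's exact P-value for a 2x2 table (assumed convention:
    sum the probabilities of all tables no more likely than the observed one)."""
    k = events_trt + events_ctl          # total events, fixed margin
    total = comb(n_trt + n_ctl, k)       # number of ways to place k events

    def weight(x):
        # Unnormalized hypergeometric weight of x events in the treatment arm.
        return comb(n_trt, x) * comb(n_ctl, k - x)

    observed = weight(events_trt)
    lo, hi = max(0, k - n_ctl), min(n_trt, k)
    # Integer weights keep the "as or less likely" comparison exact.
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / total


def fragility_index(events_trt, n_trt, events_ctl, n_ctl, alpha=0.05):
    """Minimum number of nonevent-to-event conversions in the treatment arm
    needed to move the P-value to alpha or above.

    Returns None if the result is not significant to begin with, or if no
    number of conversions in the treatment arm removes significance.
    """
    if fisher_exact_p(events_trt, n_trt, events_ctl, n_ctl) >= alpha:
        return None
    flips = 0
    a = events_trt
    while a < n_trt:
        a += 1          # convert one nonevent to an event
        flips += 1
        if fisher_exact_p(a, n_trt, events_ctl, n_ctl) >= alpha:
            return flips
    return None


if __name__ == "__main__":
    # Hypothetical trials, for illustration only.
    print(fragility_index(0, 10, 5, 10))   # small trial
    print(fragility_index(0, 20, 8, 20))   # larger trial
```

For the made-up trial with 0/10 events in the treatment arm and 5/10 in the control arm, the P-value is about 0.033 and a single conversion raises it to about 0.14, so the FI is 1. The second made-up trial (0/20 versus 8/20) needs two conversions. A sketch like this only reproduces the index; it says nothing about whether the index is a sound measure, which is the question the paper takes up.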