Validation of breast cancer risk prediction models in an independent prospective data set is rare. We drew on prospective data from the Nurses’ Health Study and the California Teachers Study to validate the Rosner-Colditz breast cancer incidence model and compare it to the Gail model [1] (see report). The Rosner-Colditz model includes a range of established reproductive factors that are directly related to breast cancer risk, along with body mass index and alcohol intake [2]. These are known causes of breast cancer. In particular, the model includes age at menopause and type of menopause (surgical or natural) – factors omitted from the Gail model. After aligning time periods for follow-up, we restricted both populations to comparable age ranges (47 to 74) and followed them for incident invasive breast cancer (follow-up 1994 to 2008, Nurses’ Health Study [NHS]; and 1995 to 2009, California Teachers Study [CTS]). We identified 2026 cases during 540,617 person-years in NHS, and 1400 cases during 288,111 person-years in CTS.
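The case counts and person-years above are enough to compute crude incidence rates for each cohort. The following is a minimal sketch in Python, not the analysis used in the study: the function names are hypothetical, and the confidence interval is a simple Wald interval on the log scale. Because it makes no adjustment for age or any other factor, the crude ratio need not match an adjusted estimate from the full analysis.

```python
import math

def crude_rate_per_100k(cases, person_years):
    """Crude incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000

def crude_rate_ratio(cases_a, py_a, cases_b, py_b, z=1.96):
    """Crude rate ratio (a vs. b) with a Wald interval on the log scale.

    Assumes Poisson-distributed case counts; no covariate adjustment.
    """
    ratio = (cases_a / py_a) / (cases_b / py_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Counts reported in the text above
nhs_cases, nhs_py = 2026, 540_617
cts_cases, cts_py = 1400, 288_111

nhs_rate = crude_rate_per_100k(nhs_cases, nhs_py)
cts_rate = crude_rate_per_100k(cts_cases, cts_py)
ratio, lower, upper = crude_rate_ratio(cts_cases, cts_py, nhs_cases, nhs_py)

print(f"NHS: {nhs_rate:.1f} per 100,000 person-years")
print(f"CTS: {cts_rate:.1f} per 100,000 person-years")
print(f"Crude CTS/NHS rate ratio: {ratio:.2f} ({lower:.2f} to {upper:.2f})")
```

With these counts the crude rates come to roughly 375 (NHS) and 486 (CTS) per 100,000 person-years, a crude ratio of about 1.30.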
To reflect application of a breast cancer risk prediction model in clinical practice, such as mammography screening services or primary care, we fit the Rosner–Colditz log incidence model and the Gail model using baseline data. We imputed future use of hormones based on type and prior duration of use and other covariates at baseline. We assessed performance using area under the curve (AUC) and calibration methods. Participants in the CTS had fewer children, were leaner, consumed more alcohol, and were more frequent users of postmenopausal hormones. The incidence rate ratio showed significantly higher breast cancer incidence in the CTS (IRR = 1.32, 95% CI 1.24 to 1.42). Parameters for the log-incidence model summarizing the relation for reproductive variables, history of benign breast disease, menopause, and use of hormone therapy, as well as alcohol, obesity, and family history, were comparable across the two cohorts. In the NHS the AUC was 0.60 (SE 0.006); when the model was applied to the CTS, the independent validation data set, the AUC was 0.586 (SE 0.008). The Gail model gave an AUC of 0.547 (SE 0.008), a statistically significant 0.04 (4 percentage points) lower. For women 47 to 69, more typical of those for whom risk estimation would be indicated clinically, the AUC values for the log incidence model are 0.608 in NHS and 0.609 in CTS; for the Gail model they are 0.569 and 0.572. In both cohorts, performance of both models dropped off in older women 70 to 87.
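For readers less familiar with the AUC, it can be read as the probability that a randomly chosen woman who developed breast cancer was assigned a higher predicted risk than a randomly chosen woman who did not. The sketch below illustrates that pairwise definition in Python. The risk values are invented for illustration only, the function name is hypothetical, and the study's own AUC estimates may involve steps (such as accounting for age) that this sketch does not attempt.

```python
def auc_concordance(risk_cases, risk_noncases):
    """AUC as the share of case/non-case pairs ranked correctly.

    A pair counts 1 if the case has the higher predicted risk,
    0.5 if the two risks are tied, and 0 otherwise.
    """
    concordant = 0.0
    for case_risk in risk_cases:
        for noncase_risk in risk_noncases:
            if case_risk > noncase_risk:
                concordant += 1.0
            elif case_risk == noncase_risk:
                concordant += 0.5
    return concordant / (len(risk_cases) * len(risk_noncases))

# Hypothetical predicted risks (not study data)
cases = [0.031, 0.022, 0.018, 0.027]
noncases = [0.012, 0.020, 0.015, 0.025, 0.010, 0.019]

auc = auc_concordance(cases, noncases)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which puts the values of 0.55 to 0.61 reported here in context.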
We also assessed calibration – a measure of how well the model predicts incidence for a population. Calibration showed good estimation against SEER (used as a measure of US national incidence rates for breast cancer) with a non-significant 4% underestimate of overall breast cancer incidence when applying the model in the CTS population.
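One common way to summarize calibration is the ratio of expected to observed cases, where a value below 1 indicates underestimation. The sketch below shows that generic calculation in Python; it is not the procedure used in the study, whose details are not given here. The counts are invented so that the ratio comes out at a 4% underestimate, the function name is hypothetical, and the interval assumes a Poisson-distributed observed count.

```python
import math

def expected_to_observed(expected, observed, z=1.96):
    """Expected/observed ratio with an approximate 95% interval.

    Assumes the observed count is Poisson, so the standard error of
    log(E/O) is approximately 1 / sqrt(observed).
    """
    ratio = expected / observed
    se_log = 1 / math.sqrt(observed)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper

# Hypothetical counts (not study data): the model predicts 96 cases, 100 occur
eo_ratio, eo_lower, eo_upper = expected_to_observed(expected=96.0, observed=100)

print(f"E/O = {eo_ratio:.2f} ({eo_lower:.2f} to {eo_upper:.2f})")
# An interval that contains 1 means the under- or overestimate is not
# statistically significant at the chosen level.
```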
In sum, the Rosner-Colditz model performs consistently well when applied in an independent data set. Performance is stronger when predicting incidence among women 47 to 69 and over a 5-year time interval. AUC values exceed those for the Gail model by 3 to 5% when both are applied to the independent validation data set. Models may be further improved by adding breast density or other markers of risk beyond those in the current model. Research in collaboration with the Breast Health Center is currently pursuing these improvements.
1. Rosner, B.A. et al. Validation of Rosner-Colditz breast cancer incidence model using an independent data set, the California Teachers Study. Breast Cancer Res Treat (2013).
2. Colditz, G. & Rosner, B. Cumulative risk of breast cancer to age 70 years according to risk factor status: data from the Nurses’ Health Study. Am J Epidemiol 152, 950-64 (2000).