A clinician-first guide to what HUMBLE predicts, how the models were trained, how well they validate, and how to interpret the figures without needing a statistics refresher first.
HUMBLE is designed for patients with uveal melanoma treated with low-dose-rate Iodine-125 plaque brachytherapy. It is best used as a counseling and risk-communication aid alongside clinical judgment, pathology, systemic evaluation, and local practice context.
What is this patient’s overall survival trajectory?
Use the RSF when you want the full overall survival curve and an all-cause mortality framing.
How much of the risk appears metastatic versus other-cause?
Use DeepHit when the distinction between metastasis-associated mortality and other-cause mortality matters clinically.
Why did the model move the estimate up or down?
Use SHAP and feature-importance views to translate the prediction into interpretable clinical drivers.
HUMBLE was developed on retrospective Hamilton Eye Institute cohorts of patients treated with low-dose-rate Iodine-125 plaque brachytherapy, then stress-tested on an independent Wills Eye Hospital cohort for overall survival. The intent is not only to predict, but to do so in a way clinicians can trust and audit.
RSF cohort: Hamilton Eye Institute patients, with 162 deaths used for overall-survival modeling.
DeepHit cohort: Hamilton patients, with 73 metastasis-associated and 89 other-cause deaths for competing-risks training.
External validation cohort: an independent Wills Eye Hospital cohort used to test whether performance transfers beyond the originating center.
All patients were treated with low-dose-rate Iodine-125 plaque brachytherapy, which defines the intended-use population.
Why the sample sizes differ: Complete-case analysis outperformed imputation during ablation testing, and the RSF also uses an engineered Age × Volume term. That combination slightly reduces the analyzable RSF cohort compared with the DeepHit cohort.
A model can look excellent inside its home dataset and still underperform elsewhere. External validation asks whether the model captured transportable clinical signal rather than center-specific patterns.
Preserved external discrimination is encouraging, but absolute risk calibration can still vary by population and practice environment. That is why this page treats calibration as a first-class result, not an afterthought.
Both models were intentionally kept compact and clinically recognizable. The goal was not to feed in every possible variable, but to identify a parsimonious set that preserves performance while remaining understandable at the bedside.
Best used when the clinical question is: “What is this patient’s all-cause survival probability over time?”
Best used when the question is: “How much of the long-term risk appears metastatic versus other-cause?”
Tumor volume is approximated as an ellipsoid using π/12 × z × x × y, giving the models a geometry-sensitive summary rather than only raw dimensions; a worked computation follows this list.
Cancer stage stayed in the RSF because it improved overall-survival prediction, whereas DeepHit dropped it because of multicollinearity and retained bullous retinal detachment instead.
A shorter, clinically legible feature list makes bedside interpretation easier and reduces the sense that the prediction came from a black box of obscure inputs.
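For concreteness, here is a minimal sketch of the volume formula quoted above, V = (π/12) × z × x × y. The function name, units, and example dimensions are illustrative choices, not HUMBLE's internal implementation.

```python
# Minimal sketch of the ellipsoid volume approximation quoted above,
# V = (pi/12) * z * x * y. Units (mm) and the helper name are illustrative.
import math

def tumor_volume_mm3(x_mm: float, y_mm: float, z_mm: float) -> float:
    """Approximate tumor volume from two basal diameters and apical height."""
    return math.pi / 12.0 * z_mm * x_mm * y_mm

# Example: a 12 x 10 mm base with a 5 mm apical height.
print(f"{tumor_volume_mm3(12, 10, 5):.0f} mm^3")  # ~157 mm^3
```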
No single statistic makes a model “good.” The clearest reading comes from pairing discrimination with calibration, looking for external validation whenever available, and understanding what each plot is trying to show.
Fast rule of thumb: A model can rank patients well and still misstate absolute risk. That is why this page gives calibration equal billing with C-index and AUC, and why external validation is emphasized wherever it exists. A short computational sketch of these metrics follows the glossary below.
Asks whether higher-risk patients are generally ranked above lower-risk patients.
0.5 is random ordering and 1.0 is perfect ordering, but there is no universal cutoff that automatically makes a model clinically acceptable.
Shows discrimination at specific follow-up times rather than collapsing everything into one summary number.
Higher curves are better, and stability across years is often more reassuring than a single strong landmark.
The Integrated Brier Score summarizes overall prediction error across follow-up.
Lower is better. It complements discrimination because it penalizes inaccurate probabilities.
Checks whether predicted probabilities match observed outcomes.
Points near the diagonal are ideal. Above the line means the model was too pessimistic about survival; below means it was too optimistic.
For DeepHit, alive, metastasis-associated mortality, and other-cause mortality add up to 100% at each time point.
This matters when cause-specific counseling is more informative than all-cause survival alone.
Breaks a single patient prediction into feature-level pushes that move the estimate up or down from a baseline.
Helpful for explanation and communication, but not proof that a feature caused the outcome.
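To make the glossary concrete, the sketch below computes the three quantitative metrics (C-index, time-dependent AUC, Integrated Brier Score) with scikit-survival. Everything in it, the data, the toy risk score, and the flat 70% survival estimate, is synthetic and exists only to show the call signatures; none of it is HUMBLE's pipeline.

```python
# Hedged sketch: the metric families above, computed with scikit-survival
# on synthetic data. Nothing here comes from HUMBLE; it only shows the calls.
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import (
    concordance_index_censored,
    cumulative_dynamic_auc,
    integrated_brier_score,
)

rng = np.random.default_rng(0)
n = 200
time = rng.exponential(8.0, n)                  # follow-up in years (synthetic)
event = rng.random(n) < 0.6                     # True = death observed
risk = time.max() - time + rng.normal(0, 1, n)  # toy score: higher = riskier
y = Surv.from_arrays(event, time)

# Discrimination: Harrell's C-index (0.5 = random ordering, 1.0 = perfect).
cindex = concordance_index_censored(event, time, risk)[0]

# Discrimination at specific landmarks: cumulative/dynamic AUC over time.
eval_times = np.array([2.0, 5.0, 8.0])
auc, mean_auc = cumulative_dynamic_auc(y, y, risk, eval_times)

# Overall prediction error: Integrated Brier Score (lower is better). It needs
# predicted survival probabilities per patient per time; a flat 70% stands in.
surv_probs = np.full((n, len(eval_times)), 0.7)
ibs = integrated_brier_score(y, y, surv_probs, eval_times)

print(f"C-index={cindex:.2f}  mean AUC={mean_auc:.2f}  IBS={ibs:.2f}")
```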
The RSF is the overall-survival engine in HUMBLE. It answers the broad counseling question: given this patient’s baseline ocular and clinical features, what does the long-term all-cause survival curve look like after plaque brachytherapy?
In plain language: the RSF was tuned aggressively, tested repeatedly, and then challenged on a completely separate institution to reduce the risk that its performance reflects optimism or center-specific patterns. A minimal fitting sketch follows the validation summary below.
Bootstrap .632+, out-of-bag scoring, and cross-validation on the derivation cohort.
Independent cohort used to test whether ranking performance survives transport to a different center.
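As a rough illustration of that workflow, the sketch below fits a random survival forest with out-of-bag scoring using scikit-survival. The features (including an engineered Age × Volume term like the one mentioned earlier), hyperparameters, and data are hypothetical stand-ins, not HUMBLE's tuned configuration.

```python
# Hedged sketch of an RSF workflow: fit on tabular features (including an
# Age x Volume interaction) and read the out-of-bag C-index. All inputs and
# hyperparameters are hypothetical placeholders, not HUMBLE's configuration.
import numpy as np
import pandas as pd
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "age": rng.normal(62, 12, n),
    "tumor_volume": rng.lognormal(5.0, 0.8, n),   # mm^3, synthetic
})
X["age_x_volume"] = X["age"] * X["tumor_volume"]  # engineered interaction term
y = Surv.from_arrays(event=rng.random(n) < 0.5, time=rng.exponential(8.0, n))

rsf = RandomSurvivalForest(n_estimators=500, min_samples_leaf=10,
                           oob_score=True, random_state=0, n_jobs=-1)
rsf.fit(X, y)
print(f"Out-of-bag C-index: {rsf.oob_score_:.2f}")

# The RSF's core output: a full predicted survival curve for one patient.
surv_fn = rsf.predict_survival_function(X.iloc[:1])[0]
print(f"Predicted 5-year survival: {float(surv_fn(5.0)):.0%}")
```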
These figures show whether the RSF consistently separates higher-risk from lower-risk patients across follow-up when tested internally.
Cumulative dynamic AUC for the internal cross-validation analysis.
Sensitivity versus specificity trade-off averaged across time points in internal validation.
External validation is the strongest part of the RSF evidence story on this page because it asks whether performance survives a different institution, workflow, and patient mix.
Cumulative dynamic AUC in the independent Wills Eye cohort.
ROC view of discrimination in the independent Wills Eye cohort.
This is the absolute-risk check. A model can rank patients well and still misestimate survival probabilities, so calibration deserves its own read; a small worked example follows below.
Predicted versus observed survival probability at five years.
Predicted versus observed survival probability at ten years.
Best-practice interpretation: Good discrimination does not guarantee accurate absolute probabilities. Even with encouraging RSF external discrimination, calibration should be checked in any local population before clinicians lean heavily on the exact survival percentages.
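One common way to build such a calibration check is sketched below, reusing the hypothetical rsf, X, and y from the fitting example above: bin patients by predicted 5-year survival, then compare each bin's mean prediction against a Kaplan-Meier estimate of what was actually observed. Points near the diagonal correspond to bins where the two agree.

```python
# Hedged calibration sketch, reusing the hypothetical rsf, X, y from the
# fitting example above: group patients by predicted 5-year survival and
# compare mean predictions against Kaplan-Meier observed survival per group.
import numpy as np
from sksurv.nonparametric import kaplan_meier_estimator

t5 = 5.0
pred_s5 = np.array([fn(t5) for fn in rsf.predict_survival_function(X)])
quartile = np.digitize(pred_s5, np.quantile(pred_s5, [0.25, 0.50, 0.75]))

event, time = y["event"], y["time"]
for q in range(4):
    mask = quartile == q
    t_km, s_km = kaplan_meier_estimator(event[mask], time[mask])
    observed = s_km[t_km <= t5][-1]   # KM estimate just before 5 years
    print(f"Q{q + 1}: predicted={pred_s5[mask].mean():.0%}  "
          f"observed={observed:.0%}")
```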
DeepHit adds a different clinical lens. Instead of treating every death the same, it separates metastasis-associated mortality from other-cause mortality, which makes the output more informative for oncology-focused counseling.
In plain language: DeepHit is more flexible than the RSF and is designed to learn cause-specific event patterns directly, but the evidence currently shown here is internal rather than external.
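The one arithmetic constraint worth internalizing before reading the DeepHit figures is sketched below with made-up numbers: at every time point, the two cumulative incidence curves and the survival probability must account for the whole cohort.

```python
# Toy numbers (not HUMBLE output) illustrating the competing-risks identity
# behind the figures: alive + MAM + OCM must sum to 100% at every time point.
import numpy as np

years   = np.array([1, 5, 10])
cif_mam = np.array([0.05, 0.15, 0.25])  # hypothetical metastasis-associated CIF
cif_ocm = np.array([0.03, 0.10, 0.20])  # hypothetical other-cause CIF
alive   = 1.0 - cif_mam - cif_ocm

assert np.allclose(alive + cif_mam + cif_ocm, 1.0)
for yr, a, m, o in zip(years, alive, cif_mam, cif_ocm):
    print(f"{yr:>2}y  alive={a:.0%}  MAM={m:.0%}  OCM={o:.0%}")
```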
Population-level breakdown of alive, metastasis-associated mortality, and other-cause mortality over time.
Both discrimination panels come from internal repeated 5 × 5-fold cross-validation in the Hamilton cohort.
Cause-specific discrimination for metastasis-associated mortality (MAM) and other-cause mortality (OCM) across 1- to 10-year follow-up.
Cause-specific ROC curves at the 10-year landmark.
These panels compare predicted cumulative incidence against observed event rates for each competing-risk endpoint.
Predicted versus observed cumulative incidence for the metastatic endpoint.
Predicted versus observed cumulative incidence for the non-cancer endpoint.
Important context: The DeepHit results shown on this page are internally validated and useful, but they are not yet matched by the same external validation story shown for the RSF overall-survival model. That difference should stay visible to readers.
Interpretability in HUMBLE works at two levels: patient-specific explanation with SHAP and cohort-level importance with permutation importance. They answer different questions, and the page is stronger when that difference is explicit.
SHAP starts from a baseline prediction and then adds each feature’s contribution for an individual patient. Features can push the estimate toward higher risk or lower risk, making the model’s reasoning visible rather than mysterious.
Best for answering: “Why did this particular patient’s estimate change?”
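A minimal sketch of that patient-level view follows, continuing from the hypothetical rsf and X defined in the RSF fitting example above. KernelExplainer is one model-agnostic way to get SHAP values for a survival model's risk output; it is an assumption here, not a statement about HUMBLE's exact implementation.

```python
# Hedged sketch of a patient-level SHAP explanation, reusing the hypothetical
# rsf and X from the fitting example above. KernelExplainer is model-agnostic:
# it only needs a predict function and a background sample.
import numpy as np
import shap

background = X.iloc[:50]                     # small reference sample
explainer = shap.KernelExplainer(rsf.predict, background)
phi = explainer.shap_values(X.iloc[:1])      # one patient's attributions

# Each value pushes the risk estimate up (+) or down (-) from the baseline.
for name, value in zip(X.columns, np.ravel(phi)):
    print(f"{name:>14}: {value:+.3f}")
print(f"{'baseline':>14}: {explainer.expected_value:+.3f}")
```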
Permutation importance is a cohort-level summary. It tells you which variables the RSF relies on most overall by measuring how much performance degrades when each feature is disrupted.
Best for answering: “What does the model tend to care about across the dataset?”
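And the cohort-level counterpart, again reusing the hypothetical rsf, X, and y from above. Because scikit-survival estimators expose a score() method (the C-index), scikit-learn's generic permutation_importance can be applied directly.

```python
# Hedged sketch of cohort-level permutation importance, reusing rsf, X, y from
# the fitting example above. sklearn shuffles each column in turn and records
# how much the model's score (here, the C-index) degrades.
from sklearn.inspection import permutation_importance

result = permutation_importance(rsf, X, y, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"{X.columns[i]:>14}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```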
Example patient-level SHAP explanation used in the HUMBLE results experience.
Permutation importance ranking for the overall-survival model.
The HUMBLE program is still evolving. These posters, talks, and manuscripts show how the project has been communicated so far and where the evidence base is heading next.
Authors: Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Dave N, Djulbegovic MB, King BA, Wilson MW.
Digital EyeCon PosterVerse 2026, Bascom Palmer Eye Institute, Miami, FL. Apr 2026.
Authors: Cernichiaro-Espinosa LA, Taylor Gonzalez DJ, Djulbegovic MB, Dave N, King BA, Delsoz M, Shields CL, Wilson MW.
International Society of Ocular Oncology (ISOO) 2026, Rio de Janeiro, Brazil.
Authors: Taylor Gonzalez DJ, Dave N, Cernichiaro-Espinosa LA, Delsoz M, King BA, Wilson MW.
International Society of Ocular Oncology (ISOO) 2026, Rio de Janeiro, Brazil.
Authors: Cernichiaro-Espinosa LA, Choi SL, Taylor Gonzalez DJ, Hayes T, Mastellone J, Roberson RS, Stinson M, Pfeffer LM, Rinker LH, Choi HY, King BA, Wilson MW.
Ophthalmology Retina. 2026;10(2):204–214. doi:10.1016/j.oret.2025.08.004
Authors: Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Djulbegovic MB, Dave N, King BA, Nabavi A, Delsoz M, Yousefi S, Shields CL, Wilson MW.
American Academy of Ophthalmology Annual Meeting, Orlando, FL. Oct 2025.
Authors: Cernichiaro-Espinosa LA, Taylor Gonzalez DJ, King BA, Choi S, Nabavi A, Delsoz M, Pfeffer L, Yousefi S, Wilson MW, et al.
ARVO 2025 Annual Meeting.
Authors: King BA, Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Choi SL, Pfeffer L, Wilson MW.
ARVO 2025 Annual Meeting, Salt Lake City, UT. Jun 2025.
Next step: A full manuscript can tie this page even more tightly to the formal evidence base, especially for explaining model selection, calibration philosophy, and how local implementation should be monitored over time.