Clinical manuscript, web edition

About HUMBLE

A clinician-first guide to what HUMBLE predicts, how the models were trained, how well they validate, and how to interpret the figures without needing a statistics refresher first.

  • Hamilton derivation: 849 patients (RSF) / 857 patients (DeepHit)
  • External RSF validation: 1,592 Wills Eye patients
  • Outputs overall survival plus competing-risk estimates
  • Includes SHAP-based patient-level explainability

What this page is for

The goal is simple: make the science easy to follow without watering it down.

  • Understand what RSF and DeepHit each answer clinically.
  • See where the training and validation cohorts came from.
  • Learn what C-index, AUC, IBS, calibration, CIF, and SHAP actually mean.
  • Read each figure with a short explanation of what to look for and why it matters.

Current scope

HUMBLE is designed for patients with uveal melanoma treated with low-dose-rate Iodine-125 plaque brachytherapy. It is best used as a counseling and risk-communication aid alongside clinical judgment, pathology, systemic evaluation, and local practice context.

Question 1

What is this patient’s overall survival trajectory?

Use the RSF when you want the full overall survival curve and an all-cause mortality framing.

Question 2

How much of the risk appears metastatic versus other-cause?

Use DeepHit when the distinction between metastasis-associated mortality and other-cause mortality matters clinically.

Question 3

Why did the model move the estimate up or down?

Use SHAP and feature-importance views to translate the prediction into interpretable clinical drivers.

01 · Evidence base

Who the models were built on

HUMBLE was developed on retrospective Hamilton Eye Institute cohorts of patients treated with low-dose-rate Iodine-125 plaque brachytherapy, then stress-tested on an independent Wills Eye Hospital cohort for overall survival. The intent is not only to predict, but to do so in a way clinicians can trust and audit.

RSF derivation cohort
849

Hamilton Eye Institute patients, with 162 deaths used for overall-survival modeling.

DeepHit derivation cohort
857

Hamilton patients, with 73 metastasis-associated and 89 other-cause deaths for competing-risks training.

External RSF validation
1,592

Independent Wills Eye Hospital cohort used to test whether performance transfers beyond the originating center.

Clinical setting
1984–2022

All patients were treated with low-dose-rate plaque brachytherapy, which defines the intended use population.

Why the sample sizes differ: Complete-case analysis outperformed imputation during ablation testing, and the RSF also uses an engineered Age × Volume term. That combination slightly reduces the analyzable RSF cohort compared with the DeepHit cohort.

Why external validation matters

A model can look excellent inside its home dataset and still underperform elsewhere. External validation asks whether the model captured transportable clinical signal rather than center-specific patterns.

What clinicians should take from this

Preserved external discrimination is encouraging, but absolute risk calibration can still vary by population and practice environment. That is why this page treats calibration as a first-class result, not an afterthought.

02 · Inputs

What goes into HUMBLE

Both models were intentionally kept compact and clinically recognizable. The goal was not to feed in every possible variable, but to identify a parsimonious set that preserves performance while remaining understandable at the bedside.

Random Survival Forest

Overall survival model

Best used when the clinical question is: “What is this patient’s all-cause survival probability over time?”

  • Age at surgery
  • BCVA (LogMAR)
  • Optic nerve proximity ≤1.5 mm
  • Tumor volume
  • AJCC T stage
  • Apex height
  • Max basal diameter
  • Age × Volume interaction

DeepHit competing-risks model

Cause-specific mortality model

Best used when the question is: “How much of the long-term risk appears metastatic versus other-cause?”

  • Age at surgery
  • BCVA (LogMAR)
  • Optic nerve proximity ≤1.5 mm
  • Tumor volume
  • Apex height
  • Max basal diameter
  • Bullous retinal detachment

Volume is derived, not manually entered

Tumor volume is approximated as an ellipsoid using π/12 × z × x × y, giving the models a geometry-sensitive summary rather than only raw dimensions.
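
As a concrete illustration, here is a minimal sketch of that calculation in Python, assuming z is the apex height and x and y are the two basal diameters in millimetres; the function and variable names are ours, not HUMBLE's internal fields.

```python
import math

def tumor_volume_mm3(apex_height_mm: float, basal_x_mm: float, basal_y_mm: float) -> float:
    """Ellipsoid-based approximation described above: pi/12 * z * x * y (illustrative only)."""
    return math.pi / 12.0 * apex_height_mm * basal_x_mm * basal_y_mm

# Example: a 5 mm tall tumor on a 12 mm x 10 mm base -> roughly 157 mm^3
print(round(tumor_volume_mm3(5.0, 12.0, 10.0), 1))
```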

Feature sets were chosen by ablation, not guesswork

AJCC T stage stayed in the RSF because it improved overall-survival prediction, whereas DeepHit dropped it because of multicollinearity and retained bullous retinal detachment instead.

Compact models are easier to trust

A shorter, clinically legible feature list makes bedside interpretation easier and reduces the sense that the prediction came from a black box of obscure inputs.

03 · Reading the evidence

How to read the plots and metrics

No single statistic makes a model “good.” The clearest reading comes from pairing discrimination with calibration, looking for external validation whenever available, and understanding what each plot is trying to show.

Fast rule of thumb: A model can rank patients well and still misstate absolute risk. That is why this page gives calibration equal billing with C-index and AUC, and why external validation is emphasized wherever it exists.

C-index

Asks whether higher-risk patients are generally ranked above lower-risk patients.

0.5 is random ordering and 1.0 is perfect ordering, but there is no universal cutoff that automatically makes a model clinically acceptable.
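
For readers who want to see the mechanics, here is a minimal sketch using the open-source scikit-survival package; the toy event indicators, times, and risk scores are invented for illustration and are not HUMBLE outputs.

```python
import numpy as np
from sksurv.metrics import concordance_index_censored

# Toy data: True = died, follow-up time in years, and a model risk score
# where higher means higher predicted risk (all values invented).
event = np.array([True, False, True, False, True])
time = np.array([2.1, 8.0, 4.5, 10.0, 1.3])
risk_score = np.array([0.8, 0.2, 0.6, 0.1, 0.9])

cindex, concordant, discordant, tied_risk, tied_time = concordance_index_censored(
    event, time, risk_score
)
print(f"C-index: {cindex:.2f}")  # 1.00 here: every comparable pair is ordered correctly
```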

Time-dependent AUC / ROC

Shows discrimination at specific follow-up times rather than collapsing everything into one summary number.

Higher curves are better, and stability across years is often more reassuring than a single strong landmark.
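
A hedged sketch of how such a curve can be computed with scikit-survival's cumulative_dynamic_auc; the cohorts and risk scores below are randomly generated stand-ins, not HUMBLE data.

```python
import numpy as np
from sksurv.metrics import cumulative_dynamic_auc
from sksurv.util import Surv

# Randomly generated stand-in cohorts; real use would pass the model's risk scores.
rng = np.random.default_rng(0)
train_time, test_time = rng.uniform(1, 16, 200), rng.uniform(1, 15, 100)
train_event, test_event = rng.random(200) < 0.4, rng.random(100) < 0.4
risk_score = -test_time + rng.normal(0, 2, 100)   # shorter survival -> higher risk

y_train = Surv.from_arrays(train_event, train_time)
y_test = Surv.from_arrays(test_event, test_time)

eval_times = np.arange(2, 11)  # years 2 through 10
auc_by_year, mean_auc = cumulative_dynamic_auc(y_train, y_test, risk_score, eval_times)
print(np.round(auc_by_year, 3), round(mean_auc, 3))
```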

IBS

The Integrated Brier Score summarizes overall prediction error across follow-up.

Lower is better. It complements discrimination because it penalizes inaccurate probabilities.
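
A minimal sketch of the computation with scikit-survival's integrated_brier_score; the outcomes and predicted survival probabilities below are synthetic placeholders rather than model output.

```python
import numpy as np
from sksurv.metrics import integrated_brier_score
from sksurv.util import Surv

# Synthetic outcomes and predicted survival curves (one row per patient,
# one column per evaluation year); real use would pass model predictions.
rng = np.random.default_rng(1)
time = rng.uniform(0.5, 15, 150)
event = rng.random(150) < 0.4
y = Surv.from_arrays(event, time)

eval_times = np.arange(1, 11)                      # years 1 through 10
base_curve = np.exp(-0.05 * eval_times)            # shared declining survival curve
surv_probs = np.clip(base_curve + rng.normal(0, 0.05, (150, 10)), 0.01, 0.99)

ibs = integrated_brier_score(y, y, surv_probs, eval_times)
print(f"IBS: {ibs:.3f}")  # lower is better; 0 would mean perfect probabilities
```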

Calibration

Checks whether predicted probabilities match observed outcomes.

Points near the diagonal are ideal. Above the line means the model was too pessimistic about survival; below means it was too optimistic.
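
One common way to build such a plot is to bin patients by predicted survival and compare each bin's mean prediction with a Kaplan-Meier estimate of observed survival. The sketch below uses made-up predictions and outcomes purely to show the mechanics; it is not HUMBLE's calibration code.

```python
import numpy as np
from sksurv.nonparametric import kaplan_meier_estimator

def observed_survival_at(event, time, horizon):
    """Kaplan-Meier survival estimate at a fixed horizon (illustrative helper)."""
    km_times, km_surv = kaplan_meier_estimator(event, time)
    idx = np.searchsorted(km_times, horizon, side="right") - 1
    return km_surv[idx] if idx >= 0 else 1.0

# Made-up predicted 5-year survival probabilities and loosely matching outcomes.
rng = np.random.default_rng(2)
pred_5yr = rng.uniform(0.3, 0.95, 300)
time = np.minimum(rng.exponential(12 * pred_5yr), 15)
event = rng.random(300) < 0.6

# Compare mean predicted vs observed survival within predicted-survival tertiles.
cuts = np.quantile(pred_5yr, [1 / 3, 2 / 3])
groups = np.digitize(pred_5yr, cuts)
for g in range(3):
    mask = groups == g
    print(f"tertile {g}: predicted {pred_5yr[mask].mean():.2f}, "
          f"observed {observed_survival_at(event[mask], time[mask], 5.0):.2f}")
```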

CIF / competing risks

For DeepHit, the probabilities of being alive, of metastasis-associated mortality, and of other-cause mortality add up to 100% at each time point.

This matters when cause-specific counseling is more informative than all-cause survival alone.
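
A tiny numeric illustration of that constraint, using purely hypothetical cumulative-incidence values (not HUMBLE estimates):

```python
import numpy as np

# Purely hypothetical cumulative incidence at years 1-5 for one patient:
# metastasis-associated mortality (MAM) and other-cause mortality (OCM).
years = np.array([1, 2, 3, 4, 5])
cif_mam = np.array([0.01, 0.03, 0.06, 0.08, 0.10])
cif_ocm = np.array([0.02, 0.04, 0.05, 0.07, 0.09])
alive = 1.0 - cif_mam - cif_ocm

# At every time point the three states partition the cohort and sum to 1.
assert np.allclose(alive + cif_mam + cif_ocm, 1.0)
for row in zip(years, alive, cif_mam, cif_ocm):
    print("year %d: alive %.2f, MAM %.2f, OCM %.2f" % row)
```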

SHAP

Breaks a single patient prediction into feature-level pushes that move the estimate up or down from a baseline.

Helpful for explanation and communication, but not proof that a feature caused the outcome.
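
For orientation, here is a minimal waterfall-plot sketch using the open-source shap library with a toy regression model as a stand-in; HUMBLE's actual explainer setup may differ.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Toy stand-in model and data (four anonymous features, made-up target).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.3, 200)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Model-agnostic explainer: the background sample sets the baseline expectation.
explainer = shap.Explainer(model.predict, X[:100])
shap_values = explainer(X[100:101])        # explain a single "patient"
shap.plots.waterfall(shap_values[0])       # baseline plus per-feature pushes
```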

04 · Model one

Random Survival Forest (RSF)

The RSF is the overall-survival engine in HUMBLE. It answers the broad counseling question: given this patient’s baseline ocular and clinical features, what does the long-term all-cause survival curve look like after plaque brachytherapy?

Clinical use case

Look here first when you want the main survival curve.

Question answered: Overall survival probability over time
Endpoint: All-cause mortality
Evidence shown here: Internal bootstrap/CV plus external Wills validation
Architecture & training details

Architecture

  • 1,000 survival trees
  • Log-rank splitting rule
  • Max features per split: 4
  • Min samples to split: 5
  • Min samples per leaf: 10
  • Max depth: 10
  • Out-of-bag scoring enabled

Training and validation strategy

  • Exhaustive grid search over 768 configurations
  • Top configurations refined with 10-fold × 3-repeat CV
  • Bootstrap .632+ internal validation
  • Permutation-based feature importance, 15 repeats
  • External validation performed at Wills Eye Hospital

In plain language: the RSF was tuned aggressively, tested repeatedly, and then challenged on a completely separate institution to reduce the risk that its performance reflects optimism or center-specific patterns.
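
To make the hyperparameters above concrete, here is a configuration sketch using the open-source scikit-survival API; it mirrors the listed settings but is not HUMBLE's actual training code, and the grid search, repeated cross-validation, and bootstrap steps are omitted.

```python
from sksurv.ensemble import RandomSurvivalForest

rsf = RandomSurvivalForest(
    n_estimators=1000,      # 1,000 survival trees (log-rank splitting is the default)
    max_features=4,         # features considered per split
    min_samples_split=5,
    min_samples_leaf=10,
    max_depth=10,
    oob_score=True,         # out-of-bag scoring enabled
    n_jobs=-1,
    random_state=0,
)
# rsf.fit(X_train, y_train)   # y_train: structured (event, time) array, e.g. Surv.from_arrays(...)
# print(rsf.oob_score_)       # out-of-bag concordance index
```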

Internal validation

Hamilton Eye Institute

Bootstrap .632+, out-of-bag scoring, and cross-validation on the derivation cohort.

C-index (OOB) 0.8039
C-index (.632+) 0.8325
IBS 0.1102
1–10 year mean AUC 0.8276
External validation

Wills Eye Hospital

Independent cohort used to test whether ranking performance survives transport to a different center.

Harrell C-index 0.8093
95% CI [0.6921, 0.9006]
IBS 0.018
1–10 year mean AUC 0.7917

Internal validation: discrimination inside the Hamilton cohort

These figures show whether the RSF consistently separates higher-risk from lower-risk patients across follow-up when tested internally.

Figure 1

RSF discrimination across follow-up

Cumulative dynamic AUC for the internal cross-validation analysis.

How to read it: Higher curves indicate better separation between patients who do and do not experience the event by each year. Stability across the whole span is more reassuring than one isolated peak.
Why it matters: This answers the clinically useful question of whether the model keeps its ranking ability as follow-up gets longer.
Takeaway: The mean internal AUC from years 1 through 10 was 0.8276, suggesting strong and fairly stable discrimination across time.
Figure 2

RSF time-dependent ROC

Sensitivity versus specificity trade-off averaged across time points in internal validation.

How to read it: Curves closer to the upper-left corner perform better. The diagonal line is random chance.
Why it matters: Some clinicians find ROC space more intuitive than a C-index because it resembles familiar test-performance plots.
Takeaway: The averaged time-dependent ROC AUC was 0.824, consistent with good internal risk separation.

External validation: does performance travel to Wills Eye?

External validation is the strongest part of the RSF evidence story on this page because it asks whether performance survives a different institution, workflow, and patient mix.

Figure 3

External RSF discrimination across follow-up

Cumulative dynamic AUC in the independent Wills Eye cohort.

How to read it: Read this the same way as the internal AUC plot: higher is better, and consistency across years matters.
Why it matters: A model that stays strong here is more likely to have learned transportable clinical structure rather than only local idiosyncrasies.
Takeaway: The external mean AUC was 0.7917, showing that discrimination remained good in the independent cohort.
Figure 4

External RSF time-dependent ROC

ROC view of discrimination in the independent Wills Eye cohort.

How to read it: Again, farther from the diagonal and closer to the upper-left corner is better.
Why it matters: This gives a familiar classification-style picture of how well the model separates patients in a cohort it never saw during development.
Takeaway: The external ROC remained well above chance, matching the preserved external C-index and mean AUC.

Calibration: do the predicted survival probabilities match reality?

This is the absolute-risk check. A model can rank patients well and still misestimate survival probabilities, so calibration deserves its own read.

Figure 5

Five-year RSF calibration

Predicted versus observed survival probability at five years.

How to read it: The diagonal is perfect calibration. Above it means observed survival was better than predicted; below it means predictions were too optimistic.
Why it matters: If clinicians are going to quote absolute survival probabilities, this plot matters at least as much as the C-index.
Takeaway: Internal five-year calibration showed an integrated calibration index (ICI) of 0.1238, which quantifies the average misalignment from the ideal diagonal.
Figure 6

Ten-year RSF calibration

Predicted versus observed survival probability at ten years.

How to read it: Read it the same way as the five-year panel, but long-horizon calibration is usually harder and more clinically demanding.
Why it matters: Long-term estimates are often the most tempting to cite and the easiest to over-trust, so they deserve a separate look.
Takeaway: The ten-year internal ICI was 0.1901, which is why long-horizon predictions should still be communicated with appropriate humility.

Best-practice interpretation: Good discrimination does not guarantee accurate absolute probabilities. Even with encouraging RSF external discrimination, calibration should be checked in any local population before clinicians lean heavily on the exact survival percentages.

05 · Model two

Competing risks (DeepHit)

DeepHit adds a different clinical lens. Instead of treating every death the same, it separates metastasis-associated mortality from other-cause mortality, which makes the output more informative for oncology-focused counseling.

Clinical use case

Look here when cause of death matters, not just whether death occurs.

Question answered: Metastatic versus other-cause mortality over time
Outputs: Alive + MAM + OCM probabilities summing to 100%
Evidence shown here: Internal repeated cross-validation
Architecture & training details

Architecture

  • Shared trunk with two hidden layers: 128 and 64 neurons
  • Two cause-specific heads with 64 neurons each
  • Batch normalization and 20% dropout
  • 150 quantile-based discrete time bins

Training and validation strategy

  • Nested 5-fold cross-validation with 150 random configurations
  • Composite selection using IBS and C-index
  • AdamWR optimizer, lr 0.0005, weight decay 0.05
  • 300 epochs on the full cohort with batch size 256
  • Final reporting from repeated 5 × 5-fold internal CV

In plain language: DeepHit is more flexible than the RSF and is designed to learn cause-specific event patterns directly, but the evidence currently shown here is internal rather than external.
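
For readers who want to picture the architecture, here is a plain PyTorch sketch of a shared trunk feeding two cause-specific heads over 150 discrete time bins, matching the description above; it is illustrative only and omits the DeepHit loss, the AdamWR optimizer, and the actual training loop.

```python
import torch
import torch.nn as nn

class DeepHitNet(nn.Module):
    """Sketch of the described architecture: a shared trunk and two
    cause-specific heads over discrete time bins (illustrative only)."""

    def __init__(self, n_features: int, n_time_bins: int = 150):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.2),
        )
        # One head per competing cause: metastasis-associated and other-cause mortality.
        self.head_mam = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, n_time_bins))
        self.head_ocm = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, n_time_bins))
        self.n_time_bins = n_time_bins

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shared = self.trunk(x)
        logits = torch.stack([self.head_mam(shared), self.head_ocm(shared)], dim=1)
        # Softmax over all cause x time-bin cells so the joint probabilities sum to 1.
        probs = torch.softmax(logits.view(x.size(0), -1), dim=1)
        return probs.view(x.size(0), 2, self.n_time_bins)

# Example forward pass with the seven DeepHit inputs listed earlier.
net = DeepHitNet(n_features=7)
net.eval()  # BatchNorm in eval mode so a single example works
print(net(torch.randn(1, 7)).shape)  # torch.Size([1, 2, 150])
```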

Figure 7

Competing-risk cumulative incidence functions

Population-level breakdown of alive, metastasis-associated mortality, and other-cause mortality over time.

How to read it: At any year, read vertically: the three components together represent the entire cohort. This is not just a survival curve split by color; it is a competing-risks decomposition.
Why it matters: It helps clinicians distinguish “risk of death” from “risk of metastatic death,” which are not the same counseling question.
Takeaway: This figure makes it visually obvious that metastatic and non-metastatic mortality follow different trajectories, which is exactly why a competing-risks model can add value.
Metastasis-associated mortality

DeepHit MAM performance

Internal repeated 5 × 5-fold cross-validation in the Hamilton cohort.

C-index 0.812
IBS 0.118
1–10 year mean AUC 0.851
10-year ROC AUC 0.84
Other-cause mortality

DeepHit OCM performance

Internal repeated 5 × 5-fold cross-validation in the Hamilton cohort.

C-index 0.791
IBS 0.126
1–10 year mean AUC 0.807
10-year ROC AUC 0.84
Figure 8

DeepHit time-dependent AUC

Cause-specific discrimination for MAM and OCM across 1- to 10-year follow-up.

How to read it: Read each line separately. Higher values indicate better year-specific discrimination for that cause.
Why it matters: This reveals whether cause-specific ranking performance changes as follow-up lengthens.
Takeaway: Mean AUCs were 0.851 for MAM and 0.807 for OCM, suggesting stronger separation for metastatic risk than for other-cause mortality.
Figure 9

DeepHit 10-year ROC

Cause-specific ROC curves at the 10-year landmark.

How to read it: As with the RSF ROC plots, curves closer to the upper-left corner perform better and the diagonal is random chance.
Why it matters: This gives a single clinically intuitive snapshot of cause-specific discrimination at a long-horizon counseling landmark.
Takeaway: Both cause-specific ROC curves sit well above chance at 10 years, consistent with the strong time-dependent AUC results.

DeepHit calibration

These panels compare predicted cumulative incidence against observed event rates for each competing-risk endpoint.

Figure 10

DeepHit calibration for metastasis-associated mortality

Predicted versus observed cumulative incidence for the metastatic endpoint.

How to read it: Points close to the diagonal indicate good agreement between predicted and observed metastatic event rates.
Why it matters: For cause-specific counseling, this is the absolute-risk check that complements the MAM C-index and AUC.
Takeaway: The plotted ICI values are lower at 5 years than at 10 years, suggesting the nearer-horizon metastatic estimates are easier to calibrate.
Figure 11

DeepHit calibration for other-cause mortality

Predicted versus observed cumulative incidence for the non-cancer endpoint.

How to read it: Read it exactly as you read the MAM calibration panel: the closer to the diagonal, the better the agreement.
Why it matters: This is important because clinicians may use the OCM estimate to contextualize metastatic risk, especially in older or medically complex patients.
Takeaway: The OCM panel gives a direct read on whether the model’s non-cancer mortality probabilities are clinically believable.

Important context: The DeepHit results shown on this page are internally validated and useful, but they are not yet matched by the same external validation story shown for the RSF overall-survival model. That difference should stay visible to readers.

06 · Explainability

How HUMBLE explains itself

Interpretability in HUMBLE works at two levels: patient-specific explanation with SHAP and cohort-level importance with permutation importance. They answer different questions, and the page is stronger when that difference is explicit.

Local explanation: SHAP waterfall in the predictor

SHAP starts from a baseline prediction and then adds each feature’s contribution for an individual patient. Features can push the estimate toward higher risk or lower risk, making the model’s reasoning visible rather than mysterious.

Best for answering: “Why did this particular patient’s estimate change?”

Global explanation: feature importance on this page

Permutation importance is a cohort-level summary. It tells you which variables the RSF relies on most overall by measuring how much performance degrades when each feature is disrupted.

Best for answering: “What does the model tend to care about across the dataset?”
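
A hedged sketch of how such a ranking can be produced, pairing scikit-survival with scikit-learn's permutation_importance on fully synthetic data; none of the features or effect sizes below correspond to the HUMBLE cohorts.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

# Fully synthetic cohort with three made-up features.
rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=["age", "volume", "apex_height"])
time = np.exp(2.5 - 0.4 * X["age"].to_numpy() - 0.6 * X["volume"].to_numpy()
              + rng.normal(0, 0.3, 300))
event = rng.random(300) < 0.7
y = Surv.from_arrays(event, time)

rsf = RandomSurvivalForest(n_estimators=100, random_state=0).fit(X, y)

# 15 repeats mirrors the strategy described for HUMBLE; the drop in the
# model's concordance score when a feature is shuffled becomes its importance.
result = permutation_importance(rsf, X, y, n_repeats=15, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```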

Figure 12

How to read a SHAP waterfall

Example patient-level SHAP explanation used in the HUMBLE results experience.

Example SHAP waterfall plot
Step 1: Start at the baseline prediction, which represents the reference expectation in the background cohort.
Step 2: Bars pushing in one direction increase predicted risk, while bars pushing in the opposite direction decrease it.
Step 3: Longer bars have larger influence on the final estimate for that patient.
Step 4: Treat it as explanation, not causation: SHAP tells you how the model used the data, not what biologically caused the outcome.
Figure 13

Global RSF feature importance

Permutation importance ranking for the overall-survival model.

How to read it: Longer bars mean the model’s performance drops more when that feature is disrupted, so the model appears to rely on it more heavily overall.
Why it matters: This is the cohort-level companion to SHAP. It shows what matters globally, but not whether a feature raised or lowered risk for an individual patient.
Takeaway: The strongest features here deserve clinical attention, but the patient-level SHAP waterfall is still the better tool for explaining one person’s result.

07 · Scholarship

Publications and presentations

The HUMBLE program is still evolving. These posters, talks, and manuscripts show how the project has been communicated so far and where the evidence base is heading next.

2026
Poster presentation

Deep Learning Competing-Risk Model for Predicting Metastatic and Non-Metastatic Mortality in Uveal Melanoma After Plaque Brachytherapy

Authors: Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Dave N, Djulbegovic MB, King BA, Wilson MW.

Digital EyeCon PosterVerse 2026, Bascom Palmer Eye Institute, Miami, FL. Apr 2026.

2026
Paper podium presentation

AI-Driven Survival Prediction in Uveal Melanoma Patients Treated with Low-Dose-Rate Iodine-125 Brachytherapy: Development and External Validation

Authors: Cernichiaro-Espinosa LA, Taylor Gonzalez DJ, Djulbegovic MB, Dave N, King BA, Delsoz M, Shields CL, Wilson MW.

International Society of Ocular Oncology (ISOO) 2026, Rio de Janeiro, Brazil.

2026
Paper podium presentation

Tumor Volume Regression as a Prognostic Marker After Low-Dose-Rate Iodine-125 Plaque Brachytherapy for Uveal Melanoma

Authors: Taylor Gonzalez DJ, Dave N, Cernichiaro-Espinosa LA, Delsoz M, King BA, Wilson MW.

International Society of Ocular Oncology (ISOO) 2026, Rio de Janeiro, Brazil.

2026
Peer-reviewed article

Efficacy of Low-Dose-Rate Iodine-125 Plaque Brachytherapy in the Treatment of Uveal Melanoma

Authors: Cernichiaro-Espinosa LA, Choi SL, Taylor Gonzalez DJ, Hayes T, Mastellone J, Roberson RS, Stinson M, Pfeffer LM, Rinker LH, Choi HY, King BA, Wilson MW.

Ophthalmology Retina. 2026;10(2):204–214. doi:10.1016/j.oret.2025.08.004

2025
Oral presentation

Machine Learning–Based Prediction of Long-Term Survival in Uveal Melanoma: External Validation Study

Authors: Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Djulbegovic MB, Dave N, King BA, Nabavi A, Delsoz M, Yousefi S, Shields CL, Wilson MW.

American Academy of Ophthalmology Annual Meeting, Orlando, FL. Oct 2025.

2025
Poster presentation

AI-Driven Survival Prediction in Uveal Melanoma Patients Treated with Low-Dose-Rate Iodine-125 Brachytherapy

Authors: Cernichiaro-Espinosa LA, Taylor Gonzalez DJ, King BA, Choi S, Nabavi A, Delsoz M, Pfeffer L, Yousefi S, Wilson MW, et al.

ARVO 2025 Annual Meeting.

2025
Poster presentation

One-Year Tumor Volume Regression as a Prognostic Indicator Following Iodine-125 Brachytherapy in Uveal Melanoma

Authors: King BA, Taylor Gonzalez DJ, Cernichiaro-Espinosa LA, Choi SL, Pfeffer L, Wilson MW.

ARVO 2025 Annual Meeting, Salt Lake City, UT. Jun 2025.

Next step: A full manuscript can tie this page even more tightly to the formal evidence base, especially for explaining model selection, calibration philosophy, and how local implementation should be monitored over time.