Deep Learning Model “Sybil” Can Predict Lung Cancer Risk

A new report published in the Journal of Clinical Oncology in January 2023 showed that deep learning models can be used to predict lung cancer risk, and one model in particular: Sybil.
This means that healthy people, including non-smokers, could undergo this test. When Sybil does predict high risk, it can point to the specific at-risk region rather than spreading that risk equally over the entire thorax.

Illustration: Milica Mijajlovic

New findings on predicting lung cancer risk 

Lung cancer remains the deadliest cancer in the world, accounting for 1 in 5 cancer deaths. Recent data show that the number of new lung cancer cases, as well as the number of lung cancer deaths, has been declining steadily. This is primarily because more people are quitting smoking, but also because of new findings regarding early detection and treatment.

Researchers from several institutions, including the Massachusetts Institute of Technology (Cambridge), Harvard Medical School (Boston), and Chang Gung University (Taiwan), reported breakthrough findings on predicting lung cancer. The study was led by Peter G. Mikhael and Jeremy Wohlwend as co-first authors, with Ludvig Karstens, Florian J. Fintelmann, and Regina Barzilay as joint senior authors.

They began their research with this premise: 


Low-dose computed tomography (LDCT) for lung cancer screening is effective, although most eligible people are not being screened. Tools that provide personalized future cancer risk assessment could focus approaches toward those most likely to benefit. We hypothesized that a deep learning model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data.

Using LDCT scans from the National Lung Screening Trial (NLST), the researchers developed a model called Sybil.

The model was tested on 6,282 LDCTs from NLST participants, 8,821 LDCTs from Massachusetts General Hospital (MGH), and 12,280 LDCTs from Chang Gung Memorial Hospital. The participants had a range of smoking histories, and the Taiwanese set included non-smokers as well.

If you’re interested in the exact numbers and percentages, and know how to interpret them, we encourage you to take a look at the original article. If that’s not the case, this is the most important takeaway from the article: 


Sybil can accurately predict an individual's future lung cancer risk from a single LDCT scan to further enable personalized screening.

Now, what’s surprising is that lung cancer diagnoses are rapidly increasing among never-smokers and lighter smokers, which is yet another reason why screenings need to become more widespread and efficient.

The sample therefore needed to be expanded beyond heavier smokers to include this newly affected group as well, because there’s a huge gap between the screened group and the diseased population.

The gap is so large that over 50% of women diagnosed with lung cancer worldwide are non-smokers, compared with 15-20% of men.

That’s also the main reason for joining forces with scholars from Taiwan, where they screened non-smokers as well. 

As it turns out, sophisticated technology wasn’t the only concern; engaging people in long-term screenings was yet another challenge that needed to be tackled. 

How does Sybil work? 

In previous years, two large controlled trials based on LDCT, NLST and NELSON, showed that lung cancer screening (LCS) is effective, reducing lung cancer mortality by 20% and 24%, respectively.

As a result, the US Preventive Services Task Force now recommends annual LCS for adults aged 50 to 80 with at least a 20 pack-year smoking history who currently smoke or have quit within the past 15 years.
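For readers unfamiliar with the term, a pack-year is the number of packs smoked per day multiplied by the number of years of smoking. The short Python sketch below shows the arithmetic and a rough eligibility check. It is an illustration only, not a clinical tool: the Person record and the function names are made up for this example, and the thresholds (ages 50 to 80, at least 20 pack-years, currently smoking or quit within the past 15 years) are our reading of the 2021 Task Force recommendation.

```python
from dataclasses import dataclass
from typing import Optional


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day multiplied by years of smoking."""
    return packs_per_day * years_smoked


@dataclass
class Person:  # hypothetical record, for illustration only
    age: int
    packs_per_day: float
    years_smoked: float
    years_since_quit: Optional[float] = None  # None means currently smoking


def is_eligible_for_lcs(p: Person) -> bool:
    """Rough screening-eligibility check (assumed thresholds, see text)."""
    in_age_range = 50 <= p.age <= 80
    enough_exposure = pack_years(p.packs_per_day, p.years_smoked) >= 20
    recent_smoker = p.years_since_quit is None or p.years_since_quit <= 15
    return in_age_range and enough_exposure and recent_smoker


if __name__ == "__main__":
    heavy_smoker = Person(age=55, packs_per_day=1.0, years_smoked=25)
    never_smoker = Person(age=60, packs_per_day=0.0, years_smoked=0.0)
    print(is_eligible_for_lcs(heavy_smoker))
    print(is_eligible_for_lcs(never_smoker))
```

Note that a never-smoker has zero pack-years and can never pass this check, whatever their age. That is exactly the group the current criteria leave out.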

Now, the problem is, as the authors pointed out, that fewer than 10% of the eligible population has undergone LCS. Moreover, those who were screened can’t really be considered representative, since they weren’t encouraged to participate in follow-ups or long-term screenings.

But that wasn’t the only problem. 

Most previous efforts to improve LCS have focused on identifying those at the highest risk for lung cancer and directing available resources to screen them. However, these efforts relied on clinical and demographic variables, as well as chest radiographs, to model lung cancer risk among smokers. Follow-up LCS relied mostly on assessing visible pulmonary nodules.

Later cancer detection algorithms introduced deep learning and were able to predict lung cancer within 1-2 years. However, according to the authors, they still didn’t manage to leverage image data from the patient’s previous screens.

So, what was Sybil’s approach? 

  1. Improving LCS by introducing individualized risk models for lung cancer prediction; 
  2. Improving on existing models, which combine demographic information, clinical risk factors, and radiologic annotations, by working from the LDCT images alone. 

We hypothesize that LDCT images contain information that is predictive of future lung cancer risk beyond currently identifiable features such as lung nodules. An algorithm that goes past visible nodules to predict future lung cancer risk over several years could further enhance patient management and LCS implementation strategies.

In addition, by using a single low-dose chest computed tomography (CT) scan and drawing on data gathered over the past 15 years, Sybil is able to predict lung cancers occurring 1-6 years after a screen.

Not only that, but this deep learning model is able to predict both short- and long-term lung cancer risk. 

Afterward, Sybil’s performance was evaluated on more recent, independent test sets from Massachusetts General Hospital and Chang Gung Memorial Hospital, Taiwan.

So far, Sybil has proven similarly accurate across diverse sets of patients from both the US and Taiwan.
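If you’re wondering how “accurate” is usually measured for a risk model, one common yardstick is the area under the ROC curve (AUC): the probability that a randomly chosen person who later developed cancer received a higher risk score than a randomly chosen person who did not. The Python sketch below computes it by comparing every such pair. The labels and scores are invented toy values, and the function is our own illustration, not code or results from the Sybil project.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the share of (case, non-case) pairs
    in which the case got the higher score; ties count as half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Toy data: 1 = developed lung cancer, 0 = did not (made-up values).
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 means the scores are no better than a coin flip, while 1.0 means every future case was ranked above every non-case. Running the same calculation separately on each hospital’s data is one way to check whether a model holds up across different patient groups.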

The best part is – the code is publicly available. 

What could Sybil change in a clinical application? 

As pointed out in the original article, when an LDCT receives a high-risk score, Sybil is often able to correctly lateralize the location of future cancers.

In other words, when it does predict high future lung cancer risk, it can point to the specific at-risk region rather than spreading that risk equally over the entire thorax.

Moreover, it can infer biologically relevant information, such as smoking duration, directly from LDCT images. 

What does “Sybil” stand for?
The name of this model wasn’t chosen by coincidence; quite the contrary, it fits the purpose well. In Ancient Greece, a sibyl was a woman thought to be able to see into the future. Both spellings, “Sybil” and “Sibyl,” are in use.
In fact, sibyls were oracles during this era, with the best-known one located in Delphi and dating to as early as the eleventh century BC.

So, how do existing models differ from Sybil? They fall into two groups: 

  • They either predict risk before a scan has been performed and can be used to steer high-risk patients toward screening; or 
  • They predict risk after a scan has been performed and use data from the scan. 

What’s more, models similar to Sybil couldn’t be tested head-to-head because their code wasn’t publicly available. 

Finally, another useful application of Sybil could be reducing follow-up scans or biopsies among patients with low-risk nodules.

A journalist by day and a podcaster by night. She's not writing to impress but to be understood.