
Predicting Cancer-Related Sarcopenia with Machine Learning

Discover how machine learning models built on plasma proteomics data from the UK Biobank could support the early diagnosis and management of cancer-related sarcopenia. Researchers have identified key protein biomarkers that could pave the way for personalized interventions and improved patient outcomes.

Unlocking New Frontiers in Cancer Care: Predicting Sarcopenia with Machine Learning

Researchers at the Cancer Center of The First Hospital of Jilin University have developed plasma proteomics-based machine learning models to predict cancer-related sarcopenia. The study, which draws on UK Biobank data, offers insights that could improve early diagnosis and personalized intervention strategies for cancer patients.

At a Glance:

  • Study Population: 4053 participants from the UK Biobank

  • Proteins Analyzed: 2923 plasma proteins

  • Key Findings: 7 proteins significantly associated with cancer-related sarcopenia

  • Best Model for Prediction: Naïve Bayes combined with GDF-15, with an AUC of 0.781

  • Potential Impact: Early identification and targeted intervention for high-risk patients

Understanding Sarcopenia

Sarcopenia is a condition characterized by the progressive loss of skeletal muscle mass and strength, which can lead to physical disability, poor quality of life, and increased mortality. It is often associated with aging but can also occur as a result of chronic diseases, including cancer. In the context of cancer, sarcopenia is particularly concerning as it can exacerbate the already debilitating effects of the disease and its treatments. 

The pathogenesis of sarcopenia is multifactorial, involving systemic inflammation, metabolic disturbances, and the direct impact of tumors on muscle tissue. Early diagnosis and intervention are crucial for managing sarcopenia, as timely treatment can help preserve muscle function and improve patient outcomes. 

Traditional diagnostic criteria, such as those provided by the European Working Group on Sarcopenia in Older People (EWGSOP2), focus on the assessment of muscle strength and mass. However, these measurements can sometimes be impractical in clinical settings, highlighting the need for more accessible and reliable biomarkers, such as those identified through advanced proteomics and machine learning methods.
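To make the strength-then-mass logic of EWGSOP2 concrete, here is a minimal sketch of its decision path. It assumes the published EWGSOP2 cut-offs for handgrip strength (below 27 kg for men, 16 kg for women), appendicular skeletal muscle mass index (below 7.0 kg/m² for men, 5.5 kg/m² for women) and gait speed (0.8 m/s or less). The `Assessment` class and `ewgsop2_category` function are hypothetical names. The sketch also omits alternative tests that EWGSOP2 allows, such as the chair stand test. The UK Biobank study's own operational definition may differ.

```python
from dataclasses import dataclass
from typing import Optional

# EWGSOP2 cut-off points (assumed here; check the guideline before reuse).
GRIP_CUTOFF_KG = {"male": 27.0, "female": 16.0}
ASM_INDEX_CUTOFF = {"male": 7.0, "female": 5.5}  # appendicular skeletal muscle mass / height^2, kg/m^2
GAIT_SPEED_CUTOFF = 0.8  # m/s; at or below this indicates low physical performance


@dataclass
class Assessment:
    sex: str                                # "male" or "female"
    grip_strength_kg: float                 # maximal handgrip strength
    asm_kg: float                           # appendicular skeletal muscle mass
    height_m: float
    gait_speed_m_s: Optional[float] = None  # optional physical-performance test


def ewgsop2_category(a: Assessment) -> str:
    """Return 'no sarcopenia', 'probable', 'confirmed' or 'severe'."""
    # Step 1: low muscle strength flags probable sarcopenia.
    if a.grip_strength_kg >= GRIP_CUTOFF_KG[a.sex]:
        return "no sarcopenia"
    # Step 2: low muscle quantity confirms the diagnosis.
    asm_index = a.asm_kg / a.height_m ** 2
    if asm_index >= ASM_INDEX_CUTOFF[a.sex]:
        return "probable"
    # Step 3: low physical performance grades confirmed sarcopenia as severe.
    if a.gait_speed_m_s is not None and a.gait_speed_m_s <= GAIT_SPEED_CUTOFF:
        return "severe"
    return "confirmed"
```

Each step needs a separate physical measurement: grip strength, then body composition, then a walking test. This is the practical burden that blood-based biomarkers aim to reduce.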

The Challenge of Cancer-Related Sarcopenia

Cancer-related sarcopenia is a common complication in patients with advanced cancer. It significantly worsens prognosis and reduces quality of life. The EWGSOP2 criteria rely on assessments of muscle strength and mass, which may not always be available or accurate in clinical practice. There is therefore a pressing need for novel biomarkers and predictive models.


Study Population

The study utilized data from the UK Biobank, a large-scale cohort of 502,642 participants. After excluding those without a sarcopenia diagnosis, the researchers analyzed 4053 participants, including 1176 with cancer-related sarcopenia.

Proteomic Analysis

The team analyzed plasma levels of 2923 proteins in 134 patients with cancer-related sarcopenia and 340 patients with non-cancer-related sarcopenia. They identified seven proteins most strongly correlated with sarcopenia: ELN, MMP1, MDK, COL18A1, FBLN2, AMBP, and DLL1.
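The article does not describe the exact selection procedure. One simple, assumed way to rank candidate proteins by "highest correlation" is a univariate filter. This filter correlates each protein's level with a 0/1 outcome (cancer-related vs non-cancer-related sarcopenia) and keeps the proteins with the largest absolute correlation. With a binary label, Pearson's r reduces to the point-biserial correlation. The protein names and values in the test are hypothetical, and so are the helper names `pearson` and `top_k_proteins`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 label this is the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def top_k_proteins(levels: Dict[str, List[float]], label: List[int],
                   k: int) -> List[Tuple[str, float]]:
    """Rank proteins by |r| with the outcome label and keep the k strongest."""
    scored = [(name, pearson(values, label)) for name, values in levels.items()]
    # Sort on absolute value so strong negative associations are kept too.
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]
```

Ranking on |r| keeps strongly negative markers alongside positive ones. This matters here because, as the findings below show, some of the seven proteins correlate negatively with muscle fat infiltration.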

Findings and Model Development

Key Proteins

The study found significant associations between cancer-related sarcopenia and the seven proteins. For instance:

The largest positive correlations with the level of muscle fat infiltration were found for ELN (36.86%) and AMBP (24.76%), and the largest negative correlations for COL18A1 (37.27%) and DLL1 (33.84%).
Liu et al. (2024): Developing Plasma Proteomics-Based Machine Learning Models to Diagnose and Predict Cancer-Related Sarcopenia: A Study from the UK Biobank. DOI:

Machine Learning Models

Six machine learning models were evaluated: Logistic Regression, Random Forest, Extreme Gradient Boosting, Support Vector Machine, Decision Tree, and Naïve Bayes. The Naïve Bayes model, combined with the traditional biomarker GDF-15, showed the best discriminative performance (AUC = 0.781).
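To show what the winning model and its metric compute, here is a small pure-Python sketch. It assumes a Gaussian Naïve Bayes variant, in which each protein is treated as conditionally independent and normally distributed within each class. The paper's exact configuration is not stated here. The AUC is computed in its rank form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Params = Dict[int, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[int]) -> Params:
    """Estimate per-class priors, feature means and variances."""
    params: Params = {}
    for c in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == c]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # A small epsilon keeps every variance strictly positive.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
                     for j in range(n_features)]
        params[c] = (len(rows) / len(X), means, variances)
    return params


def predict_proba(params: Params, x: Sequence[float]) -> float:
    """Posterior P(y=1 | x) under the conditional-independence assumption."""
    log_post = {}
    for c, (prior, means, variances) in params.items():
        ll = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_post[c] = ll
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    den = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[1] - top) / den


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outranks random negative); ties count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch, combining the proteins with GDF-15, as the study did, would amount to appending the GDF-15 level as one more column in each row of `X`. An AUC of 0.781 means that a randomly chosen positive case receives a higher score than a randomly chosen negative case about 78% of the time.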

Model Interpretation and Validation

The Shapley Additive Explanations (SHAP) technique was used to interpret the model by quantifying each protein's contribution to the predictions. The study also employed Weighted Quantile Sum (WQS) regression to assess the joint effect of the proteins on future muscle composition and the relative contribution of each protein to that effect.
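SHAP attributions are based on Shapley values from cooperative game theory. Each feature receives its average marginal contribution to the prediction across all subsets of the other features. The sketch below computes these values exactly by enumerating every subset. It assumes a "baseline replacement" value function, in which absent features are set to a reference value. SHAP libraries typically use faster approximations, and the article does not say which tooling the study used. The function `shapley_values` and the toy models in the test are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value. Cost grows as
    2^n model calls, so this is only feasible for a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline. This "efficiency" property is why SHAP values can be read as each protein's share of a particular patient's predicted risk.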

Clinical Implications

The developed model has significant clinical implications. Early identification of high-risk patients allows for timely interventions, potentially improving patient outcomes and reducing healthcare costs. The study’s findings also provide new insights into the biological mechanisms underlying cancer-related sarcopenia, paving the way for personalized treatments.


This study demonstrates the potential of plasma proteomics-based machine learning models for predicting cancer-related sarcopenia. The authors present the model as a more accurate and comprehensive approach than traditional methods that could facilitate early diagnosis and personalized interventions.

Future Directions

Further research is needed to validate these findings in independent cohorts and to explore the biological mechanisms underlying the identified protein associations. The ultimate goal is to translate this machine learning model into clinical practice, improving the management and outcomes of cancer patients.
