Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

MELBA 2026

Theo Di Piazza1,2 Carole Lazarus3 Olivier Nempont3 Loic Boussel1,2
1INSA Lyon, 2Hospices Civils de Lyon, 3Philips Clinical Informatics

Abstract

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on radiology report generation and abdominal CT data.

Method

(1) Adjacent axial slices are grouped into triplets, each representing a node in a graph. (2) Edges between nodes are weighted according to their physical distance along the z-axis. (3) Node features are enhanced with Triplet Axial Slices positional embeddings, and then processed by a Spectral Block that incorporates Chebyshev graph convolution for structured spectral modeling. (4) The resulting node representations are aggregated via mean pooling and passed to a classification head to predict abnormalities.

Main figure
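The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: triplets are taken as non-overlapping, the edge weights use a Gaussian kernel on z-distance (`sigma_mm` is a hypothetical parameter), node features are a toy stand-in for the output of a 2D encoder, and the positional embeddings are omitted.

```python
import numpy as np

def group_into_triplets(volume):
    """Group adjacent axial slices into non-overlapping triplets: (Z, H, W) -> (Z // 3, 3, H, W)."""
    n_nodes = volume.shape[0] // 3
    return volume[: n_nodes * 3].reshape(n_nodes, 3, *volume.shape[1:])

def distance_adjacency(z_positions_mm, sigma_mm=10.0):
    """Dense adjacency whose weights decay with z-axis distance (assumed Gaussian kernel)."""
    z = np.asarray(z_positions_mm, dtype=float)
    dist = np.abs(z[:, None] - z[None, :])
    adj = np.exp(-(dist ** 2) / (2.0 * sigma_mm ** 2))
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def scaled_laplacian(adj):
    """Rescaled normalized Laplacian 2L / lambda_max - I, with spectrum in [-1, 1]."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))

def chebyshev_conv(x, lap_scaled, weights):
    """y = sum_k T_k(L~) x W_k, with T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}."""
    t_prev, t_curr = x, lap_scaled @ x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for w_k in weights[2:]:
        t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
        out = out + t_curr @ w_k
    return out

# Toy example: 30 slices of 8x8 pixels, assumed 1.5 mm slice spacing.
rng = np.random.default_rng(0)
volume = rng.normal(size=(30, 8, 8))
triplets = group_into_triplets(volume)                  # (10, 3, 8, 8)
node_feats = triplets.mean(axis=(2, 3))                 # stand-in for 2D encoder features, (10, 3)
z_centers = (3 * np.arange(len(triplets)) + 1) * 1.5    # z position of each triplet's middle slice
adj = distance_adjacency(z_centers)
lap_scaled = scaled_laplacian(adj)

weights = [rng.normal(size=(3, 4)) for _ in range(3)]   # Chebyshev order K = 3 (assumed)
hidden = np.maximum(chebyshev_conv(node_feats, lap_scaled, weights), 0.0)
pooled = hidden.mean(axis=0)                            # mean pooling over nodes
logits = pooled @ rng.normal(size=(4, 18))              # linear head, 18 abnormality classes
```

In practice the weights would be learned end to end; the random matrices here only make the tensor shapes concrete.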

Experiments

Models are trained on CT-RATE using 5-fold cross-validation with patient-level splits. Evaluation includes an internal validation on the held-out CT-RATE test set, as well as cross-dataset evaluations on RAD-ChestCT and the private CT-HCL dataset. We benchmark CT-SSG against 2.5D and 3D visual encoders, including convolutional, transformer-based, and hybrid architectures.
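Patient-level splitting means that all scans from one patient land in the same fold, so no patient appears in both training and validation data. The sketch below shows one simple way to do this; the scan and patient identifiers are hypothetical, and the round-robin assignment is an assumption rather than the exact protocol used here.

```python
import random
from collections import defaultdict

def patient_level_folds(scan_to_patient, n_folds=5, seed=0):
    """Assign scans to folds so that all scans of one patient share the same fold."""
    patients = sorted(set(scan_to_patient.values()))
    random.Random(seed).shuffle(patients)
    # Round-robin assignment of shuffled patients to folds.
    patient_fold = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for scan, patient in scan_to_patient.items():
        folds[patient_fold[patient]].append(scan)
    return [sorted(folds[i]) for i in range(n_folds)]

# Hypothetical identifiers: 10 patients with 2 scans each.
scans = {f"scan_{p}_{s}": f"patient_{p}" for p in range(10) for s in range(2)}
folds = patient_level_folds(scans)
```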

Abnormality Classification

We first evaluate visual encoders on the multi-label abnormality classification task. Performance is assessed using F1-Score, AUROC, Accuracy and mAP.
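For reference, class-averaged F1-Score and AUROC for a multi-label problem can be computed as in the sketch below. It assumes a fixed 0.5 decision threshold for F1 and plain macro averaging over classes; the thresholding strategy actually used in the experiments is not specified here.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 labels and 0/1 predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_average(metric, y_true, y_other):
    """Average a per-class metric over the columns of (n_samples, n_classes) lists."""
    n_classes = len(y_true[0])
    per_class = [
        metric([row[c] for row in y_true], [row[c] for row in y_other])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes

# Toy example: 4 scans, 2 abnormality classes.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.4], [0.3, 0.6]]
y_pred = [[int(s >= 0.5) for s in row] for row in y_score]

macro_f1 = macro_average(f1_score, y_true, y_pred)      # 0.75
macro_auroc = macro_average(auroc, y_true, y_score)     # 0.875
```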

Internal. Reported metrics are averaged across the 18 classes from CT-RATE. CT-SSG achieves the best performance on all metrics (paired t-test across folds, p < 0.01), with AUROC gains of +8% over ResNet3D and +6% over ViT-3D.
Classification - Quantitative results - CT-RATE
External [1/2]. Evaluation on RAD-ChestCT is limited to the 16 abnormalities shared with CT-RATE: artery wall calcification and coronary artery wall calcification are merged under a single calcification label, and mosaic attenuation pattern is excluded because it is not available. CT-SSG demonstrates strong cross-dataset generalization, achieving the best performance on all metrics.
Classification - Quantitative results - RAD-ChestCT
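The label harmonization described for RAD-ChestCT can be illustrated with a small mapping. The label strings below are hypothetical keys, and merging the two calcification labels with a logical OR is an assumption about how the grouping is done.

```python
# Hypothetical label names; None marks a label that is dropped.
CT_RATE_TO_SHARED = {
    "Arterial wall calcification": "Calcification",
    "Coronary artery wall calcification": "Calcification",
    "Mosaic attenuation pattern": None,
}

def harmonize(labels):
    """Map one scan's CT-RATE labels (name -> 0/1) onto the shared label set."""
    shared = {}
    for name, value in labels.items():
        target = CT_RATE_TO_SHARED.get(name, name)  # unmapped labels keep their name
        if target is None:
            continue
        # Logical OR: the merged label is positive if any source label is positive.
        shared[target] = max(shared.get(target, 0), int(value))
    return shared

example = {
    "Arterial wall calcification": 0,
    "Coronary artery wall calcification": 1,
    "Mosaic attenuation pattern": 1,
    "Emphysema": 0,
}
harmonized = harmonize(example)  # {"Calcification": 1, "Emphysema": 0}
```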
External [2/2]. Evaluation on CT-HCL is limited to the 9 meta-classes, derived from radiology reports, shared with CT-RATE.
Classification - Quantitative results - CT-HCL
Qualitative analysis. We display gradient-weighted class activation maps (GradCAM) on CT-RATE test samples, extracted from the 2D ResNet module, illustrating CT-SSG's ability to predict abnormalities from relevant regions.
Classification - Qualitative results
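As a reminder of how such maps are obtained, the core GradCAM computation weights each feature map by the spatial mean of the class-score gradient and keeps only the positive evidence. The sketch below assumes the activations and gradients of one convolutional layer have already been extracted for a single slice; how they are pulled from the 2D ResNet module is not shown.

```python
import numpy as np

def grad_cam(activations, gradients):
    """GradCAM heatmap from (C, H, W) feature maps and matching class-score gradients."""
    acts = np.asarray(activations, dtype=float)
    grads = np.asarray(gradients, dtype=float)
    channel_weights = grads.mean(axis=(1, 2))                    # one weight per channel
    cam = np.maximum(np.tensordot(channel_weights, acts, axes=1), 0.0)  # ReLU of weighted sum
    peak = cam.max()
    return cam / peak if peak > 0 else cam                       # rescale to [0, 1]

# Toy example: 2 channels of 2x2 feature maps.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(acts, grads)  # only the top-left location is highlighted
```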

Ablation study

Incremental ablation. Starting from the initialization of the node features to the full CT-SSG architecture, we quantify the impact of each component. The cumulative trend suggests that the model benefits from the synergistic effect of multiple choices.

Incremental ablation - Quantitative results

Graph convolutional operators. We also evaluate abnormality classification performance across different graph operators, replacing the spectral Chebyshev operator with spatial ones (graph convolution and graph attention) while also varying the graph topology. These empirical results highlight the advantage of spectral formulations over spatial ones for capturing dependencies across axial slices.

Graph convolution - Quantitative results
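For readers less familiar with the two operator families, the standard textbook forms are recalled below. Here $X$ denotes node features, $A$ the weighted adjacency matrix, $L = I - D^{-1/2} A D^{-1/2}$ the normalized Laplacian, and $\Theta$ the learnable weights. The order $K$ and the exact normalization used in CT-SSG are not stated here, so these formulas are a generic reference rather than the precise configuration.

```latex
\begin{aligned}
% Spectral: Chebyshev graph convolution of order K
Y_{\text{Cheb}} &= \sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, \Theta_k,
\qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I, \\
T_0(\tilde{L}) &= I, \quad T_1(\tilde{L}) = \tilde{L}, \quad
T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \\
% Spatial: GCN-style propagation with self-loops
Y_{\text{GCN}} &= \hat{D}^{-1/2} (A + I)\, \hat{D}^{-1/2}\, X\, \Theta,
\qquad \hat{D}_{ii} = \sum_j (A + I)_{ij}.
\end{aligned}
```

A single Chebyshev layer mixes information from neighbors up to $K-1$ hops away through the polynomial terms, whereas one GCN-style layer only aggregates direct neighbors.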

Transfer to report generation

Quantitative analysis. CT-SSG is further evaluated on the automated report generation task. To isolate the effect of latent representation quality, we adopt a deliberately simple encoder-decoder architecture inspired by CT2Rep. The visual encoder is pretrained and kept frozen, while the decoder is trained on the next-token prediction task. We select a representative set of baselines, including the transformer-based ViViT and the fully convolutional ResNet3D. We report Natural Language Generation (BLEU-1, METEOR) and Clinical Efficacy (macro RadBERT F1-Score, CRG) metrics on the held-out CT-RATE test set. CT-SSG achieves substantial improvements over baseline visual encoders.

Report generation - Quantitative results
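To make the BLEU-1 metric concrete, the sketch below computes a sentence-level score as clipped unigram precision times the brevity penalty. It assumes lowercase whitespace tokenization and a single reference; standard evaluation toolkits aggregate counts at the corpus level and may tokenize differently, so values will not match them exactly.

```python
import math
from collections import Counter

def bleu1(reference, candidate):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts, cand_counts = Counter(ref_tokens), Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    precision = clipped / len(cand_tokens)
    # Penalize candidates shorter than the reference.
    if len(cand_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(ref_tokens) / len(cand_tokens))
    return brevity_penalty * precision

# Illustrative sentences, not taken from the dataset.
score = bleu1("no pleural effusion is seen", "no pleural effusion")
```

In this example every generated token appears in the reference, so the score is driven entirely by the brevity penalty for the shorter candidate.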

Qualitative analysis. The figure below shows examples of reports generated using CT-SSG as the visual encoder. Color-coded terms indicate detected abnormalities, illustrating CT-SSG's ability to generate reports with relevant terminology.

Report generation - Qualitative results

Transfer to Abdominal CT

We further investigate the cross-domain generalization of our learned representations by extending evaluation to 3D abdominal CT scans, leveraging the Merlin Abdominal CT dataset. As only radiology reports are available, pseudo labels are extracted using LLaMA 3.1. We perform a linear evaluation on the abnormality classification task, varying the training set size from 100 to 10,000 samples. CT-SSG, trained on CT-RATE chest CT volumes, is kept frozen while a linear layer is trained on the classification task with Binary Cross-Entropy supervision. Compared to an end-to-end supervised baseline with no chest CT pretraining, the linear probing configuration achieves better performance in low-data regimes.

Quantitative results
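The linear probing setup can be sketched as follows: features from the frozen encoder are treated as fixed inputs, and only a linear layer is fitted with binary cross-entropy. The random features, synthetic labels, learning rate, and number of epochs below are all placeholders chosen for illustration, not the settings used in the experiments.

```python
import numpy as np

def bce_loss(features, labels, w, b):
    """Mean binary cross-entropy, computed stably as log(1 + e^z) - y * z."""
    logits = features @ w + b
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a linear layer on frozen features by full-batch gradient descent on BCE."""
    n_samples, n_dims = features.shape
    w = np.zeros((n_dims, labels.shape[1]))
    b = np.zeros(labels.shape[1])
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad_logits = (probs - labels) / n_samples   # gradient of BCE w.r.t. logits
        w -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return w, b

# Synthetic stand-in for frozen encoder outputs: 200 scans, 8-dim features, 3 labels.
rng = np.random.default_rng(0)
frozen_feats = rng.normal(size=(200, 8))
pseudo_labels = (frozen_feats @ rng.normal(size=(8, 3)) > 0).astype(float)

loss_before = bce_loss(frozen_feats, pseudo_labels, np.zeros((8, 3)), np.zeros(3))
w, b = train_linear_probe(frozen_feats, pseudo_labels)
loss_after = bce_loss(frozen_feats, pseudo_labels, w, b)
```

Because the encoder is never updated, its features can be computed once and cached, which keeps the cost of sweeping the training set size low.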

Related Links

In this academic research work, our experiments are run on public CT datasets. We thank the contributors of CT-RATE [1], RAD-ChestCT [2], and Merlin Abdominal CT [3] for releasing these datasets to the research community.

[1] Generalist foundation models from a multimodal dataset for 3D CT. Hamamci et al. 2026.

[2] Machine-learning-based multiple abnormality prediction with large-scale chest CT volumes. Draelos et al. 2021.

[3] Merlin: a computed tomography vision–language foundation model and dataset. Blankemeier et al. 2026.

BibTeX

@article{dipiazza_2026_ctssg,
  author    = {Di Piazza, Theo and Lazarus, Carole and Nempont, Olivier and Boussel, Loic},
  title     = {Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans},
  journal   = {Machine Learning for Biomedical Imaging (MELBA)},
  year      = {2026},
}

More research

Explore additional recent work in medical image analysis related to this project. Click on the images to access the corresponding project pages.

CT-AGRG
ISBI 2025
Report generation

CT-Scroll
MIDL 2025
2.5D Representation learning

UniCT
MICCAI 2026
Multi-task learning