Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

MELBA 2026

Theo Di Piazza¹,² Carole Lazarus³ Olivier Nempont³ Loic Boussel¹,²

¹INSA Lyon, ²Hospices Civils de Lyon, ³Philips Clinical Informatics

Abstract

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on radiology report generation and abdominal CT data.

Method

(1) Adjacent axial slices are grouped into triplets, each representing a node in a graph. (2) Edges between nodes are weighted according to their physical distance along the z-axis. (3) Node features are enhanced with Triplet Axial Slices positional embeddings, and then processed by a Spectral Block that incorporates Chebyshev graph convolution for structured spectral modeling. (4) The resulting node representations are aggregated via mean pooling and passed to a classification head to predict abnormalities.
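The four steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes non-overlapping triplets, a fully connected graph, and a Gaussian kernel on z-distance for the edge weights (the page only states that weights depend on physical distance). It omits the positional embeddings and the 2D feature extractor. The names `build_triplet_graph`, `scaled_laplacian`, `cheb_conv`, and the parameter `sigma` are hypothetical.

```python
import numpy as np


def build_triplet_graph(num_slices, spacing_z, sigma=10.0):
    """Adjacency over slice triplets, weighted by z-distance (assumed Gaussian kernel)."""
    num_nodes = num_slices // 3  # assumption: non-overlapping triplets
    # z-coordinate (in mm) of the central slice of each triplet
    centers = (np.arange(num_nodes) * 3 + 1) * spacing_z
    dist = np.abs(centers[:, None] - centers[None, :])
    adj = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def scaled_laplacian(adj):
    """Normalized Laplacian rescaled so its spectrum lies in [-1, 1]."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))


def cheb_conv(x, lap_scaled, weights):
    """Chebyshev graph convolution.

    x: (N, F_in) node features; weights: (K, F_in, F_out), one matrix per order.
    """
    t_prev, t_curr = x, lap_scaled @ x  # T_0 X and T_1 X
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out = out + t_curr @ weights[1]
    for k in range(2, len(weights)):
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        t_next = 2.0 * (lap_scaled @ t_curr) - t_prev
        out = out + t_next @ weights[k]
        t_prev, t_curr = t_curr, t_next
    return out


# Usage with random stand-ins for node features and learned weights
rng = np.random.default_rng(0)
adj = build_triplet_graph(num_slices=240, spacing_z=1.5)
node_feats = rng.normal(size=(adj.shape[0], 64))
theta = rng.normal(scale=0.1, size=(3, 64, 32))  # Chebyshev order K = 3
hidden = cheb_conv(node_feats, scaled_laplacian(adj), theta)
volume_embedding = hidden.mean(axis=0)  # mean pooling over nodes
```

The pooled `volume_embedding` is what a classification head would consume in step (4).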

Experiments

Models are trained on CT-RATE, using 5-fold cross-validation with patient-level splits. Evaluation includes an internal validation on the held-out CT-RATE test set, as well as cross-dataset evaluations on RAD-ChestCT and the private CT-HCL. We benchmark CT-SSG against 2.5D and 3D visual encoders including convolutional, transformer-based and hybrid architectures.
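A patient-level split keeps all scans of a given patient in the same fold, so that no patient contributes to both training and validation. The sketch below shows one way to build such folds. The function name `patient_level_folds` and the `scan_to_patient` mapping are hypothetical and do not reflect the actual CT-RATE file layout.

```python
import random
from collections import defaultdict


def patient_level_folds(scan_to_patient, n_folds=5, seed=0):
    """Return a list of (train_scans, val_scans) pairs with no patient overlap."""
    by_patient = defaultdict(list)
    for scan_id, patient_id in scan_to_patient.items():
        by_patient[patient_id].append(scan_id)

    # Shuffle patients (not scans) so the split is made at the patient level
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    folds = []
    for i in range(n_folds):
        val_patients = set(patients[i::n_folds])
        val = [s for p in patients if p in val_patients for s in by_patient[p]]
        train = [s for p in patients if p not in val_patients for s in by_patient[p]]
        folds.append((train, val))
    return folds
```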

Abnormality Classification

We first evaluate visual encoders on the multi-label abnormality classification task. Performance is assessed using F1-Score, AUROC, Accuracy and mAP.
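For reference, macro-averaging computes each metric per abnormality and then takes the unweighted mean across classes. The sketch below does this for F1-Score and AUROC. The 0.5 decision threshold and the function names are assumptions, since the page does not state how predictions are binarized. The AUROC helper assumes each class has at least one positive and one negative sample.

```python
import numpy as np


def binary_auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def binary_f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def macro_metrics(y_true, y_score, threshold=0.5):
    """y_true, y_score: arrays of shape (num_scans, num_classes)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    num_classes = y_true.shape[1]
    f1 = [binary_f1(y_true[:, c], y_pred[:, c]) for c in range(num_classes)]
    auroc = [binary_auroc(y_true[:, c], y_score[:, c]) for c in range(num_classes)]
    return {"f1": float(np.mean(f1)), "auroc": float(np.mean(auroc))}
```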

Internal. Reported metrics are averaged across the 18 classes from CT-RATE. CT-SSG yields the best performance across all metrics (paired t-test across folds, p < 0.01), with a +Δ8% improvement in AUROC over ResNet3D and +Δ6% over ViT-3D.

Classification - Quantitative results - CT-RATE

External [1/2]. Evaluation on RAD-ChestCT is limited to the 16 abnormalities shared with CT-RATE: artery wall calcification and coronary artery wall calcification are grouped under a single calcification label, and mosaic attenuation pattern is not available. CT-SSG demonstrates strong cross-dataset generalization, yielding the best performance across all metrics.

Classification - Quantitative results - RAD-ChestCT

External [2/2]. Evaluation on CT-HCL is limited to the 9 meta-classes, derived from radiology reports, shared with CT-RATE.

Classification - Quantitative results - CT-HCL

Qualitative analysis. We display gradient-weighted class activation maps (GradCAM) on CT-RATE test samples, extracted from the 2D ResNet module, illustrating CT-SSG's ability to predict abnormalities from relevant regions.

Ablation study

Incremental ablation. Starting from the initialization of the node features to the full CT-SSG architecture, we quantify the impact of each component. The cumulative trend suggests that the model benefits from the synergistic effect of multiple choices.

Incremental ablation - Quantitative results

Graph convolutional operators. We also evaluate abnormality classification performance across different graph operators, replacing the spectral-based Chebyshev operator with spatial ones, including graph convolution and graph attention operators, also varying the graph topology. These empirical results highlight the advantage of spectral formulations over spatial ones to capture dependencies across axial slices.
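For context, the two families compared here can be written in their standard textbook forms. These are the usual definitions of the Chebyshev and GCN-style operators, not necessarily the exact configuration used in the ablation. The symbols ($X$ and $H$ for node features, $\Theta_k$ and $W$ for learnable weights, $L$ for the normalized Laplacian, $A$ for the adjacency matrix) are notation introduced here.

```latex
% Spectral (Chebyshev) operator of order K
\begin{aligned}
Y &= \sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, \Theta_k,
\qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I, \\
T_0(\tilde{L}) &= I, \quad
T_1(\tilde{L}) = \tilde{L}, \quad
T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}).
\end{aligned}

% Spatial graph convolution (GCN-style), one-hop aggregation
H' = \sigma\!\left( \hat{D}^{-1/2} \hat{A}\, \hat{D}^{-1/2} H W \right),
\qquad \hat{A} = A + I, \quad \hat{D}_{ii} = \sum_j \hat{A}_{ij}.
```

A Chebyshev layer of order $K$ mixes information from up to $K-1$ hops in a single layer, whereas the GCN-style layer aggregates only immediate neighbors.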

Graph convolution - Quantitative results

Transfer to report generation

Quantitative analysis. CT-SSG is further evaluated on the automated report generation task. To isolate the effect of latent representation quality, we adopt a deliberately simple encoder-decoder architecture inspired by CT2Rep. The visual encoder is pretrained and kept frozen, while the decoder is trained on the next-token prediction task. We select a representative set of baselines, including the transformer-based ViViT and the fully convolutional ResNet3D. We report Natural Language Generation (BLEU-1, METEOR) and Clinical Efficacy (macro RadBERT F1-Score, CRG) metrics on the held-out CT-RATE test set. CT-SSG achieves substantial improvements over baseline visual encoders.

Report generation - Quantitative results

Qualitative analysis. The figure below shows examples of reports generated using CT-SSG as the visual encoder. Color-coded terms indicate detected abnormalities, illustrating CT-SSG's ability to generate reports with relevant terminology.

Transfer to Abdominal CT

We further investigate the cross-domain generalization of our learned representations by extending evaluation to 3D abdominal CT scans, leveraging the Merlin Abdominal CT dataset. As only radiology reports are available, pseudo labels are extracted using LLaMA 3.1. We perform a linear evaluation on the abnormality classification task, varying the training set size from 100 to 10,000 samples. CT-SSG, trained on CT-RATE chest CT volumes, is kept frozen while a linear layer is trained on the classification task with Binary Cross-Entropy supervision. Compared to an end-to-end supervised baseline with no chest CT pretraining, the linear probing configuration shows improved performance in low-data regimes.
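The linear evaluation protocol can be sketched as follows: embeddings from the frozen encoder are treated as fixed inputs, and only a linear layer is fit with a Binary Cross-Entropy loss. This is a minimal NumPy illustration under stated assumptions. The features are taken as precomputed, plain full-batch gradient descent stands in for whatever optimizer was actually used, and the function name `train_linear_probe`, learning rate, and epoch count are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over samples and classes."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def train_linear_probe(features, targets, lr=0.1, epochs=200):
    """Fit a linear multi-label classifier on frozen features.

    features: (N, D) embeddings from the frozen encoder (never updated here).
    targets:  (N, C) binary pseudo labels.
    """
    n, d = features.shape
    w = np.zeros((d, targets.shape[1]))
    b = np.zeros(targets.shape[1])
    losses = []
    for _ in range(epochs):
        probs = sigmoid(features @ w + b)
        losses.append(bce(probs, targets))
        # Gradient of the BCE with respect to the logits, averaged over samples
        grad_logits = (probs - targets) / n
        w -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return w, b, losses
```

Varying the number of rows passed to `train_linear_probe` mirrors the training-set-size sweep described above.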

BibTeX

@article{dipiazza_2026_ctssg,
  author    = {Di Piazza, Theo and Lazarus, Carole and Nempont, Olivier and Boussel, Loic},
  title     = {Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans},
  journal   = {Machine Learning for Biomedical Imaging (MELBA)},
  year      = {2026},
}

More research

Explore additional recent work in medical image analysis related to this project.

CT-AGRG
ISBI 2025
Report generation

CT-Scroll
MIDL 2025
2.5D Representation learning

UniCT
MICCAI 2026
Multi-task learning