Poster Presentation NSW State Cancer Conference 2023

Transformer-guided multi-modal feature learning to standardise thorax radiotherapy structure nomenclature (#207)

Fahim Irfan Alam 1 2 3 , Matthew Field 1 2 3 , Phillip Chlap 1 2 3 , Shivani Kumar 1 2 3 , Ali Haidar 1 2 3 , Daniel Al Mouiee 1 2 3 , Janet Cui 1 2 3 , Vicky Chin 1 2 3 , Shalini Vinod 1 2 3 , Geoff Delaney 1 2 3 , Lois Holloway 1 2 3 4
  1. South Western Sydney Clinical Campus, School of Clinical Medicine, University of New South Wales, Sydney, NSW, Australia
  2. South Western Sydney Cancer Services, NSW Health, Sydney, NSW, Australia
  3. Ingham Institute for Applied Medical Research, Sydney, NSW, Australia
  4. Institute of Medical Physics, University of Sydney, Sydney, NSW, Australia

Standardised naming of Organs-At-Risk (OARs) and Target Volumes (TVs) plays a crucial role in the analysis of real-world radiotherapy (RT) datasets, particularly on platforms that draw on multiple large clinical practice databases to improve patient outcomes. Manual standardisation requires substantial time and effort and is prone to inconsistencies between observers. To address this issue, a transformer-based multimodal framework was investigated that combines features from different modalities to automate the standardisation of thorax RT structure names. The following features are fused in a visual transformer to produce a multimodal feature vector: image features (the 2D central slice containing the maximum number of tumour pixels on the transversal axis), text features (structure names), spatial data describing the context surrounding an RT structure, and geometric and dosimetry features. This combined representation is passed through dense layers followed by a softmax function, which classifies the structure into a standardised name. The model was developed and internally validated on “Liv-Breast”, a curated cohort of 1436 breast cancer patients treated between 2014 and 2018 at the Liverpool and Macarthur Cancer Therapy Centres, Australia. The structures were classified into a standardised nomenclature comprising OARs (combined lungs, contralateral breast, heart, left lung and right lung), primary TVs (breast Clinical Target Volume (CTV), breast Planning Target Volume (PTV), chestwall CTV, chestwall PTV, tumourbed CTV, tumourbed PTV), nodal TVs (Axilla CTV, Axilla PTV, Internal Mammary Chain (IMC) CTV, IMC PTV, Supraclavicular Fossa (SCF) CTV, SCF PTV), and control structures, which include planning risk volumes (PRVs) (for example, heart PRV, lung PRV), boost structures (high risk volumes) and combined structures.
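The slice-selection step for the image features can be illustrated with a minimal sketch. It assumes a binary structure mask stored as nested lists indexed [slice][row][column]; the function name `select_max_area_slice` and this data layout are hypothetical and are not taken from the authors' implementation.

```python
from typing import Sequence

# Assumed layout: mask[slice][row][col], non-zero where the structure is present.
Mask3D = Sequence[Sequence[Sequence[int]]]


def select_max_area_slice(mask: Mask3D) -> int:
    """Return the index of the axial slice with the most foreground pixels."""
    if len(mask) == 0:
        raise ValueError("mask has no slices")
    # Count the non-zero pixels in every slice.
    counts = [sum(1 for row in sl for v in row if v) for sl in mask]
    # max() keeps the first maximum, so ties resolve to the lowest slice index.
    return max(range(len(counts)), key=counts.__getitem__)


# Toy 3-slice mask with 1, 3 and 1 foreground pixels respectively.
example_mask = [
    [[0, 0], [0, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 0]],
]
best_slice = select_max_area_slice(example_mask)
```

In practice the mask would be rasterised from the RT structure contours, and the selected slice would be cropped or resized before being passed to the image branch; those steps are outside this sketch.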
Furthermore, the model was validated on a publicly available lung cohort, specifically for the classification of OARs in that dataset (Heart, Left Lung, Right Lung, Combined Lung, Esophagus, Spinal Cord). In the experiments, the best-performing model achieved an average accuracy of 98.13% across all structures (both OARs and TVs) in the Liv-Breast cohort. The combination of multimodal features resulted in an accuracy of 100% for OARs within the lung cohort. This study has two potential outcomes. Firstly, the developed model can be employed to retrospectively classify inconsistent names into standardised nomenclature, adhering to national and international guidelines. Secondly, machine learning-based models, together with access to previously untapped retrospective data, can also serve as a means of real-time quality assurance in clinics, reducing the risk of mislabelled data and its subsequent treatment consequences.
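As a rough illustration of how the softmax output could support the quality-assurance use described above, the sketch below converts a vector of class scores into a predicted standardised name and flags a structure for review when the prediction disagrees with the entered name or is not confident. The class list is limited to four of the OAR names from the abstract, and the function names, example scores and the 0.9 confidence threshold are assumptions made for illustration, not part of the reported model.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative subset of the standardised OAR names listed in the abstract.
CLASSES = ["heart", "left lung", "right lung", "combined lungs"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_name(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable standardised name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


def needs_review(entered_name: str, logits: Sequence[float],
                 min_confidence: float = 0.9) -> bool:
    """Flag a structure when the model disagrees with the entered name
    or when its top prediction falls below the (assumed) threshold."""
    predicted, confidence = predict_name(logits)
    return predicted != entered_name or confidence < min_confidence


# Hypothetical scores, standing in for the output of the dense layers.
example_logits = [0.1, 5.0, 0.2, 0.3]
predicted_name, predicted_prob = predict_name(example_logits)
```

With these example scores, a structure entered as "left lung" would pass, whereas one entered as "right lung" would be flagged for manual review.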