CRAJ Spring 2025

Spring 2025 (Volume 35, Number 1)

Artificial Intelligence in Rheumatology: Friend or Foe?

By Carol Hitchon, MD, FRCPC, MSc; Liam O’Neil, MD, FRCPC, MHSc; and Pingzhao Hu, PhD

The incorporation of artificial intelligence (AI) into daily life and the medical sciences is advancing rapidly. Examples include the pervasive monitoring of internet searches to target the information presented during personal online searches and in social media feeds, writing tools such as ChatGPT (Generative Pre-trained Transformer), and dictation tools such as the AI scribes being supported by the Canadian Rheumatology Association. In research settings, large datasets containing vast amounts of clinical, imaging, and biological data are being analyzed for patterns that predict a variety of clinical states and outcomes. AI clearly has the potential to be a powerful clinical and research tool, but caution is needed when interpreting AI studies and using AI tools.

Several published guidelines aid the interpretation of clinical studies that use artificial intelligence models, including the updated “Minimum Information about CLinical Artificial Intelligence Modelling” (MI-CLAIM) checklist,¹ and guidance on the ethical incorporation of AI tools into clinical practice.² As applied to clinical studies, these guidelines stress the importance of choosing clinical datasets that are representative of the population under study and that are described in detail with clear, unambiguous terminology. The accuracy and reproducibility of the generated models should ideally be compared with clinician-based models (still the “gold standard”) as well as with other machine learning models. Indeed, combined AI and clinician-based models often outperform clinician-only models. These quality assurance steps are critical, as “garbage in leads to garbage out”.

In rheumatology, many groups in Canada and internationally are mining large clinical and biological datasets to develop prediction models for categorizing clinical phenotypes, predicting clinical outcomes, and understanding biological mechanisms. Less studied (in rheumatology) is the application of AI methods to radiographic image interpretation. In other specialties, such methods have been used to evaluate magnetic resonance imaging scans (MRIs), computed tomography scans (CTs), mammograms, and ultrasounds. In rheumatology, plain radiographs remain the most widely used imaging tool to assess joint damage in inflammatory arthritis, and scores for joint damage are considered the “gold standard” for assessing damage progression in clinical studies. However, manual scoring of radiographs for damage is time-consuming and requires expertise that is not always available, and thus is not practical for the busy rheumatologist. A few groups have used machine learning, a type of AI, to evaluate and quantify radiographic joint damage in inflammatory arthritis.

Our research team is using AI methods to develop a tool to assist clinicians and researchers in scoring standard radiographs from patients with rheumatoid arthritis.³ Our first challenge was to have the computer accurately detect the target joints within the radiograph image. We built our algorithm on “You Only Look Once” (YOLO), a state-of-the-art object detection tool designed to detect specific objects commonly seen in images of daily life. We then fine-tuned this program on a publicly available dataset of pediatric joint radiographs and validated the detection tool using adult radiographs obtained from patients followed for up to 10 years as part of the Manitoba Early Arthritis cohort. The joint detection tool is able to identify and label the target joints with excellent accuracy in both pediatric and adult radiographs containing either both hands or one hand. Our second challenge is to have the computer “score” the target joints for the presence of joint space narrowing and erosions in order to calculate the Sharp-van der Heijde damage score. For this challenge, we used serial radiographs obtained from patients followed in the Canadian Early Arthritis Cohort (CATCH), which had been scored by Dr. van der Heijde and her team. These scored radiographs are considered the “gold standard” for assigning joint damage scores.
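For readers curious about how per-joint grades become a single damage score, the arithmetic can be sketched in a few lines of Python. This is an illustration only and not our team's software: the joint labels, the `JointScore` structure, and the function name are hypothetical, and the grading limits shown (erosions 0 to 5 and joint space narrowing 0 to 4 for each hand joint) reflect our reading of the published Sharp-van der Heijde method for hands.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed per-joint grading limits for hand joints (illustrative).
MAX_EROSION_HAND = 5  # erosion grade runs from 0 to 5
MAX_JSN_HAND = 4      # joint space narrowing (JSN) grade runs from 0 to 4


@dataclass(frozen=True)
class JointScore:
    """Grades for one detected joint; the joint label is hypothetical."""
    joint: str
    erosion: int = 0
    jsn: int = 0


def total_damage_score(scores: Iterable[JointScore]) -> int:
    """Sum erosion and JSN grades over all scored joints."""
    total = 0
    for s in scores:
        # Reject grades outside the assumed ranges rather than summing them.
        if not 0 <= s.erosion <= MAX_EROSION_HAND:
            raise ValueError(f"erosion grade out of range for {s.joint}")
        if not 0 <= s.jsn <= MAX_JSN_HAND:
            raise ValueError(f"JSN grade out of range for {s.joint}")
        total += s.erosion + s.jsn
    return total


if __name__ == "__main__":
    example = [
        JointScore("L_MCP2", erosion=2, jsn=1),
        JointScore("L_PIP3", erosion=0, jsn=3),
    ]
    print(total_damage_score(example))
```

In an automated pipeline, each `JointScore` would come from a scoring model applied to one joint located by the detection step; here the grades are simply typed in by hand.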

Using the CATCH radiographs, we developed a machine learning algorithm to score the joints and combined it with the joint detection tool. We compared the accuracy of our algorithm with the results obtained using other machine learning methods in common use. The algorithm demonstrates very encouraging findings, with good accuracy that, in some instances, exceeds that of other machine learning methods. We are working on fine-tuning the algorithm to enable ongoing learning that improves the model’s performance over time. Our model will need to be replicated in other large imaging datasets that include rheumatoid arthritis (RA) radiographs with a wide range of damage scores. Our algorithm was designed for RA radiographs, and similar studies are also needed in other arthropathies, which may have distinct radiographic appearances.
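One simple way to compare scoring methods against expert reference scores is to compute an error measure for each method and rank them. The sketch below uses mean absolute error purely as an example; the metric, the method names, and the numbers are assumptions made for illustration and are not the evaluation or results reported in our abstract.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def mean_absolute_error(reference: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute difference between reference and predicted scores."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one score is required")
    return mean(abs(r - p) for r, p in zip(reference, predicted))


def rank_methods(reference: Sequence[float],
                 predictions_by_method: Dict[str, Sequence[float]]
                 ) -> List[Tuple[str, float]]:
    """Return (method, error) pairs sorted from lowest to highest error."""
    errors = {
        name: mean_absolute_error(reference, preds)
        for name, preds in predictions_by_method.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up total damage scores for three radiographs.
    expert_scores = [10, 20, 30]
    model_scores = {
        "method_a": [12, 18, 30],  # hypothetical method names
        "method_b": [10, 25, 40],
    }
    for name, error in rank_methods(expert_scores, model_scores):
        print(name, round(error, 2))
```

A lower error means closer agreement with the expert reference scores. In practice, other measures of agreement could be used alongside or instead of this one.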

The third challenge is to develop a user-friendly platform through which clinicians can input a radiographic image and receive accurate joint damage scores. These scores can then be used to monitor patients for joint damage. This ongoing work is a practical example of how AI technology can be used to assist day-to-day clinical rheumatology practice, particularly in settings with limited radiology resources or in research settings where high-volume radiographic scoring is needed.

AI is clearly a potentially powerful clinical and research tool that, in time, can be feasibly incorporated into daily clinical practice to enhance targeted treatment of rheumatic disease. However, much caution and careful consideration of ethical principles are needed prior to widespread use.² Importantly, AI will never replace the clinical acumen of rheumatologists.

Carol Hitchon, MD, FRCPC, Professor of Medicine, Department of Internal Medicine, University of Manitoba, Winnipeg, Manitoba

Liam O’Neil, MD, FRCPC, Assistant Professor, Department of Internal Medicine, University of Manitoba

Pingzhao Hu, PhD, Associate Professor and Canada Research Chair, Computational Approaches to Health Research (Tier 2), Western University, London, Ontario

Glossary:

Artificial Intelligence: any reasoning-based intelligence capable of analysis that comes from computer systems.

Machine learning: subset of AI whereby a computer gets “smarter” by “learning” from its mistakes.

Neural networks: style of machine learning that tries to mimic the way human brains work with a vast array of “neurons” that either turn on or not based on the inputted data.

References:

1. Miao BY, Chen IY, Williams CYK, et al. The MI-CLAIM-GEN checklist for generative artificial intelligence in health. Nat Med. 2025. doi: 10.1038/s41591-024-03470-0 [published Online First: 20250206]

2. Ning Y, Teixayavong S, Shang Y, et al. Generative artificial intelligence and ethical considerations in health care: a scoping review and ethics checklist. Lancet Digit Health. 2024;6(11):e848-e56. doi: 10.1016/s2589-7500(24)00143-2 [published Online First: 20240917]

3. Hitchon C AIS, Fung D, Liu Q, Lac L, Bartlett S, Bessette L, Boire G, Bykerk V, Hazlewood G, Keystone E, Pope J, Schieir O, Thorne C, Tin D, Valois M, van der Heijde D, CATCH Investigators, O’Neil L, Hu P. Artificial Intelligence Models for Computer-Assisted Joint Detection and Sharp-van Der Heijde Score Prediction in Hand Radiographs from Patients with Rheumatoid Arthritis [abstract]. Arthritis Rheumatol. 2023;75(suppl 9).

4. Blanchard KJ. Artificial Intelligence: How we got here. Artificial Intelligence - Everything you need to know. A360 Media LLC; 2024.