An industrial PhD
Advisors:
Cerquides Bueno, Jesús
University:
Abstract:
Does the COVID-19 vaccine alter the menstrual cycle? Does it protect against the increased risk of diabetes that follows infection? These are examples of causal questions about the effects of clinical interventions. They are causal because they concern a cause (here, the COVID-19 vaccine) and its effects or consequences (here, alterations in the menstrual cycle and protection against diabetes). Such questions are both important and difficult: important because they concern human health, and difficult because of the complexity of the systems under study, namely the human body, human health, and their interaction with clinical interventions. There are several approaches to answering this type of question. This thesis is concerned with the data-driven, statistical approach, and particularly with the use of observational data, i.e., data collected in scenarios where the clinical intervention of interest is not under the researchers' control. Traditionally, correlational statistical methods have been used to answer these questions with this type of data. In general, these methods yield only correlations, with no guarantee that they are causal. In recent years, however, developments in the field of causal inference have produced methods that, under appropriate assumptions, can offer some assurance that the measured relationships are causal. Until recently, researchers' adoption of these methods was hindered by three main factors: lack of awareness of their existence, the inertia of traditional methods, and, to a lesser extent, lack of trust in their performance. This situation, though, has been changing steadily in the clinical literature in recent years. This thesis aims to test the hypothesis that causal inference methods should be the preferred choice for generating evidence on the effects of clinical interventions, with a particular focus on machine learning-based causal methods.
To this end, we tackle three real-world use cases with real-world data, using both correlational and causal approaches, and we qualitatively assess and compare their performance (in a broad sense). In addition, we explore the field of causal inference algorithms based on machine learning, mainly neural networks. The questions tackled concern the effect of the COVID-19 vaccine and vaccination timing on alterations of the menstrual cycle; the effect of the COVID-19 vaccine on the infection-heightened risk of diabetes onset; and the effect of antibiotic-loaded bone cement (a therapeutic option for patients undergoing total knee replacement surgery) on the survival of the prosthesis. Alongside the aforementioned causal and correlational methods, we employ real-world observational data from large registries. As a result, we provide answers to the posed questions. In some cases, the answers provided and/or the methods employed were novel in the literature at the time of publication. In addition, we offer qualitative evidence of the benefits of causal methods over correlational methods. We conclude that, in general and when possible, causal inference methods should be the preferred choice for answering these types of questions with observational data (i.e., when randomized experiments cannot be conducted).