International Medical Conference
Endometriosis 2024:
Elevating Sampson’s Century Legacy via
Deep Dive with AI
For the benefit of Endometriosis Foundation of America (EndoFound)
May 2-3, 2024 - JAY CENTER (Paris Room) - NYC
Okay, can you hear me better now? Yay. All right, so I'm going to be talking to you at a different scale from Katie, looking at observational health data. And the question we want to ask is, can AI help detect endometriosis? We could ask this at different points, but I'm going to be focusing particularly on detecting it early at the point of care. So a patient interacts with a healthcare system, and can we then use the information we have about that patient to say there is a high likelihood of endometriosis? And why do we want to do this? Maybe we want to triage patients better, to figure out who needs to get surgery and who doesn't. We can always say that at some point there will be noninvasive diagnostic tests, and we're all hoping for those. But there's still this question, specifically because of endometriosis and maybe gender bias more generally: will there be clinicians who are willing to give this test to patients?
And so here, the hope is that AI is going to kind of take over and really just inform the clinicians and the patient of the likelihood of endometriosis. There are different types of data we're interested in. It could be imaging, it could be electronic health record data, but it could also be data from your phone, from your watch, as well as maybe genomic or microbiome data. And given some of this, can we just say yes or no, this patient has a high risk of endometriosis? What's important to remember as we're doing this as technologists is that we're dealing with a single patient, a human who is viewed through these complementary data streams, and they're all going to behave very differently. It might even look like they're different individuals. But really, the excitement about AI is not only the access to these data streams, but also the fact that we can use them in unison.
So the preview of the talk is: yes, I strongly believe that AI can help with early detection of endo at the point of care, but there are a lot of details that are going to matter for us to actually create tools that are useful at the point of care. I'm going to focus on three types of considerations. The first is: what are the criteria for success? The second is: what are the considerations about data sources that are specific to endometriosis? And finally: what are the algorithm and validation considerations that are specifically appropriate for endometriosis? So I'll start with the criteria for success. I think it's useful to think about human-centered AI, which is an emerging field within AI based on the idea that, yes, there is technology, but the technology is for humans. And we can think of its principles as guiding principles for success.
The ethics presentation earlier this morning actually covered a lot of this, so I'm going to go quickly through the ones that were covered. There are a lot of ethical considerations, but there are also new considerations because we're dealing with AI in particular. One is beneficence: we want to reduce the delay to diagnosis, and we want to prevent harm, where harm could also be a lack of access to care. Justice and fairness have been talked about a lot. Transparency is one of these new, AI-specific principles; the idea is that users of AI have to be educated. How much consumers can trust a particular AI will depend on how transparent that AI system is. And what do we mean by transparent? It should be very clear to users what data were used, what algorithms were used, and how it was validated.
Is it reproducible across different populations and different health systems? Right now, I would say that's actually a gigantic challenge for AI, and one that the White House has actually singled out within its effort on AI as particularly important. The next one is trust. Beyond transparency, explainability and interpretability of AI are critical. Humans don't react well to being told simply, yes, the patient has endometriosis, or no. What they want to know is why the AI decided on that output. That's one way to build more trust. Then autonomy; we talked about autonomy more from an ethical standpoint this morning. I think there are a lot of questions about autonomy for clinicians in particular, because there are liability questions: if the AI makes a mistake about a patient, who is liable? The autonomy of a clinician is really put in question here.
Patients also need to have autonomy with respect to these tools, for sure. Utility and acceptability seems like a one-liner, but I think it's also one of the primary challenges for AI being deployed in healthcare nowadays. Again, it's the same principle: a company or a research institute can spend a whole lot of time on a technology, but whether this technology will actually be used, and will be non-disruptive to the existing workflows of clinicians, is a major question. So we have to ask ourselves how useful it will be, not once we have created the technology, but really as we are creating it. And finally, we want to be patient-centered, and I'll get back to that in my slides. So let me now move on to these questions about considerations around observational health data and the challenges that are specific to endometriosis.
So I'm going to focus on what are now becoming traditional observational health data sets that are longitudinal in nature. So we have maybe 10 years of view of a patient, maybe more, in our data sets; for example, in our electronic health record system, we have 30 years of data for our patients, so it goes way back. It could be electronic health records, it could be insurance claims, and it could also be data from self-tracking apps. Some are not specific to endometriosis, and some could be specific to endometriosis. And I'm going to claim that we can learn something from each of these types of data. So EHR and claims data: why do we want them? The scale is really what we're after, and the fact that they provide real-world evidence. It's not scientific data, but it is really a good representation of how clinicians act around endometriosis patients. There are tremendous issues, obviously, with clinical data.
And Stacey Missmer went through many of them. So we need to be aware of them; we need to be thinking about the noisiness of these codes, et cetera. But we also need to be very wary of the fact that for endometriosis specifically, we have a hard time having certainty about who actually has endometriosis. That could be because of underdiagnosis, that could be because of delayed diagnosis, but also because there are a lot of misdiagnoses. And finally, we want to really be thinking about these types of data as a representation of, again, clinicians' behavior rather than the actual truth about a patient. So what could we do with this? This is one of the first analyses we had done. We looked at nine different data sets, and we were interested in the following: what if we don't select any cohort, but look at literally every woman of reproductive age? That's about 188 million women.
And out of those, we have a way of selecting who has endometriosis, and there are about 2 million of them in aggregate across all of these data sets. What's shown here, though it's not very visible, is a comparison of the most common problems being documented. In red is the endometriosis cohort, these 2 million patients, and in pink are all of the other 186 million patients. And there are patterns already with this very simple question across all women, even if the comparison cohort has underdiagnosed patients, and even if some in the endo cohort may not actually have endo. We see patterns already, and that's extremely promising to us. But what we also see is that these patterns might not be reflective of what patients are experiencing. So there is value in looking at patient-generated data. I'm not going to talk too much about this, but I really want us as a community to think about these data sets that are not specific to endometriosis.
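The kind of comparison described above can be sketched in a few lines. This is a toy illustration, not the actual analysis: the input table, column names, and condition labels are hypothetical stand-ins for what, in practice, would come from large coded EHR or claims databases.

```python
# Toy sketch: per-cohort prevalence of documented conditions, comparing an
# endometriosis cohort against all other reproductive-age patients.
# The table below is fabricated for illustration only.
import pandas as pd

records = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 4],
    "condition":  ["pelvic pain", "fatigue", "fatigue",
                   "pelvic pain", "migraine", "migraine"],
    "has_endo":   [True, True, True, False, False, False],
})

# Number of distinct patients in each cohort
cohort_sizes = records.groupby("has_endo")["patient_id"].nunique()

# Fraction of each cohort with each condition documented at least once
prevalence = (
    records.drop_duplicates(["patient_id", "condition"])
           .groupby(["has_endo", "condition"])["patient_id"].nunique()
           .div(cohort_sizes, level="has_endo")
           .unstack("has_endo", fill_value=0.0)
)
print(prevalence)
```

Plotting the two resulting columns side by side gives exactly the red-versus-pink picture described in the talk, where differing prevalence patterns between the cohorts start to emerge.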
And yet, again because of their size, they could really help us distinguish and discriminate between who does and who doesn't have endometriosis. So here we did a partnership with Clue, the menstrual tracking app. There are about 10 million users across the globe, and there were a lot of challenges. But basically, at the end of the day, we identified that about 10% of the users in our analysis had symptoms that look a lot like endometriosis, which is interesting because a lot of the early papers on the prevalence of endo say 10%. By the way, in the observational health data sets, we see a very different prevalence: between 2% and 6%. So I think we can think a lot about what that means about underdiagnosis. One thing that we've been focusing on a lot is patient-generated data specific to endometriosis.
And why are we interested in those? Because they are very specific to endo patients, and they allow us to ask highly granular, super endo-specific questions of patients. So that's what we did. This paper describes how we went about identifying the lived experiences of patients in a very structured fashion: what the pain is, what the self-management is, what the types of symptoms are, et cetera. And we created an app, which is actually a research app called Phendo, and it's available; there's informed consent and everything, and privacy is paramount, obviously. We were able to gather so far about 17,000 participants, again throughout the world. And who are the participants? There are a bunch of them. There are actually more who do not have an official endometriosis diagnosis but are curious or suspect they have it. And then there are some who say, I'm pretty sure I don't have it.
I just enjoy tracking myself. So what can we learn when we have this type of data? We can learn granular descriptions of pain. These are kind of an aggregate picture of the different body locations of pain for our patients. And what's interesting is that we can confirm it's not just a pelvic disease that only affects the reproductive organs. We can also show that there is tremendous within-patient variability; Stacey referred to this a little bit in her talk. Here we're showing pain tracking variation from one week to the next, and we're seeing patients go from a whole lot of pain to severe, to not severe, to different body locations. The point of that diagram, which is hard to see here, is that some people who have severe pain will continue being severe, but others will go through many types of pain trajectories.
What we also find is that this pattern extends beyond pain. We have an AI-derived health status variation, which I'll mention a little later. What I'm showing here, for just a few patients, is what happens across about 12 weeks, maybe more: each column is a week and each row is one patient. Basically, green means you're doing not too bad, and red means you're not doing well; this was not a good week. The point here, again, is that there are tremendous variations from one week to the next. There's also an interesting phenomenon, which is that these variations are independent of periods. For the patients who were actually menstruating, we are finding basically no correlation between menstruation and these variations in status.
And this is kind of a large population picture of all of our patients. So, some more challenges with these different data sources: who has endometriosis? We're using diagnosis guidelines to identify these patients, and diagnosis guidelines were not designed for that purpose at all. The guidelines are there for clinicians to decide what tests to run, et cetera. So there's a translation process that is complex. We've been putting out open-source, validated phenotype definitions that are now used as a standard for eMERGE, for example. But I think what's interesting is that there are changes in guidelines, right? ESHRE has put out guidelines now that are much more symptom-based, with, I think, the intent of catching patients earlier. And what we're finding, in fact, and I won't go into the details here, is that across different data sets, the change of guidelines does not make the cohort bigger; rather, there's a partial overlap of populations.
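To make the idea of a phenotype definition concrete, here is a minimal rule-based sketch of the general pattern (a confirming procedure plus a diagnosis code, or repeated diagnosis codes). The specific codes and thresholds below are illustrative placeholders, not the validated definitions mentioned in the talk.

```python
# Sketch of a rule-based phenotype definition for selecting an endometriosis
# cohort from coded records. Codes and thresholds are illustrative only.
ENDO_DX_CODES = {"N80.0", "N80.1", "617.0"}   # example ICD diagnosis codes
CONFIRMING_PROC_CODES = {"58662"}             # example laparoscopy procedure code

def has_endo_phenotype(dx_codes, proc_codes, min_dx_count=2):
    """Flag a patient if they have a confirming procedure plus at least one
    diagnosis code, or at least `min_dx_count` diagnosis codes overall."""
    n_dx = sum(1 for c in dx_codes if c in ENDO_DX_CODES)
    confirmed = any(c in CONFIRMING_PROC_CODES for c in proc_codes)
    return (confirmed and n_dx >= 1) or n_dx >= min_dx_count

print(has_endo_phenotype(["N80.0", "N80.1"], []))   # repeated dx codes
print(has_endo_phenotype(["N80.0"], ["58662"]))     # dx plus procedure
print(has_endo_phenotype(["N80.0"], []))            # single unconfirmed dx
```

The talk's point is that each revision of the clinical guidelines implies a different version of a rule like this, and the cohorts the different versions select only partially overlap.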
And so we want to be careful about this, right? We don't want to just switch to the next guideline; we want to be representative of different types of patients. How much time do I have? Okay, so, algorithm considerations. A lot of the considerations I'm putting here are specifically problematic for endometriosis. There is missing data, both in which patients are represented and within each individual's record. There's noisy data, as I mentioned, for example the uncertainty about actual diagnoses. There are very varied types of patients and high temporal variations within patient trajectories. And then we have a whole lot of covariates to analyze, and furthermore, these covariates have nonlinear relationships among them. In other words, we need heavy-duty AI techniques to be able to handle these things. So this is one example. This is how we found the AI-derived health status information: we used a probabilistic approach called mixed membership learning.
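As a rough illustration of the mixed membership idea, here is a sketch using latent Dirichlet allocation as a stand-in (the talk does not specify the exact model beyond "probabilistic mixed membership"). Each patient is represented as counts of tracked symptoms, and the model assigns each patient a soft mixture over latent symptom profiles rather than a single hard cluster. The data here are random, for shape only.

```python
# Sketch: mixed membership over patient-reported symptoms, using LDA as a
# generic stand-in for the probabilistic model described in the talk.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Rows: 50 hypothetical patients; columns: counts of 6 tracked symptoms
symptom_counts = rng.integers(0, 10, size=(50, 6))

lda = LatentDirichletAllocation(n_components=3, random_state=0)
memberships = lda.fit_transform(symptom_counts)   # shape (50, 3)

# Each row is a distribution over the 3 latent symptom profiles
print(memberships[0])
```

The soft memberships are what make it possible to say that patients differ mostly by severity while still sharing whole-body symptom profiles, as described next.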
We validated it through work with clinicians looking at the results. But the point here is that, again, we found certain types of patients that are different according to what the patients themselves are saying. These do not correlate with surgical phenotypes, which is, by the way, something the patients have been telling us: the surgical phenotypes do not correlate with their own experience of symptoms. What we found was that the distinction between these groups of patients was more about severity. In every group it was a disease across the whole body, but severity is what distinguished the groups. This is an example of prediction, now looking at EHR data. And the reason I'm showing this is: can we do early detection? This idea of transparency and reproducibility is critical. So here we're looking at whether, if I train a model on one of these data sets, I can transport it: is it portable to a different data set?
And then we can look at discriminatory features, which help us both validate the model and inform what could be important in identifying patients. And finally, how early can we detect endometriosis? I think we need to be careful about this as we're designing these algorithms. It's very easy to say that one week before the patient was diagnosed, we have built an AI model with an AUPRC of something like 99%, so it's performing almost perfectly. The problem is that it's absolutely not useful to clinicians or patients: by that time, a lot has already been documented, and they're going into surgery or they're going to get the imaging. What's interesting instead is to ask: three years prior to their diagnosis, could they have been identified? Two and a half years prior? Et cetera, et cetera. And so that's the kind of analysis we're doing. So, final thoughts: yes, AI can help.
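The evaluation described above can be sketched as computing AUPRC at several lead times before diagnosis. Labels and scores below are simulated (predictions are made artificially better closer to diagnosis, mimicking the inflation the talk warns about); in a real study, each horizon would use only features available up to that time point.

```python
# Sketch: AUPRC as a function of lead time before diagnosis, on simulated
# data. Illustrates why performance measured just before diagnosis is
# inflated and less clinically useful than performance years earlier.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)   # simulated diagnosis labels

for months_before_dx in (36, 30, 24, 12, 1):
    # The further from diagnosis, the noisier the simulated predictions
    noise = months_before_dx / 36.0
    scores = y_true * (1 - noise) + rng.random(500) * noise
    auprc = average_precision_score(y_true, scores)
    print(f"{months_before_dx:>2} months before diagnosis: AUPRC = {auprc:.2f}")
```

Reporting the whole curve of AUPRC versus lead time, rather than a single number measured at diagnosis, is the kind of analysis the talk advocates.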
How do we get there? I think advocacy and awareness are critical, because patients need to be aware that AI can help, but also how it can boost their own experience. Team science: basic scientists, clinicians, statisticians, AI scientists, but also patients, so that all these different kinds of expertise come together. We probably need more science-industry partnerships. And, I hate to say this, but funding. So I'll put on my chair hat for a second. This is not showing very well, but if you're interested in these kinds of questions, that's exactly the type of work my whole department does. We have a number of training opportunities for doctors and residents and fellows, so feel free to check out our website. And I'll stop here. Thank you.