A Stanford University-led team of scientists has developed a machine learning tool that can analyse electronic health records (EHRs) to identify individuals who are likely to have familial hypercholesterolemia (FH), an underdiagnosed genetic cause of elevated low-density lipoprotein (LDL) cholesterol, which puts patients at a 20-fold increased risk of coronary artery disease. In separate test runs of the classifier, described today in npj Digital Medicine, more than 80% of the patients it flagged turned out to have FH (its positive predictive value, or PPV), and it demonstrated 99% specificity.

The team says the classifier could help to flag up patients who are most likely to have FH, so that they and their families can undergo further genetic testing. “Theoretically, when someone comes into the clinic with high cholesterol or heart disease, we would run this algorithm,” said Nigam Shah, MBBS, PhD, Stanford University associate professor of medicine and biomedical data science. “If they’re flagged, it means there’s an 80% chance that they have FH. Those few individuals could then get sequenced to confirm the diagnosis and could start an LDL-lowering treatment right away.” Shah and colleagues report on development and evaluation of the classifier in a paper titled, “Finding missing cases of familial hypercholesterolemia in health systems using machine learning.”
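
To make the two reported metrics concrete, the short Python sketch below computes PPV and specificity from confusion-matrix counts. The counts are hypothetical, chosen only so the results land near the figures quoted above; they are not the study's data.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Share of flagged patients who really have the condition."""
    return true_pos / (true_pos + false_pos)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of unaffected patients the classifier correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not taken from the paper).
true_pos, false_pos, true_neg = 160, 40, 3960

print(f"PPV: {ppv(true_pos, false_pos):.2f}")                   # 160 / 200
print(f"Specificity: {specificity(true_neg, false_pos):.2f}")   # 3960 / 4000
```

PPV answers the question a clinician faces ("this patient was flagged; how likely is FH?"), while specificity describes how rarely unaffected patients are flagged at all.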

Familial hypercholesterolemia is an autosomal dominant condition that is estimated to affect about 1 in 250 people, making it “among the most common morbid monogenic disorders,” the authors explained. People with FH carry a mutation that hinders their bodies’ ability to clear harmful LDL cholesterol that collects in and clogs arteries. The genetic mutation effectively leads to lifelong raised levels of LDL cholesterol, and without intervention about 50% of men with FH will have a heart attack by age 50 years, and 30% of women will suffer a heart attack by age 60.

Data expert Nigam Shah, MBBS, PhD, says the algorithm learned how to identify information in electronic medical records that would flag patients at risk for FH. [Steve Fisch]
Although the risks of FH-related atherosclerotic cardiovascular disease can be reduced significantly by starting lipid-lowering treatment early, it is estimated that fewer than 1 in 10 people with FH are diagnosed in the United States. “We think that less than 10% of individuals with FH in the United States actually know that they have it,” commented Joshua Knowles, MD, PhD, assistant professor of cardiovascular medicine at Stanford.

FH runs strongly in families, so identifying one individual with the condition means that relatives can also be screened, which “has been shown to be highly cost-effective in reducing excess morbidity in family members,” the researchers added. “So screening family members of FH patients is really important, just like it would be with breast cancer or any other genetically linked illness,” Shah said.

Unfortunately, hospitals don’t have the means to sequence patients on a large scale, Shah pointed out. “The problem is, the chance that someone seen in the cardiology clinic has this genetic condition is somewhere around 1 in 90, or 1 in 100, so it doesn’t make sense to sequence every single person.”
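
The arithmetic behind that point can be sketched in a few lines of Python. The sketch takes Shah's rough 1-in-100 estimate for cardiology clinic patients and his 80% figure for flagged patients at face value; the function names are illustrative, and sequencing is assumed to confirm every true case.

```python
def cases_per_100_sequenced(probability_of_fh: float) -> float:
    """Expected confirmed FH cases for every 100 patients sequenced."""
    return 100 * probability_of_fh


def sequenced_per_case(probability_of_fh: float) -> float:
    """Expected number of patients sequenced to find one FH case."""
    return 1 / probability_of_fh


clinic_rate = 1 / 100   # Shah's rough estimate for cardiology clinic patients
flagged_rate = 0.80     # chance a flagged patient has FH, per Shah's quote

for label, rate in [("everyone", clinic_rate), ("flagged only", flagged_rate)]:
    print(f"{label}: {cases_per_100_sequenced(rate):.0f} cases per 100 tests, "
          f"{sequenced_per_case(rate):.2f} tests per case")
```

Under these assumptions, sequencing everyone takes about 100 tests to find one case, while sequencing only flagged patients takes about 1.25.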

As part of the FH Foundation’s FIND (Flag, Identify, Network, Deliver) FH initiative, the Stanford University-led team developed and validated a supervised machine-learning algorithm to identify probable FH cases. Using data from Stanford’s FH clinic, the team trained the algorithm to assess patient data, including family history, current prescriptions, lipid levels, and laboratory test results, to learn which factors may indicate FH.

Shah compared this process to how a spam filter might be trained to recognize junk emails. Spam filters don’t just apply rules written by programmers based on, say, which words to look for within the email. Rather, they learn what to flag as suspicious by evaluating actual emails. Similarly, the FH algorithm learns by looking at the EHRs of real patients.
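
The idea of learning from labelled examples rather than hand-written rules can be illustrated with a toy model. The Python sketch below is not the team's classifier: it uses a small logistic regression as a stand-in, trained on synthetic patients with two made-up features (LDL level and a family-history flag). Every number in it is an assumption for illustration.

```python
import math
import random

random.seed(0)


def make_patient(has_fh: bool) -> list[float]:
    """Synthetic features: LDL in units of 100 mg/dL, family-history flag."""
    ldl = random.gauss(2.2 if has_fh else 1.3, 0.3)
    family_history = 1.0 if random.random() < (0.6 if has_fh else 0.1) else 0.0
    return [ldl, family_history]


# Labelled examples play the role of emails already marked spam / not spam.
labels = [1] * 50 + [0] * 50
patients = [make_patient(bool(y)) for y in labels]


def predict(weights: list[float], bias: float, features: list[float]) -> float:
    """Logistic model: a score between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))


# Learn the weights from the examples by plain gradient descent.
weights, bias, rate = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for features, y in zip(patients, labels):
        error = predict(weights, bias, features) - y
        grad_b += error
        for i, x in enumerate(features):
            grad_w[i] += error * x
    bias -= rate * grad_b / len(patients)
    weights = [w - rate * g / len(patients) for w, g in zip(weights, grad_w)]

high_risk = predict(weights, bias, [2.4, 1.0])
low_risk = predict(weights, bias, [1.2, 0.0])
print(f"high LDL + family history: {high_risk:.2f}")
print(f"normal LDL, no family history: {low_risk:.2f}")
```

No threshold for LDL is written into the code; the model arrives at its own weighting of the features from the labelled examples, which is the point of the spam-filter comparison.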

The team first tested their trained algorithm using the EHRs of 197 FH patients and another 6590 patients without FH, all held within the Stanford Health Care system. From those patients flagged by the algorithm, the team reviewed 100 patient charts, extrapolating that the algorithm had detected patients who had FH with 88% accuracy. “In the end, you get a ranking that shows who is most likely to have the disease,” said Shah. “Those who rank at the top have the highest likelihood and, as you move toward the bottom, the likelihood tapers off.”
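
The ranking Shah describes, and the chart-review check on the top of that ranking, might look something like the Python sketch below. The patient IDs, scores, and review outcomes are hypothetical, and `precision_at_k` is an illustrative helper rather than anything taken from the paper.

```python
# Hypothetical (patient_id, classifier_score, chart_review_confirms_fh) rows.
scored = [
    ("p01", 0.97, True), ("p02", 0.12, False), ("p03", 0.88, True),
    ("p04", 0.91, False), ("p05", 0.35, False), ("p06", 0.79, True),
    ("p07", 0.05, False), ("p08", 0.66, True),
]


def rank_by_score(rows):
    """Highest classifier score first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def precision_at_k(rows, k: int) -> float:
    """Fraction of the top-k ranked patients confirmed on chart review."""
    top = rank_by_score(rows)[:k]
    return sum(1 for _, _, confirmed in top if confirmed) / k


print([pid for pid, _, _ in rank_by_score(scored)[:4]])
print(precision_at_k(scored, 4))
```

Reviewing only the top of the list keeps the manual chart review small while still giving an estimate of how often a flag is correct.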

They then carried out a second, external validation using the EHRs of another 466 FH patients and 5000 matched noncases within the Geisinger Healthcare System. “The predictions came back with 85% accuracy, and we knew that many of the Geisinger patients had a confirmed FH diagnosis with genetic sequencing,” Shah said. “So that’s how we convinced ourselves that yes, this indeed works.”

The team’s calculations indicated that the classifier would be far more cost-effective than universal genetic testing. “… compared to the implementation of universal genetic testing or clinical criteria-based screening, the economics of EHR-based detection of FH through machine-learning are extremely favorable and can massively improve the ability of a health system to find patients at risk,” they wrote. “We believe the use of supervised learning to build a classifier that finds undiagnosed cases of FH is a compelling example of machine learning that matters … Applied broadly, using our classifier to screen using EHRs could identify many thousands of the undiagnosed patients with FH and lead to more effective therapy and screening of families.”

Cardiologist Joshua Knowles, MD, PhD, helped develop an algorithm that can predict a patient’s risk of a potentially fatal heart disease known as familial hypercholesterolemia. [Norbert von der Groeben]
The researchers acknowledged that while the software could help to improve FH diagnosis, it wouldn’t identify every case. “Not everything can be solved by an algorithm,” Shah commented. They also pointed out that their work had certain limitations, and anticipated that, as with the development of most machine learning approaches, including more training data may further improve the classifier. “We anticipate continuously refining our classifier as newly diagnosed cases accrue,” the team stated.

As a next step, the researchers are working to set up the algorithm in clinical settings at Stanford Health Care and at other sites, in partnership with the FH Foundation. “We’re also thinking about how we can work with the FH Foundation to implement networks of family screening to reach more patients who might have the disease and not know it,” Shah noted.
