UMLS as a lens for extracting health data value

Mark Tuttle, FACMI • February 20, 2018 • 2 minute read

We live in a world where things thought to be impossible, or at least extremely difficult, just a decade ago are now routine. Computers can translate one language into another, understand the spoken word, recognize our faces in a picture, and come close to driving our cars autonomously. And yet computers can’t do much with our health data that improves care.

So, where is the disconnect?

First, if you’re reading this blog, you know that our health information is distributed across different computer systems, potentially a dozen or more. But once that challenge is overcome, both caregivers and patients are left with a growing collection of data that are hard for any human to comprehend usefully. And, as it turns out, computers can’t do much that is useful with it either, at least not yet.

Second, computers cannot yet help us with our healthcare because the descriptions of us and of our care are, in technical parlance, generally not “comparable”. If a computer can’t tell whether two entries in our accumulated records name the same test result or the same medication, or, better, whether the entries are related to one another in some important way (two different medications from the same class, for instance), there’s not much it can do. A critical part of empowering computers to make use of our health data is a solution to “the naming problem”. Put differently, again in technical parlance, computers need to “interoperate by meaning”, at least where our healthcare data are concerned. An enabling resource for such interoperation is the UMLS, short for the Unified Medical Language System. A major component of the UMLS is the Metathesaurus, a large corpus of the authoritative vocabularies used in healthcare and biomedical research. It was developed and is now maintained by a federal agency, the National Library of Medicine. The critical feature of this corpus is its organization by meaning: simply put, it represents synonymy and potentially useful relatedness among the many millions of names and codes that might appear in your medical records.
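To make “comparable” concrete, here is a minimal Python sketch of what organization by meaning buys a computer. It is a toy, not the Metathesaurus: the concept identifiers (`C-0001` and so on), the two lookup tables, and the function names are all made up for illustration, and a real system would draw these mappings from the UMLS rather than from a hand-written dictionary.

```python
# Toy illustration only: the concept IDs and tables below are hypothetical,
# not real UMLS identifiers or data.

# Many names, one meaning: each name maps to a single concept ID.
NAME_TO_CONCEPT = {
    "tylenol": "C-0001",
    "acetaminophen": "C-0001",
    "paracetamol": "C-0001",
    "advil": "C-0002",
    "ibuprofen": "C-0002",
    "naproxen": "C-0003",
}

# Relatedness: some concepts belong to a shared medication class.
CONCEPT_TO_CLASS = {
    "C-0002": "NSAID",
    "C-0003": "NSAID",
}


def concept_for(name):
    """Return the concept ID for a name, or None if the name is unknown."""
    return NAME_TO_CONCEPT.get(name.strip().lower())


def same_meaning(a, b):
    """True when both names are known and resolve to the same concept."""
    concept_a, concept_b = concept_for(a), concept_for(b)
    return concept_a is not None and concept_a == concept_b


def same_class(a, b):
    """True when both names resolve to concepts in the same known class."""
    class_a = CONCEPT_TO_CLASS.get(concept_for(a))
    class_b = CONCEPT_TO_CLASS.get(concept_for(b))
    return class_a is not None and class_a == class_b


if __name__ == "__main__":
    print(same_meaning("Tylenol", "paracetamol"))  # two names, one medication
    print(same_class("Advil", "naproxen"))         # different drugs, same class
```

Comparing the raw strings “Tylenol” and “paracetamol” tells a computer nothing; comparing the concepts they resolve to tells it they are the same medication. That is the whole point of solving the naming problem.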

Third, a little-appreciated fact is that the names for important things in our records are a moving target. For example, new lab tests and new medications appear daily, and the names of diseases continue to evolve in the face of new knowledge. Staying on top of this is a huge challenge, so the Metathesaurus is released multiple times each year. Maintaining currency and releasing the Metathesaurus in a uniform representation is exactly the kind of thing that computers should do, and this is what the UMLS does.




About the Author
Mark Samuel Tuttle, FACMI, has more than 40 years of experience as a healthcare interoperation consultant and senior data scientist. He was the lead extramural architect of the National Library of Medicine Unified Medical Language System Metathesaurus, and has worked with several Federal Agencies on the creation and use of standard vocabularies for healthcare and biomedical research. He co-founded Apelon, where he remains on the Board of Directors. Before that he taught computer science at UC Berkeley and UC San Francisco. He has authored or co-authored over 100 publications.