I worked at the Regenstrief Institute several decades ago, and they were working on this very problem. It’s a tough nut to crack on many fronts, and we were working on just one: standardizing lab and clinical result specifiers. If you get into an auto accident and the people treating you want to know whether you’ve recently had test XYZ, the problem is that every hospital uses a different coding system for that test. WHO was trying to standardize the coding system for those tests. I left before the project was finished and, as far as I know, they are still grappling with it... and that’s just one facet of the problem.
It doesn’t take “the internet of things” (iDevices) to come up with a “Dewey Decimal System” for medical codes.
Just wait until you deal with (a quick Python sketch of a couple of these follows the list):
a) legacy data
b) mistyped / miscoded forms (OCR is not a panacea)
c) revisions (e.g., what used to be an atypical dose, medicine, or symptom recorded on the electronic form as “other” is now “mainstreamed”; or changes in classification or terminology)
d) deciphering abbreviations, and reconciling incompatible hash tables / abbreviation lists between different sources
e) different databases with different field sizes, and the consequent truncations
f) errors in things like weight-based vs. body-surface-area dosing, where the person checks one box but fills in the other
g) differential diagnosis where the doctors are stumped for six months or a year
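To make (d) and (e) concrete, here is a minimal Python sketch of why an exact-match join quietly fails; the field widths, test names, and abbreviations are invented for illustration:

FIELD_WIDTH_A = 20   # hypothetical system A: 20-character column
FIELD_WIDTH_B = 10   # hypothetical system B: 10-character column

def store(name, width):
    """Simulate a fixed-width database column: silent truncation."""
    return name[:width]

test_name = "HEMOGLOBIN A1C PCT"
a = store(test_name, FIELD_WIDTH_A)    # 'HEMOGLOBIN A1C PCT'
b = store(test_name, FIELD_WIDTH_B)    # 'HEMOGLOBIN' -- truncated
c = store("HGB A1C %", FIELD_WIDTH_A)  # same test, a local abbreviation

print(a == b, a == c)  # False False: exact joins silently drop both records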
Don’t be silly. I saw on TV how all these databases can magically talk to one another and the government can pull up all the information on anything, and anyone, at any time.
Seriously, the problems are huge, but big-data systems have the ability to cross-link data that are not normalized. It’s not always 100% (no data are), but it is pretty amazing.
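A minimal sketch of that kind of imperfect cross-linking, using nothing fancier than the standard library; the test names and the 0.6 acceptance threshold are made up for illustration:

from difflib import SequenceMatcher

def normalize(s):
    # Light normalization before scoring: case, the '%' abbreviation, whitespace.
    return " ".join(s.upper().replace("%", "PCT").split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

hospital_a = ["Hemoglobin A1c %", "Serum Potassium", "WBC Count"]
hospital_b = ["HGB A1C PCT", "POTASSIUM SER", "WHITE BLOOD CELL CT"]

for name_a in hospital_a:
    best = max(hospital_b, key=lambda name_b: similarity(name_a, name_b))
    score = similarity(name_a, best)
    if score > 0.6:                      # a probable link, not a certainty
        print(f"{name_a!r} <-> {best!r}  ({score:.2f})")
    else:
        print(f"{name_a!r} -> no confident match")

The last pair falls below the threshold, which is exactly the “not always 100%” part: you flag it for a human instead of forcing a link.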
SGML has been in the same boat since the 1960s: it is effectively impossible to reliably define all the elements in any one specialty, much less the world. Luckily, we can now extract and interpolate the results in a manner that (somewhat logically) tolerates type errors and missing attributes.
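As a sketch of that tolerant extraction style (using XML as a stand-in, since the same idea applies to SGML-family markup; the element and attribute names are invented):

import xml.etree.ElementTree as ET

DOC = """<results>
  <result code="HGB" value="13.5" units="g/dL"/>
  <result code="K" value="four point one"/>  <!-- type error in value -->
  <result value="98"/>                       <!-- missing code attribute -->
</results>"""

def to_float(text, default=None):
    # Tolerate type errors: keep the record, flag the value as unusable.
    try:
        return float(text)
    except (TypeError, ValueError):
        return default

for r in ET.fromstring(DOC).iter("result"):
    code = r.get("code", "UNKNOWN")   # tolerate a missing attribute
    value = to_float(r.get("value"))  # 'four point one' -> None, not a crash
    units = r.get("units", "")
    print(code, value, units)

Every record survives; the bad value and the missing code are defaulted rather than causing the whole document to be rejected.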