Archive for June 2009

Metadata and BIO eDiscovery

Metadata is essentially “data about other data”, of any sort, in any medium. Metadata within what we call BIO eDiscovery must be analyzed cautiously because of the complexity of medical terminology, codes and, at times, even slang.

An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.

In data processing, metadata provides information about, or documentation of, other data managed within an application or environment. This commonly defines the structure or schema of the primary data. The term should be used with caution as all data is about something, and is therefore “metadata” in a sense, and vice versa.

For example, metadata would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.) and data about the data itself (where it is located, how it is associated, who owns it, etc.). Metadata may also include descriptive information about the context, quality, condition or characteristics of the data. It may be recorded with high or low granularity.
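To make the idea of "data about data elements" concrete, here is a minimal Python sketch that asks a database for the metadata of a table rather than for its contents. It assumes SQLite (via the standard `sqlite3` module); the table `patient_note`, its columns and the helper `describe_table` are hypothetical names invented for illustration only.

```python
import sqlite3


def describe_table(conn, table):
    """Return column metadata (name, declared type, required flag) for a table."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"name": row[1], "type": row[2], "required": bool(row[3])}
        for row in rows
    ]


# Hypothetical in-memory table, used only to have a schema to describe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_note ("
    "id INTEGER PRIMARY KEY, icd_code TEXT NOT NULL, note TEXT)"
)

for column in describe_table(conn, "patient_note"):
    print(column)
```

Note that the table holds no rows at all, yet there is still plenty to report: the schema is metadata in exactly the sense described above.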

Let's look at some possible examples:

Examples of metadata regarding a book would be the title, author, date of publication, subject, a unique identifier (such as an International Standard Book Number (ISBN)), its dimensions, number of pages, and the language of the text. Depending on the scenario, the data being described may or may not be electronic.

Metadata for a photograph or an X-ray would typically include the date and time at which it was taken and details of the camera settings (such as focal length, aperture, exposure). Many digital cameras record metadata in Exchangeable Image File Format (EXIF). If this is the case, an analyst should ask: what type of image file is it, what type of digital camera produced it, and where is everything stored?

Audio recordings may also be labelled with metadata. When audio formats moved from analogue to digital, it became possible to embed this metadata within the digital content itself. Applications such as Sound Forge can be used to inspect it in finer detail.

Metadata can be used to name, describe, catalogue and indicate ownership or copyright for a digital audio file. Its presence makes it much easier to locate a specific audio file within a group, by using a search engine that reads the metadata. As different digital audio formats were developed, it was agreed that a standardized and specific location would be set aside within the digital files where this information could be stored.

As a result, almost all digital audio formats, including MP3, Broadcast WAV and AIFF files, have similar standardized locations that can be populated with metadata. This “information about information” has become one of the great advantages of working with digital audio files, since the catalogue and descriptive information that makes up the metadata is built right into the audio file itself, ready for easy access and use.
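As one illustration of such a standardized location, the older ID3v1 convention for MP3 files reserves the final 128 bytes of the file for a fixed-layout tag that begins with the marker `TAG`. The Python sketch below reads that tag from a byte string. It is only a sketch under stated assumptions: newer files usually carry the richer ID3v2 tag at the start of the file instead, the helper name `read_id3v1` is hypothetical, and the sample bytes are synthetic rather than a real recording.

```python
import struct

# ID3v1 layout: marker(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"  # 128 bytes in total


def _clean(raw):
    """Drop padding after the first NUL byte and decode the text."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data):
    """Return the ID3v1 fields found in the last 128 bytes, or None if absent."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_LAYOUT, data[-128:]
    )
    return {
        "title": _clean(title),
        "artist": _clean(artist),
        "album": _clean(album),
        "year": _clean(year),
        "comment": _clean(comment),
        "genre_index": genre,
    }


# Synthetic example: 64 bytes standing in for audio, followed by a tag.
sample = (
    b"\x00" * 64
    + b"TAG"
    + b"Dictation 12".ljust(30, b"\x00")
    + b"Dr. Example".ljust(30, b"\x00")
    + b"Clinic Notes".ljust(30, b"\x00")
    + b"2009"
    + b"".ljust(30, b"\x00")
    + bytes([12])
)

print(read_id3v1(sample))
```

Because the tag sits at a known offset with known field widths, a search tool can read it without decoding any of the audio.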

With regard to web pages, the HTML used to mark them up allows for the inclusion of a variety of types of metadata, from simple descriptive text, dates and keywords to highly granular information such as the Dublin Core and e-GMS standards.

Pages can be geotagged with coordinates. Metadata may be included in the page’s header or in a separate file. Microformats allow on-page data to be marked up as metadata. Even the Hypertext Transfer Protocol (HTTP) used to transfer web pages carries metadata, in the headers that accompany each request and response.
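The page-header metadata described above can be pulled out with a few lines of code. The following Python sketch uses the standard library's `html.parser` to collect `<meta name="..." content="...">` pairs from a page. The class `MetaCollector`, the function `collect_meta` and the sample page are hypothetical, and the `DC.creator` and `geo.position` names are shown as commonly seen conventions for Dublin Core and geotagging, not as a complete treatment of either.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def collect_meta(html):
    """Return a dict of the named <meta> entries found in an HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


# Hypothetical page, invented for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Discharge summary template</title>
<meta name="description" content="Template for discharge summaries">
<meta name="keywords" content="discharge, summary, template">
<meta name="DC.creator" content="Example Clinic">
<meta name="geo.position" content="40.7128;-74.0060">
</head><body><p>Body text.</p></body></html>
"""

for name, content in collect_meta(SAMPLE_PAGE).items():
    print(f"{name}: {content}")
```

None of these values are visible to a person reading the page in a browser, which is why a review that looks only at the rendered text can miss them.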

What does all of this mean? Essentially, try to think about the hundreds of thousands of minute medical details that lawyers and analysts might miss, but a trained medical professional would not. This is why BIO eDiscovery is so important: medical professionals use what is in essence a different language from other people, which at times makes the metadata even more difficult to interpret.

Christopher Bressi

chris@SocietyofCollegeMedicine.com
