Mappathon 2018
Metadata mapping challenge

Important Notices

  1. Test set for the Mappathon NOW available.
  2. So far, six teams have registered for the Mappathon.
  3. Mappathon Workshop will take place on Tuesday, 4th September from 8:30 to 10 a.m.
  4. Mappathon Validator is now available!
  5. Mapping ground truth is available after registration!
  6. Registration is now open! register@mappathon.de
  7. Training data is available
  8. Mappathon 2018 is associated with the GMDS Jahrestagung in Osnabrück, taking place from 2nd September until 4th September 2018

Overview

Welcome to Mappathon 2018, a metadata mapping challenge at the GMDS Jahrestagung 2018.

Scientific challenges allow a direct comparison of different approaches and take place regularly in the field of medical image processing. This workshop aims to adapt the challenge principle to the eHealth research community. Mappathon is a metadata mapping challenge that asks for methods to find corresponding data elements within similar datasets.

The aim is to find corresponding data elements within a set of (similar) medical records/datasets and to relate these data elements to each other. For our challenge, datasets from routine documentation and clinical research are provided by the Portal of Medical Data Models. Training datasets will be curated and made available for download in different formats such as FHIR questionnaires or CDISC ODM (Operational Data Model). Suitable mappings, manually determined by an expert committee, and the evaluation matrix, which follows the FHIR ConceptMap relations, are published transparently.

How it works

Participants are invited to download the training set, which includes datasets and eCRFs as well as the corresponding expert mapping of related data elements. This allows them to validate and optimize their algorithms and methods. Any automatic method that predicts the valid mapping is of great interest. There is no restriction on new, innovative or unpublished methods and no limitation on including external information such as terminologies. Participants are invited to use coding systems and terminology servers.

During the training phase, the organizers will provide an automated evaluation service for checking results. During the workshop, a set of test cases will be released; participants will be asked to run their algorithm on these cases and upload their mapping results.

To participate successfully, each team needs to submit a short abstract describing the applied method. Each team will also be asked to give a brief presentation of their approach during the workshop. The organizers will then evaluate each case and establish a ranking of the participating teams. All results will be presented during the workshop and discussed with invited experts and all workshop attendees.

Why participate in a challenge?

Varying datasets and the related heterogeneity make it nearly impossible to compare different approaches in a fair way. By providing a high-quality dataset publicly as well as pre-defined evaluation rules, this challenge aims to overcome these limitations and to create a common framework for the comprehensible and adequate comparison of results.

Clinical incentive

To enable secondary use of medical routine data, it is necessary to create a shared understanding of the given information. In common practice, this understanding is achieved through metadata and its interconnections. Metadata can be stored in so-called metadata repositories (MDR). The functionalities of such an MDR include pure storage, administration and other specific metadata functionalities such as matching and mapping. Matching is the discovery of related or equivalent metadata. Mapping describes the relationship between data elements, such as conversion rules. These rules are difficult to determine and often require manual effort. Therefore, there is a great need for advanced data analysis techniques that support the definition of matchings and mappings.

Dates

June: Registration opens and distribution of training data
July: Distribution of the evaluation matrix and suitable mappings
31st August: Deadline for submitting test data results
September: Presentation of methods and results at the workshop

Participate

To register, please send an e-mail to register@mappathon.de. The e-mail should contain the following information: the name of the team and all team members with names and e-mail addresses.

Participate in Mappathon

Data distribution will be handled by the Medical Data Models Portal. There you will find explanations on how to download the data in different formats. Alternatively, we have prepared all forms as PDF, all training data records in CDISC ODM format and all corresponding ground truth for download (the password will be sent after registration; all data without warranty).

Download all Mappathon data (status: 29 August 2018)
Download all Mappathon test data (status: 3 September 2018)

 

Rules

What is required to participate?

Each team wishing to participate in the Mappathon challenge is required to:

  1. Register at the data distribution and evaluation platform.
  2. Submit an abstract describing their method and their results on the training dataset.
  3. Upload the test data results before the deadline.
  4. Register for the associated GMDS Jahrestagung 2018.
  5. Be present at the workshop with at least one team member.

Which methods are called for?

Any automatic method that predicts the valid mapping is of great interest. There is no restriction on new, innovative or unpublished methods and no limitations on including external information like terminologies. Participants are invited to use coding systems and any terminology servers.

Abstract

Mappathon requires the participating teams to submit an abstract, which will be reviewed by the organizers. The text will be uploaded to the challenge result web-page.

Requirements


Review


Submission

MAPPING

The Mappathon Challenge provides two tasks, which will be independently evaluated.

  1. The first task is a multilabel classification: the problem of assigning each source data element to one or more target data elements.
  2. The second task is a multiclass classification: the problem of classifying the specific relation between a source data element and its (multiple) target data elements.

Equivalence Classes

The evaluation classification of Mappathon is based on the FHIR v3.0.1 ConceptMap, which uses the ConceptMapEquivalence value set. The number of classes has been significantly reduced, and the classes are applied at the data element level rather than the concept level. "Mappings are one way - from the source to the destination. In many cases, the reverse mappings are valid, but this cannot be assumed to be the case." [FHIR v3.0.1 ConceptMap]


PRIVACY AND DATA COPYRIGHT

By registering, each team agrees to use the provided data only within the scope of the workshop and neither to pass it on to a third party nor to use it for other publications. After the workshop has taken place, the data will be released under a research license.

Validator

The mapping results can be evaluated using the online Mappathon Validator. Mappathon provides a REST interface for uploading mapping results at https://validate.mappathon.de and an additional client, available on the ITCR GitHub. Example:

The Mappathon Validator calculates the Zero-one Classification Loss and the Mappathon Score according to the evaluation matrix.

Mappathon Score: Evaluation Matrix

We have established a lookup table (E) to evaluate the equivalence classes. An example: if a submitted mapping is labeled "wider" but the gold standard defines it as "equivalent", it receives a Mappathon Score of 0.6.
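The lookup described above can be sketched as a small dictionary keyed by the pair (predicted relation, gold-standard relation). This is only an illustration: the single entry ("wider", "equivalent") with score 0.6 is taken from the text, while the function name `mappathon_score` and the full credit of 1.0 for an exact match are assumptions. The complete evaluation matrix is not reproduced here.

```python
from typing import Optional

# Lookup table E keyed by (predicted relation, gold-standard relation).
# Only the ("wider", "equivalent") -> 0.6 entry comes from the text above;
# all other entries of the real evaluation matrix are deliberately left out.
E = {
    ("wider", "equivalent"): 0.6,
}


def mappathon_score(predicted: str, gold: str) -> Optional[float]:
    """Return the score for one mapping, or None if the pair is not in E."""
    if predicted == gold:
        # Assumption for illustration: an exact match earns full credit.
        return 1.0
    return E.get((predicted, gold))


print(mappathon_score("wider", "equivalent"))
```

Returning `None` for unknown pairs, rather than guessing a score, keeps the sketch honest about which cells of the matrix are actually known.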

Zero-one Classification Loss
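As a rough illustration, the standard zero-one loss is the fraction of samples that are classified incorrectly. In a multilabel setting, a source data element counts as wrong unless its predicted set of targets matches the gold standard exactly. The sketch below follows that textbook definition. Whether the Mappathon Validator computes it in exactly this way is an assumption, and the function name and all OIDs except the pair from the format example are hypothetical.

```python
from typing import Dict, FrozenSet

TargetSets = Dict[str, FrozenSet[str]]


def zero_one_loss(gold: TargetSets, predicted: TargetSets) -> float:
    """Fraction of source elements whose predicted target set differs from gold."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    wrong = sum(
        1
        for source, targets in gold.items()
        # A missing prediction is treated as an empty target set.
        if predicted.get(source, frozenset()) != targets
    )
    return wrong / len(gold)


# Hypothetical data: two source elements, one of them with two targets.
gold = {
    "S.0012_IG.12_I.16": frozenset({"S.0143_IG.67_I.13"}),
    "S.0012_IG.12_I.17": frozenset({"S.0143_IG.67_I.14", "S.0143_IG.67_I.15"}),
}
predicted = {
    "S.0012_IG.12_I.16": frozenset({"S.0143_IG.67_I.13"}),
    "S.0012_IG.12_I.17": frozenset({"S.0143_IG.67_I.14"}),  # one target missing
}

print(zero_one_loss(gold, predicted))  # one of two elements is wrong: 0.5
```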

FORMAT

All results must be provided in the same format. The appropriate format is shown in the examples:

As the example shows, each data element is identified by its CDISC ODM OIDs, joined by underscores in the form "StudyOID_ItemGroupOID_ItemOID":

(Abfahrt Unfallstelle)-[:equivalent]->(Beginn Transport)
(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)-[:equivalent]->(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)
(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)
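A result line in this notation can be read with a short regular expression. The following is a minimal sketch, not the validator's own parser: the names `Mapping`, `parse_mapping` and `split_oid` are hypothetical, and it assumes that the individual OIDs contain neither underscores nor parentheses.

```python
import re
from typing import NamedTuple, Tuple


class Mapping(NamedTuple):
    source: str
    relation: str
    target: str


# Matches lines of the form (SOURCE)-[:RELATION]->(TARGET).
LINE_PATTERN = re.compile(
    r"^\((?P<source>[^()]+)\)-\[:(?P<relation>[^\]]+)\]->\((?P<target>[^()]+)\)$"
)


def parse_mapping(line: str) -> Mapping:
    """Parse one result line into source, relation and target."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid mapping line: {line!r}")
    return Mapping(match["source"], match["relation"], match["target"])


def split_oid(element: str) -> Tuple[str, str, str]:
    """Split 'StudyOID_ItemGroupOID_ItemOID' into its three parts."""
    study, item_group, item = element.split("_")
    return study, item_group, item


mapping = parse_mapping("(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)")
print(mapping.relation, split_oid(mapping.source), split_oid(mapping.target))
```

Raising `ValueError` on malformed lines makes formatting problems visible before any results are uploaded.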

Example

Example without errors

Example with errors

Data

All metadata sets are in German; please note that no instance data will be provided. All training datasets will be at least partly annotated with UMLS CUIs; the test dataset will not be coded. The datasets are available in a variety of formats (CDISC ODM, FHIR, ADL...).

There will be different use cases with a varying number of training datasets. Considerable effort goes into building each use case from one Routine Clinical Dataset (RCD) and several Research Datasets (RD). Ground truth will be available after registration. Source and target each correspond to a form. However, in the MDM portal these forms can be divided into different ODM files. The StudyIDs (e.g. S.0021- 2.0026 or S.0010) uniquely identify the forms. A collection of all training data records and corresponding ground truth is available for download here.

Training data set: Emergency Admission

Test data set: Emergency Protocol

data form provider

Results

On-site results

The ranking will be presented in September 2018 at the challenge event.

Pre-event evaluation (coming soon)

Organizers

This challenge is organized by

Ann-Kristin Kock-Schoppenhauer, IT Center for Clinical Research and Institute of Medical Informatics, Universität zu Lübeck, Germany

Dr. Philipp Bruland, Institute of Medical Informatics, University of Münster, Germany

Dennis Kadioglu, Medical Informatics Group, Universitätsklinikum Frankfurt, Germany

Contributors

Hannes Ulrich, IT Center for Clinical Research and Institute of Medical Informatics, Universität zu Lübeck, Germany

Dominik Brammen, Universitätsklinikum Magdeburg A.ö.R.

Kerstin Kulbe, Institute of Medical Informatics, Universität zu Lübeck, Germany

Legal Notice (Impressum)

Institut für Medizinische Informatik
Ratzeburger Allee 160
23538 Lübeck
Tel.: +49 451 3101-5601
Fax: +49 451 3101-5604

References for the images and graphics used

Graph representing the metadata of thousands of archive documents, documenting the social network of hundreds of League of Nations personals. Grandjean, Martin (2014). "La connaissance est un réseau". Les Cahiers du Numérique 10 (3): 37-54. DOI:10.3166/LCN.10.3.37-54.