Mappathon 2018
Metadata mapping challenge

Important Notices

  1. Results, abstracts and the organizers' comments are now online.
  2. The test set for the Mappathon is now available.
  3. Six teams have registered for the Mappathon so far.
  4. The Mappathon workshop will take place on Tuesday, 4 September, from 8:30 to 10:00 a.m.
  5. The Mappathon Validator is now available!
  6. The mapping ground truth is available after registration!
  7. Registration is now open: register@mappathon.de
  8. Training data is available.
  9. Mappathon 2018 is associated with the GMDS Jahrestagung in Osnabrück, which takes place from 2 to 4 September 2018.

Overview

Welcome to Mappathon 2018, a metadata mapping challenge at the GMDS Jahrestagung 2018.

Scientific challenges allow a direct comparison of different approaches and are held regularly in the field of medical image processing. This workshop aims to transfer the challenge principle to the eHealth research community. Mappathon is a metadata mapping challenge that asks for methods to find corresponding data elements within similar datasets.

Mappathon is a metadata mapping challenge that aims to find corresponding data elements within a set of similar medical records or datasets and to relate these data elements to one another. For the challenge, datasets from routine documentation and clinical research are provided by the Portal of Medical Data Models. Training datasets will be curated and made available for download in different formats, such as FHIR Questionnaires or CDISC ODM (Operational Data Model). The suitable mappings, determined manually by an expert committee, and the evaluation matrix, based on FHIR ConceptMap relations, are published transparently.

How it works

Participants are invited to download the training set, which includes the datasets and eCRFs as well as the corresponding expert mapping of related data elements. This allows participants to validate and optimize their algorithms and methods. Any automatic method that predicts the valid mapping is of great interest. There are no restrictions regarding new, innovative or unpublished methods, and no limits on including external information such as terminologies. Participants are encouraged to use coding systems and terminology servers.

During the training phase, the organizers provide an automated evaluation service for checking results. During the workshop, a set of test cases will be released; participants will be asked to run their algorithms on these cases and upload their mapping results.

To complete their participation, teams need to submit a short abstract describing the applied method. Each team will be asked to give a brief presentation of its approach at the workshop. The organizers will then evaluate each case and establish a ranking of the participating teams. All results will be presented at the workshop and discussed with invited experts and all workshop attendees.

Why participate in a challenge?

Varying datasets and their heterogeneity make it nearly impossible to compare different approaches fairly. By publicly providing a high-quality dataset together with predefined evaluation rules, this challenge aims to overcome these limitations and to create a common framework for a transparent and adequate comparison of results.

Clinical incentive

To enable secondary use of routine medical data, a shared understanding of the available information is needed. In common practice, this understanding is achieved through metadata and their interconnections. Metadata can be stored in so-called metadata repositories (MDRs). The functions of an MDR include pure storage, administration and more specific metadata functions such as matching and mapping. Matching is the discovery of related or equivalent metadata; mapping describes the relationship between data elements, for example in the form of conversion rules. These rules are difficult to determine and often require manual effort. Therefore, there is a great need for advanced data analysis techniques that support the definition of matchings and mappings.
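As a rough illustration of automated matching, the sketch below ranks candidate target data elements for a source element by the character-level similarity of their labels, using Python's standard difflib module. This is a naive baseline chosen for illustration, not a method prescribed by the challenge; the function name rank_candidates and the label "Abfahrt Unfallort" are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def rank_candidates(source_label: str, target_labels: List[str]) -> List[Tuple[str, float]]:
    """Rank target labels by string similarity to a source label (hypothetical baseline).

    Returns (target_label, ratio) pairs, most similar first; ratio lies in [0, 1].
    """
    source = source_label.casefold()
    scored = [
        (target, SequenceMatcher(None, source, target.casefold()).ratio())
        for target in target_labels
    ]
    # sorted() is stable even with reverse=True, so equally similar targets keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, ratio in rank_candidates("Abfahrt Unfallstelle", ["Beginn Transport", "Abfahrt Unfallort"]):
        print(f"{ratio:.2f}  {label}")
```

String similarity only finds lexical overlap. The format example on this page pairs "Abfahrt Unfallstelle" with "Beginn Transport" as equivalent, a semantic correspondence that a purely lexical score ranks low, which is one reason why terminologies and coding systems are welcome in the challenge.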

Dates

June: Registration opens; distribution of training data
July: Distribution of the evaluation matrix and suitable mappings
31 August: Deadline for submitting test data results
September: Presentation of methods and results at the workshop

Participate

To register, please send an e-mail to register@mappathon.de. The e-mail should contain the name of the team and the names and e-mail addresses of all team members.

Participate in Mappathon

Data distribution will be handled by the Medical Data Models Portal, which explains how to download the data in different formats. Alternatively, we have prepared all forms as PDF files, all training data records in CDISC ODM format and all corresponding ground truth for download (the password will be sent after registration; all data are provided without warranty).

Download all Mappathon data (status: 29 August 2018)
Download all Mappathon test data (status: 3 September 2018)

Rules

What is required to participate?

Each team wishing to participate in the Mappathon challenge is required to:

  1. Register at the data distribution and evaluation platform.
  2. Submit an abstract describing their method and their results on the training dataset.
  3. Upload the test data results before the deadline.
  4. Register for the associated GMDS Jahrestagung 2018.
  5. Be present at the workshop with at least one team member.

Which methods are called for?

Any automatic method that predicts the valid mapping is of great interest. There are no restrictions regarding new, innovative or unpublished methods, and no limits on including external information such as terminologies. Participants are encouraged to use coding systems and any terminology servers.

Abstract

Mappathon requires the participating teams to submit an abstract, which will be reviewed by the organizers. The abstract will be published on the challenge results web page.

Requirements


Review


Submission

MAPPING

The Mappathon challenge comprises two tasks, which are evaluated independently.

  1. The first task is a multilabel classification: each source data element is assigned to one or more target data elements.
  2. The second task is a multiclass classification: the relation between a source data element and each of its (possibly multiple) target data elements is classified into one of several relation classes.
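The difference between the two tasks can be made concrete with a small sketch. Assuming the mappings are available as (source, relation, target) triples, task 1 only asks which target elements belong to each source element, while task 2 asks for the relation class of each source/target pair. The helper names and all OIDs except the pair from the format example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (source element ID, relation, target element ID)


def to_multilabel(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Task 1: the set of target elements mapped from each source element."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, _relation, target in triples:
        targets[source].add(target)
    return dict(targets)


def to_multiclass(triples: Iterable[Triple]) -> Dict[Tuple[str, str], str]:
    """Task 2: the relation class for each (source, target) pair."""
    return {(source, target): relation for source, relation, target in triples}


if __name__ == "__main__":
    triples = [
        ("S.0012_IG.12_I.16", "equivalent", "S.0143_IG.67_I.13"),
        ("S.0012_IG.12_I.16", "wider", "S.0143_IG.67_I.14"),  # hypothetical second target
    ]
    print(to_multilabel(triples))
    print(to_multiclass(triples))
```

Both helpers iterate over their input once, so pass a list rather than a generator if the same triples feed both tasks.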

Equivalence Classes

The classification used for evaluation in Mappathon is based on the FHIR v3.0.1 ConceptMap resource, which uses the ConceptMapEquivalence value set. The number of classes has been significantly reduced, and the classes are applied at the data element level rather than at the concept level. "Mappings are one way - from the source to the destination. In many cases, the reverse mappings are valid, but this cannot be assumed to be the case." [FHIR v3.0.1 ConceptMap]
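Because mappings are one-way, a result store can key each relation by the ordered (source, target) pair and never infer the reverse entry. The sketch below also rejects relation names that are not codes of the FHIR STU3 ConceptMapEquivalence value set. Mappathon uses a reduced subset of these codes that is not listed here, so checking against the full FHIR set is a deliberate simplification; the helper name add_mapping is hypothetical.

```python
from typing import Dict, Tuple

# Codes of the FHIR STU3 (v3.0.1) ConceptMapEquivalence value set.
# Mappathon uses a reduced subset; the full set is only a coarse sanity check.
FHIR_EQUIVALENCE_CODES = frozenset({
    "relatedto", "equivalent", "equal", "wider", "subsumes",
    "narrower", "specializes", "inexact", "unmatched", "disjoint",
})

Mappings = Dict[Tuple[str, str], str]


def add_mapping(mappings: Mappings, source: str, relation: str, target: str) -> None:
    """Record a one-way mapping from source to target (hypothetical helper)."""
    if relation not in FHIR_EQUIVALENCE_CODES:
        raise ValueError(f"not a ConceptMapEquivalence code: {relation!r}")
    # Only the (source, target) direction is stored; the reverse mapping
    # is not assumed to hold and must be recorded separately if it does.
    mappings[(source, target)] = relation
```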


PRIVACY AND DATA COPYRIGHT

By registering, each team agrees to use the provided data only within the scope of the workshop, not to pass it on to third parties and not to use it for other publications. After the workshop, the data will be released under a research license.

Validator

Mapping results can be evaluated with the online Mappathon Validator. Mappathon provides a REST interface for uploading mapping results at https://validate.mappathon.de, as well as an additional client available on the ITCR GitHub.

The Mappathon Validator calculates the zero-one classification loss and the Mappathon Score, the latter according to the evaluation matrix.

Mappathon Score: Evaluation Matrix

We have established a lookup table (E) to evaluate the equivalence classes. For example, if a mapping is classified as "wider" but the gold standard defines it as "equivalent", it receives a Mappathon Score of 0.6.
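A minimal sketch of how such a lookup table could be applied. Only the entry from the example above (predicted "wider", gold "equivalent", score 0.6) comes from this page; the full matrix is not reproduced here, so the sketch assumes that a correct class scores 1.0, that pairs missing from the excerpt score 0.0, and that the overall score is the mean over all gold-standard pairs. The Mappathon Validator may aggregate differently, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (source element ID, target element ID)

# Excerpt of the evaluation matrix E, keyed by (predicted class, gold class).
# Only this entry is taken from the example; everything else below is assumed.
E: Dict[Tuple[str, str], float] = {
    ("wider", "equivalent"): 0.6,
}


def pair_score(predicted: Optional[str], gold: str) -> float:
    """Score one mapping with the lookup table E."""
    if predicted == gold:
        return 1.0  # assumption: a correct class earns the full score
    return E.get((predicted, gold), 0.0)  # assumption: unlisted pairs score 0


def mappathon_score(predicted: Dict[Pair, str], gold: Dict[Pair, str]) -> float:
    """Mean pair score over all gold pairs (assumed aggregation); 0.0 if gold is empty."""
    if not gold:
        return 0.0
    return mean(pair_score(predicted.get(pair), relation) for pair, relation in gold.items())
```

A pair that is missing from the submission is looked up as (None, gold class) and therefore scores 0.0 under these assumptions.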

Zero-one Classification Loss
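The zero-one classification loss counts every wrong prediction as 1 and every correct one as 0. A common normalized form, sketched below, is the fraction of misclassified items. For the multilabel task, an item here is a source data element and its label is the whole set of target elements, so a partially correct set still counts as an error. Whether the Mappathon Validator normalizes the loss this way, and how it treats elements missing from a submission, are assumptions of this sketch.

```python
from typing import Dict, Hashable


def zero_one_loss(predicted: Dict[Hashable, object], gold: Dict[Hashable, object]) -> float:
    """Fraction of gold items whose predicted label differs from the gold label.

    Labels are compared with ==, so they may be relation names (task 2) or
    sets of target element IDs (task 1). Items missing from the prediction
    count as errors (assumption).
    """
    if not gold:
        return 0.0
    errors = sum(1 for item, label in gold.items() if predicted.get(item) != label)
    return errors / len(gold)
```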

FORMAT

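All results must be provided in the same format, as shown in the examples below.

Each data element is identified by its CDISC ODM OIDs, joined by underscores in the form "StudyOID_ItemGroupOID_ItemOID". The examples show a mapping between item names (German: "Abfahrt Unfallstelle", departure from the accident scene; "Beginn Transport", start of transport), the general OID pattern, and a mapping between concrete OIDs: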

(Abfahrt Unfallstelle)-[:equivalent]->(Beginn Transport)
(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)-[:equivalent]->(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)
(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)
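A submission can be checked locally before upload by parsing each line. The sketch below assumes one mapping per line and that the OIDs themselves contain no underscores (true for the example OIDs, but not guaranteed by CDISC ODM); the names parse_line, ElementId and Mapping are hypothetical and not part of the Mappathon Validator.

```python
import re
from typing import NamedTuple


class ElementId(NamedTuple):
    study_oid: str
    item_group_oid: str
    item_oid: str


class Mapping(NamedTuple):
    source: ElementId
    relation: str
    target: ElementId


# (source)-[:relation]->(target), e.g. (S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)
LINE_PATTERN = re.compile(r"^\((?P<source>[^()]+)\)-\[:(?P<relation>[a-z]+)\]->\((?P<target>[^()]+)\)$")


def parse_element(text: str) -> ElementId:
    """Split "StudyOID_ItemGroupOID_ItemOID"; assumes no underscores inside the OIDs."""
    parts = text.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected StudyOID_ItemGroupOID_ItemOID, got {text!r}")
    return ElementId(*parts)


def parse_line(line: str) -> Mapping:
    """Parse one mapping line in the Mappathon result format."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"malformed mapping line: {line!r}")
    return Mapping(parse_element(match["source"]), match["relation"], parse_element(match["target"]))
```

The name-based example "(Abfahrt Unfallstelle)-[:equivalent]->(Beginn Transport)" is rejected by this sketch, since results are identified by OIDs rather than item names.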

Example

Example without errors

Example with errors

Data

All metadata sets are in German; please note that no instance data will be provided. All training datasets are at least partly annotated with UMLS CUIs; the test datasets contain no such coding. The datasets are available in various formats (CDISC ODM, FHIR, ADL, ...).

There will be different use cases with varying numbers of training datasets. Considerable effort has gone into building each use case from one Routine Clinical Dataset (RCD) and several Research Datasets (RD). Ground truth is available after registration. Source and target each correspond to a form; however, in the MDM portal these forms may be split across several ODM files. The StudyIDs (e.g. S.0021- 2.0026 or S.0010) uniquely identify the forms. A collection of all training data records and the corresponding ground truth is available for download in the Participate section.
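Since a form may be spread over several ODM files, it can help to collect the element identifiers from each file and merge them. A minimal sketch with Python's standard ElementTree, assuming the files use the CDISC ODM 1.3 namespace and the usual Study/MetaDataVersion/ItemGroupDef/ItemRef structure; the function name element_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the files declare the CDISC ODM 1.3 default namespace.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}


def element_ids(odm_xml: str) -> List[str]:
    """Build StudyOID_ItemGroupOID_ItemOID identifiers for every ItemRef in an ODM document."""
    root = ET.fromstring(odm_xml)
    ids: List[str] = []
    for study in root.findall("odm:Study", NS):
        study_oid = study.get("OID")
        for group in study.findall("odm:MetaDataVersion/odm:ItemGroupDef", NS):
            group_oid = group.get("OID")
            for ref in group.findall("odm:ItemRef", NS):
                ids.append(f"{study_oid}_{group_oid}_{ref.get('ItemOID')}")
    return ids
```

For a form split across several files, the lists returned for each file can simply be concatenated; each identifier carries its StudyOID prefix.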

Training data set: Emergency Admission

Test data set: Emergency Protocol

Results

At the 63rd GMDS Annual Meeting in Osnabrück, the working group "Use of electronic patient files for clinical research" organized the first Mappathon. After registering in June, six teams developed algorithms and solutions on the basis of the training data provided. The teams were able to submit their results online for review on an ongoing basis. The validation was based on carefully developed manual mappings created in cooperation with clinical partners.

Finally, the participants were faced with the task of applying the algorithms and solutions developed during this preparatory phase to the test data sets released during the conference. In a workshop, the teams then had the opportunity to present and discuss their results and, in particular, their methods. On the basis of an evaluation matrix, which had also been published at the beginning, the working group's management selected the team whose results most closely matched the gold standard. In addition, the workshop participants had the opportunity to vote on which of the presented solutions they considered the most innovative and original.

Thus, at the closing event of this year's GMDS conference, two prizes of 125 € each were awarded. The prize for the best mapping went to the team "MDRcupid", consisting of Noemi Deppenwiese and Hannes Ulrich from the University of Lübeck, and the prize for the most innovative solution to the team "Marvelous Mappers", consisting of Michael Storck, Philipp Neuhaus and Stefan Hegselmann from the University of Münster. We take this opportunity to congratulate both teams once again. Above all, however, we would like to thank everyone who took part, whether as a team during the preparation and final phase or as a participant in the workshop.

We are planning another Mappathon in 2019 and are already looking forward to the interesting solutions and exciting discussions.

On-site results

The ranking as presented in September 2018 at the challenge event.

Test Datasets 1-3

Organizers

This challenge is organized by

Ann-Kristin Kock-Schoppenhauer, IT Center for Clinical Research and Institute of Medical Informatics, Universität zu Lübeck, Germany

Dr. Philipp Bruland, Institute of Medical Informatics, University of Münster, Germany

Dennis Kadioglu, Medical Informatics Group, Universitätsklinikum Frankfurt, Germany

Contributors

Hannes Ulrich, IT Center for Clinical Research and Institute of Medical Informatics, Universität zu Lübeck, Germany

Dominik Brammen, Universitätsklinikum Magdeburg A.ö.R.

Kerstin Kulbe, Institute of Medical Informatics, Universität zu Lübeck, Germany

Imprint

Institut für Medizinische Informatik
Ratzeburger Allee 160
23538 Lübeck
Tel.: +49 451 3101-5601
Fax: +49 451 3101-5604
