Welcome to Mappathon 2018, a metadata mapping challenge at the GMDS Jahrestagung 2018.
Scientific challenges allow different approaches to be compared directly and take place regularly in the field of medical image processing. This workshop aims to adapt the principle of a challenge to the eHealth research community. Mappathon is a metadata mapping challenge that asks for methods to find corresponding data elements within similar datasets.
The aim is to find corresponding data elements within a set of (similar) medical records/datasets and to relate these data elements to each other. For our challenge, datasets of routine documentation and clinical research are provided by the Portal of Medical Data Models. Training datasets will be curated and made available for download in different formats such as FHIR questionnaires or CDISC ODM (Operational Data Model). Suitable mappings, manually determined by an expert committee, and the evaluation matrix, based on FHIR ConceptMap relations, are published transparently.
Participants are invited to download the training set, including datasets and eCRFs as well as the corresponding expert mapping of related data elements. This will allow them to validate and optimize their algorithms and methods. Any automatic method that predicts the valid mapping is of great interest. There is no restriction on new, innovative or unpublished methods and no limitation on including external information such as terminologies. Participants are invited to use coding systems and terminology servers.
During the training phase, the organizers will provide an automated evaluation service for checking results. During the workshop, a set of test cases will be released; participants will be asked to run their algorithms on these cases and upload their mapping results.
To participate successfully, each team needs to submit a short abstract describing the applied method. Each team will be asked to give a brief presentation detailing its approach at the workshop. The organizers will then evaluate each case and establish a ranking of the participating teams. All results will be presented during the workshop and discussed with invited experts and all workshop attendees.
Varying datasets and the related heterogeneity make it nearly impossible to compare different approaches in a fair way. By providing a high-quality dataset publicly as well as pre-defined evaluation rules, this challenge aims to overcome these limitations and to create a common framework for the comprehensible and adequate comparison of results.
To enable secondary use of medical routine data, it is necessary to create a common understanding of the given information. In common practice, this understanding is achieved through metadata and its interconnections. Metadata can be stored in so-called metadata repositories (MDR). The functionalities of an MDR include pure storage, administration and other specific metadata functionalities such as matching and mapping. Matching is the discovery of related or equivalent metadata; mapping is the relationship between data elements, such as conversion rules. These rules are difficult to determine and often require manual effort. Therefore, there is a great need for advanced data analysis techniques that support the definition of matchings and mappings.
|June||Registration opens and distribution of training data|
|July||Distribution of the evaluation matrix and suitable mappings|
|31st August||Deadline for submitting test data results|
|September||Presentation of methods and results at the workshop|
To register, please send an e-mail to email@example.com. The e-mail should contain the name of the team and the names and e-mail addresses of all team members.
Data distribution will be handled by the Medical Data Models Portal. There you will find explanations on how to download the data in different formats. Alternatively, we have prepared all forms as PDF, all training data records in CDISC ODM format and all corresponding ground truth for download (the password will be sent after registration; all data without warranty).
Each team wishing to participate in the Mappathon challenge is required to:
Any automatic method that predicts the valid mapping is of great interest. There is no restriction on new, innovative or unpublished methods and no limitations on including external information like terminologies. Participants are invited to use coding systems and any terminology servers.
Mappathon requires the participating teams to submit an abstract, which will be reviewed by the organizers. The text will be uploaded to the challenge result web-page.
The Mappathon Challenge provides two tasks, which will be independently evaluated.
The Evaluation Classification of Mappathon is based on the FHIR v3.0.1 ConceptMap, which uses the ConceptMapEquivalence value set. The number of classes has been significantly reduced, and the classes are applied at the data element level rather than the concept level. "Mappings are one way - from the source to the destination. In many cases, the reverse mappings are valid, but this cannot be assumed to be the case." [FHIR v3.0.1 ConceptMap]
By registering, each team agrees to use the provided data only within the scope of the workshop and neither to pass it on to a third party nor to use it for other publications. After the workshop has taken place, the data will be released under a research license.
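The one-way nature of mappings can be illustrated with a small lookup table keyed by the ordered pair (source, target). This is only a sketch: the class `MappingTable` and the element identifiers are hypothetical, and because the reduced Mappathon class set is not listed here, the code assumes the full ConceptMapEquivalence codes from FHIR v3.0.1.

```python
from typing import Dict, Optional, Tuple

# ConceptMapEquivalence codes as defined in FHIR v3.0.1 (STU3).
# Assumption: Mappathon uses a reduced subset that is not listed in this text.
FHIR_STU3_EQUIVALENCE = {
    "relatedto", "equivalent", "equal", "wider", "subsumes",
    "narrower", "specializes", "inexact", "unmatched", "disjoint",
}


class MappingTable:
    """Hypothetical store for directed data element mappings."""

    def __init__(self) -> None:
        self._relations: Dict[Tuple[str, str], str] = {}

    def add(self, source: str, target: str, relation: str) -> None:
        if relation not in FHIR_STU3_EQUIVALENCE:
            raise ValueError(f"unknown equivalence code: {relation!r}")
        # The key is an ordered pair, so (target, source) is a different entry.
        self._relations[(source, target)] = relation

    def lookup(self, source: str, target: str) -> Optional[str]:
        return self._relations.get((source, target))


table = MappingTable()
table.add("S.0012_IG.12_I.16", "S.0143_IG.67_I.13", "equivalent")

forward = table.lookup("S.0012_IG.12_I.16", "S.0143_IG.67_I.13")
reverse = table.lookup("S.0143_IG.67_I.13", "S.0012_IG.12_I.16")
print(forward, reverse)
```

The reverse lookup returns `None` because the reverse mapping was never stated explicitly, which mirrors the quoted FHIR rule.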
The mapping results can be evaluated using the Online Mappathon Validator. Mappathon provides a REST interface for uploading mapping results at https://validate.mappathon.de, and an additional client is available on the ITCR GitHub. Example:
The Mappathon Validator calculates the Zero-one Classification Loss and the Mappathon Score according to the evaluation matrix.
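As a rough illustration of the first metric, the zero-one classification loss is the fraction of cases in which the predicted class differs from the expected class. The sketch below is not the validator's implementation: the function name, the dictionary layout and all identifiers other than the first pair are assumptions, and it assumes that a missing prediction counts as an error and that predictions for pairs outside the ground truth are ignored.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (source element, target element)


def zero_one_loss(expected: Dict[Pair, str], predicted: Dict[Pair, str]) -> float:
    """Fraction of expected pairs whose predicted relation is wrong or missing."""
    if not expected:
        raise ValueError("ground truth must not be empty")
    errors = sum(
        1 for pair, relation in expected.items()
        if predicted.get(pair) != relation
    )
    return errors / len(expected)


# Hypothetical ground truth and prediction for four element pairs.
ground_truth = {
    ("S.0012_IG.12_I.16", "S.0143_IG.67_I.13"): "equivalent",
    ("S.0012_IG.12_I.17", "S.0143_IG.67_I.14"): "equivalent",
    ("S.0012_IG.12_I.18", "S.0143_IG.67_I.15"): "narrower",
    ("S.0012_IG.12_I.19", "S.0143_IG.67_I.16"): "wider",
}
prediction = {
    ("S.0012_IG.12_I.16", "S.0143_IG.67_I.13"): "equivalent",
    ("S.0012_IG.12_I.17", "S.0143_IG.67_I.14"): "equivalent",
    ("S.0012_IG.12_I.18", "S.0143_IG.67_I.15"): "equivalent",  # wrong class
    ("S.0012_IG.12_I.19", "S.0143_IG.67_I.16"): "wider",
}

loss = zero_one_loss(ground_truth, prediction)
print(loss)  # one of four relations is wrong
```

The Mappathon Score additionally weights errors according to the evaluation matrix, which is not reproduced here.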
All results must be provided in the same format. The appropriate format is shown in the examples:
Following the given example, each data element is identified by its CDISC ODM OIDs, joined as "StudyOID underscore ItemGroupOID underscore ItemOID":
(Abfahrt Unfallstelle)-[:equivalent]->(Beginn Transport)
(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)-[:equivalent]->(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)
(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)
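The identifier scheme and the arrow notation above can be handled with a few lines of code. The following is a minimal sketch; the helper names `element_id` and `parse_mapping` are hypothetical, and the regular expression assumes that identifiers contain no parentheses and that the relation is a single word.

```python
import re
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str
    relation: str
    target: str


def element_id(study_oid: str, item_group_oid: str, item_oid: str) -> str:
    """Join the three CDISC ODM OIDs with underscores."""
    return "_".join((study_oid, item_group_oid, item_oid))


# Matches lines of the form (source)-[:relation]->(target).
_LINE = re.compile(
    r"^\((?P<source>[^()]+)\)-\[:(?P<relation>\w+)\]->\((?P<target>[^()]+)\)$"
)


def parse_mapping(line: str) -> Mapping:
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid mapping line: {line!r}")
    return Mapping(match["source"], match["relation"], match["target"])


source = element_id("S.0012", "IG.12", "I.16")
mapping = parse_mapping("(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)")
print(source)
print(mapping)
```

Rejecting malformed lines early with a `ValueError` makes it easier to catch formatting problems before a result file is uploaded.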
Example without errors
Example with errors
All metadata sets will be in German. Please note that no instance data will be provided. All training datasets will be at least partly annotated with UMLS CUIs; the test dataset will not be coded. The datasets are available in a variety of formats (CDISC ODM, FHIR, ADL...).
There will be different use cases with a varying number of training datasets. Considerable effort has gone into building the use cases out of one Routine Clinical Dataset (RCD) and several Research Datasets (RD). Ground truth will be available after registration. Source and target each refer to a form. However, in the MDM portal these forms can be divided into different ODM files. The StudyIDs (e.g. S.0021- 2.0026 or S.0010) uniquely identify the forms. A collection of all training data records and corresponding ground truth is available for download here.
At the 63rd GMDS Annual Meeting in Osnabrück, the working group "Use of electronic patient files for clinical research" organized the first Mappathon. After registration in June, six teams developed algorithms and solutions on the basis of the training data provided. The teams were able to submit their results online for review on an ongoing basis. The validation was based on carefully prepared manual mappings created in cooperation with clinical partners.
The participants were then faced with the task of applying the algorithms and solutions developed during this preparatory phase to the test datasets released during the course of the conference. In a workshop, the teams had the opportunity to present and discuss their results and, in particular, their methods. On the basis of an evaluation matrix, which had also been published at the beginning, the working group leadership selected the team whose results most closely corresponded to the gold standard. In addition, the workshop participants had the opportunity to vote on which of the presented solutions they considered the most innovative and original.
Thus, at the closing event of this year's GMDS conference, a total of two prizes of 125 € each could be awarded. The prize for the best mapping went to the team "MDRcupid" consisting of Noemi Deppenwiese and Hannes Ulrich from the University of Lübeck and the prize for the most innovative solution to the team "Marvelous Mappers" consisting of Michael Storck, Philipp Neuhaus and Stefan Hegselmann from the University of Münster. We take this opportunity to congratulate both teams once again. Above all, however, we would like to thank all those who took part - be it as a team during the preparation and final phase, or as participants of the workshop.
We are planning another Mappathon in 2019 and are already looking forward to the interesting solutions and exciting discussions.
The ranking as presented in September 2018 at the challenge event.