Welcome to Mappathon 2018, a metadata mapping challenge at the GMDS Jahrestagung 2018.
Scientific challenges enable a direct comparison of different approaches and take place regularly in the field of medical image processing. This workshop aims to adapt the principle of a challenge to the eHealth research community. Mappathon is a metadata mapping challenge that asks for methods to find corresponding data elements within similar datasets.
Mappathon is a metadata mapping challenge whose aim is to find corresponding data elements within a set of (similar) medical records/datasets and to relate data elements to each other. For our challenge, datasets from routine documentation and clinical research are provided by the Portal of Medical Data Models. Training datasets will be curated and made available for download in different formats such as FHIR Questionnaires or CDISC ODM (Operational Data Model). Suitable mappings, determined manually by an expert committee, as well as the evaluation matrix, which follows the FHIR ConceptMap relations, are published transparently.
Participants are invited to download the training set, including datasets and eCRFs as well as the corresponding expert mapping of related data elements. This will allow them to validate and optimize their algorithms and methods. Any automatic method that predicts the valid mapping is of great interest. There is no restriction to new, innovative, or unpublished methods and no limitation on including external information such as terminologies. Participants are encouraged to use coding systems and terminology servers.
During the training phase, the organizers will provide an automated evaluation service for checking results. During the workshop, a set of test cases will be released on which participants will be asked to run their algorithms and upload their mapping results.
To complete their participation successfully, participants must submit a short abstract describing the applied method. Each team will be asked to give a brief presentation of their approach during the workshop. The organizers will then evaluate each case and establish a ranking of the participating teams. All results will be presented during the workshop and discussed with invited experts and all workshop attendees.
Varying datasets and the related heterogeneity make it nearly impossible to compare different approaches in a fair way. By providing a high-quality dataset publicly as well as pre-defined evaluation rules, this challenge aims to overcome these limitations and to create a common framework for the comprehensible and adequate comparison of results.
To enable secondary use of routine medical data, it is necessary to create a general understanding of the given information. As common practice, this understanding is achieved through metadata and its interconnections. Metadata can be stored in so-called metadata repositories (MDRs). The functionalities of such an MDR include pure storage, administration, and other specific metadata functionalities such as matching and mapping:

- Matching: the discovery of related or equivalent metadata.
- Mapping: the relationship between data elements, including conversion rules.

These rules are difficult to determine and often require manual effort. Therefore, there is a great need for advanced data analysis techniques supporting the definition of matchings and mappings.
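The distinction between a matching and a mapping can be made concrete with a small sketch: a matching only states that two data elements are related, while a mapping additionally carries the conversion rule needed to translate values between them. All element names below are invented for illustration.

```python
# Hypothetical example: two equivalent data elements that store body
# height in different units; the mapping carries the conversion rule.

def height_cm_to_m(value_cm: float) -> float:
    """Conversion rule attached to the mapping."""
    return value_cm / 100.0

mapping = {
    "source": "height_cm",      # hypothetical element in dataset A
    "target": "height_m",       # hypothetical element in dataset B
    "relation": "equivalent",   # the matching part
    "convert": height_cm_to_m,  # the conversion rule (the mapping part)
}

# Applying the mapping to a source value:
converted = mapping["convert"](180.0)  # -> 1.8
```

Determining such conversion rules automatically is exactly the kind of effort that currently requires manual work.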
| Date | Milestone |
|---|---|
| June | Registration opens and distribution of training data |
| July | Distribution of the evaluation matrix and suitable mappings |
| 31 August | Deadline for submitting test data results |
| September | Presentation of methods and results at the workshop |
For registration, please send an e-mail to firstname.lastname@example.org. The e-mail should contain the following information: the name of the team and the names and e-mail addresses of all team members.
Data distribution is handled by the Medical Data Models Portal, where you will find explanations of how to download the data in different formats. Alternatively, we have prepared all forms as PDF, all training data records in CDISC ODM format, and all corresponding ground truth for download (the password will be sent after registration; all data without warranty).
Each team wishing to participate in the Mappathon challenge is required to register, run its method on the released test cases, upload the mapping results, and submit a short abstract describing the applied method.
Mappathon requires the participating teams to submit an abstract, which will be reviewed by the organizers. The text will be uploaded to the challenge results web page.
The Mappathon Challenge provides two tasks, which will be independently evaluated.
The evaluation classification of Mappathon is based on the FHIR v3.0.1 ConceptMap, which uses the ConceptMapEquivalence value set. The number of classes has been significantly reduced, and the classes are applied at the data element level rather than at the concept level. "Mappings are one way - from the source to the destination. In many cases, the reverse mappings are valid, but this cannot be assumed to be the case." [FHIR v3.0.1 ConceptMap]
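For reference, the full ConceptMapEquivalence value set of FHIR v3.0.1 can be sketched as a simple lookup. Note that this sketch lists all ten FHIR codes; the reduced subset actually scored in the challenge is defined by the published evaluation matrix, not here.

```python
# The ten equivalence codes of the FHIR v3.0.1 ConceptMapEquivalence
# value set. Mappathon uses a reduced subset of these; consult the
# published evaluation matrix for the codes that are actually scored.
CONCEPT_MAP_EQUIVALENCE = {
    "relatedto", "equivalent", "equal", "wider", "subsumes",
    "narrower", "specializes", "inexact", "unmatched", "disjoint",
}

def is_known_relation(code: str) -> bool:
    """Check a mapping relation against the FHIR equivalence codes."""
    return code.lower() in CONCEPT_MAP_EQUIVALENCE
```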
By registering, each team agrees to use the provided data only within the scope of the workshop and neither to pass it on to third parties nor to use it for other publications. After the workshop, the data will be released under a research license.
The mapping results can be evaluated using the online Mappathon Validator. Mappathon provides a REST interface for uploading mapping results at https://validate.mappathon.de, and an additional client is available on the ITCR GitHub. Example:
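As a rough sketch of such an upload: the endpoint path, payload layout, and field names below are assumptions for illustration, not the documented API; consult the client on the ITCR GitHub for the exact format.

```python
import json
import urllib.request

VALIDATOR_URL = "https://validate.mappathon.de"  # REST interface named in the text

def build_submission(team, mappings):
    """Serialize (source OID, relation, target OID) triples as JSON.

    The payload layout here is an assumption for illustration only.
    """
    payload = {
        "team": team,
        "mappings": [
            {"source": s, "relation": rel, "target": t}
            for s, rel, t in mappings
        ],
    }
    return json.dumps(payload).encode("utf-8")

def upload(payload):
    """POST the payload to the validator and return the HTTP status code."""
    req = urllib.request.Request(
        VALIDATOR_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```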
The Mappathon Validator calculates the Zero-one Classification Loss and the Mappathon Score according to the evaluation matrix.
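The zero-one classification loss is simple to state: it is the fraction of ground-truth mappings whose predicted class disagrees with the expert mapping. A minimal sketch follows; the dictionary layout is an assumption, and the Mappathon Score weighting from the evaluation matrix is not reproduced here.

```python
def zero_one_loss(predicted, truth):
    """Fraction of ground-truth pairs whose predicted relation is wrong.

    Both arguments map (source OID, target OID) pairs to relation codes;
    a missing prediction counts as an error in this sketch.
    """
    errors = sum(
        1 for pair, relation in truth.items()
        if predicted.get(pair) != relation
    )
    return errors / len(truth)
```

For example, a submission that classifies one of two ground-truth pairs correctly receives a loss of 0.5, and a perfect submission receives 0.0.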
All results must be provided in the same format. The appropriate format is shown in the examples:
Following the given example, each single data element is identified by the concatenated CDISC ODM OIDs "StudyOID_ItemGroupOID_ItemOID":
(Abfahrt Unfallstelle)-[:equivalent]->(Beginn Transport)
(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)-[:equivalent]->(Study/@OID_ItemGroupDef/@OID_ItemDef/@OID)
(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)
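A result line in this notation can be split into its three parts with a small helper. This is a sketch of the pattern shown above; the official validator client on the ITCR GitHub is authoritative for the accepted syntax.

```python
import re

# One mapping per line: (source)-[:relation]->(target)
MAPPING_RE = re.compile(
    r"^\((?P<source>[^)]+)\)-\[:(?P<relation>\w+)\]->\((?P<target>[^)]+)\)$"
)

def parse_mapping(line):
    """Split one result line into (source, relation, target)."""
    match = MAPPING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid mapping line: {line!r}")
    return match.group("source"), match.group("relation"), match.group("target")
```

For instance, `parse_mapping("(S.0012_IG.12_I.16)-[:equivalent]->(S.0143_IG.67_I.13)")` yields the triple `("S.0012_IG.12_I.16", "equivalent", "S.0143_IG.67_I.13")`.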
Example without errors
Example with errors
All metadata sets will be in German, and please note that no instance data will be provided. All training datasets will be at least partly annotated with UMLS CUIs; the test dataset will not be coded. The datasets are available in various formats (CDISC ODM, FHIR, ADL...).
There will be different use cases with a varying number of training datasets. Considerable effort has been put into building each use case from one routine clinical dataset (RCD) and several research datasets (RD). Ground truth will be available after registration. Source and target each correspond to a form. However, in the MDM portal these forms can be divided into different ODM files. The StudyOIDs (e.g. S.0021-S.0026 or S.0010) uniquely identify the forms. A collection of all training data records and the corresponding ground truth is available for download here.
The ranking will be presented in September 2018 at the challenge event.