A Course on Reproducible Research, Data Pipelines and Scientific Computing (CORDS)


The ongoing digital transformation and the push towards open science set new challenges for data processing and reproducibility. Virtually every project in the natural sciences is based on primary data that are processed with numerical, statistical, and/or heuristic models before they yield interpretable results. Keeping track of this data pipeline is crucial for reproducibility, yet the continuous increase in the complexity of datasets and models pushes manual handling of the pipeline to its limits. Our project aims to address these issues by providing a course for WSL on the use of modern software engineering tools and high-performance computing to automate the data pipeline and, ultimately, to facilitate reproducible and open science. The proposed course, which targets the development of a fully automated data and simulation pipeline, will also cover high-performance physics-based models and spatial information processing.