Three days of DataLad-Workshop & Hackathon in Aachen

In everyday research, a lot of mostly heterogeneous data is generated, which is often processed and analyzed collaboratively. This involves complex workflows and ML pipelines consisting of numerous transformation and analysis steps. Coordinating these between the parties involved, keeping track of the versions used, and at the same time securing all the information necessary for later reproducibility and reusability is a task that requires specialized tools and standardized processes.

One such tool is DataLad, a free, distributed, open-source data management system based on Git and git-annex. With DataLad, you can version large numbers of files, including very large ones, organize data as datasets (with additional structure via sub-datasets), download content on demand to your work computer, document provenance, and work collaboratively, from the command line or a GUI and across platforms.
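The idea behind versioning very large files and fetching their content on demand can be illustrated with a small, self-contained sketch. It is a toy model loosely inspired by how git-annex tracks file content by checksum, not DataLad's or git-annex's actual implementation; the function names, the key format, and the `store` directory are all hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def annex_key(content: bytes) -> str:
    """Derive a content-addressed key from a file's bytes (hypothetical key format)."""
    return "SHA256-" + hashlib.sha256(content).hexdigest()


def add_file(path: Path, store: Path) -> str:
    """Copy the file's content into the store and leave a short pointer behind."""
    content = path.read_bytes()
    key = annex_key(content)
    (store / key).write_bytes(content)
    path.write_text(key)  # the file under version control is now only a pointer
    return key


def get_file(path: Path, store: Path) -> bytes:
    """Fetch the real content on demand by following the pointer, then verify it."""
    key = path.read_text()
    content = (store / key).read_bytes()
    if annex_key(content) != key:
        raise ValueError("content does not match its key")
    return content


def demo() -> tuple[int, bool]:
    """Store a 1 MB file; return the pointer size and whether the content round-trips."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store = root / "store"  # stands in for a remote storage location
        store.mkdir()
        scan = root / "scan.bin"  # hypothetical large file, e.g. a 3D scan
        original = b"\x00" * 1_000_000
        scan.write_bytes(original)
        add_file(scan, store)
        pointer_size = scan.stat().st_size
        return pointer_size, get_file(scan, store) == original


if __name__ == "__main__":
    size, intact = demo()
    print(f"pointer size: {size} bytes, content intact: {intact}")
```

In this sketch, the history only ever needs to record the short pointer, while the bulky content lives in a separate store and is retrieved and checksum-verified when someone actually needs it.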

All of this matters for making research data traceable, reusable, and well structured without having to resort to expensive, proprietary software. For the humanities and cultural sciences, this means, for example, that image collections, 3D models, measurement data, and metadata can be cleanly versioned, shared, and maintained over the long term.

To learn more about DataLad, research data management (RDM) experts from DKZ.2R, WiNoDa, and the NFDI consortia NFDI4ING and NFDI4Objects met at the IT Center of RWTH Aachen University from June 30 to July 2, 2025, and tested data versioning with DataLad in a practical setting. The aim was to train multipliers who would then introduce DataLad to their teams; prior knowledge of Git was helpful but not strictly required. To keep the workshop and the subsequent hackathon as application-oriented as possible, all participants brought prepared use cases and their own datasets.

A look into the still empty event room. On the wall, you can see the slide with the logos of the participating organizations.
The air of anticipation in the still empty venue. (c) Fabian Riebschläger

The workshop was led by Michael Hanke, Adina Wagner, Stephan Heunis, and Michał Szczepanik from the Institute of Neuroscience and Medicine, Brain and Behavior (INM-7) at Forschungszentrum Jülich and the Institute of Systems Neuroscience at the Faculty of Medicine at Heinrich Heine University Düsseldorf.

The first day offered a compact introduction: first Git, then the central concepts of DataLad and its possible applications in research data management. There was also an introduction to Forgejo(-Aneksajo), a kind of self-hostable GitHub that can be used to manage DataLad datasets. In addition, the participants presented their use cases. Day 2 started with a hands-on introduction to using DataLad and consolidated the theory covered so far; the rest of the day was devoted to the hackathon, in which the teams worked on their own use cases. On the last day, the groups presented their results and discussed application scenarios.

Just as important as the technology was the exchange: lots of conversations during the breaks, joint problem solving – and in the evenings we sat down together for dinner. The catering on site was excellent and was sponsored by DKZ.2R.

In addition to the actual program, the tour of the IT Center's AiXCAVE, including hands-on testing, was a special highlight. The AiXCAVE is a five-sided, immersive VR environment that was installed in 2012 and is used for research, visualization, and interactive exploration. It was exciting to see, for example, how a complex 3D simulation of Aachen Cathedral can be discussed collaboratively and at its original size.

Three people (plus one hidden) wearing VR glasses are standing in a cube made of screens. It looks as if they are floating under the ceiling of Aachen Cathedral.
Members of WiNoDa float in the AiXCAVE under the virtual roof of Aachen Cathedral. (c) Asta v. Schröder

A little theory, a lot of practice, and teamwork: the mix of introduction and hackathon showed how DataLad makes decentralized collaboration easier and more reproducible – from the initial folder structure to the joint publication of a dataset. Those who want to not only store data but also manage it sustainably will find a tool here with extensive documentation and an active and helpful community.

DataLad: https://www.datalad.org/
AiXCAVE: https://www.itc.rwth-aachen.de/cms/it-center/forschung-projekte/forschungsschwerpunkte/virtuelle-realitaet/~fgqa/aixcave/?lidx=1

Unless otherwise stated, all content is published under CC BY 4.0. Suggested citation:
Riebschläger, Fabian. (2025). Three days of DataLad-Workshop & Hackathon in Aachen. WiNoDa Knowledge Lab. https://winoda.de/en/2025/09/25/three-days-of-datalad-workshop-hackathon-in-aachen/ (Accessed on October 1, 2025 at 09:42)