Research Cluster

As part of the DiTraRe work programme, there are four Research Clusters, each based on a scientific use case that raises concrete questions.

Research Cluster

Protected Data Spaces

Various categories of research data are subject to legal restrictions, including data protection laws, personal rights, and copyrights. Ethical restrictions on data sharing may also apply, such as the geolocation of sensitive sites or politically or socially sensitive content.

Nevertheless, there is a legitimate research interest in accessing such data. At the LSC, we assess which categories of data can be reused for research purposes and propose legal, ethical, and technical solutions, considering different levels of data sensitivity.

Our work includes investigating methods for pseudonymisation and anonymisation, as well as techniques for linking sensitive data with non-critical data in distributed knowledge organisation systems. Additionally, we examine researchers’ awareness of security and privacy risks and the potential consequences of handling sensitive data.

USE CASE
Sensitive data in sports science

The MO|RE data platform makes physical fitness data derived from sports science studies available to both academia and the public. Research would benefit greatly from linking health data with physical fitness data, e.g. in longitudinal data sets. However, publishing sensitive health data (e.g. BMI, blood pressure) and other personal data (e.g. geolocation, social status) is challenging. What is lacking is an overarching concept for the secure handling of sensitive data that ensures compliance with legal regulations, ranging from a trustworthy IT environment to sophisticated access management and auditing mechanisms.

Research Cluster

Smart Data Acquisition

This Research Cluster investigates innovative technical and societal methods and quality criteria for data acquisition, as well as partially automated procedures for documenting, analysing and interpreting data, thereby helping to accelerate research processes.

It assesses associated opportunities and risks, including legal challenges related to IP protection.

The Chemotion Electronic Lab Notebook (ELN) will serve as a testbed to investigate the efficiency of data acquisition and analysis as well as the establishment of trust and accountability.

USE CASE
Chemotion Electronic Lab Notebook
(KIT-IBCS, Dr. Nicole Jung)

Chemistry labs in academia make limited use of lab automation and device integration. Despite current research data guidelines from funders and positive examples in industry, scientists are reluctant to adopt technologies such as ELNs. Concerns include dependence on software and technologies that are not under the scientists' control, faulty methods for data assignment and data analysis, and a lack of control over the re-use of their data.

Research Cluster

AI-Based Knowledge Realms

Machine learning and AI have the potential to unlock new discoveries and innovations. They can help address the challenges posed by ever-growing volumes of data and offer opportunities to semantically link information that is currently kept separate.

However, there are also associated risks. These include the legal assessment of using synthetic training data for AI systems, limited or biased training data, quality problems in indexing, and a lack of user acceptance because decisions made by AI systems cannot be verified.

This is particularly relevant in terms of the social, political and economic implications of AI-based decisions made by models that are difficult to explain or understand.

USE CASE
Artificial Intelligence in Biomedical Engineering
(KIT-IBT, Dr. Axel Loewe)

KIT-IBT develops computer models of the human heart to predict cardiovascular diseases earlier and more accurately, using software engineering, algorithmics, numerics, signal processing, data analysis, and machine learning. We employ AI methods trained on purely synthetic or hybrid (simulated + clinical) datasets to help decipher disease mechanisms. Simulated data are often essential to overcome data privacy issues and the bias present in most available datasets, but they raise questions about the explainability of AI decisions and trust in them.

Research Cluster

Publication Cultures

New publication formats beyond the classic peer-reviewed article are gaining importance. Data publications ensure the transparency and reproducibility of scientific findings and form the basis for further research, thus saving resources. As a quality assurance measure, it is essential to publish the software used to generate or interpret data along with the data. Both data publications and software must be recognised as first-class scholarly outputs.

Existing publishing infrastructures are not yet optimally designed to accommodate data and software. The dynamically evolving legal framework calls for a comprehensive examination of European and national data laws and policies, their implications for these emerging formats, and researchers' willingness to share data, algorithms and software.

The transition to Open Science must be accompanied by a suitable communication strategy to help prevent misinterpretation of research results. This strategy must take into account new communication formats and stakeholders such as science communicators or decision-makers to improve the exchange between science and society.

USE CASE
Publication of large datasets

KIT-IMK generates and analyses extensive datasets from chemistry-climate simulations and satellite data for atmospheric observation. However, the size of the data makes the current approach to publication inefficient. Re-use is restricted because there are no effective methods for exploring such datasets and evaluating their relevance to other research questions. Selecting subsets for re-use or peer review is currently not possible.
