Data Science Challenges
Data science challenges are competitive events where individuals or teams work on real-world problems using data analysis and machine learning techniques. Below are challenges I participated in together with a team:
Watt's up? Synthetic Data for Buildings
by TU Vienna, Austria
22 – 23 February 2025
3rd Place / 8 Teams – 80€ Prize
Learn more about the challenge
Overview
The task was to develop a model that takes time-course electricity consumption
data as input and outputs a synthetic version of the dataset that preserves the
privacy of the individuals. This matters because electricity consumption
data are sensitive and cannot be easily shared. However, widespread availability
of such datasets could greatly benefit model development and yield insights
that help save energy.
Approach
We chose an approach that samples pieces from the actual data, reassembles them,
and adds Gaussian noise on top. This produced instances that closely resembled
the real data in terms of statistical properties but were different enough to
ensure they cannot be linked to individual examples of the real data.
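The sample, reassemble, and add-noise idea described above can be sketched in a few lines of Python. This is only an illustration, not our actual competition code: the function name `synthesize_series`, the parameters `piece_len` and `noise_std`, and the clipping of negative values to zero are all assumptions made for this example.

```python
import random


def synthesize_series(real_series, length, piece_len=24, noise_std=0.05, seed=None):
    """Build one synthetic series from random pieces of real series plus Gaussian noise.

    real_series: list of real consumption series (each a list of floats,
                 each with at least piece_len points).
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    synthetic = []
    while len(synthetic) < length:
        source = rng.choice(real_series)  # pick a random real series
        start = rng.randrange(0, len(source) - piece_len + 1)  # pick a random window
        synthetic.extend(source[start:start + piece_len])  # append that piece
    synthetic = synthetic[:length]
    # Add zero-mean Gaussian noise. Clipping at zero is an assumption made here
    # because electricity consumption cannot be negative.
    return [max(0.0, value + rng.gauss(0.0, noise_std)) for value in synthetic]


# Toy data standing in for real measurements (made up for illustration).
toy_real = [[float(i % 24) for i in range(48)], [1.0] * 48]
toy_synthetic = synthesize_series(toy_real, length=60, piece_len=12, noise_std=0.1, seed=0)
print(len(toy_synthetic))
```

The piece length and the noise level control the trade-off: longer pieces and less noise keep the statistics closer to the real data, while shorter pieces and more noise make it harder to link a synthetic series back to a real one.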
Autoimmune Disease Machine Learning Challenge
by Eric and Wendy Schmidt Center at the Broad Institute, USA
Part 1: October 2024 – February 2025
Part 2: November 2024 – March 2025
Part 3: December 2024 – April 2025
Part 1: 4th Place / 140 Teams – $900 Prize
Part 2: still ongoing
Part 3: still ongoing
Learn more about the challenge
Overview
Autoimmune diseases occur when the immune system mistakenly attacks healthy
cells, disrupting the body's natural defense mechanisms. These diseases affect
50 million people in the U.S., with cases rising globally.
One of the most prevalent autoimmune diseases is inflammatory bowel disease
(IBD). To better understand and treat IBD, researchers sequence the RNA
(transcriptome) of individual affected cells. However, this process is highly
complex and expensive, making a predictive model for transcriptomics a valuable
tool in advancing research.
This challenge consisted of three independent sub-challenges, each addressing a
distinct aspect of this broader research problem:
- The task of Part 1 was to create a model that predicts this expensive and hard-to-obtain transcriptomic information from readily available pathology images.
- The task of Part 2 was to predict the expression of all genes of a cell from a (predicted) subset of genes.
- The task of Part 3 was to rank all genes by their ability to distinguish between normal cells and cells showing dysplasia.
Approach
For Part 1, we used the ResNet50 model implemented in PyTorch to map directly
from the input images to the 460 genes (end-to-end deep learning approach).
For Part 2, we fed the predictions of our trained ResNet50 model into a standard neural network that mapped from the 460 predicted genes to all genes of the cell.
For Part 3, we submitted many different approaches. Notable examples include (i) computing the AUC value for every feature and sorting the features by it, and (ii) training a neural network to predict which cells show dysplasia and extracting feature importances from it with Integrated Gradients, a feature importance algorithm for neural networks, as implemented in the Captum library.
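Approach (i), the per-feature AUC ranking, can be illustrated with a small standard-library sketch. It is not our submitted code: the gene names and expression values are made up, the AUC is computed through the rank-sum (Mann-Whitney U) formulation, and ranking by distance from 0.5 (so that genes lower in dysplasia count as equally discriminative) is an assumption of this example.

```python
def auc_score(values, labels):
    """AUC of a single feature via the rank-sum (Mann-Whitney U) formulation.

    values: feature value per cell; labels: 1 for dysplasia, 0 for normal.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over all cells tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by tied values
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def rank_genes(expression, labels):
    """Sort gene names from most to least discriminative (assumed: |AUC - 0.5|)."""
    return sorted(
        expression,
        key=lambda gene: abs(auc_score(expression[gene], labels) - 0.5),
        reverse=True,
    )


# Hypothetical toy data: three genes measured in six cells.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expression = {
    "GENE_A": [1, 2, 3, 4, 5, 6],  # perfectly separates the two groups
    "GENE_B": [2, 5, 1, 4, 3, 6],  # weakly higher in dysplasia
    "GENE_C": [5, 6, 3, 4, 2, 1],  # mostly lower in dysplasia
}
print(rank_genes(toy_expression, toy_labels))
```

A univariate ranking like this is fast and easy to interpret, but it ignores interactions between genes, which is one reason a model-based importance method such as Integrated Gradients is a useful complement.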
LEC Data Challenge 2023
by LEC GmbH, Austria
July – August 2023
1st Place / 46 Teams – 4,000€ Prize
Learn more about the challenge | See the winners on LinkedIn
Overview
The task was to develop a machine learning solution to identify
cylinder-specific events in time-series data from large combustion engines.
Approach
We trained an XGBoost model that predicted, for each
time point, which type of issue was present, if any. As input, we used slices
of the time course (nearest values in time). Because issues always covered
larger continuous time windows in the training data, we afterward
algorithmically identified regions in the XGBoost predictions where many
issues were detected and unified them into a continuous stretch with one
issue type.
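The post-processing step described above can be sketched as follows. This is a simplified illustration rather than our exact procedure: the function name `unify_regions`, the parameters `max_gap` and `min_len`, and the majority vote used to pick the issue type are assumptions for this example, and the per-time-point predictions are represented as a plain list of integer labels with 0 meaning "no issue".

```python
from collections import Counter


def unify_regions(preds, max_gap=2, min_len=3, no_issue=0):
    """Merge nearby issue predictions into continuous stretches with one label each.

    preds: per-time-point class predictions (e.g. from a classifier).
    max_gap, min_len: hypothetical tuning parameters for this sketch.
    """
    issue_idx = [i for i, p in enumerate(preds) if p != no_issue]
    smoothed = [no_issue] * len(preds)
    if not issue_idx:
        return smoothed
    # Group flagged time points separated by at most max_gap unflagged points.
    regions = [[issue_idx[0], issue_idx[0]]]
    for i in issue_idx[1:]:
        if i - regions[-1][1] - 1 <= max_gap:
            regions[-1][1] = i  # extend the current region
        else:
            regions.append([i, i])  # start a new region
    for start, end in regions:
        flagged = [p for p in preds[start:end + 1] if p != no_issue]
        if len(flagged) < min_len:
            continue  # too few detections: treat as noise and leave as no issue
        label = Counter(flagged).most_common(1)[0][0]  # majority issue type
        smoothed[start:end + 1] = [label] * (end - start + 1)
    return smoothed


raw = [0, 0, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, 2, 0, 0]
print(unify_regions(raw))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```

In the toy example, the scattered detections between positions 2 and 7 become one continuous stretch of issue type 1, while the isolated detection at position 12 is discarded as noise.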
Trigger Detection Challenge 2023
by Bauhaus-University Weimar, Germany
April β May 2023
Learn more about the challenge
Overview
A multi-label classification challenge to assign trigger warning
labels (such as violence and blood) to fanfiction documents based on their
content.
As I was still a beginner, I was honestly a bit overwhelmed and left the
challenge before it ended. However, I learned a lot about the workflow for
tackling such tasks.