student projects
List of available projects
If you are an ETH student in CS, EE or Statistics (math can also be arranged) and interested in doing a project or thesis, please fill out this form and email both Fanny and the project advisor. The lab has more opportunities than the projects listed here: if you are interested in the general direction of trustworthy ML or causal inference (empirical or theoretical) and have an excellent mathematical or coding background, feel free to contact us.
Anytime-valid Inference Using Foundation Models
Leverage predictions from foundation models to improve the efficiency of sequential randomized experiments.
The Hybrid Augmented Inverse Probability Weighting (H-AIPW) estimator has demonstrated efficiency gains in randomized experiments by incorporating predictions from foundation models. However, its current form requires a fixed sample size to be determined before the experiment begins. Since the efficiency gains depend on the accuracy of the model, which is unknown in advance, it is not clear how to determine a “sufficient” sample size a priori.
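For readers unfamiliar with the building block, here is a minimal sketch of the standard AIPW estimator in a randomized experiment with known treatment probability. It is not the hybrid variant, whose details are not reproduced here. The simulated data, the function names, and the "good" and "biased" predictors are all hypothetical stand-ins for foundation model predictions.

```python
import random
import statistics


def aipw_estimate(y, a, mu1, mu0, pi=0.5):
    """Standard AIPW estimate of the average treatment effect.

    y: observed outcomes, a: 0/1 treatment indicators,
    mu1 / mu0: predicted outcomes under treatment / control (from any model),
    pi: known treatment probability of the randomized experiment.
    Returns (point estimate, estimated standard error).
    """
    scores = []
    for yi, ai, m1, m0 in zip(y, a, mu1, mu0):
        correction = ai * (yi - m1) / pi - (1 - ai) * (yi - m0) / (1 - pi)
        scores.append(m1 - m0 + correction)
    estimate = statistics.fmean(scores)
    std_error = statistics.stdev(scores) / len(scores) ** 0.5
    return estimate, std_error


def simulate(n=2000, seed=0):
    """Hypothetical data: true effect is 1.0, outcome depends on covariate x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    a = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [2.0 * xi + 1.0 * ai + rng.gauss(0, 1) for xi, ai in zip(x, a)]
    return x, a, y


x, a, y = simulate()
zeros = [0.0] * len(y)

# With all-zero predictions, AIPW reduces to inverse probability weighting.
ipw_est, ipw_se = aipw_estimate(y, a, zeros, zeros)

# A (hypothetical) accurate predictor of the outcome under each arm.
good_mu1 = [2.0 * xi + 1.0 for xi in x]
good_mu0 = [2.0 * xi for xi in x]
aipw_est, aipw_se = aipw_estimate(y, a, good_mu1, good_mu0)

# A biased predictor: the estimator still targets the same effect,
# only its variance changes.
bad_mu1 = [m + 3.0 for m in good_mu1]
bad_est, bad_se = aipw_estimate(y, a, bad_mu1, good_mu0)

print(f"IPW:            {ipw_est:.3f} +/- {ipw_se:.3f}")
print(f"AIPW (good):    {aipw_est:.3f} +/- {aipw_se:.3f}")
print(f"AIPW (biased):  {bad_est:.3f} +/- {bad_se:.3f}")
```

In this toy setup, accurate predictions shrink the standard error while biased predictions leave the estimate centered on the true effect. The size of the gain depends on prediction accuracy, which is why a fixed sample size is hard to choose in advance.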
This project aims to extend the H-AIPW framework to the sequential setting. Leveraging recent theoretical advances in sequential inference, we propose to develop a robust methodology that guarantees valid statistical inference at any stopping time, regardless of the bias in the predictions from the foundation models.
Key objectives include:
- Develop a sequential version of the H-AIPW estimator and establish theoretical guarantees that allow the construction of anytime-valid confidence intervals
- Validate the proposed methodology across several experimental settings, assessing both the efficiency improvements and the robustness of inferential guarantees
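To illustrate what "anytime-valid" means, the sketch below builds a deliberately crude confidence sequence for the mean of bounded i.i.d. observations using Hoeffding's inequality and a union bound over time. This is a textbook construction, not the methodology the project aims to develop, and it is far more conservative than modern approaches. The function name and the simulated stream are assumptions made for illustration.

```python
import math
import random


def hoeffding_confidence_sequence(observations, lower, upper, alpha=0.05):
    """Yield (t, lo, hi) intervals that hold simultaneously over all t.

    Assumes i.i.d. observations bounded in [lower, upper]. At time t the
    error budget is alpha / (t * (t + 1)); these budgets sum to alpha over
    t = 1, 2, ..., so a union bound gives coverage at any stopping time.
    """
    total = 0.0
    for t, value in enumerate(observations, start=1):
        total += value
        mean = total / t
        alpha_t = alpha / (t * (t + 1))
        radius = (upper - lower) * math.sqrt(math.log(2.0 / alpha_t) / (2.0 * t))
        yield t, mean - radius, mean + radius


# Hypothetical stream of binary outcomes with true mean 0.3.
rng = random.Random(0)
true_mean = 0.3
stream = [1.0 if rng.random() < true_mean else 0.0 for _ in range(5000)]

intervals = list(hoeffding_confidence_sequence(stream, 0.0, 1.0))
covered_everywhere = all(lo <= true_mean <= hi for _, lo, hi in intervals)
early_width = intervals[99][2] - intervals[99][1]
final_width = intervals[-1][2] - intervals[-1][1]
```

Because the intervals hold at every time step at once, an experimenter may inspect them after each observation and stop whenever the interval is narrow enough, without inflating the error rate.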
Prerequisites: Strong background in mathematics and statistics
Efficient Clinical Trials
The project will focus on developing machine learning models to improve the efficiency of clinical trials.
The student will conduct their thesis at Harvard University using real clinical data.
The Hybrid Augmented Inverse Probability Weighting (H-AIPW) estimator has demonstrated efficiency gains in randomized experiments by incorporating predictions from foundation models. However, these gains have only been demonstrated in the context of political and social science experiments, limiting the generalizability of the findings to other domains.
This project proposes applying the H-AIPW framework within the medical domain. We aim to demonstrate substantial efficiency improvements by utilizing predictive machine learning models trained on extensive observational healthcare data (e.g. Medicare data).
Key objectives include:
- Developing predictive machine learning models using large-scale observational healthcare data
- Evaluating efficiency gains on an extensive collection of clinical trials using the H-AIPW estimator and related frameworks
Key Skills & Qualifications:
- Proficiency in Python and deep learning frameworks (e.g. PyTorch)
- Strong foundation in machine learning and statistics
- Interest in medical AI and clinical data applications
- Ability to collaborate in a multidisciplinary team
Efficient Randomized Experiments Using Foundation Models
Forecasting vegetation health for food security
The project will focus on developing machine learning models to forecast vegetation status across scales, as a step towards a food security early warning system for East Africa.
The student will conduct their thesis at the Max Planck Institute for Biogeochemistry in Jena, Germany, using satellite imagery and climate data.
Frequent droughts threaten the livelihoods of pastoral communities in the Horn of Africa. Early warning systems (EWS) can provide crucial information to enable anticipatory action and improve food security. However, most EWS rely on weather variables, whereas for grazing cattle the vegetation condition on the ground is more important.
This project proposes to train machine learning methods on satellite imagery of East Africa to forecast vegetation health. We aim to improve on previous work by studying the benefits of increasing the spatial resolution of the satellite data and of adopting a probabilistic view.
Key objectives include:
- Developing probabilistic machine learning models using large-scale Earth observation data
- Evaluating accuracy gains from predicting vegetation health status at higher spatial resolution
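As one illustration of the probabilistic view, quantile forecasts are commonly scored with the pinball loss. The project description does not specify an evaluation metric, so the sketch below is only an assumed example, and the vegetation index values are made up.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average pinball (quantile) loss of forecasts for one quantile level."""
    total = 0.0
    for target, forecast in zip(y_true, y_pred):
        error = target - forecast
        # Under-prediction is weighted by `quantile`,
        # over-prediction by `1 - quantile`.
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


# Hypothetical vegetation index observations and 90% quantile forecasts.
observed = [0.42, 0.35, 0.51, 0.28]
forecast_q90 = [0.50, 0.45, 0.55, 0.40]
score = pinball_loss(observed, forecast_q90, quantile=0.9)
```

Lower values indicate better calibrated quantile forecasts. Averaging the loss over several quantile levels gives a summary of the whole predictive distribution.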
Key Skills & Qualifications:
- Proficiency in Python and deep learning frameworks (e.g. PyTorch)
- Strong foundation in machine learning and statistics
- Interest in AI for Earth and climate science
- Ability to collaborate in a multidisciplinary team
Examples of previous student projects
Tight bounds for maximum l1-margin classifiers
Stefan Stojanovic with Konstantin Donhauser and Fanny Yang. ALT 2024. [paper]
Certified private data release for sparse Lipschitz functions
Johan Lokna and Robert Hoenig with Konstantin Donhauser, Amartya Sanyal, March Boedihardjo, and Fanny Yang. AISTATS 2024. [paper]
Can semi-supervised learning use all the data effectively? A lower bound perspective
Gizem Yüce with Alexandru Ţifrea, Amartya Sanyal, and Fanny Yang. NeurIPS 2023, Spotlight 🏅. [paper]
Strong inductive biases provably prevent harmless interpolation
Marco Milanta with Michael Aerni, Konstantin Donhauser, and Fanny Yang. ICLR 2023. [paper]
Why adversarial training can hurt robust accuracy
Jacob Clarysse with Julia Hörrmann and Fanny Yang. ICLR 2023. [paper]
How unfair is private learning?
Yaxi Hu with Amartya Sanyal and Fanny Yang. UAI 2022, Oral 🏅. [paper]