Student projects

List of available projects

If you are an ETH student in CS, EE or Statistics (math can also be arranged) and interested in doing a project or thesis, please fill out this form and email both Fanny and the project advisor. There are more opportunities in the lab than the listed projects: if you are interested in the general direction of trustworthy ML or causal inference (both empirically and theoretically) and have an excellent mathematical or coding background, feel free to contact us.

Beyond Accuracy on the Line: Evaluating Out-of-Distribution Generalization in Machine Learning

Develop novel ways to evaluate machine learning methods out-of-distribution that reflect their true generalization capabilities.

The accuracy of machine learning methods often drops when they are trained on one domain and deployed on another, a failure that has been observed repeatedly in empirical studies. It is less clear, however, which actions can mitigate this failure, if any. One intuitive approach, distributionally robust optimization (DRO), aims to find a model that performs well on all test distributions “close” to the training distribution in some probability metric. However, this approach tends to produce overly pessimistic models that are robust against “unrealistic” distribution shifts. Instead, a number of methods have been proposed that aim to identify and exclusively use stable (“causal”) relationships in the data. The corresponding theory states that such methods are guaranteed to outperform standard empirical risk minimization (ERM) under worst-case distribution shifts.

When put to the test empirically, these findings do not seem to hold up well: on real-world datasets, causality- (or invariance-) based generalization methods are very often outperformed by ERM, and seem to generalize worse both in- and out-of-distribution (OOD) [Nastl & Hardt, 2024; Salaudeen et al., 2025]. This is consistent with the “accuracy-on-the-line” hypothesis, which postulates that the ranking of models is often preserved across distributions. Recent work argues that this mismatch between theoretical and empirical findings is an “artifact” of misspecified OOD datasets, which do not contain sufficiently adversarial shifts.
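To make the contrast concrete, here is a minimal toy sketch (not any specific method from the literature) comparing the ERM objective with a simple group-DRO objective on per-sample losses from two hypothetical domains; the domain names and loss values are invented for illustration. ERM averages over all samples, so the large, easy domain dominates; group DRO evaluates the model by its worst per-group average loss.

```python
import numpy as np

# Toy per-sample losses from two made-up domains: a large "easy" domain
# and a small "hard" domain (values are purely illustrative).
rng = np.random.default_rng(0)
losses_domain_a = rng.normal(loc=0.3, scale=0.05, size=100)  # easy, common domain
losses_domain_b = rng.normal(loc=0.9, scale=0.05, size=20)   # hard, rare domain

def erm_objective(groups):
    # ERM: mean loss over all pooled samples, so large groups dominate.
    return np.concatenate(groups).mean()

def group_dro_objective(groups):
    # Group DRO: worst average loss over groups, i.e. the model is judged
    # by the hardest domain regardless of its sample count.
    return max(g.mean() for g in groups)

groups = [losses_domain_a, losses_domain_b]
print(f"ERM objective:       {erm_objective(groups):.3f}")
print(f"Group-DRO objective: {group_dro_objective(groups):.3f}")
```

A model selected by the ERM objective can look good here while performing poorly on the rare domain, which is exactly the gap that worst-case (DRO-style) evaluation is meant to expose.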

The goal is to resolve the mismatch between theoretical and empirical findings in multiple ways:

  • by verifying whether invariance-based OOD methods rank better if the distribution shift is constructed to be worst-case;
  • by constructing benchmarks with varying strength and complexity of the distribution shift which could help evaluate a “spectrum” of OOD generalization of models;
  • by providing a theoretical justification of recent empirical findings through analysis of the mismatch between benchmarks and assumptions.

Goals of the project:

  • Create novel evaluation schemes for OOD machine learning methods;
  • Construct novel benchmarks which more accurately measure OOD generalization;
  • Test a large variety of large-scale models via the novel schemes and benchmarks.

Key Skills & Qualifications:

  • Strong background in machine learning, familiarity with reading and understanding research papers.
  • Solid Python and PyTorch skills, plus basic experience running experiments on GPUs.
  • Background in statistics, basic machine learning theory, training and evaluation of machine learning models.
  • Interest in out-of-distribution generalization, design of safe and robust models, and causality.

Contact: Julia and Jun

Examples of previous student projects

Tight bounds for maximum l1-margin classifiers

Stefan Stojanovic with Konstantin Donhauser and Fanny Yang. ALT 2024. [paper]

Certified private data release for sparse Lipschitz functions

Johan Lokna and Robert Hoenig with Konstantin Donhauser, Amartya Sanyal, March Boedihardjo, and Fanny Yang. AISTATS 2024. [paper]

Can semi-supervised learning use all the data effectively? A lower bound perspective

Gizem Yüce with Alexandru Ţifrea, Amartya Sanyal, and Fanny Yang. NeurIPS 2023, Spotlight 🏅. [paper]

Strong inductive biases provably prevent harmless interpolation

Marco Milanta with Michael Aerni, Konstantin Donhauser, and Fanny Yang. ICLR 2023. [paper]

Why adversarial training can hurt robust accuracy

Jacob Clarysse with Julia Hörrmann and Fanny Yang. ICLR 2023. [paper]

How unfair is private learning?

Yaxi Hu with Amartya Sanyal and Fanny Yang. UAI 2022, Oral 🏅. [paper]