Student projects

List of available projects

If you are an ETH student in CS, EE, or Statistics (math can also be arranged) and are interested in doing a project or thesis, please fill out this form and email both Fanny and the project advisor. The lab offers more opportunities than the projects listed below: if you are interested in the general direction of trustworthy ML or causal inference (both empirical and theoretical) and have an excellent mathematical or coding background, feel free to contact us.

Performance-Aware Optimization for Unlearning in Large Language Models

The project will study optimization-based unlearning schemes that explicitly control downstream performance, aiming to stay close to an ideal retrained baseline.

A common approach to unlearning in large language models is to directly update the weights of a pretrained model. Standard optimization-based schemes typically apply gradient ascent (or related updates) on a “forget” set to increase its loss, and add a regularization term—often a KL divergence—to keep the updated model close to the original parameters.
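As a minimal sketch of the standard scheme described above (all function and variable names here are illustrative, not a prescribed implementation): gradient ascent on the forget set is implemented by negating the cross-entropy loss, and a KL penalty keeps the updated model's output distribution close to a frozen copy of the original model.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_batch, kl_weight=1.0):
    """Toy KL-regularized gradient-ascent unlearning objective.

    Minimizing this loss *increases* the cross-entropy on the forget
    batch (the negated term) while a KL penalty keeps the updated
    model's predictive distribution close to the frozen reference.
    """
    inputs, targets = forget_batch
    logits = model(inputs)
    with torch.no_grad():  # the reference model stays frozen
        ref_logits = ref_model(inputs)
    # Ascent term: negate the usual cross-entropy on the forget set.
    ascent = -F.cross_entropy(logits, targets)
    # KL penalty between reference and updated output distributions
    # (PyTorch's kl_div treats the second argument as the target).
    kl = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return ascent + kl_weight * kl
```

In a real LLM setting the same structure applies token-wise to next-token distributions; the toy classifier form above only illustrates the shape of the objective.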

These methods can suppress the targeted content, but often at the cost of substantial utility loss on downstream tasks. The natural gold standard is the retrained model: a model trained from scratch only on the retain data (the original training data minus the forget set). In practice, however, most work focuses on tuning regularization strength and form, without explicitly asking whether the resulting unlearned model remains useful on relevant tasks or approximates the behaviour of this retrained baseline.

Problem statement: The project will investigate optimization-based unlearning from a performance-aware perspective. Instead of regularizing only towards the original model, the goal is to design unlearning objectives that also control performance on data drawn from the retain distribution, for example through validation-based penalties or constraints defined on a retain/validation set. There is also scope for theory in simplified settings (e.g., convex models or small classifiers) to understand which regularizers or constrained formulations yield solutions that are both “unlearned” and close to the retrained solution.
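One hypothetical instance of such a performance-aware objective (the hinge formulation and all names below are illustrative assumptions, not a method the project prescribes): keep the gradient-ascent and KL terms, but add a penalty that activates only when the loss on a retain/validation batch exceeds a utility budget, so the model is pushed away from the forget set without being allowed to degrade beyond a tolerated level on retained data.

```python
import torch
import torch.nn.functional as F

def performance_aware_loss(model, ref_model, forget_batch, retain_batch,
                           kl_weight=1.0, retain_weight=1.0,
                           utility_budget=0.0):
    """Illustrative performance-aware unlearning objective: gradient
    ascent on the forget set, a KL anchor to the frozen reference
    model, and a hinge penalty on retain/validation loss that only
    fires once the loss exceeds `utility_budget`."""
    fx, fy = forget_batch
    rx, ry = retain_batch
    # Ascent term on the forget set.
    ascent = -F.cross_entropy(model(fx), fy)
    with torch.no_grad():
        ref_logits = ref_model(fx)
    # KL anchor to the original model's predictive distribution.
    kl = F.kl_div(
        F.log_softmax(model(fx), dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    retain_loss = F.cross_entropy(model(rx), ry)
    # Penalize only degradation beyond the budget, not all retain loss;
    # this is a soft-penalty stand-in for a hard validation constraint.
    utility_penalty = F.relu(retain_loss - utility_budget)
    return ascent + kl_weight * kl + retain_weight * utility_penalty
```

The same hinge term could instead be enforced as a hard constraint (e.g., via projection or a Lagrangian), which is one of the design choices the project could explore theoretically in convex settings.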

Goals of the project:

  • Define a rigorous optimization-based unlearning setup on a pretrained open-source LLM, with clearly specified forget and retain/validation sets and metrics for both unlearning quality and retained utility.
  • Develop and implement at least one performance-aware unlearning objective or constraint that explicitly incorporates retain/validation performance, with an optional theoretical analysis in a simplified setting.
  • Empirically compare the proposed schemes against standard KL-regularized unlearning in terms of forgetting, downstream utility, and proximity to a model retrained from scratch on the retain data.

Key Skills & Qualifications:

  • Strong background in machine learning and deep learning, and comfortable reading research papers.
  • Solid Python and PyTorch skills, plus basic experience running experiments on GPUs.
  • Interest in optimization, constrained objectives, and connecting empirical behaviour with simple theoretical models.
  • Able to work fairly independently once the experimental setup is defined, and to organize and interpret results across multiple baselines and configurations.

Beyond Accuracy on the Line: Evaluating Out-of-Distribution Generalization in Machine Learning

Develop novel ways to evaluate machine learning methods out of distribution that reflect their true generalization capabilities.

The accuracy of machine learning methods often drops when they are trained on one domain and deployed on another, a failure that has been observed empirically again and again. It is far less clear which actions, if any, can mitigate it. One intuitive approach, distributionally robust optimization (DRO), seeks a model that performs well on all test distributions “close” to the training distribution in some probability metric. In practice, however, DRO tends to produce overly pessimistic models that are robust against “unrealistic” distribution shifts.

As an alternative, a number of methods have been proposed that aim to identify and exclusively use stable (“causal”) relationships in the data. The corresponding theory guarantees that such methods outperform standard empirical risk minimization (ERM) under worst-case distribution shifts. When put to the test empirically, these guarantees do not seem to hold up: on real-world datasets, causality- (or invariance-) based generalization methods are very often outperformed by ERM, and appear to generalize worse both in- and out-of-distribution (OOD) [Nastl & Hardt, 2024; Salaudeen et al., 2025]. This is consistent with the “accuracy-on-the-line” hypothesis, which postulates that the ranking of models is often preserved across distributions. Recent work argues that this mismatch between theoretical and empirical findings is an “artifact” of misspecified OOD datasets, which do not contain sufficiently adversarial shifts.
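To make the DRO idea above concrete, here is a minimal sketch of one common finite-sample instantiation, a worst-group objective over pre-defined groups or environments (the function name and the grouping setup are illustrative assumptions): instead of averaging the loss over all training data as ERM does, one minimizes the loss on the hardest group, a surrogate for robustness over a neighbourhood of distributions.

```python
import torch
import torch.nn.functional as F

def worst_group_loss(model, group_batches):
    """Worst-group objective, a simple finite-sample surrogate for DRO.

    ERM would minimize the average loss over all data; here we instead
    minimize the maximum loss over a set of pre-defined groups
    (e.g., environments or domains), which upper-bounds the ERM loss
    and encodes robustness to reweighting across groups.
    """
    group_losses = torch.stack([
        F.cross_entropy(model(x), y) for x, y in group_batches
    ])
    return group_losses.max()
```

The pessimism discussed in the text shows up directly here: as the set of groups (or, more generally, the uncertainty set of distributions) grows, the worst case becomes ever harder, and the minimizer can be driven toward overly conservative models.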

The goal is to resolve the mismatch between theoretical and empirical findings in multiple ways:

  • by verifying whether invariance-based OOD methods rank better if the distribution shift is constructed to be worst-case;
  • by constructing benchmarks with varying strength and complexity of the distribution shift which could help evaluate a “spectrum” of OOD generalization of models;
  • by providing a theoretical justification of recent empirical findings through analysis of the mismatch between benchmarks and assumptions.

Goals of the project:

  • Create novel evaluation schemes for OOD machine learning methods;
  • Construct novel benchmarks which more accurately measure OOD generalization;
  • Test a large variety of large-scale models via the novel schemes and benchmarks.

Key Skills & Qualifications:

  • Strong background in machine learning, familiarity with reading and understanding research papers.
  • Solid Python and PyTorch skills, plus basic experience running experiments on GPUs.
  • Background in statistics, basic machine learning theory, training and evaluation of machine learning models.
  • Interest in out-of-distribution generalization, design of safe and robust models, and causality.

Contact: Julia and Jun

Example of previous student projects

Tight bounds for maximum l1-margin classifiers

Stefan Stojanovic with Konstantin Donhauser and Fanny Yang. ALT 2024. [paper]

Certified private data release for sparse Lipschitz functions

Johan Lokna and Robert Hoenig with Konstantin Donhauser, Amartya Sanyal, March Boedihardjo, and Fanny Yang. AISTATS 2024. [paper]

Can semi-supervised learning use all the data effectively? A lower bound perspective

Gizem Yüce with Alexandru Ţifrea, Amartya Sanyal, and Fanny Yang. NeurIPS 2023, Spotlight 🏅. [paper]

Strong inductive biases provably prevent harmless interpolation

Marco Milanta with Michael Aerni, Konstantin Donhauser, and Fanny Yang. ICLR 2023. [paper]

Why adversarial training can hurt robust accuracy

Jacob Clarysse with Julia Hörrmann and Fanny Yang. ICLR 2023. [paper]

How unfair is private learning?

Yaxi Hu with Amartya Sanyal and Fanny Yang. UAI 2022, Oral 🏅. [paper]