Student projects

List of available projects

If you are an ETH student in CS, EE or Statistics (math can also be arranged) and interested in doing a project or thesis, please fill out this form and email both Fanny and the project advisor. There are more opportunities in the lab than the projects listed here: if you are interested in the general direction of trustworthy ML or causal inference (both empirical and theoretical) and have an excellent mathematical or coding background, feel free to contact us.

Performance-Aware Optimization for Unlearning in Large Language Models

The project will study optimization-based unlearning schemes that explicitly control downstream performance, aiming to stay close to an ideal retrained baseline.

A common approach to unlearning in large language models is to directly update the weights of a pretrained model. Standard optimization-based schemes typically apply gradient ascent (or related updates) on a “forget” set to increase its loss, and add a regularization term—often a KL divergence—to keep the updated model close to the original parameters.
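As a concrete illustration, below is a minimal PyTorch sketch of such a KL-regularized objective. It assumes a HuggingFace-style causal language model interface (`model(**batch)` returning `.loss` and `.logits`); the function and parameter names are ours, and the exact form of the objective (e.g., which data the KL term is computed on, and its direction) varies across papers.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(model, ref_model, forget_batch, retain_batch, lam=0.1):
    """Sketch of a KL-regularized unlearning objective: gradient ascent
    on the forget set, plus a KL term keeping the updated model close to
    the frozen original model on retain data."""
    # Gradient ascent on the forget set = minimize the negative LM loss.
    loss_forget = -model(**forget_batch).loss

    # KL term: compare next-token distributions against the frozen
    # reference model on a retain batch.
    with torch.no_grad():
        ref_logits = ref_model(**retain_batch).logits
    logits = model(**retain_batch).logits
    kl = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return loss_forget + lam * kl
```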

These methods can suppress the targeted content, but often at the cost of substantial utility loss on downstream tasks. The natural gold standard is the retrained model: a model trained from scratch only on the retain data (the original training data minus the forget set). In practice, however, most work focuses on tuning regularization strength and form, without explicitly asking whether the resulting unlearned model remains useful on relevant tasks or approximates the behaviour of this retrained baseline.

Problem statement: The project will investigate optimization-based unlearning from a performance-aware perspective. Instead of regularizing only towards the original model, the goal is to design unlearning objectives that also control performance on data drawn from the retain distribution, for example through validation-based penalties or constraints defined on a retain/validation set. There is also scope for theory in simplified settings (e.g., convex models or small classifiers) to understand which regularizers or constrained formulations yield solutions that are both “unlearned” and close to the retrained solution.
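One possible performance-aware variant is sketched below, under the same HuggingFace-style interface as above. The hypothetical `val_budget` encodes the maximum acceptable loss on a retain/validation set (e.g., the loss of the original or retrained model plus some slack), and the hinge penalty is a soft version of the corresponding constraint; all names and weightings here are illustrative, not a fixed design.

```python
import torch

def performance_aware_loss(model, forget_batch, val_batch, val_budget, mu=1.0):
    """Hypothetical performance-aware unlearning objective (a sketch):
    gradient ascent on the forget set, plus a hinge penalty that is
    active only when validation loss exceeds the budget `val_budget`."""
    loss_forget = -model(**forget_batch).loss   # push forget-set loss up
    val_loss = model(**val_batch).loss          # utility on retain/validation data
    # Soft version of the constraint  val_loss <= val_budget.
    penalty = torch.relu(val_loss - val_budget)
    return loss_forget + mu * penalty
```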

Goals of the project:

  • Define a rigorous optimization-based unlearning setup on a pretrained open-source LLM, with clearly specified forget and retain/validation sets and metrics for both unlearning quality and retained utility.
  • Develop and implement at least one performance-aware unlearning objective or constraint that explicitly incorporates retain/validation performance, with an optional theoretical analysis in a simplified setting.
  • Empirically compare the proposed schemes against standard KL-regularized unlearning in terms of forgetting, downstream utility, and proximity to a model retrained from scratch on the retain data.

Key Skills & Qualifications:

  • Strong background in machine learning and deep learning, and comfort with reading research papers.
  • Solid Python and PyTorch skills, plus basic experience running experiments on GPUs.
  • Interest in optimization, constrained objectives, and connecting empirical behaviour with simple theoretical models.
  • Able to work fairly independently once the experimental setup is defined, and to organize and interpret results across multiple baselines and configurations.

Beyond Accuracy on the Line: Evaluating Out-of-Distribution Generalization in Machine Learning

Develop novel ways to evaluate machine learning methods out-of-distribution that reflect their true generalization capabilities.

The accuracy of machine learning methods often drops when they are trained on one domain and deployed on another, a failure that has been observed repeatedly in empirical studies. It is far less clear, however, what can be done to mitigate it. One intuitive approach, distributionally robust optimization (DRO), seeks a model that performs well on all test distributions “close” to the training distribution in some probability distance (see the formulation below). However, this approach results in overly pessimistic models which are robust against “unrealistic” distribution shifts.

Instead, a number of methods have been proposed which aim to identify and exclusively use stable (“causal”) relationships in the data. The corresponding theory states that such methods are guaranteed to perform better than standard empirical risk minimization (ERM) under worst-case distribution shifts. When put to the test empirically, these findings do not seem to hold up well: on real-world datasets, causality- (or invariance-) based generalization methods are very often outperformed by ERM, and seem to generalize worse both in- and out-of-distribution (OOD) [Nastl & Hardt, 2024; Salaudeen et al., 2025]. This is consistent with the “accuracy-on-the-line” hypothesis, which postulates that the ranking of models is often preserved across distributions. Recent work has argued that this mismatch between theoretical and empirical findings is an “artifact” of misspecified OOD datasets, which do not contain sufficiently adversarial shifts.
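For concreteness, the DRO objective mentioned above can be written as follows (a standard formulation; the notation is ours):

$$ \min_{\theta}\; \sup_{Q:\, D(Q,\, P_{\mathrm{train}}) \le \rho}\; \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_\theta(x), y)\big], $$

where $P_{\mathrm{train}}$ is the training distribution, $D$ is a probability distance (e.g., an $f$-divergence or a Wasserstein distance), and the radius $\rho$ controls how adversarial the considered shifts may be. The larger $\rho$, the more “unrealistic” distributions the supremum ranges over, which is precisely what makes the resulting models pessimistic.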

The goal is to resolve the mismatch between theoretical and empirical findings in multiple ways:

  • by verifying whether invariance-based OOD methods rank better if the distribution shift is constructed to be worst-case;
  • by constructing benchmarks with varying strength and complexity of the distribution shift which could help evaluate a “spectrum” of OOD generalization of models;
  • by providing a theoretical justification of recent empirical findings through analysis of the mismatch between benchmarks and assumptions.

Goals of the project:

  • Create novel evaluation schemes for OOD machine learning methods;
  • Construct novel benchmarks which more accurately measure OOD generalization;
  • Test a large variety of large-scale models via the novel schemes and benchmarks.

Key Skills & Qualifications:

  • Strong background in machine learning, familiarity with reading and understanding research papers.
  • Solid Python and PyTorch skills, plus basic experience running experiments on GPUs.
  • Background in statistics, basic machine learning theory, training and evaluation of machine learning models.
  • Interest in out-of-distribution generalization, design of safe and robust models, and causality.
Contact: Julia and Jun

Training Budget-Aware Verifiers for Test-Time Scaling of Large Language Models

Improve verifiers for resampling-based test-time scaling of large language models by using loss functions that target specific regions of the ROC curve.

Test-time scaling aims to improve large language model (LLM) performance by spending additional compute during inference rather than training. Most simply, this can be achieved by sampling multiple answers x from an LLM and returning only the correct ones. However, in many domains determining whether an answer is correct (y(x)=1) or incorrect (y(x)=0) is non-trivial. In these cases, it is common to use an imperfect LLM-based verifier (or reward model) f that predicts the correctness of answers [1]. Test-time scaling can then be achieved via best-of-N [2] – sampling N answers and returning the one with the highest verification score f(x) – or rejection sampling – resampling answers x until one of their scores f(x) exceeds a fixed quality threshold t. The larger N or t, the more answers are drawn before one is returned and – usually – the better the quality of the resulting answer distribution. This quality depends on how indicative large verification scores f(x) are of correct answers y(x). More concretely, it can be shown that the expected accuracy of answers produced by rejection sampling and best-of-N is determined by the ROC curve of the verifier f [3]. In particular, the top-right region of the ROC curve (high false and true positive rates) determines performance at small test-time budgets, while performance at large budgets depends on the bottom-left region of the ROC curve (low false and true positive rates).
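To make the two strategies concrete, here is a minimal sketch in Python. `generate` and `verifier` are placeholder callables (not a specific library API), and returning the best-scoring answer when no sample clears the threshold is our own fallback choice.

```python
def best_of_n(generate, verifier, prompt, n=8):
    """Best-of-N (sketch): draw n candidate answers and return the one
    with the highest verifier score f(x)."""
    answers = [generate(prompt) for _ in range(n)]
    scores = [verifier(prompt, a) for a in answers]
    return answers[max(range(n), key=lambda i: scores[i])]

def rejection_sample(generate, verifier, prompt, threshold, max_tries=64):
    """Rejection sampling (sketch): resample answers until the verifier
    score exceeds the fixed threshold t, with a best-so-far fallback."""
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        a = generate(prompt)
        s = verifier(prompt, a)
        if s > threshold:
            return a
        if s > best_score:
            best, best_score = a, s
    return best  # no sample cleared the threshold within the budget
```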

Problem statement: The project’s focus is on investigating how different loss functions for verifier training affect test-time scaling by changing the shape of the ROC curve. In particular, different loss functions can yield incomparable ROC curves that are better in some and worse in other regions [4]. Correspondingly, training verifiers with specialized loss functions that target specific regions of the ROC could significantly improve accuracy for test-time scaling methods at a fixed budget.
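For illustration, two losses of this kind are sketched below: a cost-weighted cross-entropy that shifts emphasis between false positives and false negatives, and a pairwise loss in the spirit of the p-norm push (Rudin, 2009), which concentrates the penalty on the highest-scoring incorrect answers and thus on the low-false-positive-rate region of the ROC curve. The names and weightings are illustrative, not a proposed design.

```python
import torch
import torch.nn.functional as F

def asymmetric_bce(scores, labels, w_pos=1.0, w_neg=4.0):
    """Cost-weighted binary cross-entropy (sketch). Up-weighting negatives
    (incorrect answers) pushes the verifier towards low false positive
    rates, i.e. the bottom-left ROC region relevant at large budgets;
    up-weighting positives emphasizes the top-right region instead."""
    # scores: raw verifier logits; labels: 1.0 = correct, 0.0 = incorrect.
    weights = labels * w_pos + (1.0 - labels) * w_neg
    return F.binary_cross_entropy_with_logits(scores, labels, weight=weights)

def pnorm_push(scores, labels, p=4.0):
    """Pairwise loss in the spirit of the p-norm push (Rudin, 2009):
    large p concentrates the penalty on the highest-scoring incorrect
    answers, again targeting the low-false-positive-rate ROC region."""
    pos, neg = scores[labels > 0.5], scores[labels <= 0.5]
    # softplus(neg - pos) penalizes incorrect answers that outrank correct
    # ones; raising to the power p makes the worst negatives dominate.
    margins = F.softplus(neg[:, None] - pos[None, :])
    return (margins.mean(dim=1) ** p).mean()
```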

Goals of the project:

  • Identify or define loss functions that are well-suited for targeting specific regions of the ROC curve associated with a given target budget for test-time scaling.
  • Perform experiments to confirm the impact of different loss functions on the ROC curves of trained LLM verifiers, as well as test-time scaling performance, both in- and out-of-distribution.
  • Potentially derive novel theoretical guarantees on the relationship between the loss functions’ values and the behaviour of the associated ROC curves.

Key Skills & Qualifications:

  • Strong background in machine learning, familiarity with reading and understanding research papers.
  • Solid Python and PyTorch skills, plus basic experience with running experiments on GPUs.
  • Background or interest in both statistics and language models.
  • Ability to work fairly independently, given a well-defined plan.
Contact: Florian

Examples of previous student projects

Tight bounds for maximum l1-margin classifiers

Stefan Stojanovic with Konstantin Donhauser and Fanny Yang. ALT 2024. [paper]

Certified private data release for sparse Lipschitz functions

Johan Lokna and Robert Hoenig with Konstantin Donhauser, Amartya Sanyal, March Boedihardjo, and Fanny Yang. AISTATS 2024. [paper]

Can semi-supervised learning use all the data effectively? A lower bound perspective

Gizem Yüce with Alexandru Ţifrea, Amartya Sanyal, and Fanny Yang. NeurIPS 2023, Spotlight 🏅. [paper]

Strong inductive biases provably prevent harmless interpolation

Marco Milanta with Michael Aerni, Konstantin Donhauser, and Fanny Yang. ICLR 2023. [paper]

Why adversarial training can hurt robust accuracy

Jacob Clarysse with Julia Hörrmann and Fanny Yang. ICLR 2023. [paper]

How unfair is private learning?

Yaxi Hu with Amartya Sanyal and Fanny Yang. UAI 2022, Oral 🏅. [paper]