Student projects
List of available projects
If you are an ETH student in CS, EE or Statistics (math can also be arranged) and interested in doing a project or thesis, please fill out this form and email both Fanny and the project advisor.
Copyright-aware Weight Merging of ML Models
Develop novel methods for merging ML model weights that prevent copyright infringement while maintaining model performance.
Copyright protection in machine learning is challenging because models are frequently trained on large, uncurated datasets scraped from the internet, making it difficult to ensure compliance with copyright restrictions. A common practice is to train multiple versions of the same model on disjoint data subsets for computational efficiency and then merge them at inference time. This merging step presents a unique opportunity to enforce copyright protection by controlling how the models' weights are combined.
The goal is to optimize the merged model so that it avoids reproducing memorized copyrighted content while still permitting limited access to that data, in line with legal regulations. This project will explore novel weight-merging methods that prevent copyright infringement while maintaining high model performance.
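To make the setting concrete, here is a minimal sketch of plain weight merging, assuming all models share one architecture. Everything named here (`merge_state_dicts`, the shard-training helper in the usage comment, the fixed coefficients) is a hypothetical placeholder, not a proposed method; a copyright-aware approach would choose or learn the merging coefficients rather than fix them by hand.

```python
def merge_state_dicts(state_dicts, alphas):
    """Convex combination of PyTorch state_dicts: theta = sum_k alphas[k] * theta_k.

    A copyright-aware merging method could, for instance, choose the
    coefficients (fixed inputs here) to down-weight models whose training
    shards contain protected content the merged model must not reproduce.
    """
    assert abs(sum(alphas) - 1.0) < 1e-6, "coefficients should sum to 1"
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(a * sd[key] for a, sd in zip(alphas, state_dicts))
    return merged

# Hypothetical usage with models trained on disjoint shards:
# shard_models = [train_on_shard(k) for k in range(3)]   # placeholder helper
# merged = merge_state_dicts([m.state_dict() for m in shard_models],
#                            alphas=[0.5, 0.3, 0.2])
# model.load_state_dict(merged)
```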
The student will be expected to:
- Develop algorithms for copyright-aware weight merging
- Test them on large-scale models
- Evaluate the trade-off between copyright protection and task accuracy
Prerequisites: Background in optimization, language models, and privacy
Efficient Pareto Set Approximation in Multi-Objective Neural Networks
Explore efficient methods for approximating Pareto sets in large neural networks with multiple competing objectives.
Multi-objective optimization problems are common in machine learning, particularly in multi-task learning, where conflicting objectives make it difficult to find a single model that performs well across all tasks. The Pareto set, the set of non-dominated solutions, offers a principled way to manage trade-offs between objectives, but learning it efficiently for large neural networks remains an open problem due to the complexity and non-convexity of the optimization landscape.
This project aims to explore efficient methods for approximating the Pareto set, focusing on scalable solutions applicable to large models such as language models. The expected outcome is a theoretically grounded framework and an algorithm for Pareto front approximation, with experimental validation across multiple tasks.
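One standard baseline the project could start from is linear scalarization: minimize a λ-weighted sum of the task losses and sweep λ over the probability simplex to trace the front. The PyTorch sketch below illustrates the idea under that assumption; `model`, `optimizer`, `loader`, `task_losses`, and `num_tasks` in the usage comment are placeholders.

```python
import torch

def sample_preference(num_tasks):
    """Draw a random preference vector from the uniform Dirichlet over the simplex."""
    return torch.distributions.Dirichlet(torch.ones(num_tasks)).sample()

def scalarized_step(losses, lam, optimizer):
    """One optimization step on the linear scalarization sum_i lam[i] * L_i.

    Sweeping `lam` over the simplex traces an approximation to the Pareto
    front, with the known caveat that linear scalarization cannot reach
    non-convex regions of the front.
    """
    total = sum(w * l for w, l in zip(lam, losses))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

# Hypothetical usage: sample a fresh preference each step, so one network is
# trained across many trade-offs (a crude stand-in for preference-conditioned
# methods such as Pareto hypernetworks).
# for batch in loader:
#     lam = sample_preference(num_tasks)
#     losses = [loss_fn(model, batch) for loss_fn in task_losses]
#     scalarized_step(losses, lam, optimizer)
```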
The student is expected to:
- Review existing methods for multi-objective optimization
- Propose novel or improved approaches for approximating the Pareto set
- Implement these methods with experiments on large models
Prerequisites: Background in optimization and statistics
Anytime-valid Inference Using Foundation Models
Leverage predictions from foundation models to improve the efficiency of sequential randomized experiments.
The Hybrid Augmented Inverse Probability Weighting (H-AIPW) estimator [1] has demonstrated efficiency gains in randomized experiments by incorporating predictions from foundation models. In its current form, however, the sample size must be fixed before the experiment begins. Since the efficiency gains depend on the accuracy of the foundation models' predictions, which is unknown in advance, it is unclear how to determine a “sufficient” sample size a priori.
This project aims to extend the H-AIPW framework to the sequential setting. Leveraging recent theoretical advances in sequential inference [2], we propose to develop a robust methodology that guarantees valid statistical inference at any stopping time, regardless of the bias in the predictions from the foundation models.
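For orientation, here is the classical AIPW building block in generic notation (our notation, not taken from [1]): in a randomized experiment with treatment indicator A_i ∈ {0, 1}, known assignment probability π, outcome Y_i, covariates X_i, and arbitrary outcome predictors m̂_0, m̂_1 (for example, foundation-model predictions of the outcome under each arm), the AIPW estimator of the average treatment effect is

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n}\sum_{i=1}^{n}\left[
      \hat{m}_1(X_i) - \hat{m}_0(X_i)
      + \frac{A_i\,\bigl(Y_i - \hat{m}_1(X_i)\bigr)}{\pi}
      - \frac{(1-A_i)\,\bigl(Y_i - \hat{m}_0(X_i)\bigr)}{1-\pi}
    \right]
```

Because randomization makes π known, the correction terms keep this estimator unbiased no matter how inaccurate m̂_0 and m̂_1 are; accurate foundation-model predictions then reduce variance without risking bias. The sequential extension must preserve exactly this robustness at data-dependent stopping times.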
Key objectives include:
- Develop a sequential version of the H-AIPW estimator and establish theoretical guarantees that allow the construction of anytime-valid confidence intervals
- Validate the proposed methodology across several experimental settings, assessing both the efficiency improvements and the robustness of inferential guarantees
Prerequisites: Strong background in mathematics and statistics