preprints
recent conference publications
-
Achievable distributional robustness when the robust risk is only partially identified
Julia Kostin,
Nicola Gnecco,
and Fanny Yang
Neural Information Processing Systems (NeurIPS),
2024
In safety-critical applications, machine learning models should generalize well under worst-case distribution shifts, that is, have a small robust risk. Invariance-based algorithms can provably take advantage of structural assumptions on the shifts when the training distributions are heterogeneous enough to identify the robust risk. However, in practice, such identifiability conditions are rarely satisfied – a scenario so far underexplored in the theoretical literature. In this paper, we aim to fill this gap and propose to study the more general setting of partially identifiable robustness. In particular, we define a new risk measure, the identifiable robust risk, and its corresponding (population) minimax quantity that is an algorithm-independent measure for the best achievable robustness under partial identifiability. We introduce these concepts broadly, and then study them within the framework of linear structural causal models for concreteness of the presentation. We use the introduced minimax quantity to show how previous approaches provably achieve suboptimal robustness in the partially identifiable case. We confirm our findings through empirical simulations and real-world experiments, and demonstrate how the test error of existing robustness methods becomes increasingly suboptimal as the proportion of previously unseen test directions increases.
-
Robust Mixture Learning when Outliers Overwhelm Small Groups
Daniil Dmitriev*,
Rares-Darius Buhai*,
Stefan Tiegel,
Alexander Wolters,
Gleb Novikov,
Amartya Sanyal,
David Steurer,
and Fanny Yang
Neural Information Processing Systems (NeurIPS),
2024
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters, a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial outliers can simulate additional spurious mixture components. Hence, if all means of the mixture must be recovered up to a small error in the output list, the list size needs to be larger than the number of (true) components. We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead, significantly improving upon list-decodable mean estimation, the only existing method that is applicable for LD-ML. Although improvements are observed even when the mixture is non-separated, our algorithm achieves particularly strong guarantees when the mixture is separated: it can leverage the mixture structure to partially cluster the samples before carefully iterating a base learner for list-decodable mean estimation at different scales.
-
Minimum Norm Interpolation Meets The Local Theory of Banach Spaces
Gil Kur,
Pedro Abdalla*,
Pierre Bizeul*,
and Fanny Yang
International Conference on Machine Learning (ICML),
2024
Minimum-norm interpolators have recently gained attention primarily as an analyzable model to shed light on the double descent phenomenon observed for neural networks. The majority of the work has focused on analyzing interpolators in Hilbert spaces, where typically an effectively low-rank structure of the feature covariance prevents a large bias. More recently, tight vanishing bounds have also been shown for isotropic high-dimensional data for ℓp-spaces with p ∈ [1,2), leveraging the sparse structure of the ground truth. However, these proofs are tailored to specific settings and hard to generalize. This paper takes a first step towards establishing a general framework that connects generalization properties of the interpolators to well-known concepts from high-dimensional geometry, specifically, from the local theory of Banach spaces. In particular, we show that under 2-uniform convexity, the bias of the minimal norm solution is bounded by the Gaussian complexity of the class. We then prove a “reverse” Efron-Stein lower bound on the expected conditional variance of the minimal norm solution under cotype 2. Finally, we prove that this bound is sharp for ℓp-linear regression under sub-Gaussian covariates.
-
Privacy-preserving data release leveraging optimal transport and particle gradient descent
Konstantin Donhauser*,
Javier Abad*,
Neha Hulkund,
and Fanny Yang
International Conference on Machine Learning (ICML),
2024
We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
-
Detecting critical treatment effect bias in small subgroups
Piersilvio De Bartolomeis,
Javier Abad,
Konstantin Donhauser,
and Fanny Yang
Conference on Uncertainty in Artificial Intelligence (UAI),
2024
Randomized trials are considered the gold standard for making informed decisions in medicine, yet they often lack generalizability to the patient populations in clinical practice. Observational studies, on the other hand, cover a broader patient population but are prone to various biases. Thus, before using an observational study for decision-making, it is crucial to benchmark its treatment effect estimates against those derived from a randomized trial. We propose a novel strategy to benchmark observational studies beyond the average treatment effect. First, we design a statistical test for the null hypothesis that the treatment effects estimated from the two studies, conditioned on a set of relevant features, differ up to some tolerance. We then estimate an asymptotically valid lower bound on the maximum bias strength for any subgroup in the observational study. Finally, we validate our benchmarking strategy in a real-world setting and show that it leads to conclusions that align with established medical knowledge.
-
Hidden yet quantifiable: A lower bound for confounding strength using randomized trials
Piersilvio De Bartolomeis*,
Javier Abad*,
Konstantin Donhauser,
and Fanny Yang
International Conference on Artificial Intelligence and Statistics (AISTATS),
2024
In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new drugs in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions from observational data. We propose a novel strategy to quantify unobserved confounding by leveraging randomized trials. First, we design a statistical test to detect unobserved confounding with strength above a given threshold. Then, we use the test to estimate an asymptotically valid lower bound on the unobserved confounding strength. We evaluate the power and validity of our statistical test on several synthetic and semi-synthetic datasets. Further, we show how our lower bound can correctly identify the absence and presence of unobserved confounding in a real-world setting.
-
Certified private data release for sparse Lipschitz functions
Konstantin Donhauser,
Johan Lokna,
Amartya Sanyal,
March Boedihardjo,
Robert Hoenig,
and Fanny Yang
International Conference on Artificial Intelligence and Statistics (AISTATS),
2024
As machine learning has become more relevant for everyday applications, a natural requirement is the protection of the privacy of the training data. When the relevant learning questions are unknown in advance, or hyper-parameter tuning plays a central role, one solution is to release a differentially private synthetic data set that leads to similar conclusions as the original training data. In this work, we introduce an algorithm that enjoys fast rates for the utility loss for sparse Lipschitz queries. Furthermore, we show how to obtain a certificate for the utility loss for a large class of algorithms.
-
PILLAR: How to make semi-private learning more effective
Francesco Pinto,
Yaxi Hu,
Fanny Yang,
and Amartya Sanyal
IEEE Conference on Secure and Trustworthy Machine Learning (SaTML),
2024
In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can significantly differ from the one on which SP learning is performed. To validate its empirical effectiveness, we conduct a wide variety of experiments under tight privacy constraints (ε=0.1) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data.
-
Tight bounds for maximum l1-margin classifiers
Stefan Stojanovic,
Konstantin Donhauser,
and Fanny Yang
Algorithmic Learning Theory (ALT),
2024
Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum l1-margin classifier, a.k.a. sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the l1-norm achieve improved statistical rates for hard sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum l1-margin classifier in a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates of order ||w*||_1^{2/3}/n^{1/3} for general ground truths. To complete the picture, we show that when interpolating noisy observations, the error vanishes at a rate of order 1/√(log(d/n)). We are therefore the first to show benign overfitting for the maximum l1-margin classifier.
-
Can semi-supervised learning use all the data effectively? A lower bound perspective
Alexandru Țifrea*,
Gizem Yüce*,
Amartya Sanyal,
and Fanny Yang
Neural Information Processing Systems (NeurIPS),
Spotlight,
2023
Prior works have shown that semi-supervised learning (SSL) algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where the unlabeled data is sufficient to learn a good decision boundary using unsupervised learning (UL) alone. This raises the question: can SSL algorithms simultaneously improve upon both UL and SL? To this end, we derive a tight lower bound for 2-Gaussian mixture models that explicitly depends on the labeled and the unlabeled dataset size as well as the signal-to-noise ratio of the mixture distribution. Surprisingly, our result implies that no SSL algorithm can improve upon the minimax-optimal statistical error rates of SL or UL algorithms for these distributions. Nevertheless, we show empirically on real-world data that SSL algorithms can still outperform UL and SL methods. Therefore, our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants.
-
Margin-based sampling in high dimensions: When being active is less efficient than staying passive
Alexandru Țifrea*,
Jacob Clarysse*,
and Fanny Yang
International Conference on Machine Learning (ICML),
2023
It is widely believed that given the same labeling budget, active learning (AL) algorithms like margin-based active learning achieve better predictive performance than passive learning (PL), albeit at a higher computational cost. Recent empirical evidence suggests that this added cost might be in vain, as margin-based AL can sometimes perform even worse than PL. While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that PL outperforms margin-based AL even for noiseless data and when using the Bayes optimal decision boundary for sampling. Insights from our proof indicate that this high-dimensional phenomenon is exacerbated when the separation between the classes is small. We corroborate this intuition with experiments on 20 high-dimensional datasets spanning a diverse range of applications, from finance and histology to chemistry and computer vision.
-
Strong inductive biases provably prevent harmless interpolation
Michael Aerni*,
Marco Milanta*,
Konstantin Donhauser,
and Fanny Yang
International Conference on Learning Representations (ICLR),
2023
Classical wisdom suggests that estimators should avoid fitting noise to achieve good generalization. In contrast, modern overparameterized models can yield small test error despite interpolating noise – a phenomenon often called "benign overfitting" or "harmless interpolation". This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator’s inductive bias, i.e., how heavily the estimator favors solutions with a certain structure: while strong inductive biases prevent harmless interpolation, weak inductive biases can even require fitting noise to generalize well. Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels, where the filter size regulates the strength of the inductive bias. We further provide empirical evidence of the same behavior for deep neural networks with varying filter sizes and rotational invariance.
-
Why adversarial training can hurt robust accuracy
Jacob Clarysse,
Julia Hörrmann,
and Fanny Yang
International Conference on Learning Representations (ICLR),
2023
Machine learning classifiers with high test accuracy often perform poorly under adversarial attacks. It is commonly believed that adversarial training alleviates this issue. In this paper, we demonstrate that, surprisingly, the opposite may be true: even though adversarial training helps when enough data is available, it may hurt robust generalization in the small sample size regime. We first prove this phenomenon for a high-dimensional linear classification setting with noiseless observations. Our proof provides explanatory insights that may also transfer to feature learning models. Further, we observe in experiments on standard image datasets that the same behavior occurs for perceptible attacks that effectively reduce class information, such as mask attacks and object corruptions.
-
How unfair is private learning?
Amartya Sanyal*,
Yaxi Hu*,
and Fanny Yang
Conference on Uncertainty in Artificial Intelligence (UAI),
Oral,
2022
As machine learning algorithms are deployed on sensitive data in critical decision-making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build learning algorithms that are both private and achieve high accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR-10 and CelebA), and tabular (Law School) datasets and learning algorithms.
-
Semi-supervised novelty detection using ensembles with regularized disagreement
Alexandru Țifrea,
Eric Stavarache,
and Fanny Yang
Conference on Uncertainty in Artificial Intelligence (UAI),
2022
Despite their excellent performance on in-distribution (ID) data, machine learning-based prediction systems often predict out-of-distribution (OOD) samples incorrectly while indicating high confidence. Instead, they should flag samples that are not similar to the training data, for example when new classes emerge over time. Even though current OOD detection algorithms can successfully distinguish completely different data sets, they fail to reliably identify samples from novel classes. We develop a new ensemble-based procedure that promotes model diversity and exploits regularization to limit disagreement to only OOD samples, using a batch containing an unknown mixture of ID and OOD data. We show that our procedure significantly outperforms state-of-the-art methods, including those that have access, during training, to data that is known to be OOD. We run extensive comparisons of our approach on a variety of novel-class detection scenarios, on standard image data sets such as SVHN/CIFAR-10/CIFAR-100 as well as on new disease detection on medical image data sets.
-
Fast rates for noisy interpolation require rethinking the effects of inductive bias
Konstantin Donhauser,
Nicolo Ruggeri,
Stefan Stojanovic,
and Fanny Yang
International Conference on Machine Learning (ICML),
2022
Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: our results suggest that, while a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. Specifically, for both linear regression and classification with a sparse ground truth, we prove that minimum ℓp-norm and maximum ℓp-margin interpolators achieve fast polynomial rates up to order 1/n for p > 1, compared to a logarithmic rate for p = 1. Finally, we provide experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.
-
Tight bounds for minimum l1-norm interpolation of noisy data
Guillaume Wang*,
Konstantin Donhauser*,
and Fanny Yang
International Conference on Artificial Intelligence and Statistics (AISTATS),
2022
We provide matching upper and lower bounds of order σ²/log(d/n) for the prediction error of the minimum ℓ1-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when d ≫ n, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum ℓ2-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional.
-
Self-supervised Reinforcement Learning with Independently Controllable Subgoals
Andrii Zadaianchuk,
Georg Martius,
and Fanny Yang
Conference on Robot Learning (CoRL),
2021
To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.
-
How rotational invariance of common kernels prevents generalization in high dimensions
Konstantin Donhauser,
Mingqi Wu,
and Fanny Yang
International Conference on Machine Learning (ICML),
2021
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings. However, its behavior in high dimensions is much less understood. Recent work establishes consistency for high-dimensional kernel regression for a number of specific assumptions on the data distribution. In this paper, we show that in high dimensions, the rotational invariance property of commonly studied kernels (such as RBF, inner product kernels and fully-connected NTK of any depth) leads to inconsistent estimation unless the ground truth is a low-degree polynomial. Our lower bound on the generalization error holds for a wide range of distributions and kernels with different eigenvalue decays. This lower bound suggests that consistency results for kernel ridge regression in high dimensions generally require a more refined analysis that depends on the structure of the kernel beyond its eigenvalue decay.
-
Interpolation can hurt robust generalization even when there is no noise
Konstantin Donhauser*,
Alexandru Tifrea*,
Michael Aerni,
Reinhard Heckel,
and Fanny Yang
Neural Information Processing Systems (NeurIPS),
2021
Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
-
Understanding and Mitigating the Tradeoff between Robustness and Accuracy
Aditi Raghunathan,
Sang Michael Xie,
Fanny Yang,
John Duchi,
and Percy Liang
International Conference on Machine Learning (ICML),
2020
Adversarial training augments the training set with perturbations to improve the robust error (over worst-case perturbations), but it often leads to an increase in the standard error (on unperturbed test inputs). Previous explanations for this tradeoff rely on the assumption that no predictor in the hypothesis class has low standard and robust error. In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error. In particular, we show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor. We then prove that the recently proposed robust self-training (RST) estimator improves robust error without sacrificing standard error for noiseless linear regression. Empirically, for neural networks, we find that RST with different adversarial training methods improves both standard and robust error for random and adversarial rotations and adversarial ℓ∞ perturbations in CIFAR-10.
-
Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness
Fanny Yang,
Zuowen Wang,
and Christina Heinze-Deml
Neural Information Processing Systems (NeurIPS),
2019
This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy for worst-case spatial transformations (spatial robustness). Evaluated on these adversarially transformed examples, we demonstrate that adding regularization on top of standard or adversarial training reduces the relative error by 20% on CIFAR-10 without increasing the computational cost. This outperforms handcrafted networks that were explicitly designed to be spatially equivariant. Furthermore, we observe for SVHN, which is known to have inherent variance in orientation, that robust training also improves standard accuracy on the test set. We prove that this no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
workshop papers
-
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad*,
Konstantin Donhauser*,
Francesco Pinto,
and Fanny Yang
NeurIPS GenLaw Workshop,
2024
The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimize the reproduction of protected materials. CP-Fuse is inspired by the recently proposed Near-Access Free (NAF) framework and additionally incorporates a desirable balancing property that we demonstrate prevents the reproduction of memorized training data. Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation. Furthermore, we demonstrate how CP-Fuse can be integrated with other techniques for enhanced protection.
-
How robust accuracy suffers from certified training with convex relaxations
Piersilvio De Bartolomeis,
Jacob Clarysse,
Amartya Sanyal,
and Fanny Yang
NeurIPS Workshop on Empirical Falsification (Long Talk),
2022
Adversarial attacks pose significant threats to deploying state-of-the-art classifiers in safety-critical applications. Two classes of methods have emerged to address this issue: empirical defences and certified defences. Although certified defences come with robustness guarantees, empirical defences such as adversarial training enjoy much higher popularity among practitioners. In this paper, we systematically compare the standard and robust error of these two robust training paradigms across multiple computer vision tasks. We show that in most tasks and for both ℓ∞-ball and ℓ2-ball threat models, certified training with convex relaxations suffers from worse standard and robust error than adversarial training. We further explore how the error gap between certified and adversarial training depends on the threat model and the data distribution. In particular, besides the perturbation budget, we identify as important factors the shape of the perturbation set and the implicit margin of the data distribution. We support our arguments with extensive ablations on both synthetic and image datasets.
-
Provable concept learning for interpretable predictions using variational inference
Armeen Taeb,
Nicolo Ruggeri,
Carina Schnuck,
and Fanny Yang
ICML Workshop AI4Science,
2022
In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-level attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying high-level, previously unknown concepts. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP) – a VAE-based classifier that uses visually interpretable concepts as linear predictors. Assuming that the data generating mechanism involves predictive concepts, we prove that our method is able to identify them while attaining optimal classification accuracy. We use synthetic experiments for validation, and also show that on real-world (PlantVillage and ChestXRay) datasets, CLAP effectively discovers interpretable factors for classifying diseases.
More publications can be found on the respective individual pages.