This course is designed to prepare master's students for successful research in ML, and to help PhD students find new research ideas related to ML theory. Content-wise, the technical part will focus on generalization bounds via uniform convergence and on non-parametric regression.
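For a taste of the first topic, here is a prototypical uniform-convergence generalization bound, sketched in standard notation (population risk $R$, empirical risk $\widehat{R}_n$, Rademacher complexity $\mathfrak{R}_n$); this particular statement is illustrative and not taken verbatim from the course material. The first part of the course develops the tools (concentration inequalities, McDiarmid, Rademacher complexity, chaining) needed to prove statements of this form.

```latex
% Illustrative sketch (assumed notation, not taken from the course material):
% for a function class F whose losses are bounded in [0,1], with probability
% at least 1 - delta over an i.i.d. sample of size n, uniformly over F:
\[
  \forall f \in \mathcal{F}:\quad
  R(f) \;\le\; \widehat{R}_n(f)
  \;+\; 2\,\mathfrak{R}_n(\mathcal{F})
  \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
\]
% where R(f) is the population risk, \widehat{R}_n(f) the empirical risk on
% the sample, and \mathfrak{R}_n(F) the Rademacher complexity of the loss
% class induced by F.
```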
Learning objectives

By the end of the course, students will be able to:
- acquire enough mathematical background to understand a good fraction of the theory papers published in typical ML venues. To this end, students will learn common mathematical techniques from statistics and optimization in the first part of the course and apply this knowledge in the project work.
- critically examine recently published work in terms of relevance and identify impactful (novel) research problems. This is an integral part of the project work and involves experimental as well as theoretical questions.
- find and outline an approach to prove (a subproblem of) a conjectured theorem. This will be practiced in the lectures, exercises, and homeworks, and potentially in the final project.
- effectively communicate and present problem motivation, new insights, and results to a technical audience. This will be learned primarily via the final presentation and report, as well as during peer-grading of your peers' talks.
Since you are graduate students, we expect you to take this class because you want to learn the material and how to do research. All assessments are designed to maximize the learning effect. Cheating harms only yourself, so it is in your own interest to adhere to the following policy.
- All homework is submitted individually and must be written in your own words.
- You may discuss the homework only at a high level with up to two classmates; list their IDs on the first page of your homework. Everyone must still submit an individual write-up, and yours must be in your own words; indeed, your discussions with classmates should stay at a high enough level that your write-up could not be anything other than your own words.
- We prefer that you do not dig around for homework solutions; if you do rely on external resources, cite them, and still write your solutions in your own words.
- Integrity violations will be reported to the department's evaluation board.
Date | Topic | Location | Material | Assignments
---|---|---|---|---
16.9. | Lecture: Introduction and concentration bounds (Recording) | CAB G59 | MW 2 |
19.9. | Lecture: Uniform tail bound and McDiarmid | CHN G42 | MW 2, 3, 4 |
23.9. | No class | | |
26.9. | Lecture: Azuma-Hoeffding and the uniform law | CHN G42 | MW 4 | HW 1
30.9. | Lecture: Uniform law and Rademacher complexity | CAB G59 | MW 2, 4 |
3.10. | Lecture: VC bound and margin bounds (Exercise sheet) | CHN G42 | SS 7, 26 |
7.10. | Lecture: Metric entropy | CAB G59 | MW 5 | HW 1 and de-registration due by 8.10. 23:59; HW 1 sol
10.10. | Lecture: Chaining | CHN G42 | SS 26 |
14.10. | No class | | | Project sign-up 14:00
17.10. | No class | | |
21.10. | Lecture: Non-parametric regression and kernels | CAB G59 | SC 4, MW 12, 13 |
24.10. | Lecture: Kernel ridge regression | CHN G42 | MW 13 | Project proposals due
28.10. | Lecture: Random design | CAB G59 | MW 14 | HW 2
31.10. | Lecture: Minimax lower bounds | CHN G42 | MW 15 |
4.11. | Lecture: Minimax lower bounds | CAB G59 | MW 15 |
7.11. | Interactive session: Multi-objective learning | CHN G42 | Exercise sheet, Solution |
11.11. | [Lecture: Double Descent] | CAB G59 | | HW 2 due 23:59; HW 2 sol
14.11. | No class | | |
17./18.11. | ORALS | | |
21.11. | Guest lecture | CHN G42 | |
25.11. | Guest lecture | CAB G59 | |
28.11. | Guest lecture by Amartya Sanyal | CHN G42 | | Presentation draft due
2.12. | No class | | |
5.12. | No class | | |
9.12. | [Presentations 1], see full schedule | CAB G59 | |
12.12. | [Presentations 2], see full schedule | CHN G42 | |
16.12. | [Presentations 3], see full schedule | CAB G59 | |
19.12. | [Presentations 4], see full schedule | CHN G42 | | [Peer-grading due]
12.1. | No class | | | Project reports due
The book links point to online resources that are freely accessible from within the ETH Zurich network:
Learning Theory

- Martin Wainwright: High-Dimensional Statistics: core reference for the course
- Francis Bach: Learning Theory from First Principles: a new book that includes some learning theory for neural networks
- Percy Liang: Statistical Learning Theory: Stanford lecture notes
- Steinwart and Christmann: Support Vector Machines: a more mathematical treatment of RKHSs
Some more background reading for your general wisdom, knowledge, and entertainment:

- Keener: Theoretical Statistics: e.g., asymptotic optimality (MLE), UMVU estimation, testing
- van der Vaart and Wellner: Weak Convergence and Empirical Processes