Until 18.3., 23:59, send an email to fan.yang at inf.ethz.ch with a PDF containing your project proposal, formatted according to the instructions below. Consider this as an opportunity for you to receive feedback.
The project report and presentation should include the following sections:
In presentation (and report):
“Detective hat”: An intuitive (not merely technical) understanding of the proof, assumptions, and statement, in depth.
“Reviewer hat”: Which relevant questions does the paper shed light on, and does it actually answer them / solve the problem? How significant is this paper's contribution compared to the existing literature?
What are interesting, impactful follow-up questions the authors did not answer? Show evidence that the question(s) you identified are indeed relevant to understanding important phenomena in practice. You can start from the paper’s weaknesses. Examples are:
In report:
Break down the problem as much as you can into chunks that you can indeed pursue (or at least the first few steps); e.g., to prove a conjecture, give intuition, state the lemmas you think you need, and try to prove some of them.
Show your attempts to tackle the first few steps.
The grade will depend equally on:
Each of the project proposal, progress report, and final project report should be neatly typeset as a PDF document using TeX, LaTeX, or similar systems with bibliographic references (e.g., using BibTeX). Check here for typesetting resources.
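If it helps, here is a minimal sketch of such a LaTeX document with BibTeX references. This is an illustration only, not the required template: the file name refs.bib and the citation key are placeholders, and the NeurIPS style file mentioned under Length below would be loaded on top of the article class.

    \documentclass{article}
    % The NeurIPS template (see Length below) loads its own style file on top of
    % the article class; this sketch uses plain article for simplicity.
    \usepackage{amsmath}  % standard math support for theorem statements and proofs

    \begin{document}

    \title{Project Report: Paper Title}
    \author{Team member names}
    \maketitle

    \section{Summary of the paper}
    A short summary of the main result \cite{hastie2019surprises}.  % placeholder citation key

    % refs.bib holds the BibTeX entries for every paper you cite.
    \bibliographystyle{plain}
    \bibliography{refs}

    \end{document}

Compiling with pdflatex, then bibtex, then pdflatex twice (or simply latexmk -pdf) resolves the bibliographic references.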
Basics:
Content (see the corresponding details above)
The later your presentation, the more of the last bullet point is expected to be included.
The grade is determined by peer feedback and self-feedback on the presentation.
See the rubric that will be used for assessment.
Content:
The combination of presentation slides and report should contain the content described above. The way to split it will probably be different in each project. If you were able to discuss the proof in the amount of detail you find insightful during the presentation, you can focus more on your own work.
Furthermore, the pure reproduction (including necessary restructuring and rephrasing) of the paper's results and the proof-presentation sections should not constitute more than 50% of the report. Your own investigations should be the focus here. These may include an extensive literature review, experimental explorations, follow-up theoretical conjectures/results, etc. If the proof is poorly presented in the original paper (e.g., obscuring the key ingredients), a simpler proof will also count towards “own investigations”.
Length: In the NeurIPS format (see the template below), the main text should be at most 10 pages, excluding references and the appendix, where you can add more experiments, proofs of technical lemmas, etc.
Style:
Please see the following guidelines, which will be used to determine the presentation (writing and style) portion of the report grade.
Note: Many of these papers have since been published at conferences or in journals. Links to arXiv are provided so you can find the newest versions.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation. Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. arXiv:1903.08560 [math.ST]
Just Interpolate: Kernel “Ridgeless” Regression Can Generalize. Tengyuan Liang, Alexander Rakhlin. arXiv:1808.00387 [math.ST]
The generalization error of random features regression: Precise asymptotics and double descent curve. Song Mei, Andrea Montanari. arXiv:1908.05355 [math.ST]
Margin-Based Generalization Lower Bounds for Boosted Classifiers. Allan Grønlund, Lior Kamma, Kasper Green Larsen, Alexander Mathiasen, Jelani Nelson. arXiv:1909.12518 [cs.LG]
Minimum “Norm” Neural Networks are Splines. Rahul Parhi, Robert D. Nowak. arXiv:1910.02333 [stat.ML]
Mad Max: Affine Spline Insights into Deep Learning. Randall Balestriero, Richard Baraniuk. arXiv:1805.06576 [stat.ML]
Deep Neural Networks as Gaussian Processes. Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein. arXiv:1711.00165 [stat.ML]
On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti, Julien Mairal. arXiv:1905.12173 [stat.ML]
Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. arXiv:1806.07572 [cs.LG]
What Can ResNet Learn Efficiently, Going Beyond Kernels? Zeyuan Allen-Zhu, Yuanzhi Li. arXiv:1905.10337 [cs.LG]
Nonparametric regression using deep neural networks with ReLU activation function. Johannes Schmidt-Hieber. arXiv:1708.06633 [math.ST]
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization. Sanjeev Arora, Nadav Cohen, Elad Hazan. arXiv:1802.06509 [cs.LG]
Uniform convergence may be unable to explain generalization in deep learning. Vaishnavh Nagarajan, J. Zico Kolter. arXiv:1902.04742 [cs.LG]
Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian. arXiv:1703.02930 [cs.LG]
Size-Independent Sample Complexity of Neural Networks. Noah Golowich, Alexander Rakhlin, Ohad Shamir. arXiv:1712.06541 [cs.LG]
Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks. Ziwei Ji, Matus Telgarsky. arXiv:1909.12292 [cs.LG]
Gradient Descent Finds Global Minima of Deep Neural Networks. Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai. arXiv:1811.03804 [cs.LG]
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Lenaic Chizat, Francis Bach. arXiv:1805.09545 [math.OC]
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks. Spencer Frei, Yuan Cao, Quanquan Gu. arXiv:1910.02934 [cs.LG]
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang. arXiv:1901.08584 [cs.LG]
Learning Halfspaces and Neural Networks with Random Initialization. Yuchen Zhang, Jason D. Lee, Martin J. Wainwright, Michael I. Jordan. arXiv:1706.03175 [cs.LG]
Bridging Theory and Algorithm for Domain Adaptation. Yuchen Zhang*, Tianle Liu, Mingsheng Long, Michael I. Jordan (* not the same as above). arXiv:1904.05801 [cs.LG]
Robust learning with the Hilbert-Schmidt independence criterion. Daniel Greenfeld, Uri Shalit. arXiv:1910.00270 [cs.LG]
Certifying Some Distributional Robustness with Principled Adversarial Training. Aman Sinha, Hongseok Namkoong, John Duchi. arXiv:1710.10571 [stat.ML]
Rademacher Complexity for Adversarially Robust Generalization. Dong Yin, Kannan Ramchandran, Peter Bartlett. arXiv:1810.11914 [cs.LG]
Adversarial Risk Bounds via Function Transformation. Justin Khim, Po-Ling Loh. arXiv:1810.09519 [stat.ML]
You can choose your own paper; however, you have to double-check with the instructor before registering the paper. Other possible topics: