Project basics


  • Form groups of two if possible
  • Pick a paper that includes an extensive proof out of the list (or double-check your own suggestion with me)
  • Sign up here for a paper/project.


  • Project proposal: 18.3.
  • Mid-project drafts: 5.5.
  • Report is due two weeks after end of semester: 14.6.



Until 18.3. 23:59, send an email to fan.yang at with a pdf containing:

  • Choice of paper, group members
  • Division of labor in the group
  • Identify and motivate the problem of the paper in your own words(as defined in below)


Consider this as an opportunity for you to receive feedback

  • Draft presentation slides in .pdf format, according to instruction below
  • A .pdf with
    • report on what each individual has done
    • plan and division of labor until report deadline

Final content

The project report and presentation should include the following sections:

  1. Identify and motivate problem - why should I / the community care? Including literature review (done-ish)

In presentation (and report):

  1. “Detective hat”: Intuitive (not just technical level) understanding of proof, assumptions, statement in depth

    • Present it in a modular way: a) give intuition, b) proof depends on lemma 1,2,3, which are insightful and intuitive (give rise to potential new proofs) c) proofs of lemmas that are technical
    • If a proof is written in a way that obscures insights, fix that
    • Examine if results follow from other older known results and it’s just a reformulation, facilitated by ignorance (reinventing the wheel)
  2. “Reviewer hat”: Which relevant questions does it shed light on and it actually answer it / solve the problem? How significant is the addition of this paper compared to existing literature

  3. What are interesting, impactful follow-up questions they did not answer? Show evidence that the question(s) you identified are indeed relevant to understand important phenomena in practice. You can start with the paper’s weakness. Examples are

    • They developed theory but assumptions are too removed from practice they aim to prove, i.e. no recognizable link (example could be toy example that shares phenomena). In this case, come up with and run experiments that prove this point. Come up with idea what is needed (theoretically or experimentally) to make the paper impactful
    • You see that the experiments are non-rigorous and not reproducible, they do not support the theory nor heuristic claims.
    • The problem they aim to solve is not exactly relevant. Identify which possible tweaks in their problem formulation could fix that
    • By doing some experiments you find a curious phenomenon that hasn’t been considered and is not explained by the paper
    • Tightening upper bounds by constant or log factors didn’t lead to new insights.

In report:

  1. Break down the problem as much as you can into chunks that you can indeed pursue (or at least, the first few steps), e.g. to prove a conjecture give intuition, lemmas you think you need, and try to prove some of them.

  2. Show your attempts to tackle the first few steps.

Grade will depend equally on

  • clarity of presentation in class, report (will update rubric)
  • critical review of paper, motivation & evidence (theoretical or practical) for the relevance of your problem,
  • and progress made in tackling the subquestions identified in 5.

Each of the project proposal, progress report, and final project report should be neatly typeset as a PDF document using TeX, LaTeX, or similar systems with bibliographic references (e.g., using BibTeX). Check here for typesetting resources.




  • Length: 18 minutes total using slides (beamer recommended, e.g. using pandoc markdown) + 5 minutes discussion
  • For groups of two: Each person should present for 9 minutes
  • Watch this video on good slides:
  • Find more detailed, generally applicable project feedback, addressing common issues here

Content (see this for corresponding details)

  • Short problem introduction, motivation and lit review (20%)
  • Proof presentation (30%)
  • Paper discussion (30%)
  • Follow-up questions + motivation and possible methods of attack (20%)

The later your presentation, the more of the last bullet point is expected to be included.

Grade determined by peer feedback and self-feedback on the presentation

  • Effectiveness of Slides
  • Clarity of speech, voice
  • Organization of content

See rubric which will be used for assessment.



  • The combination of presentation slides and report should contain the content described above. The way to split it will probably be different in each project. If you were able to discuss the proof in the amount of detail you find insightful during the presentation, you can focus more on your own work.

  • Furthermore the pure reproduction (including necessary restructuring and rephrasing) of paper result + proof presentation sections should not constitute more than 50% of the report. Your own investigations should be in the focus here. That may include an extensive literature review, experimental explorations, follow-up theoretical conjectures/results etc. If the proof is poorly presented in the original paper (i.e. convoluting the key ingredients etc.), a simpler proof will also count towards “own investigations”

  • Length: In the Neurips format (see template below) it should be at most 10 pages main text, excluding references and appendix, where you could add more experiments and proof of technical lemmas etc.


Please see the following guidelines which will be used to guide the presentation portion of the report grade

  • Follow this as a guideline for content
  • Follow this as a guideline for good academic writing in general and this specifically for mathematical writing
  • Follow the latex specific style guide for the report
  • Use this paper template
  • On contributions for individuals: please indicate who contributed what

Presentation schedule

Date Presenter Paper title Peer graders
13.5. Parnian Kassraie Surprises in High-Dimensional Ridgeless Least Squares Interpolation [Slides][Video] DW, JL, GB, FY, MC, GW
Mathieu Chevalley,  Daniel Waelchli Deep Neural Networks as Gaussian Processes [Slides][Video] NZ, QW, SS, AG, VR, AJ
Thomas Allard Neural Tangent Kernel: Convergence and Generalization in Neural Networks [Slides][Video] PK, MC, SS, GB, YA, FY
20.5. Gregor Bachmann,  Yashas Annadani Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks [Slides][Video] TA, AE, SK, ZW, AJ, TK
Guillaume Wang,  Nicolas Zucchet What Can ResNet Learn Efficiently, Going Beyond Kernels? [Slides][Video] PK, TA, SK, SG, FY
Susanne Keller,  Shirin Goshtasbpour Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks [Slides][Video] NZ, AE, AG, JL, TK, AJ
Aleksandr Efremov,  Scott Sussex On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization [Slides][Video] NZ, GB, SG, QW, VR, AT
27.5. Qian Wang Bridging Theory and Algorithms for Domain Adaptation, supp [Slides] [Video] MC, DW, SS, AG, AT, YA
Ayush Garg Robust learning with the Hilbert-Schmidt independence criterion [Slides] [Video] AE, YA, ZW, JL, TK, AT
Zuowen Wang,  Jinzhou Li Rademacher Complexity for Adversarially Robust Generalization [Slides] [Video] DW, TA, GW, SK, VR, FY
Vaclav Rohzon,  Timon Knigge Kernel-based methods for bandit convex optimization [Slides] [Video] GW, SG, QW, ZW, AJ, FY


  • Please put your name down on Google-Spreadsheet (announced on piazza on 11.3.) which paper you want to work on.
  • If there are two papers listed for one number it means that you may choose to focus on one
  • One group per paper, max two people per group. First come first serve.

Note: Many of these have been published at conferences or journals by now. Link to arxiv is provided to find the newest version.

  1. Surprises in High-Dimensional Ridgeless Least Squares Interpolation Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani arXiv:1903.08560 [math.ST]

    Just Interpolate: Kernel “Ridgeless” Regression Can Generalize Tengyuan Liang, Alexander Rakhlin arXiv:1808.00387 [math.ST]

  2. The generalization error of random features regression: Precise asymptotics and double descent curve. Song Mei, Andrea Montanari. arXiv:1908.05355 [math.ST]

  3. Margin-Based Generalization Lower Bounds for Boosted Classifiers Allan Grønlund, Lior Kamma, Kasper Green Larsen, Alexander Mathiasen, Jelani Nelson. arXiv:1909.12518 [cs.LG]

  4. Minimum “Norm” Neural Networks are Splines Rahul Parhi, Robert D. Nowak arXiv:1910.02333 [stat.ML]

    Mad Max: Affine Spline Insights into Deep Learning Randall Balestriero, Richard Baraniuk arXiv:1805.06576 [stat.ML]

  5. Deep Neural Networks as Gaussian Processes. Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein. arXiv:1711.00165 [stat.ML]

  6. On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti, Julien Mairal. arXiv:1905.12173 [stat.ML]

    Neural Tangent Kernel: Convergence and Generalization in Neural Networks Arthur Jacot, Franck Gabriel, Clément Hongler arXiv:1806.07572 [cs.LG]

  7. What Can ResNet Learn Efficiently, Going Beyond Kernels? Zeyuan Allen-Zhu, Yuanzhi Li. arXiv:1905.10337 [cs.LG]

  8. Nonparametric regression using deep neural networks with ReLU activation function. Johannes Schmidt-Hieber. arXiv:1708.06633 [math.ST]

  9. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization Sanjeev Arora, Nadav Cohen, Elad Hazan. arXiv:1802.06509 [cs.LG]

  10. Uniform convergence may be unable to explain generalization in deep learning Vaishnavh Nagarajan, J. Zico Kolter arXiv:1902.04742 [cs.LG]

  11. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian arXiv:1703.02930 [cs.LG]

    Size-Independent Sample Complexity of Neural Networks Noah Golowich, Alexander Rakhlin, Ohad Shamir arXiv:1712.06541 [cs.LG]

  12. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks Ziwei Ji, Matus Telgarsky. arXiv:1909.12292 [cs.LG]

  13. Gradient Descent Finds Global Minima of Deep Neural Networks Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai [arXiv:1811.03804 [cs.LG]]

  14. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Lenaic Chizat, Francis Bach. arXiv:1805.09545 [math.OC].

  15. Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks. Spencer Frei, Yuan Cao, Quanquan Gu. arXiv:1910.02934 [cs.LG]

  16. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang. arXiv:1901.08584 [cs.LG]

  17. Learning Halfspaces and Neural Networks with Random Initialization. Yuchen Zhang, Jason D. Lee, Martin J. Wainwright, Michael I. Jordan arXiv:1706.03175 [cs.LG]

  18. Bridging Theory and Algorithm for Domain Adaptation. Yuchen Zhang*, Tianle Liu, Mingsheng Long, Michael I. Jordan (* not the same as above) arXiv:1904.05801 [cs.LG], supplement

  19. Robust learning with the Hilbert-Schmidt independence criterion. Daniel Greenfeld, Uri Shalit arXiv:1910.00270 [cs.LG]

  20. Certifying Some Distributional Robustness with Principled Adversarial Training. Aman Sinha, Hongseok Namkoong, John Duchi. arXiv:1710.10571 [stat.ML]

  21. Rademacher Complexity for Adversarially Robust Generalization Dong Yin, Kannan Ramchandran, Peter Bartlett. arXiv:1810.11914 [cs.LG]

    Adversarial Risk Bounds via Function Transformation Justin Khim, Po-Ling Loh arXiv:1810.09519 [stat.ML]

You can choose your own paper, however you have to double check with the instructor before registering the paper. Other possible topics:

  • Online convex optimization,
  • Density estimation - GAN theory