By 18.3., 23:59, email the supervisor of your project (listed here) a PDF containing:
Consider this an opportunity to receive feedback.
Submit in .pdf format, according to the instructions below.
The project report and presentation should include the following sections:
In presentation (and report):
“Detective hat”: An intuitive (not just technical) in-depth understanding of the proof, the assumptions, and the statement
“Reviewer hat”: Which relevant questions does the paper shed light on, and does it actually answer them / solve the problem? How significant is the paper's contribution compared to the existing literature?
What are interesting, impactful follow-up questions that the authors did not answer and that would be worth pursuing? Show evidence that the question(s) you identified are indeed relevant for understanding important phenomena in practice and are novel in the literature. You can start with the paper’s weaknesses. Examples are
In report:
Break the problem down as far as you can into chunks that you can actually pursue (or at least the first few steps). For example, to prove a conjecture, give the intuition, state the lemmas you think you need, and try to prove some of them.
Show your attempts to tackle the first few steps.
Each of the project proposal, progress report, and final project report should be neatly typeset as a PDF document using TeX, LaTeX, or similar systems with bibliographic references (e.g., using BibTeX). Check here for typesetting resources.
The total project grade will depend equally on
Basics:
Content (see this for corresponding details)
The later your presentation, the more of the last bullet point is expected to be included.
Grade determined by peer feedback and self-feedback on the presentation
See the rubric that will be used for assessment.
Content:
The combination of presentation slides and report should contain the content described above. How to split it will probably differ from project to project. If you were able to discuss the proof during the presentation in as much detail as you find insightful, you can focus more on your own work in the report.
Furthermore, the pure reproduction (including necessary restructuring and rephrasing) of the paper's results and proofs should not constitute more than 50% of the report. Your own investigations should be the focus here. These may include an extensive literature review, experimental explorations, follow-up theoretical conjectures/results, etc. If the proof is poorly presented in the original paper (e.g. the key ingredients are convoluted), a simpler proof will also count towards “own investigations”.
Length: In the NeurIPS format (see template below), the report should be at most 10 pages of main text, excluding references and the appendix, where you can add more experiments, proofs of technical lemmas, etc.
Style:
Please see the following guidelines, which will be used to guide the presentation portion of the report grade.
In terms of content, it is not the absolute results that will be graded (i.e. you don’t have to prove a new theorem or write a new conference paper), but the depth at which you critically investigate the paper’s faults and contributions and put it into context, as well as the novelty and impact of the follow-up questions that you would like to pursue. Hence, you will primarily be graded on points 1-4 in Final Content.
Of course it is great if you succeed in solving your follow-up questions (i.e. successfully manage points 5-6), and that will be a big bonus for the grade, but you can achieve a good grade without actually having publishable results. Maybe think of it as a proposal for a master thesis project with preliminary ideas and/or results.
Note: Many of these have been published at conferences or in journals by now. A link to arXiv is provided so that you can find the newest version.
Surprises in High-Dimensional Ridgeless Least Squares Interpolation. Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani arXiv:1903.08560 [math.ST]
Exact expressions for double descent and implicit regularization via surrogate random design. Michał Dereziński, Feynman Liang, Michael W. Mahoney arXiv:1912.04533 [cs.LG]
Linearized two-layers neural networks in high dimension. Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari arXiv:1904.12191 [math.ST]
When Do Neural Networks Outperform Kernel Methods? Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari arXiv:2006.13409 [stat.ML]
Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed. Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová arXiv:2102.11742 [cs.LG]
Generalisation error in learning with random features and the hidden manifold model. Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová arXiv:2002.09339 [math.ST], https://arxiv.org/pdf/2002.09339.pdf
Triple descent and the two kinds of overfitting: Where & why do they appear? Stéphane d’Ascoli, Levent Sagun, Giulio Biroli arXiv:2006.03509 [cs.LG]
Finite-sample analysis of interpolating linear classifiers in the overparameterized regime. Niladri S. Chatterji, Philip M. Long arXiv:2004.12019v2 [stat.ML]
On the Inductive Bias of Neural Tangent Kernels. Alberto Bietti, Julien Mairal. arXiv:1905.12173 [stat.ML]
Deep Equals Shallow for ReLU Networks in Kernel Regimes. Alberto Bietti, Francis Bach arXiv:2009.14397 [stat.ML]
Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS. Lin Chen, Sheng Xu. arXiv:2009.10683 [cs.LG]
On the Similarity between the Laplace and Neural Tangent Kernels. Amnon Geifman, Abhay Yadav, Yoni Kasten, Meirav Galun, David Jacobs, Ronen Basri. arXiv:2007.01580 [cs.LG]
Nonparametric regression using deep neural networks with ReLU activation function. Johannes Schmidt-Hieber. arXiv:1708.06633 [math.ST]
Gradient Methods Never Overfit On Separable Data. Ohad Shamir. arXiv:2007.00028 [cs.LG]
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy. Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar, Jason D. Lee, Nati Srebro, Daniel Soudry arXiv:2007.06738 [cs.LG]
Learning Parities with Neural Networks. Amit Daniely, Eran Malach. arXiv:2002.07400 [cs.LG]
Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian. arXiv:1703.02930 [cs.LG]
Size-Independent Sample Complexity of Neural Networks. Noah Golowich, Alexander Rakhlin, Ohad Shamir. arXiv:1712.06541 [cs.LG]
Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks. Ziwei Ji, Matus Telgarsky. arXiv:1909.12292 [cs.LG]
Gradient Descent Finds Global Minima of Deep Neural Networks. Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai. arXiv:1811.03804 [cs.LG]
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Lenaic Chizat, Francis Bach. arXiv:1805.09545 [math.OC].
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss. Lenaic Chizat, Francis Bach. arXiv:2002.04486 [math.OC]
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks. Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak. arXiv:1903.11680 [cs.LG]
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks. Spencer Frei, Yuan Cao, Quanquan Gu. arXiv:1910.02934 [cs.LG]
Bridging Theory and Algorithm for Domain Adaptation. Yuchen Zhang*, Tianle Liu, Mingsheng Long, Michael I. Jordan (* not the same as above) arXiv:1904.05801 [cs.LG], supplement
Certifying Some Distributional Robustness with Principled Adversarial Training. Aman Sinha, Hongseok Namkoong, John Duchi. arXiv:1710.10571 [stat.ML]
Domain adaptation under structural causal models. Yuansi Chen, Peter Bühlmann arXiv:2010.15764 [stat.ML]
Minimax optimality of permutation tests. Ilmun Kim, Sivaraman Balakrishnan, Larry Wasserman arXiv:2003.13208 [math.ST]
You can choose your own paper, but you have to check with the instructor before registering it. Other possible topics: