Progress measures for grokking via mechanistic interpretability

Type: Preprint

Publication Date: 2023-01-01

Citations: 36

DOI: https://doi.org/10.48550/arxiv.2301.05217

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks (2023). Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
  • Emergence in non-neural models: grokking modular arithmetic via average gradient outer product (2024). Neil Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin
  • Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration (2024). Chun Hei Yip, Rajashree Agrawal, Lawrence Chan, Jason Gross
  • Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation (2024). Yeachan Park, Minseok Kim, Yeoneung Kim
  • Emergent properties with repeated examples (2024). François Charton, Julia Kempe
  • A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks (2023). William Merrill, Nikolaos Tsilivis, Aman Shukla
  • Clustering and Alignment: Understanding the Training Dynamics in Modular Addition (2024). Tiberiu Musat
  • Interpreting Grokked Transformers in Complex Modular Arithmetic (2024). Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
  • Grokking modular arithmetic (2023). Andrey Gromov
  • Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability (2023). Ziming Liu, Eric Gan, Max Tegmark
  • Grokking Modular Polynomials (2024). Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov
  • Grokking Tickets: Lottery Tickets Accelerate Grokking (2023). Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
  • A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations (2023). Bilal Chughtai, Lawrence Chan, Neel Nanda
  • Feature emergence via margin maximization: case studies in algebraic tasks (2023). Depen Morwani, Benjamin Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham M. Kakade
  • Towards Understanding Grokking: An Effective Theory of Representation Learning (2022). Ziming Liu, O. Kitouni, Niklas Stefan Nolte, Eric J. Michaud, Max Tegmark, M. Williams
  • Quiver neural networks (2022). Iordan Ganev, Robin Walters
  • Survival of the Fittest Representation: A Case Study with Modular Addition (2024). Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark
  • Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking (2023). Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu
  • Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic (2024). Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou
  • One Step Back, Two Steps Forward: Interference and Learning in Recurrent Neural Networks (2019). Chen Beer, Omri Barak

Works That Cite This (17)

  • Dissecting Recall of Factual Associations in Auto-Regressive Language Models (2023). Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
  • Do Transformers Parse while Predicting the Masked Word? (2023). Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora
  • A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis (2023). Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan
  • Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions (2024). Luca Longo, Mario Brčić, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger
  • Domain-specific chatbots for science using embeddings (2023). Kevin G. Yager
  • Emergent Linear Representations in World Models of Self-Supervised Sequence Models (2023). Neel Nanda, Andrew Lee, Martin Wattenberg
  • Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions (2023). Luca Longo, Mario Brčić, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger
  • Towards a Science Exocortex (2024). Kevin G. Yager
  • Arithmetic with language models: From memorization to computation (2024). Davide Maltoni, Matteo Ferrara
  • Adversarial Attacks on the Interpretation of Neuron Activation Maximization (2024). Géraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky

Works Cited by This (0)