+ On scalable oversight with weak LLMs judging strong LLMs 2024 Zachary Kenton
Noah Y. Siegel
János Kramár
Jonah Brown-Cohen
Samuel Albanie
Jannis Bulian
Rishabh Agarwal
David Lindner
Yunhao Tang
Noah D. Goodman
+ The Ethics of Advanced AI Assistants 2024 Iason Gabriel
Arianna Manzini
Geoff Keeling
Lisa Anne Hendricks
Verena Rieser
Hasan Iqbal
Nenad Tomašev
Sofia Ira Ktena
Zachary Kenton
M. Balsa Rodríguez
+ A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI 2024 Seliem El-Sayed
Canfer Akbulut
Amanda McCroskery
Geoff Keeling
Zachary Kenton
Zaria Jalan
Nahema Marchal
Arianna Manzini
Toby Shevlane
Shannon Vallor
+ Explaining grokking through circuit efficiency 2023 Vikrant Varma
Rohin Shah
Zachary Kenton
János Kramár
Ramana Kumar
+ Challenges with unsupervised LLM knowledge discovery 2023 Sebastian Farquhar
Vikrant Varma
Zachary Kenton
Johannes Gasteiger
Vladimir Mikulik
Rohin Shah
+ Safe Deep RL in 3D Environments using Human Feedback 2022 Matthew Rahtz
Vikrant Varma
Ramana Kumar
Zachary Kenton
Shane Legg
Jan Leike
+ Discovering Agents 2022 Zachary Kenton
Ramana Kumar
Sebastian Farquhar
Jonathan G. Richens
Matt MacDermott
Tom Everitt
+ Alignment of Language Agents 2021 Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
Geoffrey Irving
+ Imitating Interactive Intelligence 2020 Josh Abramson
Arun Ahuja
Iain Barr
Arthur Brussee
Federico Carnevale
Mary Cassin
Rachita Chhaparia
Stephen R. L. Clark
Bogdan Damoc
Andrew Dudzik
+ Generalizing from a few environments in safety-critical reinforcement learning 2019 Zachary Kenton
Angelos Filos
Owain Evans
Yarin Gal
+ A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks 2019 Angelos Filos
Sebastian Farquhar
Aidan N. Gomez
Tim G. J. Rudner
Zachary Kenton
Lewis Smith
Milad Alizadeh
Arnoud de Kroon
Yarin Gal
+ DNN's Sharpest Directions Along the SGD Trajectory 2018 Stanisław Jastrzębski
Zachary Kenton
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
+ On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length 2018 Stanisław Jastrzębski
Zachary Kenton
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
+ Three Factors Influencing Minima in SGD 2017 Stanisław Jastrzębski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
Commonly Cited References
Action Title Year Authors # of times referenced
