Anka Reuel
All published works
Title
Year
Authors
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
2024
Amelia Hardy
Anka Reuel
Kiana Jafari Meimandi
Lisa Soder
A. Griffith
Dylan M. Asmar
Sanmi Koyejo
Michael S. Bernstein
Mykel J. Kochenderfer
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
2024
Anka Reuel
Amelia Hardy
Chandler Smith
Max Lamparth
Mary Hardy
Mykel J. Kochenderfer
Position Paper: Technical Research and Talent is Needed for Effective AI Governance
2024
Anka Reuel
Lisa Soder
Ben Bucknall
Trond Arne Undheim
Generative AI Needs Adaptive Governance
2024
Anka Reuel
Trond Arne Undheim
Escalation Risks from Language Models in Military and Diplomatic Decision-Making
2024
Juan-Pablo Rivera
Gabriel Mukobi
Anka Reuel
Max Lamparth
Chandler Smith
Jacquelyn Schneider
Analyzing And Editing Inner Mechanisms of Backdoored Language Models
2024
Max Lamparth
Anka Reuel
Artificial Intelligence Index Report 2024
2024
Nestor Maslej
Loredana Fattorini
Raymond Perrault
Vanessa Parli
Anka Reuel
Erik Brynjolfsson
John Etchemendy
Katrina Ligett
Terah Lyons
James Manyika
How to design an AI ethics board
2023
Jonas Schuett
Anka Reuel
Alexis Carlier
International Governance of Civilian AI: A Jurisdictional Certification Approach
2023
Robert F. Trager
Ben Harack
Anka Reuel
Allison Carnegie
Lennart Heim
L. Lawrence Ho
Sarah Kreps
Ranjit Lall
Owen Larter
Seán Ó hÉigeartaigh
Common Coauthors
Coauthor (papers together)
Max Lamparth (5)
Chandler Smith (3)
Jacquelyn Schneider (2)
Seán Ó hÉigeartaigh (2)
Lennart Heim (2)
Juan-Pablo Rivera (2)
Amelia Hardy (2)
Robert F. Trager (2)
Simon Staffell (2)
José Jaime Villalobos (2)
Ben Harack (2)
Owen Larter (2)
Trond Arne Undheim (2)
Mykel J. Kochenderfer (2)
Ranjit Lall (2)
L. Lawrence Ho (2)
Lisa Soder (2)
Allison Carnegie (2)
Sarah Kreps (2)
Gabriel Mukobi (2)
Ben Bucknall (1)
Mary Hardy (1)
Russell Wald (1)
Raymond Perrault (1)
Terah Lyons (1)
Jack A. Clark (1)
Jonas Schuett (1)
Juan Carlos Niebles (1)
Yoav Shoham (1)
Erik Brynjolfsson (1)
Alexis Carlier (1)
James Manyika (1)
John Etchemendy (1)
Michael S. Bernstein (1)
Katrina Ligett (1)
Kiana Jafari Meimandi (1)
A. Griffith (1)
Nestor Maslej (1)
Dylan M. Asmar (1)
Loredana Fattorini (1)
Sanmi Koyejo (1)
Vanessa Parli (1)
Commonly Cited References
Title
Year
Authors
# of times referenced
Backdoor Attacks on Self-Supervised Learning
2022
Aniruddha Saha
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
2
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
2019
Thomas Manzini
Yao Chong Lim
Alan W. Black
Yulia Tsvetkov
2
Word embeddings quantify 100 years of gender and ethnic stereotypes
2018
Nikhil Garg
Londa Schiebinger
Dan Jurafsky
James Zou
2
Backdoor Learning: A Survey
2022
Yiming Li
Yong Jiang
Zhifeng Li
Shu‐Tao Xia
2
Universal and Transferable Adversarial Attacks on Aligned Language Models
2023
Andy Zou
Zifan Wang
J. Zico Kolter
Matt Fredrikson
2
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
2015
Yukun Zhu
Ryan Kiros
Rich Zemel
Ruslan Salakhutdinov
Raquel Urtasun
Antonio Torralba
Sanja Fidler
2
Semantics derived automatically from language corpora contain human-like biases
2017
Aylin Caliskan
Joanna J. Bryson
Arvind Narayanan
1
Decoupled Weight Decay Regularization
2017
Ilya Loshchilov
Frank Hutter
1
Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors
2019
Anne Lauscher
Goran Glavaš
1
The Role of Cooperation in Responsible AI Development
2019
Amanda Askell
Miles Brundage
Gillian K. Hadfield
1
Optuna: A Next-generation Hyperparameter Optimization Framework
2019
Takuya Akiba
Shotaro Sano
Toshihiko Yanase
Takeru Ohta
Masanori Koyama
1
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing
2020
Inioluwa Deborah Raji
Andrew Smart
Rebecca N. White
Margaret Mitchell
Timnit Gebru
Ben Hutchinson
Jamila Smith-Loud
Daniel Theron
Parker Barnes
1
Scaling Laws for Neural Language Models
2020
Jared Kaplan
Sam McCandlish
Tom Henighan
T. B. Brown
Benjamin Chess
Rewon Child
Scott Gray
Alec Radford
Jeffrey Wu
Dario Amodei
1
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
2020
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
1
Weight Poisoning Attacks on Pretrained Models
2020
Keita Kurita
Paul Michel
Graham Neubig
1
Towards Debiasing Sentence Representations
2020
Paul Pu Liang
Irene Li
Emily Zheng
Yao Chong Lim
Ruslan Salakhutdinov
Louis-Philippe Morency
1
Scaling Laws for Autoregressive Generative Modeling
2020
Tom Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
Jacob Jackson
Heewoo Jun
T. B. Brown
Prafulla Dhariwal
Scott Gray
1
Model Cards for Model Reporting
2019
Margaret Mitchell
Simone Wu
Andrew Zaldivar
Parker Barnes
Lucy Vasserman
Ben Hutchinson
Elena Spitzer
Inioluwa Deborah Raji
Timnit Gebru
1
Adversarial Machine Learning - Industry Perspectives
2020
Ram Shankar Siva Kumar
Magnus Nyström
John Lambert
Andrew Marshall
Mario Goertzel
Andi Comissoneru
Matt Swann
Sharon Xia
1
LoRA: Low-Rank Adaptation of Large Language Models
2021
Edward J. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Weizhu Chen
1
Concealed Data Poisoning Attacks on NLP Models
2021
Eric Wallace
Tony Z. Zhao
Shi Feng
Sameer Singh
1
On the Opportunities and Risks of Foundation Models
2021
Rishi Bommasani
Drew A. Hudson
Ehsan Adeli
Russ B. Altman
Simran Arora
Sydney von Arx
Michael S. Bernstein
Jeannette Bohg
Antoine Bosselut
Emma Brunskill
1
Defining the scope of AI regulations
2023
Jonas Schuett
1
Datasets: A Community Library for Natural Language Processing
2021
Quentin Lhoest
A. Villanova del Moral
Yacine Jernite
Abhishek Thakur
Patrick von Platen
Suraj Patil
Julien Chaumond
Mariama Drame
Julien Plu
Lewis Tunstall
1
Unsolved Problems in ML Safety
2021
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
1
Venomave: Targeted Poisoning Against Speech Recognition
2023
Hojjat Aghakhani
Lea Schönherr
Thorsten Eisenhofer
Dorothea Kolossa
Thorsten Holz
Christopher Kruegel
Giovanni Vigna
1
Adversarial Neuron Pruning Purifies Backdoored Deep Models
2021
Dongxian Wu
Yisen Wang
1
Trojaning Language Models for Fun and Profit
2021
Xinyang Zhang
Zheng Zhang
Shouling Ji
Ting Wang
1
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
2022
Jason Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Ed H. Chi
Quoc V. Le
Denny Zhou
1
Structured access: an emerging paradigm for safe AI deployment
2022
Toby Shevlane
1
Planting Undetectable Backdoors in Machine Learning Models
2022
Shafi Goldwasser
Michael P. Kim
Vinod Vaikuntanathan
Or Zamir
1
Training Compute-Optimal Large Language Models
2022
Jordan Hoffmann
Sebastian Borgeaud
Arthur Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
1
Locating and Editing Factual Associations in GPT
2022
Kevin Meng
David Bau
Alex Andonian
Yonatan Belinkov
1
Emergent Abilities of Large Language Models
2022
Jason Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
Sebastian Borgeaud
Dani Yogatama
Maarten Bosma
Denny Zhou
Donald Metzler
1
Predictability and Surprise in Large Generative Models
2022
Deep Ganguli
Danny Hernandez
Liane Lovitt
Amanda Askell
Yuntao Bai
Anna Chen
Tom Conerly
Nova DasSarma
Dawn Drain
Nelson Elhage
1
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
2020
Miles Brundage
Shahar Avin
Jasmine Wang
Haydn Belfield
Gretchen Krueger
Gillian K. Hadfield
Heidy Khlaaf
Jingying Yang
Helen Toner
Ruth Fong
1
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
2019
Niru Maheswaranathan
Alex H. Williams
Matthew D. Golub
Surya Ganguli
David Sussillo
1
The Brussels Effect and Artificial Intelligence: How EU regulation will impact the global AI market
2022
Charlotte Siegmann
Markus Anderljung
1
The alignment problem from a deep learning perspective
2022
Richard Ngo
1
In-context Learning and Induction Heads
2022
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
Tom Henighan
Ben Mann
Amanda Askell
Yuntao Bai
Anna Chen
1
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
2021
Leilei Gan
Jiwei Li
Tianwei Zhang
Xiaoya Li
Yuxian Meng
Fei Wu
Yi Yang
Shangwei Guo
Chun Fan
1
Spinning Sequence-to-Sequence Models with Meta-Backdoors
2021
Eugene Bagdasaryan
Vitaly Shmatikov
1
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
2022
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
1
Discovering Latent Knowledge in Language Models Without Supervision
2022
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
1
Algorithmic progress in computer vision
2022
Ege Erdil
Tamay Besiroglu
1
Constitutional AI: Harmlessness from AI Feedback
2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
Jackson Kernion
Andy Jones
Anna Chen
Anna Goldie
Azalia Mirhoseini
Cameron McKinnon
1
Compute Trends Across Three Eras of Machine Learning
2022
Jaime Sevilla
Lennart Heim
Anson Ho
Tamay Besiroglu
Marius Hobbhahn
P. Moreno
1
Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations
2023
Josh A. Goldstein
Girish Sastry
Micah Musser
Renée DiResta
Matthew Gentzel
Kateřina Šeďová
1
Progress measures for grokking via mechanistic interpretability
2023
Neel Nanda
Lawrence Chan
Tom Lieberum
J. Lacey Smith
Jacob Steinhardt
1
Auditing Large Language Models: A Three-Layered Approach
2023
Jakob Mökander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
1