Anka Reuel

All published works
Title Year Authors
More than Marketing? On the Information Value of AI Benchmarks for Practitioners 2024 Amelia Hardy
Anka Reuel
Kiana Jafari Meimandi
Lisa Soder
A. Griffith
Dylan M. Asmar
Sanmi Koyejo
Michael S. Bernstein
Mykel J. Kochenderfer
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices 2024 Anka Reuel
Amelia Hardy
Chandler Smith
Max Lamparth
Mary Hardy
Mykel J. Kochenderfer
Position Paper: Technical Research and Talent is Needed for Effective AI Governance 2024 Anka Reuel
Lisa Soder
Ben Bucknall
Trond Arne Undheim
Generative AI Needs Adaptive Governance 2024 Anka Reuel
Trond Arne Undheim
Escalation Risks from Language Models in Military and Diplomatic Decision-Making 2024 Juan-Pablo Rivera
Gabriel Mukobi
Anka Reuel
Max Lamparth
Chandler Smith
Jacquelyn Schneider
Analyzing and Editing Inner Mechanisms of Backdoored Language Models 2024 Max Lamparth
Anka Reuel
Artificial Intelligence Index Report 2024 2024 Nestor Maslej
Loredana Fattorini
Raymond Perrault
Vanessa Parli
Anka Reuel
Erik Brynjolfsson
John Etchemendy
Katrina Ligett
Terah Lyons
James Manyika
Analyzing and Editing Inner Mechanisms of Backdoored Language Models 2023 Max Lamparth
Anka Reuel
How to design an AI ethics board 2023 Jonas Schuett
Anka Reuel
Alexis Carlier
International Governance of Civilian AI: A Jurisdictional Certification Approach 2023 Robert F. Trager
Ben Harack
Anka Reuel
Allison Carnegie
Lennart Heim
L. Lawrence Ho
Sarah Kreps
Ranjit Lall
Owen Larter
Seán Ó hÉigeartaigh
Commonly Cited References
Title Year Authors # of times referenced
Backdoor Attacks on Self-Supervised Learning 2022 Aniruddha Saha
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
2
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings 2019 Thomas Manzini
Lim Yao Chong
Alan W. Black
Yulia Tsvetkov
2
Word embeddings quantify 100 years of gender and ethnic stereotypes 2018 Nikhil Garg
Londa Schiebinger
Dan Jurafsky
James Zou
2
Backdoor Learning: A Survey 2022 Yiming Li
Yong Jiang
Zhifeng Li
Shu-Tao Xia
2
Universal and Transferable Adversarial Attacks on Aligned Language Models 2023 Andy Zou
Zifan Wang
J. Zico Kolter
Matt Fredrikson
2
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books 2015 Yukun Zhu
Ryan Kiros
Rich Zemel
Ruslan Salakhutdinov
Raquel Urtasun
Antonio Torralba
Sanja Fidler
2
Semantics derived automatically from language corpora contain human-like biases 2017 Aylin Caliskan
Joanna J. Bryson
Arvind Narayanan
1
Decoupled Weight Decay Regularization 2017 Ilya Loshchilov
Frank Hutter
1
Are We Consistently Biased? Multidimensional Analysis of Biases in Distributional Word Vectors 2019 Anne Lauscher
Goran Glavaš
1
The Role of Cooperation in Responsible AI Development 2019 Amanda Askell
Miles Brundage
Gillian K. Hadfield
1
Optuna: A Next-generation Hyperparameter Optimization Framework 2019 Takuya Akiba
Shotaro Sano
Toshihiko Yanase
Takeru Ohta
Masanori Koyama
1
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing 2020 Inioluwa Deborah Raji
Andrew Smart
Rebecca N. White
Margaret Mitchell
Timnit Gebru
Ben Hutchinson
Jamila Smith-Loud
Daniel Theron
Parker Barnes
1
Scaling Laws for Neural Language Models 2020 Jared Kaplan
Sam McCandlish
Tom Henighan
T. B. Brown
Benjamin Chess
Rewon Child
Scott Gray
Alec Radford
Jeffrey Wu
Dario Amodei
1
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection 2020 Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
1
Weight Poisoning Attacks on Pretrained Models 2020 Keita Kurita
Paul Michel
Graham Neubig
1
Towards Debiasing Sentence Representations 2020 Paul Pu Liang
Irene Li
Emily Zheng
Yao Chong Lim
Ruslan Salakhutdinov
Louis-Philippe Morency
1
Scaling Laws for Autoregressive Generative Modeling 2020 Tom Henighan
Jared Kaplan
Mor Katz
Mark Chen
Christopher Hesse
Jacob Jackson
Heewoo Jun
T. B. Brown
Prafulla Dhariwal
Scott Gray
1
Model Cards for Model Reporting 2019 Margaret Mitchell
Simone Wu
Andrew Zaldivar
Parker Barnes
Lucy Vasserman
Ben Hutchinson
Elena Spitzer
Inioluwa Deborah Raji
Timnit Gebru
1
Adversarial Machine Learning - Industry Perspectives 2020 Ram Shankar Siva Kumar
Magnus Nyström
John Lambert
Andrew Marshall
Mario Goertzel
Andi Comissoneru
Matt Swann
Sharon Xia
1
LoRA: Low-Rank Adaptation of Large Language Models 2021 Edward J. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Weizhu Chen
1
Concealed Data Poisoning Attacks on NLP Models 2021 Eric Wallace
Tony Z. Zhao
Shi Feng
Sameer Singh
1
On the Opportunities and Risks of Foundation Models 2021 Rishi Bommasani
Drew A. Hudson
Ehsan Adeli
Russ B. Altman
Simran Arora
Sydney von Arx
Michael S. Bernstein
Jeannette Bohg
Antoine Bosselut
Emma Brunskill
1
Defining the scope of AI regulations 2023 Jonas Schuett
1
Datasets: A Community Library for Natural Language Processing 2021 Quentin Lhoest
A. Villanova del Moral
Yacine Jernite
Abhishek Thakur
Patrick von Platen
Suraj Patil
Julien Chaumond
Mariama Drame
Julien Plu
Lewis Tunstall
1
Unsolved Problems in ML Safety 2021 Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
1
Venomave: Targeted Poisoning Against Speech Recognition 2023 Hojjat Aghakhani
Lea Schönherr
Thorsten Eisenhofer
Dorothea Kolossa
Thorsten Holz
Christopher Kruegel
Giovanni Vigna
1
Adversarial Neuron Pruning Purifies Backdoored Deep Models 2021 Dongxian Wu
Yisen Wang
1
Trojaning Language Models for Fun and Profit 2021 Xinyang Zhang
Zheng Zhang
Shouling Ji
Ting Wang
1
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 2022 Jason Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Ed H. Chi
Quoc V. Le
Denny Zhou
1
Structured access: an emerging paradigm for safe AI deployment 2022 Toby Shevlane
1
Planting Undetectable Backdoors in Machine Learning Models 2022 Shafi Goldwasser
Michael P. Kim
Vinod Vaikuntanathan
Or Zamir
1
Training Compute-Optimal Large Language Models 2022 Jordan Hoffmann
Sebastian Borgeaud
Arthur Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
1
Locating and Editing Factual Associations in GPT 2022 Kevin Meng
David Bau
Alex Andonian
Yonatan Belinkov
1
Emergent Abilities of Large Language Models 2022 Jason Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
Sebastian Borgeaud
Dani Yogatama
Maarten Bosma
Denny Zhou
Donald Metzler
1
Predictability and Surprise in Large Generative Models 2022 Deep Ganguli
Danny Hernandez
Liane Lovitt
Amanda Askell
Yuntao Bai
Anna Chen
Tom Conerly
Nova DasSarma
Dawn Drain
Nelson Elhage
1
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims 2020 Miles Brundage
Shahar Avin
Jasmine Wang
Haydn Belfield
Gretchen Krueger
Gillian K. Hadfield
Heidy Khlaaf
Jingying Yang
Helen Toner
Ruth Fong
1
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics 2019 Niru Maheswaranathan
Alex H. Williams
Matthew D. Golub
Surya Ganguli
David Sussillo
1
The Brussels Effect and Artificial Intelligence: How EU regulation will impact the global AI market 2022 Charlotte Siegmann
Markus Anderljung
1
The alignment problem from a deep learning perspective 2022 Richard Ngo
1
In-context Learning and Induction Heads 2022 Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
Tom Henighan
Ben Mann
Amanda Askell
Yuntao Bai
Anna Chen
1
Triggerless Backdoor Attack for NLP Tasks with Clean Labels 2021 Leilei Gan
Jiwei Li
Tianwei Zhang
Xiaoya Li
Yuxian Meng
Fei Wu
Yi Yang
Shangwei Guo
Chun Fan
1
Spinning Sequence-to-Sequence Models with Meta-Backdoors 2021 Eugene Bagdasaryan
Vitaly Shmatikov
1
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small 2022 Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
1
Discovering Latent Knowledge in Language Models Without Supervision 2022 Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
1
Algorithmic progress in computer vision 2022 Ege Erdil
Tamay Besiroglu
1
Constitutional AI: Harmlessness from AI Feedback 2022 Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
Jackson Kernion
Andy Jones
Anna Chen
Anna Goldie
Azalia Mirhoseini
Cameron McKinnon
1
Compute Trends Across Three Eras of Machine Learning 2022 Jaime Sevilla
Lennart Heim
Anson Ho
Tamay Besiroglu
Marius Hobbhahn
P. Moreno
1
Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations 2023 Josh A. Goldstein
Girish Sastry
Micah Musser
Renée DiResta
Matthew Gentzel
Kateřina Šeďová
1
Progress measures for grokking via mechanistic interpretability 2023 Neel Nanda
Lawrence Chan
Tom Lieberum
J. Lacey Smith
Jacob Steinhardt
1
Auditing Large Language Models: A Three-Layered Approach 2023 Jakob Mökander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
1