Karina Nguyen

All published works
Action Title Year Authors
+ The Capacity for Moral Self-Correction in Large Language Models 2023 Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas T. Liao
Kamilė Lukošiūtė
Anna Chen
Anna Goldie
Azalia Mirhoseini
Catherine Olsson
Danny Hernandez
+ FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling 2023 Wei-Yin Ko
Daniel D’souza
Karina Nguyen
Randall Balestriero
Sara Hooker
+ Question Decomposition Improves the Faithfulness of Model-Generated Reasoning 2023 Ansh Radhakrishnan
Karina Nguyen
Anna Chen
Carol Chen
Carson Denison
Danny Hernandez
Esin Durmus
Evan Hubinger
Jackson Kernion
Kamilė Lukošiūtė
+ Measuring Faithfulness in Chain-of-Thought Reasoning 2023 Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson Denison
Danny Hernandez
Dustin Li
Esin Durmus
Evan Hubinger
Jackson Kernion
+ Discovering Language Model Behaviors with Model-Written Evaluations 2023 Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
Scott Heiner
Craig Pettit
Catherine Olsson
Sandipan Kundu
Saurav Kadavath
+ Studying Large Language Model Generalization with Influence Functions 2023 Roger Grosse
Juhan Bae
Cem Anil
Nelson Elhage
Alex Tamkin
Amirhossein Tajdini
Benoit Steiner
Dustin Li
Esin Durmus
Ethan Perez
+ Specific versus General Principles for Constitutional AI 2023 Sandipan Kundu
Yuntao Bai
Saurav Kadavath
Amanda Askell
A. Callahan
Anna Chen
Anna Goldie
Avital Balwit
Azalia Mirhoseini
B. T. McLean
+ Evaluating and Mitigating Discrimination in Language Model Decisions 2023 Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
+ Discovering Language Model Behaviors with Model-Written Evaluations 2022 Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
Scott Heiner
Craig Pettit
Catherine Olsson
Sandipan Kundu
Saurav Kadavath