Trevor Cai

Commonly Cited References
Action Title Year Authors # of times referenced
+ Deep Learning with Limited Numerical Precision 2015 Suyog Gupta
Ankur Agrawal
Kailash Gopalakrishnan
Pritish Narayanan
1
+ Explaining and Harnessing Adversarial Examples 2014 Ian Goodfellow
Jonathon Shlens
Christian Szegedy
1
+ Learning functions across many orders of magnitudes. 2016 Hado van Hasselt
Arthur Guez
Matteo Hessel
David Silver
1
+ Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples. 2016 Nicolas Papernot
Patrick McDaniel
Ian Goodfellow
Somesh Jha
Z. Berkay Celik
Ananthram Swami
1
+ Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples 2016 Nicolas Papernot
Patrick McDaniel
Ian Goodfellow
1
+ Concrete Problems in AI Safety 2016 Dario Amodei
Chris Olah
Jacob Steinhardt
Paul F. Christiano
John Schulman
Dan Mané
1
+ Deep Learning with Differential Privacy 2016 Martín Abadi
Andy Chu
Ian Goodfellow
H. Brendan McMahan
Ilya Mironov
Kunal Talwar
Li Zhang
1
+ Membership Inference Attacks Against Machine Learning Models 2017 Reza Shokri
Marco Stronati
Congzheng Song
Vitaly Shmatikov
1
+ Delving into Transferable Adversarial Examples and Black-box Attacks 2016 Yanpei Liu
Xinyun Chen
Chang Liu
Dawn Song
1
+ Generating Natural Adversarial Examples 2017 Zhengli Zhao
Dheeru Dua
Sameer Singh
1
+ HotFlip: White-Box Adversarial Examples for Text Classification 2018 Javid Ebrahimi
Anyi Rao
Daniel Lowd
Dejing Dou
1
+ Detecting egregious responses in neural sequence-to-sequence models 2018 Tianxing He
James Glass
1
+ The Curious Case of Neural Text Degeneration 2019 Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
1
+ Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA 2019 Yichen Jiang
Mohit Bansal
1
+ Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization 2018 Yizhe Zhang
Michel Galley
Jianfeng Gao
Zhe Gan
Xiujun Li
Chris Brockett
Bill Dolan
1
+ Auditing Data Provenance in Text-Generation Models 2019 Congzheng Song
Vitaly Shmatikov
1
+ Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog 2019 Natasha Jaques
Asma Ghandeharioun
Judy Hanwen Shen
Craig Ferguson
Àgata Lapedriza
Noah Jones
Shixiang Gu
Rosalind W. Picard
1
+ Counterfactual Fairness in Text Classification through Robustness 2019 Sahaj Garg
Vincent Perot
Nicole Limtiaco
Ankur Taly
Ed H.
Alex Beutel
1
+ Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers 2015 Giuseppe Ateniese
Luigi V. Mancini
Angelo Spognardi
Antonio Villani
Domenico Vitali
Giovanni Felici
1
+ Bag of Tricks for Efficient Text Classification 2017 Armand Joulin
Édouard Grave
Piotr Bojanowski
Tomáš Mikolov
1
+ How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation 2016 Chia‐Wei Liu
Ryan Lowe
Iulian Vlad Serban
Mike Noseworthy
Laurent Charlin
Joëlle Pineau
1
+ Ethical Challenges in Data-Driven Dialogue Systems 2018 Peter Henderson
Koustuv Sinha
Nicolas Angelard-Gontier
Nan Rosemary Ke
Genevieve Fried
Ryan Lowe
Joëlle Pineau
1
+ Adversarial Examples for Evaluating Reading Comprehension Systems 2017 Robin Jia
Percy Liang
1
+ Asynchronous Methods for Deep Reinforcement Learning 2016 Volodymyr Mnih
Adrià Puigdomènech Badia
Mehdi Mirza
Alex Graves
Tim Harley
Timothy Lillicrap
David Silver
Koray Kavukcuoglu
1
+ Neural Text Generation with Unlikelihood Training 2019 Sean Welleck
Ilia Kulikov
Stephen Roller
Emily Dinan
Kyunghyun Cho
Jason Weston
1
+ Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack 2019 Emily Dinan
Samuel Humeau
Bharath Chintagunta
Jason Weston
1
+ The Woman Worked as a Babysitter: On Biases in Language Generation 2019 Emily Sheng
Kai-Wei Chang
Prem Natarajan
Nanyun Peng
1
+ Say What I Want: Towards the Dark Side of Neural Dialogue Models 2019 Haochen Liu
Tyler Derr
Zitao Liu
Jiliang Tang
1
+ Fine-Tuning Language Models from Human Preferences 2019 Daniel M. Ziegler
Nisan Stiennon
Jeffrey Wu
T. B. Brown
Alec Radford
Dario Amodei
Paul F. Christiano
Geoffrey Irving
1
+ Universal Adversarial Triggers for Attacking and Analyzing NLP 2019 Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
1
+ Hierarchical Reinforcement Learning for Open-Domain Dialog 2020 Abdelrhman Saleh
Natasha Jaques
Asma Ghandeharioun
Judy Hanwen Shen
Rosalind W. Picard
1
+ Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models 2020 Haochen Liu
Zhiwei Wang
Tyler Derr
Jiliang Tang
1
+ Adversarial NLI: A New Benchmark for Natural Language Understanding 2020 Yixin Nie
Adina Williams
Emily Dinan
Mohit Bansal
Jason Weston
Douwe Kiela
1
+ Don’t Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training 2020 Margaret Li
Stephen Roller
Ilia Kulikov
Sean Welleck
Y-Lan Boureau
Kyunghyun Cho
Jason Weston
1
+ Negative Training for Neural Dialogue Response Generation 2020 Tianxing He
James Glass
1
+ RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models 2020 Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
1
+ Reducing Sentiment Bias in Language Models via Counterfactual Evaluation 2020 Po-Sen Huang
Huan Zhang
Ray Jiang
Robert Stanforth
Johannes Welbl
Jack W. Rae
Vishal Maini
Dani Yogatama
Pushmeet Kohli
1
+ Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? 2020 Sorami Hisamoto
Matt Post
Kevin Duh
1
+ Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP 2021 Timo Schick
Sahana Udupa
Hinrich Schütze
1
+ Universal Adversarial Attacks with Natural Triggers for Text Classification 2021 Liwei Song
Xinwei Yu
Hsuan-Tung Peng
Karthik Narasimhan
1
+ Detoxifying Language Models Risks Marginalizing Minority Voices 2021 Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
1
+ Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models 2021 Tongshuang Wu
Marco Túlio Ribeiro
Jeffrey Heer
Daniel S. Weld
1
+ HateCheck: Functional Tests for Hate Speech Detection Models 2021 Paul Röttger
Bertie Vidgen
Dong Nguyen
Zeerak Waseem
Helen Margetts
Janet B. Pierrehumbert
1
+ Tailor: Generating and Perturbing Text with Semantic Controls 2022 Alexis Ross
Tongshuang Wu
Hao Peng
Matthew N. Peters
Matt Gardner
1
+ Recursively Summarizing Books with Human Feedback 2021 Jeff Wu
Long Ouyang
Daniel M. Ziegler
Nisan Stiennon
Ryan Lowe
Jan Leike
Paul F. Christiano
1
+ Automatically Exposing Problems with Neural Dialog Models 2021 Dian Yu
Kenji Sagae
1
+ Unsolved Problems in ML Safety 2021 Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
1
+ Analyzing Dynamic Adversarial Training Data in the Limit 2022 Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
1
+ Delphi: Towards Machine Ethics and Norms 2021 Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Maxwell Forbes
Jon Borchardt
Jenny Liang
Oren Etzioni
Maarten Sap
Yejin Choi
1
+ Challenges in Detoxifying Language Models 2021 Johannes Welbl
Amelia Glaese
Jonathan Uesato
Sumanth Dathathri
John W. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
1