+
|
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
|
2025
|
Yueqin Yin
Su Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Hassan Awadalla
Weizhu Chen
Mingyuan Zhou
|
+
|
StreamAdapter: Efficient Test Time Adaptation from Contextual Streams
|
2024
|
Dilxat Muhtar
Yelong Shen
Yaming Yang
Xiaodong Liu
Yadong Lu
Jianfeng Liu
Yuefeng Zhan
Hao Sun
Wei-Wei Deng
Feng Sun
|
+
|
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
|
2024
|
Yaming Yang
Dilxat Muhtar
Yelong Shen
Yuefeng Zhan
Jianfeng Liu
Yujing Wang
Hao Sun
Denvy Deng
Feng Sun
Qi Zhang
|
+
|
GRIN: GRadient-INformed MoE
|
2024
|
Liyuan Liu
Young Jin Kim
Shuohang Wang
Liang Chen
Yelong Shen
Hao Chen
Xiaodong Liu
Masahiro Tanaka
Xiaoxia Wu
Wanglai Hu
|
+
|
Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena
|
2024
|
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Qingwei Lin
Jianguang Lou
Shifeng Chen
Yansong Tang
Weizhu Chen
|
+
|
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
|
2024
|
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Liang Chen
Weizhu Chen
|
+
|
Automatic Instruction Evolving for Large Language Models
|
2024
|
Weihao Zeng
Can Xu
Yingxiu Zhao
Jian-Guang Lou
Weizhu Chen
|
+
|
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
|
2024
|
Yueqin Yin
Zhendong Wang
Yujia Xie
Weizhu Chen
Mingyuan Zhou
|
+
|
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
|
2024
|
Marah Abdin
Sam Adé Jacobs
Ammar Ahmad Awan
Jyoti Aneja
Ahmed Hassan Awadallah
Hany Awadalla
Nguyễn Bách
Amit Bahree
Arash Bakhtiari
Harkirat Singh Behl
|
+
|
Exploring the Mystery of Influential Data for Mathematical Reasoning
|
2024
|
Xinzhe Ni
Yeyun Gong
Zhibin Gou
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
|
+
|
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
|
2024
|
Yiming Huang
Xiao Liu
Yeyun Gong
Zhibin Gou
Yelong Shen
Nan Duan
Weizhu Chen
|
+
|
Multi-LoRA Composition for Image Generation
|
2024
|
Ming Zhong
Yelong Shen
Shuohang Wang
Yadong Lu
Yizhu Jiao
Siru Ouyang
Donghan Yu
Jiawei Han
Weizhu Chen
|
+
|
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
|
2024
|
Yueqin Yin
Zhendong Wang
Yi Gu
Hai Huang
Weizhu Chen
Mingyuan Zhou
|
+
|
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
|
2024
|
Xingwei He
Zhenghao Lin
Yeyun Gong
A-Long Jin
Hang Zhang
Lin Chen
Jian Jiao
Siu Ming Yiu
Nan Duan
Weizhu Chen
|
+
|
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
|
2024
|
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lü
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
|
+
|
Competition-Level Problems are Effective LLM Evaluators
|
2024
|
Yiming Huang
Zhenghao Lin
Xiao Liu
Yeyun Gong
Shuai Lu
Fangyu Lei
Yaobo Liang
Yelong Shen
Lin Chen
Nan Duan
|
+
|
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
|
2023
|
Zhihong Shao
Yeyun Gong
Yelong Shen
Minlie Huang
Nan Duan
Weizhu Chen
|
+
|
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
|
2023
|
Baolin Peng
Michel Galley
Pengcheng He
Hao Cheng
Yujia Xie
Yu Hu
Qiuyuan Huang
Lars Lidén
Yu Zhou
Weizhu Chen
|
+
|
Meet in the Middle: A New Pre-training Paradigm
|
2023
|
Anh-Tu Nguyen
Nikos Karampatziakis
Weizhu Chen
|
+
|
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
|
2023
|
Qingru Zhang
Minshuo Chen
Alexander Bukharin
Pengcheng He
Yu Cheng
Weizhu Chen
Tuo Zhao
|
+
|
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
|
2023
|
Xingwei He
Zhenghao Lin
Yeyun Gong
A-Long Jin
Hang Zhang
Lin Chen
Jian Jiao
Siu Ming Yiu
Nan Duan
Weizhu Chen
|
+
|
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
|
2023
|
Wanjun Zhong
Ruixiang Cui
Yiduo Guo
Yaobo Liang
Shuai Lü
Yanlin Wang
Amin Saied
Weizhu Chen
Nan Duan
|
+
|
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
|
2023
|
Zhendong Wang
Yifan Jiang
Huangjie Zheng
Peihao Wang
Pengcheng He
Zhangyang Wang
Weizhu Chen
Mingyuan Zhou
|
+
|
In-Context Learning Unlocked for Diffusion Models
|
2023
|
Zhendong Wang
Yifan Jiang
Yadong Lu
Yelong Shen
Pengcheng He
Weizhu Chen
Zhangyang Wang
Mingyuan Zhou
|
+
|
Code Execution with Pre-trained Language Models
|
2023
|
Chenxiao Liu
Shuai Lu
Weizhu Chen
Daxin Jiang
Alexey Svyatkovskiy
Shengyu Fu
Neel Sundaresan
Nan Duan
|
+
|
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
|
2023
|
Tong Wu
Zhihao Fan
Xiao Liu
Yeyun Gong
Yelong Shen
Jian Jiao
Hai-Tao Zheng
Juntao Li
Zhongyu Wei
Jian Guo
|
+
|
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
|
2023
|
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
|
+
|
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
|
2023
|
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
|
+
|
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
|
2023
|
Zhihong Shao
Yeyun Gong
Yelong Shen
Minlie Huang
Nan Duan
Weizhu Chen
|
+
|
Skill-Based Few-Shot Selection for In-Context Learning
|
2023
|
Shengnan An
Bo Zhou
Zeqi Lin
Qiang Fu
Bei Chen
Nanning Zheng
Weizhu Chen
Jian-Guang Lou
|
+
|
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
|
2023
|
Yixiao Li
Yifan Yu
Qingru Zhang
Chen Liang
Pengcheng He
Weizhu Chen
Tuo Zhao
|
+
|
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
|
2023
|
Xuxi Chen
Tianlong Chen
Weizhu Chen
Ahmed Hassan Awadallah
Zhangyang Wang
Yu Cheng
|
+
|
Joint Generator-Ranker Learning for Natural Language Generation
|
2023
|
Weizhou Shen
Yeyun Gong
Yelong Shen
Song Wang
Xiaojun Quan
Nan Duan
Weizhu Chen
|
+
|
Deep Reinforcement Learning from Hierarchical Weak Preference Feedback
|
2023
|
Alexander Bukharin
Yixiao Li
Pengcheng He
Weizhu Chen
Tuo Zhao
|
+
|
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
|
2023
|
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Minlie Huang
Nan Duan
Weizhu Chen
|
+
|
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency
|
2023
|
Baizhou Huang
Shuai Lu
Weizhu Chen
Xiaojun Wan
Nan Duan
|
+
|
Sparse Backpropagation for MoE Training
|
2023
|
Liyuan Liu
Jianfeng Gao
Weizhu Chen
|
+
|
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
|
2023
|
Yixiao Li
Yifan Yu
Liang Chen
Pengcheng He
Nikos Karampatziakis
Weizhu Chen
Tuo Zhao
|
+
|
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
|
2023
|
Ming Zhong
Chenxin An
Weizhu Chen
Jiawei Han
Pengcheng He
|
+
|
Learning From Mistakes Makes LLM Better Reasoner
|
2023
|
Shengnan An
Zexiong Ma
Zeqi Lin
Nanning Zheng
Jian-Guang Lou
Weizhu Chen
|
+
|
Language Models can be Logical Solvers
|
2023
|
Jiazhan Feng
Ruochen Xu
Junheng Hao
Hiteshi Sharma
Yelong Shen
Dongyan Zhao
Weizhu Chen
|
+
|
Competition-Level Problems are Effective LLM Evaluators
|
2023
|
Yiming Huang
Zhenghao Lin
Xiao Liu
Yeyun Gong
Shuai Lu
Fangyu Lei
Yaobo Liang
Yelong Shen
Chen Lin
Nan Duan
|
+
|
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
|
2023
|
Fengji Zhang
Bei Chen
Yue Zhang
Jacky Keung
Jin Liu
Daoguang Zan
Yi Mao
Jian-Guang Lou
Weizhu Chen
|
+
|
Supervised Knowledge Makes Large Language Models Better In-context Learners
|
2023
|
Linyi Yang
Shuibai Zhang
Zhuohao Yu
Guangsheng Bao
Yidong Wang
Jindong Wang
Ruochen Xu
Wei Ye
Xing Xie
Weizhu Chen
|
+
|
CERT: Continual Pre-training on Sketches for Library-oriented Code Generation
|
2022
|
Daoguang Zan
Bei Chen
Dejian Yang
Zeqi Lin
Minsu Kim
Bei Guan
Yongji Wang
Weizhu Chen
Jian-Guang Lou
|
+
|
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge
|
2022
|
Xiaoze Jiang
Yaobo Liang
Weizhu Chen
Nan Duan
|
+
|
What Makes Good In-Context Examples for GPT-3?
|
2022
|
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
|
+
|
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
|
2022
|
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
Bill Dolan
|
+
|
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
|
2022
|
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
|
+
|
CodeRetriever: Unimodal and Bimodal Contrastive Learning for Code Search
|
2022
|
Xiaonan Li
Yeyun Gong
Yelong Shen
Xipeng Qiu
Hang Zhang
Bolun Yao
Weizhen Qi
Daxin Jiang
Weizhu Chen
Nan Duan
|
+
|
Controllable Natural Language Generation with Contrastive Prefixes
|
2022
|
Qian Jing
Dong Li
Yelong Shen
Furu Wei
Weizhu Chen
|
+
|
Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models
|
2022
|
Shengnan An
Yifei Li
Zeqi Lin
Qian Liu
Bei Chen
Qiang Fu
Weizhu Chen
Nanning Zheng
Jian-Guang Lou
|
+
|
Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs
|
2022
|
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
|
+
|
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
|
2022
|
Greg Yang
J. Edward Hu
I. Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
Jakub Pachocki
Weizhu Chen
Jianfeng Gao
|
+
|
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
|
2022
|
Liang Chen
Pengcheng He
Yelong Shen
Weizhu Chen
Tuo Zhao
|
+
|
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
|
2022
|
Simiao Zuo
Qingru Zhang
Liang Chen
Pengcheng He
Tuo Zhao
Weizhu Chen
|
+
|
DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation
|
2022
|
Wei Chen
Yeyun Gong
Song Wang
Bolun Yao
Weizhen Qi
Zhongyu Wei
Xiaowu Hu
Bartuer Zhou
Yi Mao
Weizhu Chen
|
+
|
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
|
2022
|
Liang Chen
Haoming Jiang
Simiao Zuo
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Tuo Zhao
|
+
|
Reasoning Like Program Executors
|
2022
|
Xinyu Pi
Qian Liu
Bei Chen
Morteza Ziyadi
Zeqi Lin
Yan Gao
Qiang Fu
Jian-Guang Lou
Weizhu Chen
|
+
|
A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation
|
2022
|
Weizhen Qi
Yeyun Gong
Yelong Shen
Jian Jiao
Yu Yan
Houqiang Li
Ruofei Zhang
Weizhu Chen
Nan Duan
|
+
|
Diffusion-GAN: Training GANs with Diffusion
|
2022
|
Zhendong Wang
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
|
+
|
Making Large Language Models Better Reasoners with Step-Aware Verifier
|
2022
|
Yifei Li
Zeqi Lin
Shizhuo Zhang
Qiang Fu
Bei Chen
Jian-Guang Lou
Weizhu Chen
|
+
|
PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
|
2022
|
Qingru Zhang
Simiao Zuo
Liang Chen
Alexander Bukharin
Pengcheng He
Weizhu Chen
Tuo Zhao
|
+
|
Joint Generator-Ranker Learning for Natural Language Generation
|
2022
|
Weizhou Shen
Yeyun Gong
Yelong Shen
Song Wang
Xiaojun Quan
Nan Duan
Weizhu Chen
|
+
|
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering
|
2022
|
Zhengbao Jiang
Yi Mao
Pengcheng He
Graham Neubig
Weizhu Chen
|
+
|
CodeT: Code Generation with Generated Tests
|
2022
|
Bei Chen
Fengji Zhang
Anh Nguyen
Daoguang Zan
Zeqi Lin
Jian-Guang Lou
Weizhu Chen
|
+
|
ALLSH: Active Learning Guided by Local Sensitivity and Hardness
|
2022
|
Shujian Zhang
Chengyue Gong
Xingchao Liu
Pengcheng He
Weizhu Chen
Mingyuan Zhou
|
+
|
Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders
|
2022
|
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
|
+
|
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
|
2022
|
Liang Chen
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
|
+
|
Soft-Labeled Contrastive Pre-training for Function-level Code Representation
|
2022
|
Xiaonan Li
Daya Guo
Yeyun Gong
Yun Lin
Yelong Shen
Xipeng Qiu
Daxin Jiang
Weizhu Chen
Nan Duan
|
+
|
SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval
|
2022
|
Kun Zhou
Yeyun Gong
Xiao Liu
Wayne Xin Zhao
Yelong Shen
Anlei Dong
Jingwen Lü
Rangan Majumder
Ji-Rong Wen
Nan Duan
|
+
|
GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation
|
2022
|
Biyang Guo
Yeyun Gong
Yelong Shen
Songqiao Han
Hailiang Huang
Nan Duan
Weizhu Chen
|
+
|
HyperTuning: Toward Adapting Large Language Models without Back-propagation
|
2022
|
Jason Phang
Yi Mao
Pengcheng He
Weizhu Chen
|
+
|
Generation-Augmented Query Expansion For Code Retrieval
|
2022
|
Dong Li
Yelong Shen
Ruoming Jin
Yi Mao
Kuan Wang
Weizhu Chen
|
+
|
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
|
2022
|
Zhenghao Lin
Yeyun Gong
Yelong Shen
Tong Wu
Zhihao Fan
Chen Lin
Weizhu Chen
Nan Duan
|
+
|
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
|
2021
|
Pengcheng He
Jianfeng Gao
Weizhu Chen
|
+
|
Contextual Bandit Applications in a Customer Support Bot
|
2021
|
Sandra Sajeev
Jade Huang
Nikos Karampatziakis
Matthew Hall
Sebastian Kochman
Weizhu Chen
|
+
|
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
|
2021
|
Sewon Min
Jordan Boyd-Graber
Chris Alberti
Danqi Chen
Eunsol Choi
Michael J. Collins
Kelvin Guu
Hannaneh Hajishirzi
Kenton Lee
Jennimaria Palomaki
|
+
|
TAPEX: Table Pre-training via Learning a Neural SQL Executor
|
2021
|
Qian Liu
Bei Chen
Jiaqi Guo
Morteza Ziyadi
Zeqi Lin
Weizhu Chen
Jian-Guang Lou
|
+
|
Poolingformer: Long Document Modeling with Pooling Attention
|
2021
|
Hang Zhang
Yeyun Gong
Yelong Shen
Weisheng Li
Jiancheng Lv
Nan Duan
Weizhu Chen
|
+
|
Finetuning Pretrained Transformers into RNNs
|
2021
|
Jungo Kasai
Hao Peng
Yizhe Zhang
Dani Yogatama
Gabriel Ilharco
Nikolaos Pappas
Yi Mao
Weizhu Chen
Noah A. Smith
|
+
|
UnitedQA: A Hybrid Approach for Open Domain Question Answering
|
2021
|
Hao Cheng
Yelong Shen
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
|
+
|
Token-wise Curriculum Learning for Neural Machine Translation
|
2021
|
Liang Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
Tuo Zhao
|
+
|
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization
|
2021
|
Liang Chen
Simiao Zuo
Minshuo Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
Tuo Zhao
Weizhu Chen
|
+
|
LoRA: Low-Rank Adaptation of Large Language Models
|
2021
|
J. Edward Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Weizhu Chen
|
+
|
Memory-Efficient Differentiable Transformer Architecture Search
|
2021
|
Yuekai Zhao
Dong Li
Yelong Shen
Zhihua Zhang
Furu Wei
Weizhu Chen
|
+
|
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization
|
2021
|
Jiaao Chen
Dinghan Shen
Weizhu Chen
Diyi Yang
|
+
|
Generation-Augmented Retrieval for Open-Domain Question Answering
|
2021
|
Yuning Mao
Pengcheng He
Xiaodong Liu
Yelong Shen
Jianfeng Gao
Jiawei Han
Weizhu Chen
|
+
|
GLGE: A New General Language Generation Evaluation Benchmark
|
2021
|
Dayiheng Liu
Yu Yan
Yeyun Gong
Weizhen Qi
Hang Zhang
Jian Jiao
Weizhu Chen
Jie Fu
Linjun Shou
Ming Gong
|
+
|
Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach
|
2021
|
Simiao Zuo
Liang Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
Jianfeng Gao
Weizhu Chen
Tuo Zhao
|
+
|
ARCH: Efficient Adversarial Regularized Training with Caching
|
2021
|
Simiao Zuo
Liang Chen
Haoming Jiang
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Tuo Zhao
|
+
|
Adversarial Retriever-Ranker for dense text retrieval
|
2021
|
Hang Zhang
Yeyun Gong
Yelong Shen
Jiancheng Lv
Nan Duan
Weizhu Chen
|
+
|
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
|
2021
|
Xuxi Chen
Tianlong Chen
Yu Cheng
Weizhu Chen
Zhangyang Wang
Ahmed Hassan Awadallah
|
+
|
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
|
2021
|
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
|
+
|
Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering
|
2021
|
Yuning Mao
Pengcheng He
Xiaodong Liu
Yelong Shen
Jianfeng Gao
Jiawei Han
Weizhu Chen
|
+
|
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
|
2021
|
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
Bill Dolan
|
+
|
XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge
|
2021
|
Xiaoze Jiang
Yaobo Liang
Weizhu Chen
Nan Duan
|
+
|
What Makes Good In-Context Examples for GPT-3?
|
2021
|
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
|
+
|
BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining
|
2020
|
Weizhen Qi
Yeyun Gong
Jian Jiao
Yu Yan
Dayiheng Liu
Weizhu Chen
Kewen Tang
Houqiang Li
Jiusheng Chen
Ruofei Zhang
|
+
|
MixKD: Towards Efficient Distillation of Large-scale Language Models
|
2020
|
Kevin J Liang
Weituo Hao
Dinghan Shen
Yufan Zhou
Weizhu Chen
Changyou Chen
Lawrence Carin
|
+
|
CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
|
2020
|
Yanru Qu
Dinghan Shen
Yelong Shen
Sandra Sajeev
Jiawei Han
Weizhu Chen
|
+
|
Adversarial Training for Large Neural Language Models
|
2020
|
Xiaodong Liu
Hao Cheng
Pengcheng He
Weizhu Chen
Yu Wang
Hoifung Poon
Jianfeng Gao
|
+
|
Conditional Self-Attention for Query-based Summarization
|
2020
|
Yujia Xie
Tianyi Zhou
Yi Mao
Weizhu Chen
|
+
|
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
|
2020
|
Xiaodong Liu
Yu Wang
Jianshu Ji
Hao Cheng
Xueyun Zhu
Emmanuel Awa
Pengcheng He
Weizhu Chen
Hoifung Poon
Guihong Cao
|
+
|
Understanding the Difficulty of Training Transformers
|
2020
|
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
|
+
|
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
|
2020
|
Tao Shen
Yi Mao
Pengcheng He
Guodong Long
Adam Trischler
Weizhu Chen
|
+
|
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
|
2020
|
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
|
+
|
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
|
2020
|
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Tuo Zhao
|
+
|
Example-Based Named Entity Recognition
|
2020
|
Morteza Ziyadi
Yuting Sun
Abhishek Goswami
Jade Huang
Weizhu Chen
|
+
|
Generation-Augmented Retrieval for Open-domain Question Answering
|
2020
|
Yuning Mao
Pengcheng He
Xiaodong Liu
Yelong Shen
Jianfeng Gao
Jiawei Han
Weizhu Chen
|
+
|
A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation
|
2020
|
Dinghan Shen
Mingzhi Zheng
Yelong Shen
Yanru Qu
Weizhu Chen
|
+
|
Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
|
2020
|
Mingzhi Zheng
Dinghan Shen
Yelong Shen
Weizhu Chen
Lin Xiao
|
+
|
GLGE: A New General Language Generation Evaluation Benchmark
|
2020
|
Dayiheng Liu
Yu Yan
Yeyun Gong
Weizhen Qi
Hang Zhang
Jian Jiao
Weizhu Chen
Jie Fu
Linjun Shou
Ming Gong
|
+
|
Few-Shot Named Entity Recognition: A Comprehensive Study
|
2020
|
Jiaxin Huang
Chunyuan Li
Krishan Subudhi
Damien Jose
Shobana Balakrishnan
Weizhu Chen
Baolin Peng
Jianfeng Gao
Jiawei Han
|
+
|
On the Variance of the Adaptive Learning Rate and Beyond
|
2019
|
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
|
+
|
Lessons from Real-World Reinforcement Learning in a Customer Support Bot
|
2019
|
Nikos Karampatziakis
Sebastian Kochman
Jade Huang
Paul Mineiro
Kathy Osborne
Weizhu Chen
|
+
|
Multi-Task Deep Neural Networks for Natural Language Understanding
|
2019
|
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
|
+
|
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
|
2019
|
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
|
+
|
Lessons from Contextual Bandit Learning in a Customer Support Bot
|
2019
|
Nikos Karampatziakis
Sebastian Kochman
Jade Huang
Paul Mineiro
Kathy Osborne
Weizhu Chen
|
+
|
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
|
2019
|
Jianmo Ni
Chenguang Zhu
Weizhu Chen
Julian McAuley
|
+
|
A Hybrid Neural Network Model for Commonsense Reasoning
|
2019
|
Pengcheng He
Xiaodong Liu
Weizhu Chen
Jianfeng Gao
|
+
|
Parameter-free Sentence Embedding via Orthogonal Basis
|
2019
|
Ziyi Yang
Chenguang Zhu
Weizhu Chen
|
+
|
X-SQL: reinforce schema representation with context
|
2019
|
Pengcheng He
Yi Mao
Kaushik Chakrabarti
Weizhu Chen
|
+
|
Zero-training Sentence Embedding via Orthogonal Basis
|
2018
|
Ziyi Yang
Chenguang Zhu
Weizhu Chen
|
+
|
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
|
2018
|
Jianmo Ni
Chenguang Zhu
Weizhu Chen
Julian McAuley
|
+
|
IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles
|
2018
|
Tianze Shi
Kedar Tatwawadi
Kaushik Chakrabarti
Yi Mao
Oleksandr Polozov
Weizhu Chen
|
+
|
Parameter-free Sentence Embedding via Orthogonal Basis
|
2018
|
Ziyi Yang
Chenguang Zhu
Weizhu Chen
|
+
|
ReasoNet: Learning to Stop Reading in Machine Comprehension
|
2017
|
Yelong Shen
Po-Sen Huang
Jianfeng Gao
Weizhu Chen
|
+
|
Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization
|
2017
|
Ching-pei Lee
Po-Wei Wang
Weizhu Chen
Chih-Jen Lin
|
+
|
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
|
2017
|
Lin Xiao
Adams Wei Yu
Qihang Lin
Weizhu Chen
|
+
|
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
|
2017
|
Hsin-Yuan Huang
Chenguang Zhu
Yelong Shen
Weizhu Chen
|