+
|
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
|
2025
|
Nanye Ma
Shangyuan Tong
Haolin Jia
Hexiang Hu
Yu-Chuan Su
Mingda Zhang
Xuan Yang
Yandong Li
Tommi Jaakkola
Xuhui Jia
|
+
|
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
|
2024
|
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Michael Rabbat
Yann LeCun
Saining Xie
Zhuang Liu
|
+
|
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
|
2024
|
Jihan Yang
Shusheng Yang
Anjali Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
|
+
|
Altogether: Image Captioning via Re-aligning Alt-text
|
2024
|
Hu Xu
Po-Yao Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
Christine Jou
Gargi Ghosh
Omer Levy
Luke Zettlemoyer
Wen-tau Yih
|
+
|
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
|
2024
|
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
|
+
|
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
|
2024
|
June Suk Choi
Kyungmin Lee
Jongheon Jeong
Saining Xie
Jinwoo Shin
Kimin Lee
|
+
|
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
|
2024
|
Wenhao Chai
Enxin Song
Yilun Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jenq‐Neng Hwang
Saining Xie
Christopher D. Manning
|
+
|
Fast Encoding and Decoding for Implicit Video Representation
|
2024
|
Hao Chen
Saining Xie
Ser-Nam Lim
Abhinav Shrivastava
|
+
|
On Scaling Up 3D Gaussian Splatting Training
|
2024
|
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
|
+
|
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
|
2024
|
Shengbang Tong
Ellis Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
Sai Charitha Akula
Jihan Yang
Shusheng Yang
Adithya Iyer
Xichen Pan
|
+
|
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
|
2024
|
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
Yifei Zhou
Alane Suhr
Saining Xie
Yann LeCun
Yi Ma
|
+
|
MoDE: CLIP Data Experts via Clustering
|
2024
|
Jiawei Ma
Po-Yao Huang
Saining Xie
Shang-Wen Li
Luke Zettlemoyer
Shih‐Fu Chang
Wen-tau Yih
Hu Xu
|
+
|
V-IRL: Grounding Virtual Intelligence in Real Life
|
2024
|
Jihan Yang
Runyu Ding
Ellis Brown
Xiaojuan Qi
Saining Xie
|
+
|
Image Sculpting: Precise Object Editing with 3D Geometry Control
|
2024
|
Jiraphon Yenphraphai
Xichen Pan
Sainan Liu
Daniele Panozzo
Saining Xie
|
+
|
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
|
2024
|
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi Ma
Yann LeCun
Saining Xie
|
+
|
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
|
2024
|
Nanye Ma
Mark Goldstein
Michael S. Albergo
Nicholas M. Boffi
Eric Vanden‐Eijnden
Saining Xie
|
+
|
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
|
2024
|
Xinlei Chen
Zhuang Liu
Saining Xie
Kaiming He
|
+
|
Going Denser with Open-Vocabulary Part Segmentation
|
2023
|
Peize Sun
Shoufa Chen
Chenchen Zhu
Fanyi Xiao
Ping Luo
Saining Xie
Zhicheng Yan
|
+
|
Scalable Diffusion Models with Transformers
|
2023
|
William Peebles
Saining Xie
|
+
|
CiT: Curation in Training for Effective Vision-Language Data
|
2023
|
Hu Xu
Saining Xie
Po-Yao Huang
Licheng Yu
Russell Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
|
+
|
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
|
2023
|
Sanghyun Woo
Shoubhik Debnath
Ronghang Hu
Xinlei Chen
Zhuang Liu
In So Kweon
Saining Xie
|
+
|
Demystifying CLIP Data
|
2023
|
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
|
+
|
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
|
2023
|
Penghao Wu
Saining Xie
|
+
|
Masked Feature Prediction for Self-Supervised Visual Pre-Training
|
2022
|
Chen Wei
Haoqi Fan
Saining Xie
Chao-Yuan Wu
Alan Yuille
Christoph Feichtenhofer
|
+
|
Masked Autoencoders Are Scalable Vision Learners
|
2022
|
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross Girshick
|
+
|
A ConvNet for the 2020s
|
2022
|
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
|
+
|
Exploring Long-Sequence Masked Autoencoders
|
2022
|
Ronghang Hu
Shoubhik Debnath
Saining Xie
Xinlei Chen
|
+
|
Scalable Diffusion Models with Transformers
|
2022
|
William Peebles
Saining Xie
|
+
|
SLIP: Self-supervision Meets Language-Image Pre-training
|
2022
|
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
|
+
|
Masked Autoencoders Are Scalable Vision Learners
|
2021
|
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross Girshick
|
+
|
An Empirical Study of Training Self-Supervised Vision Transformers
|
2021
|
Xinlei Chen
Saining Xie
Kaiming He
|
+
|
Pri3D: Can 3D Priors Help 2D Representation Learning?
|
2021
|
Ji Hou
Saining Xie
Benjamin Graham
Angela Dai
Matthias Nießner
|
+
|
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
|
2021
|
Ji Hou
Benjamin Graham
Matthias Nießner
Saining Xie
|
+
|
On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness
|
2021
|
Eric Mintun
Alexander Kirillov
Saining Xie
|
+
|
Benchmarking Detection Transfer Learning with Vision Transformers
|
2021
|
Yanghao Li
Saining Xie
Xinlei Chen
Piotr Dollár
Kaiming He
Ross Girshick
|
+
|
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision
|
2021
|
Ajinkya Tejankar
Maziar Sanjabi
Bichen Wu
Saining Xie
Madian Khabsa
Hamed Pirsiavash
Hamed Firooz
|
+
|
Masked Feature Prediction for Self-Supervised Visual Pre-Training
|
2021
|
Chen Wei
Haoqi Fan
Saining Xie
Chao-Yuan Wu
Alan Yuille
Christoph Feichtenhofer
|
+
|
SLIP: Self-supervision meets Language-Image Pre-training
|
2021
|
Norman Mu
Alexander M. Kirillov
David Wagner
Saining Xie
|
+
|
Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
|
2020
|
Ji Hou
Benjamin A.T. Graham
Matthias Nießner
Saining Xie
|
+
|
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
|
2020
|
Alvin Wan
Xiaoliang Dai
Peizhao Zhang
Zijian He
Yuandong Tian
Saining Xie
Bichen Wu
Matthew Yu
Tao Xu
Kan Chen
|
+
|
Momentum Contrast for Unsupervised Visual Representation Learning
|
2020
|
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross Girshick
|
+
|
Are Labels Necessary for Neural Architecture Search?
|
2020
|
Chenxi Liu
Piotr Dollár
Kaiming He
Ross Girshick
Alan Yuille
Saining Xie
|
+
|
Graph Structure of Neural Networks
|
2020
|
Jiaxuan You
Jure Leskovec
Kaiming He
Saining Xie
|
+
|
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
|
2020
|
Saining Xie
Jiatao Gu
Demi Guo
Charles R. Qi
Leonidas Guibas
Or Litany
|
+
|
Decoupling Representation and Classifier for Long-Tailed Recognition
|
2019
|
Bingyi Kang
Saining Xie
Marcus Rohrbach
Zhicheng Yan
Albert Gordo
Jiashi Feng
Yannis Kalantidis
|
+
|
On Network Design Spaces for Visual Recognition
|
2019
|
Ilija Radosavovic
Justin Johnson
Saining Xie
Wan‐Yen Lo
Piotr Dollár
|
+
|
Exploring Randomly Wired Neural Networks for Image Recognition
|
2019
|
Saining Xie
Alexander Kirillov
Ross Girshick
Kaiming He
|
+
|
Sample-Efficient Neural Architecture Search by Learning Action Space
|
2019
|
Linnan Wang
Saining Xie
Teng Li
Rodrigo Fonseca
Yuandong Tian
|
+
|
Momentum Contrast for Unsupervised Visual Representation Learning
|
2019
|
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross Girshick
|
+
|
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
|
2018
|
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Murphy
|
+
|
Rethinking Spatiotemporal Feature Learning For Video Understanding
|
2017
|
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Murphy
|
+
|
Aggregated Residual Transformations for Deep Neural Networks
|
2017
|
Saining Xie
Ross Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
|
+
|
Holistically-Nested Edge Detection
|
2017
|
Saining Xie
Zhuowen Tu
|
+
|
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
|
2017
|
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Murphy
|
+
|
Aggregated Residual Transformations for Deep Neural Networks
|
2016
|
Saining Xie
Ross Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
|
+
|
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior
|
2016
|
Saining Xie
Xun Huang
Zhuowen Tu
|
+
|
Holistically-Nested Edge Detection
|
2015
|
Saining Xie
Zhuowen Tu
|
+
|
Convolutional Pseudo-Prior for Structured Labeling
|
2015
|
Saining Xie
Xun Huang
Zhuowen Tu
|
+
|
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior
|
2015
|
Saining Xie
Xun Huang
Zhuowen Tu
|
+
|
Deeply-Supervised Nets
|
2014
|
Chen‐Yu Lee
Saining Xie
Patrick W. Gallagher
Zhengyou Zhang
Zhuowen Tu
|