AIM: Additional Image Guided Generation of Transferable Adversarial Attacks
|
2025
|
Teng Li
Xingjun Ma
Yu–Gang Jiang
|
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
|
2025
|
Feng Han
Kai Chen
Chao Gong
Zhipeng Wei
Jingjing Chen
Yu–Gang Jiang
|
Instruction-Guided Scene Text Recognition
|
2025
|
Yongkun Du
Zhineng Chen
Yuchen Su
Caiyan Jia
Yu–Gang Jiang
|
4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives
|
2024
|
Zeyu Yang
Zijie Pan
Xiatian Zhu
Zhang Li
Yu‐Gang Jiang
Philip H. S. Torr
|
STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection
|
2024
|
Zhangxun Li
Mengyang Zhao
Xuan Yang
Yang Liu
Jiamu Sheng
Xinhua Zeng
Tian Wang
Kewei Wu
Yu–Gang Jiang
|
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
|
2024
|
Shiduo Zhang
Zhe Xu
Peiju Liu
Xiaopeng Yu
Yuan-Fang Li
Qinghui Gao
Zhaoye Fei
Zhangyue Yin
Hang Xu
Yu‐Gang Jiang
|
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
|
2024
|
Yitong Chen
Wenhao Yao
Lingchen Meng
Sihong Wu
Zuxuan Wu
Yu‐Gang Jiang
|
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
|
2024
|
Wujian Peng
Lingchen Meng
Yitong Chen
Y.J. Hou
J.B. Xie
Yang Liu
Tao Gui
Songcen Xu
Xipeng Qiu
Zuxuan Wu
Yu‐Gang Jiang
|
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
|
2024
|
Hui Zhang
Dexiang Hong
Tingwei Gao
Yitong Wang
Jie Shao
Xinglong Wu
Zuxuan Wu
Yu‐Gang Jiang
|
DiffPatch: Generating Customizable Adversarial Patches using Diffusion Model
|
2024
|
Zhixiang Wang
Guangnan Ye
Xiaosen Wang
Siheng Chen
Zhibo Wang
Xingjun Ma
Yu‐Gang Jiang
|
SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
|
2024
|
Jinhui Yu
Xinlin Ren
Yanfeng Gu
Haitao Lin
Tianyu Wang
Yi Zhu
Hang Xu
Yu‐Gang Jiang
Xiangyang Xue
Yanwei Fu
|
LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair
|
2024
|
Song Xue
Jiequan Cui
Hanwang Zhang
Jiaxin Shi
Jingjing Chen
Chi Zhang
Yu‐Gang Jiang
|
ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
|
2024
|
Zhihao Sun
Haoran Jiang
Haoran Chen
Yixin Cao
Xipeng Qiu
Zuxuan Wu
Yu‐Gang Jiang
|
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
|
2024
|
Yongkun Du
Zhineng Chen
Hongtao Xie
Caiyan Jia
Yu–Gang Jiang
|
REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
|
2024
|
Rui Tian
Qi Dai
Jianmin Bao
Kai Qiu
Yifan Yang
Chong Luo
Zuxuan Wu
Yu–Gang Jiang
|
Retrieval Augmented Recipe Generation
|
2024
|
Guoshan Liu
Hailong Yin
Bin Zhu
Jingjing Chen
Chong‐Wah Ngo
Yu‐Gang Jiang
|
Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization
|
2024
|
Pengkun Jiao
Na Zhao
Jingjing Chen
Yu–Gang Jiang
|
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
|
2024
|
Yunhan Zhao
Xiang Zheng
Lin Luo
Yige Li
Xingjun Ma
Yu–Gang Jiang
|
Navigating Weight Prediction with Diet Diary
|
2024
|
Yuesheng Gui
Bin Zhu
Jingjing Chen
Chong‐Wah Ngo
Yu‐Gang Jiang
|
Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models
|
2024
|
Yige Li
Hanxun Huang
Jiaming Zhang
Xingjun Ma
Yu–Gang Jiang
|
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation
|
2024
|
Ye Sun
Hao Zhang
Tiehua Zhang
Xingjun Ma
Yu‐Gang Jiang
|
Towards a Theoretical Understanding of Memorization in Diffusion Models
|
2024
|
Yunhao Chen
Xingjun Ma
Difan Zou
Yu‐Gang Jiang
|
EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models
|
2024
|
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Yu‐Gang Jiang
|
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
|
2024
|
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Yu–Gang Jiang
|
A Survey on Video Diffusion Models
|
2024
|
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Han Hu
Hang Xu
Zuxuan Wu
Yu–Gang Jiang
|
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
|
2024
|
Yang Luo
Yiheng Zhang
Zhaofan Qiu
Ting Yao
Zhineng Chen
Yu–Gang Jiang
Tao Mei
|
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
|
2024
|
Haibo Yang
Yang Chen
Yingwei Pan
Ting Yao
Zhineng Chen
Zuxuan Wu
Yu–Gang Jiang
Tao Mei
|
GenRec: Unifying Video Generation and Recognition with Diffusion Models
|
2024
|
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu–Gang Jiang
|
Decoder Pre-Training with only Text for Scene Text Recognition
|
2024
|
Shuai Zhao
Yongkun Du
Zhineng Chen
Yu–Gang Jiang
|
ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack
|
2024
|
Ziyi Gao
Kai Chen
Zhipeng Wei
Tingshu Mou
Jingjing Chen
Zhiyu Tan
Hao Li
Yu‐Gang Jiang
|
EnJa: Ensemble Jailbreak on Large Language Models
|
2024
|
Jiahao Zhang
Z. Y. Wang
Ruofan Wang
Xingjun Ma
Yu–Gang Jiang
|
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning
|
2024
|
Xin Wang
Kai Chen
Xingjun Ma
Zhineng Chen
Jingjing Chen
Yu–Gang Jiang
|
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers
|
2024
|
Weijie Zheng
Xingjun Ma
Hanxun Huang
Zuxuan Wu
Yu–Gang Jiang
|
Zero-shot High-fidelity and Pose-controllable Character Animation
|
2024
|
Bingwen Zhu
Fanyi Wang
Tianyi Lu
Peng Liu
Jingwen Su
Jinxiu Liu
Yanhao Zhang
Zuxuan Wu
Guo-Jun Qi
Yu–Gang Jiang
|
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
|
2024
|
Chao Gong
Kai Chen
Zhipeng Wei
Jingjing Chen
Yu–Gang Jiang
|
RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models
|
2024
|
Pengkun Jiao
X. Wu
Bin Zhu
Jingjing Chen
Chong‐Wah Ngo
Yu–Gang Jiang
|
Out of Length Text Recognition with Sub-String Matching
|
2024
|
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu–Gang Jiang
|
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
|
2024
|
Feng Qian
Hanbin Zhao
Chao Zhang
Jiahua Dong
Henghui Ding
Yu‐Gang Jiang
Hui Qian
|
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
|
2024
|
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
Xinze Li
Xinyuan Lu
Ziyu Liu
Yan Ma
Xiaoyi Dong
|
Text-driven Video Prediction
|
2024
|
Song Xue
Jingjing Chen
Bin Zhu
Yu‐Gang Jiang
|
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models
|
2024
|
Xincheng Shuai
Henghui Ding
Xingjun Ma
Rong-Cheng Tu
Yu‐Gang Jiang
Dacheng Tao
|
Extracting Training Data from Unconditional Diffusion Models
|
2024
|
Yunhao Chen
Xingjun Ma
Difan Zou
Yu–Gang Jiang
|
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
|
2024
|
Jiaqi Wang
Yuhang Zang
Pan Zhang
Tao Chu
Yuhang Cao
Zeyi Sun
Ziyu Liu
Xiaoyi Dong
Tong Wu
Dahua Lin
|
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
|
2024
|
Junke Wang
Yi Jiang
Zehuan Yuan
Binyue Peng
Zuxuan Wu
Yu–Gang Jiang
|
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding
|
2024
|
Xing Zhang
Jiaxi Gu
Haoyu Zhao
Shicong Wang
Hang Xu
Renjing Pei
Songcen Xu
Zuxuan Wu
Yu–Gang Jiang
|
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
|
2024
|
Zhenxin Li
Kailin Li
Shihao Wang
Shiyi Lan
Zhiding Yu
Y. Y. Ji
Zhiqi Li
Ziyue Zhu
Jan Kautz
Zuxuan Wu
|
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
|
2024
|
Zhen Xing
Qi Dai
Zejia Weng
Zuxuan Wu
Yu–Gang Jiang
|
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
|
2024
|
Zhiheng Xi
Yiwen Ding
Wen-Xiang Chen
Boyang Hong
Honglin Guo
Junzhe Wang
Dingwen Yang
Chenyang Liao
Xin Guo
Wei He
|
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
|
2024
|
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu–Gang Jiang
|
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
|
2024
|
Shuyuan Tu
Qi Dai
Zihao Zhang
Sicheng Xie
Zhi-Qi Cheng
Chong Luo
Xintong Han
Zuxuan Wu
Yu–Gang Jiang
|
White-box Multimodal Jailbreaks Against Large Vision-Language Models
|
2024
|
Ruofan Wang
Xingjun Ma
Hanxu Zhou
Chuanjun Ji
Guangnan Ye
Yu‐Gang Jiang
|
Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation
|
2024
|
Zhang Li
Yuankun Yang
Ziyang Xie
Zhiyuan Yuan
Jianfeng Feng
Xiatian Zhu
Yu–Gang Jiang
|
Adaptive Retention & Correction for Continual Learning
|
2024
|
Hao Chen
Micah Goldblum
Zuxuan Wu
Yu–Gang Jiang
|
FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning
|
2024
|
Liuzhi Zhou
Yu He
Kun Zhai
Xiang Liu
Sen Liu
Xingjun Ma
Guangnan Ye
Yu‐Gang Jiang
Hongfeng Chai
|
PoseAnimate: Zero-shot high fidelity pose controllable character animation
|
2024
|
Bingwen Zhu
Fanyi Wang
Tianyi Lu
Peng Liu
Jingwen Su
Jinxiu Liu
Yanhao Zhang
Zuxuan Wu
Yu‐Gang Jiang
Guo-Jun Qi
|
Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models
|
2024
|
Yian Li
Wentao Tian
Yang Jiao
Jingjing Chen
Yu–Gang Jiang
|
The Dog Walking Theory: Rethinking Convergence in Federated Learning
|
2024
|
Kun Zhai
Yifeng Gao
Xingjun Ma
Difan Zou
Guangnan Ye
Yu‐Gang Jiang
|
Learning to Rank Patches for Unbiased Image Redundancy Reduction
|
2024
|
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu–Gang Jiang
|
OmniVid: A Generative Framework for Universal Video Understanding
|
2024
|
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu–Gang Jiang
|
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
|
2024
|
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yu‐Gang Jiang
|
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network
|
2024
|
Yuchen Su
Zhineng Chen
Zhiwen Shao
Yuning Du
Zhilong Ji
Jinfeng Bai
Yong Zhou
Yu–Gang Jiang
|
Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
|
2024
|
Yang Jiao
Zequn Jie
Shaoxiang Chen
Lechao Cheng
Jingjing Chen
Lin Ma
Yu‐Gang Jiang
|
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
|
2024
|
Qijun Feng
Zhen Xing
Zuxuan Wu
Yu–Gang Jiang
|
Whose Side Are You On? Investigating the Political Stance of Large Language Models
|
2024
|
Pagnarasmey Pit
Xingjun Ma
Mike Conway
Qingyu Chen
James Bailey
Henry Pit
Putrasmey Keo
Watey Diep
Yu–Gang Jiang
|
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
|
2024
|
Yang Jiao
Shaoxiang Chen
Zequn Jie
Jingjing Chen
Lin Ma
Yu–Gang Jiang
|
Doubly Abductive Counterfactual Inference for Text-based Image Editing
|
2024
|
Xue Song
Jiequan Cui
Hanwang Zhang
Jingjing Chen
Richang Hong
Yu–Gang Jiang
|
Instruction-Guided Scene Text Recognition
|
2024
|
Yongkun Du
Zhineng Chen
Yuchen Su
Caiyan Jia
Yu–Gang Jiang
|
MouSi: Poly-Visual-Expert Vision-Language Models
|
2024
|
Xiaoran Fan
Tao Ji
Changhao Jiang
Shuo Li
Senjie Jin
Sirui Song
Junke Wang
Boyang Hong
Lu Chen
Guodong Zheng
|
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
|
2024
|
Yige Li
Xingjun Ma
Jiabo He
Hanxun Huang
Yu–Gang Jiang
|
Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data
|
2024
|
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu–Gang Jiang
|
SonicVisionLM: Playing Sound with Vision Language Models
|
2024
|
Zhifeng Xie
Shengye Yu
Mengtian Li
Qile He
Chaofeng Chen
Yu–Gang Jiang
|
Secrets of RLHF in Large Language Models Part II: Reward Modeling
|
2024
|
Binghai Wang
Rui Zheng
Lu Chen
Yan Liu
Shihan Dou
Caishuang Huang
Wei Shen
Senjie Jin
Enyu Zhou
Chenyu Shi
|
GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting
|
2024
|
Mengtian Li
Shengxiang Yao
Zhifeng Xie
Keyu Chen
Yu–Gang Jiang
|
Identity-Driven Multimedia Forgery Detection via Reference Assistance
|
2024
|
Junhao Xu
Jingjing Chen
Song Xue
Han Feng
Haijun Shan
Yu‐Gang Jiang
|
From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios
|
2024
|
Guoshan Liu
Yang Jiao
Jingjing Chen
Bin Zhu
Yu‐Gang Jiang
|
On the Importance of Spatial Relations for Few-shot Action Recognition
|
2023
|
Yilun Zhang
Yuqian Fu
Xingjun Ma
Lizhe Qi
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
Locate Before Answering: Answer Guided Question Localization for Video Question Answering
|
2023
|
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiaowei Guo
Yu‐Gang Jiang
|
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
|
2023
|
Shuyuan Tu
Qi Dai
Zuxuan Wu
Zhi-Qi Cheng
Han Hu
Yu–Gang Jiang
|
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
|
2023
|
Tianlun Zheng
Zhineng Chen
Bingchen Huang
W. Zhang
Yu‐Gang Jiang
|
TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
|
2023
|
Tianlun Zheng
Zhineng Chen
Jinfeng Bai
Hongtao Xie
Yu‐Gang Jiang
|
Adaptive Split-Fusion Transformer
|
2023
|
Zixuan Su
Jingjing Chen
Lei Pang
Chong‐Wah Ngo
Yu‐Gang Jiang
|
PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer
|
2023
|
Yanqin Jiang
Zhang Li
Zhenwei Miao
Xiatian Zhu
Jin Gao
Weiming Hu
Yu–Gang Jiang
|
Enhancing the Self-Universality for Transferable Targeted Attacks
|
2023
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
|
2023
|
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Chuanxin Tang
Xiyang Dai
Yucheng Zhao
Yujia Xie
Lu Yuan
Yu‐Gang Jiang
|
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
|
2023
|
Lingchen Meng
Xiyang Dai
Yinpeng Chen
Pengchuan Zhang
Dongdong Chen
Mengchen Liu
Jianfeng Wang
Zuxuan Wu
Lu Yuan
Yu‐Gang Jiang
|
SVFormer: Semi-supervised Video Transformer for Action Recognition
|
2023
|
Zhen Xing
Qi Dai
Han Hu
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
|
2023
|
Yuqian Fu
Yu Xie
Yanwei Fu
Yu‐Gang Jiang
|
Prototypical Residual Networks for Anomaly Detection and Localization
|
2023
|
Hui Zhang
Zuxuan Wu
Zheng Wang
Zhineng Chen
Yu‐Gang Jiang
|
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
|
2023
|
Jiaming Zhang
Xingjun Ma
Qi Yi
Jitao Sang
Yu–Gang Jiang
Yaowei Wang
Changsheng Xu
|
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
|
2023
|
Yang Jiao
Zequn Jie
Shaoxiang Chen
Jingjing Chen
Lin Ma
Yu–Gang Jiang
|
ResFormer: Scaling ViTs with Multi-Resolution Training
|
2023
|
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu Qiao
Yu–Gang Jiang
|
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
|
2023
|
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu–Gang Jiang
|
Imbalanced gradients: a subtle cause of overestimated adversarial robustness
|
2023
|
Xingjun Ma
Linxi Jiang
Hanxun Huang
Zejia Weng
James Bailey
Yu‐Gang Jiang
|
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
|
2023
|
Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu–Gang Jiang
|
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
|
2023
|
Hao Chen
Zuxuan Wu
Xintong Han
Menglin Jia
Yu–Gang Jiang
|
DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection
|
2023
|
Hui Zhang
Zheng Wang
Zuxuan Wu
Yu–Gang Jiang
|
OmniTracker: Unifying Object Tracking by Tracking-with-Detection
|
2023
|
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Xiyang Dai
Lu Yuan
Yu–Gang Jiang
|
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
|
2023
|
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu–Gang Jiang
|
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
|
2023
|
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yu–Gang Jiang
|
Reconstructive Neuron Pruning for Backdoor Defense
|
2023
|
Yige Li
Xixiang Lyu
Xingjun Ma
Nodens Koren
Lingjuan Lyu
Bo Li
Yu–Gang Jiang
|
Prompting Large Language Models to Reformulate Queries for Moment Localization
|
2023
|
Wenfeng Yan
Shaoxiang Chen
Zuxuan Wu
Yu–Gang Jiang
|
Context Perception Parallel Decoder for Scene Text Recognition
|
2023
|
Yongkun Du
Zhineng Chen
Caiyan Jia
Xiaoting Yin
Chenxia Li
Yuning Du
Yu–Gang Jiang
|
SimDA: Simple Diffusion Adapter for Efficient Video Generation
|
2023
|
Zhen Xing
Qi Dai
Han Hu
Zuxuan Wu
Yu‐Gang Jiang
|
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
|
2023
|
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu–Gang Jiang
Hang Xu
|
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
|
2023
|
Zuxuan Wu
Zejia Weng
Wujian Peng
Xitong Yang
Ang Li
Larry S. Davis
Yu–Gang Jiang
|
A Survey on Video Diffusion Models
|
2023
|
Zhen Xing
Qijun Feng
Hao Chen
Qi Dai
Han Hu
Hang Xu
Zuxuan Wu
Yu–Gang Jiang
|
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
|
2023
|
Lingchen Meng
Xiyang Dai
Jianwei Yang
Dongdong Chen
Yinpeng Chen
Mengchen Liu
Yi‐Ling Chen
Zuxuan Wu
Lu Yuan
Yu–Gang Jiang
|
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
|
2023
|
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu–Gang Jiang
|
Adversarial Prompt Tuning for Vision-Language Models
|
2023
|
Jiaming Zhang
Xingjun Ma
Xin Wang
Lingyu Qiu
Jiaqi Wang
Yu–Gang Jiang
Jitao Sang
|
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
|
2023
|
Lingchen Meng
Shiyi Lan
Hengduo Li
Jose M. Álvarez
Zuxuan Wu
Yu–Gang Jiang
|
AdaDiff: Adaptive Step Selection for Fast Diffusion
|
2023
|
Hui Zhang
Zuxuan Wu
Zhen Xing
Jie Shao
Yu–Gang Jiang
|
VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model
|
2023
|
Haoyu Zhao
Tianyi Lu
Jiaxi Gu
Xing Zhang
Zuxuan Wu
Hang Xu
Yu–Gang Jiang
|
MotionEditor: Editing Video Motion via Content-Aware Diffusion
|
2023
|
Shuyuan Tu
Qi Dai
Zhi-Qi Cheng
Han Hu
Xintong Han
Zuxuan Wu
Yu–Gang Jiang
|
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
|
2023
|
Zhen Xing
Qi Dai
Zihao Zhang
Hui Zhang
Han Hu
Zuxuan Wu
Yu–Gang Jiang
|
Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
|
2023
|
Yang Jiao
Zequn Jie
Shaoxiang Chen
Lechao Cheng
Jingjing Chen
Lin Ma
Yu‐Gang Jiang
|
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
|
2023
|
Yuehao Yin
Huiyan Qi
Bin Zhu
Jingjing Chen
Yu–Gang Jiang
Chong‐Wah Ngo
|
Self-Supervised Learning for Semi-Supervised Temporal Language Grounding
|
2022
|
Fan Luo
Shaoxiang Chen
Jingjing Chen
Zuxuan Wu
Yu–Gang Jiang
|
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
|
2022
|
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu‐Gang Jiang
|
TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning
|
2022
|
Linhai Zhuo
Yuqian Fu
Jingjing Chen
Yixin Cao
Yu‐Gang Jiang
|
ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning
|
2022
|
Yuqian Fu
Yu Xie
Yanwei Fu
Jingjing Chen
Yu‐Gang Jiang
|
Video Moment Retrieval from Text Queries via Single Frame Annotation
|
2022
|
Ran Cui
Tianwen Qian
Pai Peng
Elena Daskalaki
Jingjing Chen
Xiaowei Guo
Huyang Sun
Yu–Gang Jiang
|
SVTR: Scene Text Recognition with a Single Visual Model
|
2022
|
Yongkun Du
Zhineng Chen
Caiyan Jia
Xiaoting Yin
Tianlun Zheng
Chenxia Li
Yuning Du
Yu–Gang Jiang
|
Towards Transferable Adversarial Attacks on Vision Transformers
|
2022
|
Zhipeng Wei
Jingjing Chen
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu‐Gang Jiang
|
Boosting the Transferability of Video Adversarial Examples via Temporal Translation
|
2022
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
Attacking Video Recognition Models with Bullet-Screen Comments
|
2022
|
Kai Chen
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
|
2022
|
Junke Wang
Zuxuan Wu
Wenhao Ouyang
Xintong Han
Jingjing Chen
Yu–Gang Jiang
Ser-Nam Lim
|
BEVT: BERT Pretraining of Video Transformers
|
2022
|
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu‐Gang Jiang
Luowei Zhou
Lu Yuan
|
Balanced Contrastive Learning for Long-Tailed Visual Recognition
|
2022
|
Jianggang Zhu
Zheng Wang
Jingjing Chen
Yi‐Ping Phoebe Chen
Yu‐Gang Jiang
|
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
|
2022
|
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu–Gang Jiang
Ser-Nam Lim
|
Cross-Modal Transferable Adversarial Attacks from Images to Videos
|
2022
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu–Gang Jiang
|
ObjectFormer for Image Manipulation Detection and Localization
|
2022
|
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu–Gang Jiang
|
FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting
|
2022
|
Junke Wang
Shaoxiang Chen
Zuxuan Wu
Yu–Gang Jiang
|
Cross-Domain Contrastive Learning for Unsupervised Domain Adaptation
|
2022
|
Rui Wang
Zuxuan Wu
Zejia Weng
Jingjing Chen
Guo-Jun Qi
Yu‐Gang Jiang
|
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
|
2022
|
Yang Jiao
Shaoxiang Chen
Zequn Jie
Jingjing Chen
Lin Ma
Yu–Gang Jiang
|
Adaptive Split-Fusion Transformer
|
2022
|
Zixuan Su
Hao Zhang
Jingjing Chen
Lei Pang
Chong‐Wah Ngo
Yu‐Gang Jiang
|
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
|
2022
|
Yang Jiao
Zequn Jie
Jingjing Chen
Lin Ma
Yu–Gang Jiang
|
Efficient Video Transformers with Spatial-Temporal Token Selection
|
2022
|
Junke Wang
Xitong Yang
Hengduo Li
Li Li Liu
Zuxuan Wu
Yu–Gang Jiang
|
Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning
|
2022
|
Yuqian Fu
Yu Xie
Yanwei Fu
Jingjing Chen
Yu–Gang Jiang
|
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
|
2022
|
Lingchen Meng
Xiyang Dai
Yinpeng Chen
Pengchuan Zhang
Dongdong Chen
Mengchen Liu
Jianfeng Wang
Zuxuan Wu
Lu Yuan
Yu‐Gang Jiang
|
PolarFormer: Multi-camera 3D Object Detection with Polar Transformer
|
2022
|
Yanqin Jiang
Zhang Li
Zhenwei Miao
Xiatian Zhu
Jin Gao
Weiming Hu
Yu–Gang Jiang
|
Deeper Insights into the Robustness of ViTs towards Common Corruptions
|
2022
|
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu–Gang Jiang
|
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
|
2022
|
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu–Gang Jiang
|
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
|
2022
|
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu–Gang Jiang
Lu Yuan
|
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
|
2022
|
Yang Jiao
Zequn Jie
Shaoxiang Chen
Jingjing Chen
Xiaolin Wei
Lin Ma
Yu–Gang Jiang
|
Enhancing the Self-Universality for Transferable Targeted Attacks
|
2022
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu–Gang Jiang
|
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
|
2022
|
Hao Chen
Zuxuan Wu
Yu–Gang Jiang
|
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors
|
2022
|
Zhen Xing
Hengduo Li
Zuxuan Wu
Yu–Gang Jiang
|
Locate before Answering: Answer Guided Question Localization for Video Question Answering
|
2022
|
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiaowei Guo
Yu–Gang Jiang
|
Text-driven Video Prediction
|
2022
|
Song Xue
Jingjing Chen
Bin Zhu
Yu–Gang Jiang
|
Generalized Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data
|
2022
|
Yuqian Fu
Yanwei Fu
Jingjing Chen
Yu‐Gang Jiang
|
SVFormer: Semi-supervised Video Transformer for Action Recognition
|
2022
|
Zhen Xing
Qi Dai
Han Hu
Jingjing Chen
Zuxuan Wu
Yu–Gang Jiang
|
Transferability Estimation Based On Principal Gradient Expectation
|
2022
|
Huiyan Qi
Lechao Cheng
Jingjing Chen
Yue Yu
Zunlei Feng
Yu–Gang Jiang
|
ResFormer: Scaling ViTs with Multi-Resolution Training
|
2022
|
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu Qiao
Yu–Gang Jiang
|
Prototypical Residual Networks for Anomaly Detection and Localization
|
2022
|
Hui Zhang
Zuxuan Wu
Zheng Wang
Zhineng Chen
Yu–Gang Jiang
|
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
|
2022
|
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu–Gang Jiang
|
Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection
|
2022
|
Junke Wang
Zhenxin Li
Chao Zhang
Jingjing Chen
Zuxuan Wu
Larry S. Davis
Yu–Gang Jiang
|
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
|
2022
|
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Chuanxin Tang
Xiyang Dai
Yucheng Zhao
Yujia Xie
Lu Yuan
Yu–Gang Jiang
|
Semi-supervised Vision Transformers
|
2022
|
Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu‐Gang Jiang
|
Cross-Modal Transferable Adversarial Attacks from Images to Videos
|
2021
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
|
2021
|
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu‐Gang Jiang
|
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
|
2021
|
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu‐Gang Jiang
Ser-Nam Lim
|
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
|
2021
|
Tianlun Zheng
Zhineng Chen
Shancheng Fang
Hongtao Xie
Yu–Gang Jiang
|
Two-stage Visual Cues Enhancement Network for Referring Image Segmentation
|
2021
|
Yang Jiao
Zequn Jie
Weixin Luo
Jingjing Chen
Yu–Gang Jiang
Xiaolin Wei
Lin Ma
|
A Multimodal Framework for Video Ads Understanding
|
2021
|
Zejia Weng
Lingchen Meng
Rui Wang
Zuxuan Wu
Yu–Gang Jiang
|
Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
|
2021
|
Bojia Zi
Shihao Zhao
Xingjun Ma
Yu‐Gang Jiang
|
VideoLT: Large-scale Long-tailed Video Recognition
|
2021
|
Xing Zhang
Zuxuan Wu
Zejia Weng
Huazhu Fu
Jingjing Chen
Yu–Gang Jiang
Larry S. Davis
|
Can Action be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos
|
2021
|
Yuqian Fu
Yanwei Fu
Yu‐Gang Jiang
|
Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data
|
2021
|
Yuqian Fu
Yanwei Fu
Yu–Gang Jiang
|
Ultrafast non-volatile flash memory based on van der Waals heterostructures
|
2021
|
Lan Liu
Chunsen Liu
Lilai Jiang
Jiayi Li
Yi Ding
Shuiyuan Wang
Yu–Gang Jiang
Yabin Sun
Jianlu Wang
Shiyou Chen
|
Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness
|
2021
|
Linxi Jiang
Xingjun Ma
Zejia Weng
James Bailey
Yu–Gang Jiang
|
HMS: Hierarchical Modality Selection for Efficient Video Recognition
|
2021
|
Zejia Weng
Zuxuan Wu
Hengduo Li
Yu–Gang Jiang
|
Long-Term Cloth-Changing Person Re-identification
|
2021
|
Xuelin Qian
Wenxuan Wang
Li Zhang
Fangrui Zhu
Yanwei Fu
Tao Xiang
Yu‐Gang Jiang
Xiangyang Xue
|
WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection
|
2021
|
Bojia Zi
Minghao Chang
Jingjing Chen
Xingjun Ma
Yu–Gang Jiang
|
What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space
|
2021
|
Shihao Zhao
Xingjun Ma
Yisen Wang
James Bailey
Bo Li
Yu–Gang Jiang
|
Cross-domain Contrastive Learning for Unsupervised Domain Adaptation
|
2021
|
Rui Wang
Zuxuan Wu
Zejia Weng
Jingjing Chen
Guo-Jun Qi
Yu‐Gang Jiang
|
+
|
A Multimodal Framework for Video Ads Understanding
|
2021
|
Zejia Weng
Lingchen Meng
Rui Wang
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Self-supervised Learning for Semi-supervised Temporal Language Grounding
|
2021
|
Fan Luo
Shaoxiang Chen
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Two-stage Visual Cues Enhancement Network for Referring Image Segmentation
|
2021
|
Yang Jiao
Zequn Jie
Weixin Luo
Jingjing Chen
Yu‐Gang Jiang
Xiaolin Wei
Lin Ma
|
+
|
CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition
|
2021
|
Tianlun Zheng
Zhineng Chen
Shancheng Fang
Hongtao Xie
Yu‐Gang Jiang
|
+
|
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
|
2021
|
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu‐Gang Jiang
Ser-Nam Lim
|
+
|
Semi-Supervised Vision Transformers
|
2021
|
Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Efficient Video Transformers with Spatial-Temporal Token Selection
|
2021
|
Junke Wang
Xitong Yang
Hengduo Li
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
|
2021
|
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu‐Gang Jiang
|
+
|
Cross-Modal Transferable Adversarial Attacks from Images to Videos
|
2021
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
BEVT: BERT Pretraining of Video Transformers
|
2021
|
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu‐Gang Jiang
Luowei Zhou
Lu Yuan
|
+
|
FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting
|
2021
|
Junke Wang
Shaoxiang Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
|
2021
|
Junke Wang
Zuxuan Wu
Wenhao Ouyang
Xintong Han
Jingjing Chen
Ser-Nam Lim
Yu‐Gang Jiang
|
+
|
Boosting the Transferability of Video Adversarial Examples via Temporal Translation
|
2021
|
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Towards Transferable Adversarial Attacks on Vision Transformers
|
2021
|
Zhipeng Wei
Jingjing Chen
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu‐Gang Jiang
|
+
|
Attacking Video Recognition Models with Bullet-Screen Comments
|
2021
|
Kai Chen
Zhipeng Wei
Jingjing Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos
|
2020
|
Zhiqin Zhan
Huazhu Fu
Yan-Yao Yang
Jingjing Chen
Jie Liu
Yu‐Gang Jiang
|
+
|
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition
|
2020
|
Yuqian Fu
Li Zhang
Junke Wang
Yanwei Fu
Yu‐Gang Jiang
|
+
|
Multi-modal Cooking Workflow Construction for Food Recipes
|
2020
|
Liangming Pan
Jingjing Chen
Jianlong Wu
Shaoteng Liu
Chong‐Wah Ngo
Min‐Yen Kan
Yu‐Gang Jiang
Tat‐Seng Chua
|
+
|
Clean-Label Backdoor Attacks on Video Recognition Models
|
2020
|
Shihao Zhao
Xingjun Ma
Xiang Zheng
James Bailey
Jingjing Chen
Yu‐Gang Jiang
|
+
|
Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt
|
2020
|
Hangyu Lin
Yanwei Fu
Xiangyang Xue
Yu‐Gang Jiang
|
+
|
Heuristic Black-Box Adversarial Attacks on Video Recognition Models
|
2020
|
Zhipeng Wei
Jingjing Chen
Xingxing Wei
Linxi Jiang
Tat‐Seng Chua
Fengfeng Zhou
Yu‐Gang Jiang
|
+
|
Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition
|
2020
|
Wenxuan Wang
Yanwei Fu
Qiang Sun
Tao Chen
Chenjie Cao
Ziqi Zheng
Guoqiang Xu
Han Qiu
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt
|
2020
|
Hangyu Lin
Yanwei Fu
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Long-Term Cloth-Changing Person Re-identification
|
2020
|
Xuelin Qian
Wenxuan Wang
Li Zhang
Fangrui Zhu
Yanwei Fu
Tao Xiang
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
|
2020
|
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu‐Gang Jiang
|
+
|
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness
|
2020
|
Xingjun Ma
Linxi Jiang
Hanxun Huang
Zejia Weng
James Bailey
Yu‐Gang Jiang
|
+
|
Black-box Adversarial Attacks on Video Recognition Models
|
2019
|
Linxi Jiang
Xingjun Ma
Shaoxiang Chen
James Bailey
Yu‐Gang Jiang
|
+
|
Composite Binary Decomposition Networks
|
2019
|
You Qiaoben
Zheng Wang
Jianguo Li
Yinpeng Dong
Yu‐Gang Jiang
Jun Zhu
|
+
|
Multi-Level Semantic Feature Augmentation for One-Shot Learning
|
2019
|
Zitian Chen
Yanwei Fu
Yinda Zhang
Yu‐Gang Jiang
Xiangyang Xue
Leonid Sigal
|
+
|
Object Detection from Scratch with Deep Supervision
|
2019
|
Zhiqiang Shen
Zhuang Liu
Jianguo Li
Yu‐Gang Jiang
Yurong Chen
Xiangyang Xue
|
+
|
Vocabulary-Informed Zero-Shot and Open-Set Learning
|
2019
|
Yanwei Fu
Xiaomei Wang
Hanze Dong
Yu‐Gang Jiang
Meng Wang
Xiangyang Xue
Leonid Sigal
|
+
|
A Multi-Task Neural Approach for Emotion Attribution, Classification, and Summarization
|
2019
|
Guoyun Tu
Yanwei Fu
Boyang Li
Jiarui Gao
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging
|
2019
|
Jinhui Tang
Xiangbo Shu
Zechao Li
Yu‐Gang Jiang
Qi Tian
|
+
|
Non-local NetVLAD Encoding for Video Classification
|
2019
|
Yongyi Tang
Xing Zhang
Jingwen Wang
Shaoxiang Chen
Lin Ma
Yu‐Gang Jiang
|
+
|
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
|
2019
|
Zuxuan Wu
Caiming Xiong
Yu‐Gang Jiang
Larry S. Davis
|
+
|
Heuristic Black-box Adversarial Attacks on Video Recognition Models
|
2019
|
Zhipeng Wei
Jingjing Chen
Xingxing Wei
Linxi Jiang
Tat‐Seng Chua
Fengfeng Zhou
Yu‐Gang Jiang
|
+
|
Learning to Separate Domains in Generalized Zero-Shot and Open Set Learning: a probabilistic perspective
|
2018
|
Hanze Dong
Yanwei Fu
Leonid Sigal
Sung Ju Hwang
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
NAIS: Neural Attentive Item Similarity Model for Recommendation
|
2018
|
Xiangnan He
Zhankui He
Jingkuan Song
Zhenguang Liu
Yu‐Gang Jiang
Tat‐Seng Chua
|
+
|
Semantic Feature Augmentation in Few-shot Learning
|
2018
|
Zitian Chen
Yanwei Fu
Yinda Zhang
Yu‐Gang Jiang
Xiangyang Xue
Leonid Sigal
|
+
|
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
|
2018
|
Yu‐Gang Jiang
Zuxuan Wu
Jinhui Tang
Zechao Li
Xiangyang Xue
Shih‐Fu Chang
|
+
|
Learning to score and summarize figure skating sport videos
|
2018
|
Bing Zhang
Chengming Xu
Changmao Cheng
Yanwei Fu
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
|
2018
|
Nanyang Wang
Yinda Zhang
Zhuwen Li
Yanwei Fu
Wei Liu
Yu‐Gang Jiang
|
+
|
Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging
|
2018
|
Jinhui Tang
Xiangbo Shu
Zechao Li
Yu‐Gang Jiang
Qi Tian
|
+
|
Learning to score the figure skating sports videos
|
2018
|
Chengming Xu
Yanwei Fu
Bing Zhang
Zitian Chen
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks
|
2018
|
Minjun Li
Haozhi Huang
Lin Ma
Wei Liu
Tong Zhang
Yu‐Gang Jiang
|
+
|
Recurrent Fusion Network for Image Captioning
|
2018
|
Wenhao Jiang
Lin Ma
Yu‐Gang Jiang
Wei Liu
Tong Zhang
|
+
|
Object Detection from Scratch with Deep Supervision
|
2018
|
Zhiqiang Shen
Zhuang Liu
Jianguo Li
Yu‐Gang Jiang
Yurong Chen
Xiangyang Xue
|
+
|
Non-local NetVLAD Encoding for Video Classification
|
2018
|
Yongyi Tang
Xing Zhang
Jingwen Wang
Shaoxiang Chen
Lin Ma
Yu‐Gang Jiang
|
+
|
Composite Binary Decomposition Networks
|
2018
|
You Qiaoben
Zheng Wang
Jianguo Li
Yinpeng Dong
Yu‐Gang Jiang
Jun Zhu
|
+
|
Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network
|
2018
|
Peng Lu
Hangyu Lin
Yanwei Fu
Shaogang Gong
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization
|
2018
|
Guoyun Tu
Yanwei Fu
Boyang Li
Jiarui Gao
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Pose-Normalized Image Generation for Person Re-identification
|
2018
|
Xuelin Qian
Yanwei Fu
Tao Xiang
Wenxuan Wang
Jie Qiu
Yang Wu
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Deep learning for video classification and captioning
|
2017
|
Zuxuan Wu
Ting Yao
Yanwei Fu
Yu‐Gang Jiang
|
+
|
Left-Right Skip-DenseNets for Coarse-to-Fine Object Categorization
|
2017
|
Changmao Cheng
Yanwei Fu
Wenlian Lu
Yu‐Gang Jiang
Jianfeng Feng
Xiangyang Xue
|
+
|
Learning Fashion Compatibility with Bidirectional LSTMs
|
2017
|
Xintong Han
Zuxuan Wu
Yu‐Gang Jiang
Larry S. Davis
|
+
|
Multi-scale Deep Learning Architectures for Person Re-identification
|
2017
|
Xuelin Qian
Yanwei Fu
Yu‐Gang Jiang
Tao Xiang
Xiangyang Xue
|
+
|
DSOD: Learning Deeply Supervised Object Detectors from Scratch
|
2017
|
Zhiqiang Shen
Zhuang Liu
Jianguo Li
Yu‐Gang Jiang
Yurong Chen
Xiangyang Xue
|
+
|
Iterative object and part transfer for fine-grained recognition
|
2017
|
Zhiqiang Shen
Yu‐Gang Jiang
Dequan Wang
Xiangyang Xue
|
+
|
Weakly Supervised Dense Video Captioning
|
2017
|
Zhiqiang Shen
Jianguo Li
Su Zhou
Minjun Li
Yurong Chen
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks
|
2017
|
Yu‐Gang Jiang
Zuxuan Wu
Jun Wang
Xiangyang Xue
Shih‐Fu Chang
|
+
|
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
|
2017
|
Yu‐Gang Jiang
Zuxuan Wu
Jinhui Tang
Zechao Li
Xiangyang Xue
Shih‐Fu Chang
|
+
|
Aggregating Frame-level Features for Large-Scale Video Classification
|
2017
|
Shaoxiang Chen
Xi Wang
Yongyi Tang
Xinpeng Chen
Zuxuan Wu
Yu‐Gang Jiang
|
+
|
Recent Advances in Zero-shot Recognition
|
2017
|
Yanwei Fu
Tao Xiang
Yu‐Gang Jiang
Xiangyang Xue
Leonid Sigal
Shaogang Gong
|
+
|
Pose-Normalized Image Generation for Person Re-identification
|
2017
|
Xuelin Qian
Yanwei Fu
Tao Xiang
Wenxuan Wang
Jie Qiu
Yang Wu
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Dual Skipping Networks
|
2017
|
Changmao Cheng
Yanwei Fu
Yu‐Gang Jiang
Wei Liu
Wenlian Lu
Jianfeng Feng
Xiangyang Xue
|
+
|
The THUMOS challenge on action recognition for videos “in the wild”
|
2016
|
Haroon Idrees
Amir Zamir
Yu‐Gang Jiang
Alex Gorban
Ivan Laptev
Rahul Sukthankar
Mubarak Shah
|
+
|
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
|
2016
|
Baohan Xu
Yanwei Fu
Yu‐Gang Jiang
Boyang Li
Leonid Sigal
|
+
|
On stochastic primal-dual hybrid gradient approach for compositely regularized minimization
|
2016
|
Linbo Qiao
Tianyi Lin
Yu‐Gang Jiang
Fan Yang
Wei Liu
Xicheng Lu
|
+
|
Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification
|
2015
|
Zuxuan Wu
Xi Wang
Yu‐Gang Jiang
Hao Ye
Xiangyang Xue
|
+
|
Evaluating Two-Stream CNN for Video Classification
|
2015
|
Hao Ye
Zuxuan Wu
Rui-Wei Zhao
Xi Wang
Yu‐Gang Jiang
Xiangyang Xue
|
+
|
Fusing Multi-Stream Deep Networks for Video Classification
|
2015
|
Zuxuan Wu
Yu‐Gang Jiang
Xi Wang
Hao Ye
Xiangyang Xue
Jun Wang
|