Faculty Profiles - CHEN Long | The Hong Kong University of Science and Technology

Long CHEN
陳隆

PhD in Computer Science and Technology
Zhejiang University, 2020

Assistant Professor
Department of Computer Science and Engineering

(852) 2358 8836
longchen@ust.hk
Room CYT3003

Google Scholar: -gtmMpIAAAAJ
ORCID: 0000-0001-6148-9709
ResearcherID: HZJ-7271-2023
Scopus ID: 57195626216

Research Interest

Computer vision
Artificial intelligence
Machine learning
Multimedia computing

Publications

All years: 125; 2026: 8; 2025: 35; 2024: 24; 2023: 19; 2022: 18; 2021: 11; 2020: 10

2026 8

Multi-level Compositional Feature Augmentation for Unbiased Scene Graph Generation

International Journal of Computer Vision, v. 134, (6), article number 301
Li, Lin; Li, Xingchen; Sun, Chong; Li, Chen; Chen, Long
Article

Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

International Journal of Computer Vision, v. 134, (3), article number 135
Han, Guangxing; Chen, Long; Ma, Jiawei; Huang, Shiyuan; Chellappa, Rama; Chang, Shih Fu
Article

Physically Plausible Human–Object Rendering From Sparse Views via 3D Gaussian Splatting

IEEE Transactions on Image Processing, v. 35, p. 3938-3953, article number 11481577
Wang, Weiquan; Xiao, Jun; Yang, Yi; Zhuang, Yueting; Chen, Long
Article

Towards Customized Knowledge Distillation for Efficient Dense Image Predictions

Transactions on Machine Learning Research, v. 2026-April
Zhang, Dong; Dong, Pingcheng; Chen, Long; Cheng, Kwang Ting
Article

Empowering Small VLMs to Think with Dynamic Memorization and Exploration

14th International Conference on Learning Representations, International Conference on Learning Representations, ICLR, 2026
Liu, Jiazhen; Deng, Yuchuan; Chen, Long
Conference paper

Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (11), p. 9386-9394
Tang, Haomiao; Wang, Jinpeng; Zhao, Minyi; Meng, Guanghao; Luo, Ruisheng; Chen, Long; Xia, Shu Tao
Conference paper

Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (12), p. 10332-10340
Wang, Yuxuan; Yi, Xuanyu; Xu, Qingshan; Zhou, Yuan; Chen, Long; Zhang, Hanwang
Conference paper

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (8), p. 6306-6314
Li, Lin; Chen, Wei; Li, Jiahui; Cheng, Kwang Ting; Chen, Long
Conference paper

2025 35

Cross-Modal Conditioned Reconstruction for Language-Guided Medical Image Segmentation

IEEE Transactions on Medical Imaging, v. 44, (4), p. 1821-1835
Huang, Xiaoshuang; Li, Hongxiang; Cao, Meng; Chen, Long; You, Chenyu; An, Dong
Article

Eliminating Semantic Ambiguity in Human Pose Estimation via Stable Feature Upsampling

IEEE Transactions on Circuits and Systems for Video Technology, v. 35, (12), p. 11863-11876, article number 11071896
Jiang, Shu; Zhang, Dong; Yan, Rui; Shu, Xiangbo; Dong, Pingcheng; Chen, Long; Du, Xiaoyu
Article

ENCODE: Breaking the Trade-Off Between Performance and Efficiency in Long-Term User Behavior Modeling

IEEE Transactions on Knowledge and Data Engineering, v. 37, (1), p. 265-277
Zhou, Wen Ji; Zheng, Yuhang; Feng, Yinfu; Ye, Yunan; Xiao, Rong; Chen, Long; Yang, Xiaosong; Xiao, Jun
Article

From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation

International Journal of Computer Vision, v. 133, (1), p. 489-508, article number 102911
Shi, Hanrong; Li, Lin; Xiao, Jun; Zhuang, Yueting; Chen, Long
Article

Knowledge Integration for Grounded Situation Recognition

Pattern Recognition, v. 167, p. 1-12, article number 111766
Lei, Jiaming; Wu, Sijing; Li, Lin; Chen, Lei; Xiao, Jun; Yang, Yi; Chen, Long
Article

Learning Combinatorial Prompts for Universal Controllable Image Captioning

International Journal of Computer Vision, v. 133, (1), p. 129-150
Wang, Zhen; Xiao, Jun; Zhuang, Yueting; Gao, Fei; Shao, Jian; Chen, Long
Article

Recent Advances in Finetuning Multimodal Large Language Models

AI Magazine, v. 46, (3), article number e70025
Wang, Zhen; Li, Lin; Chen, Long
Article

An Efficient and Effective Transformer Decoder-Based Framework for Multi-task Visual Grounding

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 125-141
Chen, Wei; Chen, Long; Wu, Yu
Conference paper

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Proceedings of Machine Learning Research, v. 267, p. 18550-18565
Gao, Kaifeng; Shi, Jiaxin; Zhang, Hanwang; Wang, Chunping; Xiao, Jun; Chen, Long
Conference paper

CLIPDrag: Combining Text-Based and Drag-Based Instructions for Image Editing

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 3971-3987
Jiang, Ziqi; Wang, Zhen; Chen, Long
Conference paper

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 8073-8082
Chen, Wei; Li, Lin; Yang, Yongqi; Wen, Bin; Yang, Fan; Gao, Tingting; Wu, Yu; Chen, Long
Conference paper

Compositional Zero-shot Learning via Progressive Language-based Observations

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 3827-3836
Li, Lin; Chen, Guikun; Wang, Zhen; Xiao, Jun; Chen, Long
Conference paper

Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 49610-49624
Zhang, Chuhan; Zhu, Chaoyang; Dong, Pingcheng; Chen, Long; Zhang, Dong
Conference paper

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 365-381
Wang, Zhen; Jiang, Xinyun; Xiao, Jun; Chen, Tao; Chen, Long
Conference paper

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States, p. 1-23
Chen, Wei; Yan, Xin; Wen, Bin; Yang, Fan; Gao, Tingting; Zhang, Di; Chen, Long
Conference paper

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 20173-20191
Li, Hongxiang; Li, Yaowei; Yang, Yuhang; Cao, Junjie; Zhu, Zhihong; Cheng, Xuxin; Chen, Long
Conference paper

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 25156-25165, article number 11094088
Wang, Jinpeng; Luo, Tianci; Zha, Yaohua; Feng, Yan; Luo, Ruisheng; Chen, Bin; Dai, Tao; Chen, Long; Wang, Yaowei; Xia, Shu-Tao
Conference paper

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Paper presented at International Conference on Computer Vision (ICCV 2025), Honolulu, United States, p. 23074-23084
Li, Jun; Wang, Jinpeng; Tan, Chaolei; Lian, Niu; Chen, Long; Wang, Yaowei; Zhang, Min; Xia, Shu-Tao; Chen, Bin
Conference paper

Event-Customized Image Generation

Proceedings of Machine Learning Research, v. 267, p. 63245-63265
Wang, Zhen; Jiang, Yilei; Zheng, Dong; Xiao, Jun; Chen, Long
Conference paper

Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States
Li, Lin; Zhang, Chuhan; Zhang, Dong; Sun, Chong; Li, Chen; Chen, Long
Conference paper

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Wang, Yanghao; Chen, Long
Conference paper

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 4829-4838
Chen, Hongxu; Wang, Zhen; Li, Runshi; Zhu, Bowei; Chen, Long
Conference paper

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Zhu, Yue; Diao, Haiwen; Gao, Shang; Chen, Long; Lu, Huchuan
Conference paper

Learning Causal Transition Matrix for Instance-dependent Label Noise

Proceedings of the AAAI Conference on Artificial Intelligence, v. 39, (17), p. 18305-18313
Li, Jiahui; Chang, Tai Wei; Kuang, Kun; Li, Ximing; Chen, Long; Zhou, Jun
Conference paper

Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics / edited by Che Wanxiang; Nabende Joyce; Shutova Ekaterina; Pilehvar Mohammad Taher. Association for Computational Linguistics (ACL), 2025, p. 1210-1222
Tang, Haomiao; Wang, Jinpeng; Peng, Yuang; Meng, Guanghao; Luo, Ruisheng; Chen, Bin; Chen, Long; Wang, Yaowei; Xia, Shu Tao
Conference paper

Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 33527-33553
Zhong, Guojin; Wang, Pan; Yuan, Jin; Li, Zhiyong; Chen, Long
Conference paper

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Paper presented at International Conference on Computer Vision (ICCV 2025), Honolulu, United States, p. 1-17
Wang, Yuxuan; Yi, Xuanyu; Weng, Haohan; Xu, Qingshan; Wei, Xiaokang; Yang, Xianghui; Guo, Chunchao; Chen, Long; Zhang, Hanwang
Conference paper

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States
Wang, Yanghao; Chen, Long
Conference paper

Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models

Special Track on AI Alignment / edited by Walsh Toby; Shah Julie; Kolter Zico. Association for the Advancement of Artificial Intelligence, 2025, p. 28706
Chen, Long
Conference paper

RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2025, p. 4993-5022
Li, Jiahui; Lin, Li; Chang, Tai-Wei; Kuang, Kun; Chen, Long; Zhou, Jun; Yang, Cheng
Conference paper

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 75-95
Diao, Haiwen; Wan, Bo; Jia, Xu; Zhuge, Yunzhi; Zhang, Ying; Lu, Huchuan; Chen, Long
Conference paper

SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 10476-10485
Pham, Kien T.; He, Yingqing; Xing, Yazhou; Chen, Qifeng; Chen, Long
Conference paper

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Paper presented at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, United States, p. 23604-23614
Hu, Zijing; Zhang, Fengda; Chen, Long; Kuang, Kun; Li, Jiahui; Gao, Kaifeng; Xiao, Jun; Wang, Xin; Zhu, Wenwu
Conference paper

View-Consistent 3D Editing with Gaussian Splatting

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 404-420
Wang, Yuxuan; Yi, Xuanyu; Wu, Zike; Zhao, Na; Chen, Long; Zhang, Hanwang
Conference paper

Zero-shot Compositional Action Recognition with Neural Logic Constraints

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 3625-3634
Ye, Gefan; Li, Lin; Li, Kexin; Xiao, Jun; Chen, Long
Conference paper

2024 24

A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (12), p. 8954-8975
Zhu, Chaoyang; Chen, Long
Article

CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (5), p. 3123-3136, article number 10366193
Wang, Wenxiao; Chen, Wei; Qiu, Qibo; Chen, Long; Wu, Boxi; Lin, Binbin; He, Xiaofei; Liu, Wei
Article

Decomposed Prototype Learning for Few-Shot Scene Graph Generation

ACM Transactions on Multimedia Computing, Communications and Applications, v. 21, (1), article number 30
Li, Xingchen; Xiao, Jun; Chen, Guikun; Feng, Yinfu; Yang, Yi; Liu, An An; Chen, Long
Article

GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning

IEEE Transactions on Image Processing, v. 33, p. 6241-6252
Diao, Haiwen; Zhang, Ying; Gao, Shang; Zhu, Jiawen; Chen, Long; Lu, Huchuan
Article

Improving Reference-Based Distinctive Image Captioning with Contrastive Rewards

ACM Transactions on Multimedia Computing, Communications and Applications, v. 20, (12), article number ART390
Mao, Yangjun; Xiao, Jun; Zhang, Dong; Cao, Meng; Shao, Jian; Zhuang, Yueting; Chen, Long
Article

In Defense of Clip-Based Video Relation Detection

IEEE Transactions on Image Processing, v. 33, p. 2759-2769
Wei, Meng; Chen, Long; Ji, Wei; Yue, Xiaoyu; Zimmermann, Roger
Article

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

IEEE Transactions on Circuits and Systems for Video Technology, v. 34, (1), p. 195-206
Li, Lin; Xiao, Jun; Shi, Hanrong; Wang, Wenxiao; Shao, Jian; Liu, An An; Yang, Yi; Chen, Long
Article

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (10), p. 6873-6888
Li, Lin; Xiao, Jun; Shi, Hanrong; Zhang, Hanwang; Yang, Yi; Liu, Wei; Chen, Long
Article

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities

Proceedings of the AAAI Conference on Artificial Intelligence, v. 38, (16), p. 17664-17672
Ayyubi, Hammad; Thomas, Christopher; Chum, Lovish; Lokesh, Rahul; Chen, Long; Niu, Yulei; Lin, Xudong; Feng, Xuande; Koo, Jaywon; Ray, Sounak; Chang, Shih Fu
Conference paper

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Complex Action Spaces

Paper presented at 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)
Li, Yang; Yang, Libing; Chen, Long
Conference paper

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces

Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 / edited by Larson Kate. International Joint Conferences on Artificial Intelligence, 2024, p. 6895-6903
Yang, Libing; Li, Yang; Chen, Long
Conference paper

Di²Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

Advances in Neural Information Processing Systems, v. 37
Wang, Weiquan; Xiao, Jun; Wang, Chunping; Liu, Wei; Wang, Zhao; Chen, Long
Conference paper

Distributionally Generative Augmentation for Fair Facial Attribute Classification

He, Qianpei; Kuang, Kun; Zhang, Fengda; Liu, Jiashuo; Chen, Long; Wu, Chao; Xiao, Jun; Zhang, Hanwang
Conference paper

Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning

ICMR 2024-Proceedings of the 14th Annual ACM International Conference on Multimedia Retrieval, Association for Computing Machinery, Inc, 2024, p. 1084-1088
Zheng, Yuhang; Wang, Zhen; Chen, Long
Conference paper

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

Advances in Neural Information Processing Systems, v. 37
Yu, Jiazuo; Xiong, Haomiao; Zhang, Lu; Diao, Haiwen; Zhuge, Yunzhi; Hong, Lanqing; Wang, Dong; Lu, Huchuan; He, You; Chen, Long
Conference paper

MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference / edited by Al-Onaizan Yaser; Bansal Mohit; Chen Yun-Nung. Association for Computational Linguistics (ACL), 2024, p. 7800-7815
Xu, Baixuan; Wang, Weiqi; Shi, Haochen; Ding, Wenxuan; Jing, Huihao; Fang, Tianqing; Bai, Jiaxin; Liu, Xin; Yu, Changlong; Li, Zheng; Luo, Chen; Yin, Qingyu; Yin, Bing; Chen, Long; Song, Yangqiu
Conference paper

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding

Paper presented at 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Ji, Wei; Qin, You; Wei, Yinwei; Wu, Yiming; Zimmermann, Roger; Chen, Long
Conference paper

Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning

EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference / edited by Al-Onaizan Yaser; Bansal Mohit; Chen Yun-Nung. Association for Computational Linguistics (ACL), 2024, p. 10122-10140
Li, Jiahui; Zhang, Hanlin; Zhang, Fengda; Chang, Tai Wei; Kuang, Kun; Chen, Long; Zhou, Jun
Conference paper

PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation

MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2024, p. 3313-3322
Zhong, Guojin; Guo, Yihu; Yuan, Jin; Zhang, Qianjun; Guan, Weili; Chen, Long
Conference paper

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

The 62nd Annual Meeting of the Association for Computational Linguistics / edited by Ku Lun-Wei; Martins Andre; Srikumar Vivek. Association for Computational Linguistics (ACL), 2024, p. 7160-7174
Cao, Meng; Tang, Haoran; Huang, Jinfa; Jin, Peng; Zhang, Can; Liu, Ruyang; Chen, Long; Liang, Xiaodan; Yuan, Li; Li, Ge
Conference paper

SCHEMA: State Changes Matter for Procedure Planning in Instructional Videos

12th International Conference on Learning Representations, ICLR 2024, International Conference on Learning Representations, ICLR, 2024
Niu, Yulei; Guo, Wenliang; Chen, Long; Lin, Xudong; Chang, Shih-Fu
Conference paper

Seeing beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2024, p. 1602-1611
Lei, Jiaming; Li, Lin; Wang, Chunping; Xiao, Jun; Chen, Long
Conference paper

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval

MMGR 2024 - Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, Association for Computing Machinery, Inc, 2024, p. 1-6
Ji, Wei; Fei, Hao; Wei, Yinwei; Zheng, Zhedong; Li, Juncheng; Chen, Long; Liao, Lizi; Zhuang, Yueting; Zimmermann, Roger
Conference paper

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, IEEE Computer Society, 2024, p. 28729-28740
Diao, Haiwen; Wan, Bo; Zhang, Ying; Jia, Xu; Lu, Huchuan; Chen, Long
Conference paper

2023 19

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, (6), article number 218
Lan, Xiaohan; Yuan, Yitian; Wang, Xin; Chen, Long; Wang, Zhi; Ma, Lin; Zhu, Wenwu
Article

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 45, (11), p. 13218-13234
Chen, Long; Zheng, Yuhang; Niu, Yulei; Zhang, Hanwang; Xiao, Jun
Article

Federated Unsupervised Representation Learning

Frontiers of Information Technology and Electronic Engineering, v. 24, (8), p. 1181-1193
Zhang, Fengda; Kuang, Kun; Chen, Long; You, Zhaoyang; Shen, Tao; Xiao, Jun; Zhang, Yin; Wu, Chao; Wu, Fei; Zhuang, Yueting; Li, Xiaolin
Article

VL-NMS: Breaking Proposal Bottlenecks in Two-stage Visual-language Matching

ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, (5s), article number 166
Zhang, Chenchi; Ma, Wenbo; Xiao, Jun; Zhang, Hanwang; Shao, Jian; Zhuang, Yueting; Chen, Long
Article

Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 9114-9128
Lin, Hongzhan; Luo, Ziyang; Ma, Jing; Chen, Long
Conference paper

Compositional Feature Augmentation for Unbiased Scene Graph Generation

Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Institute of Electrical and Electronics Engineers Inc., 2023, p. 21628-21638
Li, Lin; Chen, Guikun; Xiao, Jun; Yang, Yi; Wang, Chunping; Chen, Long
Conference paper

Compositional Prompt Tuning with Motion Cues for Open-Vocabulary Video Relation Detection

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Gao, Kaifeng; Chen, Long; Zhang, Hanwang; Xiao, Jun; Sun, Qianru
Conference paper

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 8598-8617
Wang, Zhecan; Chen, Long; You, Haoxuan; Xu, Keyang; He, Yicheng; Li, Wenhao; Codella, Noel; Chang, Kai Wei; Chang, Shih Fu
Conference paper

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

IJCAI International Joint Conference on Artificial Intelligence, v. 2023-August, August 2023, p. 1387-1395
Shi, Zenan; Chen, Haipeng; Chen, Long; Zhang, Dong
Conference paper

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Findings of the Association for Computational Linguistics, ACL 2023, Association for Computational Linguistics (ACL), 2023, p. 1314-1326
Zhou, Mingyang; Fung, Yi R.; Chen, Long; Thomas, Christopher; Ji, Heng; Chang, Shih Fu
Conference paper

Fairness-Aware Contrastive Learning with Partially Annotated Sensitive Attributes

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Zhang, Fengda; Kuang, Kun; Chen, Long; Liu, Yuxuan; Wu, Chao; Xiao, Jun
Conference paper

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 11289-11303
You, Haoxuan; Sun, Rui; Wang, Zhecan; Chen, Long; Wang, Gengyu; Ayyubi, Hammad A.; Chang, Kai Wei; Chang, Shih Fu
Conference paper

Iterative Proposal Refinement for Weakly-Supervised Video Grounding

Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, IEEE Computer Society, 2023, p. 6524-6534
Cao, Meng; Wei, Fangyun; Xu, Can; Geng, Xiubo; Chen, Long; Zhang, Can; Zou, Yuexian; Shen, Tao; Jiang, Daxin
Conference paper

Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification

Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings / edited by Wang Lei; Gall Juergen; Chin Tat-Jun; Sato Imari; Chellappa Rama. Springer Science and Business Media Deutschland GmbH, 2023, p. 107-123
Chen, Long; Su, Feng; Shi, Jiahao; Qian, Ye
Conference paper

TempCLR: Temporal Alignment Representation with Contrastive Learning

Paper presented at 11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda
Chang, Shih-Fu; Chen, Long; Han, Guangxing; Huang, Shiyuan; Lin, Xudong; Ma, Jiawei; Yang, Yuncong
Conference paper

Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning

Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 / edited by Oh A.; Neumann T.; Globerson A.; Saenko K.; Hardt M.; Levine S. Neural Information Processing Systems Foundation, 2023
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Li, Xingchen; Wu, Fei; Xiao, Jun; Chen, Long
Conference paper

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

MMIR 2023 - Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval, Co-located with, Association for Computing Machinery, Inc, 2023, p. 39-48
Ji, Jiang; Cao, Meng; Song, Tengtao; Chen, Long; Wang, Yi; Zou, Yuexian
Conference paper

Video Scene Graph Generation from Single-Frame Weak Supervision

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Chen, Siqi; Xiao, Jun; Chen, Long
Conference paper

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 / edited by Oh A.; Neumann T.; Globerson A.; Saenko K.; Hardt M.; Levine S. Neural Information Processing Systems Foundation, 2023
Li, Lin; Xiao, Jun; Chen, Guikun; Shao, Jian; Zhuang, Yueting; Chen, Long
Conference paper

2022 18

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey

Neurocomputing, v. 496, p. 192-207
Shao, Feifei; Chen, Long; Shao, Jian; Ji, Wei; Xiao, Shaoning; Ye, Lu; Zhuang, Yueting; Xiao, Jun
Article

Deep Motion Prior for Weakly-Supervised Temporal Action Localization

IEEE Transactions on Image Processing, v. 31, p. 5203-5213
Cao, Meng; Zhang, Can; Chen, Long; Shou, Mike Zheng; Zou, Yuexian
Article

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 19475-19484
Gao, Kaifeng; Chen, Long; Niu, Yulei; Shao, Jian; Xiao, Jun
Conference paper

Correspondence Matters for Video Referring Expression Comprehension

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2022, p. 4967-4976
Cao, Meng; Jiang, Ji; Chen, Long; Zou, Yuexian
Conference paper

CrossFormer: A Versatile Vision Transformer Hinging on Cross-Scale Attention

Paper presented at 10th International Conference on Learning Representations, ICLR 2022, Virtual, Online
Wang, Wenxiao; Yao, Lu; Chen, Long; Lin, Binbin; Cai, Deng; He, Xiaofei; Liu, Wei
Conference paper

Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

Proceedings of Machine Learning Research, v. 162, 2022, p. 12843-12856
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Liu, Furui; Chen, Long; Fan, Changjie; Wu, Fei; Xiao, Jun
Conference paper

Explicit Image Caption Editing

Computer Vision – ECCV 2022 - 17th European Conference, Proceedings / edited by Avidan Shai; Brostow Gabriel; Cissé Moustapha; Farinella Giovanni Maria; Hassner Tal. Springer Science and Business Media Deutschland GmbH, 2022, p. 113-129
Wang, Zhen; Chen, Long; Ma, Wenbo; Han, Guangxing; Niu, Yulei; Shao, Jian; Xiao, Jun
Conference paper

Few-Shot Object Detection with Fully Cross-Transformer

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 5311-5320
Han, Guangxing; Ma, Jiawei; Huang, Shiyuan; Chen, Long; Chang, Shih Fu
Conference paper

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

MM '22: Proceedings of the 30th ACM International Conference on Multimedia / Association for Computing Machinery. New York, NY : Association for Computing Machinery, 2022, p. 4204-4213
Li, Xingchen; Chen, Long; Ma, Wenbo; Yang, Yi; Xiao, Jun
Conference paper

Respecting Transfer Gap in Knowledge Distillation

Niu, Yulei; Chen, Long; Zhou, Chang; Zhang, Hanwang
Conference paper

Rethinking Data Augmentation for Robust Visual Question Answering

Computer Vision – ECCV 2022 - 17th European Conference, Proceedings / edited by Avidan Shai; Brostow Gabriel; Cissé Moustapha; Farinella Giovanni Maria; Hassner Tal. Springer Science and Business Media Deutschland GmbH, 2022, p. 95-112
Chen, Long; Zheng, Yuhang; Xiao, Jun
Conference paper

Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 / edited by Goldberg Yoav; Kozareva Zornitsa; Zhang Yue. Association for Computational Linguistics (ACL), 2022, p. 8188-8198
Xiao, Shaoning; Chen, Long; Gao, Kaifeng; Wang, Zhao; Yang, Yi; Zhang, Zhimeng; Xiao, Jun
Conference paper

Rethinking the Evaluation of Unbiased Scene Graph Generation

Li, Xingchen; Chen, Long; Shao, Jian; Xiao, Shaoning; Zhang, Songyang; Xiao, Jun
Conference paper

Rethinking the Reference-based Distinctive Image Captioning

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2022, p. 4374-4384
Mao, Yangjun; Chen, Long; Jiang, Zhihong; Zhang, Dong; Zhang, Zhimeng; Shao, Jian; Xiao, Jun
Conference paper

Rethinking the Two-Stage Framework for Grounded Situation Recognition

AAAI-22 Technical Tracks 3, Association for the Advancement of Artificial Intelligence, 2022, p. 2651-2658
Wei, Meng; Chen, Long; Ji, Wei; Yue, Xiaoyu; Chua, Tat Seng
Conference paper

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 18847-18856
Li, Lin; Chen, Long; Huang, Yifeng; Zhang, Zhimeng; Zhang, Songyang; Xiao, Jun
Conference paper

Towards Multi-level Fairness and Robustness on Federated Learning

Paper presented at 39th International Conference on Machine Learning (ICML 2022)
Lu, Jiaxun; Chen, Long; Kuang, Kun; Liu, Yuxuan; Wu, Fei; Wu, Chao; Xiao, Jun; Zhang, Fengda
Conference paper

Weakly-Supervised Temporal Article Grounding

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 / edited by Goldberg Yoav; Kozareva Zornitsa; Zhang Yue. Association for Computational Linguistics (ACL), 2022, p. 9402-9413
Chen, Long; Niu, Yulei; Chen, Brian; Lin, Xudong; Han, Guangxing; Thomas, Christopher; Ayyubi, Hammad; Ji, Heng; Chang, Shih Fu
Conference paper

2021 11

A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric

HUMA 2021 - Proceedings of the 2nd International Workshop on Human-Centric Multimedia Analysis, co-located with ACM MM 2021, Association for Computing Machinery, Inc, 2021, p. 13-21
Yuan, Yitian; Lan, Xiaohan; Wang, Xin; Chen, Long; Wang, Zhi; Zhu, Wenwu
Conference paper

Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework

Paper presented at Proceedings of Machine Learning Research, p. 10717-10726
WANG, Wenxiao; CHEN, Minghao; ZHAO, Shuai; CHEN, Long; HU, Jinming; LIU, Haifeng; CAI, Deng; HE, Xiaofei; LIU, Wei
Conference paper

Boundary Proposal Network for Two-Stage Natural Language Video Localization

35th AAAI Conference on Artificial Intelligence, AAAI 2021, Association for the Advancement of Artificial Intelligence, 2021, p. 2986-2994
Xiao, Shaoning; Chen, Long; Zhang, Songyang; Ji, Wei; Shao, Jian; Ye, Lu; Xiao, Jun
Conference paper

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021, IEEE Computer Society, 2021, p. 16841-16851
Chen, Long; Jiang, Zhihong; Xiao, Jun; Liu, Wei
Conference paper

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2021, p. 3664-3672
Li, Jiahui; Kuang, Kun; Li, Lin; Chen, Long; Zhang, Songyang; Shao, Jian; Xiao, Jun
Conference paper

Natural Language Video Localization with Learnable Moment Proposals

EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, Association for Computational Linguistics (ACL), 2021, p. 4008-4017
Xiao, Shaoning; Chen, Long; Shao, Jian; Zhuang, Yueting; Xiao, Jun
Conference paper

On Pursuit of Designing Multi-modal Transformer for Video Grounding

EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, Association for Computational Linguistics (ACL), 2021, p. 9810-9823
Cao, Meng; Chen, Long; Shou, Mike Zheng; Zhang, Can; Zou, Yuexian
Conference paper

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Artificial Intelligence - 1st CAAI International Conference, CICAI 2021, Proceedings / edited by Fang Lu; Chen Yiran; Zhai Guangtao; Wang Jane; Wang Ruiping; Dong Weisheng. Springer Science and Business Media Deutschland GmbH, 2021, p. 164-175
Tang, Zuoqi; Shao, Feifei; Chen, Long; Ye, Yunan; Wu, Chao; Xiao, Jun
Conference paper

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

35th AAAI Conference on Artificial Intelligence, AAAI 2021, Association for the Advancement of Artificial Intelligence, 2021, p. 1036-1044
Chen, Long; Ma, Wenbo; Xiao, Jun; Zhang, Hanwang; Chang, Shih Fu
Conference paper

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2021, p. 934-942
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Liu, Furui; Chen, Long; Wu, Fei; Xiao, Jun
Conference paper

Video Relation Detection via Tracklet based Visual Transformer

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2021, p. 4833-4837
Gao, Kaifeng; Chen, Long; Huang, Yifeng; Xiao, Jun
Conference paper

2020 4

Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering

Neural Processing Letters, v. 52, (2), p. 993-1003
Xiao, Shaoning; Li, Yimeng; Ye, Yunan; Chen, Long; Pu, Shiliang; Zhao, Zhou; Shao, Jian; Xiao, Jun
Article

Counterfactual samples synthesizing for robust visual question answering

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 10797-10806, article number 9157377
Chen, Long; Yan, Xin; Xiao, Jun; Zhang, Hanwang; Pu, Shiliang; Zhuang, Yueting
Conference paper

Hierarchical Fashion Graph Network for Personalized Outfit Recommendation

SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, 2020, p. 159-168
Li, Xingchen; Wang, Xiang; He, Xiangnan; Chen, Long; Xiao, Jun; Chua, Tat Seng
Conference paper

Rethinking the bottom-up framework for query-based video localization

AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, AAAI Press, 2020, p. 10551-10558
Chen, Long; Lu, Chujie; Tang, Siliang; Xiao, Jun; Zhang, Dong; Tan, Chilie; Li, Xiaolin
Conference paper

2019 3

Counterfactual critic multi-agent training for scene graph generation

Proceedings - 2019 International Conference on Computer Vision, ICCV 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 4612-4622, article number 9010810
Chen, Long; Zhang, Hanwang; Xiao, Jun; He, Xiangnan; Pu, Shiliang; Chang, Shih Fu
Conference paper

DebuG: A dense bottom-up grounding approach for natural language video localization

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, Association for Computational Linguistics, 2019, p. 5144-5153
Lu, Chujie; Chen, Long; Tan, Chilie; Li, Xiaolin; Xiao, Jun
Conference paper

Learning using privileged information for food recognition

MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2019, p. 557-565
Meng, Lei; Tao, Dacheng; Chen, Long; Zhang, Hanwang; Miao, Chunyan; Yang, Xun; Chua, Tat Seng
Conference paper

2018 1

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, IEEE Computer Society, 2018, p. 1043-1052, article number 8578213
Chen, Long; Zhang, Hanwang; Xiao, Jun; Liu, Wei; Chang, Shih Fu
Conference paper

2017 2

SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 6298-6306
Chen, Long; Zhang, Hanwang; Xiao, Jun; Nie, Liqiang; Shao, Jian; Liu, Wei; Chua, Tat Seng
Conference paper

Video question answering via attribute-augmented attention network learning

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, 2017, p. 829-832
Ye, Yunan; Zhao, Zhou; Li, Yimeng; Chen, Long; Xiao, Jun; Zhuang, Yueting
Conference paper

2026 Article 4

Multi-level Compositional Feature Augmentation for Unbiased Scene Graph Generation

International Journal of Computer Vision, v. 134, (6), article number 301
Li, Lin; Li, Xingchen; Sun, Chong; Li, Chen; Chen, Long

Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

International Journal of Computer Vision, v. 134, (3), article number 135
Han, Guangxing; Chen, Long; Ma, Jiawei; Huang, Shiyuan; Chellappa, Rama; Chang, Shih Fu

Physically Plausible Human–Object Rendering From Sparse Views via 3D Gaussian Splatting

IEEE Transactions on Image Processing, v. 35, p. 3938-3953, article number 11481577
Wang, Weiquan; Xiao, Jun; Yang, Yi; Zhuang, Yueting; Chen, Long

Towards Customized Knowledge Distillation for Efficient Dense Image Predictions

Transactions on Machine Learning Research, v. 2026-April
Zhang, Dong; Dong, Pingcheng; Chen, Long; Cheng, Kwang Ting

2026 Conference paper 4

Empowering Small VLMs to Think with Dynamic Memorization and Exploration

14th International Conference on Learning Representations, International Conference on Learning Representations, ICLR, 2026
LIU, Jiazhen; Deng, Yuchuan; CHEN, Long

Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (11), p. 9386-9394
Tang, Haomiao; Wang, Jinpeng; Zhao, Minyi; Meng, Guanghao; Luo, Ruisheng; Chen, Long; Xia, Shu Tao

Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (12), p. 10332-10340
Wang, Yuxuan; Yi, Xuanyu; Xu, Qingshan; Zhou, Yuan; Chen, Long; Zhang, Hanwang

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Proceedings of the AAAI Conference on Artificial Intelligence, v. 40, (8), p. 6306-6314
Li, Lin; Chen, Wei; Li, Jiahui; Cheng, Kwang Ting; Chen, Long

2025 Article 7

Cross-Modal Conditioned Reconstruction for Language-Guided Medical Image Segmentation

IEEE Transactions on Medical Imaging, v. 44, (4), p. 1821-1835
Huang, Xiaoshuang; Li, Hongxiang; Cao, Meng; Chen, Long; You, Chenyu; An, Dong

Eliminating Semantic Ambiguity in Human Pose Estimation via Stable Feature Upsampling

IEEE Transactions on Circuits and Systems for Video Technology, v. 35, (12), p. 11863-11876, article number 11071896
Jiang, Shu; Zhang, Dong; Yan, Rui; Shu, Xiangbo; DONG, Pingcheng; Chen, Long; Du, Xiaoyu

ENCODE: Breaking the Trade-Off Between Performance and Efficiency in Long-Term User Behavior Modeling

IEEE Transactions on Knowledge and Data Engineering, v. 37, (1), p. 265-277
Zhou, Wen Ji; Zheng, Yuhang; Feng, Yinfu; Ye, Yunan; Xiao, Rong; Chen, Long; Yang, Xiaosong; Xiao, Jun

From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation

International Journal of Computer Vision, v. 133, (1), p. 489-508, article number 102911
Shi, Hanrong; Li, Lin; Xiao, Jun; Zhuang, Yueting; Chen, Long

Knowledge Integration for Grounded Situation Recognition

Pattern Recognition, v. 167, p. 1-12, article number 111766
Lei, Jiaming; Wu, Sijing; Li, Lin; Chen, Lei; Xiao, Jun; Yang, Yi; Chen, Long

Learning Combinatorial Prompts for Universal Controllable Image Captioning

International Journal of Computer Vision, v. 133, (1), p. 129-150
Wang, Zhen; Xiao, Jun; Zhuang, Yueting; Gao, Fei; Shao, Jian; Chen, Long

Recent Advances in Finetuning Multimodal Large Language Models

AI Magazine, v. 46, (3), article number e70025
Wang, Zhen; Li, Lin; Chen, Long

2025 Conference paper 28

An Efficient and Effective Transformer Decoder-Based Framework for Multi-task Visual Grounding

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 125-141
Chen, Wei; Chen, Long; Wu, Yu

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

Proceedings of Machine Learning Research, v. 267, p. 18550-18565
Gao, Kaifeng; Shi, Jiaxin; Zhang, Hanwang; Wang, Chunping; Xiao, Jun; Chen, Long

CLIPDrag: Combining Text-Based and Drag-Based Instructions for Image Editing

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 3971-3987
JIANG, Ziqi; WANG, Zhen; CHEN, Long

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 8073-8082
CHEN, Wei; LI, Lin; YANG, Yongqi; WEN, Bin; YANG, Fan; GAO, Tingting; WU, Yu; CHEN, Long

Compositional Zero-shot Learning via Progressive Language-based Observations

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 3827-3836
Li, Lin; Chen, Guikun; Wang, Zhen; Xiao, Jun; Chen, Long

Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 49610-49624
Zhang, Chuhan; Zhu, Chaoyang; Dong, Pingcheng; Chen, Long; Zhang, Dong

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 365-381
Wang, Zhen; Jiang, Xinyun; Xiao, Jun; Chen, Tao; Chen, Long

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States, p. 1-23
CHEN, Wei; YAN, Xin; WEN, Bin; YANG, Fan; GAO, Tingting; ZHANG, Di; CHEN, Long

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 20173-20191
Li, Hongxiang; Li, Yaowei; Yang, Yuhang; Cao, Junjie; Zhu, Zhihong; Cheng, Xuxin; Chen, Long

Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 25156-25165, article number 11094088
WANG, Jinpeng; LUO, Tianci; ZHA, Yaohua; FENG, Yan; LUO, Ruisheng; CHEN, Bin; DAI, Tao; CHEN, Long; WANG, Yaowei; XIA, Shu-Tao

Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Paper presented at International Conference on Computer Vision (ICCV 2025), Honolulu, United States, p. 23074-23084
LI, Jun; WANG, Jinpeng; TAN, Chaolei; LIAN, Niu; CHEN, Long; WANG, Yaowei; ZHANG, Min; XIA, Shu-Tao; CHEN, Bin

Event-Customized Image Generation

Proceedings of Machine Learning Research, v. 267, p. 63245-63265
Wang, Zhen; Jiang, Yilei; Zheng, Dong; Xiao, Jun; Chen, Long

Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States
Li, Lin; Zhang, Chuhan; ZHANG, Dong; SUN, Chong; LI, Chen; CHEN, Long

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Wang, Yanghao; Chen, Long

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 4829-4838
Chen, Hongxu; Wang, Zhen; Li, Runshi; Zhu, Bowei; Chen, Long

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Zhu, Yue; Diao, Haiwen; Gao, Shang; Chen, Long; Lu, Huchuan

Learning Causal Transition Matrix for Instance-dependent Label Noise

Proceedings of the AAAI Conference on Artificial Intelligence, v. 39, (17), p. 18305-18313
Li, Jiahui; Chang, Tai Wei; Kuang, Kun; Li, Ximing; Chen, Long; Zhou, Jun

Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics / edited by Che Wanxiang; Nabende Joyce; Shutova Ekaterina; Pilehvar Mohammad Taher. Association for Computational Linguistics (ACL), 2025, p. 1210-1222
Tang, Haomiao; Wang, Jinpeng; Peng, Yuang; Meng, Guanghao; Luo, Ruisheng; Chen, Bin; Chen, Long; Wang, Yaowei; Xia, Shu Tao

Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection

13th International Conference on Learning Representations, ICLR 2025, International Conference on Learning Representations, ICLR, 2025, p. 33527-33553
Zhong, Guojin; Wang, Pan; Yuan, Jin; Li, Zhiyong; Chen, Long

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Paper presented at International Conference on Computer Vision (ICCV 2025), Honolulu, United States, p. 1-17
Wang, Yuxuan; Yi, Xuanyu; Weng, Haohan; Xu, Qingshan; Wei, Xiaokang; Yang, Xianghui; Guo, Chunchao; CHEN, Long; Zhang, Hanwang

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

Paper presented at 39th Annual Conference on Neural Information Processing Systems, NeurIPS 2025, San Diego, United States
WANG, Yanghao; CHEN, Long

Open-World Multimodal Understanding and Generation with Efficiently Finetuned Foundation Models

Special Track on AI Alignment / edited by Walsh Toby; Shah Julie; Kolter Zico. Association for the Advancement of Artificial Intelligence, 2025, p. 28706
Chen, Long

RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2025, p. 4993-5022
Li, Jiahui; LIN, Li; Chang, Tai-Wei; KUANG, Kun; CHEN, Long; Zhou, Jun; Yang, Cheng

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 75-95
Diao, Haiwen; Wan, Bo; Jia, Xu; Zhuge, Yunzhi; Zhang, Ying; Lu, Huchuan; Chen, Long

SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 10476-10485
Pham, Kien T.; He, Yingqing; Xing, Yazhou; Chen, Qifeng; Chen, Long

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards

Paper presented at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, United States, p. 23604-23614
Hu, Zijing; Zhang, Fengda; Chen, Long; Kuang, Kun; Li, Jiahui; Gao, Kaifeng; Xiao, Jun; Wang, Xin; Zhu, Wenwu

View-Consistent 3D Editing with Gaussian Splatting

Computer Vision – ECCV 2024 - 18th European Conference, Proceedings / edited by Leonardis Aleš; Ricci Elisa; Roth Stefan; Russakovsky Olga; Sattler Torsten; Varol Gül. Springer Science and Business Media Deutschland GmbH, 2025, p. 404-420
Wang, Yuxuan; Yi, Xuanyu; Wu, Zike; Zhao, Na; Chen, Long; Zhang, Hanwang

Zero-shot Compositional Action Recognition with Neural Logic Constraints

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, Association for Computing Machinery, Inc, 2025, p. 3625-3634
Ye, Gefan; Li, Lin; Li, Kexin; Xiao, Jun; Chen, Long

2024 Article 8

A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (12), p. 8954-8975
Zhu, Chaoyang; Chen, Long

CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (5), p. 3123-3136, article number 10366193
Wang, Wenxiao; Chen, Wei; Qiu, Qibo; Chen, Long; Wu, Boxi; Lin, Binbin; He, Xiaofei; Liu, Wei

Decomposed Prototype Learning for Few-Shot Scene Graph Generation

ACM Transactions on Multimedia Computing, Communications and Applications, v. 21, (1), article number 30
Li, Xingchen; Xiao, Jun; Chen, Guikun; Feng, Yinfu; Yang, Yi; Liu, An An; Chen, Long

GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning

IEEE Transactions on Image Processing, v. 33, p. 6241-6252
Diao, Haiwen; Zhang, Ying; Gao, Shang; Zhu, Jiawen; Chen, Long; Lu, Huchuan

Improving Reference-Based Distinctive Image Captioning with Contrastive Rewards

ACM Transactions on Multimedia Computing, Communications and Applications, v. 20, (12), article number ART390
Mao, Yangjun; Xiao, Jun; Zhang, Dong; Cao, Meng; Shao, Jian; Zhuang, Yueting; Chen, Long

In Defense of Clip-Based Video Relation Detection

IEEE Transactions on Image Processing, v. 33, p. 2759-2769
Wei, Meng; Chen, Long; Ji, Wei; Yue, Xiaoyu; Zimmermann, Roger

Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation

IEEE Transactions on Circuits and Systems for Video Technology, v. 34, (1), p. 195-206
Li, Lin; Xiao, Jun; Shi, Hanrong; Wang, Wenxiao; Shao, Jian; Liu, An An; Yang, Yi; Chen, Long

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 46, (10), p. 6873-6888
Li, Lin; Xiao, Jun; Shi, Hanrong; Zhang, Hanwang; Yang, Yi; Liu, Wei; Chen, Long

2024 Conference paper 16

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities

Proceedings of the AAAI Conference on Artificial Intelligence, v. 38, (16), p. 17664-17672
Ayyubi, Hammad; Thomas, Christopher; Chum, Lovish; Lokesh, Rahul; Chen, Long; Niu, Yulei; Lin, Xudong; Feng, Xuande; Koo, Jaywon; Ray, Sounak; Chang, Shih Fu

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Complex Action Spaces

Paper presented at 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)
Li, Yang; Yang, Libing; Chen, Long

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces

Proceedings of the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 / edited by Larson Kate. International Joint Conferences on Artificial Intelligence, 2024, p. 6895-6903
Yang, Libing; Li, Yang; Chen, Long

Di²Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

Advances in Neural Information Processing Systems, v. 37
Wang, Weiquan; Xiao, Jun; Wang, Chunping; Liu, Wei; Wang, Zhao; Chen, Long

Distributionally Generative Augmentation for Fair Facial Attribute Classification

Paper presented at Unknown Event
HE, Qianpei; KUANG, Kun; ZHANG, Fengda; LIU, Jiashuo; CHEN, Long; WU, Chao; XIAO, Jun; ZHANG, Hanwang

Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning

ICMR 2024-Proceedings of the 14th Annual ACM International Conference on Multimedia Retrieval, Association for Computing Machinery, Inc, 2024, p. 1084-1088
Zheng, Yuhang; Wang, Zhen; Chen, Long

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

Advances in Neural Information Processing Systems, v. 37
Yu, Jiazuo; Xiong, Haomiao; Zhang, Lu; Diao, Haiwen; Zhuge, Yunzhi; Hong, Lanqing; Wang, Dong; Lu, Huchuan; He, You; Chen, Long

MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference / edited by Al-Onaizan Yaser; Bansal Mohit; Chen Yun-Nung. Association for Computational Linguistics (ACL), 2024, p. 7800-7815
Xu, Baixuan; Wang, Weiqi; Shi, Haochen; Ding, Wenxuan; Jing, Huihao; Fang, Tianqing; Bai, Jiaxin; Liu, Xin; Yu, Changlong; Li, Zheng; Luo, Chen; Yin, Qingyu; Yin, Bing; Chen, Long; Song, Yangqiu

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding

Paper presented at 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Ji, Wei; Qin, You; Wei, Yinwei; Wu, Yiming; Zimmermann, Roger; Chen, Long

Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning

EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference / edited by Al-Onaizan Yaser; Bansal Mohit; Chen Yun-Nung. Association for Computational Linguistics (ACL), 2024, p. 10122-10140
Li, Jiahui; Zhang, Hanlin; Zhang, Fengda; Chang, Tai Wei; Kuang, Kun; Chen, Long; Zhou, Jun

PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation

MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2024, p. 3313-3322
Zhong, Guojin; Guo, Yihu; Yuan, Jin; Zhang, Qianjun; Guan, Weili; Chen, Long

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter

The 62nd Annual Meeting of the Association for Computational Linguistics / edited by Ku Lun-Wei; Martins Andre; Srikumar Vivek. Association for Computational Linguistics (ACL), 2024, p. 7160-7174
Cao, Meng; Tang, Haoran; Huang, Jinfa; Jin, Peng; Zhang, Can; Liu, Ruyang; Chen, Long; Liang, Xiaodan; Yuan, Li; Li, Ge

SCHEMA: State Changes Matter for Procedure Planning in Instructional Videos

12th International Conference on Learning Representations, ICLR 2024 / International Conference on Learning Representations, ICLR. International Conference on Learning Representations, ICLR, 2024
Niu, Yulei; Guo, Wenliang; Chen, Long; Lin, Xudong; Chang, Shih-Fu

Seeing beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2024, p. 1602-1611
Lei, Jiaming; Li, Lin; Wang, Chunping; Xiao, Jun; Chen, Long

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval

MMGR 2024 - Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, Association for Computing Machinery, Inc, 2024, p. 1-6
Ji, Wei; Fei, Hao; Wei, Yinwei; Zheng, Zhedong; Li, Juncheng; Chen, Long; Liao, Lizi; Zhuang, Yueting; Zimmermann, Roger

UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory

Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, IEEE Computer Society, 2024, p. 28729-28740
Diao, Haiwen; Wan, Bo; Zhang, Ying; Jia, Xu; Lu, Huchuan; Chen, Long

2023 Article 4

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, (6), article number 218
Lan, Xiaohan; Yuan, Yitian; Wang, Xin; Chen, Long; Wang, Zhi; Ma, Lin; Zhu, Wenwu

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 45, (11), p. 13218-13234
Chen, Long; Zheng, Yuhang; Niu, Yulei; Zhang, Hanwang; Xiao, Jun

Federated unsupervised representation learning

Frontiers of Information Technology and Electronic Engineering, v. 24, (8), p. 1181-1193
Zhang, Fengda; Kuang, Kun; Chen, Long; You, Zhaoyang; Shen, Tao; Xiao, Jun; Zhang, Yin; Wu, Chao; Wu, Fei; Zhuang, Yueting; Li, Xiaolin

VL-NMS: Breaking Proposal Bottlenecks in Two-stage Visual-language Matching

ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, (5s), article number 166
Zhang, Chenchi; Ma, Wenbo; Xiao, Jun; Zhang, Hanwang; Shao, Jian; Zhuang, Yueting; Chen, Long

2023 Conference paper 15

Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 9114-9128
Lin, Hongzhan; Luo, Ziyang; Ma, Jing; Chen, Long

Compositional Feature Augmentation for Unbiased Scene Graph Generation

Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Institute of Electrical and Electronics Engineers Inc., 2023, p. 21628-21638
Li, Lin; Chen, Guikun; Xiao, Jun; Yang, Yi; Wang, Chunping; Chen, Long

Compositional Prompt Tuning with Motion Cues for Open-Vocabulary Video Relation Detection

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Gao, Kaifeng; Chen, Long; Zhang, Hanwang; Xiao, Jun; Sun, Qianru

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 8598-8617
Wang, Zhecan; Chen, Long; You, Haoxuan; Xu, Keyang; He, Yicheng; Li, Wenhao; Codella, Noel; Chang, Kai Wei; Chang, Shih Fu

Discrepancy-Guided Reconstruction Learning for Image Forgery Detection

IJCAI International Joint Conference on Artificial Intelligence, v. 2023-August, August 2023, p. 1387-1395
Shi, Zenan; Chen, Haipeng; Chen, Long; Zhang, Dong

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Findings of the Association for Computational Linguistics, ACL 2023, Association for Computational Linguistics (ACL), 2023, p. 1314-1326
Zhou, Mingyang; Fung, Yi R.; Chen, Long; Thomas, Christopher; Ji, Heng; Chang, Shih Fu

Fairness-Aware Contrastive Learning with Partially Annotated Sensitive Attributes

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Zhang, Fengda; Kuang, Kun; Chen, Long; Liu, Yuxuan; Wu, Chao; Xiao, Jun

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

Findings of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), 2023, p. 11289-11303
You, Haoxuan; Sun, Rui; Wang, Zhecan; Chen, Long; Wang, Gengyu; Ayyubi, Hammad A.; Chang, Kai Wei; Chang, Shih Fu

Iterative Proposal Refinement for Weakly-Supervised Video Grounding

Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, IEEE Computer Society, 2023, p. 6524-6534
Cao, Meng; Wei, Fangyun; Xu, Can; Geng, Xiubo; Chen, Long; Zhang, Can; Zou, Yuexian; Shen, Tao; Jiang, Daxin

Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification

Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings / edited by Wang Lei; Gall Juergen; Chin Tat-Jun; Sato Imari; Chellappa Rama. Springer Science and Business Media Deutschland GmbH, 2023, p. 107-123
Chen, Long; Su, Feng; Shi, Jiahao; Qian, Ye

TempCLR: Temporal Alignment Representation with Contrastive Learning

Paper presented at 11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda
Chang, Shih-Fu; Chen, Long; Han, Guangxing; Huang, Shiyuan; Lin, Xudong; Ma, Jiawei; Yang, Yuncong

Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning

Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 / edited by Oh A.; Neumann T.; Globerson A.; Saenko K.; Hardt M.; Levine S. Neural Information Processing Systems Foundation, 2023
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Li, Xingchen; Wu, Fei; Xiao, Jun; Chen, Long

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

MMIR 2023 - Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval, Co-located with, Association for Computing Machinery, Inc, 2023, p. 39-48
Ji, Jiang; Cao, Meng; Song, Tengtao; Chen, Long; Wang, Yi; Zou, Yuexian

Video Scene Graph Generation from Single-Frame Weak Supervision

Paper presented at 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda
Chen, Siqi; Xiao, Jun; Chen, Long

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 / edited by Oh A.; Neumann T.; Globerson A.; Saenko K.; Hardt M.; Levine S. Neural Information Processing Systems Foundation, 2023
Li, Lin; Xiao, Jun; Chen, Guikun; Shao, Jian; Zhuang, Yueting; Chen, Long

2022 Article 2

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey

Neurocomputing, v. 496, p. 192-207
Shao, Feifei; Chen, Long; Shao, Jian; Ji, Wei; Xiao, Shaoning; Ye, Lu; Zhuang, Yueting; Xiao, Jun

Deep Motion Prior for Weakly-Supervised Temporal Action Localization

IEEE Transactions on Image Processing, v. 31, p. 5203-5213
Cao, Meng; Zhang, Can; Chen, Long; Shou, Mike Zheng; Zou, Yuexian

Conference paper 16

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 19475-19484
Gao, Kaifeng; Chen, Long; Niu, Yulei; Shao, Jian; Xiao, Jun

Correspondence Matters for Video Referring Expression Comprehension

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2022, p. 4967-4976
Cao, Meng; Jiang, Ji; Chen, Long; Zou, Yuexian

CrossFormer: A Versatile Vision Transformer Hinging on Cross-Scale Attention

Paper presented at 10th International Conference on Learning Representations, ICLR 2022, Virtual, Online
Wang, Wenxiao; Yao, Lu; Chen, Long; Lin, Binbin; Cai, Deng; He, Xiaofei; Liu, Wei

Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

Proceedings of Machine Learning Research, v. 162, 2022, p. 12843-12856
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Liu, Furui; Chen, Long; Fan, Changjie; Wu, Fei; Xiao, Jun

Explicit Image Caption Editing

Computer Vision – ECCV 2022 - 17th European Conference, Proceedings / edited by Avidan Shai; Brostow Gabriel; Cissé Moustapha; Farinella Giovanni Maria; Hassner Tal. Springer Science and Business Media Deutschland GmbH, 2022, p. 113-129
Wang, Zhen; Chen, Long; Ma, Wenbo; Han, Guangxing; Niu, Yulei; Shao, Jian; Xiao, Jun

Few-Shot Object Detection with Fully Cross-Transformer

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 5311-5320
Han, Guangxing; Ma, Jiawei; Huang, Shiyuan; Chen, Long; Chang, Shih Fu

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

MM '22: Proceedings of the 30th ACM International Conference on Multimedia / Association for Computing Machinery. New York, NY : Association for Computing Machinery, 2022, p. 4204-4213
Li, Xingchen; Chen, Long; Ma, Wenbo; Yang, Yi; Xiao, Jun

Respecting Transfer Gap in Knowledge Distillation

Niu, Yulei; Chen, Long; Zhou, Chang; Zhang, Hanwang

Rethinking Data Augmentation for Robust Visual Question Answering

Computer Vision – ECCV 2022 - 17th European Conference, Proceedings / edited by Avidan Shai; Brostow Gabriel; Cissé Moustapha; Farinella Giovanni Maria; Hassner Tal. Springer Science and Business Media Deutschland GmbH, 2022, p. 95-112
Chen, Long; Zheng, Yuhang; Xiao, Jun

Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 / edited by Goldberg Yoav; Kozareva Zornitsa; Zhang Yue. Association for Computational Linguistics (ACL), 2022, p. 8188-8198
Xiao, Shaoning; Chen, Long; Gao, Kaifeng; Wang, Zhao; Yang, Yi; Zhang, Zhimeng; Xiao, Jun

Rethinking the Evaluation of Unbiased Scene Graph Generation

Li, Xingchen; Chen, Long; Shao, Jian; Xiao, Shaoning; Zhang, Songyang; Xiao, Jun

Rethinking the Reference-based Distinctive Image Captioning

MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2022, p. 4374-4384
Mao, Yangjun; Chen, Long; Jiang, Zhihong; Zhang, Dong; Zhang, Zhimeng; Shao, Jian; Xiao, Jun

Rethinking the Two-Stage Framework for Grounded Situation Recognition

AAAI-22 Technical Tracks 3, Association for the Advancement of Artificial Intelligence, 2022, p. 2651-2658
Wei, Meng; Chen, Long; Ji, Wei; Yue, Xiaoyu; Chua, Tat Seng

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, IEEE Computer Society, 2022, p. 18847-18856
Li, Lin; Chen, Long; Huang, Yifeng; Zhang, Zhimeng; Zhang, Songyang; Xiao, Jun

Towards Multi-level Fairness and Robustness on Federated Learning

Paper presented at 39th International Conference on Machine Learning (ICML 2022)
Lu, Jiaxun; Chen, Long; Kuang, Kun; Liu, Yuxuan; Wu, Fei; Wu, Chao; Xiao, Jun; Zhang, Fengda

Weakly-Supervised Temporal Article Grounding

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 / edited by Goldberg Yoav; Kozareva Zornitsa; Zhang Yue. Association for Computational Linguistics (ACL), 2022, p. 9402-9413
Chen, Long; Niu, Yulei; Chen, Brian; Lin, Xudong; Han, Guangxing; Thomas, Christopher; Ayyubi, Hammad; Ji, Heng; Chang, Shih Fu

Conference paper 11

A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric

HUMA 2021 - Proceedings of the 2nd International Workshop on Human-Centric Multimedia Analysis, co-located with ACM MM 2021, Association for Computing Machinery, Inc, 2021, p. 13-21
Yuan, Yitian; Lan, Xiaohan; Wang, Xin; Chen, Long; Wang, Zhi; Zhu, Wenwu

Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework

Paper presented at Proceedings of Machine Learning Research, p. 10717-10726
WANG, Wenxiao; CHEN, Minghao; ZHAO, Shuai; CHEN, Long; HU, Jinming; LIU, Haifeng; CAI, Deng; HE, Xiaofei; LIU, Wei

Boundary Proposal Network for Two-Stage Natural Language Video Localization

35th AAAI Conference on Artificial Intelligence, AAAI 2021, Association for the Advancement of Artificial Intelligence, 2021, p. 2986-2994
Xiao, Shaoning; Chen, Long; Zhang, Songyang; Ji, Wei; Shao, Jian; Ye, Lu; Xiao, Jun

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021, IEEE Computer Society, 2021, p. 16841-16851
Chen, Long; Jiang, Zhihong; Xiao, Jun; Liu, Wei

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2021, p. 3664-3672
Li, Jiahui; Kuang, Kun; Li, Lin; Chen, Long; Zhang, Songyang; Shao, Jian; Xiao, Jun

Natural Language Video Localization with Learnable Moment Proposals

EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, Association for Computational Linguistics (ACL), 2021, p. 4008-4017
Xiao, Shaoning; Chen, Long; Shao, Jian; Zhuang, Yueting; Xiao, Jun

On Pursuit of Designing Multi-modal Transformer for Video Grounding

EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, Association for Computational Linguistics (ACL), 2021, p. 9810-9823
Cao, Meng; Chen, Long; Shou, Mike Zheng; Zhang, Can; Zou, Yuexian

Optimizing Federated Learning on Non-IID Data Using Local Shapley Value

Artificial Intelligence - 1st CAAI International Conference, CICAI 2021, Proceedings / edited by Fang Lu; Chen Yiran; Zhai Guangtao; Wang Jane; Wang Ruiping; Dong Weisheng. Springer Science and Business Media Deutschland GmbH, 2021, p. 164-175
Tang, Zuoqi; Shao, Feifei; Chen, Long; Ye, Yunan; Wu, Chao; Xiao, Jun

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

35th AAAI Conference on Artificial Intelligence, AAAI 2021, Association for the Advancement of Artificial Intelligence, 2021, p. 1036-1044
Chen, Long; Ma, Wenbo; Xiao, Jun; Zhang, Hanwang; Chang, Shih Fu

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2021, p. 934-942
Li, Jiahui; Kuang, Kun; Wang, Baoxiang; Liu, Furui; Chen, Long; Wu, Fei; Xiao, Jun

Video Relation Detection via Tracklet based Visual Transformer

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2021, p. 4833-4837
Gao, Kaifeng; Chen, Long; Huang, Yifeng; Xiao, Jun

2020 4

Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering

Neural Processing Letters, v. 52, (2), p. 993-1003
Xiao, Shaoning; Li, Yimeng; Ye, Yunan; Chen, Long; Pu, Shiliang; Zhao, Zhou; Shao, Jian; Xiao, Jun
Article

Counterfactual samples synthesizing for robust visual question answering

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 10797-10806, article number 9157377
Chen, Long; Yan, Xin; Xiao, Jun; Zhang, Hanwang; Pu, Shiliang; Zhuang, Yueting
Conference paper

Hierarchical Fashion Graph Network for Personalized Outfit Recommendation

SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, 2020, p. 159-168
Li, Xingchen; Wang, Xiang; He, Xiangnan; Chen, Long; Xiao, Jun; Chua, Tat Seng
Conference paper

Rethinking the bottom-up framework for query-based video localization

AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, AAAI Press, 2020, p. 10551-10558
Chen, Long; Lu, Chujie; Tang, Siliang; Xiao, Jun; Zhang, Dong; Tan, Chilie; Li, Xiaolin
Conference paper

2019 3

Counterfactual critic multi-agent training for scene graph generation

Proceedings - 2019 International Conference on Computer Vision, ICCV 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 4612-4622, article number 9010810
Chen, Long; Zhang, Hanwang; Xiao, Jun; He, Xiangnan; Pu, Shiliang; Chang, Shih Fu
Conference paper

DebuG: A dense bottom-up grounding approach for natural language video localization

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, Association for Computational Linguistics, 2019, p. 5144-5153
Lu, Chujie; Chen, Long; Tan, Chilie; Li, Xiaolin; Xiao, Jun
Conference paper

Learning using privileged information for food recognition

MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia, Association for Computing Machinery, Inc, 2019, p. 557-565
Meng, Lei; Tao, Dacheng; Chen, Long; Zhang, Hanwang; Miao, Chunyan; Yang, Xun; Chua, Tat Seng
Conference paper

2018 1

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

Proceedings - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2018, IEEE Computer Society, 2018, p. 1043-1052, article number 8578213
Chen, Long; Zhang, Hanwang; Xiao, Jun; Liu, Wei; Chang, Shih Fu
Conference paper

2017 2

SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 6298-6306
Chen, Long; Zhang, Hanwang; Xiao, Jun; Nie, Liqiang; Shao, Jian; Liu, Wei; Chua, Tat Seng
Conference paper

Video question answering via attribute-Augmented attention network learning

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, 2017, p. 829-832
Ye, Yunan; Zhao, Zhou; Li, Yimeng; Chen, Long; Xiao, Jun; Zhuang, Yueting
Conference paper

Teaching Assignment

2025-26 Summer 0

No Teaching Assignments

2025-26 Spring 3

ARIN5204	Reinforcement Learning
COMP4981H	Final Year Thesis
COMP6922E	Research Project

2025-26 Winter 0

No Teaching Assignments

2025-26 Fall 5

COMP4901Z	Reinforcement Learning
COMP4971A	Independent Work
COMP4971D	Independent Work
COMP4981H	Final Year Thesis
COMP6922E	Research Project

2024-25 Summer 2

COMP4971A	Independent Work
COMP4981H	Final Year Thesis

2024-25 Spring 4

COMP4981	Final Year Project
COMP6411C	Advanced Topics in Multimodal Machine Learning
COMP6922E	Research Project
COMP6922I	Research Project

Research Postgraduate (RPG) Supervision

From January 2023 to December 2026 (As of 23 June 2026)

Current RPGs

Doctor of Philosophy

LIU, Yaoyang
Computer Science and Engineering
CHEN, Hongxu
Computer Science and Engineering
HE, Zhenqi
Computer Science and Engineering
JIANG, Ziqi
Computer Science and Engineering
LI, Hongxiang
Computer Science and Engineering
LIU, Junzhang
Computer Science and Engineering
WANG, Shaodong
Computer Science and Engineering
CHEN, Wei
Computer Science and Engineering
LIU, Jiazhen
Computer Science and Engineering
PHAM, Trung Kien
Computer Science and Engineering
TAN, Chaolei
Computer Science and Engineering
WANG, Yanghao
Computer Science and Engineering

Master of Philosophy

ZHANG, Haoqi
Computer Science and Engineering

Projects

From January 2024 to December 2026

All Projects 9 Leading Projects 8 Participating Projects 1

Toward Explainable and Robust Cross-modal Understanding

通向可解釋和魯棒的跨媒體智能感知
Leading

National Natural Science Foundation of China
Project Team (HKUST)
CHEN Long (Lead)
2026 -

Towards Efficiently Controllable and Editable Multimodal Foundation Models in an Evolving World

面向進化環境下的高效可控和可編輯的多模態基座模型
Leading

RGC - General Research Fund
Project Team (HKUST)
CHEN Long (Lead)
2026 -

Vehicle-Target Remote Sensing Scene Perception and Semantic Understanding

面向車輛目標的遙感場景感知與語義理解研究
Participating

Changchun Inst. of Optics, Fine Mechanics and Physics, CAS
Project Team (HKUST)
CHEN Long
2026 -

Compositional Visual Perception and Understanding in Open Scenes

開放場景下的組合式視覺內容感知
Leading

National Natural Science Foundation of China
Project Team (HKUST)
CHEN Long (Lead)
2025 -

Image and Video Generation

圖像和視頻生成
Leading

Shenzhen RabbitPre Intelligence Technology Co., Ltd.
Project Team (HKUST)
CHEN Long (Lead)
2025 -

Learning to Generate Multimodal Event Complexes for Open-world Multimodal Assets Understanding

針對開放多模態數據理解的多模態事件關系圖的構建
Leading

RGC - Early Career Scheme
Project Team (HKUST)
CHEN Long (Lead)
2025 -

Towards Efficient and Robust Multimodal Understanding and Generation

高效和魯棒的多模態理解和生成
Leading

Hangzhou Rutone Technology Co., Ltd.
Project Team (HKUST)
CHEN Long (Lead)
2025 -

Key Technologies for Enhancing Image Generation to Empower Multimodal Reasoning and Understanding

Leading

Fortune Ever Global Limited
Project Team (HKUST)
CHEN Long (Lead)
2025 -

Towards Efficient and Robust Visual Understanding and Generation with Diffusion Models

基于擴散模型的高效和魯棒的視覺理解與生成
Leading

Huawei Technologies Co., Ltd.
Project Team (HKUST)
CHEN Long (Lead)
2024 -
