PhD in Computer Science and Technology
Zhejiang University, 2020
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Article
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention
Article
Article
In Defense of Clip-Based Video Relation Detection
Article
Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation
Article
Learning Combinatorial Prompts for Universal Controllable Image Captioning
Article
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation
Article
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Conference paper
Conference paper
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Conference paper
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Conference paper
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning
Conference paper
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Conference paper
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Conference paper
SCHEMA: State Changes Matter for Procedure Planning in Instructional Videos
Conference paper
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Conference paper
View-Consistent 3D Editing with Gaussian Splatting
Conference paper
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Article
Federated Unsupervised Representation Learning
Article
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Article
Conference paper
Compositional Feature Augmentation for Unbiased Scene Graph Generation
Conference paper
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
Conference paper
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Conference paper
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
Conference paper
Conference paper
Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes
Conference paper
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Conference paper
Iterative Proposal Refinement for Weakly-Supervised Video Grounding
Conference paper
TempCLR: Temporal Alignment Representation with Contrastive Learning
Conference paper
Conference paper
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Conference paper
Video Scene Graph Generation from Single-Frame Weak Supervision
Conference paper
Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
Conference paper
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
Article
Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey
Article
Deep Motion Prior for Weakly-Supervised Temporal Action Localization
Article
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Conference paper
Correspondence Matters for Video Referring Expression Comprehension
Conference paper
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Conference paper
Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
Conference paper
Explicit Image Caption Editing
Conference paper
Few-Shot Object Detection with Fully Cross-Transformer
Conference paper
Conference paper
Respecting Transfer Gap in Knowledge Distillation
Conference paper
Rethinking Data Augmentation for Robust Visual Question Answering
Conference paper
Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives
Conference paper
Rethinking the Evaluation of Unbiased Scene Graph Generation
Conference paper
Rethinking the Reference-based Distinctive Image Captioning
Conference paper
Rethinking the Two-Stage Framework for Grounded Situation Recognition
Conference paper
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
Conference paper
Towards Multi-level Fairness and Robustness on Federated Learning
Conference paper
Weakly-Supervised Temporal Article Grounding
Conference paper
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric
Conference paper
Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
Conference paper
Boundary Proposal Network for Two-Stage Natural Language Video Localization
Conference paper
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Conference paper
Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation
Conference paper
Natural Language Video Localization with Learnable Moment Proposals
Conference paper
On Pursuit of Designing Multi-modal Transformer for Video Grounding
Conference paper
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Conference paper
Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
Conference paper
Video Relation Detection via Tracklet based Visual Transformer
Conference paper
Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering
Article
Counterfactual samples synthesizing for robust visual question answering
Conference paper
Hierarchical Fashion Graph Network for Personalized Outfit Recommendation
Conference paper
Rethinking the bottom-up framework for query-based video localization
Conference paper
Counterfactual critic multi-agent training for scene graph generation
Conference paper
DebuG: A dense bottom-up grounding approach for natural language video localization
Conference paper
Learning using privileged information for food recognition
Conference paper
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
Conference paper
SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning
Conference paper
Video question answering via attribute-augmented attention network learning
Conference paper
COMP4901Z | Reinforcement Learning |
COMP4971A | Independent Work |
COMP4981 | Final Year Project |
COMP6922E | Research Project |
COMP6922I | Research Project |
COMP4971A | Independent Work |
COMP4981 | Final Year Project |
UROP1000 | Undergraduate Research Opportunities |
UROP1100M | Undergraduate Research Opportunities Series 1 |
COMP4971A | Independent Work |
COMP6411C | Advanced Topics in Multimodal Machine Learning |
UROP1100L | Undergraduate Research Opportunities Series 1 |
COMP6922E | Research Project |
COMP4971A | Independent Work |
COMP6922E | Research Project |
COMP6922E | Research Project |
CHEN, Wei
Computer Science and Engineering
LIU, Jiazhen
Computer Science and Engineering
PHAM, Trung Kien
Computer Science and Engineering
TAN, Chaolei
Computer Science and Engineering
WANG, Yanghao
Computer Science and Engineering