PhD in Computer Science and Technology
Zhejiang University, 2020
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Article
Federated Unsupervised Representation Learning
Article
Label Semantic Knowledge Distillation for Unbiased Scene Graph Generation
Article
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Article
Compositional Feature Augmentation for Unbiased Scene Graph Generation
Conference paper
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
Conference paper
Discrepancy-Guided Reconstruction Learning for Image Forgery Detection
Conference paper
Conference paper
Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes
Conference paper
Iterative Proposal Refinement for Weakly-Supervised Video Grounding
Conference paper
TempCLR: Temporal Alignment Representation with Contrastive Learning
Conference paper
Video Scene Graph Generation from Single-Frame Weak Supervision
Conference paper
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
Article
Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey
Article
Deep Motion Prior for Weakly-Supervised Temporal Action Localization
Article
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Conference paper
Correspondence Matters for Video Referring Expression Comprehension
Conference paper
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Conference paper
Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
Conference paper
Explicit Image Caption Editing
Conference paper
Few-Shot Object Detection with Fully Cross-Transformer
Conference paper
Conference paper
Respecting Transfer Gap in Knowledge Distillation
Conference paper
Rethinking Data Augmentation for Robust Visual Question Answering
Conference paper
Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives
Conference paper
Rethinking the Evaluation of Unbiased Scene Graph Generation
Conference paper
Rethinking the Reference-based Distinctive Image Captioning
Conference paper
Rethinking the Two-Stage Framework for Grounded Situation Recognition
Conference paper
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
Conference paper
Towards Multi-level Fairness and Robustness on Federated Learning
Conference paper
Weakly-Supervised Temporal Article Grounding
Conference paper
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric
Conference paper
Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
Conference paper
Boundary Proposal Network for Two-Stage Natural Language Video Localization
Conference paper
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
Conference paper
Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation
Conference paper
Natural Language Video Localization with Learnable Moment Proposals
Conference paper
On Pursuit of Designing Multi-modal Transformer for Video Grounding
Conference paper
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Conference paper
Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
Conference paper
Video Relation Detection via Tracklet based Visual Transformer
Conference paper
Hierarchical Temporal Fusion of Multi-grained Attention Features for Video Question Answering
Article
Counterfactual samples synthesizing for robust visual question answering
Conference paper
Hierarchical Fashion Graph Network for Personalized Outfit Recommendation
Conference paper
Rethinking the bottom-up framework for query-based video localization
Conference paper
Counterfactual critic multi-agent training for scene graph generation
Conference paper
DebuG: A dense bottom-up grounding approach for natural language video localization
Conference paper
Learning using privileged information for food recognition
Conference paper
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
Conference paper
SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning
Conference paper
Video question answering via attribute-augmented attention network learning
Conference paper
COMP4971A | Independent Work
COMP6922E | Research Project
COMP6922E | Research Project
No Teaching Assignments
ZHU, Chaoyang
Computer Science and Engineering
ZHA, Jiajun
Computer Science and Engineering