Research

Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection

3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction

MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory

Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation

Online Temporal Action Localization with Memory-Augmented Transformer

Towards More Practical Group Activity Detection: A New Benchmark and Model

Classification Matters: Improving Video Action Detection with Class-Specific Attention

Burst Image Super-Resolution with Base Frame Selection

Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform

Contrastive Mean-Shift Learning for Generalized Category Discovery

Learning Correlation Structures for Vision Transformers

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Self-supervised Learning of Semantic Correspondence Using Web Videos

Efficient Semantic Matching with Hypercolumn Correlation

NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image

Activity Grammars for Temporal Action Segmentation

Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

Leveraging Proxy of Training Data for Test-Time Adaptation

Scaling up GANs for Text-to-Image Synthesis

Improving Cross-Modal Retrieval with Set of Diverse Embeddings

WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation

Combating Label Distribution Shift for Active Domain Adaptation

Relational Context Learning for Human-Object Interaction Detection

HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization

Devil's on the Edges: Selective Quad Attention for Scene Graph Generation

Learning Rotation-Equivariant Features for Visual Correspondence

3D Scene Painting via Semantic Image Synthesis

Style Neophile: Constantly Seeking Novel Styles for Domain Generalization

Peripheral Vision Transformer

Future Transformer for Long-Term Action Anticipation

TransforMatcher: Match-to-Match Attention for Semantic Correspondence

Self-Taught Metric Learning without Labels

Integrative Few-Shot Learning for Classification and Segmentation

Fast Point Transformer

FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation

DENse and DIverse symmetry dataset (DENDI)

Reflection and Rotation Symmetry Detection via Equivariant Learning

Detector-Free Weakly Supervised Group Activity Recognition

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

Self-Supervised Equivariant Learning for Oriented Keypoint Detection

Semi-supervised Semantic Segmentation with Error Localization Network

Relational Self-Attention: What's Missing in Attention for Video Understanding

Deep Hough Voting for Robust Global Registration

Self-Calibrating Neural Radiance Fields

Learning to Discover Reflection Symmetry via Polar Matching Convolution

Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition

Relational Embedding for Few-Shot Classification

ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer

Hypercorrelation Squeeze for Few-Shot Segmentation

Embedding Transfer with Label Relaxation for Improved Metric Learning

Convolutional Hough Matching Networks

URIE: Universal Image Enhancement for Visual Recognition in the Wild

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

Learning to Compose Hypercolumns for Visual Correspondence

Proxy Anchor Loss for Deep Metric Learning

SPair-71k: A Large-scale Benchmark for Semantic Correspondence

Hyperpixel Flow: Semantic Correspondence with Multi-layer Neural Features

Deep Metric Learning Beyond Binary Supervision

Relational Knowledge Distillation

Attentive Semantic Alignment with Offset-Aware Correlation Kernels

Visual Reference Resolution using Attention Memory for Visual Dialog

MarioQA: Answering Questions by Watching Gameplay Videos

Weakly Supervised Semantic Segmentation using Web-Crawled Videos

Superpixel-based Tracking-by-Segmentation using Markov Chains

Text-guided Attention Model for Image Captioning

Training Recurrent Answering Units with Joint Loss Minimization for VQA

Superpixel segmentation by constrained minimax label propagation

TransferNet: Transfer learning for semantic segmentation

MDNet for visual tracking "VOT2015 Winner"

DPPnet for image question answering

Unsupervised Co-activity Detection from Multiple Videos using Absorbing Markov Chain

DecoupledNet for semi-supervised semantic segmentation

DeconvNet for semantic segmentation

Tracking-by-segmentation using online GBDT

Online tracking by learning discriminative saliency map with CNN

Beyond chain models for visual tracking: A Trilogy

Event detection

Joint human segmentation and pose tracking

Tracking with occlusion reasoning

Generalized background subtraction

Fast nearest neighbor search