Research Landscape
This section describes our lab's active research projects, public datasets, and benchmarks. Our research focuses on perception, forecasting, and navigation/planning for embodied AI that can perceive, predict, and interact with dynamic environments.
Basic Vision Tasks
Perception: Focuses on scene understanding tasks, including object detection, segmentation (instance, semantic, and panoptic), and depth estimation from multi-modal sequences, using supervised, semi-supervised, few-shot, and self-supervised learning.
Key Publications
- Energy-based Self-Training, ICCV 2023
- ProtoCon: Pseudo-label Refinement, CVPR 2023
- AerOSeg++: Scale-Aware Segmentation, TOMM 2026
- GMFlow: Optical Flow via Global Matching, CVPR 2022
- Unifying Flow, Stereo and Depth, TPAMI 2023
- Generalized IoU (GIoU), CVPR 2019
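Generalized IoU extends standard IoU with a penalty based on the smallest enclosing box, so the measure stays informative even when two boxes do not overlap. A minimal sketch for axis-aligned 2-D boxes follows the published definition; the function name and box convention are illustrative:

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    # GIoU = IoU - |C \ (A ∪ B)| / |C|, in [-1, 1]
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```

Unlike plain IoU, GIoU is negative for disjoint boxes, which gives a useful gradient signal when it is used as a bounding-box regression loss.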
Multi-Object Tracking
Perception: Designing end-to-end MOT frameworks to track an unknown, time-varying number of objects in crowded, unconstrained environments, addressing track initiation, track termination, and occlusion handling.
- JRDB-PanoTrack, CVPR 2024
- Spatial and Temporal Transformers for MOT, TPAMI 2022
- Learning of Global Objective for MOT, CVPR 2022
- Joint Probabilistic Data Association Revisited, ICCV 2015
- Online multi-target tracking using RNNs, AAAI 2017
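As a rough illustration of the tracking-by-detection loop these works build on, the sketch below greedily associates existing tracks with new detections by IoU; unmatched detections initiate tracks and unmatched tracks are candidates for termination. The greedy matching and the names are illustrative assumptions, not any paper's actual method:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns (matches, unmatched_detection_ids, unmatched_track_ids):
    unmatched detections initiate new tracks; unmatched tracks are
    candidates for termination (a real system would add occlusion
    handling and track confirmation on top of this).
    """
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)),
                   reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap even less
        if ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    new_tracks = [di for di in range(len(detections)) if di not in matched_d]
    lost_tracks = [ti for ti in range(len(tracks)) if ti not in matched_t]
    return matches, new_tracks, lost_tracks
```

The papers listed above replace this hand-crafted association with learned components (e.g. transformers or recurrent networks), but the initiate/match/terminate structure is the same.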
Human Face, Emotion, Action, and Social Group & Activity Detection
Perception: Understanding human behavior in videos by simultaneously grouping people into social groups, predicting individual actions, and recognizing the social activity of each group.
- Social-MAE, ICRA 2025
- Real-time Trajectory-based Social Group Detection, IROS 2023
- MARLIN: Masked Autoencoder for facial video, CVPR 2023
- JRDB-Act, CVPR 2022
- Joint learning of Social Groups, Individuals Action, ECCV 2020
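To illustrate the kind of trajectory-based grouping involved, here is a toy sketch: two pedestrians are linked if their mean distance over the observation window is small, and social groups are the connected components of that link graph. The distance rule, threshold, and names are illustrative assumptions, not the published method:

```python
import math

def detect_groups(trajectories, dist_thresh=1.5):
    """Cluster pedestrians into social groups via union-find.

    trajectories: dict mapping person id -> list of (x, y) positions
    over the same time steps. Two people are linked when their mean
    pairwise distance is below dist_thresh (metres, assumed).
    """
    ids = list(trajectories)
    parent = {i: i for i in ids}

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            dists = [math.dist(p, q)
                     for p, q in zip(trajectories[a], trajectories[b])]
            if sum(dists) / len(dists) < dist_thresh:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

A distance threshold alone ignores cues such as shared velocity and gaze; the works above learn grouping jointly with action and activity recognition instead.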
3D Reconstruction and Mapping
Perception: 3D localisation, reconstruction, and mapping of objects and human bodies in dynamic environments for high-level 3D scene understanding.
- Normal-GS: 3D Gaussian Splatting, NeurIPS 2024
- TFS-NeRF: Template-Free NeRF, NeurIPS 2024
- ActiveRMAP: Radiance Field for Planning, arXiv 2022
- ODAM, ICCV 2021
- MO-LTR, RA-L & ICRA 2021
Multi-task 3D Visual Perception System for a Mobile Robot
Perception: Designing a multi-task perception system for autonomous agents (e.g., social robots), spanning basic to high-level perception and reasoning, and creating large-scale datasets for training and evaluation.
- MGIoU, AAAI 2025
- JRDB-Reasoning, AAAI 2025
- DifFUSER, ECCV 2024
- JRDB-Act, CVPR 2022
- JRDB: A Dataset and Benchmark, TPAMI 2021
- JRMOT, IROS 2020
Visual Reasoning
Perception: Interpreting, analyzing, and making sense of visual information: recognizing patterns, spatial relationships, and logical structures in images and diagrams.
- MATA: Multi-Agent Visual Reasoning, ICLR 2026
- DWIM: Tool-aware Visual Reasoning, ICCV 2025
- NAVER: Neuro-Symbolic Compositional Automaton, ICCV 2025
- DrVideo: Document Retrieval Based Long Video Understanding, CVPR 2025
- HYDRA: Hyper Agent for Dynamic Compositional Visual Reasoning, ECCV 2024
Human Trajectory/Body Motion Forecasting
Forecasting: Developing physically and socially plausible frameworks to predict human trajectories and body-pose dynamics in complex environments.
- JRDB-Traj, arXiv 2023
- SoMoFormer: Multi-Person Pose Forecasting, arXiv 2022
- TRiPOD: Human Trajectory and Pose Dynamics, ICCV 2021
- Social-BiGAT: Multimodal Forecasting, NeurIPS 2019
- SoPhie: Attentive GAN for Predicting Paths, CVPR 2019
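Learned forecasters such as these are commonly compared against a constant-velocity baseline, which simply extrapolates the last observed displacement. A minimal sketch, with an illustrative function name:

```python
def constant_velocity_forecast(history, horizon):
    """Extrapolate a 2-D trajectory using the last observed velocity.

    history: list of (x, y) positions at uniform time steps (>= 2).
    horizon: number of future steps to predict.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement over the last step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]
```

For example, `constant_velocity_forecast([(0, 0), (1, 1)], 3)` extends the motion to `[(2, 2), (3, 3), (4, 4)]`. The models above improve on this by modelling social interactions and multimodal futures.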
Single or Multi-UAV Planning for Discovering and Tracking Mobile Objects
Planning: Online path planning for UAV-based localisation and tracking of an unknown, time-varying number of objects, covering both single-UAV and multi-UAV (centralised/decentralised) settings.
- NEUSIS: Neuro-Symbolic Framework for UAV, RA-L 2025
- GyroCopter, RA-L 2024
- Multi-Objective Multi-Agent Planning, TSP 2024
- Distributed Multi-object Tracking under Limited FoV, TSP 2021
- Multi-Objective Multi-Agent Planning, AAAI 2020




