Research Landscape
This section describes our lab's active research projects, public datasets, and benchmarks. Our research focuses on perception, forecasting, and navigation/planning for embodied AI that can perceive, predict, and interact with dynamic environments.
Basic Vision Tasks
Perception: Focuses on scene understanding tasks, including object detection, segmentation (instance, semantic, and panoptic), and depth estimation from multi-modal sequences, using supervised, semi-supervised, few-shot, and self-supervised learning.
Key Publications
- Energy-based Self-Training, ICCV 2023
- ProtoCon: Pseudo-label Refinement, CVPR 2023
- AerOSeg++: Scale-Aware Segmentation, TOMM 2026
- GMFlow: Optical Flow via Global Matching, CVPR 2022
- Unifying Flow, Stereo and Depth, TPAMI 2023
- Generalized IoU (GIoU), CVPR 2019
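Generalized IoU extends standard IoU with a penalty based on the smallest enclosing box, so the measure stays informative even when two boxes do not overlap. A minimal sketch for axis-aligned 2-D boxes follows the published definition; the function name and box convention are illustrative:

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)
    # GIoU = IoU - |C \ (A ∪ B)| / |C|, in [-1, 1]
    return iou - (c_area - union) / c_area if c_area > 0 else iou
```

Unlike plain IoU, GIoU is negative for disjoint boxes, which gives a useful gradient signal when it is used as a bounding-box regression loss.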
Multi-Object Tracking
Perception: Designing end-to-end MOT frameworks to track an unknown, time-varying number of objects in crowded, unconstrained environments, addressing track initiation, track termination, and occlusion handling.
- JRDB-PanoTrack, CVPR 2024
- Spatial and Temporal Transformers for MOT, TPAMI 2022
- Learning of Global Objective for MOT, CVPR 2022
- Joint Probabilistic Data Association Revisited, ICCV 2015
- Online multi-target tracking using RNNs, AAAI 2017
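As a rough illustration of the tracking-by-detection loop these works build on, the sketch below greedily associates existing tracks with new detections by IoU; unmatched detections initiate tracks and unmatched tracks are candidates for termination. The greedy matching and the names are illustrative assumptions, not any paper's actual method:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns (matches, unmatched_detection_ids, unmatched_track_ids):
    unmatched detections initiate new tracks; unmatched tracks are
    candidates for termination (a real system would add occlusion
    handling and track confirmation on top of this).
    """
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)),
                   reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap even less
        if ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    new_tracks = [di for di in range(len(detections)) if di not in matched_d]
    lost_tracks = [ti for ti in range(len(tracks)) if ti not in matched_t]
    return matches, new_tracks, lost_tracks
```

The papers listed above replace this hand-crafted association with learned components (e.g. transformers or recurrent networks), but the initiate/match/terminate structure is the same.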
Human Face, Emotion, Action, and Social Group & Activity Detection
Perception: Understanding human behavior in videos by simultaneously grouping people into social groups, predicting individual actions, and recognizing the social activity of each group.
- Social-MAE, ICRA 2025
- Real-time Trajectory-based Social Group Detection, IROS 2023
- MARLIN: Masked Autoencoder for facial video, CVPR 2023
- JRDB-Act, CVPR 2022
- Joint learning of Social Groups, Individuals Action, ECCV 2020
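To illustrate the kind of trajectory-based grouping involved, here is a toy sketch: two pedestrians are linked if their mean distance over the observation window is small, and social groups are the connected components of that link graph. The distance rule, threshold, and names are illustrative assumptions, not the published method:

```python
import math

def detect_groups(trajectories, dist_thresh=1.5):
    """Cluster pedestrians into social groups via union-find.

    trajectories: dict mapping person id -> list of (x, y) positions
    over the same time steps. Two people are linked when their mean
    pairwise distance is below dist_thresh (metres, assumed).
    """
    ids = list(trajectories)
    parent = {i: i for i in ids}

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a_idx, a in enumerate(ids):
        for b in ids[a_idx + 1:]:
            dists = [math.dist(p, q)
                     for p, q in zip(trajectories[a], trajectories[b])]
            if sum(dists) / len(dists) < dist_thresh:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

A distance threshold alone ignores cues such as shared velocity and gaze; the works above learn grouping jointly with action and activity recognition instead.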
3D Reconstruction and Mapping
Perception: 3D localisation, reconstruction, and mapping of objects and human bodies in dynamic environments for high-level 3D scene understanding.
- Normal-GS: 3D Gaussian Splatting, NeurIPS 2024
- TFS-NeRF: Template-Free NeRF, NeurIPS 2024
- ActiveRMAP: Radiance Field for Planning, arXiv 2022
- ODAM, ICCV 2021
- MO-LTR, RA-L & ICRA 2021
Multi-task 3D Visual Perception System for a Mobile Robot
Perception: Designing a multi-task perception system for autonomous agents (e.g., social robots), spanning basic to high-level perception and reasoning, and creating large-scale datasets for training and evaluation.
- MGIoU, AAAI 2025
- JRDB-Reasoning, AAAI 2025
- DifFUSER, ECCV 2024
- JRDB-Act, CVPR 2022
- JRDB: A Dataset and Benchmark, TPAMI 2021
- JRMOT, IROS 2020
Visual Reasoning
Perception: Interpreting, analyzing, and making sense of visual information: recognizing patterns, spatial relationships, and logical structures in images and diagrams.
- MATA: Multi-Agent Visual Reasoning, ICLR 2026
- DWIM: Tool-aware Visual Reasoning, ICCV 2025
- NAVER: Neuro-Symbolic Compositional Automaton, ICCV 2025
- DrVideo: Document Retrieval Based Long Video Understanding, CVPR 2025
- HYDRA: Hyper Agent for Dynamic Compositional Visual Reasoning, ECCV 2024
Human Trajectory/Body Motion Forecasting
Forecasting: Developing physically and socially plausible frameworks to predict human trajectories and body-pose dynamics in complex environments.
- JRDB-Traj, arXiv 2023
- SoMoFormer: Multi-Person Pose Forecasting, arXiv 2022
- TRiPOD: Human Trajectory and Pose Dynamics, ICCV 2021
- Social-BiGAT: Multimodal Forecasting, NeurIPS 2019
- SoPhie: Attentive GAN for Predicting Paths, CVPR 2019
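Learned forecasters such as these are commonly compared against a constant-velocity baseline, which simply extrapolates the last observed displacement. A minimal sketch, with an illustrative function name:

```python
def constant_velocity_forecast(history, horizon):
    """Extrapolate a 2-D trajectory using the last observed velocity.

    history: list of (x, y) positions at uniform time steps (>= 2).
    horizon: number of future steps to predict.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement over the last step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]
```

For example, `constant_velocity_forecast([(0, 0), (1, 1)], 3)` extends the motion to `[(2, 2), (3, 3), (4, 4)]`. The models above improve on this by modelling social interactions and multimodal futures.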
Single or Multi-UAV Planning for Discovering and Tracking Mobile Objects
Planning: Online path planning for UAV-based localisation and tracking of an unknown, time-varying number of objects, covering both single-UAV and multi-UAV (centralised/decentralised) settings.
- NEUSIS: Neuro-Symbolic Framework for UAV, RA-L 2025
- GyroCopter, RA-L 2024
- Multi-Objective Multi-Agent Planning, TSP 2024
- Distributed Multi-object Tracking under Limited FoV, TSP 2021
- Multi-Objective Multi-Agent Planning, AAAI 2020




