VISION & LANGUAGE
FOR AUTONOMOUS AI GROUP
Discovering the fundamental principles for an embodied AI to perceive, predict and interact with the dynamic world.
Our Mission
Our research addresses the theoretical foundations and practical applications of computer vision and machine learning for an embodied AI to perceive, predict and interact with the dynamic environment around it.
- Object and scene understanding and reconstruction.
- Prediction and reasoning about human motion, activity and behaviour under physical interactions with the environment and social interactions with other people.
Our overarching aim is to develop an end-to-end perception system for an embodied agent to learn, perceive and act simultaneously through interaction with the dynamic world.
Seminal Works
Unifying Flow, Stereo and Depth Estimation
GMFlow: Learning Optical Flow via Global Matching
JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
Online Multi-Target Tracking Using Recurrent Neural Networks
Joint Probabilistic Data Association Revisited
Recent News

MATA published in ICLR 2026
Congratulations to Zhixi. Read the MATA paper, published in ICLR 2026.
Aeroseg++ published in TOMM 2026
Congratulations to Saikat. Read the Aeroseg++ paper, published in TOMM 2026.
MGIoU published in AAAI 2025
Congratulations to Tho. Read the MGIoU paper, published in AAAI 2025.
JRDB-Reasoning published in AAAI 2025
Congratulations to Simin. Read the JRDB-Reasoning paper on reasoning in robot perception.