VISION & LANGUAGE
FOR AUTONOMOUS AI GROUP
Discovering the fundamental principles for an embodied AI to perceive, predict and interact with the dynamic world.
Our Mission
Our research addresses the theoretical foundations and practical applications of computer vision and machine learning for an embodied AI to perceive, predict and interact with the dynamic environment around it.
- Object and scene understanding and reconstruction.
- Prediction and reasoning about human motion, activity and behaviour under physical interactions with the environment and social interactions with other people.
Our overarching aim is to develop an end-to-end perception system for an embodied agent to learn, perceive and act simultaneously through interaction with the dynamic world.
Seminal Works
Unifying Flow, Stereo and Depth Estimation
GMFlow: Learning Optical Flow via Global Matching
JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints
Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks
Online Multi-Target Tracking Using Recurrent Neural Networks
Joint Probabilistic Data Association Revisited
Recent News

MATA published in ICLR 2026
Congratulations to Zhixi. Read the MATA paper, published in ICLR 2026.
Aeroseg++ published in TOMM 2026
Congratulations to Saikat. Read the Aeroseg++ paper, published in TOMM 2026.
MGIoU published in AAAI 2025
Congratulations to Tho. Read the MGIoU paper, published in AAAI 2025.
JRDB-Reasoning published in AAAI 2025
Congratulations to Simin. Read the JRDB-Reasoning paper on reasoning in robot perception.