Core Concepts
The paper proposes a novel approach to long-term human trajectory prediction in complex environments. It leverages 3D Dynamic Scene Graphs and Large Language Models to predict sequences of human interactions with the environment, which are then grounded into a probabilistic spatio-temporal distribution over future human positions.
Abstract
The authors present a novel approach called "Language-driven Probabilistic Long-term Prediction" (LP2) for long-term human trajectory prediction in complex environments. Existing methods are limited by their focus on short-term collision avoidance and their inability to model complex human-environment interactions.
LP2 overcomes these limitations by predicting sequences of human interactions with the environment using Large Language Models (LLMs) conditioned on a 3D Dynamic Scene Graph (DSG) representation of the environment. The DSG encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. The LLM-based Interaction Sequence Prediction (ISP) module predicts likely future interactions, which are then grounded into a probabilistic spatio-temporal distribution over human positions using a Continuous-Time Markov Chain (CTMC)-based Probabilistic Trajectory Prediction (PTP) module.
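To make the CTMC grounding step more concrete, here is a minimal sketch of how a Continuous-Time Markov Chain turns transition rates into a time-dependent distribution over states. It is not the authors' PTP module: the three states, the rates in Q_example, and the function name ctmc_transient are all hypothetical, chosen only for illustration. The sketch computes p(t) = p(0) exp(Qt) with the standard uniformization method in plain Python.

```python
import math

def ctmc_transient(p0, Q, t, tol=1e-12, max_terms=10_000):
    """Return p(t) = p0 * exp(Q t) for a CTMC with generator matrix Q (uniformization)."""
    n = len(p0)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate, >= every exit rate
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # Embedded discrete-time chain: P = I + Q / rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability of zero jumps
    v = list(p0)                  # p0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    acc = weight                  # accumulated Poisson mass
    k = 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k    # Poisson probability of exactly k jumps
        acc += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

# Hypothetical states: 0 = at desk, 1 = walking, 2 = at coffee machine.
# Rates are per second and purely illustrative; each row sums to zero.
Q_example = [
    [-1 / 30, 1 / 30, 0.0],   # leaves the desk after 30 s on average
    [0.0, -1 / 10, 1 / 10],   # walks for 10 s on average
    [0.0, 0.0, 0.0],          # stays at the coffee machine (absorbing)
]
p_60 = ctmc_transient([1.0, 0.0, 0.0], Q_example, 60.0)
```

Here p_60 holds the probability of being in each hypothetical state 60 seconds after starting at the desk. In a full pipeline, a distribution over states of this kind would still have to be mapped to positions in the environment, which this sketch does not attempt.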
The authors introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments with rich scene and interaction annotations. Experiments show that, for a time horizon of 60 seconds, LP2 achieves a 54% lower average negative log-likelihood (NLL) and a 26.5% lower Best-of-20 displacement error than the best non-privileged baselines.
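For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them. The summary does not give the paper's exact definitions, so this assumes that Best-of-N displacement error is the smallest Euclidean distance between N sampled positions and the true position at a given time, and that average NLL is the mean negative log of the probability the model assigns to the true positions. The function names are hypothetical.

```python
import math

def best_of_n_displacement(samples, truth):
    """Smallest Euclidean distance between any sampled position and the true position."""
    return min(math.dist(s, truth) for s in samples)

def average_nll(likelihoods, eps=1e-12):
    """Mean negative log-likelihood; eps guards against log(0)."""
    return -sum(math.log(max(p, eps)) for p in likelihoods) / len(likelihoods)

# Illustrative values only: three sampled (x, y) positions in metres
# and the likelihoods a model assigned to two true positions.
error = best_of_n_displacement([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)], (3.0, 5.0))
nll = average_nll([0.2, 0.05])
```

Lower is better for both quantities: a small Best-of-N error means at least one sample landed near the true position, and a low NLL means the predicted distribution placed high probability on where the person actually was.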
Stats
The average distance travelled in the past is 14.0 ± 11.9 m in the office and 7.8 ± 3.0 m in the home environment.
The average distance travelled in the future is 18.6 ± 12.7 m in the office and 11.7 ± 6.5 m in the home environment.
The average number of past interactions is 2.2 ± 0.5 in the office and 2.4 ± 0.5 in the home environment.
The average number of future interactions is 2.3 ± 0.8 in the office and 2.2 ± 0.8 in the home environment.
The average duration of the past observation is 67.6 ± 30.8 s in the office and 60.6 ± 30.2 s in the home environment.
The average time spent interacting in the past is 55.2 ± 30.1 s in the office and 55.1 ± 30.4 s in the home environment.
The average time spent interacting in the future is 43.7 ± 11.5 s in the office and 51.3 ± 6.6 s in the home environment.
Quotes
"For the first time, we study long-term human trajectory prediction with a prediction horizon of up to 60 s and including complex human-object interactions."
"We present LP2, a novel method combining LLM-based zero-shot interaction sequence prediction and CTMC-based probabilistic trajectory prediction to infer multi-modal spatio-temporal distribution over future human positions."
"We introduce a new semi-synthetic dataset of human trajectories in complex indoor environments with annotated human-object interactions."