Core Concepts
An autonomous agent should maximize the mutual information between past observations and future observations to perform effective active perception. This requires a representation that can summarize past observations, synthesize future observations, and calculate predictive information along dynamically-feasible trajectories.
Abstract
The paper proposes a framework for active perception that aims to maximize the mutual information between past observations and future observations. This is motivated by the idea that an autonomous agent performing active perception should seek to obtain the most informative observations about the environment.
The key components of the proposed approach are:
Neural Radiance Fields (NeRFs): The scene is represented using a NeRF, which captures the photometric, geometric, and semantic properties of the environment. NeRFs can be used to synthesize new observations from different viewpoints.
Predictive Information Calculation: The authors define predictive information as the mutual information between past observations and future observations. They show how to calculate this using an ensemble of NeRF models, which allows estimating the uncertainty in color, depth, occupancy, and semantic predictions.
Sampling-based Trajectory Optimization: The authors use a sampling-based planner to generate dynamically-feasible trajectories that maximize the predictive information. This involves generating a set of candidate trajectories and evaluating their predictive information to select the most informative one.
The authors demonstrate the effectiveness of their approach in simulation experiments for object localization and scene reconstruction tasks in realistic 3D indoor environments. They show that their method outperforms baseline exploration strategies in terms of the number of objects localized and the quality of the reconstructed scene.
The key insights are that (1) maximizing predictive information can lead to sophisticated exploration behaviors without the need for hand-engineered heuristics, and (2) NeRFs provide a suitable representation for active perception tasks, as they can capture the necessary photometric, geometric, and semantic properties of the environment.
Stats
The paper does not provide specific numerical data or statistics to support the key claims. However, it presents qualitative results and performance comparisons against baseline methods for the object localization and scene reconstruction tasks.
Quotes
"An autonomous agent performing active perception should maximize the mutual information that past observations posses about future ones."
"NeRFs represent photometric and geometric properties well but they do not represent the semantics. In §III we will use a "semantic variant" of NeRF as our generative model for predictive information."
"Predictive information can characterize the complexity of the scene and that is why maximizing it could enable active perception."