Core Concepts
The paper derives a set of joint covariance properties for spatio-temporal receptive fields under compositions of spatial scaling, spatial affine, Galilean, and temporal scaling transformations. Because receptive field responses can then be matched across views, these joint covariance properties support robust and accurate inference of 3D scene structure and motion from multi-view observations of dynamic scenes.
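As a sketch of what such a joint covariance property looks like in symbols, consider a receptive field kernel of the form g(x - v t; s Sigma) h(t; tau), with a spatial affine Gaussian g and a temporal kernel h. The notation is an assumption made for this illustration and may differ from the paper's: S_x is a spatial scaling factor, A a spatial affine matrix, u a relative image velocity, and S_t a temporal scaling factor, while s, Sigma, tau, and v are the spatial scale, spatial covariance matrix, temporal scale, and image velocity of the receptive field. One composed transformation, and the parameter relations that keep the kernel matched under it, can then be written as:

```latex
\begin{align}
  x' &= S_x \, (A x + u t),   & t'      &= S_t \, t, \\
  s' &= S_x^2 \, s,           & \Sigma' &= A \, \Sigma \, A^{T}, \\
  \tau' &= S_t^2 \, \tau,     & v'      &= \frac{S_x}{S_t} \, (A v + u).
\end{align}
```

With these relations, x' - v' t' = S_x A (x - v t), so the quadratic form (x - v t)^T (s Sigma)^{-1} (x - v t) in the spatial Gaussian and the ratio t^2 / tau in the temporal kernel both keep their values under the transformation.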
Abstract
The paper studies the covariance properties of a generalized Gaussian derivative model for spatio-temporal receptive fields under various geometric image transformations. The key points are:
The paper defines a model for spatio-temporal receptive fields that combines spatial smoothing with affine Gaussian kernels and temporal smoothing with Gaussian or time-causal limit kernels. This model can be used to describe receptive fields in the retina, the lateral geniculate nucleus (LGN), and the primary visual cortex.
It introduces the concept of scale-normalized spatial and temporal derivative operators, which are essential for obtaining covariance properties under scaling transformations. This includes defining a new notion of affine scale-normalized directional derivatives.
The paper derives the individual covariance properties of the receptive field model under spatial scaling, spatial affine, Galilean, and temporal scaling transformations.
It then proves the joint covariance properties under compositions of these geometric image transformations. This allows relating receptive field responses between different views of the same dynamic scene, accounting for variations in viewing distance, orientation, and relative motion.
A geometric analysis shows how the derived joint covariance properties can be interpreted in terms of locally linearized perspective or projective transformations between views, as well as temporal scaling of spatio-temporal events.
The paper argues that the derived covariance properties are highly relevant for interpreting the functional properties of simple cells in the primary visual cortex, as they enable these receptive fields to be well-adapted to handling the variability of image structures caused by observing a dynamic 3D environment.
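The role of scale normalization in the key points above can be illustrated with a minimal numerical sketch for the simplest case: a 1-D signal under a pure spatial scaling x' = S x. The test signal, the grid, the helper names gaussian_smooth and normalized_derivative, and the choice gamma = 1 are all assumptions made for this illustration and are not taken from the paper. The sketch smooths a signal and its rescaled copy with Gaussians of variance s and S^2 s, multiplies the first derivative by s^(gamma/2), and compares the responses at corresponding points.

```python
import numpy as np


def gaussian_smooth(signal, dx, s):
    """Convolve a sampled 1-D signal with a Gaussian of variance s."""
    sigma = np.sqrt(s)
    half = int(np.ceil(6 * sigma / dx))          # truncate the kernel at 6 sigma
    u = np.arange(-half, half + 1) * dx
    kernel = np.exp(-u**2 / (2 * s)) / np.sqrt(2 * np.pi * s)
    return np.convolve(signal, kernel, mode="same") * dx


def normalized_derivative(signal, dx, s, gamma=1.0):
    """Scale-normalized first derivative: s^(gamma/2) * d/dx of the smoothed signal."""
    smoothed = gaussian_smooth(signal, dx, s)
    return s ** (gamma / 2) * np.gradient(smoothed, dx)


def f(x):
    """Hypothetical test signal: a sine wave under a Gaussian envelope."""
    return np.sin(x) * np.exp(-x**2 / 50.0)


S = 2.0      # spatial scaling factor, x' = S x
s = 1.0      # spatial scale (variance) used for the original signal
dx = 0.02    # grid spacing
x = np.arange(-60.0, 60.0 + dx / 2, dx)

original = f(x)
rescaled = f(x / S)   # rescaled signal f'(x') = f(x) with x' = S x

resp = normalized_derivative(original, dx, s)
resp_rescaled = normalized_derivative(rescaled, dx, S**2 * s)   # matched scale s' = S^2 s

# Compare the responses at corresponding points x0 and S * x0.
x0 = np.linspace(-5.0, 5.0, 11)
at_x = np.interp(x0, x, resp)
at_Sx = np.interp(S * x0, x, resp_rescaled)
max_mismatch = np.max(np.abs(at_x - at_Sx))
print("largest mismatch between matched responses:", max_mismatch)
```

The two responses agree up to discretization error once the scale parameter is transformed as s' = S^2 s. Without the factor s^(gamma/2), the plain derivative responses at corresponding points would differ by the factor S, which is why the normalization matters for covariance under scaling.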