Leveraging language priors in a variational framework to improve metric-scale monocular depth estimation.