Applying state-of-the-art deep learning models for eye feature detection can significantly reduce the dropout rate and improve the accuracy and precision of gaze estimation in virtual reality compared to traditional computer vision techniques.
TikTok's and Instagram's local machine learning models exhibit significant demographic disparities in their performance, particularly for age and gender prediction, as well as extraction of visual concepts from images.
This work proposes the Multi-View Entropy Bottleneck (MVEB) objective for learning the minimal sufficient representation in the unsupervised multi-view setting. MVEB reduces this learning problem to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution.
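The two terms of such an objective can be illustrated with a minimal sketch. Everything here is an assumption for illustration: agreement is taken as the mean cosine similarity between paired embeddings, and differential entropy is approximated with a simple nearest-neighbour proxy (mean log nearest-neighbour distance, constants dropped). This is a stand-in, not necessarily the estimator used by MVEB, and the names `agreement`, `knn_entropy_proxy`, `mveb_style_objective`, and the weight `beta` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def agreement(z1, z2):
    """Mean cosine similarity between paired embeddings of two views."""
    total = 0.0
    for a, b in zip(z1, z2):
        a, b = l2_normalize(a), l2_normalize(b)
        total += sum(x * y for x, y in zip(a, b))
    return total / len(z1)


def knn_entropy_proxy(z):
    """Nearest-neighbour entropy proxy: d * mean(log r_i), constants dropped.

    Assumption: a stand-in for a differential entropy estimate; larger values
    mean the embeddings are more spread out.
    """
    z = [l2_normalize(v) for v in z]
    d = len(z[0])
    logs = []
    for i, a in enumerate(z):
        # Distance from point i to its nearest other point.
        r = min(math.dist(a, b) for j, b in enumerate(z) if j != i)
        logs.append(math.log(r + 1e-12))  # small offset avoids log(0)
    return d * sum(logs) / len(logs)


def mveb_style_objective(z1, z2, beta=0.01):
    """Quantity to maximize: agreement plus weighted entropy (beta is hypothetical)."""
    entropy = 0.5 * (knn_entropy_proxy(z1) + knn_entropy_proxy(z2))
    return agreement(z1, z2) + beta * entropy
```

Under this sketch, two views that agree but collapse to nearly the same point score lower than two views that agree and stay spread out, which is the behaviour the entropy term is meant to encourage.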
Our method automatically generates a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery, enabling efficient training of object reconstruction methods that are robust to occlusions.
This work proposes a continual learning approach that automatically expands pre-trained vision transformers by adding modular adapters and representation descriptors to accommodate distribution shifts in incoming tasks, without the need for memory rehearsal.
Neural network layers can be designed to learn histogram-based "engineered" features, such as local binary patterns and edge histogram descriptors, and these learned features can improve feature representation and performance on image classification tasks.
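One common way to make a histogram learnable is to replace hard bin counts with smooth bin memberships. The sketch below assumes Gaussian (RBF) memberships with per-bin centers and widths. In a network these would be trainable parameters, but here they are plain floats. The function name `soft_histogram` is hypothetical, and this is not claimed to be the layer design used in the work summarized above.

```python
import math


def soft_histogram(values, centers, widths):
    """Average RBF membership of `values` in each bin.

    Each bin k is described by a center c_k and a width w_k; a value votes for
    a bin with weight exp(-((v - c_k) / w_k) ** 2) instead of a hard 0/1 count.
    """
    hist = []
    for c, w in zip(centers, widths):
        votes = [math.exp(-((v - c) / w) ** 2) for v in values]
        hist.append(sum(votes) / len(values))
    return hist


# Example: three feature responses from a local window, two bins.
window = [0.0, 0.0, 1.0]
print(soft_histogram(window, centers=[0.0, 1.0], widths=[0.1, 0.1]))
```

Because the memberships are smooth in both the centers and the widths, gradients can flow to the bin parameters, which is what lets a histogram-style feature be learned rather than fixed by hand.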
TASK2BOX is a method that uses box embeddings to effectively model and visualize asymmetric relationships between datasets, such as hierarchical structures and task affinities.
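A small sketch can show why boxes suit asymmetric relationships. It assumes axis-aligned boxes given as (lower corner, upper corner) pairs and scores "a is contained in b" as the fraction of a's volume that overlaps b. The helper names `box_volume` and `containment` are hypothetical, and the hard-volume scoring is a simplification; it is not claimed to be the scoring TASK2BOX itself uses.

```python
def box_volume(lo, hi):
    """Volume of an axis-aligned box; zero if any side is empty."""
    vol = 1.0
    for l, h in zip(lo, hi):
        vol *= max(h - l, 0.0)
    return vol


def containment(a, b):
    """Fraction of box a's volume that lies inside box b."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    # The intersection of two axis-aligned boxes is itself a box.
    lo = [max(x, y) for x, y in zip(a_lo, b_lo)]
    hi = [min(x, y) for x, y in zip(a_hi, b_hi)]
    return box_volume(lo, hi) / box_volume(a_lo, a_hi)


# Example: a broad "parent" dataset and a narrow "child" dataset.
parent = ([0.0, 0.0], [4.0, 4.0])
child = ([1.0, 1.0], [2.0, 2.0])
print(containment(child, parent))  # child lies fully inside parent
print(containment(parent, child))  # only a small part of parent is in child
```

The score is not symmetric: swapping the arguments changes the denominator, so a hierarchy such as child-inside-parent yields a high score in one direction and a low score in the other.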