Core Concepts
An intelligent reading assistant that combines smart glasses with embedded RGB cameras and a Large Language Model (LLM) to process textual information from the user's perspective, understand the user's preferences, and provide personalized guidance and information.
Abstract
The proposed system utilizes Aria smart glasses with embedded RGB cameras to capture egocentric video of the user's surroundings. The video is processed using state-of-the-art object detection and optical character recognition (OCR) techniques to extract textual information, such as the contents of a restaurant menu. The extracted text is then fed into a Large Language Model (LLM), specifically GPT-4, to create a digital representation of the menu.
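A minimal sketch of this capture-to-digitization step is below, assuming pytesseract as a stand-in OCR engine and the OpenAI chat API for GPT-4; the function name digitize_menu, the prompt, and the JSON schema are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: OCR a captured menu frame, then ask GPT-4 to structure it.
# pytesseract stands in for the paper's OCR stage; the prompt and JSON
# schema are illustrative assumptions.
import json

import pytesseract
from PIL import Image
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def digitize_menu(frame_path: str) -> dict:
    """Turn one egocentric video frame of a menu into structured data."""
    # 1. OCR: extract raw (often noisy) text from the frame.
    raw_text = pytesseract.image_to_string(Image.open(frame_path))

    # 2. LLM: convert the noisy OCR output into a machine-readable menu.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Convert noisy OCR text from a restaurant menu into "
                    'JSON of the form {"items": [{"name": ..., '
                    '"description": ..., "price": ...}]}.'
                ),
            },
            {"role": "user", "content": raw_text},
        ],
    )
    # A real system would validate and retry; this sketch assumes the
    # model returns well-formed JSON.
    return json.loads(response.choices[0].message.content)
```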
The system also incorporates the user's personal preferences, such as dietary restrictions or food likes/dislikes, which are retrieved from various sources (e.g., bank transactions, Google Photos, Google Maps). The LLM-based chatbot then uses this personalized information to provide contextual and tailored recommendations to the user, such as suggesting suitable menu items.
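Continuing the sketch above (reusing its client and json imports), a hypothetical recommend helper folds a preference profile into the prompt alongside the digitized menu; the preference schema and the prompt wording are assumptions made for illustration.

```python
def recommend(menu: dict, preferences: dict) -> str:
    """Ask GPT-4 for menu suggestions that respect the user's preferences."""
    prompt = (
        f"Menu (JSON): {json.dumps(menu)}\n"
        f"User preferences (JSON): {json.dumps(preferences)}\n"
        "Suggest suitable items, honor any dietary restrictions, and "
        "briefly explain each suggestion."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Hypothetical preference profile (field names are illustrative):
# recommend(menu, {"dietary": ["vegetarian"], "dislikes": ["cilantro"]})
```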
The system was evaluated in a real-world setting: four participants, each with a different native language, interacted with the system while reading menus from various restaurants. The system achieved 96.77% accuracy in text retrieval, and all participants rated its performance and recommendations as highly satisfactory, with an average rating of 4.87 out of 5.
The proposed framework highlights the potential of integrating egocentric vision, LLMs, and personalized data to create an effective and accessible reading assistance solution for visually impaired individuals, addressing challenges in daily activities and improving their independence and quality of life.
Stats
The number of adults aged 50 and over with visual impairment worldwide was estimated to be around 186 million in 2010.
The number of adults aged 40 years and older in the United States with uncorrectable vision problems exceeded 3 million and is projected to increase to 7 million by 2050.
Quotes
"The ability to read, understand and find important information from written text is a critical skill in our daily lives for our independence, comfort and safety."
"Partial vision loss creates challenges in performing Activities of Daily Living (ADLs) and thus increases older adults' dependence on other people's assistance."