Core Concepts
Relational databases and deep learning models can be seamlessly integrated to enable efficient serving of deep learning inference queries on relational data.
Summary
The content motivates the need to serve deep learning (DL) models efficiently on relational data across diverse commercial and scientific domains. It identifies three pivotal architectural paradigms: the state-of-the-art DL-centric architecture, the potential UDF-centric architecture, and the potential relation-centric architecture.
The DL-centric architecture offloads DL computations to dedicated DL frameworks, leading to significant cross-system overheads. The UDF-centric architecture encapsulates tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS), but lacks flexibility in optimizing the operations within the UDF. The relation-centric architecture represents large-scale tensor computations through relational operators, facilitating co-optimization with relational processing, but may incur higher latency for small model inferences.
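To make the UDF-centric and relation-centric paradigms concrete, here is a minimal sketch using SQLite. The `predict` function is a hypothetical stand-in for a small DL model wrapped as a scalar UDF (opaque to the relational optimizer), while the matrix-vector product is expressed over `(row, col, val)` tensor tables as an ordinary join plus aggregation that the optimizer can rewrite. The table and column names are illustrative assumptions, not from the paper.

```python
import sqlite3

# Hypothetical "model": a logistic scorer over one feature column,
# standing in for a small DL model invoked inside a UDF.
def predict(x):
    return 1.0 / (1.0 + 2.718281828459045 ** (-x))

conn = sqlite3.connect(":memory:")

# UDF-centric: the model runs as a scalar UDF inside the RDBMS, so
# inference is a black box to the relational query optimizer.
conn.create_function("predict", 1, predict)
conn.execute("CREATE TABLE samples(id INTEGER, feature REAL)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, 0.0), (2, 2.0)])
scores = conn.execute("SELECT id, predict(feature) FROM samples").fetchall()

# Relation-centric: a dense matrix-vector product expressed as a
# join + aggregation over (row, col, val) tensor tables, which can be
# co-optimized with the rest of the relational plan.
conn.execute("CREATE TABLE W(row INTEGER, col INTEGER, val REAL)")
conn.execute("CREATE TABLE X(row INTEGER, val REAL)")
conn.executemany("INSERT INTO W VALUES (?, ?, ?)",
                 [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])
conn.executemany("INSERT INTO X VALUES (?, ?)", [(0, 1.0), (1, 1.0)])
y = conn.execute("""
    SELECT W.row, SUM(W.val * X.val)
    FROM W JOIN X ON W.col = X.row
    GROUP BY W.row ORDER BY W.row
""").fetchall()
# y holds the product of W and X: [(0, 3.0), (1, 7.0)]
```

The trade-off in the text shows up directly here: the UDF call has low per-invocation overhead but hides the computation, while the relational form exposes it to the optimizer at the cost of more relational machinery per operation.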
The content argues for a novel RDBMS design that seamlessly integrates these three architectural paradigms and various middle grounds in between. The key components of this envisioned system include:
A unified intermediate representation (IR) and a novel query optimizer that dynamically selects the appropriate representation for each operator and co-optimizes SQL processing and model inferences.
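A toy sketch of what per-operator representation selection could look like, assuming a hypothetical cost model in which UDF execution carries a low fixed overhead but scales poorly, while relational execution has higher startup cost but a lower per-element cost. All constants and names here are illustrative, not from the paper.

```python
# Illustrative per-operator cost constants (hypothetical values).
UDF_FIXED_COST = 1.0          # per-call overhead of invoking a UDF
UDF_PER_ELEMENT_COST = 0.01   # cost per tensor element inside a UDF
REL_FIXED_COST = 50.0         # startup cost of a relational subplan
REL_PER_ELEMENT_COST = 0.002  # cost per tensor element as relations

def choose_representation(num_elements):
    """Return the cheaper physical representation for one operator."""
    udf_cost = UDF_FIXED_COST + UDF_PER_ELEMENT_COST * num_elements
    rel_cost = REL_FIXED_COST + REL_PER_ELEMENT_COST * num_elements
    return "udf" if udf_cost <= rel_cost else "relational"

def plan(operators):
    """Pick a representation per (name, size) operator in a unified IR."""
    return [(name, choose_representation(n)) for name, n in operators]

ir = [("dense_matmul", 1_000_000), ("bias_add", 64)]
print(plan(ir))  # large op -> relational, tiny op -> udf
```

Under this cost model the large matrix multiply is lowered to relational operators while the small bias addition stays in a fused UDF, matching the hybrid middle grounds the text argues for.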
A unified resource management framework for tuning the threading and memory allocation across DB operations, DL runtime operations, DNN library operations, and linear algebra operations.
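One way to picture unified resource management is a single manager splitting a global thread budget across subsystems, rather than the DB executor, DL runtime, and linear algebra library each oversubscribing the cores independently. The weights below are hypothetical tuning knobs, purely for illustration.

```python
def allocate_threads(total, weights):
    """Split `total` threads proportionally; every subsystem gets >= 1."""
    total_weight = sum(weights.values())
    share = {name: max(1, int(total * w / total_weight))
             for name, w in weights.items()}
    # Trim any overshoot from the largest allocation so the budget holds.
    while sum(share.values()) > total:
        biggest = max(share, key=share.get)
        share[biggest] -= 1
    return share

# Hypothetical workload weights for three of the layers named in the text.
weights = {"db_ops": 2, "dl_runtime": 3, "blas": 3}
print(allocate_threads(16, weights))
```

A real system would tune these allocations adaptively per query; the point of the sketch is only that the decision is made once, centrally, across all layers.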
A unified storage co-optimization framework for tensor blocks and relational data, enabling techniques like accuracy-aware data/model deduplication and physics-aware data/model co-partitioning.
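Accuracy-aware deduplication can be sketched as content addressing over quantized tensor blocks: blocks whose values agree within a tolerance quantize to the same key, so near-duplicate model weights or data blocks are stored once. The tolerance, block layout, and store structure are illustrative assumptions.

```python
import hashlib

def block_key(block, tolerance=1e-3):
    """Quantize a flat float block to a grid, then hash it for dedup."""
    quantized = tuple(round(v / tolerance) for v in block)
    return hashlib.sha256(repr(quantized).encode()).hexdigest()

store = {}  # key -> canonical block (a toy content-addressed store)

def put(block):
    """Store a block unless an accuracy-equivalent block already exists."""
    key = block_key(block)
    store.setdefault(key, block)
    return key

k1 = put([0.10000, 0.20000])
k2 = put([0.10001, 0.19999])  # within tolerance: deduplicated
k3 = put([0.50000, 0.20000])  # genuinely different: stored separately
print(k1 == k2, k1 == k3, len(store))
```

The "accuracy-aware" part is the tolerance: it bounds how much two blocks may differ while still sharing storage, trading a controlled loss of precision for reduced footprint.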
Novel model serving techniques adapted for RDBMS, such as using RDBMS indexing to cache inference results in an application-aware style.
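The indexing-as-cache idea can be sketched with a results table keyed by a (hashed) input: a primary-key lookup answers repeated queries without re-running the model. The schema, key encoding, and stand-in model below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inference_cache(
    input_key TEXT PRIMARY KEY,  -- indexed cache-lookup key
    score REAL)""")

calls = {"model": 0}

def run_model(x):
    calls["model"] += 1          # count real (expensive) inferences
    return x * 0.5               # stand-in for a DL model

def cached_predict(x):
    key = repr(x)
    row = conn.execute(
        "SELECT score FROM inference_cache WHERE input_key = ?",
        (key,)).fetchone()
    if row:                       # cache hit: index lookup only
        return row[0]
    score = run_model(x)          # cache miss: run the model, memoize
    conn.execute("INSERT INTO inference_cache VALUES (?, ?)", (key, score))
    return score

# Second call is served from the index; the model runs exactly once.
print(cached_predict(4.0), cached_predict(4.0), calls["model"])
```

Because the cache lives in an ordinary indexed table, it inherits RDBMS facilities (transactions, eviction via DELETE, sharing across sessions), which is what makes the caching "application-aware" rather than a bolt-on key-value store.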
The proposed system aims to enhance productivity, deliver superior performance for a broad class of inference applications, and effectively avoid cross-system overheads.