High-Resolution Video Analytics on Serverless Platform with SLO-aware Batching


Core Concepts
Tangram, an efficient cloud-edge video analytics system, leverages serverless computing and adaptive frame partitioning to reduce bandwidth consumption and computation cost while keeping SLO violations within 5% and incurring negligible accuracy loss.
Abstract
The paper introduces Tangram, a cloud-edge video analytics system that addresses the challenges of high-resolution video analytics. Tangram spans two tiers: the edge and the cloud server. At the edge, an adaptive frame partitioning algorithm aligns the regions of interest (RoIs) within video frames into patches of various sizes, which are then transmitted to the cloud server. In the cloud, the Patch-stitching Solver stitches the patches onto a sequence of fixed-size canvases. The Latency Estimator provides a conservative estimate of the inference time for different batch sizes, and the Online SLO-aware Batching Invoker decides when to invoke the serverless function that executes the inference, aiming to minimize both cost and the SLO violation rate. Experimental results show that Tangram reduces bandwidth consumption by up to 74.30% and computation cost by up to 66.35%, while keeping SLO violations within 5% with negligible accuracy loss.
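The patch-stitching step described above can be sketched with a simple shelf (row-by-row) packing heuristic. This is an illustrative stand-in, not Tangram's actual solver: the canvas size, patch format, and helper names are assumptions.

```python
# Hypothetical sketch of the Patch-stitching step: pack variable-size RoI
# patches onto fixed-size canvases with a shelf (row-by-row) heuristic.
# Canvas size, patch format, and names are assumptions, not Tangram's code.

def stitch_patches(patches, canvas_w=640, canvas_h=640):
    """patches: list of (patch_id, w, h). Returns one list per canvas,
    each holding (patch_id, x, y, w, h) placements."""
    canvases = []
    # Sort by descending height so each shelf wastes less vertical space.
    for pid, w, h in sorted(patches, key=lambda p: -p[2]):
        placed = False
        for canvas in canvases:
            x, y, shelf_h = canvas["cursor"]
            if x + w > canvas_w:          # row is full: start a new shelf
                x, y, shelf_h = 0, y + shelf_h, 0
            if y + h <= canvas_h:         # patch fits on this canvas
                canvas["placements"].append((pid, x, y, w, h))
                canvas["cursor"] = (x + w, y, max(shelf_h, h))
                placed = True
                break
        if not placed:                    # no room anywhere: open a canvas
            canvases.append({"cursor": (w, 0, h),
                             "placements": [(pid, 0, 0, w, h)]})
    return [c["placements"] for c in canvases]
```

Tighter packing (e.g., a max-rects solver) would fill fewer canvases and thus cut invocation cost further, at the price of a more expensive stitching step.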
Stats
Transmitting a 4K video encoded in H.264 at 30 frames per second typically requires 13-34 Mbps of bandwidth. RoIs constitute less than 10% of most high-resolution video frames, yet non-RoI computation accounts for up to 15.43% of the overhead. As the number of source cameras grows from 1 to 5, the average RoI inference time on an NVIDIA GeForce RTX 4090 GPU rises sharply from 59.07 ms to 325.84 ms.
Quotes
"Tangram can reduce bandwidth consumption by up to 74.30% and computation cost by up to 66.35%, while maintaining SLO violations within 5% and negligible accuracy loss."

Deeper Inquiries

How can Tangram's adaptive frame partitioning algorithm be extended to handle more complex video analytics tasks, such as object tracking or activity recognition?

Tangram's adaptive frame partitioning algorithm can be extended to more complex tasks by adding task-specific processing on top of the per-frame RoIs. For object tracking, the identified RoIs in consecutive frames can be linked: each object is assigned a unique identifier, and its position in the next frame is predicted from previous frames. Activity recognition can build on tracking by analyzing how RoIs evolve over time, for example by training a temporal model to recognize predefined activities or anomalies from the sequence of RoIs.
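The RoI-linking idea above can be sketched with a greedy IoU match between the previous frame's tracks and the new frame's RoIs. This is a minimal illustration, not part of Tangram: the (x, y, w, h) box format and the 0.3 threshold are assumptions.

```python
# Hypothetical sketch: link RoIs across consecutive frames by greedy IoU
# matching, assigning fresh track ids to unmatched RoIs. Box format
# (x, y, w, h) and the 0.3 threshold are illustrative assumptions.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / (aw * ah + bw * bh - inter)

def link_rois(prev_tracks, rois, next_id, thresh=0.3):
    """prev_tracks: {track_id: box}; rois: boxes in the new frame.
    Returns the updated {track_id: box} and the next unused id."""
    tracks = {}
    unmatched = dict(prev_tracks)
    for box in rois:
        best = max(unmatched, key=lambda t: iou(unmatched[t], box),
                   default=None)
        if best is not None and iou(unmatched[best], box) >= thresh:
            tracks[best] = box            # same object, carried forward
            del unmatched[best]
        else:
            tracks[next_id] = box         # unseen object: fresh id
            next_id += 1
    return tracks, next_id
```

A production tracker would add motion prediction (e.g., a Kalman filter) and optimal assignment instead of greedy matching, but the interface to the partitioned RoIs would be the same.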

What are the potential trade-offs between the granularity of the frame partitioning (e.g., 2x2, 4x4, 6x6) and the overall system performance in terms of accuracy, latency, and cost?

The granularity of the frame partitioning (e.g., 2x2, 4x4, 6x6) in Tangram's adaptive algorithm affects system performance along three axes:

Accuracy: A finer grid (e.g., 6x6) fits the RoIs more tightly, so patches carry fewer irrelevant background pixels, but it produces more patches per frame and a more complex stitching step. A coarser grid (e.g., 2x2) keeps stitching simple, but each patch includes more non-RoI content.

Latency: Finer granularity yields more patches per frame, increasing the time spent on patch extraction and stitching. Coarser granularity reduces the patch count but transmits and processes larger patches.

Cost: Granularity determines the number of patches and how many fixed-size canvases they fill, which in turn drives the number of function invocations. Finer granularity can raise costs through more frequent invocations; coarser granularity lowers invocation counts but wastes computation on non-RoI pixels.

Finding the optimal balance requires weighing the level of detail the analytics task needs, the available computational resources, and the acceptable trade-offs among accuracy, latency, and cost.
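The tightness-versus-patch-count trade-off can be made concrete by snapping a single RoI to the grid cells that overlap it at different granularities. This is an illustrative calculation only; the frame size and RoI coordinates are made up, and Tangram's actual partitioning is adaptive rather than a fixed snap-to-grid.

```python
# Illustrative sketch (not Tangram's algorithm): count the grid cells an
# RoI overlaps at several granularities, and compare the total cell area
# against the true RoI area. Frame size and RoI are made-up numbers.
import math

def cells_covering(roi, frame_w, frame_h, grid):
    """Cells of a grid x grid partition that an RoI (x, y, w, h) overlaps.
    Returns (cell_count, total_cell_area)."""
    x, y, w, h = roi
    cw, ch = frame_w / grid, frame_h / grid
    cols = math.floor((x + w - 1e-9) / cw) - math.floor(x / cw) + 1
    rows = math.floor((y + h - 1e-9) / ch) - math.floor(y / ch) + 1
    return cols * rows, cols * rows * cw * ch

frame_w, frame_h = 3840, 2160            # a 4K frame
roi = (1000, 500, 400, 300)              # 120,000 px^2 of actual RoI
for grid in (2, 4, 6):
    n, area = cells_covering(roi, frame_w, frame_h, grid)
    print(f"{grid}x{grid}: {n} cells, {area / (400 * 300):.1f}x the RoI area")
```

With these numbers, the 2x2 grid covers the RoI with a single cell that is over 17x the RoI's area, while the 6x6 grid needs four cells but transmits far less excess background, matching the coarse-versus-fine trade-off described above.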

How can Tangram's design principles be applied to other cloud-edge computing scenarios beyond video analytics, such as real-time sensor data processing or edge-based machine learning inference?

Tangram's design principles can be applied to other cloud-edge computing scenarios by adapting the system architecture and algorithms to each application's requirements.

For real-time sensor data processing, Tangram's adaptive batching and serverless computing approach can be used to process streaming sensor data efficiently. By partitioning the sensor stream into manageable chunks and invoking serverless functions to process them, the system can absorb fluctuating workloads and optimize resource utilization.

In edge-based machine learning inference, the combination of adaptive input partitioning and SLO-aware batching can optimize inference on edge devices. By grouping inputs into batches sized to fit resource constraints and latency requirements, the system can execute models efficiently while minimizing cost and meeting performance targets. This is particularly beneficial for edge devices with limited computational resources or intermittent connectivity to the cloud.
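The SLO-aware batching decision transfers to a generic stream roughly as follows: accumulate items, and invoke the function once waiting for one more item would risk missing the oldest item's deadline. This is a hedged sketch; the linear latency model, the parameters, and the function names are illustrative assumptions, not Tangram's actual estimator or invoker.

```python
# Hypothetical sketch of SLO-aware batching for a generic data stream.
# The linear latency model and all parameters are illustrative assumptions.

def estimate_latency(batch_size, base_ms=50.0, per_item_ms=8.0):
    # Conservative linear stand-in for a learned Latency Estimator.
    return base_ms + per_item_ms * batch_size

def should_invoke(batch, now_ms, slo_ms, max_batch=8):
    """batch: list of (arrival_ms, payload), oldest first. Invoke when the
    batch is full or the oldest item cannot absorb further waiting."""
    if not batch:
        return False
    if len(batch) >= max_batch:
        return True
    deadline = batch[0][0] + slo_ms
    # Invoke now if serving the batch after one more arrival would
    # finish past the oldest item's deadline.
    return now_ms + estimate_latency(len(batch) + 1) > deadline
```

The same loop works for edge inference: the caller polls `should_invoke` as items arrive and, when it returns True, ships the batch to the serverless function, trading per-invocation cost against per-item waiting time.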