Core Concepts
Twitter has redesigned its real-time event processing architecture to handle over 400 billion events per day, achieving low latency, high accuracy, and reduced operational costs.
Abstract
The article discusses how Twitter has evolved its event processing system to handle the massive scale of data it generates daily.
Key highlights:
Twitter processes approximately 400 billion events in real time and generates a petabyte of data every day.
Twitter's previous architecture followed the lambda pattern, with separate batch and real-time layers. This led to data loss, inaccuracies, and high latency during high-traffic events.
The new architecture leverages both Twitter's data centers and Google Cloud Platform. It uses Pub/Sub to stream events to Google Cloud, where streaming Dataflow jobs perform real-time aggregation and store results in Bigtable.
The new architecture provides lower latency, higher throughput, and better handling of late events compared to the old Heron-based topology. It also simplifies the design and reduces compute costs.
The migration to the hybrid architecture spanning Twitter's data centers and Google Cloud has enabled Twitter to process billions of events in real time with low latency, high accuracy, and stability, while also reducing compute costs and the operational burden on engineers.
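The core of the streaming path is event-time windowed aggregation with a bound on how late an event may arrive and still be counted. The sketch below is a minimal, self-contained Python illustration of that idea, not Twitter's actual Dataflow code; the window size, lateness bound, and event names are made up for the example.

```python
from collections import defaultdict

# Illustrative fixed-window aggregation with an allowed-lateness
# policy, the kind of work the streaming jobs perform.
# WINDOW_SECONDS and ALLOWED_LATENESS_SECONDS are invented values.
WINDOW_SECONDS = 60
ALLOWED_LATENESS_SECONDS = 120

def window_start(event_time):
    """Assign an event to the start of its fixed window."""
    return event_time - (event_time % WINDOW_SECONDS)

def aggregate(events):
    """Count events per (key, window), dropping any event that arrives
    later than the allowed lateness past its window's end.
    Each event is (key, event_time, arrival_time), arrival-ordered."""
    counts = defaultdict(int)
    dropped = 0
    watermark = 0  # highest arrival time seen so far
    for key, event_time, arrival_time in events:
        watermark = max(watermark, arrival_time)
        w = window_start(event_time)
        if watermark > w + WINDOW_SECONDS + ALLOWED_LATENESS_SECONDS:
            dropped += 1       # too late: outside the lateness bound
            continue
        counts[(key, w)] += 1  # on-time or tolerably late: update window
    return dict(counts), dropped

events = [
    ("ad_impression", 5, 10),    # on time
    ("ad_impression", 30, 65),   # late, but within allowed lateness
    ("ad_click", 70, 75),        # on time, next window
    ("ad_impression", 2, 300),   # far too late: dropped
]
counts, dropped = aggregate(events)
```

In the real system the late-event policy is what lets the streaming path stay accurate without a batch layer to backfill corrections.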
Stats
Twitter processes approximately 400 billion events in real time and generates a petabyte of data every day.
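To put those figures in perspective, a quick back-of-the-envelope calculation (assuming events are spread evenly across the day, which real traffic is not):

```python
# Scale stated in the article: ~400 billion events and ~1 PB per day.
events_per_day = 400e9
seconds_per_day = 24 * 60 * 60            # 86,400
events_per_second = events_per_day / seconds_per_day
# ≈ 4.6 million events per second on average

bytes_per_day = 1e15                      # ~1 PB (decimal)
bytes_per_event = bytes_per_day / events_per_day
# 2,500 bytes per event on average
```

Peak traffic during high-profile events would be well above this average, which is why the old architecture's latency problems surfaced precisely then.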
Quotes
"Twitter ad services are one of the top revenue models for Twitter, and if its performance is impacted, it directly impacts their business model."
"Twitter offers various data product services to retrieve information on impression and engagement metrics; these services would be impacted by inaccurate data."