PACELC Theorem: The Evolution of CAP for Modern Distributed

In the world of distributed systems, scalability, availability, and consistency are paramount. For years, the CAP theorem provided a foundational understanding of the trade-offs involved. However, as systems grew more complex and the need for nuanced control increased, the PACELC theorem emerged as a more comprehensive model, guiding architects in designing robust and performant distributed applications. This blog post delves into the PACELC theorem, its evolution from CAP, and its implications for modern distributed systems, complete with practical examples.

From CAP to PACELC: A Necessary Evolution

The CAP theorem, introduced by Eric Brewer, states that a distributed system can only guarantee two of the following three properties: Consistency (C), Availability (A), and Partition Tolerance (P).

Consistency (C): All clients see the same data at the same time, regardless of which node they connect to.
Availability (A): Every request receives a response, without guarantee that the response contains the most recent write.
Partition Tolerance (P): The system continues to operate despite network failures (partitions) that prevent some nodes from communicating with others.

CAP effectively highlighted the inherent trade-offs, particularly in the face of network partitions. When a partition occurs, a system must choose between consistency and availability.

CP System: Priorities consistency. If a partition occurs, the system will become unavailable to ensure data consistency across the remaining connected nodes.
- Practical Example: A traditional banking system processing a transaction. If the network splits and some database replicas can't communicate, the bank will refuse further transactions on affected accounts until consistency can be guaranteed. This prevents double-spending or incorrect balances.
AP System: Priorities availability. If a partition occurs, the system will remain available, but some nodes might serve stale data.
- Practical Example: An online retail website displaying product availability. If a partition occurs, some users might see an item as "in stock" even if it just sold out, as the inventory update hasn't propagated to all replicas. However, the site remains functional, allowing users to browse and add items to their cart.

While CAP provided a crucial starting point, it only addressed the scenario during a network partition. It didn't offer guidance on how a system should behave when there is no partition. This is where PACELC steps in.

Unpacking PACELC - A More Granular Perspective

PACELC extends CAP by adding two additional considerations:

P (Partition Tolerance): This is the same as in CAP. When a partition occurs, you must choose between:

A (Availability): Prioritise serving requests, potentially with stale data.
C (Consistency): Prioritise data integrity, potentially at the cost of availability.

EL (Else): This is the crucial addition. Else (when there is no partition and everything is running normally), you must choose between:

L (Latency): Prioritise faster responses.
C (Consistency): Prioritise stricter data consistency.

Therefore, PACELC stands for "If there is a Partition (P), choose between Availability (A) and Consistency (C), Else (EL), choose between Latency (L) and Consistency (C)."

Let's break down the "EL" part with examples. Even without a network partition, a distributed system has to make decisions about how to propagate writes and handle reads.

ELC (Else, Consistency): This implies strong consistency even in the absence of partitions. All nodes must agree on the state of the data before a write is acknowledged or a read is returned. This usually comes with higher latency as more coordination is required.
- Practical Example: A system managing a shared counter where every increment must be seen immediately by all subsequent reads globally. To achieve this, a write operation might involve communicating with multiple nodes to ensure they all commit the new value before confirming the write, incurring higher latency.
ELL (Else, Latency): This priorities low latency when no partition is present. Writes might be acknowledged quickly, and reads might be served from local replicas, even if those replicas haven't fully caught up to the latest writes from other nodes. Eventual consistency is a common pattern here.
- Practical Example: A social media feed. When you post an update, it's typically acknowledged very quickly (low latency). However, it might take a few seconds or even minutes for all your followers around the world to see it because the update propagates asynchronously to different data centers (eventual consistency). This trade-off is acceptable for the user experience.

Common PACELC System Types

Here's how different distributed systems can be categorised under PACELC:

PC/EC (Partition-tolerant, Consistent; Else, Consistent)
- Description: These systems prioritise strong consistency above all else. During a partition, they become unavailable to maintain consistency. When no partition exists, they still strive for the highest consistency, often at the expense of latency.
- Practical Example: Google Spanner is a prime example. It aims for external consistency (a very strong form of consistency) across a globally distributed database. During a partition, it will prioritise consistency, potentially affecting availability. When no partition, it uses atomic clocks and transaction managers to maintain strict consistency across replicas, even if it means slightly higher latency than a less consistent system.
- Trade-offs: High consistency, but potentially lower availability during partitions and higher latency generally.
PA/EL (Partition-tolerant, Available; Else, Latency optimised)
- Description: These systems prioritise availability during partitions and low latency when no partition exists. They are willing to sacrifice some consistency for continuous operation and speed.
- Practical Example: Apache Cassandra. During a network partition, Cassandra nodes can continue to accept writes and reads, ensuring availability (PA). Data conflicts are resolved later when the partition heals. When no partition exists, Cassandra priorities low latency (ELL) by allowing clients to read from any replica and write to a quorum of replicas without waiting for full global synchronisation. This makes it ideal for high-throughput, always-on applications like IoT data ingestion or real-time analytics.
- Trade-offs: High availability and low latency, but eventual consistency, meaning data might be temporarily inconsistent.
PC/EL (Partition-tolerant, Consistent; Else, Latency optimised)
- Description: This is a less common but interesting hybrid. During a partition, it prioritises consistency. However, when things are normal, it leans towards latency, perhaps by using optimistic concurrency or eventual consistency mechanisms within its consistent boundaries.
- Practical Example: A distributed configuration management system (like Apache ZooKeeper or etcd) often needs strong consistency for its core configuration data (PC). If a partition occurs, it will ensure that only a consistent view of the configuration is available, potentially limiting updates. However, for clients frequently reading this configuration when no partition is present, it might allow reads from local replicas to minimize latency (ELL), provided those reads are "fresh enough" based on a bounded staleness model, or it carefully manages read consistency based on application needs.
- Trade-offs: Can be complex to implement correctly, balancing strictness during failure with speed during normal operation.

Modern Distributed Systems and PACELC in Practice

The shift in mindset from "writing code" to "designing systems" is fundamentally about understanding and leveraging these trade-offs, guided by PACELC.

Microservices Architectures: With microservices, you often have different data stores for different services. A payment service might need strong PC/EC properties (like a distributed ledger or a highly consistent database), while a user profile service might be perfectly fine with PA/EL (like a sharded document database). Architects must consciously choose the right PACELC characteristics for each service based on its specific domain and criticality.
Global Distribution: When deploying systems across multiple geographical regions, network latency becomes a significant factor. Choosing between ELC and ELL becomes a critical design decision. Global systems often opt for ELL to provide a good user experience by serving reads from local replicas, accepting eventual consistency. For example, a global e-commerce platform might show regionally consistent product prices (PC/EC for pricing) but eventually consistent stock levels (PA/EL for inventory).
Eventual Consistency Patterns: Many modern distributed systems embrace eventual consistency (PA/EL, with an emphasis on ELL) to achieve high availability and low latency. Techniques like Conflict-Free Replicated Data Types (CRDTs), message queues for asynchronous updates, and carefully designed eventual consistency models are widely used. This allows applications to remain responsive even under heavy load or network instability, providing a superior user experience in many scenarios.

Understanding PACELC is not just academic; it's a practical necessity for anyone designing or operating modern distributed systems. By clearly defining the trade-offs both during normal operation and during network partitions, PACELC empowers engineers to make informed decisions that align system behaviour with business requirements, moving beyond simply "writing code" to truly "designing resilient systems."

PACELC Theorem - The Evolution of CAP for Modern Distributed Systems

From CAP to PACELC: A Necessary Evolution

Unpacking PACELC - A More Granular Perspective

Common PACELC System Types

Modern Distributed Systems and PACELC in Practice

Comments

System Design

Back-of-the-Envelope Calculations: Sizing Your System Before You Code

More from this blog

Replication in System Design: Master-Slave vs. Multi-Master vs. Leaderless

SQL vs NoSQL: A framework for choosing the right database

Back-of-the-Envelope Calculations: Sizing Your System Before You Code

CAP Theorem: Mastering the Two-of-Three Tradeoff in Distributed Systems

Command Palette

From CAP to PACELC: A Necessary Evolution

Unpacking PACELC - A More Granular Perspective

Common PACELC System Types

Modern Distributed Systems and PACELC in Practice

Comments

System Design

Back-of-the-Envelope Calculations: Sizing Your System Before You Code

More from this blog