latency vs throughput shifting your system design mindset

In the world of system design, performance is paramount. But what exactly constitutes "performance"? Too often, discussions around system speed and efficiency get bogged down in a shallow understanding of metrics. Many developers, especially those transitioning from a purely coding mindset, tend to focus on individual code execution speed. While optimising algorithms is crucial, it's only one piece of a much larger puzzle. To truly design robust and scalable systems, we need to shift our mindset from "writing code" to "designing systems," and a critical part of that shift involves a deep understanding of latency and throughput.

These two terms are often used interchangeably, leading to confusion and suboptimal design decisions. They are, in fact, distinct and equally important metrics, each providing a different lens through which to evaluate a system's behaviour.

Latency: The Time Taken for a Single Operation

Imagine you're ordering a coffee. Latency is the time it takes for your single order to be placed, prepared, and delivered to you. In system design, latency refers to the time delay between a request being initiated and its corresponding response being received. It's often measured in milliseconds (ms) or microseconds (µs).

Key characteristics of latency

User-centric: High latency directly impacts the user experience. A slow loading webpage, a delayed API response, or a lagging game all point to high latency.
Single-unit focus: It measures the time for a single unit of work (e.g., one database query, one API call, one message processing).
Impact of overhead: Network hops, serialisation / deserialisation, disk I/O, context switching, and even garbage collection can all contribute to increased latency.

When to prioritise latency

Interactive applications: Websites, mobile apps, real-time dashboards, and anything requiring immediate user feedback.
Real time systems: Financial trading platforms, gaming, autonomous driving systems where quick decision making is critical.
Low volume, critical operations: Think about a single, important transaction that needs to complete quickly.

Consider a simple user login service. If a user clicks "Login" and waits 5 seconds for a response, that's high latency. Even if the system can handle a million logins per hour, that individual user experience is poor.

Throughput: The Amount of Work Done Over Time

Now, imagine the coffee shop again. Throughput is the total number of coffees the shop can make and serve per hour. In system design, throughput refers to the rate at which a system can process units of work over a given period. It's typically measured in requests per second (RPS), transactions per second (TPS), messages per minute, or megabytes per second (MB/s).

Key characteristics of throughput

System-centric: It measures the overall capacity and processing power of the system.
Batch processing: It's often more relevant for workloads that process a large volume of data or requests asynchronously.
Scalability indicator: A system with high throughput can handle a large concurrent load.

When to prioritise throughput

Batch processing jobs: Data analytics, report generation, nightly backups.
High volume, non-interactive services: Message queues, logging services, background processing tasks.
Microservices architectures: Where individual services might have lower latency requirements but collectively need to process a massive number of requests.

Think about a log processing service. It might not matter if a single log entry takes 100ms to process, as long as the system can ingest and process millions of log entries per second.

The Interplay and the Trade-off

Latency and throughput are not entirely independent. They are often inversely related, and optimising for one can sometimes come at the expense of the other.

Increased Concurrency: Often, to increase throughput, you increase concurrency (e.g., add more threads, more servers). While this can boost the number of operations per second, each individual operation might experience slightly higher latency due to shared resource contention, context switching, or queueing delays.
Resource Utilisation: A system pushing its limits to achieve maximum throughput might experience higher latency because resources are constantly saturated.

Lets look at an example
Imagine a single lane bridge (your system).

Low Latency: If only one car (request) is on the bridge at a time, it crosses very quickly. Low latency.
High Throughput: If you want to maximise the number of cars crossing the bridge per hour, you might pack them in bumper-to-bumper. This increases throughput, but each individual car's journey time (latency) might increase due to traffic and slower overall movement.

The art of system design lies in understanding these trade-offs and finding the right balance for your specific use case.

Shifting the Mindset: From Code to System

For developers accustomed to optimising individual functions and algorithms, the transition to system design requires a broader perspective:

Identify bottlenecks beyond code: A slow system might not be due to an inefficient for loop. It could be a database query taking too long, network latency between microservices, contention for a shared resource, or an under-provisioned server. Focus on identifying the slowest path in the entire request flow.
Understand Service Level Objectives (SLOs): What are the performance requirements for your system? Do users expect responses in under 100ms (latency focus)? Or does the background job need to process 10,000 items per second (throughput focus)? Define these early.
Consider Concurrency and Parallelism: How will your system handle multiple simultaneous requests? This is where throughput becomes critical. Think about load balancing, message queues, thread pools, and distributed architectures.
Think About Queues and Backpressure: What happens when your system is overloaded? Do requests get dropped? Do they queue up, increasing latency for everyone? Designing for graceful degradation and backpressure mechanisms is crucial for maintaining stability under load.
Design for Observability: How will you know what your latency and throughput are in production? Implement comprehensive monitoring, logging, and tracing to gain insights into your system's real world (production) performance. Tools like Prometheus, Grafana, Jaeger, and Zipkin become indispensable.

Practical Implications

Let's consider a practical Scenario: An e-commerce platform needs to process customer orders.

Latency critical aspect: The checkout process. A customer expects immediate feedback after clicking "Place Order." High latency here leads to abandoned carts and frustrated users.
Throughput critical aspect: Processing daily order fulfillment batches. While individual orders need to be processed correctly, the primary goal is to ensure all orders placed within a 24 hour window are picked, packed, and shipped. The system needs to handle a massive volume of these operations efficiently, even if individual processing times are slightly higher.

A well designed e-commerce system will have dedicated components and strategies to optimise for both. The real time checkout service might use fast, in-memory caches and highly optimised, low-latency APIs. The batch order fulfillment system might leverage message queues, worker pools, and asynchronous processing to maximise throughput.

Conclusion

Moving from a code centric to a system centric design philosophy is fundamental for building performant, scalable, and resilient applications. Latency and throughput are not just abstract terms, they are tangible metrics that drive design decisions at every level, from database schema to network topology. By deeply understanding the nuances of each, and critically evaluating their trade-offs in the context of your specific system requirements, you can design systems that not only "work" but truly excel in meeting user expectations and business demands.

The next time you're evaluating a system's performance, ask yourself: "Am I optimising for individual speed, or overall capacity?" The answer will reveal a lot about your design priorities.

Latency vs. Throughput - Shifting Your System Design Mindset

Latency: The Time Taken for a Single Operation