Latency vs. Throughput vs. Bandwidth
Latency, throughput, and bandwidth are the core metrics that describe the performance of a network or distributed system.
Together they determine how fast the first byte arrives, how much data you can move per second, and the maximum capacity of the path.
Let’s understand them with the highway analogy.
Bandwidth: The number of lanes on the highway (e.g., 5 lanes). This is the maximum physical capacity of the road.
Latency: The time it takes one car to drive from Exit 1 to Exit 10 at the speed limit with no traffic.
Throughput: The total number of cars that pass Exit 10 per hour.
Now, consider a traffic jam (congestion).
The bandwidth stays the same. The road still has 5 lanes.
The latency (travel time) rises because every car now spends extra time sitting in traffic.
The throughput (cars per hour) drops because fewer cars clear the exit per hour.
Networks behave the same way. Congestion increases latency and reduces throughput even when raw bandwidth does not change.
Let’s now explore each of them in more detail.
Understanding Latency
Latency is the delay. It is the time it takes for a single piece of data to travel from its source to its destination. It is commonly measured in milliseconds (ms).
In practice, we almost always measure Round Trip Time (RTT), which is the time for a request to go out and an acknowledgment to come back.
Example: When you make an API call, the latency is the time between your client sending the HTTP request and receiving the first byte of the response.
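To make this concrete, here is a minimal sketch of timing a request from Python using only the standard library. The URL is a placeholder, and a real measurement would average many samples rather than a single call.

```python
# Minimal sketch: measure time to first byte (TTFB) and total response time.
import time
import urllib.request

URL = "https://example.com/"  # placeholder endpoint

start = time.perf_counter()
with urllib.request.urlopen(URL) as response:
    response.read(1)                            # first byte has arrived
    ttfb = time.perf_counter() - start
    response.read()                             # drain the rest of the body
    total = time.perf_counter() - start

print(f"Time to first byte: {ttfb * 1000:.1f} ms")
print(f"Total response time: {total * 1000:.1f} ms")
```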
Common Causes of Latency
Physical Distance: The speed of light sets a hard lower bound. Even with a 1 Gbps internet connection, your latency to a server on another continent will be high (see the quick calculation after this list).
Network Hops: Each router, DNS lookup, NAT, firewall, and load balancer adds queueing and processing time.
Connection setup: Establishing a connection requires multiple round trips before any data flows: the TCP handshake, plus the TLS handshake for HTTPS.
Server Processing: CPU contention, locks, context switches, garbage collection pauses, and cold code paths add processing time before a response is generated.
Databases: Cache misses, missing indexes, N+1 queries, synchronous disk I/O, and cross-region reads lengthen database query time.
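As a rough illustration of the physical-distance point above, here is a back-of-the-envelope calculation. The distance and fiber speed are assumed round numbers, not measurements.

```python
# Back-of-the-envelope: propagation delay alone for a long-haul link.
# Light in optical fiber travels at roughly 200,000 km/s (about 2/3 of c).
distance_km = 10_000              # assumed cross-continent distance
speed_in_fiber_km_per_s = 200_000

one_way_ms = distance_km / speed_in_fiber_km_per_s * 1000
round_trip_ms = one_way_ms * 2
print(f"One-way: {one_way_ms:.0f} ms, round trip: {round_trip_ms:.0f} ms")
# ~50 ms one way, ~100 ms RTT -- before any queueing or processing delay
```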
How to Reduce Latency
The goal is to cut distance, round trips, and processing time.
Caching: Store frequently accessed data (e.g., in Redis or a browser cache) so you avoid a slow database call or network request (a cache-aside sketch follows this list).
Content Delivery Networks (CDNs): This is a global network of caches. When a user in India requests a video, they get it from a server in Mumbai, not New York.
Edge Computing: Run application logic (not just static content) closer to the user to reduce the number of round trips.
Asynchronous Processing: Instead of blocking the user while your system completes a time-consuming task, offload the task to a background process and return a response immediately.
Database Optimization: Use techniques like database indexes, partitioning, and sharding to reduce query time.
Persistent Connections: Keep a long-lived connection between a client and server that remains open across multiple requests and responses, so the connection-setup cost is paid only once (also sketched after this list).
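To illustrate the caching item above, here is a minimal cache-aside sketch. It assumes a local Redis instance and the redis-py client (pip install redis); get_user_from_db is a hypothetical stand-in for a slow database query.

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_user_from_db(user_id: int) -> dict:
    # placeholder for a real (slow) database lookup
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit: no database round trip
    user = get_user_from_db(user_id)           # cache miss: fall back to the database
    cache.set(key, json.dumps(user), ex=300)   # cache the result for 5 minutes
    return user
```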
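And for the persistent-connections item, a small sketch using the requests library: reusing one Session keeps the underlying TCP/TLS connection open, so repeated calls skip the handshake round trips. The base URL is a placeholder.

```python
import requests

BASE_URL = "https://api.example.com"   # placeholder endpoint

session = requests.Session()           # pools and reuses connections

def fetch(path: str) -> dict:
    response = session.get(f"{BASE_URL}{path}", timeout=5)
    response.raise_for_status()
    return response.json()

# Both calls reuse the same open connection instead of setting up a new one:
# fetch("/users/1")
# fetch("/users/2")
```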
Understanding Throughput
Throughput is the rate. It measures how much data actually gets transferred successfully over a network in a specific amount of time.
It is measured in megabits per second (Mbps) or gigabits per second (Gbps) for raw data transfer, and often in requests per second (RPS) or transactions per second (TPS) for application-level work.
Example: Your API server might have a low latency of 50 ms per request. However, if it runs out of CPU or database connections at around 100 requests per second, its throughput is capped at 100 RPS, no matter how fast each individual request is.
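A useful back-of-the-envelope model for how latency, concurrency, and throughput relate is Little’s Law: throughput ≈ requests in flight ÷ latency. The numbers below are illustrative, chosen to match the example above.

```python
# Little's Law: throughput ≈ requests in flight / latency per request.
latency_s = 0.050      # 50 ms per request
in_flight = 5          # requests being processed at any instant

throughput_rps = in_flight / latency_s
print(f"{throughput_rps:.0f} requests/second")   # 100 RPS

# Flipped around: sustaining 2,000 RPS at 50 ms latency would require
# 2000 * 0.050 = 100 requests in flight at once.
```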
Throughput is almost always limited by a bottleneck. This bottleneck could be anything:
Server CPU
Database connection limits
Disk read/write speeds (I/O)
Network congestion
How to Increase Throughput
The goal is to process more units of work in the same amount of time. You can do this by reducing the overhead of each task or by doing more tasks at once.
Batching: Instead of making 1,000 separate, tiny API calls to insert data, make one large API call with 1,000 records. This reduces the per-call overhead (see the batching sketch after this list).
Parallelism: This means doing multiple things at the same time. Download managers, for example, use multiple connections to download parts of a file simultaneously (also sketched after this list).
Increase Server Capacity: Scale horizontally (add more servers behind a load balancer) to handle more concurrent requests.
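Here is a minimal batching sketch using sqlite3 from Python’s standard library: one executemany() call replaces 1,000 individual INSERT round trips. The table and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

rows = [(i, f"event-{i}") for i in range(1000)]

# Batched: a single call and a single commit, far less per-statement overhead
# than looping over 1,000 individual INSERTs.
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1000
```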
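And a minimal parallelism sketch: fetching several URLs with a thread pool so the downloads overlap instead of running one after another. The URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

# Placeholder URLs; in practice these would be different resources.
URLS = ["https://example.com/"] * 3

def fetch(url: str) -> int:
    with urllib.request.urlopen(url) as response:
        return len(response.read())   # bytes downloaded

# With 3 workers the downloads overlap, so total time is roughly the slowest
# single fetch rather than the sum of all three.
with ThreadPoolExecutor(max_workers=3) as pool:
    sizes = list(pool.map(fetch, URLS))

print(sizes)
```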
Understanding Bandwidth
Bandwidth is the capacity. It represents the maximum theoretical amount of data your network link or system could handle.
Bandwidth is measured as a rate of data capacity per second, usually in bits per second. Common units: Kbps, Mbps, Gbps. In some systems you might also see bytes per second (KB/s, MB/s), where 1 byte = 8 bits.
Example: Your internet connection has 100 Mbps of bandwidth. So a 2 GB download (2,048 MB × 8 bits ≈ 16,384 megabits) cannot finish faster than roughly 16,384 ÷ 100 ≈ 164 seconds in ideal conditions, no matter how low the latency is or how many threads you use.
Think of bandwidth as the width of the pipe. It belongs to the link, not to a single request or user.
Two important clarifications:
Bandwidth is a ceiling. Real applications rarely hit it because protocols, losses, CPU, and storage often limit the actual rate.
Bandwidth is not throughput. Throughput is what you actually achieve at a moment in time. Bandwidth is what the link could deliver in ideal conditions.
In real-world systems, we rarely talk about increasing bandwidth itself; that usually just means buying a bigger pipe from your provider. Instead, we focus on using the available bandwidth more efficiently.
How to Improve Bandwidth Utilization
The goal is to get more work done with the capacity you already have.
Compression: The most direct way to “improve” bandwidth is to make your data smaller. If you can shrink a 1 MB file to 100 KB, you can send 10 times as many files over the same link in the same amount of time (a gzip sketch follows at the end of this list).
For Text (HTML, CSS, JSON): Use server-side compression like Gzip or Brotli.
For Images: Serve modern formats like WebP or AVIF. They compress significantly better than older JPEG and PNG formats at comparable visual quality.
For Video: Use modern codecs (like H.265/HEVC or AV1) instead of older ones (like H.264) to cut the data size roughly in half at the same visual quality.
Protocol Tuning: Use modern protocols like HTTP/2 or HTTP/3 (which runs over QUIC). They are far more efficient at handling multiple requests over a single connection, which reduces the “waiting” caused by latency (a short sketch follows).
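To illustrate the compression item, here is a small sketch using Python’s built-in gzip module, roughly what a web server does when it serves a JSON response with Content-Encoding: gzip. The payload is synthetic, so the exact ratio will vary with real data.

```python
import gzip
import json

# A repetitive JSON payload, standing in for a typical API response.
payload = json.dumps([{"id": i, "name": f"user-{i}"} for i in range(1000)]).encode()
compressed = gzip.compress(payload)

print(f"Original:   {len(payload):,} bytes")
print(f"Compressed: {len(compressed):,} bytes "
      f"({len(compressed) / len(payload):.0%} of original)")
```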
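And a hedged sketch of the protocol-tuning item, using the third-party httpx client (installed with its http2 extra, e.g. pip install "httpx[http2]"). The target server must also support HTTP/2, and example.com is only a placeholder.

```python
import httpx

# One client, one connection: HTTP/2 multiplexes the requests over it
# instead of opening a separate connection per request.
with httpx.Client(http2=True) as client:
    for path in ["/", "/", "/"]:                    # placeholder paths
        response = client.get(f"https://example.com{path}")
        print(path, response.http_version, response.status_code)
```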
Thank you for reading!
If you found it valuable, hit a like ❤️ and consider subscribing for more such content.
If you have any questions or suggestions, leave a comment.
I hope you have a lovely day!
See you soon,
Ashish







