What Is Latency? Causes, Types, and Fixes

What is Latency?

Latency is a measurement of delay in a system. It is the time that passes between a request being made and the response coming back, and it is usually measured in milliseconds. The lower the latency, the faster a system responds to the people and services that depend on it.

Network latency is the amount of time it takes for data to travel from one point to another across a network.

It is the most visible form of latency, the gap between a client sending a packet and the destination receiving it. It is not the only form, though. Storage, applications, and databases each add delay of their own.

That is why the latency a user experiences is rarely a single number. A database query, a network hop between two data centers, an API call to a payment provider, a disk read: each one adds a slice, and the sum is what the request waits through.

Keep one thing straight from the start. Latency measures time, not volume. It tells you how long one operation took, not how much data you can move at once, which is a separate property entirely.

How is Latency Measured?

Latency is clocked in time units, almost always milliseconds, sometimes microseconds when the path is short and fast. The most common figure is round-trip time, the full there-and-back journey that a tool like ping reports.

1. Round-Trip vs One-Way

Round-trip time (RTT) measures a request going out and the response coming back. One-way latency measures just one direction, which is harder to capture because it needs clocks on both ends that agree. Most monitoring leans on RTT because it is simpler and it matches what a user experiences.
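A minimal sketch of what a round-trip measurement does: start a clock, make the call, stop the clock when the response is back. The helper `measure_rtt_ms` and the stand-in `fake_request` are hypothetical names for illustration, not part of ping or any other tool.

```python
import time

def measure_rtt_ms(send_request):
    """Time one full request/response call and return milliseconds."""
    start = time.perf_counter()   # monotonic, high-resolution clock
    send_request()                # blocks until the response is back
    return (time.perf_counter() - start) * 1000.0

def fake_request():
    """Hypothetical stand-in for a real network call: waits about 20 ms."""
    time.sleep(0.02)

if __name__ == "__main__":
    print(f"RTT: {measure_rtt_ms(fake_request):.1f} ms")
```

Only one clock is involved, on the caller's side, which is why RTT is easier to capture than one-way latency.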

2. Averages Hide the Truth

An average latency number is comfortable and usually misleading. It buries the slow requests under a pile of fast ones. That is why teams report percentiles instead: p95 latency is the value 95% of requests come in under, and p99 catches the painful tail that an average smooths away. At scale, the tail is where users churn.
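To see the gap between an average and the tail, here is a small sketch using the nearest-rank percentile method. The sample data is invented for illustration: 97 fast requests and 3 that take three seconds.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical sample, in milliseconds.
latencies_ms = [20] * 97 + [3000] * 3

mean_ms = statistics.mean(latencies_ms)   # about 109 ms
p95_ms = percentile(latencies_ms, 95)     # 20 ms
p99_ms = percentile(latencies_ms, 99)     # 3000 ms

if __name__ == "__main__":
    print(f"mean={mean_ms:.1f} ms  p95={p95_ms} ms  p99={p99_ms} ms")
```

The mean lands near 109 ms, a value no request in the sample took, while p99 reports the three-second wait that some users sat through.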

3. Where the Measurement Happens

You can measure latency at the network edge, inside the application with an APM agent, or end to end with distributed tracing that follows a single request across every service it touches. Each vantage point answers a different question. Network tools tell you the path is slow. Tracing tells you which service made it slow.


What are the Main Causes of Latency?

Total latency is the sum of four delays stacked on top of each other. Knowing which one dominates is how you fix the right thing instead of guessing.

1. Propagation Delay

This is distance divided by the speed of the signal, and physics caps it. Light in fiber moves at roughly two-thirds of its speed in a vacuum, so a signal between New York and London cannot beat about 56 milliseconds round trip at the theoretical floor, and in practice it runs closer to 70 or 80. No amount of faster hardware changes that. Only shorter distance moves the floor.
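The floor can be worked out directly. The sketch below assumes a great-circle distance of about 5,570 km between New York and London, which is an approximation supplied for this example; real cable routes are longer, which is part of why measured figures run higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_FACTOR = 2 / 3                # "roughly two-thirds" in fiber

def propagation_rtt_ms(distance_km):
    """Round-trip propagation delay over fiber, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

NY_LONDON_KM = 5_570   # assumed great-circle distance

if __name__ == "__main__":
    print(f"{propagation_rtt_ms(NY_LONDON_KM):.1f} ms")   # 55.7 ms
```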

2. Transmission Delay

This is the time it takes to push the bits of a packet onto the wire, and it depends on packet size against link capacity. A bigger payload on a thinner link takes longer to serialize. Smaller packets and fatter links shrink it.
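As a quick sketch of that relationship, transmission delay is packet size in bits divided by link rate in bits per second. The packet size and link speeds below are example values, not figures from a particular network.

```python
def transmission_delay_ms(packet_bytes, link_bits_per_second):
    """Time to serialize one packet onto the link, in milliseconds."""
    return packet_bytes * 8 / link_bits_per_second * 1000

# A 1,500-byte packet on two example links.
on_10_mbps = transmission_delay_ms(1500, 10_000_000)      # about 1.2 ms
on_1_gbps = transmission_delay_ms(1500, 1_000_000_000)    # about 0.012 ms

if __name__ == "__main__":
    print(f"10 Mbps: {on_10_mbps:.3f} ms, 1 Gbps: {on_1_gbps:.3f} ms")
```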

3. Processing Delay

Every router, switch, and server that handles the packet spends a little time on it, inspecting headers, making routing decisions, running application logic. Each hop adds only a little. Across a long path or a heavy workload, it adds up.

4. Queuing Delay

When a link or a service is busy, packets wait in line. This is the delay that swings the most, because it depends on how loaded the system is at that exact moment. A quiet network barely queues. A congested one stacks requests up, and latency spikes right when traffic is highest, which is the worst possible time.
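One way to see why queuing swings so hard is the textbook M/M/1 queue: a single server, random arrivals, random service times. That model is an assumption brought in for this sketch, not something real systems follow exactly, and the service rate is a made-up figure.

```python
def mm1_queue_wait_ms(arrival_rate, service_rate):
    """Mean time a request waits in line (excluding its own service time)
    in an M/M/1 queue. Rates are in requests per second."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must stay below service rate")
    utilization = arrival_rate / service_rate
    return utilization / (service_rate - arrival_rate) * 1000

SERVICE_RATE = 1000   # hypothetical: 1,000 requests/second, about 1 ms each

if __name__ == "__main__":
    for load in (0.5, 0.9, 0.99):
        wait = mm1_queue_wait_ms(load * SERVICE_RATE, SERVICE_RATE)
        print(f"{load:.0%} load -> {wait:.0f} ms average wait")
```

Under these assumptions the average wait goes from about 1 ms at half load to about 9 ms at 90% and about 99 ms at 99%. The last few percent of utilization cost far more than the first fifty.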

What is the Difference Between Latency, Bandwidth, and Throughput?

These three get used interchangeably, and they should not be. Each describes a different property of the same connection.

1. Latency

Latency is the delay for one unit of data to make the trip. It is measured in time, like 40 milliseconds, and it answers a single question: how long does one journey take?

2. Bandwidth

Bandwidth is the maximum a link can carry, its theoretical ceiling, measured in bits per second. It describes how wide the pipe is, not how fast any single drop moves through it.

3. Throughput

Throughput is the data you move per second in practice, which sits below the bandwidth ceiling because of latency, packet loss, and protocol overhead. It describes how much data gets through in everyday conditions.

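Here is the part people miss: adding bandwidth does very little for latency. A wider pipe carries more at once and trims transmission and queuing delay, but it leaves propagation delay untouched, so a small request on an uncongested link takes almost exactly as long to cross it. If your problem is delay, more bandwidth is usually money spent on the wrong fix.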
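A rough sketch makes the point with numbers. It models one fetch as a single round trip plus the time to serialize the payload, which is a simplification that ignores handshakes, loss, and congestion. The 80 ms round trip and 10 KB payload are example values.

```python
def fetch_time_ms(rtt_ms, payload_bytes, bandwidth_bps):
    """One round trip plus the time to push the payload onto the link."""
    return rtt_ms + payload_bytes * 8 / bandwidth_bps * 1000

RTT_MS = 80        # hypothetical round-trip time
PAYLOAD = 10_000   # hypothetical 10 KB response

on_100_mbps = fetch_time_ms(RTT_MS, PAYLOAD, 100_000_000)
on_1_gbps = fetch_time_ms(RTT_MS, PAYLOAD, 1_000_000_000)

if __name__ == "__main__":
    print(f"100 Mbps: {on_100_mbps:.2f} ms, 1 Gbps: {on_1_gbps:.2f} ms")
```

In this model, ten times the bandwidth saves less than a millisecond on a request that still takes about 80 ms.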

What are the Common Types of Latency?

Latency is not one thing. It is a label for delay wherever it occurs, and in a typical stack it occurs in several places at once.

1. Network Latency

This is delay across the network itself, the time packets spend crossing links and routers between two points. Distance, congestion, and routing all feed it. It is what ping and traceroute expose.

2. Disk and Storage Latency

This is how long storage takes to serve a read or write. The hardware matters enormously here. A spinning hard drive answers in several milliseconds because a physical head has to move; an SSD answers in a fraction of that, and NVMe drives push it down toward microseconds.

3. Application Latency

This is time spent inside your own code, running logic, waiting on locks, calling other services. A single slow database query or a chatty call to an external API can dominate the whole response, even when the network is fast and the disk is quick.

4. Database Latency

This is the delay a query adds, from establishing the connection to executing the statement and returning rows. Missing indexes, lock contention, and oversized result sets are the usual culprits, and database latency often hides as application latency until someone profiles it.
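A missing index is easy to spot once you ask the database for its plan. The sketch below uses SQLite from the Python standard library with an invented `orders` table; other databases have their own EXPLAIN syntax and output.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)   # 4th column holds the detail

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

SQL = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, SQL, (42,))   # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, SQL, (42,))    # index lookup

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```

Before the index, the plan reports a scan of the whole table. After it, the plan reports a search using `idx_orders_customer`.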

How Can You Reduce Latency?

You cut latency by attacking whichever delay dominates, which is why measuring first beats optimizing blind. A handful of tactics cover most cases.

1. Move Data Closer

Distance is propagation delay, and the only lever on it is shorter distance. Content delivery networks and edge locations put data near users; regional deployments keep a service close to the people who call it most.

2. Cache Aggressively

A cache answers without doing the slow work again. An in-memory store like Redis typically answers in well under a millisecond instead of hitting a database; a CDN serves a cached asset from the edge instead of crossing the country. Every cache hit is latency you skipped entirely.
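The idea fits in a few lines. This sketch uses Python's in-process `functools.lru_cache` rather than Redis or a CDN, and `slow_lookup` is a hypothetical stand-in for a database query that takes about 50 ms.

```python
import time
from functools import lru_cache

CALLS = {"slow": 0}   # counts how often the slow path actually runs

def slow_lookup(key):
    """Hypothetical stand-in for a database query."""
    CALLS["slow"] += 1
    time.sleep(0.05)
    return key.upper()

@lru_cache(maxsize=1024)
def cached_lookup(key):
    """Serve repeated keys from memory; fall through to the slow path once."""
    return slow_lookup(key)

if __name__ == "__main__":
    for attempt in ("miss", "hit"):
        start = time.perf_counter()
        cached_lookup("user:42")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{attempt}: {elapsed_ms:.2f} ms")
```

The first call pays the full cost; the second returns from memory. A real cache also needs an expiry or invalidation rule so it does not serve stale data, which this sketch leaves out.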

3. Cut the Round Trips

Each separate request pays the latency tax once. Reusing connections with keep-alive, multiplexing over HTTP/2, and batching several calls into one all shrink the number of trips, which matters far more on a high-latency link than raw bandwidth does.
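The arithmetic behind batching is simple enough to sketch. The model below assumes calls are made one after another and that server work per call stays the same either way; the 80 ms round trip and 2 ms of work are example values.

```python
def total_wait_ms(calls, rtt_ms, server_ms_per_call, batched):
    """Total wait for a set of calls sent one by one or as a single batch."""
    round_trips = 1 if batched else calls
    return round_trips * rtt_ms + calls * server_ms_per_call

one_by_one = total_wait_ms(20, rtt_ms=80, server_ms_per_call=2, batched=False)
as_batch = total_wait_ms(20, rtt_ms=80, server_ms_per_call=2, batched=True)

if __name__ == "__main__":
    print(f"one by one: {one_by_one} ms, batched: {as_batch} ms")
```

Twenty sequential calls wait 1,640 ms in this model; the same work in one batch waits 120 ms. The server did no less work. The client just stopped paying the round trip twenty times.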

4. Tune the Slow Path

Once measurement points at the culprit, fix it directly. Add the missing database index, trim the oversized payload, swap the spinning disk for NVMe, or rewrite the function that blocks. Small targeted fixes on the dominant delay beat broad ones everywhere else.

Why Does Latency Matter?

Latency is the part of performance users feel directly, and it shows up on the business side faster than most teams expect.

1. User Experience

People read delay as quality. A snappy interface feels well-built; a laggy one feels cheap and unfinished, regardless of how solid the engineering underneath happens to be. Past a few hundred milliseconds, the wait stops being invisible and starts being annoying.

2. Revenue and Conversion

Slow pages lose customers before they convert. Every extra second of load time gives someone another reason to abandon a cart or close a tab, and on a checkout flow that delay maps straight to lost sales. Speed is not a luxury feature here. It is part of the funnel.

3. Reliability Targets

Latency is a first-class SLI (service level indicator), and many SLOs (service level objectives) are written around it directly, for example 95% of requests under 250 milliseconds. When latency creeps past the target, it burns error budget the same way outright failures do, even while everything is technically still up.
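Here is a small sketch of how such a target can be checked, using the 95% under 250 milliseconds example. The function name, the report fields, and the sample data are all invented for illustration; real SLO tooling usually evaluates this over a rolling time window.

```python
def slo_report(latencies_ms, threshold_ms=250, target=0.95):
    """Compare a batch of request latencies against a latency SLO."""
    total = len(latencies_ms)
    slow = sum(1 for value in latencies_ms if value >= threshold_ms)
    allowed_slow = (1 - target) * total   # the error budget, in requests
    good_fraction = (total - slow) / total
    return {
        "good_fraction": good_fraction,
        "meets_slo": good_fraction >= target,
        "budget_used": slow / allowed_slow if allowed_slow else float("inf"),
    }

# Hypothetical window: 970 fast requests, 30 slow ones.
sample = [120] * 970 + [400] * 30
report = slo_report(sample)

if __name__ == "__main__":
    print(report)
```

The sample meets the target with 97% of requests under the threshold, but it has already used about 60% of its error budget without a single failed request.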

4. Distributed Systems Compound It

In a microservices architecture, one user action can fan out into dozens of internal calls. Each call adds its own latency, and a slow tail on any one of them drags the whole request down. This is why tail latency gets serious attention at scale: the more services a request touches, the more chances there are to be slow.
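The compounding can be put in numbers. This sketch assumes each internal call independently has a 1% chance of landing in its slow tail, which is a simplification, since real services often slow down together.

```python
def chance_of_slow_request(fan_out, per_call_slow_probability=0.01):
    """Probability that at least one of fan_out independent calls is slow."""
    return 1 - (1 - per_call_slow_probability) ** fan_out

if __name__ == "__main__":
    for fan_out in (1, 10, 100):
        chance = chance_of_slow_request(fan_out)
        print(f"{fan_out:>3} calls -> {chance:.1%} of requests hit a slow call")
```

Under that assumption, a request touching 100 services hits at least one p99-slow call roughly 63% of the time. What is rare for one service becomes the common case for the user.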

What are Latency Best Practices?

Managing latency well is less about heroics and more about measuring honestly and watching the right number.

1. Track Percentiles, Not Averages

Report p95 and p99, not the mean. The average will tell you everything is fine while a slice of users sits through three-second waits. The tail is where the pain lives, so that is the number to put on the dashboard.

2. Set a Latency Budget

Decide what is acceptable before you measure, and write it as a target, say p95 under 200 milliseconds. A budget turns a vague want-it-faster into a clear line that either holds or does not, which makes regressions obvious instead of arguable.

3. Measure End to End

A single number for the whole journey hides where the time goes. Instrument each segment (network, application, database, external calls) with distributed tracing so a slow request points at its own cause instead of leaving you to guess.
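A single-process sketch shows the shape of per-segment timing. It is not a distributed tracing system, which would also carry a request ID across service boundaries. The `timed_segment` helper and the sleeps standing in for real work are hypothetical.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_segment(timings, name):
    """Record how long the enclosed block takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings[name] = timings.get(name, 0.0) + elapsed_ms

def handle_request():
    """Hypothetical request handler with two instrumented segments."""
    timings = {}
    with timed_segment(timings, "database"):
        time.sleep(0.03)    # stand-in for a query
    with timed_segment(timings, "external_api"):
        time.sleep(0.01)    # stand-in for a third-party call
    return timings

if __name__ == "__main__":
    timings = handle_request()
    slowest = max(timings, key=timings.get)
    print(timings, "-> slowest segment:", slowest)
```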

4. Watch for Drift

Latency rarely breaks all at once. It creeps, a few milliseconds per release, until one day the page feels slow and nobody can say when it happened. Baseline it, alert on regressions, and treat a steady climb as the early warning it is.
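A drift check can be as plain as comparing each release against a fixed baseline instead of against the release before it. The release names, p95 figures, and 10% tolerance below are invented for illustration.

```python
def has_drifted(baseline_p95_ms, current_p95_ms, tolerance=0.10):
    """True when current p95 exceeds the baseline by more than tolerance."""
    return current_p95_ms > baseline_p95_ms * (1 + tolerance)

# Hypothetical p95 latency per release, in milliseconds.
releases = [("v1.0", 180), ("v1.1", 184), ("v1.2", 189), ("v1.3", 201)]

baseline = releases[0][1]
flagged = [name for name, p95 in releases[1:] if has_drifted(baseline, p95)]

if __name__ == "__main__":
    print("releases past the drift threshold:", flagged)
```

No single release here is more than about 6% slower than the one before it, so a release-over-release comparison would pass every time. Against the fixed baseline, v1.3 is flagged.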
