Packet Loss

Short definition

Packet loss occurs when one or more data packets traveling across a network fail to reach their destination. It is expressed as the percentage of sent packets that were never received and is one of the primary indicators of network quality degradation.

Extended definition

Every data transmission across a network is split into packets: units of data that are routed independently from source to destination and reassembled at the receiving end. When any of those packets fails to arrive, the outcome depends on the protocol in use. TCP detects the gap and retransmits the missing data, while UDP has no built-in recovery, so the data is simply lost.

Small amounts of loss are tolerable for some traffic types. A file transfer using TCP automatically retransmits lost packets, adding latency but eventually delivering complete data. For real-time applications using UDP, such as video conferencing, VoIP, or live streaming, loss is immediately perceptible to users and cannot be recovered through retransmission.

Packet loss above 1% is generally considered significant for most applications. Above 5%, performance degrades noticeably. Above 10%, many real-time applications become unusable, and even TCP connections suffer severe throughput reduction as retransmission overhead consumes an increasing share of available bandwidth.
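As a rough illustration of the definition and thresholds above, the following Python sketch computes a loss percentage from packet counters and maps it to the bands just described. The function names and band labels are hypothetical, chosen only for this example, and real thresholds should be tuned per traffic type.

```python
def loss_percent(sent: int, received: int) -> float:
    """Return the percentage of sent packets that were never received."""
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def classify_loss(pct: float) -> str:
    """Map a loss percentage to the rough bands described in the text.

    The labels are illustrative, not an industry standard.
    """
    if pct > 10:
        return "real-time traffic likely unusable"
    if pct > 5:
        return "noticeable degradation"
    if pct > 1:
        return "significant"
    return "tolerable for most traffic"
```

For example, 1,000 packets sent and 992 received is 0.8% loss, which falls in the lowest band here even though it may already matter for latency-sensitive traffic.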

The causes of packet loss span multiple layers of the network stack. Physical problems such as degraded cables or faulty transceivers, network congestion where device queues overflow, software bugs on routing hardware, and misconfigured devices can all produce loss. Identifying which layer is responsible requires a combination of monitoring tools and systematic diagnostic steps.

Deep technical explanation

Causes of packet loss

Network congestion is the most common cause. When a network device receives more traffic than it can forward at a given moment, it buffers packets in a queue. If the queue fills faster than it empties, newly arriving packets are dropped once the queue is full. This is called tail drop and is the default behavior on most networking hardware.
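The queueing behavior can be sketched with a toy simulation. This is a simplified model, assuming discrete time ticks, a fixed queue capacity, and a fixed forwarding rate per tick; the function name and parameters are hypothetical, and real devices use more elaborate schedulers and buffer management.

```python
def simulate_tail_drop(arrivals, capacity, service_rate):
    """Simulate a FIFO queue that drops arrivals once it is full.

    arrivals:     packets arriving in each tick, e.g. [5, 5, 5]
    capacity:     maximum packets the queue can hold
    service_rate: packets forwarded per tick
    Returns (forwarded, dropped, still_queued).
    """
    depth = 0
    forwarded = 0
    dropped = 0
    for arriving in arrivals:
        # Enqueue arrivals one by one; anything arriving at a full queue is dropped.
        for _ in range(arriving):
            if depth < capacity:
                depth += 1
            else:
                dropped += 1
        # Forward up to service_rate packets from the head of the queue.
        sent_now = min(service_rate, depth)
        depth -= sent_now
        forwarded += sent_now
    return forwarded, dropped, depth
```

With arrivals consistently above the service rate, the queue stays near capacity and the drop count grows every tick, which is the pattern seen on a saturated link.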

Physical layer degradation includes damaged cables, dirty fiber optic connectors, faulty SFP transceivers, or electromagnetic interference on copper links. These produce bit errors that cause packets to fail checksum validation and be discarded before reaching the application layer.

Device resource exhaustion covers CPU overload on routing hardware, memory pressure reducing available buffer space, and table overflow conditions, such as a routing table or ARP cache that has exceeded its capacity.

Protocol-level causes include misconfigured MTU settings that cause packets to be fragmented incorrectly or dropped when fragmentation is disabled, and routing loops where packets circulate between devices until their TTL expires.

Measurement methods

ICMP ping provides basic packet loss measurement between two points. Sending a large number of ICMP echo requests over a sustained period gives a statistically meaningful packet loss percentage. However, some network devices deprioritize or rate-limit ICMP traffic, which can make measured loss appear higher than what application traffic actually experiences.
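To show why the number of probes matters, the sketch below estimates a confidence interval for a measured loss rate using the Wilson score interval. This is one common statistical choice rather than a method prescribed by any ping tool, and the function name is hypothetical.

```python
import math


def loss_confidence_interval(sent: int, lost: int, z: float = 1.96):
    """Return (low, high) bounds in percent for the true loss rate.

    Uses the Wilson score interval; z=1.96 corresponds to roughly 95% confidence.
    """
    if sent <= 0 or not 0 <= lost <= sent:
        raise ValueError("need sent > 0 and 0 <= lost <= sent")
    p = lost / sent
    denom = 1 + z * z / sent
    center = (p + z * z / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    low = max(0.0, center - half)
    high = min(1.0, center + half)
    return 100.0 * low, 100.0 * high
```

With 1 lost probe out of 100, the interval runs from roughly 0.2% to 5.4%, so the true loss rate is poorly pinned down. With 100 lost out of 10,000, the same 1% measurement narrows to roughly 0.8% to 1.2%.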

Synthetic probes using UDP or TCP provide more representative measurements for specific application types. A probe using the same protocol as the monitored application gives a truer picture of what that application actually experiences.
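A minimal UDP probe might look like the following. It assumes a UDP echo service is listening at the target host and port and returns each datagram unchanged; that service, the function name, and the payload format are assumptions made for this sketch, not part of any specific monitoring product.

```python
import socket
import time


def udp_probe_loss(host: str, port: int, count: int = 20, timeout: float = 0.5) -> float:
    """Send sequence-numbered datagrams to a UDP echo service and return loss in percent.

    A probe counts as lost if its echo does not arrive within `timeout` seconds.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            payload = seq.to_bytes(4, "big")
            sock.sendto(payload, (host, port))
            deadline = time.monotonic() + timeout
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(64)
                except socket.timeout:
                    break
                # Ignore late echoes of earlier probes; only the current one counts.
                if data == payload:
                    received += 1
                    break
    return 100.0 * (count - received) / count
```

A production probe would typically also pace its packets to match the monitored application's packet size and rate, since loss under load can differ from loss seen by a sparse probe.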

MTR (My Traceroute) combines traceroute and ping to measure packet loss at each hop along a path, which is useful for isolating which network segment is responsible for observed loss without needing to check each device individually.
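When reading per-hop results, loss shown at an intermediate hop that does not continue through to the destination usually reflects that hop deprioritizing ICMP replies rather than real forwarding loss. The sketch below applies that rule of thumb to a list of per-hop loss percentages. The function name, the input format, and the 1% threshold are assumptions for illustration; this is a heuristic, not part of MTR itself.

```python
def first_lossy_hop(hop_loss, threshold=1.0):
    """Return the 1-based hop number where persistent loss begins, or None.

    hop_loss:  loss percentages per hop, in path order (last entry = destination)
    threshold: minimum loss percentage treated as meaningful

    Loss is considered persistent only if every hop from that point through
    the destination also shows loss at or above the threshold.
    """
    culprit = None
    # Walk backwards from the destination while loss stays above the threshold.
    for hop_number in range(len(hop_loss), 0, -1):
        if hop_loss[hop_number - 1] >= threshold:
            culprit = hop_number
        else:
            break
    return culprit
```

For a path reporting `[0, 0, 3.4, 3.1, 3.5]`, the function points at hop 3. For `[0, 40, 0, 0]`, it returns `None`, because the 40% at hop 2 does not carry through to the destination.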

TWAMP (Two-Way Active Measurement Protocol) provides precise bidirectional packet loss and delay measurements, commonly used by ISPs and carriers for SLA verification.

Packet loss and TCP throughput

TCP’s congestion control mechanisms treat packet loss as a signal of network congestion and respond by reducing the transmission rate. At 1% packet loss, TCP throughput can drop to 30-40% of the theoretical maximum, depending on the connection’s round-trip time. This is why a link that appears to have capacity available can deliver very poor actual throughput when even small amounts of packet loss are present.
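The relationship between loss, round-trip time, and throughput is often approximated with the Mathis formula, which bounds steady-state throughput at roughly (MSS / RTT) × (C / √p), where p is the loss rate and C is a constant near 1.22. The sketch below applies it. It models classic loss-based congestion control; newer algorithms such as CUBIC or BBR behave differently, so treat the result as an order-of-magnitude estimate. The function name and default values are assumptions for this example.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on TCP throughput in bits per second.

    mss_bytes:   maximum segment size in bytes (1460 is typical on Ethernet)
    rtt_seconds: round-trip time in seconds
    loss_rate:   packet loss as a fraction, e.g. 0.01 for 1%
    c:           model constant, about 1.22 in the original approximation
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_seconds) * c / math.sqrt(loss_rate)
```

With a 1460-byte MSS, 50 ms RTT, and 1% loss, the estimate is about 2.9 Mbit/s for a single flow regardless of link capacity. Because throughput scales with 1/√p, quadrupling the loss rate halves the bound, and doubling the RTT halves it as well.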

Practical examples

A managed NOC team monitoring a client’s video conferencing gateway detects packet loss of 0.8% on an uplink. The client has not reported issues yet, but the alert fires because the team monitors at a 0.5% threshold for latency-sensitive traffic. Investigation reveals a duplex mismatch on a switch port. The mismatch is corrected and packet loss drops to 0%.

A CDN operator notices elevated packet loss on one PoP’s backbone peering session during peak hours. Flow analysis shows the peering port is reaching 94% utilization, causing tail-drop on the queue. The operator provisions additional peering capacity and moves some prefixes to a secondary peer. Packet loss during peak hours drops from 2.1% to 0.1%.

A SaaS company’s support team receives complaints about degraded VoIP quality on calls through a specific office location. Packet loss testing to the VoIP gateway shows 3.4% loss. MTR analysis identifies the loss at the third hop, which is a managed switch inside the building. A failing SFP transceiver is identified and replaced. Call quality is fully restored.

Why it matters

  • Packet loss is one of the earliest observable symptoms of network degradation, often appearing before other indicators such as total link saturation or device failure.
  • Even small amounts of packet loss have a disproportionate impact on TCP throughput due to congestion control behavior, meaning a link with 1% loss can effectively deliver a fraction of its rated capacity.
  • Real-time applications using UDP cannot retransmit lost packets. Loss above 1% is already audible in VoIP calls and visible in video streams.
  • Packet loss on paths used by monitoring or security systems can cause false alerts or missed detections, compounding other operational problems.
  • SLA definitions for network services typically include packet loss thresholds alongside latency and uptime figures. Tracking packet loss is required for accurate SLA compliance reporting.

How BlueGrid.io uses it

  • BlueGrid.io monitors packet loss on all managed network paths with per-link thresholds calibrated to the traffic type on each link, applying tighter thresholds for real-time traffic such as VoIP and video conferencing.
  • We use both ICMP synthetic probes and protocol-specific probes so that packet loss measurements reflect actual application experience, not just ICMP behavior.
  • MTR-based path analysis is part of our standard packet loss diagnostic workflow, isolating the responsible network segment quickly rather than checking each device in sequence.
  • Packet loss events are correlated with bandwidth utilization data to distinguish congestion-caused loss from physical or hardware-caused loss, directing the remediation to the correct root cause.
  • For CDN clients, we monitor packet loss on all peering and transit connections separately, with peak-hour trend analysis that identifies capacity headroom concerns before they cause user-visible degradation.
  • All packet loss incidents are documented with before-and-after measurements so clients have clear evidence of resolution and a baseline for comparison if the condition recurs.
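
The idea of correlating loss with utilization can be illustrated with a simple heuristic. The sketch below is not BlueGrid.io's implementation; the function name, thresholds, and labels are hypothetical, and it assumes each sample pairs a link utilization percentage with the loss percentage measured over the same interval.

```python
def classify_loss_cause(samples, util_threshold=90.0, loss_threshold=0.5):
    """Guess whether observed loss is congestion-driven or hardware-driven.

    samples: iterable of (utilization_pct, loss_pct) pairs for one link
    Returns a short label; all thresholds are illustrative assumptions.
    """
    lossy = [(util, loss) for util, loss in samples if loss >= loss_threshold]
    if not lossy:
        return "no significant loss"
    # Share of lossy intervals that coincide with high utilization.
    congested = sum(1 for util, _ in lossy if util >= util_threshold)
    share = congested / len(lossy)
    if share >= 0.8:
        return "likely congestion"
    if share <= 0.2:
        return "likely physical or hardware"
    return "mixed or inconclusive"
```

Loss that appears only when a port is near saturation points toward capacity, while steady loss on a lightly used link points toward cabling, transceivers, or device faults.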