What is Quality of Service?
A former colleague once described Quality of Service as “managed unfairness” during a presentation, and this has stuck with me ever since. There are plenty of technical articles about Quality of Service, but this is a less technical description of the concepts.
Quality of Service (usually abbreviated as QoS) is a set of protocols and steps for identifying and labeling traffic, based on importance and urgency, at different points in the network so that network devices can decide how to treat the various types of traffic whenever portions of the network become congested. The simplest way to provide high quality of service is to over-provision every element of the network, but given the bursty, time-bound nature of network traffic, this is not always possible and is rarely cost-effective.
QoS is complex. In a congested network, it requires a view of end-to-end requirements and available resources to be sure that your QoS configurations won’t exacerbate any latent issues hiding in your network. Configurations must be consistent, or at least complementary, throughout the network, and care must be taken to understand how different elements in the network will treat traffic differently based on their capabilities. The nature of important traffic must also be understood so that handling it does not create new problems for the applications sending and expecting it. In other words, what will the application do if traffic doesn’t arrive, arrives out of order, or arrives later than expected?
It is essential to understand that some traffic, such as the audio portion of a phone call, may be crucial, yet becomes useless or even harmful if it cannot be delivered in the time required. In that case, the traffic should be dropped before reaching its destination, crucial or not.
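To make that concrete, here is a minimal sketch in Python. The names and the exact budget are illustrative, though roughly 150 ms of one-way delay is a commonly cited upper bound for usable voice traffic.

```python
import time

# Hypothetical one-way delay budget; ~150 ms is a commonly cited
# upper bound for usable voice traffic.
VOICE_DEADLINE_SECONDS = 0.150

def worth_forwarding(sent_at: float, now: float | None = None) -> bool:
    """Return True if a voice packet can still arrive in time to be played.

    A packet that misses its playout deadline is useless to the receiver,
    so forwarding it only wastes bandwidth on every downstream link.
    """
    now = time.monotonic() if now is None else now
    return (now - sent_at) <= VOICE_DEADLINE_SECONDS
```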
Traffic entering the network is marked using one of several possible mechanisms. The marks identify the priority and urgency of the traffic based on the sender, the protocol in use, how much traffic the sender is currently sending or has already sent, and so on. This step occurs whether or not there is congestion on the network.
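As an illustration only (the matching rules and the subnet below are invented, though EF, AF41, and best effort are real standardized DSCP code points), a marking step at the network edge might look something like this:

```python
from dataclasses import dataclass

# Real standardized DSCP code points; the classification rules are invented.
EF = 46    # Expedited Forwarding, commonly used for voice
AF41 = 34  # Assured Forwarding class 4, often used for interactive video
BE = 0     # Best Effort, the default

@dataclass
class Packet:
    src_ip: str
    protocol: str  # e.g. "rtp", "https"
    dscp: int = BE

def mark(packet: Packet) -> Packet:
    """Label a packet based on who sent it and what it carries."""
    if packet.protocol == "rtp":             # real-time audio/video
        packet.dscp = EF
    elif packet.src_ip.startswith("10.1."):  # hypothetical conferencing subnet
        packet.dscp = AF41
    else:
        packet.dscp = BE                     # everything else stays best effort
    return packet
```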
Signaling communicates these marks to other devices on the network. Downstream devices may be configured to trust the markings they receive, may perform their own analysis and marking, or may translate the received markings into different ones. Again, this step occurs whether there is congestion or not.
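A sketch of that trust decision, with an invented translation table: a downstream device might accept a mark, reset it, or translate it into something else.

```python
# Invented translation table: marks arriving from a neighbor are mapped
# to the marks this device will use internally.
REMARK_TABLE = {
    46: 46,  # trust EF (voice) as-is
    34: 18,  # demote video from AF41 to AF21
}

def remark(dscp_in: int, port_is_trusted: bool) -> int:
    """Decide what mark a packet carries after crossing a trust boundary."""
    if not port_is_trusted:
        return 0  # untrusted port: reset everything to best effort
    return REMARK_TABLE.get(dscp_in, 0)  # unknown marks fall back to best effort
```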
Queuing may occur as traffic enters or leaves a network device; it uses the markings assigned to traffic to sort it into different queues based on the various combinations of importance and urgency. Support for queuing varies widely across the many traffic-passing elements in a network. Switches may or may not support queues at all, and those that do may offer different numbers of queues or allocate different amounts of memory to them, which determines how much traffic a queue can hold. When traffic will not fit into a queue on a switch, that traffic will have to be dropped; options may be available to define when that determination is made and on which traffic characteristics.
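The sketch below shows that tail-drop behavior. The queue names and depths are invented, and counted in packets for simplicity, where real hardware typically counts buffers or bytes.

```python
from collections import deque

# Hypothetical per-queue depths, in packets for simplicity.
QUEUE_DEPTH = {"voice": 64, "video": 128, "best_effort": 256}
queues = {name: deque() for name in QUEUE_DEPTH}

def enqueue(queue_name: str, packet) -> bool:
    """Tail drop: when a queue is full, the arriving packet is discarded."""
    q = queues[queue_name]
    if len(q) >= QUEUE_DEPTH[queue_name]:
        return False  # dropped; a real device would increment a drop counter
    q.append(packet)
    return True
```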
If the network is congested, this is when some latency-sensitive traffic may be dropped sooner than other traffic, as in the phone-call example above. Otherwise, in most configurations, the lowest-priority traffic will start dropping first. This is the step where the unfairness appears. Traffic is dropped only when it exceeds what the various queues can hold.
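One way to sketch that drop order (the policy here is invented for illustration): stale real-time packets are discarded first, since they are already useless, and otherwise the lowest-priority queue pays the price.

```python
from collections import deque

# Invented queue names, listed from highest priority to lowest.
PRIORITY_ORDER = ["voice", "video", "best_effort"]

def choose_drop_victim(queues: dict[str, deque], voice_is_stale: bool) -> str | None:
    """Pick the queue that loses a packet when buffers overflow."""
    if voice_is_stale and queues["voice"]:
        return "voice"  # late audio is useless anyway, per the example above
    for name in reversed(PRIORITY_ORDER):  # start with the lowest priority
        if queues[name]:
            return name
    return None  # nothing queued, nothing to drop
```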
Since queuing consumes processor and memory resources, some network devices can mark traffic but don’t support queuing; they mark traffic solely so that other devices in the network can act on the marks.
Shaping governs transmission across an interface: the sending switch ensures that the traffic it puts on the connection best matches the configured priorities and attempts to send as much of the prioritized traffic as possible. In this step, traffic is taken from the queues based on priority and available bandwidth and transmitted to the far end of the connection. Unfairness appears again, as traffic may be dropped rather than transmitted, based on the configured requirements.
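In its simplest form this is strict-priority scheduling, sketched below with the same invented queue names: the highest-priority queue is always drained first.

```python
from collections import deque

PRIORITY_ORDER = ["voice", "video", "best_effort"]  # highest priority first

def dequeue_strict(queues: dict[str, deque]):
    """Strict priority: always transmit from the highest non-empty queue."""
    for name in PRIORITY_ORDER:
        if queues[name]:
            return queues[name].popleft()
    return None  # nothing waiting to be sent
```

Strict priority gives the top queue the lowest possible latency, but a busy top queue can starve everything below it, which is exactly the trade-off discussed next.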
There are options for how traffic should be taken from the queues. Depending on criticality, it might be necessary to send all of the highest-priority traffic even if that means no other traffic is sent until the congestion ends, which in turn makes the congestion worse as devices retransmit their dropped packets. Usually, you will want to configure limits on how much traffic from each queue is sent before moving on to the next queue.
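A weighted round-robin sketch of those limits (the weights are invented): each queue gets a quota per round, so no queue can be starved entirely.

```python
from collections import deque

# Invented per-round quotas: up to 4 voice packets, then 2 video,
# then 1 best-effort, and the cycle repeats.
WEIGHTS = [("voice", 4), ("video", 2), ("best_effort", 1)]

def scheduling_round(queues: dict[str, deque]) -> list:
    """One weighted round-robin pass over the queues."""
    sent = []
    for name, quota in WEIGHTS:
        q = queues[name]
        for _ in range(min(quota, len(q))):
            sent.append(q.popleft())
    return sent
```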
All of these steps may happen on each network device that traffic passes through on the way to its destination. I have seen networks where the QoS configurations on different devices work against one another, or at the very least have suboptimal effects, because one switch undoes the changes the previous switch just made for no reason other than the QoS design not being aligned end to end.
QoS is a very powerful tool for determining how your network will perform in a congested state, and, if designed and implemented properly, it can allow you to lower bandwidth costs because your network can survive short periods of congestion. But designing and implementing QoS properly requires an understanding of both your traffic and your networking equipment.