AoIP II: practical applications


In the first installment of AoIP we talked about the advantages of using this technology in our production environments and its main differences from traditional audio, whether analogue or digital. However, implementing a complete AoIP system brings new challenges, which the different manufacturers have been gradually solving. On this occasion, we will look at the practical functions needed to implement these types of systems: automatic detection, administration, control and security.


The plug-and-play concept is taken for granted by everyone in any IT environment, professional or consumer. Wi-Fi just works, USB just plugs in, and my headphones work as soon as I plug them into my phone. But this level of maturity was not reached overnight. I still remember the days when connecting a new device to a computer meant finding the drivers, installing them (the correct version, of course, depending on the operating system, device model, etc.), connecting the device in the proper sequence, following a procedure and manually configuring everything. And this was not so long ago, only 10 to 15 years back.

And what has made this change in the IT world possible? Open standards, of course. All operating systems, devices and protocols speak the same language, at least partially, which allows them to behave in this highly desirable automatic way that makes life so much easier for us today. Well, something similar has happened with AoIP.

Three decades of history

Yes, it has been 30 years already since AoIP started to be used in general multimedia production environments. And in the beginning everything was as tedious as in the IT world. Everything had to be configured by hand, nothing was compatible with anything, and manufacturers seemed to be fighting like heirs in an inheritance dispute. There was no way for them to understand each other.

For example, an IP microphone connected to a network needed its own IP address and had to send its audio to multicast IP addresses known to the rest of the devices, which would ‘listen’ to the stream sent by the microphone and had to be able to ‘read’ it: understand its format, bit depth, sample rate and everything else. Well, all this had to be configured manually.

It is easy to imagine the problems and the number of configuration errors that occurred and the time required to set up even the smallest AoIP production environment. This slowed its deployment a lot, but it was the future and manufacturers and standards continued to evolve to make what we have today possible: AES67.

Achieving Plug-n-Play

To ensure that when connecting a new device to our audio over IP production network the device will be recognized and work correctly, two things are required: finding the device and controlling it.

When we say finding the new device, it is not only about all other equipment knowing that it is actually there, but also about identifying what format it is configured in and, if the device is a source, what type of audio stream it is generating. The first step is to allocate an IP address. This is easy to solve thanks to DHCP (Dynamic Host Configuration Protocol), through which a DHCP server, typically the router in a small network, grants the new device an IP address within the network’s range. With that, the device is on the network.

Going one step further, we need to tell the rest of the devices what audio settings it is using. That is what AES67 is for. This standard defines how audio is carried in RTP (Real-time Transport Protocol) packets and relies on SDP (Session Description Protocol) to describe each stream: its encoding, sample rate and number of channels. All other devices read this description and adapt to it. Now the device is on the network and its format is known to the others.
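As an illustration of how a receiver can learn a stream's format, AES67 streams are commonly described with SDP (Session Description Protocol) text. The following is a minimal sketch, not a full SDP parser: the stream name, addresses and port in the example description are made up, and `parse_stream_description` is a hypothetical helper written for this article.

```python
# Hypothetical SDP text for one AES67-style stream.
# The name, addresses and port are invented for illustration.
EXAMPLE_SDP = """\
v=0
o=- 1423986 1423994 IN IP4 192.168.1.50
s=Stage mic 1
c=IN IP4 239.69.1.10/32
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L24/48000/2
a=ptime:1
"""


def parse_stream_description(sdp_text):
    """Extract the fields a receiver needs from a minimal SDP description."""
    info = {}
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("s="):
            info["name"] = line[2:]
        elif line.startswith("c="):
            # c=<nettype> <addrtype> <address>[/<ttl>]
            address = line[2:].split()[2]
            info["address"] = address.split("/")[0]
        elif line.startswith("m=audio"):
            # m=audio <port> <transport> <payload type>
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            _, fmt = line[len("a=rtpmap:"):].split(" ", 1)
            parts = fmt.split("/")
            info["encoding"] = parts[0]
            info["sample_rate"] = int(parts[1])
            info["channels"] = int(parts[2]) if len(parts) > 2 else 1
        elif line.startswith("a=ptime:"):
            info["packet_time_ms"] = float(line[len("a=ptime:"):])
    return info


if __name__ == "__main__":
    print(parse_stream_description(EXAMPLE_SDP))
```

With a description like this, a receiver knows where to listen (address and port) and how to decode what arrives (24-bit linear PCM, 48 kHz, two channels in the example) without anyone typing those values in by hand.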

Moving on to the control side, in addition to the device’s IP address we must know the multicast address(es) used to transmit the relevant stream. These addresses are used so that the rest of the devices can ‘listen’ to the stream by asking the network’s switch (through IGMP) for a copy of the packets sent to that address. This reduces the network’s load, since all devices listen to the same stream at the same time without the source having to duplicate it. The problem is the large number of multicast addresses in use even in the smallest network, which immediately complicates their control, although if done correctly they can be managed automatically.
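In IP terms, the shared addresses described above are multicast group addresses, and ‘listening’ means joining the group. The sketch below shows one way a receiver could do this with Python's standard `socket` module. The function names are hypothetical, and the group address and port in the usage note are the same made-up values as any other example here, not values taken from a real installation.

```python
import ipaddress
import socket
import struct


def build_membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure used to join an IPv4 multicast group.

    `interface` is the local IPv4 address to join on; 0.0.0.0 lets the
    operating system choose.
    """
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_stream_listener(group, port, interface="0.0.0.0"):
    """Open a UDP socket and join the multicast group carrying the stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining makes the host announce its interest in the group, so the
    # network only forwards the stream to ports that asked for it.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        build_membership_request(group, interface),
    )
    return sock
```

A receiver would call something like `open_stream_listener("239.69.1.10", 5004)` and then read RTP packets with `sock.recvfrom(2048)`. Every receiver that joins the same group gets the same packets, which is why the source only has to send the stream once.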

Plug-n-Play is not all good news, though, because it raises the bar for interoperability between devices. If all devices identify each other, it is assumed they can also communicate with each other. It is also assumed that other formats such as AES3 or MADI can exchange audio properly with AES67 devices. That is why, alongside their automated functions, control software applications always keep manual functions that allow the different protocols to understand each other.

Increasing flexibility

Increasing the flexibility and ease of interconnecting any type of device comes at a price: complexity. Having all devices communicate with each other means that we have to carry out complex configurations such as, for example, identifying all the multicast addresses in use on the network. Even in the smallest of networks we find hundreds of these addresses, and their number grows quickly as devices and streams are added.
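To get a feel for the numbers, here is a rough count under two stated assumptions: every stream gets its own address, and any receiver may want any stream. The device inventory is invented for illustration and the function names are hypothetical.

```python
def count_streams(streams_per_device):
    """Total number of streams, assuming one address per stream."""
    return sum(streams_per_device.values())


def count_possible_subscriptions(stream_count, receiver_count):
    """Upper bound on stream-to-receiver pairings a controller may have to track."""
    return stream_count * receiver_count


# Hypothetical small studio: device name -> number of streams it sends.
small_studio = {
    "stagebox": 32,
    "console": 64,
    "playout server": 16,
    "intercom": 8,
}

if __name__ == "__main__":
    streams = count_streams(small_studio)
    print("addresses to manage:", streams)
    print("possible subscriptions with 10 receivers:",
          count_possible_subscriptions(streams, 10))
```

Four devices already produce 120 addresses in this made-up inventory, and with ten receivers there are 1,200 possible pairings, which is why this bookkeeping is normally left to management software rather than done by hand.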

That is why standards such as AES67 do not include a particular specification for such interoperability and subsequent flexibility, leaving the network and management software to take care of that. Nowadays these systems are mature enough and work quite well in managing an AoIP network.

Enhanced interoperability

A critical point is that all devices, regardless of manufacturer, should not only communicate with each other but also be discoverable and controllable by any management software, again regardless of manufacturer. The problem is that each manufacturer, understandably, develops its control system around its own products. Manufacturers invest a lot of development effort and money in that task, so they cannot also take on major interoperability developments.

In addition, such developments would mean sharing certain secrets or source code between manufacturers, something that everyone is very reluctant to do. In the end, the source code is where the intellectual property value of each development lies, and, as we mentioned, a lot of money has gone into it.

That is why, if we want the flexibility of an AoIP standard such as AES67, we must accept certain compromises. For example, an AoIP network based on AES67 with a control system from a single manufacturer will give us all the advantages and flexibility, but only within that manufacturer’s range.

It is true that either manufacturers or regulators could work on open standards so that equipment and control systems are compatible with each other, but this is not going to come out of the blue. There are certain advances and initiatives in this regard, but they have a long way to go before they are ripe.


In the end we must get the best of both worlds. For example, combining the specifications of the SMPTE ST 2110 standard with AES67 is a good compromise. The operations side is more than resolved in AoIP, but challenges remain, such as the interoperability of control systems between pieces of equipment from different manufacturers, or simplifying the IT network configuration needed to connect devices to each other.

Rather than fighting against the current limitations, it is better to adapt to them and keep them in mind in order to work much better and make the most of the benefits provided by AoIP, especially the flexibility that it offers us. Undoubtedly, giving up some variety of equipment will give us advantages that we could not dream of with traditional audio, whether analogue or digital.


Author: Yeray Alfageme
