VoIP: Exploring ST2110 and ST2022
When approaching any new audiovisual project, once we realize that IP infrastructure is not just another option but the main one, we must put theory into practice. And while, in principle, there is no difference between theory and practice, in practice there really is.
By Yeray Alfageme
IP infrastructure provides -to put it in two words- greater flexibility and scalability. Traditional SDI solutions have the maturity and robustness that time and experience provide, but they are rigid and stubborn. For example, if you need to go from SD to HD -although this may seem like a thing of the past, it is more common than we think- much of the 270 Mbps SD-SDI infrastructure will not support 1.485 Gbps HD-SDI signals, an issue many were facing some time ago. The same applies when moving to UHD. Implementing 4K or 8K on an SDI infrastructure, where both wiring and switching capacity have to be multiplied by 4, or even by 16, becomes practically unfeasible.
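As a rough illustration of that scaling problem, the sketch below multiplies nominal SDI line rates by the number of parallel links needed for quad-link UHD (figures are the standard interface rates, used here only for back-of-the-envelope arithmetic):

```python
# Nominal SDI interface rates in Gbps (approximate, for illustration only)
SDI_RATES = {
    "SD": 0.270,   # SD-SDI, 270 Mbps
    "HD": 1.485,   # HD-SDI
    "3G": 2.970,   # 3G-SDI, 1080p50/60
}

def uhd_capacity(base_gbps: float, links: int) -> float:
    """Capacity needed when UHD is carried as `links` parallel SDI feeds
    (quad-link 4K = 4 links, 8K = 16 links)."""
    return base_gbps * links

print(f"4K over quad 3G-SDI: {uhd_capacity(SDI_RATES['3G'], 4):.2f} Gbps")
print(f"8K over 16x 3G-SDI:  {uhd_capacity(SDI_RATES['3G'], 16):.2f} Gbps")
```

Every one of those links means its own cable run and its own matrix port, which is where the wiring and switching costs explode.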
IP, as we already know, is agnostic as to the format transmitted through the infrastructure. It is all about data. There is no need to modify the wiring or the ports in the matrix (in fact, there is no matrix), and signals of different formats can even be mixed within the same equipment. No one would ever think that having SD, HD, 25 fps or slow-motion signals on the same IP network could be a problem. This is simply unthinkable in SDI.
SMPTE is a strong advocate of IP and has long been working hard to offer standardization that sets up a framework for both manufacturers and integrators; as early as 2007 it released ST2022. This first set of standards aimed to define, in the simplest possible way, the requirements for deploying IP infrastructure for broadcast.
Back in 2012, the ST2022-6 version provided for the use of standard Internet protocols to encapsulate SDI signals within IP packets. Basically, each SDI signal was wrapped in an RTP (Real-time Transport Protocol) encapsulation within a UDP (User Datagram Protocol) datagram, and this in turn was placed in the payload, the data part, of an IP packet. Although this concept works well, it does not make efficient use of bandwidth, since it carries more than 20% of overhead in packet headers and trailers.
For example, of an SD-SDI signal packetized in IP under ST2022-6, only about 76% of the information is video, while the remaining 24% is ancillary data, packet headers and trailers, and TRS (Timing Reference Signal) information. ST2110 has come to the rescue.
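That 76% figure can be approximated from the SD raster geometry alone: a 270 Mbps SD-SDI stream carries the full raster, blanking included, and only the active picture area is actual video. The dimensions below are the standard 525-line raster values, used here purely as a sanity check:

```python
# 525-line SD-SDI raster: totals include horizontal and vertical blanking
total_samples_per_line = 858    # luma samples per line, active + blanking
active_samples_per_line = 720
total_lines = 525
active_lines = 486

active_fraction = (active_samples_per_line / total_samples_per_line) \
                * (active_lines / total_lines)
print(f"Active picture share of the 270 Mbps stream: {active_fraction:.1%}")
```

The result, roughly 78%, lands close to the 76% quoted above; the remaining difference is taken up by the IP, UDP and RTP headers themselves.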
ST2110 aims to fix these inefficiencies, at least partially, by removing timing from the equation as a separate signal. Clocking data is embedded within the IP frames of the corresponding signals, whether they are SDI, MADI or AES, the latter two being audio.
Although Ethernet is the preferred channel for IP networks, we should not confuse IP with Ethernet, as they are not the same thing. Ethernet consists of the specifications for layer 1 -the physical layer- and layer 2 -the data link layer- of the seven-layer ISO model; IP links can, for example, run over Ethernet or over Wi-Fi, to give the clearest and most distinct example.
Both ST2022 and ST2110 can be implemented on any existing IP transport medium, although the benefits offered by Ethernet are evident. In the first place, it is the most popular -I would almost say the de facto medium in corporate networks- and all equipment is compatible with it, be it over fiber or copper. In addition, being a wired physical medium, it avoids the added latencies present in other media such as Wi-Fi, and offers greater reliability. Such delays and reliability issues are not critical in file-exchange environments, as they are handled by the transport protocols, but they become critical when it comes to real-time video.
Eliminating TRS to gain bandwidth
Ethernet is asynchronous: each packet is transmitted by the network at a different time, and there is no need to wait for the next clock cycle, since such a clock does not even exist. This makes the medium faster and more flexible, but it also forces the receiver to rebuild the entire frame when packets arrive 'at the wrong time'.
By removing TRS, ST2110 reduces the necessary bandwidth -or increases the number of signals that can be transmitted, whichever way you look at it- by between 16% and 40%, depending on whether it is audio or video. For all of us who come from 'old broadcast', doing away with synchronicity can be quite scary, since the clock is one of the most important things in any system: all lines, fields, frames, audio samples and metadata have to be perfectly synchronized. So, what do we do now?
To achieve this synchronization, ST2110 uses the IEEE 1588-2008 protocol, commonly known as PTP (Precision Time Protocol). A curious detail: PTP is a counter that measures the exact number of nanoseconds elapsed since midnight on January 1, 1970, a date and time known as the Epoch and used as a reference point in many computer systems. Each ST2110 packet includes a timestamp derived from this PTP time in its RTP header, which provides the desired timing. A simple solution that is also elegant and functional.
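The 'nanoseconds since the Epoch' idea can be illustrated with an ordinary system clock. Python's `time.time_ns()` counts from the same 1970 Epoch; real PTP uses hardware timestamping on the TAI timescale, so this is only a sketch of the concept:

```python
import time

# Nanoseconds elapsed since midnight, 1 January 1970 (the Epoch)
now_ns = time.time_ns()
seconds, nanoseconds = divmod(now_ns, 1_000_000_000)
print(f"{seconds} s and {nanoseconds} ns since the Epoch")

# In ST2110 the RTP timestamp is derived from this clock: the PTP time
# is converted to the media clock rate (90 kHz for video) and wrapped
# into the 32-bit RTP timestamp field
RTP_CLOCK_HZ = 90_000
rtp_timestamp = (now_ns * RTP_CLOCK_HZ // 1_000_000_000) % 2**32
print(f"Derived 32-bit RTP timestamp: {rtp_timestamp}")
```

Because every device derives its timestamps from the same counter, two packets can be compared anywhere in the network without any separate reference signal.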
Occam’s razor: the simplest solution is always the best.
The same PTP concept applied to every video frame is also valid for every audio sample. Audio is grouped into packets similar to AES67 ones. Each packet is stamped with a PTP-derived value so it can be sorted upon receipt. The same happens with metadata and control data, each with its own PTP timestamp.
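A minimal sketch of what 'sorted upon receipt' means in practice (the packet tuples below are hypothetical; a real receiver also handles sequence numbers and the 32-bit timestamp wrap-around):

```python
# Hypothetical (timestamp, payload) pairs as they arrive off the wire,
# out of order because the network is asynchronous
received = [
    (2002, b"audio-block-2"),
    (0,    b"audio-block-0"),
    (3003, b"audio-block-3"),
    (1001, b"audio-block-1"),
]

# Every packet was stamped against the same PTP-derived clock,
# so the receiver simply sorts by timestamp to restore the original order
in_order = sorted(received, key=lambda pkt: pkt[0])
print([payload for _, payload in in_order])
```

The sort key is the timestamp, not the arrival time, which is precisely why the asynchronous transport stops mattering.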
For all this to work, there must be a master PTP reference. Since all data packets, whether video, audio or metadata, are generated independently by the camera, the mixer, the microphone or any other source, yet all referenced to this common clock, each stream can be processed separately and in parallel, which increases throughput and minimizes the time required.
One PTP to control them all
Within a network there may be more than one PTP source, but only one of these sources will be the so-called 'Grandmaster'. Each of these clocks runs an algorithm called the BMCA (Best Master Clock Algorithm) that decides which of them is the best. Criteria such as whether it is locked to a GPS signal -along with preferences chosen by the system administrator- establish which of all these clocks will be the grandmaster and will control them all.
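The election can be sketched as a field-by-field comparison where lower values win. This is a simplification: the real IEEE 1588 BMCA compares more fields (identity tie-breakers, steps removed), and the clock names and attribute values below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PtpClock:
    name: str
    priority1: int       # administrator preference (lower wins)
    clock_class: int     # traceability, e.g. 6 = locked to GPS (lower wins)
    clock_accuracy: int  # encoded accuracy (lower wins)
    priority2: int = 128

    def rank(self):
        # Simplified BMCA: compare the dataset field by field, lower is better
        return (self.priority1, self.clock_class,
                self.clock_accuracy, self.priority2)

clocks = [
    PtpClock("studio-gm", priority1=128, clock_class=6, clock_accuracy=0x21),   # GPS-locked
    PtpClock("backup-gm", priority1=128, clock_class=248, clock_accuracy=0xFE), # free-running
]

grandmaster = min(clocks, key=PtpClock.rank)
print(grandmaster.name)  # the GPS-locked clock wins on clock_class
```

An administrator can still force the outcome by lowering `priority1` on a preferred clock, which is exactly the 'preferences chosen by the system administrator' mentioned above.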
This allows us to extend the limits of our network as far as we can imagine, since part of the information -audio, for instance- can be processed in the control hardware because it requires less processing capacity, while HDR video is processed with cloud resources, which are more flexible and less expensive. As a result, a great deal of software has appeared that, following the cloud model, works under pay-per-use licenses, giving us the flexibility our productions so badly need. Compare this with the past: in a linear SDI environment this was almost unthinkable.
Optimizing resources and innovating more easily
The SaaS (Software as a Service) model -using software as a service and paying for it only when necessary- is a paradigm shift that comes quite naturally in the IP world, but it brings a new set of rules for audiovisual equipment. Pay-as-you-go is the key for production companies to adapt their equipment to the needs of each moment, in a business like ours where the workload changes so much.
Likewise, innovation becomes much easier. A mixing console, be it video or audio, used to have a format resulting from legacy and operational needs, and changing it meant replacing everything. All this is much easier now. For example, touchscreen controls that are a mixing console today, a multiviewer tomorrow and a replay server later on require very little effort to implement and provide many benefits. Once more, we gain flexibility at a lower cost and with fewer resources.
Migrating to an IP infrastructure offers great benefits, mainly flexibility and scalability, but to make the most of them, the systems connected to an IP network must understand each other. For this reason, protocols and standards are even more necessary than in linear environments, since without them the standard IT equipment being used could not be combined reliably.
ST2022-7 and ST2110-40 are among the most recent parts of these standards families. They are already well established, and IP broadcast systems are built to comply with them, thus ensuring interoperability between vendors, which was a big headache in the past.
Now that we have interoperability, flexibility, and scalability, what is stopping us from moving from SDI to IP? Nothing.