SMPTE ST 2110: Deciphering the keys to IP video
As broadcast video transmission migrates towards IP protocols, let us look at what lies behind this standard and why it matters for every device.
Text: Luis Pavía
We have been using IT networks to exchange information for decades, and for quite a few years now the compatibility issues between devices and operating systems have been gradually resolved. Nowadays it is immaterial whether content is accessed or an e-mail is sent from a given device (desktop PC, tablet, mobile phone…) or operating system (Windows, Mac, Linux, iOS, Android…). And one of the latest achievements worth noting is the increase in available bandwidth (3G, fiber optic, 4G, 5G…), which enables moving nearly any volume of information in a flash.
All these achievements have been built on a common network protocol, the enormously popular IP (the Internet Protocol), which has allowed information content to be independent of the hardware or software used to move it end to end. Bear in mind that a protocol is simply a language enabling information exchange among devices or machines. Just as people must share an alphabet and speak the same language to understand each other, a protocol sets the guidelines and codes that the various elements in a system must share in order to communicate.
One of the features the IP protocol takes advantage of is the layer structure defined by the OSI model for the interconnection of open systems, which is internationally accepted and used. This model somewhat recreates the workings of a physical mail system: it guarantees the movement of sets of information between multiple origins and destinations in a secure manner, regardless of content. Just as mail bags carry pieces of information of any size from various origins to various destinations through common channels (van, truck, train, plane…), large IP networks move large volumes of information between multiple origins and destinations.
Why is IP for video fashionable now? It is not new. Any video accessed on a web page, on well-known portals such as YouTube or Vimeo, on platforms such as Netflix or HBO, or even the video offered by our OTT operators, travels over IP networks. In every instance. Yes, we have been using it for longer than we think.
What is new is the possibility of exploiting the advantages of this protocol from the origin of the signal, in the cameras and within the broadcast studios themselves. It is paradoxical that we will be replacing our SDI digital transmission systems, first conceived in 1989, with an IP transmission system designed in 1980. Why?
There are several reasons. With an IP transmission system we will not need to rebuild our entire transport network every time TV formats evolve (HD, 4K…), because the transport infrastructure is independent of the content. Transmission and switching equipment (switches, routers) is much cheaper than traditional SDI matrices, its market volume is higher and its development pace faster. Extending and scaling an IP signal is extremely cheap, while doing the same in SDI is significantly harder and more expensive. In short, because of cost and future performance, IP is today the only feasible option for growing in a commercially sensible manner. And it is in fact the only alternative favored by the market.
This move had not been made before because, until recently, the demanding requirements of the broadcast environment could not all be met simultaneously: syncing devices with nanosecond accuracy, without adding latency or delay, while transmitting huge volumes of information from multiple sources, and keeping the indispensable interoperability between systems from any manufacturer for any kind of broadcast equipment, requires perfectly defined and structured standardization. Watching a video stored on a server from a computer is very different from the live production of a big event, with dozens of cameras, hundreds of microphones, many servers, synchronizers, mixers and the wide array of equipment needed to broadcast a quality production in real time.
It is one thing for IP to guarantee that information travels from one point to another through whatever devices are required; it is quite another for each device involved in a production to handle that information timely and properly. IP guarantees that the message will reach its destination, but not that the recipient will be able to read it. It is equivalent to what happens when we want to play content on a device (computer, mobile phone, etc.) and the player tells us the "codec" required to decode it is missing.
And there is much more than codecs in the broadcast world. It is not just a packaging method or a compression algorithm, but all the control and synchronization signals implicit in an environment where 24, 25 or 30 images per second must be "drawn", each with millions of pixels plus a good amount of color information per pixel. Plus the associated audio. Plus metadata. If we stop to think that every image, sound and piece of metadata in every frame is broken into pieces, packetized and sent over different routes to several recipients at once, interacting and synchronizing with a multitude of signals from as many devices before being sent on for distribution, we begin to get an idea of the complexity involved.
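To put numbers on those volumes, a back-of-the-envelope calculation shows what one uncompressed HD flow alone demands of the network. The sketch below assumes an illustrative 1080p50 signal with 10-bit 4:2:2 sampling (about 20 bits per pixel); it counts only active picture, ignoring blanking and packet overhead.

```python
# Rough data-rate estimate for a single uncompressed video flow,
# to illustrate the traffic volumes an ST 2110 network must carry.
# Parameters are illustrative, not taken from the standard.

def video_bitrate_gbps(width, height, fps, bits_per_pixel):
    """Active-picture bit rate in Gbit/s (no blanking, no RTP/UDP overhead)."""
    return width * height * fps * bits_per_pixel / 1e9

# 10-bit 4:2:2: 10 bits of luma plus 10 bits of shared chroma per pixel.
rate = video_bitrate_gbps(1920, 1080, 50, 20)
print(f"1080p50, 10-bit 4:2:2: {rate:.2f} Gbit/s")  # ≈ 2.07 Gbit/s
```

Multiply that by dozens of cameras, each flow split into thousands of packets per frame, and the need for strict timing rules becomes obvious.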
That is what standards are for: sets of specifications agreed and drafted by international organizations, which manufacturers integrate into their systems in a way that is seamless for users. Thus it is enough for us to know the label or a few acronyms identifying the applicable standard to have an exact idea of what our equipment offers, requires or meets. In essence, a standard ensures a degree of compatibility with other equipment.
For instance, it is enough to hear "3G-SDI" to know that our interconnection meets the SMPTE 424M standard and can carry bit rates of up to 2.970 (or 2.970/1.001) Gbit/s, suitable for high-definition video signals of 1080 progressive-scan lines. So we plug a monitor into a camera and the picture can be seen. And heard, too.
In this case the aspirations are more ambitious: to design an environment that enables creating more and better content with fewer resources, and that lets that content reach any device, at any resolution, at any time, as viewers desire.
SMPTE stands for the Society of Motion Picture and Television Engineers, the organization that has been setting standards for the audiovisual industry since 1916, similar in nature to the ITU (International Telecommunication Union). The latter operates under the UN umbrella and has existed since 1865. In view of these dates, it is obvious that standardization is nothing new when it comes to the interoperability of all kinds of systems.
But let us get back to the present. Now that the IP transmission standard can efficiently manage broadcast content with all professional-level requirements, the move from SDI to an IP infrastructure will be as significant as the change we experienced when taped content was replaced by computer files, which meant a quantum leap in working methods. For all of this to work, interoperability must be ensured, not only between broadcast equipment but also with the routers and switches that must manage and properly prioritize such traffic.
Naturally, there are already standards and regulations in place to enable this, because the rules governing video transmission over IP networks began to be standardized many years ago. Worth noting is the SMPTE ST 2022 set of standards from 2007. In some instances there were separate standards, each covering only part of the potential, without the scope or interaction between them that ST 2110 now enables. Structured to gather, streamline and harmonize the standards needed for the present and future operation of our facilities, ST 2110 was conceived as the turning point for speeding up migration. Bear in mind that this change will be neither quick nor massive: each facility will set its own migration pace and, as with every transition, both systems will have to coexist for some time.
For this reason we are dealing with a whole set of standards and specifications, and the generic name ST 2110 is only a label for the group. One must dig a bit deeper to understand what this set of standards offers, as its aim is to establish an environment that accommodates both compatibility with existing systems and the feasibility of all the new formats already knocking at our door, as well as those that will arrive sooner than we think. 4K TV is not yet widespread, and the new generation of consoles already has 8K in its sights.
On the other hand, drafting standards takes time. Groups of experts and professionals collaborate for months or even years to set a number of starting points, taking into account the most relevant players in the environment, including TV broadcasters and manufacturers, who also cooperate in the process. As a general rule, recommendations are then prepared which, after testing and validation over a sufficient period, become a standard. In this case, thanks to the reuse of a good number of standards that were already operational and tested, progress has been significantly faster than is usual in these standardization processes.
But we should not lose sight of a small detail: in an SDI facility, connections are point to point, from one piece of equipment to another. Matrices or signal splitters must be used to share the same signal among several destinations, multiplying the number of individual connections: from camera to matrix, from matrix to monitor, from matrix to mixer, and so on.
In an IP installation, however, "everything is connected to everything" through a single connection to a switch. By definition all connections are bidirectional, and each receiver may request and receive only the piece of information it needs out of a larger set. This lets us send, for example, only audio to an audio device, with no need to transfer the whole SDI content or use a de-embedder to separate the signals first. But it requires each device to be able to "introduce and recognize" itself to a global device organizer, which is in charge of defining what information each device must send or receive at any time: a sort of super master matrix which, logically, works in software.
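At the network level, this "receive only what you need" model typically rests on IP multicast: each essence is published to its own multicast group, and a receiver subscribes to just that group, so the switch forwards it nothing else. The sketch below shows the idea with Python's standard socket API; the group address and port are hypothetical, not taken from any standard.

```python
# Minimal sketch of an ST 2110-style receiver subscribing to one essence.
# Assumption: a hypothetical audio flow published at 239.1.1.10:5004.
import socket
import struct

GROUP, PORT = "239.1.1.10", 5004

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Joining the multicast group is the "request" for this flow and no other:
# the join is signalled to the switch, which then forwards the traffic here.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
try:
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # data, addr = sock.recvfrom(1500)  # each datagram carries one RTP packet
except OSError:
    pass  # no multicast-capable interface on this machine
```

In a real facility the device organizer would tell each receiver which group and port to join; the socket mechanics stay the same.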
All this requires specifications that are perfectly described and governed so everything will work as it should. This is the indispensable basis on which to build.
So let us look more closely at this "set of standards for professional media over managed IP networks", SMPTE ST 2110, examining its various aspects and stages, where it comes from and where it is going, taking as reference the public document SMPTE OV 2110-0:2018, approved on 4 December 2018 and available at https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8626804, which describes the relationship between the various standards in the ST 2110 family and their scope of application.
The first standard, ST 2110-10, System timing and definitions, published on 27 November 2017, specifies the synchronization model and the common transport requirements for all essence flows. An "essence" is any flow carrying audio, video or metadata, or combinations thereof. It also includes requirements for the Precision Time Protocol (PTP), for Real-time Transport Protocol (RTP) timestamps, for the Session Description Protocol (SDP), and for the size limits of UDP datagrams.
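The link between PTP and RTP is the heart of this part: every sender samples the same PTP-disciplined clock and converts it to the RTP clock rate of its essence, modulo the 32-bit RTP timestamp field. The sketch below illustrates that conversion; the clock rates (90 kHz for video, 48 kHz for typical audio) are the conventional ones, while the PTP time value is a made-up example.

```python
# How a PTP-derived time becomes an RTP timestamp, per the ST 2110-10 model.
# tai_seconds stands in for a reading of the PTP-synchronized clock.

RTP_WRAP = 2 ** 32  # the RTP timestamp is a 32-bit field

def rtp_timestamp(tai_seconds: float, clock_rate: int) -> int:
    """32-bit RTP timestamp for a given time since the PTP epoch."""
    return int(round(tai_seconds * clock_rate)) % RTP_WRAP

t = 1_700_000_000.0                   # hypothetical PTP time, in seconds
video_ts = rtp_timestamp(t, 90_000)   # video essence, 90 kHz RTP clock
audio_ts = rtp_timestamp(t, 48_000)   # audio essence, 48 kHz RTP clock
# Both timestamps derive from the same clock, so a receiver can align
# the video and audio flows even though they arrived as separate streams.
```

This shared time base is what lets independently routed essences be reassembled in sync at the destination.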
Next in numerical order, ST 2110-20, Uncompressed active video, also published on 27 November 2017, specifies, by means of the SDP protocol, a method for identifying the image parameters the receiver needs in order to "read" the picture. It supports resolutions up to 32K x 32K; 4:4:4, 4:2:2 and 4:2:0 color sampling; various YCbCr and RGB component combinations; color depths from 8 to 16 bits, including floating point; color spaces such as BT.601-7, BT.2020-2 and ACES, among others; and transfer characteristics such as BT.709, BT.2100-0, ACES and ADX. We could call it an evolution of SMPTE ST 2022-6.
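In practice those image parameters travel in the SDP "fmtp" attribute, which a receiver parses before it can interpret the RTP payload. The parameter names below (sampling, width, height, depth, exactframerate, colorimetry) follow the ST 2110-20 convention, but the specific line is an illustrative example, not captured from a real device.

```python
# Parsing the image parameters a 2110-20 sender advertises via SDP.
# The fmtp line below is a hypothetical example of such an advertisement.

FMTP = ("a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; "
        "depth=10; exactframerate=25; colorimetry=BT709")

def parse_fmtp(line: str) -> dict:
    """Split an SDP fmtp attribute into a parameter dictionary."""
    _, _, params = line.partition(" ")            # drop the "a=fmtp:96" prefix
    pairs = (p.strip() for p in params.split(";"))
    return dict(p.split("=", 1) for p in pairs if "=" in p)

video = parse_fmtp(FMTP)
print(video["width"], video["height"], video["sampling"])
# → 1920 1080 YCbCr-4:2:2
```

With these values in hand, the receiver knows exactly how to reconstruct each frame from the incoming packets.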
ST 2110-21, Traffic shaping and delivery timing for video, also published on 27 November 2017, specifies the timing model and defines it through SDP parameters. By planning the packet read schedule (PRS) and the traffic-shaping models, it identifies the various types of senders and receivers, synchronous or asynchronous, together with their variants, so that transmission requirements can be properly adjusted.
ST 2110-30, PCM digital audio, published on 31 August 2018, makes use of the SDP protocol to identify the information needed to receive and correctly read PCM audio under the AES67 specification. Non-PCM and compressed audio signals are not covered by this part. It does, however, contemplate the identification of channels in multichannel groups, as well as sampling frequencies and packet synchronization.
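Those audio parameters translate directly into packet sizes. The sketch below works through a common AES67-style case: 48 kHz sampling, a 1 ms packet time, 24-bit samples and a stereo channel group. These are typical illustrative figures, not the only ones the specification allows.

```python
# Payload sizing for a linear-PCM audio flow, given the parameters that
# ST 2110-30 / AES67 describe: sample rate, packet time, depth, channels.
# The figures used below are common illustrative defaults.

def pcm_payload_bytes(sample_rate, packet_time_s, bit_depth, channels):
    """RTP payload size in bytes for one linear-PCM packet."""
    samples = round(sample_rate * packet_time_s)   # samples per packet
    return samples * (bit_depth // 8) * channels

payload = pcm_payload_bytes(48_000, 0.001, 24, 2)
print(payload)  # 48 samples * 3 bytes * 2 channels = 288 bytes
```

The same function shows why shorter packet times (e.g. 125 µs) keep latency low at the cost of many more, smaller packets per second.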
ST 2110-31, Seamless transport of AES3 audio, also published on 31 August 2018, specifies the requirements for transporting AES3 signals. These audio signals can encapsulate many different elements; SMPTE ST 337 and ST 338 are used to manage that encapsulation.
ST 2110-40, Transport of metadata, published on 25 April 2018, specifies the requirements for transporting SMPTE ST 291-1 ancillary data as referenced in IETF RFC 8331.
As mentioned, these standards were not drafted from scratch; there was no need to reinvent the wheel. They are based on existing standards, some still in use, which will likely end up being part (in varying degrees of adaptation) of the new 2110 specifications, among them:
- AES67-2018: High-performance streaming audio-over-IP interoperability
- IEEE 1588-2008: Precision Time Protocol (PTP)
- SMPTE-2022-6: Transport of media with high data rates (SDI over IP)
- SMPTE-2022-7: Switching of independent RTP flows
- SMPTE-2022-8: Synchronization of ST-2022-6 in ST-2110-10 environments
- SMPTE-2059-1: Generation of PTP synchronism signals
- SMPTE-2059-2: Operating parameters of SMPTE profiles in IEEE1588-2008
As we have seen, the family forms a very specific set of standards for each type of content, with a high level of detail, which facilitates management, interaction and adaptation to new scenarios. Along these same lines, work is under way on new specifications which, though not yet published or approved, are in development. Worth noting are:
- ST-2110-22, Compressed video with constant bit rates (CBR), with definition of the compression format and the registration of various codecs.
- ST-2110-23, Splitting of large-bandwidth signals in multiple 2110-20 flows, with indication of the manner of splitting a signal into several having a lower bandwidth. For instance, to send 8K-16K flows as a set of lower-resolution flows.
- ST-2110-41, Transport of extended metadata, with definition of the manner of transporting dynamic or extended metadata in ST-2110 context.
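The splitting idea behind ST 2110-23 is easy to see with simple numbers: one very large picture becomes several ordinary 2110-20 flows. The sketch below divides an 8K frame into four 4K quadrants and compares per-flow bandwidth; the quadrant layout and the 10-bit 4:2:2 assumption are illustrative choices, not mandated by the draft.

```python
# Why split a large signal into multiple 2110-20 flows: each sub-flow
# fits comfortably where the full signal would not. Illustrative figures.

def raw_gbps(width, height, fps, bits_per_pixel=20):  # 10-bit 4:2:2 ≈ 20 bpp
    """Active-picture bit rate in Gbit/s for an uncompressed flow."""
    return width * height * fps * bits_per_pixel / 1e9

full = raw_gbps(7680, 4320, 50)   # one monolithic 8K50 flow
quad = raw_gbps(3840, 2160, 50)   # one of four 4K quadrant sub-flows
print(f"8K50 as one flow: {full:.1f} Gbit/s; per quadrant: {quad:.1f} Gbit/s")
```

Four flows of roughly 8 Gbit/s each can ride standard 10 Gbit/s links, whereas the single 33 Gbit/s flow could not.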
On the other hand, SMPTE is not the only organization involved in this development, as in many instances it works jointly with other bodies such as the EBU, the European Broadcasting Union. At present, one of the associations collaborating in the creation and standardization of specifications is AMWA (Advanced Media Workflow Association), which, under the name NMOS (Networked Media Open Specifications), is developing specifications covering various aspects of networked devices, their interconnections and their functionality. It is foreseeable that these guidelines will eventually become part of the 2110 family, keeping their codes. For now, the most significant ones, which give an idea of where the market is heading, are:
- AMWA IS-04, discovery and registration of devices in an IP network
- AMWA IS-05, management of interconnections between devices in an IP network
- AMWA IS-06, network monitoring
- AMWA IS-07, event triggering and tally
- AMWA IS-08, mapping of audio channels
Those interested in exploring them in more depth, including their content and specification status, can find detailed information at: https://amwa-tv.github.io/nmos/
In conclusion, 2110 is not just another initiative. It is the determined effort of the industry and the market to build a solid, future-proof ecosystem for the benefit of all. One need only check that all the big names in the industry are involved and working in the same direction. Joint efforts are already bearing fruit.