Codecs. Coding magic.
What happens to our content when it is digitized? What makes some formats better than others? Why does some content achieve much better quality while taking up much less space? Let us see how our digital content works when it is packed and conveyed.
Text: Luis Pavía
On this occasion our laboratory will focus on analysing one of the elements we deal with all the time, yet about which we may not always have enough information to make the most of its possibilities. Just as, in former times, the type of negative film and its processing had a definite impact on the final result of content, today one of the crucial parameters for achieving our purposes with the highest assurance of success is the proper handling of recording formats.
But do not worry. We will not be going back to the days of negative film. Indeed, we will even omit most of the older and SD formats, in our aim to focus on the most current ones. We will have to see what their situation is five years from now. And let us apologize beforehand if some format we may have overlooked is missing.
But do not expect an exhaustive enumeration or comparison of all the formats currently on the market. In the first place, due to physical space constraints. Secondly, because current obsolescence/renewal cycles would render this content outdated in a very short time. Thirdly, because we think it is more illustrative to show the criteria that will enable choosing, in each case, the solution most suitable to the specific need. And last, because the information tables prepared by the various manufacturers do not always provide uniform, comparable data sets.
Let us start, then, by shedding some light on certain aspects that, except among professionals directly involved with them, are not always handled properly. The term “format” is commonly used, though at times it refers to a whole set of features. While it is true that the “codec” is one of those features, a format also comprises other parameters. And given the wide range of combinations available, it is important to keep in mind what we are talking about.
The outermost layer we must first dissect in order to reach our binary data is the physical medium: the memory card or disk on which our files are stored. The physical medium is still a very basic notion, somewhat foreign to our lab, but it is highly illustrative, because it makes it easy to understand that even though a given type of card may fit in a slot, this does not mean the machine will be able to read its content properly. Well-known examples of this are SD cards in their different variants; XQD, P2, SxS… and lately, for the huge volumes generated by our cameras, the various types of hard disks: conventional, SSD, M.2, etc.
The medium contains the file, which is in turn often identified by its “format”. But this term has two different meanings, which gives rise to some initial confusion. For now we will deal only with the first meaning, which refers to the file type, hereinafter exclusively called the “container”, responsible for combining the various contents. It is a binary file identified by its extension. It actually resembles the shoe boxes we keep in our closets at home: we can use identical, adjacent boxes to store very different things: a coin collection, old pictures, wool caps or even… shoes! Additionally, and pay attention to this because it is the key point here: we can store different types of content in the same box at the same time.
And what are these different “things” stored in our digital shoe box? In our particular case, the container holds in the same file at least three or four types of different, but closely related, contents. It must contain the collection of images comprising our video sequence, the various audio tracks associated with those images, and some kind of time code or, at least, syncing between audio and video. Other contents occasionally found are generic metadata relating to the file, with extended information on the content such as the time and date of recording, or the brand, model and serial number of the equipment that generated it. Furthermore, it may also contain various subtitle collections. Widely known containers are the WAV, AC3, AAC, PCM, WMA and MP3 file types in the specific case of audio-only files. Or types such as AVI, MP4, MOV, MXF, M2TS, FLV… for video files.
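To make the shoe-box idea concrete, the sketch below models a container as a simple Python structure. Every name and value here is invented for the example; real containers (MP4, MOV, MXF…) of course store these streams as interleaved binary data, not as a readable structure.

```python
# Toy model of a media container: one box, several kinds of content.
# All field names and values are illustrative, not any real file format.

container = {
    "extension": ".mp4",                       # identifies the box, not its contents
    "video": {"codec": "H.264", "frames": 250},
    "audio_tracks": [
        {"codec": "AAC", "language": "en"},
        {"codec": "AAC", "language": "es"},
    ],
    "timecode": "00:00:00:00",                 # keeps audio and video in sync
    "metadata": {"camera": "unknown", "recorded": "2018-05-01T10:00:00"},
    "subtitles": [{"language": "en"}],
}

# The same type of box could hold different contents: MPEG-2 video with
# MP3 audio would fit without changing the extension at all.
container["video"]["codec"] = "MPEG2"
container["audio_tracks"][0]["codec"] = "MP3"
```

The point of the model is precisely that the outer key, the extension, never changes while the streams inside it do.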
A first piece of information to bear in mind: the container does not have an impact on video quality. As an initial example, let us note that within a .MOV container we could find MPEG-2 video with MP3 audio, whereas a different .MOV container could hold a video fragment in H.264 with its associated AAC audio. Although not all containers can hold all codecs, the relationship between container and quality depends only on the type of codec it holds, and on the codec chosen in each particular instance.
Going a bit further in our analysis, a “codec” is not even the video file. A codec is nothing more, and nothing less!, than an algorithm: the mathematical procedure used to compress the binary data that make up the sequence at hand, images on the one hand and sound on the other, in order to store them in digital media. This is what enables us to rebuild our content, our recordings, provided the destination equipment understands the codec in question. It is the method used for packing, storing and distributing our digital content, and it must be supported by the player or destination device that will attempt to process or display that content.
Indeed, the procedure must be bidirectional, hence its name: “codec” is an acronym for coder-decoder. Normally it is a piece of software placed on our computers when installing certain applications, or included by manufacturers, for instance, in their cameras. Codecs may also be implemented in hardware, which is actually nothing but chipsets with the relevant software built in, dedicated exclusively to this process.
As for audio, especially in the very beginning, a one-to-one relationship between containers and codecs was usual, which later gave rise to the mistaken belief, as video gained widespread presence in digital media, that this association still holds true today.
But does the existence of so many codecs make sense at present? There are several reasons for it, all falling into two categories: those relating to technology and development on the one hand, and commercial reasons on the other. And all of them pursue the same goal: offering the highest quality in the smallest data size. That is, by changing the compression technique according to the available technology, different results are obtained. And here is where the divisions arising from the different purposes of each system appear.
Compression will not be the same when the aim is distributing ready-to-use content over a network with limited bandwidth as when the purpose is reading that same content directly from media with minimal speed constraints. Both will differ even more when the need arises to keep as much information as possible for processing (editing, colour grading…) the recording.
Every manufacturer and developer has gradually tried to cover different needs, and niches of particular interest have thus arisen. And the evolution of technological capabilities in processing, storage and transmission through different media, coupled with the varied needs of the many players in the audiovisual world, has resulted in a wide and complex scenario.
At this point, and before identifying the purposes for which each codec may be most suitable, let us roughly sketch the meaning of certain terms frequently used to tell codecs apart. And above all, let us identify the key aspects we must take into account when choosing our own codec.
Algorithm: a method based on mathematical functions whose interest lies in the ability to regenerate a sequence of binary data from a smaller set of data. Although several algorithms are in use nowadays, in former times DCT, an acronym for Discrete Cosine Transform, was much in fashion and was the mathematical function most frequently used in compression techniques for all kinds of binary data, even before the widespread growth of audiovisual data processing.
Spatial compression: This is the compression taking place within each frame individually, separately from the frames coming before or after the one being processed. It decreases the volume of information by grouping areas of pixels that share the same colour. If some tolerance is allowed in this value, higher compression is achieved, but at the expense of a final image with clearly visible faults or artefacts. A most typical one is banding: strips showing colour jumps where a smooth, continuous gradient should be.
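A minimal sketch of the intuition behind spatial compression is run-length encoding: instead of storing every pixel of a scanline, we store each value once, together with the number of consecutive pixels that share it. Real codecs work on frequency-domain blocks (DCT) rather than raw runs; this is only the underlying idea of exploiting repetition within one frame.

```python
def rle_encode(scanline):
    """Group consecutive pixels sharing the same value into [value, count] pairs."""
    runs = []
    for pixel in scanline:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([pixel, 1])  # start a new run
    return runs

def rle_decode(runs):
    """Rebuild the original scanline from the [value, count] pairs."""
    return [value for value, count in runs for _ in range(count)]

# A flat sky area compresses very well; a noisy area would barely shrink.
sky = [200] * 12 + [190] * 4
print(rle_encode(sky))   # [[200, 12], [190, 4]]
assert rle_decode(rle_encode(sky)) == sky
```

Allowing “tolerance”, as the text describes, would mean merging runs whose values are merely close, which is exactly where banding comes from.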
Temporal compression: this is achieved by grouping information from several consecutive images. The volume of information decreases by avoiding the resending of data from areas identified as ‘identical in the sequence of images from some frame A to some frame B’. Again, if we are too loose with tolerance levels, the fault we will get is that some areas of the picture do not flow smoothly, but look sluggish instead.
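The temporal idea can be sketched just as crudely: keep one frame whole and, for each following frame, store only the pixels that changed. Real inter-frame codecs use motion-compensated prediction rather than this naive difference, but the saving comes from the same place: static areas cost almost nothing to resend.

```python
def diff_frame(prev, curr):
    """Store only the positions whose value changed since the previous frame."""
    return {i: v for i, (p, v) in enumerate(zip(prev, curr)) if p != v}

def apply_diff(prev, delta):
    """Rebuild the current frame from the previous one plus the stored changes."""
    frame = list(prev)
    for i, v in delta.items():
        frame[i] = v
    return frame

frame_a = [10, 10, 10, 10, 10, 10]   # a 6-pixel "frame", kept whole
frame_b = [10, 10, 99, 99, 10, 10]   # only two pixels moved

delta = diff_frame(frame_a, frame_b)
print(delta)                          # {2: 99, 3: 99}
assert apply_diff(frame_a, delta) == frame_b
```

Being “too loose with tolerance” here would mean ignoring small differences when building the delta, which is why sluggish, smearing areas appear when the threshold is set too high.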
When both techniques are used at the same time, the result is so-called ‘inter-frame’ (between frames) compression. If only spatial compression is used, we have ‘intra-frame’ (within the frame) compression. Merely knowing this, we can already be certain that if our purpose is conveying content, we will pursue an ‘inter-frame’ mode and adjust compression to optimize the available bandwidth, achieving the best transmission with minimal data. If, however, we intend to edit the content while keeping the maximum amount of original information, we should go for an ‘intra-frame’ compression type.
Sampling: Another important factor to bear in mind. At the start of the digitization process, and before compressing the signal, colour sampling takes place. This involves translating the value of the light falling on each pixel of our sensor into a binary figure. Depending on the amount of information used in this process, we will require a different kind of codec in order to keep it properly. In the case at hand, cameras normally let us choose resolution (HD, UHD…), frame rate (24, 25, 30, 50, 60) and scan type (i or p, interlaced or progressive).
They typically allow us to select colour depth within a limited range depending on the camera type: 8, 10, 12, 14 or even 16 bits of colour per channel. And, in some instances, colour sampling as well. Depending on the amount of information recorded for luminance and chrominance, the values will be the well-known 4:2:0, 4:2:2, 4:4:4 and even 4:4:4:4. In the most advanced cases, even the gamut and colour space can be selected too.
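The impact of chroma subsampling on data volume follows directly from the sampling ratios: luminance (Y) is always sampled at full resolution, while 4:2:2 halves and 4:2:0 quarters the chroma samples. The figures below assume nothing beyond that arithmetic, for an uncompressed 10-bit HD frame.

```python
# Chroma samples per luma sample for common subsampling schemes.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}

def frame_megabytes(width, height, bits_per_sample, subsampling):
    """Uncompressed size of one frame: the Y plane plus two chroma planes."""
    luma = width * height
    chroma = 2 * luma * CHROMA_FRACTION[subsampling]   # Cb and Cr planes
    return (luma + chroma) * bits_per_sample / 8 / 1e6

for s in ("4:4:4", "4:2:2", "4:2:0"):
    print(s, round(frame_megabytes(1920, 1080, 10, s), 2), "MB per frame")
# 4:4:4 -> 7.78, 4:2:2 -> 5.18, 4:2:0 -> 3.89
```

Note that 4:2:0 needs exactly half the data of 4:4:4 before any compression at all, which is why distribution formats favour it.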
But not all cameras have such options, nor can all codecs support them. First major choice: which codecs support the kind of information I wish to save/transfer?
Taking into account the number of pixels per second to read, process, save, transfer, decompress and retrieve in order to enjoy an audiovisual sequence, we begin to get an idea of the enormous processing power required. That is why the technological developments achieved in each of these fields have set the limits of what was possible at any given time, although growth has been exponential, and so, almost, has the arrival of new codecs.
In fact, one parameter used to gauge the possibilities of a codec is the so-called bit rate, expressed in Mbps. The higher the value, the more information is kept, although a higher bandwidth is required for transfer, along with more storage space; and in spite of all this, higher image quality is not always achieved. That is where an efficient codec comes into play.
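To see why compression is unavoidable, compare the raw data rate of a modest HD stream with a typical recording bit rate. The 50 Mbps target below is an arbitrary round figure chosen for the example, not any particular codec's specification.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_sample, samples_per_pixel):
    """Uncompressed bit rate in Mbps.

    samples_per_pixel: 3 for 4:4:4, 2 for 4:2:2, 1.5 for 4:2:0.
    """
    return width * height * fps * bits_per_sample * samples_per_pixel / 1e6

raw = raw_bitrate_mbps(1920, 1080, 25, 10, 2)   # 1080p25, 10-bit, 4:2:2
target = 50                                      # hypothetical recording bit rate, Mbps

print(round(raw), "Mbps uncompressed")                       # 1037 Mbps
print(round(raw / target, 1), ": 1 compression to fit 50 Mbps")  # 20.7 : 1
```

Even plain HD needs a compression ratio around 20:1 to reach an everyday recording bit rate; 4K and 8K multiply the raw figure by 4 and 16 respectively.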
Then, how is it possible that a significantly smaller file can offer better image and sound quality than a much larger one? This is precisely the key question, and where the skill of each codec lies. There is no standard efficiency parameter, and this is not just a matter of mathematics: it also has to do with the perception of our senses.
Because limits are so tight that this is no longer a matter of treating the amount of information to save/retrieve as mere data. Great care is taken over which features of human perception will detect imperfections with more or less accuracy, so as to maintain the highest standards at the most critical levels of perception while being more relaxed in those aspects our senses are less subtle at capturing. And attention should be paid to yet another issue: a codec (a storage method) should not be confused with the curve for processing light and colour (Canon Log for Canon, V-Log for Panasonic, or S-Log for Sony, to mention just a few), which affects the conversion [analogue stimulus -> digital output] prior to compression and packaging. A codec is an algorithm; the curve is the method for processing light in the analogue-to-digital conversion.
Therefore, we find a wide range of codecs. While some of them are excellent for a certain purpose (transfer, for example), they could be really poor for others (colour grading, for instance). Keep in mind that all of them aim at compressing the signal as much as possible in such a way that the content can be retrieved at destination with the highest apparent or required quality. ‘Apparent’ means that we are going to perceive it as better, even at the expense of sacrificing other features. ‘Required’ means the threshold below which the content is not acceptable. And this varies greatly depending on the context or field involved.
As we said, if our aim is broadcasting content that has already been produced, colour and movement accuracy standards will not be as high as when broadcasting a sports event live, where editing is not required but a smooth image flow is critical. And we should go even further if we are recording content that will need subsequent editing and colour grading, in which case we will require lower compression in order to carry out much more precise processing of light, colour and movement later on.
And the scope opens up even further: let us not forget that current transmission scenarios are no longer restricted to ‘traditional’ over-the-air broadcast, as broadcasting through data networks, directly from a server to the client’s display, is becoming increasingly popular: from set-top box configurations to downloads on tablets and cell phones, in both live broadcasting and video-on-demand configurations.
And if this seems quite complex, costs must be considered as well. Yes, the price: the other factor that has directly shaped the development of codecs. Most codecs used on the various platforms involve costs of which we are not directly aware, but which are nonetheless part of the price we pay for the products we purchase: cameras, processors, software, etc. And that is the reason behind a good part of the compatibility issues we find. Developers, most especially the large ones, prefer to have their own product in order to achieve the best possible integration and performance with their own devices.

We are certainly approaching a point where the impression is that complexity prevails over solutions. But this is not the case. The point is to shed light on the vast array of parameters that today and, most especially, in the future will need to be considered when choosing a codec: knowing what its properties are for, and which ones matter most for our purpose. Because we are at a turning point right now, owing to the high resolutions being achieved, such as 4K and 8K, greater colour depth, and the huge dynamic ranges now being attained. All these elements will be crucial in current and future developments.
Will we be able to use just any codec at all times? No: we will be subject to the limits set by the capturing device itself. Should we always choose ‘the most powerful’ one? Obviously not. Should we use the one achieving the highest compression rate? Of course not either. Then, what questions should I ask myself? How do I determine which codec is the ideal one? It is actually quite easy, and we think a single question points us towards the best solution, starting from the end: what am I generating THIS content for? In many instances, several versions of the same content will be needed to serve different purposes. And the whole workflow must be considered as well.
Because, as is often the case, it is a matter of trade-offs. Using the most powerful codec necessarily involves the capacity to process and store a huge volume of data, with the corresponding minimum requirements in terms of CPU, GPU, storage and ultra-fast controllers. And although this will give us the greatest possible amount of information, which is ideal for shooting a master that must subsequently undergo HDR, WCG, colour grading, compositing, etc., all the equipment used throughout the workflow must then be of the highest level. This will be completely unnecessary in many other instances in which the purpose is broadcasting already-produced content so that it reaches, through various transmission channels, the many screens of different viewers.
Taking all these assumptions into account, let us review some codecs currently available, and others now under development that will reach us in the short term, in order to shed some light on their needs, requirements, benefits and limitations, so that we will be in a position to choose the most suitable codec based on our needs. Or our budget.
Starting with capture, as far as capabilities are concerned we find at the top a collection of RAW formats across the entire range of manufacturers/developers: normally uncompressed, and each proprietary to its camera manufacturer. Arri, Blackmagic, Bolex, Canon, Panasonic, Red, Sony… Each offers its own (sometimes more than one) for each camera model, depending on the sensor, the purpose and the device’s intended performance.
RAW is a quite particular case, as strictly speaking it is not a codec but a method for conveying all the information captured by the sensor in bulk, raw, so as to enable further processing with maximum flexibility. It is a method used exclusively for capture, either uncompressed or with minimal lossless compression, which requires enormous storage volumes and transfer speeds, as well as the decoding software needed to make it readable by the editing platform. It is only used as a source, never as a destination, in workflows.
And there are cameras that cannot record RAW internally, although they do feature a dedicated video output straight from the sensor, uncompressed or minimally compressed, which is useful for specific recorders from the brand itself or from other manufacturers, such as the well-known case of Atomos.
Along these lines, one of the latest trends has been seeking the best balance between data volume and editing capabilities by developing specific codecs that retain most of the processing possibilities of raw data without generating such huge volumes of information. Thus, for instance, Canon Cinema RAW Light or Sony X-OCN are formats that take up between 2/3 and 1/5 of a full raw’s average size (depending on the variant and internal settings used), while keeping intact the better part of the post-production possibilities offered by an original RAW file.
This immediately translates into very high quality content in files that are much easier to handle throughout the whole workflow. For a large number of productions the result is more than enough, and the cost savings are really meaningful. Although all these codecs are proprietary, their developers provide the software required to facilitate processing on most editing platforms. And the border between pure raw and these formats is increasingly blurred, as they allow tweaking in post-production some parameters (colour space, gamma, chroma sub-sampling, debayering, etc.) that had thus far been exclusive to pure raw formats.
Logically, though, both cameras and external recorders use their own codec collections. At present, among the most popular in high-end devices are those developed for cameras by the manufacturers themselves or by software developers. The following containers/codecs come to mind: AVCHD, DNxHD and DNxHR (Avid), ProRes (Apple), XAVC (Sony), XDCAM, XF-AVC (Canon)… which are unfortunately not compatible with each other.
As we can see, not all high-level codecs come from camera manufacturers: some companies such as Apple or Avid, long involved in audiovisual creation with their equipment and/or software, have also developed codecs that have become de facto standards in the industry. The most popular of these is ProRes in its many variants. With many configurable options and parameters, it used to be available only in the Mac world, but the codec was so warmly welcomed by the market that it was made available for Windows platforms many years ago, and in recent years files containing this codec can even be generated from platforms other than Mac, thanks to plug-ins developed for editing packages such as Adobe Premiere.
Other names we keep coming across, such as the well-known H.264 and H.265 (HEVC), are widely used codecs, and they are an excellent example of the different efficiencies technology has achieved over time. Both are oriented towards the transmission of the signal through different media, and H.265 has been shown to bring a very significant, noticeable improvement in quality for the same data flow.
In this case, they are undoubtedly codecs aimed mainly at transmission. And they are also an excellent example of the mess of names and acronyms, as H.264 is “merely” the identifier of the MPEG-4 Part 10 AVC (Advanced Video Coding) specification released in 2003 and widely used in HD DVDs, Blu-rays, HD TV broadcasting, especially in Europe, and internet sources such as iTunes, YouTube and Vimeo.
Its evolution, H.265, has already seen four versions released successively between 2013 and 2016. It supports resolutions up to 8K and various colour spaces, including Rec. 2020 and Rec. 2100 among others. In general terms, we could say that the 32 Mbps of bandwidth needed to transmit a 4K flow in H.264 is reduced to 15 Mbps to transmit the same flow in H.265. That is, H.265 is roughly twice as efficient as H.264.
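Taking the figures quoted above, the gain is easy to quantify:

```python
# Bandwidth for the same 4K flow, as quoted in the text (Mbps).
h264_mbps, h265_mbps = 32, 15

saving = 1 - h265_mbps / h264_mbps     # fraction of bandwidth no longer needed
efficiency = h264_mbps / h265_mbps     # how many times more efficient H.265 is

print(f"Bandwidth saved: {saving:.0%}")         # Bandwidth saved: 53%
print(f"Efficiency factor: {efficiency:.2f}x")  # Efficiency factor: 2.13x
```

Hence the usual shorthand of calling H.265 “twice as efficient”: the exact ratio on these figures is slightly above 2.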
Of course, there are still many other issues to consider, such as the time needed to encode content for archiving or broadcasting. And indeed, under the same conditions, different codecs require different processing or “rendering” times.
A relevant competitor of H.265 is VP9, a video codec developed by Google as open source and oriented towards the streaming distribution of content on its own platforms. It offers the possibility of viewing 4K content with the same bandwidth as previously required for HD, in addition to allowing adaptation for uses beyond Chrome and YouTube.
And yet a further contender in this area, also open source and free of charge: AV1. This is the latest development by the “Alliance for Open Media”, an open-code development platform that is royalty-free, this being one of its main attractions.
In sum, as we have seen, we are very enthusiastic about the constant arrival of improvements in the wide and ever-growing world of codecs, but we admit that they often make things harder instead of easier. We think that the best option, time and budget permitting, is to perform some tests before embarking on projects of a certain size.
We should test the whole workflow, from capture and viewing through post-processing to the setting and generation of the various end formats required. This is the only way to obtain information that is 100% reliable. And not because manufacturers refuse to provide it: different kinds of images simply yield different results, as some are processed better than others. Then we should run the relevant checks, completing the whole workflow through the various intermediate steps, and thus arrive at the final codecs in the different files required for each purpose.