AI, machine learning, and the “Make Movie” button

In recent years, and over the last decade in particular, the use of what is commonly called artificial intelligence or, more precisely, advanced analysis algorithms, including machine learning, has become widespread, moving from the consumer market into the professional environment. These techniques have grown sophisticated enough to handle applications such as focusing, or even recognizing and replacing objects in the scene.
By Yeray Alfageme, Business Development Manager at Optiva Media an EPAM company
What is AI?
Defining “artificial” objectively is always controversial, and even defining intelligence is a daunting task, but let us try. An artificial intelligence algorithm is basically a program that performs operations traditionally considered the domain of human intelligence. So, what is human intelligence? According to Oxford Languages, intelligence is the capacity of the mind that allows us to learn, understand, reason, make decisions, and form a certain idea of reality. This does little to help our goal, so let us give an example.
When deciding where to place the focus point within a scene, for instance, we intuitively analyze the scene and its components, determine their role within the story being told, and then place and adjust the focus point accordingly. This process, intuitive when it comes to focusing yet complex to explain, can be performed by an algorithm that carries out the same analysis and decides to adjust the optics to obtain the desired focus point. This is artificial intelligence: it analyzes, interprets, and decides to perform an operation based on a given situation. And it has numerous applications in our industry.
We decide, not the machine
The focusing example arrived in cameras long ago and has even reached mobile phones with enough processing power to run it. This should not be confused with auto-focus; they are not the same thing. Auto-focus does not analyze the scene or proactively decide to change the focus point: it simply follows preset parameters and takes nothing else into account, which is why it is so disliked by any self-respecting professional. We decide what to do with our shot, not the machine.
However, when we apply more advanced techniques to these automatisms, techniques that help us achieve a better composition and, ultimately, higher-quality content, the reluctance fades, especially once we see the result. Another application of this kind arises in post-production and editing: the substitution or inclusion of objects in a scene. This is a genuinely tedious task, because it requires frame-by-frame image processing, and it is often far from easy.
This task, so arduous for any editor or post-producer who takes pride in the craft, is not so hard for an algorithm, which can recognize a certain silhouette, shape, or pattern of an object, then follow it and replace it frame by frame throughout the scene. What would take us hours takes an algorithm a matter of seconds. And this has implications not only for the post-production process, but also for the shoot itself.
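The frame-by-frame follow-and-replace idea can be illustrated with a deliberately tiny sketch. The toy below tracks a bright 2x2 “object” across grayscale frames (represented as lists of pixel values) using sum-of-absolute-differences template matching, then overwrites it; all names and data here are illustrative, and a real pipeline would use robust trackers and proper compositing rather than anything this naive.

```python
# Toy sketch: track a small grayscale "object" across frames by
# sum-of-absolute-differences (SAD) template matching, then replace it.
# Purely illustrative; not how production tools are implemented.

def find_patch(frame, patch):
    """Return (row, col) of the best SAD match for patch inside frame."""
    fh, fw = len(frame), len(frame[0])
    ph, pw = len(patch), len(patch[0])
    best, best_pos = None, (0, 0)
    for r in range(fh - ph + 1):
        for c in range(fw - pw + 1):
            cost = sum(abs(frame[r + i][c + j] - patch[i][j])
                       for i in range(ph) for j in range(pw))
            if best is None or cost < best:
                best, best_pos = cost, (r, c)
    return best_pos

def replace_patch(frame, patch, replacement):
    """Overwrite the best-matching region of frame with replacement."""
    r, c = find_patch(frame, patch)
    for i in range(len(patch)):
        for j in range(len(patch[0])):
            frame[r + i][c + j] = replacement[i][j]
    return frame

# The object (a bright 2x2 block) drifts one pixel between frames;
# the same two calls handle every frame in the sequence.
frames = [
    [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 0, 0]],
]
patch = [[9, 9], [9, 9]]
blank = [[0, 0], [0, 0]]
for f in frames:
    replace_patch(f, patch, blank)
print(all(v == 0 for f in frames for row in f for v in row))  # True
```

The point of the sketch is the loop at the end: once the recognition step exists, erasing the object from an entire sequence is just the same call repeated, which is exactly why the machine finishes in seconds what would take a human hours.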
Without it, we have to wait for a clear day with nice clouds, travel to the location with all our equipment, and do the shoot there. The flexibility gained and the obvious cost savings throughout the process are significant. That said, this should not lead us to shoot in any circumstances whatsoever, clinging to the dangerous phrase “we will easily sort this out in post,” because that would be a bad mistake.
But the possibility of improving the scene we have shot, by removing a plane cruising through a 15th-century sky or the watch on the wrist of an actor playing a historical character, is real, and the quality obtained is by no means amateurish. In other words, let us not use the new possibilities AI offers only to cut costs, which would be crass, but to improve our content.
Beyond image touch-up
Another major application of AI is reformatting. By this I mean not only scaling an image, but completely changing the image’s format. We no longer just have an algorithm that “inflates” the image to add definition by interpolating pixels, but one that is able to interpret the sequence and add what is missing while preserving the spirit of the original. This opens up possibilities such as converting content from 4:3 to 16:9 without the much-hated pillarboxing.
Colorizing old footage is another field AI has entered, and one where it does a great job, in this case in close cooperation with manual work. A frame is colored by hand and, from there, the algorithm extrapolates the colors across the rest of the sequence; always, I insist, respecting the original version. If this process were not automated, it would take hours or days of work, as in the old days, to color just a few seconds or minutes of material. Again, this is not just about costs, but about achieving a better result in less time.
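The keyframe-then-extrapolate workflow can be sketched in a few lines. The toy below builds a palette from one hand-colored keyframe (mapping each gray level to the color an artist chose for it) and colors the next frame by nearest luminance match; this is an assumption-laden simplification, since real colorization models learn much richer spatial and temporal cues.

```python
# Toy sketch: propagate colors from one hand-colored keyframe to the
# rest of a grayscale sequence by matching luminance values.
# Illustrative only; real colorization is far more sophisticated.

def build_palette(gray_key, color_key):
    """Map each gray level in the keyframe to the color chosen for it."""
    palette = {}
    for g_row, c_row in zip(gray_key, color_key):
        for g, c in zip(g_row, c_row):
            palette[g] = c
    return palette

def colorize(frame, palette):
    """Color a grayscale frame using the nearest known gray level."""
    known = sorted(palette)
    def nearest(g):
        return min(known, key=lambda k: abs(k - g))
    return [[palette[nearest(g)] for g in row] for row in frame]

# Keyframe: sky (gray 200) painted blue, ground (gray 50) painted green.
gray_key  = [[200, 200], [50, 50]]
color_key = [[(90, 120, 255), (90, 120, 255)],
             [(40, 160, 60), (40, 160, 60)]]
palette = build_palette(gray_key, color_key)

next_frame = [[210, 195], [55, 48]]   # slightly different exposure
print(colorize(next_frame, palette)[0][0])  # (90, 120, 255)
```

Note how the slightly different exposure of the next frame is absorbed by the nearest-match lookup: the one hand-colored frame carries the whole sequence, which is exactly the cost saving the article describes.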
Less is more
And finally, I think it is worth highlighting a field that has less to do with content itself than with its analysis: metadata. Any shoot requires at least several terabytes of storage available, just in case, because it is not uncommon to generate a thousand (no mistake, a thousand) times more raw content than will actually be used in the edit. And this is not only a matter of storage, but also of cataloguing, searching, and using that material. There is simply too much of it.
But if we let an algorithm perform the right search within the content, this can be minimized and simplified considerably without the risk of losing the right material. And I am not just thinking about properly cataloguing the material and searching by keywords. No, I am referring to the algorithm’s ability to analyze the scene, view its composition and, thanks to its machine-learning capability, distinguish between a correct shot and an incorrect one, whether technically, in terms of exposure or focus, or in the action itself.
Although this sounds like science fiction, it is technically possible, and if deleting any of the content we recorded gives us the shivers, we can at least get help in choosing the material to use in our edit. It is very likely that the shots the algorithm rates as five-star shots, for example, are the best and most readily usable. Can you imagine how much time this saves in the editing room? Multiple versions of the same content could even be made with minimal effort.
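What a star rating for a take might look like can be sketched with two crude signals: mean exposure and a simple neighbor-contrast measure standing in for focus. Everything here is a made-up illustration; a production system would learn these judgments from labeled takes rather than hand-tuned heuristics.

```python
# Toy sketch: rate takes one to five stars from two simple signals,
# mean exposure and a crude sharpness measure (neighbor contrast).
# Hand-tuned heuristics for illustration, not a real shot-rating model.

def exposure_score(frame):
    """1.0 when mean brightness sits mid-range, 0.0 at the extremes."""
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return 1.0 - abs(mean - 128) / 128

def sharpness_score(frame):
    """Average absolute difference between horizontal neighbors, 0..1."""
    diffs = [abs(row[i + 1] - row[i])
             for row in frame for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) / 255

def star_rating(frame):
    """Blend the two signals into a 1..5 star rating."""
    score = 0.5 * exposure_score(frame) + 0.5 * sharpness_score(frame)
    return max(1, min(5, round(1 + 4 * score)))

well_exposed_sharp = [[0, 255, 0, 255], [255, 0, 255, 0]]
blown_out_flat     = [[250, 252, 251, 250], [252, 250, 251, 252]]
print(star_rating(well_exposed_sharp), star_rating(blown_out_flat))  # 5 1
```

Once every take carries a rating like this in its metadata, “show me the five-star takes” becomes a query rather than an afternoon of scrubbing through rushes.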
Let’s go further beyond
And from this last step to the leap to the magical “Make a movie” button there is no technological chasm, only a mental one. A few years ago I used a simple app, I won’t name the brand for obvious reasons, to which I would hand over the pictures and videos I took with my modest mobile phone. The app could analyze them, pick the best footage, edit it together, and add music synchronized to the beat, all in minutes. My phone would get a little hot, but that was it.
Obviously, this greatly trivializes the notion, and I know my use of the tool was very simple and in a controlled environment: music videos of family holidays. Nothing to write home about. But it is true that, until I discovered it, I used to spend hours editing those videos, and do you know what? In 90% of cases I never tweaked the automatic edit the algorithm produced, and my viewers, family and friends, loved it.
In such a controlled environment, having the content edited quickly really does add value: all the material gets a nice, polished finish, even if not an optimal one. You may not get the best possible result; in many cases I would not achieve it either, since I am far from being a video editor.
At present, I do not think the same concept is so far away in our professional field. Perhaps not in every environment; certainly not in those compositions where emotion, narrative, or the style of editing departs from the established path, for that is where the truly exceptional is created. But in 90% of cases I see it as feasible and even appropriate.
Why should someone spend hours and hours in a monotonous, repetitive environment, performing the same task over and over on similar content, a cut, a news piece, or the clip in question, according to set parameters? Why can’t our algorithmic friend do this, while that person takes on more human tasks: thinking about the next story, format, or emotion to convey to the viewer? That is the added value of AI, not just cost savings.
Conclusion
AI, or machine learning, should not be seen as a threat to our work or our industry; thinking so would be terribly wrong. It is already here, and its applications will only grow broader and deeper. Its progress is unstoppable, so whoever tries to stop it will be swept aside. There will always be someone who uses it only to cut costs. That is like using a Swiss Army knife just to cut bread, almost an insult to the tool, but it is inevitable.
If we make use of this kind of progress as it deserves, entrusting machines with mechanical tasks and allowing people to perform human tasks again, we will realize that its potential goes far beyond cost savings, and we will all benefit from it.
Only a few examples have been given here, some of them perhaps just happy ideas, of where we are heading and what we can achieve, but I have little doubt that much of it is already, or soon will be, on the desk, or at least on the roadmap, of many manufacturers and innovators in our market.
Change is a risk, but embracing that change rather than rejecting it will make everything easier.