There are no signs that innovation in video compression is slowing, even after two full decades of digital television, and one of the companies pushing the envelope today is San Francisco-headquartered Bitmovin Inc., whose solution for live and especially VOD encoding looks like a vision of the future. The company provides ‘Encoding as a Service’ – a SaaS approach in which Bitmovin takes responsibility for the complexity and maintenance of software-based compression, whether it is hosted in the cloud or on-premise.
The compression software is architected as a microservice that is then containerised, which puts it firmly into the category of ‘cloud-native’, with encode instances able to run on any public cloud. Notably, this architecture has been precisely replicated so the Bitmovin software can be hosted on-premise without any difference in functionality, giving operators the option to run all or some encoding on their own infrastructure, or to pursue a hybrid model, drawing on public cloud resources to cope with demand peaks or to provide redundancy.
What is most impressive is the way the encoding is performed across these diverse infrastructures. The company uses a combination of video stream ‘chunking’ and mass parallelisation of computing to accelerate the encode process, achieving speeds of up to 100x real-time. The video is split into ABR-compatible chunk lengths and, using sophisticated orchestration software, each of them can be distributed to a different encoding instance, on-premise or in the public cloud.
This means that, in theory, each chunk of video could be encoded using the same compression software/container combination on different CPU resources on different servers, at the same time. Once the encoding is finished, the chunks are reassembled from their various locations to create the finished video stream.
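The chunk, parallel-encode, reassemble flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Bitmovin's implementation: `encode_chunk` is a hypothetical stand-in for the real compression software, and a thread pool stands in for the fleet of distributed encoding instances.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(chunk):
    # Hypothetical stand-in for the real encoder: tag the chunk
    # to show it was processed independently of the others.
    index, frames = chunk
    return (index, f"encoded({frames})")

def parallel_encode(video_frames, chunk_len=4):
    # Split the source into ABR-compatible chunks (e.g. 4-second segments).
    chunks = [(i, video_frames[i:i + chunk_len])
              for i in range(0, len(video_frames), chunk_len)]
    # Each chunk could run on a different instance; the thread pool
    # stands in for that distributed fleet here.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(encode_chunk, chunks))
    # Re-assemble by chunk index, mirroring how the finished stream
    # is rebuilt from results held in various locations.
    results.sort(key=lambda r: r[0])
    return [payload for _, payload in results]
```

The key property the sketch demonstrates is that the encode of each chunk is independent, so the work can be scattered anywhere and stitched back together by index afterwards.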
The use of microservices/containerised software provides the extreme portability needed for such an approach, making it possible to host the encode function on any combination of operating system and hardware and effectively cordon off a section of a server – whether it is in a private datacentre or on AWS, Microsoft Azure or Google Cloud – for an exclusive encoding party.
A key strength of the microservices approach is that you can demand compute resource in small increments, which makes it an efficient way to use the public cloud on a pay-for-usage basis. Bitmovin has found another way to minimise cost – hooking into the real-time auctions that AWS and Google Cloud hold to sell cloud resource that is not currently used. With AWS you bid for what is called a ‘Spot Instance’ and if your per-hour offer exceeds the current Spot price, you get access until someone outbids you. The Google equivalent is called a ‘Preemptible instance’.
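The economics of the bidding mechanics can be illustrated with a simple simulation. This is a deliberately simplified model of the behaviour described above, with illustrative prices rather than real AWS figures: the instance runs in any hour where your bid meets the fluctuating market price, and you are billed the market price, not your bid.

```python
def run_on_spot(bid_per_hour, spot_prices, on_demand_price):
    """Simulate Spot-style pricing: a list of hourly market prices,
    a standing bid, and the on-demand price for comparison.
    All prices here are illustrative, not real AWS figures."""
    # The instance runs only in hours where the bid meets the market price.
    hours_won = [price for price in spot_prices if bid_per_hour >= price]
    spot_cost = sum(hours_won)  # billed at the market price, not the bid
    on_demand_cost = len(hours_won) * on_demand_price
    return len(hours_won), spot_cost, on_demand_cost
```

Running three of four hours at a fraction of the on-demand price is exactly the trade described: interruptible capacity in exchange for a steep discount.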
For tasks that can be interrupted (like a batch VOD encoding job – perhaps converting part of a content owner’s archive into on-demand assets that will be licensed to an SVOD provider) this can reduce the cost of encoding. In fact, Bitmovin uses this auctioned resource for all its VOD encoding today. It has performed successful tests (in development) to harness this innovation for live workflows as well, and plans to roll this out for commercial use this year. The orchestration software must be extremely agile, of course, able to understand job priorities and make fast decisions about whether to seek out these auctioned resources and what to put there.
This is still not the end of the innovation. Bitmovin has introduced machine-learning to the encode process to create a database that correlates historical encode parameter settings to the quality results achieved on differing types of video content. This will help the encoding system to quickly figure out the best settings for video that is ingested in future.
Also, decisions are made on how many different ABR profiles (and which ones) are needed according to the type of video being processed. Instead of routinely creating a standard selection of ABR profiles that the receive device can choose from, this selection is customised. Cartoons, for example, may need only a few adaptive bit-rate profiles, as the perceptible difference between similar bit rate offerings may be minimal. Fewer bit rate/resolution profiles means fewer streams to create and store, saving resource. This is called per-title encoding.
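A per-title decision of this kind can be sketched as a simple rule over a content-complexity score. The 0-1 score and the thresholds below are hypothetical illustrations, not Bitmovin's actual analysis or tuning; the point is only that simpler content earns a smaller ladder.

```python
def per_title_ladder(complexity, full_ladder):
    """Size the ABR ladder to the content. `complexity` is a
    hypothetical 0-1 score from content analysis (cartoons low,
    fast sport high); thresholds are illustrative only."""
    if complexity < 0.3:
        return full_ladder[::2]   # simple content: adjacent rungs look
                                  # alike, so keep every other one
    if complexity < 0.7:
        return full_ladder[:-1]   # moderate content: drop one rung
    return full_ladder            # complex content: full ladder
```

Fewer rungs means fewer streams to create and store, which is where the resource saving comes from.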
Bitmovin also supports smart collaboration between the streaming server and the player on the receive device, whereby the encoder tells the player, on a frame-by-frame basis, whether the player could deliver the same perceptible (to a human) video quality with a lower bit-rate profile.
Typically, a player takes the highest bit-rate profile it needs to fill its screen, and sticks rigidly to it unless network conditions dictate otherwise. Here, the player ‘sees’ ahead of time that a video scene only needs 70% of the bandwidth (a realistically achievable figure quoted by Bitmovin) to deliver sufficient viewing quality, and can drop to a lower bit-rate profile for the duration of that scene.
A stream of metadata is sent by the server to the player, outlining the quality of forthcoming scenes, so the receive device can ‘see ahead’. This metadata is generated during the encode process.
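Given such metadata, the player-side decision reduces to picking the cheapest profile that still meets the bitrate the encoder says a scene needs. The function below is a sketch of that logic under assumed inputs; the real metadata format exchanged between Bitmovin's encoder and player is not documented here.

```python
def pick_scene_profile(required_kbps, ladder_kbps):
    """Choose the lowest-bitrate ABR profile that still meets the
    bitrate the encoder's metadata says a scene needs for full
    perceptual quality. `ladder_kbps` must be sorted ascending.
    The metadata value itself is a hypothetical input."""
    for rate in ladder_kbps:
        if rate >= required_kbps:
            return rate
    return ladder_kbps[-1]  # scene needs more than the top rung offers
```

So a scene the encoder flags as needing only 2,500kbps lets the player step down from a 6,000kbps top rung to a 3,500kbps one for that scene, then step back up.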
Bitmovin is also helping video providers to adopt a multi-codec approach to distribution, which means delivering ABR streams in the best-performing codec that any given device will support. So, if your old tablet only supports H.264/AVC, that is what you receive, but if your device can decode the new (and very efficient) AV1 codec, it can request the AV1 version of the stream instead. Bitmovin estimates a population-wide bandwidth saving of 50% for a content owner that adopts this approach.
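The multi-codec selection described above is, at its core, a preference-ordered lookup against what a device can decode. The ordering below follows the article's framing (AV1 most efficient, H.264/AVC the fallback); HEVC is included as an assumed middle option, and the function names are illustrative, not any real player API.

```python
# Efficiency ordering assumed from the article: AV1 first,
# H.264/AVC as the lowest common denominator.
CODEC_PREFERENCE = ["av1", "hevc", "h264"]

def best_codec(device_supported):
    """Return the most efficient codec the device can decode,
    falling back to H.264/AVC if nothing better is supported."""
    for codec in CODEC_PREFERENCE:
        if codec in device_supported:
            return codec
    return "h264"
```

An old tablet advertising only `{"h264"}` gets the AVC rendition, while a device advertising AV1 support requests the AV1 version of the same stream.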
All of these techniques are designed to confront the challenges facing the modern content owner and distributor. They need to ship more content to more end-points while the number of codecs and resolutions keeps expanding, which means more ABR (adaptive bit-rate streaming) renditions and greater demand for encoding. Back-catalogues and even deep-archives of content can now be monetised in generalist or thematic SVOD services, but the original material must be encoded for the IP and multiscreen world.
Live content must be turned into VOD content quickly to meet consumer demand for catch-up TV – and the sooner it goes into the catch-up window the faster it can be monetised, whether the content is supported by advertising or subscriptions. News organisations are under pressure to get video ‘on-air’ faster than ever, competing against citizen journalists with their smartphones (and YouTube) as much as with other news channels.
Video keeps getting ‘bigger’, too. More streaming content is displayed on large television screens and 4K is becoming an early-adopter expectation, which means streaming is no longer the primarily low bit-rate environment it once was.
Where content is going over lower bandwidth networks (which means a lot of DSL fixed broadband connections as well as mobile networks), the need for compression efficiency is amplified. Good quality video needs to reach the extremes of a telco copper loop and avoid buffering in a congested 4G cell.
Faced with demands for more encoding, higher performance encoding, faster encoding and flexible encoding (that can adapt to every new end-point that appears) the video industry responded with software-based compression that could be performed on a non-dedicated compute appliance or virtualised on a server in a datacentre.
Virtualisation led us to the cloud, where the compression application can be performed on servers owned by third-parties like Amazon Web Services. Sometimes the software applications were simply ported from appliances to the cloud but increasingly they are being re-written into a microservices architecture that is inherently better suited for a cloud environment, partly because it enables more granular scaling of cloud resource usage upwards or downwards and therefore, more flexible costing.
A content owner/distributor can license encoding software and run it on its own servers or in the cloud. There is also the option to use a managed encoding service where a vendor/supplier effectively gives you the on-demand access you need into their encoding capabilities, which, in the case of Bitmovin, uses the vendor’s own software and orchestration in public or private clouds, including on premise.
Customers can pay on a usage basis. Bitmovin charges per-minute for VOD encoding. You do not pay for any resources that are idle. This, of course, moves encoding (or at least those parts of the encode process that shift to this model) into a purely OpEx cost, with no CapEx requirements for new encoding kit (unless you are buying new servers for your own datacentre).
A microservice can be packaged to run inside a container. As Bitmovin explains, a container is a lightweight, standalone executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries and settings. Containers also isolate the software from other parts of the operating system and from other services running on the same machine (including within other containers).
Bitmovin makes use of the popular Docker software containerisation platform to implement its chunk-based approach to video encoding. This means the application process is the same on any machine, regardless of the host environment it is running in. That machine could therefore be owned by the media company and sited ‘on-premise’ in their own datacentre.
“Using containerized encoding instances with on-premise commodity hardware provides very compelling cost savings when compared to traditional dedicated broadcast encoding hardware,” Bitmovin claims. The company is contrasting its own approach here with the traditional model whereby a media company runs stacks of pizza-box-sized appliances with onboard hardware chipsets that are completely dedicated to running the encode software (and cannot be used for anything else).
On-premise virtualisation begins the process of making more resource (compute, storage or networking) available for an application when it is needed, and allowing the same resource to be used for something else the rest of the time. The promise of the cloud (or a hybrid model that combines on-premise and cloud) is that you can keep using more and more resources – as much as you need or can afford.
The parallelisation of encode computing is a great example of this and is the reason Bitmovin can achieve what it says is 100x real-time speeds without any sacrifice in video quality. This means a recorded show that is 30 minutes long can be encoded within 18 seconds.
Segments of video – perhaps four seconds long, as an example – are spun off to different places (on different encoding ‘instances’) to be compressed almost at the same time. At NAB 2017, Bitmovin demonstrated this by encoding a 1080p HD video stream in AV1 at 1.5Mbps using 200 separate CPU instances. This was for a live encode (therefore encoded in real-time, rather than faster than real-time).
There is a trade-off in terms of latency the more widely you spread the workload, Michael Armstrong, Vice President Sales EMEA at Bitmovin Inc, told Videonet earlier this year, because of the need to interlink cloud activities and re-assemble the final stream. As the NAB demonstration showed, however, the latency is not high enough to prevent live encoding.
Armstrong also noted the value of AWS Spot Instances and the Google Cloud equivalent, called ‘Preemptible Virtual Machines’, though not for live encoding. “Rather than reserve resources for a fixed amount of time you can bid for resources via a live online bidding system. The price of resources fluctuates dynamically. You can place a bid and when the price reaches that point [i.e. you have become the highest bidder] you get the resource for a certain period of time.” Bitmovin has been working to make use of Spot Instances and Preemptible VMs.
AWS provides the official explanation of this auction process. “A Spot Instance is an unused EC2 instance that is available for less than the On-Demand price. Because Spot Instances enable you to request unused EC2 instances at steep discounts, you can lower your Amazon EC2 costs significantly.
“The hourly price for a Spot Instance is called a Spot price. The Spot price of each instance type in each Availability Zone is set by Amazon EC2, and adjusted gradually based on the long-term supply of, and demand for, Spot Instances. Your Spot Instance runs whenever capacity is available and the maximum price per hour for your request exceeds the Spot price.
“Spot Instances are a cost-effective choice if you can be flexible about when your applications run and if your applications can be interrupted. For example, Spot Instances are well-suited for data analysis, batch jobs, background processing and optional tasks.”
Google Cloud says of its own version: “A preemptible VM is an instance that you can create and run at a much lower price than normal instances. However, Compute Engine might terminate (pre-empt) these instances if it requires access to those resources for other tasks.
“If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances, and without requiring you to pay full price for additional normal instances.
“Preemptible instances are excess Compute Engine capacity so their availability varies with usage. If your applications are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances.”
Armstrong points out that a media company (or rather, Bitmovin, as part of its encoding SaaS) can ensure it has sufficient reserved resource to cover mission-critical tasks and then make use of Spot Instances for bonus processing (such as more, or faster, encoding). “Because we are chunking [the video] and using small amounts of encode time for each chunk, we can dynamically start and stop Spot Instances,” he explains.
A Bitmovin blog says, “Amazon has turned this [unused cloud resource] into a market-driven system”, and that Spot Instances are a great way to save money, but suggests this is something best reserved for encode processes using containerisation.
“The flipside of the lower price is that reliability of your instance cannot be guaranteed. A higher bidder can take an instance away from you without notice. In a linear workflow this can lead to a complete loss of your encoding, forcing you to restart a job again from the beginning. But in a containerised workflow, losing one instance is almost insignificant. The interrupted segment is simply moved to another instance and the entire job continues without noticeable disruption.
“Taking this a step further, it is even possible to mix and match instance types depending on how the spot market develops. The encoding coordinator, which is the instance that controls each encoding job and assigns segments to instances, can be placed on a higher cost instance to ensure that it remains stable. The worker instances, which is where the encodings actually happen, are interchangeable, so can be placed on lower cost, higher risk instances.”
Bitmovin has its own software to handle bidding, as it does for all orchestration. Its ‘scheduler’, for example, which covers its wider use of on-premise and cloud virtualisation, controls how many nodes are used to start each encoding job and which video segments are sent to which nodes.
The ‘scheduler’ monitors each node so it knows if a segment is complete, or if a node has crashed and the segment needs to be resent to a different node. The orchestration software ensures encoding jobs are prioritised between instances; jobs can be queued or stopped, and resources shifted.
As Bitmovin explains: “Encoding jobs can be sped up temporarily. For example, if there is live content, which has to be processed practically in real-time, other encoding jobs can be stopped to free up resources for the job with a higher priority. Resources can be freed up as soon as a time-critical encoding procedure has been finished. You can configure your orchestration system along with your specific cloud environment to allow for a balance between cost efficiency and accelerated encoding.”
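The scheduler behaviour described above – per-job queues, priority pre-emption and re-queuing segments from crashed nodes – can be sketched as a toy class. The names and structure are illustrative only, not Bitmovin's actual API.

```python
class Scheduler:
    """Toy version of the orchestration described above: segments
    are queued per job, higher-priority jobs (e.g. near-real-time
    live content) are assigned first, and a segment on a crashed
    node is simply re-queued. Illustrative, not Bitmovin's API."""

    def __init__(self):
        self.queue = []      # pending (priority, job_id, segment) tuples
        self.in_flight = {}  # node_id -> (priority, job_id, segment)

    def submit(self, job_id, segments, priority=0):
        for seg in segments:
            self.queue.append((priority, job_id, seg))
        self.queue.sort(key=lambda task: -task[0])  # highest priority first

    def assign(self, node_id):
        # Hand the next pending segment to a free node, if any remain.
        if not self.queue:
            return None
        task = self.queue.pop(0)
        self.in_flight[node_id] = task
        return task

    def node_crashed(self, node_id):
        # Losing one instance is almost insignificant: the interrupted
        # segment goes back to the front of the queue for reassignment.
        task = self.in_flight.pop(node_id, None)
        if task is not None:
            self.queue.insert(0, task)
```

A live job submitted with a higher priority jumps ahead of queued VOD segments, and a crash on any worker simply re-queues that worker's segment, matching the resilience argument made for containerised workflows on Spot Instances.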
The Bitmovin system also determines where exactly each chunk of video needs to be sent for the best results, adding a whole layer of sophistication to the orchestration. The chunking of the video occurs before the serious video analysis and transformation takes place.
The company’s compression software uses a three-pass approach in which the first pass [of the video content through the encoder] is a superficial scan to judge the character of the content. Then the video is split into chunks and, in parallel, the chunks are sent to different places for the second pass, during which a more detailed analysis of each video chunk is performed to determine the level of encode complexity, properly attribute the resources needed for the actual encode, and set the detailed encode parameters for the third pass.
Chunks may yet be sent somewhere else, subject to resource needs, for the third and final pass, which is the actual encoding. This third pass also makes use of information from the first pass to optimise the process. All chunks are then re-assembled. The use of Spot Instances and Preemptible instances (which can be removed at any time), adds even more dynamism to this approach.
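The three-pass flow can be made concrete with a schematic sketch. The analysis metrics below (an average-motion score for the whole asset, a per-chunk complexity value) are hypothetical placeholders for the real content analysis; the point is the structure – one whole-asset scan, per-chunk analysis in parallel, then the actual encode informed by both.

```python
def three_pass_encode(video, chunk_len=4):
    """Schematic of the three-pass flow described above; the
    analysis metrics are hypothetical placeholders."""
    # Pass 1: superficial whole-asset scan to judge content character.
    character = {"avg_motion": sum(video) / len(video)}  # toy metric

    # Split into chunks; pass 2 analyses each chunk (in parallel, in
    # reality) to set detailed per-chunk encode parameters.
    chunks = [video[i:i + chunk_len] for i in range(0, len(video), chunk_len)]
    params = [{"complexity": max(c)} for c in chunks]    # toy per-chunk metric

    # Pass 3: the actual encode, using information from both earlier
    # passes; chunks are then re-assembled in order.
    return [f"enc(motion={character['avg_motion']},cx={p['complexity']})"
            for c, p in zip(chunks, params)]
```

Because each chunk carries its own parameters, the third pass can run wherever resource is available at that moment, including on interruptible Spot or Preemptible capacity.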
Bitmovin has a growing list of customers including iflix, the SVOD service focused on emerging markets, which is using Bitmovin containerised software in its own infrastructure to deliver H.264 video to subscribers. FuboTV, the online sports TV service that has over 65 channels, is using Bitmovin for cloud-based encoding (and for its cross-platform video player). Red Bull Media House uses Bitmovin for live streaming and VOD across all platforms. Other customers include ProSiebenSat.1 Media, Bouygues, RTL and Telekom Slovenije.
Bitmovin’s use of cloud-native software architectures (microservices and containerisation), the use of cloud, premise and hybrid options, and the chunking and mass parallelisation of processing show how far video compression has evolved (at the cutting edge) in just a few years. The days of pizza-box encoding racks are not over, but that model definitely has a legacy feel to it when you see what some of the sharpest innovators in the market are doing.