In live sports, the quality of experience to the consumer is critical. Subjects such as latency (offset from live) and the use of (or introduction of) immersive audio are hot topics at the moment. Rightfully so – these are things that really matter to worldwide audiences, especially those with huge home cinema rooms.
In live sports scenarios, the moments in the run-up to the start of the event generate a peak traffic load that has a huge impact on the platform and the consumer experience. After all – what good is immersive audio if you can’t get to the game on time? Of the major outages and interruptions that have plagued big brands delivering sports content over the last few years, the failure is rarely in the final element of media delivery. So what should we look out for when architecting platforms for global reach and scale?
The authentication service is one of several that bears the brunt of peak loading issues. Analytics data usually shows that in the final moments before a live event starts there is a sudden surge of users heading to the service to prepare their tablets, TVs or consoles, resulting in a flurry of authentication and/or session token refresh requests. For a typical Subscription Video on Demand (SVOD) or Transactional Video on Demand (TVOD) service that sees a peak of only a few thousand concurrent requests per second, this value can be multiplied significantly with the introduction of live sports. In the final few moments before events begin, some channels can receive 500,000 requests per second (RPS). It’s also worth noting that in federated or Single Sign On (SSO) architectures, tunnelling this request load to downstream partners may prove to be your bottleneck.
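One common way to blunt that pre-event token-refresh spike is to have clients refresh early at a jittered moment rather than all at expiry. A minimal sketch, assuming a client that knows its token lifetime; the function name and the five-minute window are illustrative, not a prescription:

```python
import random

def schedule_refresh(expires_in_s: float, window_s: float = 300.0, rng=random.random):
    """Pick a refresh time uniformly inside a window before token expiry.

    Spreading refreshes across the window stops every client from hitting
    the auth service at the same instant (e.g. just before kick-off).
    """
    # Never jitter beyond the token's actual lifetime.
    window = min(window_s, expires_in_s)
    # Refresh somewhere within the last `window` seconds of the token's life.
    return expires_in_s - rng() * window
```

With a one-hour token, each client refreshes at a random point in the final five minutes, so 500,000 simultaneous refreshes become a flatter stream spread across the window.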
Next on the hit list is the Entitlements service, which evaluates a complex matrix of content availability (geo-blocking) and offer (package) rules in real time as content is requested by the consumer. For example, 220 countries multiplied by 15 content types, day-of-the-week variation and four offer types yields a complicated and time-consuming rule set to parse, and that parsing shows up as significant load time. Watch for latency in areas such as SSL termination and in retrieving entitlement responses from cache layers or database shards.
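The factors in that example multiply rather than add, which is why the rule set gets expensive. A short sketch of the arithmetic, plus one common mitigation: flattening each cell of the matrix into a precomputed cache key so the hot path is a lookup, not a rule parse. The key format here is an illustrative assumption:

```python
# The rule matrix from the worked example: its cells multiply.
countries = 220
content_types = 15
days = 7          # day-of-week variation
offers = 4

combos = countries * content_types * days * offers
# 220 * 15 * 7 * 4 = 92,400 distinct rule cells to evaluate or precompute.

def entitlement_key(country: str, content_type: str, day: int, offer: str) -> str:
    """Flatten one cell of the rule matrix into a cache key, turning a
    real-time rule-set parse into a single cache lookup."""
    return f"{country}:{content_type}:{day}:{offer}"
```

Ninety-odd thousand keys is trivial for a cache layer, which is exactly why precomputation beats parsing the rules per request.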
Don’t fall into the trap of thinking that because your consumers can start the event successfully, you’re home and dry. Unlike Authentication, which often bears only a one-time peak of traffic in the minutes before the event, the Entitlements endpoint is also hit at regular intervals throughout the event if you’re in a hardened environment. If linked to your Location Service and perhaps Concurrency Service, it isn’t abnormal to see the player poll at intervals as short as 30 seconds to ensure users aren’t tunnelling through VPNs or sharing credentials with friends and family. In many environments, a failed entitlement check will result in your consumers being ejected from the live stream. Maintaining 100% uptime of your entitlements endpoints is critical.
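Because a single failed check can eject a paying viewer, some teams soften the client-side logic so that one transient error during a spike doesn’t terminate playback. A minimal sketch of that idea; the class name and the three-strikes threshold are illustrative assumptions, not a standard:

```python
class EntitlementPoller:
    """Track consecutive entitlement-check failures and only stop playback
    after `max_failures` in a row, so a single transient 5xx during a
    traffic spike doesn't kick viewers out of the stream."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, check_ok: bool) -> bool:
        """Record one poll result; return True if playback may continue."""
        if check_ok:
            self.failures = 0  # any success resets the strike count
            return True
        self.failures += 1
        return self.failures < self.max_failures
```

With 30-second polling, a three-strike threshold tolerates roughly a minute of entitlement-service trouble before ejecting anyone, which is a deliberate trade-off between availability and rights enforcement.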
This should be your golden egg – the solution to all of your capacity problems. Auto-Scale groups are wonderful; public and private cloud providers can offer great out of the box capabilities but several big names have been caught out when it comes to cloud.
Rules and policies that monitor signals such as high response latency or instance CPU still take a few seconds to kick in and generate the additional compute capacity needed. Add a few more minutes of bootstrapping, attaching instances to a load balancer and passing health checks, and you’ve just hit an end-to-end duration of around 3-5 minutes. Taking the entitlements example above, you could have just missed the critical start of your live event. If the process you use to add capacity only triggers at 75% utilisation, you’re likely to miss the mark. Pre-warming is a great way to solve this problem: use your event schedule to script or automate the process of adding additional capacity in the hour or so before your live event starts.
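Schedule-driven pre-warming can be as simple as deriving a scale-out time from the published kick-off time instead of waiting for a reactive alarm. A minimal sketch, with the one-hour lead taken from the suggestion above; the function name is illustrative:

```python
from datetime import datetime, timedelta

def prewarm_time(event_start: datetime,
                 lead: timedelta = timedelta(hours=1)) -> datetime:
    """When to begin adding capacity for a scheduled live event.

    Driving scale-out from the schedule avoids the 3-5 minute reactive
    auto-scale lag right when the pre-event request surge arrives.
    """
    return event_start - lead

# A 19:45 kick-off means scale-out begins at 18:45, well before the surge.
kickoff = datetime(2024, 6, 1, 19, 45)
```

In practice this timestamp would feed whatever scheduled-scaling mechanism your cloud or orchestration layer provides; the point is that the trigger is the fixture list, not a CPU graph.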
Location Services are a critical part of the architecture, relied upon heavily for entitlements logic, but the way in which this core service is implemented can make or break you. It has become fairly common knowledge that traditional IPv4 addresses are now in such short supply that Time to Live (TTL) values on assignments have been reduced from days to hours, meaning that a service provider may assign an IP address used in one part of a region on a Monday to an entirely different part of the region on a Tuesday. For the consumer, this often means that the IP address they were assigned at the point of subscribing may now be allocated to a region where playback rights are prevented.
When deploying your location services, it is becoming ever more critical to ensure that customer services teams have the ability to bypass, whitelist or adjust an IP address in real time to grant the consumer access to a live event immediately. Moving to an owned and operated Location Service will bring you untold flexibility, help alleviate the capacity limitations of cloud-based solutions, and no doubt earn you a few extra points on your net promoter score.
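The bypass capability boils down to a lookup where a customer-service override wins over the (possibly stale) geo database. A minimal sketch, assuming dict-based stores as stand-ins for the real geo database and override service; all names here are illustrative:

```python
from typing import Dict, Optional

def resolve_region(ip: str,
                   geo_db: Dict[str, str],
                   overrides: Dict[str, str]) -> Optional[str]:
    """Resolve a viewer's region for entitlements checks.

    A real-time override entered by customer services takes precedence
    over the geo database, so a mis-assigned IP can be corrected
    immediately rather than after the event has finished.
    """
    if ip in overrides:
        return overrides[ip]
    return geo_db.get(ip)
```

Here a viewer whose reassigned IP now geolocates to the wrong country can be unblocked mid-event by adding a single override entry, with no redeploy of the geo data.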
These subsystems only represent a fraction of the components in a typical environment – but they’re often the ones that create the most pain. For those embracing cloud and even perhaps on-premise environments in a build-it-yourself fashion – the challenge is yours. For those that have built platforms around multi-vendor externally hosted SaaS products, the challenge to fortify, scale and harden may require a little more thought.