How to get personal with millions by scaling manifest manipulation

Posted by Trevor Hunsaker

Personalized viewing experiences at a 1:1 level are transforming the TV experience. Instead of one-size-fits-all programming, viewers get targeted and highly relevant advertising, tailored content, recommendations for new programs, and precise DRM/blackout management, all based on each viewer’s device type, location, history, demographics, and other data.

But scaling personalized video streams to millions of viewers, especially for live programming such as sports, is nearly as challenging a feat as hitting for the cycle in baseball. Viewer volumes can swing wildly, surging by hundreds of thousands at must-watch moments such as kickoff, overtime, and the closing minutes of a tight match. If your infrastructure for supporting personalization isn’t adaptable and scalable enough, your personalized experience will be game over, and in the world of OTT that could put your entire business at risk.

The manifest server’s central role in personalization

OTT personalization hinges on the performance of the manifest server to generate a unique playlist of content, ads, and playback instructions. The manifest server has to contend with the following dependencies:

Personalizing streams in real time

As covered in part 1 of this blog series, a live or VOD feed is ingested, encoded and packaged by our Slicer software application. Ad boundaries may be inserted to enable content owners to customize the viewer’s experience. As ads are ingested, they are also processed through a Slicer running in the cloud for a more broadcast-like playback experience.

When the Slicer starts ingesting the live or VOD stream, it continually communicates with our backend infrastructure, keeping the database updated about how many segments are available. Manifest servers use that data to personalize the experience for each viewer. When a player requests a stream, the manifest server determines which video and audio segments should be listed in the manifest file, which acts as a playlist. The ability to change or customize the manifest dynamically, at a per-user level, is what makes it possible to tailor the viewing experience. In the case of live video, a new manifest is requested every few seconds, allowing adjustments to be applied dynamically by manifest servers as viewing conditions change.
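To make the playlist idea concrete, here is a minimal Python sketch of splicing viewer-specific ad segments into a list of content segments and rendering the result as an HLS media playlist. It assumes HLS as the streaming format, and the `Segment` type, the function names, and the segment URIs are all hypothetical. It illustrates the general technique, not the platform's actual manifest server.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    uri: str
    duration: float  # seconds
    is_ad: bool = False


def personalize(content: List[Segment], ad_break_index: int,
                ads: List[Segment]) -> List[Segment]:
    """Splice viewer-specific ad segments into the content at an ad boundary."""
    return content[:ad_break_index] + ads + content[ad_break_index:]


def build_media_playlist(segments: List[Segment], media_sequence: int = 0) -> str:
    """Render a list of segments as a minimal HLS media playlist."""
    if not segments:
        raise ValueError("at least one segment is required")
    # Rounding up keeps every segment duration at or below the target duration.
    target = max(math.ceil(s.duration) for s in segments)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    previous_is_ad = segments[0].is_ad
    for seg in segments:
        # Flag each switch between content and ad so the player can
        # handle the change in encoding or timestamps.
        if seg.is_ad != previous_is_ad:
            lines.append("#EXT-X-DISCONTINUITY")
        lines.append(f"#EXTINF:{seg.duration:.3f},")
        lines.append(seg.uri)
        previous_is_ad = seg.is_ad
    return "\n".join(lines) + "\n"
```

Two viewers requesting the same stream would share the same `content` list but receive different `ads`, so each gets a different playlist from the same underlying segments.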

[Figure: The manifest server’s central role in personalizing video streams]

As shown above, at the core of manifest personalization is communication, and with most OTT business requirements that means communications with ad servers to provide targeted, personalized ads in real time. Individual data, including a viewer’s IP address, location, and device type — essentially all the information we can capture while still adhering to strict PII rules and regulations — are provided to the ad decisioning system. The resulting solution is robust enough to learn what’s relevant to a viewer when delivering ads during live streams. The system is also robust enough to cope with challenges such as managing blackout restrictions and content rights on a per-user basis, while also supporting important personalization capabilities such as content recommendations and other localizations.

Architecting manifest infrastructure to scale

In our video platform, the manifest server is responsible for generating a custom streaming manifest for each viewer. It also needs to be aware of other requirements noted above, such as content restrictions and recommendations. In this model, we send out one integrated stream, meaning there are no “buffer wheel” problems while clients are waiting for ads to load in the middle of the stream.

To build a resilient manifest delivery system, we maintain clusters of manifest generation servers in the cloud, spread across geographic regions around the globe. In the U.S., for example, these servers are organized into five zones across the country. When a request for a new stream comes in from a U.S.-based player, that request is randomly routed to one of the U.S. zones.

The ‘thundering herd’ challenge

Random routing might seem counterintuitive, but it is done to prevent cascading failures. The majority of U.S. viewers are located in the eastern parts of the country. If we were to route them all to the zone closest to them, and our cloud provider experienced a failure in that region, the majority of our viewers would experience that failure. Compounding the issue, if all those viewers then refreshed the stream and we directed them to their next-closest healthy zone, we would experience a “thundering herd” problem, where all the viewers from the failed zone dogpile onto the next-closest zone. The resulting unexpected traffic spike could cause a secondary failure until our systems in the new zone scale up to meet the additional demand.

Instead, randomly distributing our U.S. viewers helps mitigate the effect of any initial failure and allows us to distribute failover traffic evenly across the remaining healthy zones.
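The effect of random routing on failover can be sketched with a small simulation. The Python below is an illustration under simple assumptions: five hypothetical zone names, uniform random selection, and a single zone failure. It is not a description of the production router.

```python
import random
from collections import Counter
from typing import Iterable

# Hypothetical zone names, used only for illustration.
ZONES = ["us-zone-1", "us-zone-2", "us-zone-3", "us-zone-4", "us-zone-5"]


def pick_zone(healthy_zones: Iterable[str], rng: random.Random) -> str:
    """Pick one healthy zone uniformly at random."""
    zones = list(healthy_zones)
    if not zones:
        raise RuntimeError("no healthy zones available")
    return rng.choice(zones)


def simulate_failover(viewers: int, failed_zone: str, seed: int = 0) -> Counter:
    """Assign viewers at random, fail one zone, and re-route its viewers.

    Returns the number of extra viewers each surviving zone receives.
    """
    rng = random.Random(seed)
    assignment = [pick_zone(ZONES, rng) for _ in range(viewers)]
    survivors = [z for z in ZONES if z != failed_zone]
    extra = Counter()
    for zone in assignment:
        if zone == failed_zone:
            extra[pick_zone(survivors, rng)] += 1
    return extra
```

With 100,000 simulated viewers, roughly one fifth sit in the failed zone, and each of the four surviving zones picks up about a quarter of them. No single zone absorbs the whole herd.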

In our streaming platform we distribute manifest server load across zones. This prevents overloading any specific zone during audience surges, especially if viewers are suddenly shifted to an adjacent zone during failover.

Each of our zones has a separate data store dedicated to session data. Once a viewer is routed to a zone, we create a session for that viewer and save it in the zone’s session cluster. The session holds data about the viewer, along with parameters supplied by the customer that describe how to customize the session for that viewer. Because HTTP is stateless, the manifest servers build session-specific URLs and include them in the manifest returned to the player. Subsequent requests from the player are then routed directly to the zone where the viewer’s session was created and stored, instead of being randomly routed to one of the other zones.
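One way to picture session pinning is to encode the zone and session ID in the URLs handed back to the player. The Python sketch below assumes a hypothetical host scheme (`<zone>.manifests.example.com`) and an in-memory dictionary standing in for a zone's session cluster. The real URL layout and data store are not described in this post.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlsplit


@dataclass
class Session:
    session_id: str
    zone: str
    params: Dict[str, str] = field(default_factory=dict)


class ZoneSessionStore:
    """In-memory stand-in for one zone's session cluster."""

    def __init__(self, zone: str) -> None:
        self.zone = zone
        self._sessions: Dict[str, Session] = {}

    def create(self, params: Dict[str, str]) -> Session:
        session = Session(uuid.uuid4().hex, self.zone, dict(params))
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id: str) -> Session:
        return self._sessions[session_id]


def session_url(session: Session, asset_id: str) -> str:
    """Build a manifest URL that carries the zone and the session ID."""
    # Hypothetical host scheme: the zone name is the first DNS label.
    return (f"https://{session.zone}.manifests.example.com/"
            f"{asset_id}/{session.session_id}.m3u8")


def zone_for_request(url: str) -> str:
    """Recover the zone from a follow-up request URL."""
    host = urlsplit(url).hostname or ""
    return host.split(".", 1)[0]
```

Only the first request needs the random routing step. Every later request names its zone in the URL, so it lands where the session data already lives.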

As shown in the three graphs below, different events can have many different requirements depending on the size of the audience and whether the audience is local or geographically dispersed. Take a look at the three examples that illustrate the infrastructure challenges broadcasters face in supporting live events.

In the first scenario, we feature an eating contest (yes, we have live streamed one of these) because it illustrates a small but widely distributed audience. Perhaps the day will come when eating contests become mainstream, but for now they remain a niche event that attracts small numbers of viewers across a wide geography. Manifest server load is spread easily across multiple zones and manifest server clusters. Here the value of personalization becomes obvious: it makes it easy to insert ads that are appropriate for each region and to manage rights and blackouts.

The scenario changes considerably for the Texas state football championships where large numbers of viewers are in the same geography. We handle this in a couple of ways. As discussed above, we’ve found that we can assign viewers to manifest servers located in zones outside the immediate geography without impacting the viewer’s experience. On top of that, we have a system that monitors the viewership levels in each of the zones and is capable of automatically spinning up manifest generation servers as necessary on a per-zone basis.
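Per-zone autoscaling of this kind boils down to comparing current viewership against server capacity. The Python sketch below is one simple way to express it. The capacity per server, the 20% headroom, and the minimum of two servers per zone are assumptions chosen for illustration, not figures from our platform.

```python
from typing import Dict


def servers_needed(viewers: int, viewers_per_server: int,
                   headroom_pct: int = 20, minimum: int = 2) -> int:
    """Servers required for a zone, with headroom and a floor."""
    if viewers_per_server <= 0:
        raise ValueError("viewers_per_server must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = viewers * (100 + headroom_pct)
    denominator = 100 * viewers_per_server
    required = (numerator + denominator - 1) // denominator
    return max(minimum, required)


def scaling_plan(viewers_by_zone: Dict[str, int],
                 running_by_zone: Dict[str, int],
                 viewers_per_server: int) -> Dict[str, int]:
    """Servers to add (positive) or remove (negative) in each zone."""
    plan = {}
    for zone, viewers in viewers_by_zone.items():
        target = servers_needed(viewers, viewers_per_server)
        plan[zone] = target - running_by_zone.get(zone, 0)
    return plan
```

For example, a zone with 50,000 viewers and an assumed capacity of 5,000 viewers per server needs 12 servers once the headroom is included, so a zone currently running 4 would add 8.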

For large events such as the NBA Finals, we may pre-scale based on expected viewership, but we have had multiple events where our autoscaling systems handled nearly one million viewers without requiring any pre-warming. In addition to increasing scalability, the ability to instantly spread load across manifest servers in a zone-agnostic fashion significantly improves reliability and redundancy across the network.

Player requests and ad beaconing

A number of changes and trends across the industry are making cloud scaling more important than ever. A major driver is the shrinking interval between requests from the player. The interval for our standard live-linear “what’s next” player request is dropping from 8 seconds to 5 seconds, and can be as short as 2 seconds for streams where low latency is important. This has a major impact on CPU utilization because the server has to respond to four times as many requests at 2-second intervals as at 8-second intervals. Additionally, blackout restrictions and content recommendations must now be checked more frequently than in the past to avoid errors.
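The arithmetic behind that load increase is simple enough to write down. This short Python sketch assumes every viewer's player requests a fresh manifest once per interval; the function names are ours for illustration.

```python
def requests_per_second(viewers: int, interval_seconds: float) -> float:
    """Manifest requests per second if each player polls once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return viewers / interval_seconds


def load_multiplier(old_interval: float, new_interval: float) -> float:
    """How many times more requests arrive after shortening the interval."""
    return old_interval / new_interval
```

At one million concurrent viewers, 8-second, 5-second, and 2-second intervals work out to 125,000, 200,000, and 500,000 manifest requests per second respectively, and moving from 8 seconds to 2 seconds multiplies the request rate by 4.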

Similarly, the ad-tech world is becoming more complex and demanding. For every ad inserted into a stream, an ad server will have at least five beacons used for reporting purposes. A server-side ad insertion (SSAI) solution must make sure those beacons are sent so that its customers get paid by their advertisers. Five beacons is the minimum; there can be many more. In fact, we have seen cases where a single ad has anywhere from 30 to 100 beacons to report on.
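As a rough illustration of beaconing, the Python sketch below decides which beacons are due as playback progresses through an ad. It assumes the five beacons correspond to the common VAST tracking events (start, first quartile, midpoint, third quartile, complete). That mapping, the function name, and the example URLs are assumptions, and a real SSAI service would estimate playback position from the segments the player has requested.

```python
from typing import Dict, List, Set

# Fraction of the ad's duration at which each event becomes due.
THRESHOLDS = {
    "start": 0.0,
    "firstQuartile": 0.25,
    "midpoint": 0.5,
    "thirdQuartile": 0.75,
    "complete": 1.0,
}


def due_beacons(position: float, duration: float,
                tracking: Dict[str, List[str]],
                already_sent: Set[str]) -> List[str]:
    """Return beacon URLs that are newly due at this playback position.

    `already_sent` is updated in place so each event fires only once.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    progress = min(max(position / duration, 0.0), 1.0)
    urls: List[str] = []
    for event, threshold in THRESHOLDS.items():
        if progress >= threshold and event not in already_sent:
            urls.extend(tracking.get(event, []))
            already_sent.add(event)
    return urls
```

Each event can map to several URLs, which is how a single ad ends up with dozens of beacons once multiple ad networks and measurement vendors are involved.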

Further, complex networks of ad servers are becoming more common in our customers’ campaigns. Multiple ad network hops can start to cause latency issues. For example, network #1 might say, “Here is the list of ads in this break,” only to discover that ad #2 in that break requires a request to another network. Ad #3 introduces two more hops, and so on.
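The cost of chained ad networks can be sketched as a walk through a redirect graph with a hop limit and a latency budget. In the Python below, the in-memory `graph`, the five-hop limit, and the one-second budget are hypothetical stand-ins for real ad server calls and real policy.

```python
from typing import Dict, Optional, Tuple

# Each ad ID maps to (redirect target or None, latency of that hop in ms).
AdGraph = Dict[str, Tuple[Optional[str], int]]


def resolve_ad(ad_id: str, graph: AdGraph, max_hops: int = 5,
               budget_ms: int = 1000) -> Tuple[Optional[str], int]:
    """Follow redirects until a final ad is found or a limit is hit.

    Returns (final ad ID, or None if abandoned; elapsed milliseconds).
    """
    elapsed = 0
    current = ad_id
    for _ in range(max_hops):
        target, latency = graph[current]
        elapsed += latency
        if elapsed > budget_ms:
            return None, elapsed  # too slow: give up on this ad
        if target is None:
            return current, elapsed  # reached an ad that can be played
        current = target
    return None, elapsed  # too many hops, possibly a redirect loop
```

Because each hop adds its latency to the total, an ad that needs three hops can easily consume most of the time available before the break starts.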

In the example above, ad breaks can double or triple CPU utilization. Live video player requests can compound this factor by 10–30%.

Looking ahead — microservices and scalability

With complexity on the increase, one of the steps we’ve taken is to separate the different workloads previously handled by the manifest servers into smaller microservices. Our first move was to shift ad server communications and beaconing to their own service, helping to address the challenge of ad server latency. This ad proxy service offers several advanced capabilities that we will discuss in more depth in an upcoming blog post.

Going forward, we will continue to develop other microservices to remove more work from the manifest servers and offer a more targeted approach to scalability. Soon, we will add zones from multiple cloud providers to become resilient to the failure of any single one. By coupling scalable SSAI with microservices, we can optimize server costs, the structure of our code base, and other characteristics specific to ad traffic, while addressing challenges such as ad server latency, blackout restrictions, and monetization. At the same time, by spreading the processing burden across multiple zones, our live video streaming service can scale broadly and reliably deliver millions of simultaneous personalized streams without straining our delivery network.