When IBM began constructing its Cloud Video portfolio earlier this year through the acquisitions of Clearleap and Ustream, it was always envisaged that at some stage, its Watson AI assets would be brought into play to bolster its video analytics capabilities with cognitive computing.
An early application of this technique to video was IBM’s use of experimental Watson APIs to create a “cognitive movie trailer.” The system learned from previous horror trailers what was likely to have made them effective, and then identified relevant scenes in an unreleased movie that could improve response.
IBM also partnered this year with the US Open to convert commentary to text with greater accuracy by having Watson learn tennis terminology and player names before the tournament.
The recently announced slew of Watson-inspired video applications indicates that promise has now become a reality.
David Mowrey, Vice President of Strategic Planning at IBM Cloud Video, singles out two new capabilities in particular: the Media Insights product, scheduled for release later this year; and scene detection and understanding, which is still at the pilot stage.
“Media Insights is really taking a lot of data that’s been out there in the video world for a while, such as consumption and QoS data,” explains Mowrey. “But it’s bringing in some new data around subscriber information, and around social integration and social profiling.”
Watson adds intelligence to the mix: the AI ingests all the data from the different silos, seeks to interpret and understand it, and then makes business recommendations to IBM’s clients based on that.
Interestingly, Watson is already actively deployed in the healthcare industry in a not dissimilar way, helping doctors to make quicker and more informed diagnoses by assimilating screeds of medical texts alongside patient-related data.
On the video side, Mowrey says Media Insights will be able to offer operators insights into general business questions such as ‘how do you reduce churn?’ “A lot of different factors go into churn of any subscription service, but it’s really difficult for customers to make the decisions on where to put their investment dollars, to focus in on certain types of change. Is it features and functionality? Is it content? Is it pricing? Is it user-flows? There are a lot of different tweaks to a service to reduce churn, but deciding which ones are going to have the most meaningful impact to your service, those are hard decisions to make. […] Everybody in the industry is really struggling with the same questions.”
As for scene detection, Mowrey observes that breaking up video into clips is, technically speaking, nothing new. “What is new is being able to understand the context of the video,” he points out. “If you can actually analyse the audio tracks, if you can analyse the actual frames of videos to understand context and the shift in context of topics, then you can start really doing some interesting things around catalogues and searching, recommending, clipping of the videos, to provide a lot of value into the service – both for consumers and the service providers themselves.”
Mowrey sees cognitive computing being applied in this way to both live and on-demand video, in a process he describes as the creation of ‘programmatic metadata’: “You’re creating metadata on a frame-by-frame, second-by-second, basis for the video – so you know exactly what’s happening in the video all the way through.”
Parsing video in this way obviously offers all sorts of powerful content search and discovery applications, particularly when Watson’s ability to understand natural language is applied at the user end as well. Mowrey envisages an AI-enhanced recommendation engine, which is “not only explicitly being told what you’re looking for and just understanding the context of it, but also then implicitly understanding what the user wants to do, and providing that.”
Ultimately, IBM would like the technology to be able to exploit those situations where Watson might want to recommend a piece of content to a user only to discover that it wasn’t actually available in the operator’s catalogue. “In that sense, how do we help our clients and the industry make better content acquisition and production decisions?”
This is all on the roadmap, says Mowrey, including speech-based user interfaces: “That is a part of Watson right now, being able to understand contextually-spoken words and translating that not only to text but then to gain an understanding of that. Those are some of the technologies that we now have available to us that we are building directly into the IBM Cloud Video platform. That’s not fantasy, that’s reality.”