AI for AV: Real world considerations

Artificial intelligence promises smoother meetings, more interactivity and fewer problems. Tim Kridel explores how AI can live up to its promise and examines some of the challenges standing in the way.

Good news for AV pros juggling dozens of projects at once. In the future, artificial intelligence [AI] will power synthetic brains that enable people to operate 500 versions of themselves.

So says Igor Jablokov, who invented the AI-powered voice recognition technology that’s now the foundation for Amazon Alexa.

“People will not be able to tell if they are interacting with you or your AI proxy,” Jablokov told the Financial Times. “Right now, you could be doing two interviews at once. Or there could be 500 versions of you, running 500 interviews. They would be learning more second by second and telling the other versions what they have learnt.”

Far-fetched? Maybe. But not long ago, so was the prospect of telling a speaker to lower a conference room’s shades. And so were deepfakes, where AI creates a digital version of a person’s face so lifelike that it’s indistinguishable from the original.

One example is “Dalí Lives,” at The Dalí Museum in St. Petersburg, Florida. The AI used a 57-year-old interview with the surrealist artist to learn nuances such as his facial expressions. Dalí’s AI-generated likeness was then superimposed onto an actor. (A few excerpts from the deepfake performance videos are available at https://thedali.org/dali-lives)

“It presents the opportunity to experience Dalí’s personality,” says Beth Bell, marketing director of The Dalí Museum.

“It’s really an entrance into Dalí’s spirit. Visitors are delighted, and even awed, by the experience.

“In addition to learning about Dalí’s inspirations, guests are especially thrilled to receive a photo with Dalí that is taken by one of the AI screens at the exit of the museum, often forming lines and gathering groups to share their experiences with each other.”

Fake business opportunities

Deepfake technology provides a way to create video content that grabs people’s attention simply because it’s new and unusual. In that respect, deepfakes are like the hologram performances of actors and musicians—some deceased, others very much alive—that Inavate explored in the September 2019 issue.

Both technologies can drive business for AV firms if that ‘wow’ factor convinces museums and other clients to add or upgrade their hardware and software.

Deepfake technology also keeps improving even as it becomes less of a black art. Those two trends could prompt more clients to consider deepfake projects. “The thing that is moving the quickest right now in faceswap specifically is the democratisation of access to the code,” says Nathan Shipley, technical director at Goodby Silverstein & Partners, the creator of the “Dalí Lives” content.

“What was, at first, cloning GitHub repos and editing Python code on a Linux box is now a GUI one can download and run in Windows.

“It’s still complicated to use, time-consuming, requires special-ish hardware, but it will get easier and easier. The quality will continue to get better.”

One example is how deepfakes are expanding beyond faces to entire bodies: just a short video clip can be used to make another person appear to be a great dancer (bit.ly/2sa6HHu).

It’s not hard to envision, say, a sports venue offering fans the ability to have themselves transformed into a famous athlete making an epic play.

“We will need less and less input data to create convincing fake output, the quality will continue to improve, and it will become even harder to believe what we see,” Shipley says.

“Things that require days of compute time now will eventually become real-time. What was once in the domain of large VFX companies will be on someone’s phone, for better or for worse.”

Building a better meeting

Smart speakers that control conference room systems are one example of how AV is already using AI to provide better meeting experiences.

There are plenty of additional potential applications.

“Those can range from translating the speaker’s language in real time or in post-meeting recordings to enabling hands-free conferencing driven entirely by voice commands to sharpening visual focus on the speaker and blurring the background, thus eliminating distractions,” says Pavan Inturi, Lifesize director of cloud and media engineering.

“AI can also help us get more out of meetings. Participants’ facial expressions can be mapped to analyse the health of the meeting and the project overall, as well as provide more articulate recaps, actionable next steps and more reliable follow-up than human attendees can (or sometimes choose not to do).”

AI could also recommend meeting spaces based on factors such as the number of participants and how they’ll be collaborating.

“That way, AV-equipped spaces aren’t being underutilised nor are they being taken over by teams that don’t need as much space as they’re being afforded, effectively locking other teams out of the space,” Inturi says.
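To make the idea concrete, here is a minimal sketch of such a room recommender. The room data, field names and scoring rule are invented for illustration; a real system would also weigh calendars, equipment lists and historical usage:

```python
# Hypothetical sketch: recommend the smallest adequate room for a meeting,
# so large AV-equipped spaces aren't taken by teams that don't need them.

def recommend_room(rooms, attendees, needs_video):
    """Return the best-fitting room, or None if nothing qualifies.

    rooms: list of dicts like {"name": ..., "capacity": int, "has_video": bool}
    """
    candidates = [
        r for r in rooms
        if r["capacity"] >= attendees and (r["has_video"] or not needs_video)
    ]
    # Prefer the smallest room that still fits, to avoid over-allocating space.
    return min(candidates, key=lambda r: r["capacity"], default=None)

rooms = [
    {"name": "Boardroom", "capacity": 14, "has_video": True},
    {"name": "Huddle A", "capacity": 4, "has_video": True},
    {"name": "Focus 1", "capacity": 2, "has_video": False},
]

print(recommend_room(rooms, attendees=3, needs_video=True)["name"])  # Huddle A
```

Even this toy version captures the trade-off Inturi describes: a three-person video call gets the small huddle room, leaving the boardroom free for teams that genuinely need it.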

Fear of letting go

AI can also make life easier for the AV pros supporting collaboration systems. For example, telecom operators and enterprise IT departments are increasingly using AI because their networks keep getting bigger and more complex.

Machine learning enables AI to recognise patterns associated with, say, malware attacks or emerging traffic trends. The AI can then respond accordingly – and often faster than a human – by blocking malware or shifting network resources to keep up with demand.
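Even a crude statistical baseline illustrates the principle. This sketch flags a traffic sample that breaks sharply from recent history; the figures are invented, and production systems use far richer machine-learning models:

```python
# Illustrative only: flag unusual network traffic against a simple
# statistical baseline, the kind of pattern recognition described above.
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag a sample more than `threshold` standard deviations
    above the recent baseline."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (latest - mu) / sigma > threshold

baseline = [100, 104, 98, 101, 99, 103, 97, 102]  # Mbps samples
print(is_anomalous(baseline, 150))  # True: a spike worth acting on
print(is_anomalous(baseline, 105))  # False: within normal variation
```

Once a spike is flagged, the system can either alert a human or, given enough autonomy, act on it directly.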

AI could do the same thing with AV applications, freeing staff to focus on other tasks. One example is automatically making changes to ensure each collaboration session has the right amount of bandwidth and prioritisation. These changes could be made beforehand, on the fly as conditions change during a session, or both.

“[Suppose] that in conference room one there’s going to be a videoconference, and it’s going to connect five different locations,” says Tom Tuttle, Nectar senior vice president for strategic alliances and enterprise sales. “It could be scheduled that prior to that call, we’d set off a test to make sure everything is ready to go.”
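A sketch of the pre-call check Tuttle describes might look like the following. The endpoint test itself is stubbed out here – in practice it would ping codecs, verify registrations and measure bandwidth – and all names are invented:

```python
# Hypothetical sketch: shortly before a scheduled videoconference,
# test each connecting location and report any that fail.
import datetime

def pre_call_check(meeting, check_endpoint, lead_time_minutes=30):
    """Run endpoint tests ahead of a meeting; return the locations
    that failed, or None if it is still too early to test."""
    start = meeting["start"]
    now = datetime.datetime.now()
    if now < start - datetime.timedelta(minutes=lead_time_minutes):
        return None  # too early: nothing to do yet
    return [loc for loc in meeting["locations"] if not check_endpoint(loc)]

meeting = {
    "start": datetime.datetime.now() + datetime.timedelta(minutes=10),
    "locations": ["London", "Dubai", "Singapore", "New York", "Sydney"],
}
# Stub test: pretend the Dubai endpoint is unreachable.
failures = pre_call_check(meeting, check_endpoint=lambda loc: loc != "Dubai")
print(failures)  # ['Dubai'] - surface this before the call starts
```

Whether the flagged failure then goes to a human or is remediated automatically is exactly the “letting go” question discussed below.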

Other applications require letting go—something some enterprises and telecom operators are reluctant to do. They’ll use AI to identify problems, but they’ll require the AI to alert a human rather than allowing the AI to resolve them on its own.

But eventually the size and complexity that telecom, IT and AV networks are reaching – hundreds or thousands of surveillance cameras, for example – will outweigh this fear of letting go, and AI will be given more autonomy. Multi-vendor systems further increase the complexity and the challenge of ferreting out bugs.

“Those teams are getting overtaxed on the amount of things they have to look at,” Tuttle says. “A large enterprise might have been entirely a Cisco shop or a Microsoft shop or an Avaya shop. That’s not the case [any more].

“Eventually it’s got to get to the point where ‘I’ve tested it, it works, I trust it, I think it’s done’. Otherwise, those teams are going to be so burdened down.”

Early-stage reality check

AI capabilities are starting to emerge as market differentiators, judging by how some integrators and vendors are playing them up in their customer-facing collateral. For example, Whitlock’s “We are the Robots and We’re Here to Help You” blog post envisions applications such as using facial recognition to identify expressions of frustration or confusion, and then automatically offering prompts.

“AI for smarter and better meetings is in an early stage,” says Roopam Jain, Frost & Sullivan industry director for information and communications technologies. “We see leading vendors well ahead on the path of initial implementation while other vendors have AI as a road map item.

“AI enhancements like intelligent view or intelligent framing, background noise suppression, rich in-room analytics for better meeting diagnostics, voice interactive commands for key functions such as starting and ending meetings and recording; automated transcripts and automated meeting notes with intelligent tagging and search, will soon become table stakes.”

But like concerns about hackers using smart speakers to eavesdrop in conference rooms, many next-gen AI applications face privacy and security hurdles.

“For example, I could use a computer vision camera to recognise a face, compare that against Facebook’s massive facial database, tie that social media record to an individual in the corporate directory, then look up their calendar and launch their meeting,” says Ryan Poe, Whitlock national solutions architect.

“The data points are there, but the thought of getting permission to do something like that in our current state is laughable. It’s social versus technological hurdles that prevent us from realising a fuller potential of AI right now.”

But as with GDPR, there are also business opportunities in helping clients balance their regulatory obligations and technological ambitions.

“Our role as an integrator is to advise our customers as to what data is readily available, the level of anonymity of that data and the associated risks of sharing the data between platforms,” Poe says. “It’s also important that we understand and communicate that these types of engagements can take longer and involve more parties than a typical hardware installation.”

Integrators also can help clients distinguish between reality and the hype that comes with any new technology.

“Customers will have to parse through all of the information to evaluate the real-world value of some of these ‘shiny AI’ announcements, especially as it applies to their business needs and use cases,” says Frost & Sullivan’s Jain. “It is critical that as vendors fully get on the AI bandwagon, they offer features that matter and lay out use cases where AI can be meaningful— what we call AI with an intent that solves real world customer pain points.”

One example is designing AI AV systems so they can use application programming interfaces [APIs].

“Exposing APIs to third parties will indeed create AI-enabled applications that are customised to specific use cases in sectors like digital health care, education, smart manufacturing, smart banking and so on,” Jain says.
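As a rough illustration of the idea, an AV platform might expose room analytics as a simple JSON payload that a third-party healthcare or education app could poll. The endpoint shape and field names here are invented:

```python
# Illustrative sketch: a JSON response an AV system might serve via an API
# so sector-specific apps can build on its AI-derived room analytics.
import json

def occupancy_payload(room_id, people_count, noise_db):
    """Build an API response a third-party app could poll."""
    return json.dumps({
        "room": room_id,
        "occupancy": people_count,
        "noise_db": noise_db,
        "in_use": people_count > 0,
    })

print(occupancy_payload("training-3", people_count=12, noise_db=48.5))
```

The point is less the payload itself than the contract: once the data is exposed in a documented format, the sector-specific intelligence can be built by someone else.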

Interactivity and personalisation

AI is also a good fit for signage and other digital out-of-home [DOOH] applications. For example, Grand Visual developed facial-recognition technology for Coca-Cola that enabled people to smile at a vending machine to get a free Coke.

“AI can be used with facial recognition technology to detect how much a person likes an advert by their facial expression, gender or age, and creative can be tweaked and updated depending on the demographics present in real time,” says Jon Jones, Grand Visual creative technologist.
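With the computer-vision step stubbed out, the decision logic Jones describes can be sketched as a simple rule over detections. The detection fields and creative names are invented for illustration:

```python
# Hypothetical sketch: given audience detections (expression, age),
# pick which signage creative to play. Real detection would come from
# a computer-vision model; here it is supplied as plain data.

def choose_creative(detections):
    """Pick creative from a list of {'age': int, 'smiling': bool} detections."""
    if not detections:
        return "default-loop"
    if all(d["smiling"] for d in detections):
        return "upbeat-variant"  # audience already engaged
    mean_age = sum(d["age"] for d in detections) / len(detections)
    return "youth-variant" if mean_age < 30 else "classic-variant"

audience = [{"age": 24, "smiling": False}, {"age": 28, "smiling": True}]
print(choose_creative(audience))  # youth-variant (mean age 26)
```

Swapping creative on each detection cycle is what lets the advert adapt “depending on the demographics present in real time”.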

Advertisers also could see a solid business case in how AI provides new and unique opportunities for interactivity, which helps consumers develop a relationship with a brand. “Interactive experiences will be much more unique and tailored to each user by sparking conversations,” Jones says.

“Imagine a virtual brand ambassador, such as Gary Lineker for Walkers, powered by AI to engage and answer questions about their products.

“Advances in technology will allow us to hold a true interaction with the consumer, using deepfake technology in real time. Currently it’s a resource-intensive task that requires content to be rendered ahead of time, but with faster machines this can happen in real time. It’s going to be a fine balance, however, between creating a more personalised experience and being so accurate that it causes privacy concerns for the public.”
