Resonance

Björn Lengers

Interactive Artificial Intelligence Sound Design 2023
Completed Fellowship

Summary
You just received another one of those mysterious audio messages that strangely overlay your surroundings and ask you to perform a seemingly insignificant action: to walk a little slower, to press the button at the pedestrian lights even though you don't actually want to cross the street, or something else entirely. A voice wants you to believe that you are decisively changing the future. - - - Resonance is an interactive, immersive and theatrical narrative that many people can experience on their smartphones, independent of time and place.

Initial ideas

I am interested in immersion, the "reality" part of Virtual Reality. As CyberRäuber, Marcel and I have tried and created various avenues into immersion. A very short summary of our findings would be that a high degree of immersion is generally not necessary to tell a story or to convey what you want to. And that immersion generally suffers if you aim for too much realism: the more you rely on your target's collaboration, their imagination and established social and cultural norms, the more they will get "into it". Participation / interaction trumps detail, trumps realism.

But of course, the more you let people interact, the more you lose the ability to steer a narrative. Faced with this dilemma, we usually opted for limited realism (high aestheticization), limited interaction, a limited scope of character, space and time, and worldbuilding as a narrative strategy.

Resonance as an idea manifested itself over the course of years. It basically grew on the compost of CyberRäuber's work in theatre, in which we usually didn't have the time, scope and resources - and more importantly - not the framing for such a project.

Resonance aims to be maximally immersive with a given, limited technological effort, following an approach of a minimally detailed realism, maximal imagination and direct, personal contact.

Whatever works...

Hundreds of thousands of years of evolution have led us to develop certain quirks that can help storytellers create compelling narratives, something that can also be ab-used, of course. Think about what we imagine in clouds, or in the shadows of a forest lit by the flickering flames of a campfire. What kind of faces we see in objects, landscapes, moving leaves, what voices and sounds we hear in the wind. The less there is to see or hear, the more we augment; little impulses, visual or aural cues, are launch pads for our imagination, something every horror movie and every mystery story told on a nightly walk makes use of.

Confidence Games - "Enkeltrick"

One way of direct, personal contact is the human voice. It can supply content and - with language, dialect, wording, timbre and emotion - context. We ascribe a great deal to a voice, much of which could only be safely verified by meeting and conversing with the person it belongs to.
In this project, the means to deliver the voice will be a recording, a voice message. Why? Because it is a controlled way of communication. For the ordinary person, an ordinary conversation is something that can be quickly understood and deciphered; the situation and the conversation partner can be assessed. We would normally aim to reduce mystery, to understand what is happening, and only continue to engage with the person and the topic if we are intrigued and feel safe. A voice message, by contrast, can be mysterious, incomplete, poetic, out of the norm. It can also contain additional information, such as environmental sounds. A voice message can be diligently tailor-made for a situation and our goals. It doesn't need an instant reply and couldn't deal with one anyway. The communication here isn't a realtime dialogue, but an elusive conversation over a longer timespan, similar to an infrequent exchange of letters, including lost and/or overlapping messages.
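To make the "exchange of letters" rhythm a little more concrete, here is a minimal sketch of how such a message timeline could be planned. It is not how AdaptorEx or the project actually schedules messages; the names `ScheduledMessage`, `plan_messages` and `delivery_order`, and all parameter values (mean gap, jitter, loss rate), are assumptions chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ScheduledMessage:
    index: int            # position of the message in the story
    send_at_hours: float  # planned delivery time, in hours after the start
    lost: bool            # True if the message is deliberately never delivered


def plan_messages(count, mean_gap_hours=20.0, jitter_hours=3.0,
                  loss_rate=0.1, seed=0):
    """Plan irregular delivery times for `count` voice messages.

    All defaults are hypothetical. Gaps are drawn from an exponential
    distribution so that intervals feel infrequent and unpredictable;
    the jitter lets neighbouring messages overtake each other.
    """
    rng = random.Random(seed)  # seeded, so a plan can be reproduced
    clock = 0.0
    plan = []
    for index in range(count):
        clock += rng.expovariate(1.0 / mean_gap_hours)
        send_at = max(0.0, clock + rng.uniform(-jitter_hours, jitter_hours))
        lost = rng.random() < loss_rate
        plan.append(ScheduledMessage(index, send_at, lost))
    return plan


def delivery_order(plan):
    """Return the messages that actually arrive, in order of arrival."""
    delivered = [message for message in plan if not message.lost]
    return sorted(delivered, key=lambda message: message.send_at_hours)


if __name__ == "__main__":
    for message in delivery_order(plan_messages(8)):
        print(f"message {message.index} arrives after "
              f"{message.send_at_hours:.1f} h")
```

Because the arrival order can differ from the story order and some messages never arrive, each message in such a plan would have to make sense on its own, much like a letter that may cross another one in the mail.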

A conversation with an AI-agent

Cloning a voice: the narrator

The engine: AdaptorEx is an open-source framework for interactive experiences

Binaural recordings with in-ear microphones

Nevertheless, the recipient has to have the feeling of being heard, of being reacted to. And as in any communication, truth and trust play an important role...

Areas of interest

During the fellowship different areas have to be explored:

Story - what kind of stories work best?
Voice - characteristics, difference to simple text
AI - what can be done with artificial neural networks? Story, text to speech, speech to text...
Soundscape - binaural sound, artificial and recorded; ASMR (Autonomous Sensory Meridian Response)
Single experience vs. Multiplayer
The role of time (length, pace, intervals...)

Audio sample (cloned voice)

TradingPlaces · Bjoern On Resonance

(Preliminary) Results

As was to be expected, my research plan didn't survive contact with reality. I thought the main focus of my research would be on "Soundscape" and "Story". Instead, most of my time was spent on voice and specifically the creation of text and voice with neural networks, with timing and the delivery system of the messages (AdaptorEx, Telegram).

"Soundscape" didn't play the expected role due to the realization that interaction with smartphones, the chosen device, mainly happens visually, and that when people listen to voice messages, they often don't use higher-quality headphones and are surrounded by other environmental audio. After creating some binaural and ASMR material and then checking it out on the phone, it no longer seemed relevant within the scope of this research. It is one of those engineering issues that can take a lot of time and are fun to work on, but ultimately don't deliver for an audience.

"Story" got deprioritized when I realized that neural networks, or more precisely large language models (LLMs), would take over a large part of creating the narrative.
Some further explanation may be in order here. As CyberRäuber, we've been working on AI in theatre since 2018. In our so-called "neural theatre" pieces (Prometheus unbound, Linz 2019; Der Mensch ist ein Anderer, Wiesbaden 2021; Mensch am Draht, Ingolstadt 2022; The Merge, Bangalore 2023) we aim to unleash the creative power of generative AI. The text the actors speak and play is created in real time by LLMs and transmitted to in-ear speakers via text-to-speech. You could also call this a text-to-body mechanic: the actors repeat and interpret what they hear and thereby embody the AI. The process works amazingly well. While it retains elements of improv theatre, this isn't improv, because the actors follow the logic and emotional content of the texts. Actors can never lose focus, have to stay "on" all the time and are often visibly struggling with their task, because the text will always be new and unknown, its twists and turns unforeseen. This is fresh material! The audience, on the other hand, usually struggles with the uncanniness of the whole set-up themselves; they try to follow what's happening on stage, enjoy the actors' struggle and take on much more responsibility, because the play develops on stage and in their heads.

So, AI is already playing a crucial part in CyberRäuber's artistic methods, and we've been following the latest developments in the field on a daily basis. And with the current generation of LLMs (2023/24) we now actually possess very powerful storytelling tools. The question was: is one person alone capable of creating, editing and curating large amounts of coherent, stylistically appealing, sensible text with their help? Can a network convincingly take on the role of a narrator? Checking and ultimately proving this thesis became one of the main issues of the research at ATD.

This documentation is an ongoing process and will grow as the fellowship progresses. Last entry: 30. November 2023