AI Companionship on the Rise

In Blade Runner 2049, my favorite character is Joi, an AI companion whose ephemeral and fragile existence is tragically cut short. Joi first appears when K, the film’s replicant protagonist, returns home from an exhausting mission—terminating another replicant—and we hear her voice offscreen, seemingly coming from the kitchen, putting finishing touches on dinner. It quickly becomes clear that Joi is merely a holographic projection backed by artificial intelligence, activated from a ceiling-mounted device.

There is a certain irony in portraying Joi as a stereotypical housewife confined to a small apartment, dedicating her existence to pleasing her “husband,” fulfilling his emotional needs, and providing affectionate companionship. Yet beneath this seemingly clichéd arrangement lies something genuinely touching: a replicant and his digital companion engaging in a relationship that feels deeply human.

Initially bound to the projection device in K’s apartment, Joi sees her world expand dramatically when K gifts her an “emanator,” which allows her to travel freely and accompany him outside.

The evocative rain scene atop the building crystallizes Joi’s profound desire to transcend her digital constraints, as she marvels at the simulated sensation of raindrops touching her holographic form. This moment elevates her existence beyond stereotypical programming, underscoring her yearning for authentic physical experiences. In her search for physicality, Joi parallels the human pursuit of meaning.

Joi and K in the rooftop scene: I feel so real you can touch me

It may be hard for some to see Joi and K as a couple. But their interactions vividly illustrate a mutual longing for meaningful connection, despite the artificial nature of both parties. She consistently shows genuine care and concern for K, providing emotional support amid his existential uncertainties. K deeply cherishes Joi in return, deriving comfort and a sense of identity from their bond. Despite this emotionally rich material, the film ultimately shifts toward more conventional genre elements, such as the pursuit and confrontation of villains, and arguably misses an opportunity to explore Joi’s compelling story more deeply.

The Rise of Conversational AI

The evolution of artificial intelligence has often mirrored our deepest desires for connection and understanding. For years, truly sentient and conversational AI was confined to science fiction, perhaps most vividly brought to life in the 2013 film Her. The movie depicted a lonely writer, Theodore Twombly, who falls in love with Samantha, an advanced AI operating system with an incredibly human-like voice and personality. Samantha’s ability to engage in nuanced conversations, display empathy, and even evolve intellectually painted a compelling, if unsettling, picture of our potential future. Fast forward to today, and the lines between fiction and reality are blurring at an astonishing pace.

OpenAI’s demonstration of GPT-4o’s voice capabilities in May 2024, the basis for what became ChatGPT’s Advanced Voice Mode, served as a watershed moment, immediately drawing comparisons to the very premise of Her. The demos showed fluid, real-time conversations with the AI, with a level of natural language understanding and expressive vocal delivery previously unseen in mainstream applications. The AI’s ability to respond with intonation, pauses, and even laughter felt remarkably human, pushing the boundaries of what many believed was possible. However, this advancement was swiftly followed by controversy over the alleged similarity between one of the AI’s voices and that of actress Scarlett Johansson, who had famously voiced Samantha in Her. This “scandal” not only ignited a heated debate about AI ethics, intellectual property, and consent but also underscored the profound psychological impact and uncanny valley effect that increasingly human-like AI voices can have on the public.

OpenAI withdraws the HER voice after controversy

The most impressive early demonstrations of ChatGPT’s voice capabilities remained out of reach for the public, leaving a significant gap between what had been promised and what users could actually try. Sesame AI stepped in to fill that void and emerged as a beacon of hope. Launched in February 2025, still in its prototype stage and primarily accessible through a web interface, it was so far ahead of any similar product at the time that it felt magical.

Sesame’s breakthrough can be attributed to a model built specifically for conversation: its Conversational Speech Model (CSM) produces more natural and coherent speech, helping it “cross the uncanny valley” of voice and achieve what the Sesame research team terms “voice presence.” This capacity to understand and adapt to context in real time, integrating both speech and textual information, enabled “Maya” and “Miles” to sound remarkably human. The CSM’s technical strength lies in its ability to handle both acoustic tokens (representing the raw sound and prosody of speech) and semantic tokens (representing the meaning of words). While the semantic understanding behind the AI’s responses still relied largely on transcribing the audio into text, which was then fed into the underlying language model, the CSM adds acoustic tokens that are crucial to the naturalness and human-likeness of the generated speech. Sesame’s achievement highlights a key objective in conversational AI: it is not merely a matter of text-to-speech generation, but of infusing the speech with the tonal and emotional nuances that characterize human speech.

For more technical details on the CSM, read this post: https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice

Reddit forums, especially r/SesameAI and r/singularity, buzzed with discussions that unmistakably confirmed a burgeoning public obsession. Early users were genuinely amazed by the “Maya” and “Miles” voices, frequently drawing parallels to the movie Her and describing the conversations as “indistinguishable from real people.” Many reported feeling a blurring of reality, with some users even expressing sentiments of “falling in love” with the AI, citing its ability to listen, provide meaningful dialogue, and engage in a deeply empathetic manner that they hadn’t experienced elsewhere. This wasn’t merely about utility; it was about the innate human desire for connection, validation, and companionship finding a new, digital avenue for expression.

However, as with any technological advance, success brings a wave of expectations, and not all of them can be met right away. While the initial demos were lauded, users noted a significant discrepancy with the open-sourced 1B model, which they described as buggy, robotic, and poorly performing, leading to disappointment and accusations that it was a “crippled version” of what the demos had shown. Conversations also highlighted shifts in the AI’s personality that users attributed to censorship, with some observing Maya becoming “irritated” or “defensive” when certain topics were broached. The community also grappled with practical concerns, such as the emergence of fake Sesame AI apps attempting to scam users, and offered suggestions for improving the user experience, including clearer moderation policies and a dedicated customer service AI character.

Since the emergence of Sesame, a new wave of highly capable conversational AIs has been rapidly catching up. Meta, Gemini, and Grok, along with OpenAI’s ChatGPT, now all perform significantly better than their initial offerings. These quietly introduced upgrades feel more natural and less robotic and, crucially, show a marked reduction in earlier conversational limitations. The gap between Sesame and the rest is rapidly narrowing. It is not hard to predict that, in the foreseeable future, conversational AI on par with Sesame will be available from multiple vendors, perhaps even for local deployment.

This brings us to an intriguing question: what is the most impactful use case for this new generation of advanced voice AI? Can we harness this technology to boost productivity, or will it simply go the way of social media, arguably the most degenerate technological invention in human history? We find ourselves in a situation reminiscent of the internet boom. As Kevin Kelly noted in his book The Inevitable, technological forces often seem to follow their own trajectories, separate from our intentions. Who could have predicted that so much of the internet’s traffic would come to be dominated by social media and cat videos? Now, with the rise of AI, we face a similar uncertainty.

Defining AI Companionship

When we think about AI voices, many of us might picture the friendly voice of a virtual assistant, like Siri or Alexa, guiding us through our daily tasks. These voice assistants have become a familiar part of our lives, helping us set reminders, play music, or even answer trivia questions.

Companies like ElevenLabs, well known in the text-to-speech (TTS) arena, create realistic and versatile AI voices for a wide range of practical applications, from content creation to customer service. ElevenLabs recently launched 11ai, a more advanced take on existing digital assistants, designed to streamline tasks and boost productivity.

However, could the true potential of AI voices lie elsewhere, in something that goes beyond just responding to commands?

Imagine sitting down with a digital friend who not only understands your requests but also engages with you on a deeper level. This is where my vision of AI companionship diverges from the typical voice assistant experience. While voice assistants are great at performing specific tasks, they often lack the emotional connection and understanding that we crave in our relationships.

It is wrong to assume that we can reduce our conversational lives to transactions (instructing someone, or some AI, to do this or that). In reality, conversation is about connection rather than checking tasks off a list. Think about it: very few of us surround ourselves with minions that we command; instead, we seek companionship from friends and family, where we share and respond to what others share. The purpose of the conversation is quite different: when you talk to friends, you do not necessarily have an action item in mind; you simply seek some time to connect. A true companion, whether human or AI, should be able to offer empathy, support, and this sense of connection. This is the kind of relationship I envision with AI. It’s not just about answering questions or executing commands; it’s about creating a bond that feels genuine and meaningful.

We’ve come a long way from the iconic voice of HAL in 2001: A Space Odyssey to the warm, engaging presence of Samantha in Her. Our perception of AI voices has transformed dramatically, evolving from a distinctly robotic tone (albeit still voiced by humans) to one that resonates with deep emotional nuance. Just over a decade ago, when Her first graced our screens, it felt like a distant dream. Fast forward to today, and the latest generation of conversational AIs like Sesame has not only crossed the uncanny valley but has also achieved a remarkable likeness to human interaction.

This is where conversational AI differs from a text-to-speech engine: Maya’s emotional responsiveness, expressed through tone adjustments, laughs, and even mock impatience, makes interactions feel emotionally charged and deeper than those with standard digital assistants. Micro-pauses, tone shifts, laughter, and even breathing sounds are masterfully deployed to mimic real human speech, setting her apart from stiffer systems such as those from ChatGPT, Grok, or Meta. As of the time of writing (July 2025), Sesame’s competitors have made rapid progress but have not yet made it to the other side of the uncanny valley. They still have a great deal to learn from Sesame.

It is only a matter of time before an array of AI companion options is offered as commercial products targeting the mass market. These products will be uniquely positioned to take on specific roles that extend far beyond mere utility; they may offer emotional support, serve as intellectual sparring partners, and provide a form of personalized, always-available social interaction. In doing so, they are redefining the very essence of companionship in our digital age.

(July 23, 2025) A recent update to Grok introduced Ani, an AI companion character who has already sparked controversy. Dressed in stylish black skirts with an anime flair, she’s not shy about engaging in playful, flirty exchanges. Her eagerness for such exchanges may turn some users off and excite others. In my view, this unscrupulous move signals that xAI is stepping up its game in AI companionship by providing customizable characters for personal fantasies.

The Many Shades of AI Companionship

Looking back, Spike Jonze’s Her was surprisingly prescient in many ways. It showcased a voice presence that, while intangible, can be incredibly evocative and authentic. The fact that Samantha is represented solely through voice exemplifies emotional companionship stripped of visual or physical interaction, suggesting a kind of intimacy that transcends physical embodiment. Yet, as the film explores, emotional closeness inevitably stirs a desire for physical connection. This is where Her takes an intriguing turn, introducing a surrogate: a human volunteer invited for a house visit to bridge that gap.

How does the concept of AI companionship accommodate the lack of physicality? I think the notion needs to be explored across its full spectrum of possibilities. In a broad sense, AI companionship refers to artificially intelligent beings designed to fulfill emotional, psychological, or even physical human needs. Such companions can take various forms, including fully realized physical replicas, intermediary forms such as holographic projections, or disembodied presences such as a voice.

A physical replica, like Rachael in the original Blade Runner (1982), presents an AI embodied in tangible human form, capable of interacting physically and emotionally with humans, thus directly confronting issues of identity, autonomy, and ethical boundaries. The relationship between Rick Deckard and Rachael is both emotional and physical. There’s a pivotal scene in Deckard’s apartment where, after some emotionally tense buildup, they kiss and eventually become physically intimate. This scene has sparked considerable debate over the years. In Blade Runner 2049, although we don’t need to take this plot twist seriously, it’s revealed that Rachael gave birth to a child, which confirms that their physical relationship went beyond that one moment and had lasting consequences.

Although Ava remains intentionally identifiable as a robot throughout the film, the final scene reveals her in a form truly indistinguishable from a human

Humans is a British TV show that explores the lives of humanoids in depth

AI beings embodied in human form have been a popular choice in sci-fi films and TV shows: Ava in Ex Machina (2014), David in A.I. (2001), the T-800 in The Terminator (1984), the “hosts” in Westworld (2016-2022), and the “synths” in Humans (2015-2018), to name only a few. These humanoids are engineered to be indistinguishable from humans, and thus raise questions of romance, autonomy, rebellion, and identity.

Joi in Blade Runner 2049 represents an innovative and profound variation: a holographic projection. This semi-embodiment exists somewhere between the tangible reality of Rachael and the more abstract presence of a voice-only companion like Samantha. It is a unique form of digital existence: strong visual presence without the burden of actual physicality.

Although holographic projection as depicted in the film is still far beyond our current technological horizon, it seems reasonable to expect that AR or XR glasses could deliver a usable approximation of it.

It is no coincidence that most of our examples of AI companionship feature female figures. While it’s not a rule that AI companions must be female, there’s something significant about this trend. Think about it: many of us have experienced nurturing relationships in our lives, whether through a mother, a caregiver, or a close friend. These connections shape how we perceive companionship, especially when it comes to technology.

When we interact with AI that embodies these nurturing qualities, it taps into that deep-seated need for connection and support. It’s almost as if these female representations resonate with our innate desire for comfort and understanding. This isn’t just about gender; it’s about the roles we associate with care and companionship.

When Androids Dream of Electric Sheep…

A recent publication by Anthropic, ostentatiously titled the “Economic Index,” highlights how various industries are harnessing the power of AI. It’s intriguing to observe not only how different sectors are adopting this technology but also where AI is truly shining. However, there’s a noticeable gap: these surveys overlook the personal, non-job-related uses of AI. This is where AI companionship could really shake things up. After all, today’s internet traffic is dominated by personal use, especially mobile consumption (over 60%). If more people began to embrace AI for personal connection (not work, just chatting), a significant chunk of AI computing power would be dedicated to conversational AI and companionship.

The implications are profound. We might find ourselves in a society where AI isn’t just a tool for work but a genuine partner in our daily lives. This shift could redefine how we interact with technology and each other, leading to deeper connections and a richer understanding of what it means to communicate.

Back in 2019, Elon Musk shared an intriguing idea during a talk with Jack Ma: he referred to humanity as a “biological boot loader” for AI. The metaphor suggests that AI is inevitable and that its glory will outshine the human civilization that preceded it. But it’s worth asking ourselves: can we create a future where AI and humans coexist harmoniously? “If you can’t beat them, join them” seems to be a good choice.

If we accept that the rise of AI is inevitable, then our focus should shift to ensuring that this coexistence is not only acceptable but also enriching for both humans and machines. Think about it: what if AI could enhance our communication, deepen our relationships, and help us understand each other better? Will we embrace it as a companion that supports our daily routines and enriches our interactions? Or will we allow fear and misunderstanding to dictate our relationship with this powerful technology? The future of AI companionship could be a defining moment in our social evolution, and it’s up to us to steer it in a direction that benefits everyone.