Voice Machine

Voice understood as both technological artifact and cultural imagination.

Voice machine is a useful bridge concept on this site because it joins media history to contemporary AI without pretending that synthetic voice began yesterday. From Turks to HAL opens with Rousseau’s still-bracing claim, “La voix annonce un être sensible; il n’y a que des corps animés qui chantent” (“The voice announces a sentient being; only animate bodies sing”), and a Plutarchian anecdote about plucking a nightingale, in order to argue that artificial voice has always been both a technical project and a cultural fantasy. The essay reaches back to Kempelen, Wheatstone, and Bell, and treats the diva, the automaton, and the talking head in 19th-century literature (Hoffmann, Villiers de l’Isle-Adam) as instances of the same impulse: to reproduce “the living organization through whatever technical means that are currently available. In the 19th century, the cog, piston; in the 21st century, computer code and electronic chips.” La Mettrie’s wager that “the soul is clearly an enlightened machine” sits behind that lineage. The essay’s framing line, that the voice machine forces us to confront “this nonhuman, imitated voice. This other voice,” is what travels into the more recent pieces.

The State of TTS updates that history with a working benchmark. Where text-to-speech was, only a few years ago, “robotic and challenging to customize for specific pronunciations,” the essay reports a field that has effectively closed the gap with professional voice talent: “I predict that AI will replace a third of white-collar jobs within the next decade, and this is an area where that shift has not only begun but also finished.” The OpenAI lineage from tts-1 to gpt-4o-mini-tts, with its newer “vibes, tones, and effects” steerability, is presented less as a list of products than as evidence that voice has crossed a threshold that older theory was not built to handle.

AI Companionship on the Rise puts that crossing in social terms. The essay opens with Joi from Blade Runner 2049 — “an AI companion whose ephemeral and fragile existence is tragically cut short” — and uses her as a frame for what happened in late 2024 and early 2025: ChatGPT’s Advanced Voice Mode, the Scarlett-Johansson-adjacent controversy, and Sesame AI’s Conversational Speech Model, whose architects describe its goal as “crossing the uncanny valley of voice” toward what they call voice presence. The essay registers, without sensationalism, that early users on r/SesameAI compared the experience to Her and reported “feeling a blurring of reality, with some users even expressing sentiments of ‘falling in love’ with the AI.”

The point is not simply that audio quality has improved. Once synthetic voice becomes convincing, it starts carrying expectation, intimacy, and anthropomorphic projection with it — Joi’s longing for rain on the rooftop is, in this account, exactly the affective payload that contemporary conversational AI now has to manage. That is why companionship belongs next to TTS rather than after it, and why the older history in From Turks to HAL is not antiquarian. The cultural fantasies that Kempelen’s mechanical mouth and HAL’s calm baritone activated are the same fantasies that today’s CSM-powered voices have been built, knowingly or not, to satisfy.
