Voice Technology

A path through historical voice machines, modern TTS, and the social uses of synthetic speech.

Voice technology becomes more interesting on this site when it is not reduced to a benchmark table. From Turks to HAL places synthetic voice inside a much longer history of mechanical speech and cultural fantasy. The essay opens with Rousseau, “La voix annonce un être sensible; il n’y a que des corps animés qui chantent” (“The voice announces a sentient being; only living bodies sing”), and with Plutarch’s plucked nightingale (“Vox et praeterea nihil,” “a voice and nothing more”), and uses them to reach its own verdict: “what a well-made voice machine a nightingale is.” The diva, in the same essay, is treated as “a machine perfected in voice production”; bel canto pedagogy reads “not unlike… the operating manual of an intricate and delicate machine.” That move gives the essay its long arc: La Mettrie’s eighteenth-century L’Homme Machine, Kempelen’s mechanical mouth, Wheatstone, Bell, the talking heads of nineteenth-century literature, and finally HAL. All are instances of the same drive to reproduce “the living organization through whatever technical means that are currently available.”

The State of TTS updates that history with a working benchmark, but more usefully, it dates the threshold. The essay’s report from inside e-learning practice is unsparing: only a few years ago, the writer “relied on professional voice talent” because TTS sounded robotic; now, “I predict that AI will replace a third of white-collar jobs within the next decade, and this is an area where that shift has not only begun but also finished.” The OpenAI ladder (tts-1, tts-1-hd, and gpt-4o-mini-tts with its newer steerability over “vibes, tones, and effects”) is presented less as a product review than as evidence that synthetic voice has crossed a threshold that older theory was not built to handle.

AI Companionship on the Rise puts that crossing in social terms. Joi from Blade Runner 2049, “an AI companion whose ephemeral and fragile existence is tragically cut short,” opens the essay as a frame for what happened when ChatGPT’s Advanced Voice Mode arrived in 2024, along with the controversy over a voice resembling Scarlett Johansson’s that “ignited a heated debate about AI ethics, intellectual property, and consent.” Sesame AI’s Conversational Speech Model, designed in its team’s own phrase to cross “the uncanny valley of voice” toward what they call voice presence, is the technical landmark; the social aftermath is the more interesting story. Reddit forums “buzzed with discussions that unmistakably confirmed a burgeoning public obsession,” with users describing conversations as “indistinguishable from real people” and some “even expressing sentiments of ‘falling in love’ with the AI.”

That is why companionship belongs in this cluster, not after it. Once synthetic voice becomes convincing, its importance may be emotional and social before it is purely functional. The question shifts away from “which model sounds best” toward what kinds of presence these systems make possible — and what the long history of voice machines, from Plutarch’s nightingale to Kempelen’s bellows, tells us about the fantasies they are now equipped to satisfy.
