Cinema Learns to Speak

Historicizing the Perceived Authenticity of Human Voice

The spoken word is most cinematic if the messages it conveys elude our grasp.

—Siegfried Kracauer¹

And in 1929 a monster was born which was to stand the whole business on its head—the talkies. I welcomed it with delight, seeing at once all the use that could be made of sound. After all, the purpose of all artistic creation is the knowledge of man, and is not the human voice the best means of conveying the personality of a human being?

—Jean Renoir²

Having devoted our attention largely to what Chion has characterized as the “minor denizens” of the soundtrack, that is, the ambient sounds, and having spent the immediately preceding section analyzing body sounds and breathing sounds, we arrive at what is often considered the “king” of the soundtrack, the ultimate strength of sound cinema: the human voice. Some necessary theoretical preliminaries notwithstanding, this chapter treats the subject of voice mainly from a historical perspective. It demonstrates that our perception of the voice in cinema has hardly remained stable. Instead it constantly fluctuates in response to technological changes, film styles, distribution strategies and exhibition venues. All these fluctuations, I argue, constitute a vital aspect of a truly dynamic understanding of cinema auditorship. In a worlding theory of film sound, the human voice also takes up a compelling position: what the cinema has learnt to speak, as the title of this chapter prompts, is precisely a world of voices.

Voice in cinema has been granted many hearings. To briefly mention a few, Rick Altman offers a carefully researched history of microphone technology and the various realist, ideological and economic impulses behind it.³ Michel Chion has written a widely influential and thought-provoking treatise on voice⁴ and a study of the spoken language in French cinema that remains untranslated.⁵ Kaja Silverman, Mary Ann Doane and Amy Lawrence have demonstrated the fruitfulness of a Lacanian approach.⁶ Sarah Kozloff devotes two brilliant monographs to the study of voice-over and film dialogue.⁷ More recently, Jacob Smith highlights recorded vocal performances of the 20th century and demonstrates their relevance to film history.⁸ Also highly relevant to cinema studies are the philosophical investigations of voice conducted by Jean-Luc Nancy, Mladen Dolar, Don Ihde and Adriana Cavarero.⁹

My approach gives the voice in cinema a new hearing from a position that complements previous scholarship. It offers a theoretical concept, “perceived authenticity”, that I use to revisit the history of cinema, just as the other chapters have. By no means a complete or detailed history, what I propose is the threading together of some key historical moments at which the role played by the human voice evolves. At every turn of this history, I will demonstrate, the perceived authenticity of the human voice has been sensitive to, and emblematic of, changes in technology and aesthetics as well as their cultural context. I am naturally using history in its two senses, as a chronology and as a narrative. “Cinema learns to speak”, as my chapter title suggests, is a way to consider cinema’s adoption of the human voice as a long, coherent and, finally, ongoing process driven by conflicting forces that take turns to erupt and submerge. It is also an attempt to tell the story from the perspective of the voice, instead of the general, uncharacteristic and rather inhuman denomination: sound. After all, it is the voice that marks this so-called “transition to sound”, for a mere intrusion of technology would never have elicited such a fanatic following were it not disguised behind a feature that all human beings feel intimate with.¹⁰ In this sense, cinema becomes more humanlike by acquiring its own voice; by learning to speak, cinema satisfies a psychological demand we make of art; by harnessing spoken language, cinema avoids the kind of abstraction that characterizes many strands of 20th-century art.

The history of the voice in cinema is naturally a history of films, but it is also, and perhaps even more so, a history of auditors. It is a history of cinema’s attempt to tame the human voice; but it is also about how our perception of the human voice in cinema evolves, and how this perception shapes history. This chapter seeks to raise, if not answer, the following questions: in what form did human voices exist in cinema before the 1920s? If they did exist, why did the advent of voice in 1927 become such a huge attraction? If everyone has a voice, why were some actors and actresses ruined by the advent of sound? Why did a shrewd and sensible solution such as multiple-language versions (MLVs) fail, while the practice of dubbing, widely condemned at the time, eventually won wide acceptance? How does voice in cinema relate to and reflect the ebb and flow of multilingualism and Hollywood’s linguistic hegemony? The notion of perceived authenticity may concern only a particular quality of voice, but it not only engages with these questions but also exemplifies the complex negotiation between cinema and its auditor, between industrial norm and national style, between cinematic convention and cinematic realism and, finally, between film history and film theory.

The chapter will proceed as follows: first I will try to define perceived authenticity as a theoretical construct, which is then substantiated in our little historical survey. The notion of perceived authenticity is not built on abstract ideas or imaginary scenarios (as some philosophical investigations of cinema are prone to be). Instead I highlight specific historical moments where paradigm shifts happen, from the lecturer of pre-talkie years to the voice emerging as a form of attraction, from the notorious Multiple Language Versions to the world of dubbing, from what I call the period of “freedom of speech” to that of “speaking degree zero”, and finally, from linguistic hegemony to polyglossia.

All that Utterance Allows: Defining Perceived Authenticity

A piece of human utterance may be referred to as speech, voice or dialogue depending on our purpose. In other words, the terminological difference exists to describe the utterance's function in a broader context. However, the division is not always clear in everyday usage, and the terms can overlap to a certain extent. In this chapter, I wish to minimize the overlap by assigning them strictly different roles. There might be a few cases where I intentionally aim for a degree of conceptual blending. Yet what needs to be highlighted is how the voice differs from the other two in its cinematic context. I refer to speech as the result of the act of speaking, which needs linguistic decoding; speech, I might say, is a text in its sonic form.¹¹ Voice, on the other hand, denotes the acoustic particularities of an individual utterance maker in a particular circumstance,¹² a quality that is mostly lost when transcribed into written form. Finally, dialogue, or monologue, emphasizes the dramaturgical functions that the utterance serves: it is a verbal communication for a narrative purpose.

These distinctions are important, for they point to three separate dimensions of sound perception. A word we hear in cinema can embody all three dimensions at once, yet each of the three cohabitants pulls in its own direction. When moviegoers in 1927 flooded in to see The Jazz Singer, they heard monologues that served little dramaturgical function (then still the intertitle’s job), and there was no scripted speech. What impressed the audience, therefore, was the voice of Al Jolson, especially its perceived spontaneity within a fictive context. It is perhaps for this reason that Michel Chion says, “it was the voice that truly constituted the revolution.”¹³ This revolution was, however, a short-lived one. In Lights of New York, the voice had somehow receded backstage, as the shadow of a body disappears under the limelight. It is Voice that conquered the audience, and when it retreats, it leaves its poor cousin Speech to take the blame. The objections to talking film, it seems clear to me, are directed not at the voice but at speech and dialogue.¹⁴

A term such as “voice in cinema” can mean different things and has been, as I mention in the brief literature review, appropriated by studies of radically different orientations. At the most basic level, voice is an essential aspect of filmic performance, an aspect in dire need of theorization. Yet to understand the nature of this performance one needs to go significantly beyond the scope of the individual film, national cinema, genre or historical period. My study treats this historical dimension of the voice seriously. The phrase “learns to speak” in my chapter title seeks to capture the dialectics of a process and its paradoxical nature. On the one hand, to master a language requires one to assimilate formal traits shared across multiple individuals, which can then be abstracted into a science (linguistics); it is a process of normalization. On the other hand, to learn to speak also means to acquire an idiosyncratic way of speaking,¹⁵ whose traits resist abstraction and remain unique: a process of individualization. The process by which cinema learns to speak, in my view, embeds both trends and exhibits a fascinating tension between the two.

Saussurean linguistics divides the language phenomenon (langage) into two parts: langue and parole. The former refers to an abstractable system, a proper candidate for science. The latter, however, must be discarded for its resistance to systematization. Cinema, on the other hand, is endowed with a unique capacity to include both. While the same words or a shared language structure can be used by conversing parties, each party has his/her own voice, which in many cases overshadows the content of communication. Voice is a platform on which one expresses oneself.¹⁶ In common usage we often equate voice with individuality and one’s true self, unfolded in the very act of speaking.¹⁷ The voice, like the face and other gestural traits, is a critical part of our unique being in the world. Conversely, the presentation of voice fleshes out a cinematic world inhabited by human beings. Voices aren’t just sounds; their meanings are inseparable from how their corresponding¹⁸ bodies move and from the roles assigned to them in the narrative. Our perception of a particular voice in a particular movie derives from the totality of its many contexts. Like virtually any other aspect of cinema, this perception draws on conventions (what the audience believes or accepts about how people talk in movies), yet there is also an undeniable connection to our perception of language in the real world. It is a multifaceted process that must take into account information gathered from multiple parallel worlds.

When I talk about voice, therefore, what I am talking about is what is spoken, perceived and responded to, and yet cannot be written down. In the above I have used the term “acoustic particularities” to characterize the voice. But what I propose to study is not a physics of sound (acoustics); it is rather a phenomenology of voice, in that it lingers on elusive qualities of no clearly identifiable meaning, and turns familiar things into strange shapes. I am less concerned with what is said than with how it is said. This study may sometimes make reference to speech, for it is somehow impossible, as I realized in writing this chapter, to avoid the term entirely. But dialogue is not my subject. This is not because I regard dialogue as less important or less interesting, but because I regard voice as less studied. While the “grain of voice” (Barthes’s famous phrase) has long been recognized as an affective means of expressivity, what is spoken in cinema is generally treated as if it merely transmits linguistic information. What could be the cause of this neglect?

One might point to the primacy of written scripts and their conventionality. Granted, except for limited cases of improvisation, the majority of words spoken in cinema are indeed scripted in advance. What we hear in the movie theater, one might propose, is nothing but the result of laborious rehearsal whose effect ultimately depends on the quality of some preexisting text, compared to which the esoteric craft of delivery¹⁹ seems secondary if not tangential. The spontaneity that we take for granted in real life is hardly present in the way people speak in narrative fiction films. Sarah Kozloff forcefully describes this difference:

In narrative films, dialogue may strive mightily to imitate natural conversation, but it is always an imitation. It has been scripted, written and rewritten, censored, polished, rehearsed, and performed. Even when lines are improvised on the set, they have been spoken by impersonators, judged, approved, and allowed to remain. Then all dialogue is recorded, edited, mixed, underscored, and played through stereophonic speakers with Dolby sound. The actual hesitations, repetitions, digressions, grunts, interruptions, and mutterings of everyday speech have either been pruned away, or, if not, deliberately included. Less time is devoted to the actual functions of everyday discourse, such as merely establishing social contact (what Roman Jakobson calls “the phatic function”) or confirming that a conversational partner is listening attentively. Although one cardinal rule of real conversation is that speakers should not tell each other what the other already knows, film dialogue is often forced to smuggle in information merely for the viewer’s benefit. Because the words are in truth directed at the filmgoer, not at the on-screen conversationalists, each word does double duty, works on double layers.²⁰

Kozloff makes two key points: dialogue in film is conventional, and it is directed to the audience. She therefore (quite logically) goes on to study certain film genres where dialogue is highly conventional: the Western, screwball comedy, gangster films and melodrama.²¹ While agreeing in principle on the conventionality of filmic speech, I have some reservations about the way in which Kozloff generalizes from it. Kozloff may have wanted to emphasize the conventionality of genre film dialogue to warrant its value as an object of study, but in doing so she unnecessarily ignores the fact that this conventionality comes in a range of degrees (our daily speech, too, varies considerably in terms of spontaneity) and that we respond differently across that range. Different types of films, too, may produce different effects through the conventionality of their dialogue. A Godard film, for instance, typically does not contain a single line that is not addressed to the audience; a film made by the Dardenne brothers, on the other hand, strikes me as doing so considerably less. Both kinds of film, however, can be scripted meticulously in advance. Does speech in a Dardenne film belong to yet another genre that establishes its “vocal realism” (much as the use of a handheld camera suggests documentary realism) through the use of some “unheard” conventions? Does addressing the audience necessarily make speech less spontaneous? I consider these questions valid points of investigation.

Thus my project, in a way, can be considered the exact opposite of Kozloff’s. Whereas she wants to point out what is conventional and how it becomes so, I emphasize what is not, and why it matters. In a broader view, however, the conventionalization of speech constitutes an integral part of the historiography that I aim to sketch out. To facilitate a truly dialectical understanding of how voice functions in cinema, it is important to see the two faces of Janus at once. It is my belief that, while cinema has certainly established a great number of conventions, its power does not come entirely from conventions. Comparing theater and cinema, André Bazin writes, “the word in the theater is abstract, that, like all of theater, is itself a convention; in this case, of converting action into words. By contrast, the word in film is a concrete reality: at the least, if not at most, it exists by and for itself.”²² In fact, Bazin is one of the first to consider how accents and dialects enhance the realistic appeal of cinema. Commenting on the case of Pagnol, Bazin writes,

…accent is not just a picturesque addition to Pagnol’s films; it is not there merely to inject a local color into the proceedings. It unites with the script, and thus with the characters, to create the essential nature of a Pagnol film. These characters have an accent the way others have black skin. The accent is the true substance of their language and is consequently at the heart of its realism. Pagnol’s film is quite the contrary to theatrical, then: it immerses itself, through the immediacy of language, in the realistic specificity of film.²³

Bazin cherishes, it is well known, an intriguing notion of cinematic realism. Instead of shunning those mysterious aspects of filmic performance that elude conceptualization, he seizes upon them as opportunities to refine his formulation of what cinematic realism is. The complexity of his notion, as Dan Morgan²⁴ and Tom Gunning²⁵ have recently demonstrated, is often vastly underestimated. My proposal is related to Bazin’s, in that I, too, am arguing for a “realistic specificity of film”, contributed by vocal performance. But to avoid terms such as realism or realistic specificity, laden with misconceptions, I propose we call it perceived authenticity. For this realistic specificity is not located within the film, but rather between the film and its audience. It is a perceived quality, as realism has always been. My definition, in a necessarily convoluted form, goes as follows: by authenticity I refer, with regard to human utterances in cinema, to the characteristics of intonation and delivery, among many other vocal intricacies, that we recognize as not only unmistakably and creatively supporting the diegesis, but also intensely gratifying in supplying indices of an idiosyncratic way of speaking. And by perceived I mean that this recognition is conditioned, or mediated, by a myriad of factors such as linguistic competence (in Chomsky’s definition), cognitive attention, habituation of style, etc. In other words, the more one perceives a voice to be rich and spontaneous, to have been uttered by a unique individual and to correspond to a particular occasion that contributes to our overall comprehension of the situation in a film, the more one tends to grant it this perceived authenticity.

Here is a quick example to illustrate what I mean. When Michel Marie, writing on La Chienne, makes the observation that the concierge’s testimony in the trial is “unspeakably funny”,²⁶ one really has to hear it to understand what this means and why the same effect cannot be conveyed by reading the script²⁷ or duplicated by an actress trained at the Comédie-Française. The way the concierge speaks does not follow any convention (it deviates considerably from theatrical speech, nor does it resemble what was regarded as comic in stage and film performance of the time), but it convincingly renders a tension between her native social class and the circumstance in which she is asked to speak. It is not merely authentic and funny, but “authentically funny.” In an attempt to speak “good” French, to make her tongue presentable to the court, she stretches every syllable so as to articulate it better, only to make the whole sentence incredibly monotonous, as if a kindergarten child were asked to recite a poem by Mallarmé. Her labored use of the past conditional, a rather literary use of language, slips quickly into colloquial incoherence. The voice exhibited here leads us to a heightened sense of authenticity, for it is highly unconventional, but still offers a textual richness justified by our knowledge of the language and of the film. By way of comparison, in Alain Resnais’s Smoking/No Smoking (1993) the same pair of performers (Pierre Arditi and Sabine Azéma) each play four to five roles that have different temperaments or belong to different social classes. The believability of such impersonation is conveyed through visual appearance (costume, hairstyle and kinesics). The two also attempt to develop a particular way of speaking for each role, the effect of which is hardly convincing. This instance seems to confirm the comparatively inviolable nature of voice.²⁸

The term authenticity, as it is a highly charged one in the humanities, demands further elaboration. First, my definition of authenticity has a certain connection with, but is nevertheless distinct from, the philosophical implications of sound recording, which have been a key issue, or even ethos, in early sound theory and practice. Rick Altman criticizes what he calls the “reproductive fallacy”, the naive belief that sound recording can, and should, take it upon itself to reproduce faithfully a sound in all its dimensions. James Lastra comprehensively extends this line of thinking by raising the stakes to a mutually permeating circle of technology, representation and human senses. My use of the term draws upon the above and focuses on a particular kind of sound: the human voice. This narrowing of the scope is what makes the revival of the term possible: the sound of a gunshot,²⁹ for instance, apparently demands a different term than authenticity, for it entails a different context of listening where the unique physical qualities of sound do not matter much as long as they offer a functionally equivalent meaning. This is decidedly not the case with the human voice. Here I would argue that authenticity plays an important role in the cultural mechanism built into cinema, that is, an institutionalized, technologically mediated form of communication. As with many other filmic conventions, it is essential that recording engineers, performers and the audience share a good deal of what is regarded as authentic and spontaneous in a voice, so that the apparatus can be fine-tuned, performances stylized, and this quality recognized and appreciated by the audience.

Secondly, authenticity is not automatically guaranteed by drawing from, or capturing, the diegetic reality, even if we assume the apparatus has achieved maximum fidelity there (notwithstanding the problematic nature of the very notion of fidelity). Simply put, authenticity is not about rendering a preexisting reality, but rather about how elements of style and technology build a new reality that is the cinematic world. How do we know a vocal performance is authentic? We may be tempted, as Kozloff suggests, to regard our daily conversation as the point of reference: casual, overlapping, sometimes hesitant, with mistakes and lots of phatic units. But someone habituated to giving formal speeches (professional politicians, star professors) may speak in the absence of all these qualities, yet still display a genuine spontaneity, even if it is carefully rehearsed. In talking about authenticity, one need not embrace an indexical realism, with all its ontological burdens, for a film builds its own world where every object, including speech, has its aesthetic context and particular way of functioning.³⁰ One would hardly criticize Laurence Olivier’s reciting of a line in Shakespeare as inauthentic merely because it does not resemble how one speaks in the real world. The criterion would instead be how the cadences in the delivery correspond to and enrich the particular situation presented to us in a film called Hamlet. Even in daily life, we habitually evaluate the naturalness of human utterances based on a series of criteria: the speaker’s social status, the context of the speech, our own linguistic and professional competence, and last but not least, our judgment of the intention of the speaker. For a film, one must also add the mediation of mise-en-scène, cinematography, narrative strategy and so on, all of which have a palpable influence on our perception of voice.

Finally, as Lastra has argued, the effect of authenticity is nothing but the product of its “historically defined and mediated conditions.”³¹ Edison’s tone tests, which were performed in thousands of iterations between 1915 and 1925 for millions of Americans, would hardly fool a present-day audience. Similarly, having been bombarded by almost a century of sound cinema, one can only imagine how strange and exciting it was to hear Al Jolson’s “you ain’t heard nothin’ yet!” for the first time. Authenticity needs to be anchored to a particular historical moment, to a particular situation of perception. One’s schema of listening can indeed change within a very short period of time, under the spell of one single film or a body of films. The authenticity of voice, in a sense, is truly something that “now you hear it, now you don’t”. What I am fascinated by, ultimately, is how every technological and aesthetic breakthrough in the way voice is represented reawakens and rekindles this desire for authenticity, a desire that speaks to, if we listen to Bazin, a fundamental psychological need that is our “faith in reality.”

Raising the Voice: Vocal Performance Before 1927

The voice made a big hit in 1927. But it had also made frequent appearances before 1927. The paradox is that the so-called silent cinema (the French term muet describes better what is lacking here), though remembered as a period marked by the absence of speech, has been revealed by recent decades of early cinema scholarship to be particularly noisy,³² with all elements of the film soundtrack present, only in different forms. What are the voices of silent cinema? In the following I will address two particular kinds.

The first kind is that of the lecturer, whose existence was largely forgotten and denied until the film-history turn of the 1980s. Tom Gunning, in an essay cogently titled “The Scene of Speaking”, summarizes the research conducted after the initial discovery. The global presence of the lecturer, as Gunning says, is a rather recent discovery by film scholars, partly thanks to previous knowledge of the benshi tradition in Japan. The archival research along this line is ongoing; many gaps need to be filled and continents explored. But the issue of the lecturer has a broader theoretical significance. Noël Burch, for instance, in his much-contested study of Japanese cinema, uses the benshi as a key piece of evidence to advance his ambitious theoretical framework of opposing modes of representation in cinema: distanciation vs. illusion; avant-garde vs. mainstream; the PMR vs. the IMR. Subsequent research revealed that the benshi practice may not be as Brechtian as Burch idealizes, and that what happens in the “West” can also be ideologically “resistant”.³³ Because lecturing has now been established as a global phenomenon, a cultural object (Gunning), the mediation a lecturer exercises between the film and the audience needs to be understood as a dynamic and flexible process: it varies not only from one culture to another, but also from one lecturer to another, and from one film to another. Despite all these variations, two things remain essential to this kind of practice: the lecturer/benshi has a prominent visual presence; and he/she has only one voice.³⁴

The second kind of voice differs from the first in two respects. It is a “behind the screen” approach, as opposed to a lecturer’s “beside the screen” one, and it often involves multiple voices. Rick Altman gives some detailed accounts of this practice, especially the success of the Howe troupe, made possible by its leading impersonator, LeRoy Carleton. According to Altman, their success is attributable to two factors. First, Carleton was able to provide a variety of sound effects and dialogues, all synchronized. Second, the troupe was given a long period of time to perfect any given show, so sound effects were carefully planned and arranged. But if all the sound effects for a given film were well rehearsed, one might ask, why not simply record them and reuse them in subsequent exhibitions? In fact, Howe was initially a phonograph showman and was no stranger to the tricks of recording. But it is precisely for this reason that he knew the limitations of the technology: there was no reliable means of synchronization; the phonograph did not record certain human voices well;³⁵ and without amplification, the sound thus produced could not accommodate an audience of reasonable size. Last but not least, there is also the issue of aura, that is, what a live performance is perceived as offering beyond a recording. Carleton was admired for his artistry, and his performance was valued precisely because it could not be duplicated. This is why the mechanical version of Carleton, that is, sound effects cabinets such as the Soundograph or the Allefex, were regarded as poor substitutes for the real thing. Similarly, Zukor’s Humanovo troupe stressed that it did not use “a phonograph or any other kind of machine.”³⁶ The itinerant troupe was divided into many teams, each devoted to one single film and given the time to rehearse and perfect the dialogue delivery. The only difference between Humanovo and Carleton or Henry Lee is that the former did not rely on a star impersonator, which was certainly less awe-inspiring but more practical.

The voice beside the screen and the one behind the screen constitute two modes of immersion that either disregard or disguise the inherent non-unification of sound and image. Neither of the two is entirely novel in the history of art and entertainment. As is often the case, a new technology merely takes over an old role in the modular configuration of a given form of entertainment and acts as what David Bordwell has called a “functional equivalent” of the old technology. The film lecturer’s immediate predecessors are magic lanternists and the narrators of illustrated songs, travelogues or passion plays. Their voice-overs offer a kind of immersion that does not depend on perceived sound-image synchronization, but instead derives mainly from the tradition of orality³⁷ that preceded literacy.³⁸ This may help to explain why the benshi were able to achieve star status³⁹ in their heyday, for in this mode, filmic images were perceived as secondary to the oral performance.⁴⁰ It is the voice that evokes and dispels the images, instead of the images that contain a world in which voices dwell.

The voices behind the screen, by contrast, can be conveniently compared with various forms of puppet shows, where voices and sound effects do manifest a degree of synchronization. Yet we may call it “coarse grain” synchronization, since it often does not offer (and the context of exhibition generously allows this) the kind of fine-grained synchronization found in lip-sync. Granted, the “dubbing troupes” (their practice is no different from that of a modern-day dubbing company, except for the use of recording technology⁴¹), thanks to their time-consuming rehearsal, are able to achieve a close approximation to the classical configuration of cinema. The ventriloquism effect there is strong (in the early days of sound reproduction an unmediated human voice might even have sounded better in many cases) and there is not much distinction, in terms of spatial orientation, between a loud speaker and a loudspeaker, both of which were located behind the screen. Yet the reason why these human voices were superseded by technologies of sound recording and reproduction cannot be attributed only to issues of practicality and quality control (the inescapable destiny of cinema shifting from cultural performance to cultural industry, where product standardization is the key). Even when lip-sync is somewhat satisfactory, the practice still leaves a lot to be desired: the voice is not perceived as sufficiently authentic.⁴²

What is important to note in the behind-the-screen scenario is that the images now take precedence; voices, as well as other sounds, are perceived as serving them. Or rather, it is the narrative that takes command. Altman has named “the canary effect” or “the cowbell effect”⁴³ the unfortunate occasion where a bird or cow in the background, having little narrative significance, is given a sound that does violence to the story. This illustrates, by way of counterexample, how the rise of narrative calls for sound’s submission. Similarly, while a lecturer may with great eloquence vivify a scene with rhetorical modes of address (which go far beyond the practical function of explication for the illiterate members of the audience), inevitable disruptions come from his/her decentering bodily presence, forced direct quotation and finally, a “mono” presentation of a polyglossic world. The lecturer may manipulate his/her voice to a certain extent to impersonate different characters, but the effect is understandably limited. The voices behind the screen may serve the purpose of narrative immersion better, since they are synchronized, multiplied and disembodied—thus capable of being anchored to bodies on screen and creating “pictures that actually talk”. The craft of Carleton and his colleagues will emerge again triumphantly in the dubbing stage, where modern technology offers a permanent fixture of the impersonating voices. The benshi’s voice, too, is not archaic, as it might lead to a different form of immersion (contra Burch), as the frequent use of voice-over in cinema testifies.

Despite the prominence of benshi in Japan, the global presence of lecturers and the success of some dubbing troupes, the majority (in a strictly statistical sense) of film exhibition before 1927 was indeed “speechless”, or “mute”. The voice only had a marginal presence in cinema. Music, whether in the form of orchestra, piano or some ingeniously conceived mechanical devices, was the dominant aural presence in film screenings. Even in cases where the voice did flourish, whether in the form of lecturer or impersonator, it was perceived as explicitly diverting from the fictional world. What the voice offered in the case of benshi may be characterized as—to play with Chion’s famous term—an “added value”. Indeed, I am tempted to suggest that the benshi case may be called a “musical” use of voice, since its hermeneutic role is not dissimilar to that of the music in silent cinema. The Jazz Singer, however, signals the advent of a different kind of voice, one that operates on completely different principles. It functions through a mechanism of alignment, whose power not only raises the voice’s status as opposed to other sounds, notably music, but also triggers a sustained interest in the voice, which completely transforms the industry. This sudden proliferation of human voices in cinema should be treated as an event connected to, yet entirely different from, the technological capacity to play pre-recorded sound with film projection in an acceptably synchronized fashion. The so-called “coming of sound”, therefore, is better divided into two phenomena: the coming of synchronization and the coming of voice. There is naturally a causal relation between the two, for what could be a better demonstration of perfect synchronization than a speaking mouth, somewhat disdainfully depicted in Singin’ in the Rain? Music, on the other hand, apart from limited circumstances (such as a stinger⁴⁴), does not require precise synchronization.
This is another reason why, while sound technology was initially envisioned to record music rather than the human voice, the voice soon became the real star, to the extent that it completely overshadowed and (briefly) exiled music.

The sensational effect of Al Jolson’s voice, scarcely used as it is in the film, is twofold. On the one hand, the audience gets to hear Al Jolson’s true voice, not that of some anonymous impersonator behind the screen. On the other, Jolson’s vocal performance in The Jazz Singer is quite spontaneous, no doubt due to its improvisational nature. It is in fact considerably more authentic than the performances in many part-talkies or all-talkies that came out much later and subscribe to a theatrical mode of speaking. Both effects contribute to a high level of perceived authenticity of the voice, which retrospectively sounds decidedly atypical in the context of early sound film. Ultimately, it is this high level of perceived authenticity of the voice that can explain not only voice’s status of attraction for nearly two decades, but also its permanent installation in cinema. It is not the voice, but rather a perceived authenticity of the voice, that is truly unique and uplifting and that was previously unavailable, as if a much desired dream (although one might not know what one desires) now suddenly became a reality.

But how did the audience (especially those who hadn’t been to Jolson’s live performances) “know” it was Jolson’s voice? This “knowledge,” or rather, “faith,” depends on many things. One of the most important of them is perhaps that the sound recording technology used at the time, or rather, its technical limitation, was still perceived as a guarantee of authenticity, just as photography was regarded initially as inherently truthful—“the apparatus can’t lie!” A brief period of ontological unison of sound and image strengthens our belief in the voice. Whether it is a Movietone newsreel or a Vitaphone vaudeville piece, it is to be understood that a voice we hear truly originates from a body we see: a celebrity (George Bernard Shaw, Mussolini, President Roosevelt), or a famous performing artist (an opera singer, a stand-up comedian). The attraction of going to a sound film with Greta Garbo is precisely: “Garbo speaks!” Interestingly, this taste for authenticity, once acquired, leads the industry and the audience alike to reject impersonation, albeit temporarily. At least for a while, to use someone else’s voice is considered a bad practice and guarantees disasters such as those depicted in Singin’ in the Rain. John Gilbert, therefore, cannot resuscitate his career simply by having someone else’s voice—it has to be his voice or nothing at all.

MLVs and the Decline of Voice’s Authenticity

Rudolf Arnheim once claimed, “A work of art is not a shirt with removable sleeves.”⁴⁵ Yet the history of the talkies offers precisely a counterexample. For a period of time, if a film was to be distributed in different languages, it had to be shot several times, with different casts (sometimes including a different director), to ensure that every voice belonged to its true owner. Polyglot actors could stay and speak their lines in multiple language versions (MLVs); but otherwise they could only appear in one version of the film.⁴⁶ This hectic⁴⁷ and expensive practice was the main strategy of foreign export from 1929 to 1931 before it eventually yielded to dubbing and subtitling. So far the history seems pretty straightforward. But one vexing question remains: why didn’t Hollywood studios (led by Paramount the evangelist) jump at the technical possibility of dubbing and immediately change their operation in Joinville?⁴⁸ The 1929 Hallelujah uses extensive post-synchronization, which signaled the availability of acceptable, if not perfect,⁴⁹ dubbing. Yet Paramount’s European center only began operation after March 1930. UFA, the American studios’ main competitor in those years, exhibits the same pattern of behavior. Films were already dubbed as early as 1929,⁵⁰ but of all the features produced in Germany between 1931 and 1932, “20 percent were subbed, 10 percent dubbed, and 70 percent were MLVs.”⁵¹ The 1931 Kinematograph Year Book reflects on the practice: “the problem of ‘foreign versions’ is still a vexing one. The solution has not yet been found…the belief that the ‘foreign language problem’ will be solved only by producing special versions, with players imported from Europe to Hollywood, is growing.”⁵²

MLVs represent a fascinating moment in the history of cinema where, instead of the usual post-production tweakings (intertitle translation, inclusion or exclusion of specific scenes), a new mode of production was introduced to solve a problem in distribution. Naturally, MLVs were not the only option adopted by a film industry facing this sudden problem of foreign distribution. Films were shown in their original version, with detailed synopses. The soundtrack was sometimes stripped off, dialogue scenes cut short and intertitles reinstalled—effectively turning a talkie into a good old-fashioned silent. But the industry’s preference for producing multiple language versions suggests that this practice possessed at the time a significant advantage over its alternatives. All historical accounts indicate that the unification of the voice and the body yielded a superior perceived authenticity of voice, which was regarded as a highly desirable feature at the time, whereas dubbing was widely condemned. Ginette Vincendeau writes that,

\[...\]

like the Siamese twins, Face and Voice are inseparable, the death of one implying the death of both.” Dubbing upset the feeling of unity, of plenitude, of the character, and thus the spectator position. Moreover, it produced in the contemporary audience a feeling of being duped. A trade paper announced in June 1930: “Dubbed films are easily recognized as such by audiences who daily get more sophisticated.” Dubbing was on the whole accepted only because of its novelty, and even then it was considered that it would “go for a while on the novelty angle” but would soon be found unsatisfactory on account of poor synchronisation.⁵³

\[...\]

the practitioners of dubbing would be burnt in the market place for heresy. Dubbing is equivalent to the belief in the duality of the soul.”⁵⁴ René Clair, too, wonders why “actors, who generally show so much concern for their glory and take such pains to safeguard ‘the dignity of the acting profession,’ passively accept a degrading practice that is the very negation of their craft.”⁵⁵ Contemporary reflection on the issue shifts the object of condemnation from “witchcraft” or “the dignity of the acting profession” to the integrity of cultural identity. Shohat and Stam argue, “to graft one language, with its own system of linking sounds and gesture, onto the visible behavior associated with another, then, is to foster a kind of cultural violence and dislocation.”⁵⁶ Sometimes this “cultural violence and dislocation” can have long-term effects. Mark Nornes recounts the shock experienced by Japanese audiences when they were offered a “carefully” dubbed version of Raoul Walsh’s The Man Who Came Back (1931), where the familiar faces of Charles Farrell and Janet Gaynor are grafted with Japanese voices with a thick Hiroshima accent (“plucked”, as he describes it, out of the Little Tokyo community in LA), and concludes that this “may have been a deciding factor in the standardization of subtitling in Japan. Had Fox conducted its initial dubbings in Japan, this might have been a very different story.”⁵⁷ Even now, when dubbing has been established as common practice and audiences are supposed to have long been habituated to it, it can still produce jarring effects, especially for bilingual audiences who have been exposed to the original version. The case is at its worst when the dubbed voice is inherently incompatible with the image.
Stam for instance offers the example of the “collision of cultural codes associated with Brazilian Portuguese (strong affectivity, a tendency to hyperbole, lively gestural accompaniment of spoken discourse) and those associated with television-cop English (minimal affectivity, understatement, controlled gestures, a cool, hard, tough demeanor).”⁵⁸

While MLVs cannot resolve all of the above difficulties—the plausibility of the plot, moral inclination, character psychology, etc. cannot be safely translated into another language without distortion—they do offer something to alleviate the Babel symptom. It is a relief to the audience, knowing that the voices one hears are authentic, in the sense that they belong to the bodies one sees; whether the overall film turns out to be satisfactory or not is entirely another matter. As I mentioned in the previous section, the status of the new voice as an attraction fostered a heightened awareness of the acoustic quality of the voice, and of its precarious anchor to a body. That this preference caused a delayed acceptance of dubbing is almost universally agreed upon. In the short but prosperous life of MLVs, voice struggles to maintain its bodily tie. MLVs represent the belief that the authenticity of voice is so desirable that it outweighs other qualities of the film. And it is precisely the gradual decline of the said authenticity that triggers a change of balance, because a now depreciated voice can no longer redeem the costs it entails.

My characterization of the ebb and flow of the voice in its transition years is not meant to underestimate the complexity of this period. On the contrary, it intends to offer a much needed and nuanced model of the evolution of cinema seen from the perspective of voice. The issue of MLVs presents yet another challenge to the teleological historiography that dictates that cinema always moves on to something better or something more essential, which Tom Gunning has termed, respectively, the evolution assumption and the cinematic assumption.⁵⁹ With the benefit of hindsight, we may claim rather conveniently that MLVs point to a dead end. Yet a teleological account would have to struggle much to explain why MLVs, clearly a more authentic mode of production, and perceived as such by their contemporaries (including both the industry and the audience), would eventually fall into oblivion.⁶⁰ Previous accounts of this inexplicable affair center on the less-than-ideal profitability of the practice. Ginette Vincendeau for instance articulates the inherent weakness of the MLVs’ mode of operation as: “MLVs were, on the whole, too standardised to satisfy the cultural diversity of their target audience, but too expensively differentiated to be profitable.”⁶¹ But the same observation can be made of Hollywood productions in general, which have always been highly standardized yet manage to satisfy (to the extent of dominance!) their global and culturally diversified audience. And Hollywood does this precisely by differentiating the strategies needed for the domestic and foreign markets. While standardization is a necessity for the industry and its rather stable domestic market, Hollywood has always been more than diligent in its effort to adapt its product to the global market. Localization has always been a priority in the producer’s mind and it routinely influences if not determines the final form of the product.
Ruth Vasey presents voluminous evidence of the extent to which the industry’s self-regulation shaped the content of films, so as to make them salable in as many markets as possible.⁶² In this context the idea of making the most of one package of movie material (script, décor, crew, etc.) for multiple markets is in fact a reasonable one. Hollywood producers must have believed that by streamlining the process and offering a better product, they could do away with this nagging problem and keep on doing good business—in theory all this should work!

What also needs to be acknowledged is that MLVs as a mode of production were not an exclusively American affair. In fact, recent scholarship has challenged the traditional tendency—if there is any settled tendency at all—to depict MLVs mainly as such. Indeed, European companies practiced MLVs before and long after their Hollywood counterparts. The first MLV, Atlantic (1929), was in fact a British production, aided by German émigrés (it was made in English and German first, and then French). UFA’s MLV production extends to 1939, when Hollywood had long been firmly converted to dubbing.⁶³ Even after the 1930s, MLVs were still being made, albeit sporadically.⁶⁴ In fact, the idea that multiple films in different languages tailored to different national markets can share the same plot remains a plausible one even today in particular contexts. The lasting appeal of MLVs can be seen in the numerous remakes in contemporary filmmaking, which in a broad sense can be called multiple language versions of the same film. Think, for instance, of Roger Vadim’s Et Dieu créa la femme (1956, 1988) or Michael Haneke’s two versions of Funny Games (1997, 2007). And it is not unusual to have one actress or actor play the same role in these versions.⁶⁵

Another site of intense complication is this: although I do see dubbing or voice doubling⁶⁶ as the opposite of voice-body unity (as did a considerable portion of the early sound film audience), I do not wish to advocate a naive belief in this ontological unity. If dubbing as a form of deviation is inherently disruptive to cinema’s ontology, the degree to which it is perceived as such varies. With the help of skillful dubbing artists, increasingly sophisticated technology and the audience’s happy ignorance of the voice being tampered with,⁶⁷ it is very much a resolvable disruption. As one reviewer put it, “In the matter of voice-doubling in the movies … there can be only one crime, the crime of being detected in your voice-doubling. The creed of the dumb movie player might well be, Let not thy public know what thy voice-double doeth.”⁶⁸

The now prevailing practice of looping, sometimes referred to as ADR (automated dialogue replacement), often uses the same actor or actress for the voice. What is violated, therefore, is not the voice-body bond, but rather, their unified site of enunciation. This is perhaps the least disruptive route, noticeable only to an auditor who has intimate knowledge of sound recording. On the other hand, a true voice, recorded onsite, can be perceived as disruptive for many reasons. Ultimately, what an audience perceives as authentic in a voice depends on the auditor’s extra-cinematic knowledge, linguistic competence and habit of listening to films. Katherine Spring recounts an investigation geared toward the authenticity of Richard Barthelmess’s singing in Weary River (1929), where “a committee of local musicians was appointed to analyze and assess the matching of Barthelmess’s physical performance with the sonic qualities of the voice on the soundtrack.” He was found guilty on the grounds that “the intensity of breathing so essential to singing is entirely lacking” and “Barthelmess moves his lips with no movement at all in his throat.”⁶⁹

A dubbed film is most jarring if the auditor knows the original language that is supposed to be spoken; otherwise it would be merely a tolerable nuisance compared to a wall of opaque sounds created by foreign languages. The loss of authenticity is compensated for by easy access to intelligibility, for reading subtitles demands considerably more cognitive effort. The contemporary practice of dubbing and subbing in Europe and Asia is most revelatory. One observation captures it well: there are dubbing countries and subbing countries, and there is one country that does remakes. Although the case is complex, there seems to exist a general rule: in monolingual countries of considerable population (where domestic circulation can easily sustain the cost of dubbing) everything is dubbed; in countries with multiple official languages or with small populations, dubbing is reserved for children's films and everything else is subbed.

The Taming of Voice: From Freedom of Speech to Speaking Degree Zero

The issues of MLVs and dubbing are closely related, historically and theoretically, to those of accent. Our linguistic competence includes a natural (or acquired) ability to detect a foreign accent and a varying degree of knowledge of the dialectal variations of a language. In hearing filmic speech, accent involuntarily becomes an indicator of geographical and ethnographic region and, sometimes, social class. The detection of an accent unjustified by the diegesis thus potentially leads to unwanted distraction. This is the primary reason why, in the heyday of MLVs, most American studios opted to establish their production facilities in Europe (Paris, London, Berlin), where talent with the proper accents could be easily obtained.⁷⁰ Some of the supposedly polyglot actors may have been able to speak several languages well, but with an increasingly demanding audience, nothing less than a perfect accent was required. A Variety writer reports on Adolphe Menjou’s susceptibility to this rigorous test,

The trouble is that Frenchmen now want their films, where French dialogue is spoken, without any foreign accent whatsoever. Menjou’s French is very good, but his accent is nevertheless felt at times.⁷¹

For the American film industry, the same issue emerged in the domestic market as well as in the international market: to a party celebrating the voice, accent came as a sort of uninvited guest. Vincendeau describes,

The necessity of showing films in the language of their country of exhibition confronted Hollywood with the ethnic, linguistic and cultural diversity of its audience. Suddenly studios were aware that Latin American audiences did not appreciate films in Castillian accents, that British accents provoked mirth in the Midwest, and that in the Midlands Yankee voices seemed equally funny. Hollywood was also alerted to a large ethnic range on its home territory.⁷²

The audience’s far from indifferent reaction to accent (sometimes hotly debated in fan magazines and other venues) indicates a heightened awareness of not only the quality of voice, but also the ways in which voice works in conjunction with the image, namely, how a voice is anchored in the diegetic world. Within a few years, however, the public lost interest in the topic. One can interpret this loss of interest as the result of a topic having run its course, of the audience’s perceptual habituation, or of changes brought to the films themselves. The transition to sound was a speedy process, considering the scale and complexity of its operation. Yet even this brief period shows different attitudes with regard to the presentation and reception of the accented voice.

In some of the early sound films accents are defiantly prominent compared to the tamed voices heard later. Applause (1929), for instance, sounds surprisingly unrestrained compared to the majority of films made shortly after. Similar cases can be found in French cinema. Jean Renoir, a tireless explorer of human vocal expressivity from the start, brings his vocal realism to a culmination in Toni (1935), famous for its faithful rendering of the potpourri of distinct dialects of the immigrant workers. In L’Atalante (1934), Dita Parlo is supposed to be a French country girl, despite her German accent.⁷³ Michel Simon, too, has a heavy Vaud⁷⁴ accent, which does not correspond to his identity in the film. If only several years earlier speaking proper French without accent had been important to the French audience—to this day French cinema generally speaking does not tolerate non-French French (Quebecois is an example)—why this sudden linguistic tolerance? Is the case of L’Atalante exceptional?⁷⁵

In retrospect, it seems to me that there may have existed a brief period when sound cinema was allowed a certain degree of “freedom of speech”. In this period, I propose, vocal diversities flourished, not by any artistic intention, but as a result of the genie being released from the bottle. To furnish substantial details that fully support this claim would go significantly beyond the scope of cinema scholarship.⁷⁶ As with many transitions in the history of cinema (e.g., from attraction to narrative integration), it is impossible to draw a hard line that fits all, for the length of this period, as well as the degree of liberty taken, varies from country to country, case to case. Nevertheless, my proposal is not exactly contentious. Most would agree, I believe, that when voice was first introduced as an integral part of the film, neither actors nor filmmakers knew exactly what to do with it. Actors and actresses whose faces had long been accepted and adored by the audience all of a sudden discovered that their voices might be potentially disturbing, either because they were accented, or because they simply did not “record well”. It was a period of intense experimentation, where rules for vocal performances were borrowed from other forms of entertainment, consolidated or established from scratch. What resulted was a wild diversification, an uninhibited proliferation of voice.

Again, freedom is only a fact retrospectively recognized. Indeed, this freedom of speech did not last very long and was counterbalanced from the very beginning by a force of normalization. While the raw quality of voice may be thrilling to hear, and constitutes a novelty in itself, the need to tame it was felt immediately, and most urgently. Cinema’s taming of its own voice happened on two fronts. On the one hand, one observes the global rise of a standard, theatrical and neutral speech within each national cinema, where signs of regionality are minimized. On the other hand, conventions of dialects, accents, and vocabularies are established for specific genres. Kozloff writes,

Throughout the 1920s, 30s and 40s Theatre Speech or Transatlantic was taught in America’s professional acting schools. It represented a neutral dialect that borrowed from both Standard British and Standard American pronunciations. . . . Standard American is that variety of American speech that is devoid of regional or ethnic characteristics and does not reveal the geographical or cultural origins of the speaker. . . .When talking films were introduced in 1927, actors wishing to work in the movies rushed to obtain instruction in this elevated mode of pronunciation.⁷⁷

Michel Marie suggests that the same idea is applicable to French Cinema,

In short, what Renoir's cinema rejects, from 1931 on, is a “neutral” French, the "zero degree of spoken French" that in fact is only the speech of the Ile de France, and more specifically that of the intellectual bourgeoisie; an "unmarked" speech because it determines the norm, the "standard of reference". This anemic language that is taught in the Conservatories reigns unconditionally over the speech of the French cinema today; it is the speech of "dubbers", of all the actors who do post-synchronisation. Only those films shot with direct sound, those of Jacques Rivette for instance, escape this domination.⁷⁸

The transformation of an accented speech into a neutral one effectively eliminates all references to one’s ethnographic and class identity. In the beginning of My Fair Lady (1964), Professor Higgins boasts that “Anyone can spot an Irishman or a Yorkshireman from his brogue, but I can place a man within six miles; I can place ‘im within two miles in London, sometimes within two streets.” But when Eliza Doolittle starts to pronounce her “AEIOU” in a perfect manner she stops being a lower-class girl selling flowers at Covent Garden. In fact, the Hungarian language expert who specializes in detecting a speaker’s origin from accent—a pupil of Professor Higgins on the subject of phonetics—claims she speaks English too well to be English; therefore she must be a foreigner who learnt English well—a Hungarian princess.

Dialects and accents are not, however, entirely suppressed; rather, they are domesticated through a process of conventionalization. Dialects and accents are allowed to exist in a tamed state of being, because they now become part of a genre convention. Sound cinema has created new film genres. And every such new genre soon finds itself a distinctive way of pronunciation, a vocal signature of its own. Dialects and accents prove to be extremely effective in creating a particular kind of realism that film genres need.⁷⁹ Onto the base layer of narrative comprehension, a dialectal inflection is often added to convey the ethnographic background of the speaker. The frequent use of Italian and Irish accents in American films no doubt owes much to their first appearance in gangster films. Jonathan Munby writes,

What makes Little Caesar special is not that we hear a gangster talk but that he talks with an accent. Although it is a case of a Jewish-American imitating an Italian-American—thereby false, inauthentic—the accent itself strengthens the authenticity; it is an added layer of signification which the film can use and the audience can access. His speech makes us aware that the act of narration is conducted from a specific angle, “a specific cultural space; the accent frames his desire for success within a history of struggle over national identity.”⁸⁰

On the other hand, as the opposite of this culturally low register, we have an added scene in Scarface that is most “audible”, a diatribe on gangsterdom in the office of a newspaper, by its “distinctly Anglo tones.”⁸¹

In American cinema, Kozloff observes, genres have a linguistic pattern that is quite comparable to plot construction, character depiction, acting style and numerous aspects of cinematography. For the Western,

\[...\]

In APWD all women are addressed as ‘ma’am’, all strangers are referred to as ‘pardner’, horses are ‘ponies’, homes are ‘ranches’, meals are ‘chow’, clothes are ‘duds’, a gun is a ‘piece’, employees are ‘hands’ or ‘boys’, Indians are ‘injuns’, ‘bucks’ or ‘squaws’, hello is replaced by ‘howdy’, think and/or believe folded into ‘reckon’, thank you covered by ‘much obliged’. Along with a specialized and instantly recognizable vocabulary, western characters commonly employ an informal pronunciation and syntax: ‘git’ instead of ‘get’, ‘gonna’ instead of ‘going to’, ‘fella’ instead of ‘fellow’, ’evenin’ instead of ’evening’.⁸²

As Kozloff points out, although the “all-purpose western dialect” is based on various dialectal inflections that really existed, the dialect itself is entirely fictional, with little claim to historical authenticity. But insofar as it is consistently used in the genre, it constitutes an alternative reality, a cinematic worlding device, that excels at bringing the audience into the “genre land.” It is a convention not backed up by linguistics, but rather invented by, and circulating within, cinema. Once established, this fictional language can then be referenced by other fictions, thus contributing to an intricate network of linguistic exchanges specific to cinema. Screwball comedies, for instance, specialize in a speech pattern Kozloff calls “eastern upper class, spiced by urban slang.”⁸³ And the Western accent and idioms can then be contrasted with this verbal dexterity and linguistic sophistication.

From Polyglot Films to Polyglossic Cinema: Language Strategies and Aesthetic Effects

An alternative solution to the problem of foreign distribution, in addition to those already discussed, is the polyglot film, that is, a film in which multiple languages are spoken. Some of the earliest sound films were already polyglot: Kameradschaft (1931), Allô Berlin ici Paris (1932), and the less well-known Niemandsland (1931).⁸⁴ Bilingualism soon became an effective symbol of national conflict in war films such as La Grande Illusion (1937). Today, the polyglot film remains a staple of European production. Recently Chris Wahl⁸⁵ has proposed the idea of a polyglot genre, which he further divides into a series of sub-genres: episode, alliance, globalisation, immigration, colonial, existential.

To what extent can the polyglot film function as an alternative solution to the problem of foreign distribution? And what kinds of advantages and disadvantages does this solution offer? If a film speaks several languages, does it mean it can be automatically marketed in countries where these languages are spoken? Although a film made in multiple languages is apparently comparable to a film in its MLV form (that is perhaps why Vincendeau characterizes the polyglot film as the third category of MLVs), polyglot films and MLVs are in many ways worlds apart. MLVs remain a historically ephemeral⁸⁶ phenomenon while polyglot films will always be made, as long as there are different languages in this world. The so-called different versions are essentially different films that share some of their production resources (plot, cast, set, etc.) but each of them demonstrates a linguistic homogeneity. The defining characteristic of the polyglot film, on the other hand, is its linguistic heterogeneity. An MLV film carefully disguises the babel nature of the world and makes its language transparent to its target market; a polyglot film mirrors linguistic barriers in the real world and showcases their differences.

What is carefully evaded or disguised in MLVs takes center stage in the polyglot film: the question of language strategy. What I call the language strategy is a form of negotiation that involves three parties: the film, the diegetic world that the film presents, and the target audience. Unless a film is produced by a linguistically homogeneous community and circulated only within that community, problems of communication and representation will arise. On the one hand, a film needs to convey the necessary narrative information even when its audience may not speak the language or understand all its nuances; on the other, the authenticity of performance is at risk if the linguistic reality of the diegetic world is misrepresented. The film, sandwiched between the world in front of its camera and the one in front of its projector, wants to please both but may not be able to. It is in response to this dilemma that different linguistic strategies are proposed, each of which offers its own set of benefits and trade-offs.

One strategy may be called the “intelligibility strategy.” This strategy tailors the dialogue so that the majority of speech is delivered in the language that the target audience speaks, while foreign languages are used only as embellishment. The foreignness is there to convey a sense of authenticity, but what the audience does not understand is either unimportant, deducible from the action, or paraphrased by a character using the home language. This is the path that Hollywood has walked all too often. Take the opening of Design for Living (1933), where Miriam Hopkins and Gary Cooper exchange awkward French about Fredric March’s nose. It matters little if the target audience does not understand the dialogue, for Lubitsch’s masterful mise-en-scène is graphic enough to convey the gist of the scene: instant attraction between the sexes. A more typical (and less inspired) use goes like this: an American is often inserted into the exotic setting to explain everything to the audience. Yet one hears the foreign language only in the first five minutes. Soon everyone is speaking English, naturally with a foreign but somewhat charming accent.

Allô Berlin, ici Paris features a rather even linguistic distribution of French and German, with neither language perceived as dominant. It tackles the babel problem by using a considerable amount of parallel speech, often strengthened by identical visual treatment. These segments act like an illustrated bilingual dictionary in which speakers of either language can comprehend and, if they so wish, learn the corresponding expression in the other language. There are also long segments reminiscent of silent filmmaking, where actions are carried out without resorting to speech (the button episode⁸⁷ is but one example among many). There might still be sections whose meaning cannot be conveniently guessed, which create potential gaps in the narrative. But the film can get away with them because the narrative itself is constructed rather episodically: a series of colorful vignettes are loosely threaded together to form a romantic narrative. Indeed, filmmakers often use such linguistically opaque passages to facilitate subjective projection, reasoning that the baffling experience of the film’s protagonist might well correspond to that of the audience.⁸⁸

Kameradschaft, however, creates little parallel speech through montage, and one character hardly ever repeats what another has said. More often than not, one cannot guess what is happening without a proper knowledge of the two languages spoken in the film: French and German. The film is therefore exemplary of the “authenticity strategy,” which retains the foreignness of languages and does not cave in to the audience’s potentially limited linguistic competence. For this kind of film, a certain amount of linguistic opacity is inevitable. Yet to alleviate the situation, as well as to convey crucial narrative information, a polyglot character is often needed who voluntarily undertakes the task of translation. In Kameradschaft this role is undertaken by Kasper, while Niemandsland’s even more ambitious range of languages (English, French, German, Yiddish) is mastered by an African American soldier (Louis Douglas) serving in the French colonial army, whose background is that of a travelling vaudeville performer. For the English-only audience, subtitles help comprehension, but they also eliminate the linguistic signs that allow one to easily identify a segment’s geographical location. This is especially true of Kameradschaft, where the French side and the German side frequently alternate, often with similar decor. Complaints would certainly pile up, were it not for the fact that this very linguistic opacity happens to be the message to be delivered: the miners can communicate without the help of language, through an instinctual sense of solidarity. In Niemandsland, too, the linguistic barrier serves the purpose of demonstrating the absurdity and inhuman nature of the war.

The reality of filmmaking demonstrates a mixture of the above two strategies to varying degrees. The Saga of Anatahan (1953) presents a fascinating example of how this mix can be pushed to an extreme: instead of occasional embellishment, the film retains throughout an opaque but perceivably authentic use of language for its intended Anglophone audience. A disembodied voice-over in English (Sternberg’s own), however, compensates for the alienating effect. It is there to explain and comment on the action and dialogue, just as a benshi would do; one might say that Sternberg recreates the benshi experience for the Anglophone audience. And this benshi, we should add, has control over the editing, so he does not have to compete for attention with the other voices. Like that of the benshi, Sternberg’s voice disrupts the audience’s total immersion in the diegetic world. But the soothing voice-over also lends a peculiar kind of intimacy, which helps the audience navigate an otherwise entirely alien world of harsh, unintelligible sounds. Compare this with a recent film with a similar setting, Letters from Iwo Jima (2006). The Eastwood film opts for subtitles and, arguably as part of its artistic ambition, keeps the foreignness of the filmic world intact. Finally, Noël Burch makes the observation that for a non-Japanese speaker, the language has a musical quality that resembles Schoenbergian Sprechgesang.⁸⁹ The incomprehensible Japanese therefore ceases to be speech and becomes background music.

Another interesting example of conflicting strategies can be found in Deserter (1933). The first hour or so of the film takes place in Hamburg, where German shipbuilders (played by Russian actors) go on strike. But what language do they speak? Russian: the audience needs to know what is going on. The moment the Germans come to Russia, however, they start to speak German, for now they need to distinguish themselves linguistically from the diegetic Russians. This abrupt change of linguistic representation (certainly no longer acceptable) creates the strange sensation that the film one is watching has been joined together from two different language versions. As such it is an excellent example of the kind of linguistic problems many early sound films faced. In the climactic scene, Karl Renn (Boris Livanov) confesses his desertion in front of a Russian audience. Renn’s speech is relayed by his bilingual coworker, who translates reluctantly and, one has reason to suspect, far from literally. For the target audience of this film, there are long segments of linguistic opacity when Renn speaks. The scene is particularly striking because it foregrounds the clash between two languages that the film, following the intelligibility strategy, has so far entirely suppressed. While caving in to the non-German-speaking domestic audience, Pudovkin is nevertheless apparently aware of the artistic potential of authentic language use. The final scene can be said to exploit effectively a “counterpoint” use of voices, namely, a perceived contrast between the highly emotional, alien and opaque German and the seemingly rational, familiar and reassuring Russian.

Multilingualism has become, with the advent of sound, a perennial challenge for cinema. Considering that multilingual speakers have always outnumbered monolingual speakers in the world’s population, this challenge is highly meaningful and timely. The presence of audible and heterogeneous languages in cinema constitutes, in my view, a new aesthetic dimension that sound cinema has learnt to harness. A polyglot film may present multiple languages within itself, but the effect of this multiplicity varies considerably according to how these languages are presented on the formal level, namely, the linguistic strategy of the film. The languages may repeat, corroborate, contradict or confront each other, very much like linguistic performance in the real world. If we have to consider the polyglot film as a genre, then the true nature of this genre consists less in the potentially infinite themes it can embrace than in the overall aesthetic effect of such a networked dynamic of languages.

Conclusion

“It only lacks the voice.” Such was the ancient praise for works of visual art, bestowed long before the technologies of moving images became a reality, and it served as a criterion for determining the ultimate success of artistic expression.⁹⁰ In a similar fashion, in the voiceless years of cinema, this very aspiration had always been there, palpable, despite the presence of a myriad of sounds that went along with the image. Cinema aspired to have its own voice, not one issued from a lecturer standing beside the screen or from anonymous figures behind it, as if the mute greatness of the sphinx needed to be brought out by a hidden tourist guide.

What the advent of voice means to cinema may be compared to a child’s transition from pre-linguistic utterances to full-fledged talking. In the long germinating period, the sounds made by a child evolve from accidental utterances and repetitive articulation not anchored in language to a limited vocabulary that depicts the world only in its vital aspects. A child not yet capable of speech can make 99% of her wishes known by nonsensical syllables, yet the moment language emerges, it is as if the world were created anew. After the sporadic experiments, misguided intentions and technical imperfections, it is as if an invisible force finally hit cinema and made her dream come true. The ecstasy experienced by the audiences of The Jazz Singer is also experienced by every parent hearing the child’s first articulated word. A voice emerges, which signifies a decisive step in the child’s acquisition of her human identity. The voice does not emerge; it erupts. Once the voice is discovered, the efficiency of the means and the overwhelmingly positive responses it receives propel the child to expand its use to every aspect of her life. A child thus becomes extremely talkative, for talking constitutes by itself a novelty that one needs time to absorb.⁹¹ But this is a necessary price to pay for having a voice of one’s own. If I may be allowed to carry my analogy with developmental psychology one step further (there is a reason why Bazin frequently makes this sort of analogy): when cinema is not yet able to speak by herself, we speak for her; when cinema learns to speak, she speaks for us.

Instead of presenting a complete history of how cinema learns to speak, this chapter offers a methodology for constructing such a history. I conceive the perceived authenticity of the voice as a way to understand the history of how voice contributes to the cinematic world. It explains why certain stylistic and production choices of filmmaking are received with enthusiasm while others, having enjoyed a certain period of popularity, soon fall into oblivion. Elusive as it may sound, the sense of authenticity is a crucial component of the cinematic experience. The history of voice in cinema, seen from this point of view, shows an interesting alternation: on the one hand, this authenticity is often renewed by technological breakthroughs or formal innovations; on the other, between renewals the voice is made conventional. This history of voice threads together a series of critical phenomena such as the charm of benshi, The Jazz Singer’s sweeping force, the inexplicable decline of MLVs, the normalization of speech and the marginalization of dialects, etc. More topics can obviously be added to this list. In the next chapter, I propose that the reemergence of dialects in multiple national cinemas can be better understood as a renewal of the perceived authenticity of the voice. By renewing our contract with the filmic world, the dialect films allow us to perceive, again, this phenomenal authenticity of the human voice. Ultimately, what the voice contributes to the experience of cinema is that it goes beyond the surface of the inanimate objects of the world; through voice we access a crucial aspect of the filmic world: the living, moving and speaking human being.

Kracauer, Theory of Film, 107. ↩︎
Jean Renoir, My Life And My Films (Da Capo Press, 2000), 103. ↩︎
Rick Altman, “The Technology of the Voice (Part I),” Iris 3, no. 1 (1985): 3–20; Rick Altman, “The Technology of the Voice (Part II),” Iris 4, no. 1 (1986): 107–19. ↩︎
Chion, The Voice in Cinema, 1999. ↩︎
Michel Chion, Le complexe de Cyrano: la langue parlée dans les films français (Cahiers du Cinéma, 2008). ↩︎
Kaja Silverman, The Acoustic Mirror: The Female Voice in Psychoanalysis and Cinema (Indiana University Press, 1988); Doane, “The Voice in the Cinema: The Articulation of Body and Space”; Amy Lawrence, Echo and Narcissus: Women’s Voices in Classical Hollywood Cinema (Berkeley: University of California Press, 1991). ↩︎
Sarah Kozloff, Overhearing Film Dialogue (University of California Press, 2000); Sarah Kozloff, Invisible Storytellers: Voice-Over Narration in American Fiction Film (University of California Press, 1989). ↩︎
Jacob Smith, Vocal Tracks: Performance and Sound Media (University of California Press, 2008). ↩︎
Nancy, Listening; Mladen Dolar, A Voice and Nothing More (MIT Press, 2006); Adriana Cavarero, For More than One Voice : Toward a Philosophy of Vocal Expression (Stanford, Calif.: Stanford University Press, 2005). ↩︎
Take another major technological change in cinema that has been universally adopted and stabilized: the replacement of nitrate by acetate. It hardly demands any change in the tools and languages of cinema. More importantly, this change is completely transparent outside the sphere of practitioners. ↩︎
Chion’s use of the term voice overlaps with my use of the term speech, for he notes that one particularity of the I-voice is that it needs to have “a certain neutrality of timbre and accent, associated with a certain ingratiating discretion…the voice must work toward being a written text that speaks with the impersonality of the printed page.” The Voice in Cinema, 1999, 54. If the I-voice is neutral, then it ceases to be a voice in my category. ↩︎
I emphasize the circumstance because it is possible for an individual to have multiple voices. I am not referring to faking one’s voice, but to true voices such as those of a ventriloquist or a singer (e.g., Maria Callas’s singing voice is quite different from her speaking voice). ↩︎
Chion, The Voice in Cinema, 1999, 11–2. ↩︎
This is where my definition of the term clearly differs from Chion’s, for he takes the historical objections to speech, dialogue or talking as objections to voice. The Voice in Cinema, 1999, 12. ↩︎
I am often amazed by how the voice of one infant or toddler resembles another. It seems therefore we acquire a truly unique voice only after a maturity stage. ↩︎
One person can have multiple voices and therefore multiple platforms of personality. We all change our voices to impersonate somebody else. A professional singer’s singing voice is often different from her speaking voice. A film star often needs to establish, for dramaturgical purposes, many “screen voices” that are distinct from his/her “natural voice”. In Le schpountz (1938), for instance, Fernandel auditions for a film and proposes to demonstrate multiple recitations of the same line from the civil code to elicit different emotions. ↩︎
Psychoanalytic therapy, as we know, relies exclusively on this method. ↩︎
This is exactly where the strength of Michel Chion’s Voice in Cinema lies: it deals primarily with the acousmatic use of voice, namely, the voice’s power in the absence of its anchoring body. ↩︎
This craft is remarkably present in some and not in others. Marilyn Monroe, for instance, was notoriously bad at remembering what she was asked to say in a film. A recent portrait (My Week with Marilyn, 2012) seems to suggest that this is because she was truly spontaneous, more so than those (Laurence Olivier for example) who could recite thousands of pages from memory. ↩︎
Kozloff, Overhearing Film Dialogue, 18–9. ↩︎
I consider it a small flaw that Kozloff didn’t subtitle her second monograph with a proper restriction as she did for her first: “American Fiction Film”. ↩︎
André Bazin, “The Case of Marcel Pagnol,” in Bazin at Work: Major Essays and Reviews From the Forties and Fifties, ed. Bert Cardullo (Routledge, 1997), 54. ↩︎
Ibid., 54–5. Ironically, the very film Bazin makes reference to (Marius, 1931) was actually made in the context of MLV. It was funded by Paramount France and shot entirely in the studio (Alexander Korda never set foot in Marseille but relied on his set designer Alfred Junge). Two foreign versions, German and Swedish, were made simultaneously (I doubt anything interesting can be said about them). One and the same film therefore embodies a convenient reconciliation of the two extremes of voice: one unique and essential, the other perfectly translatable, a mere conveyer of narrative information. ↩︎
Daniel Morgan, “Rethinking Bazin: Ontology and Realist Aesthetics,” Critical Inquiry 32, no. 3 (2006): 443–81. ↩︎
Gunning, “Moving Away from the Index: Cinema and the Impression of Reality.” ↩︎
Michel Marie, “The Poacher’s Aged Mother: On Speech in La Chienne by Jean Renoir,” trans. Marguerite Morley, Yale French Studies, no. 60 (1980): 221. ↩︎
For the convenience of my reader, here is my transcript: “J’étais avec mon mari en train d’écouter des chanteurs des rue quand je les entendu monter à appartement de Madame Pelletier. Quelques instants avant que je me monte moi-même et les courriers et que je vois crime. Même que je me suis dit, il l’aurait bien pu dire bonjour en croissant vue que monter la lettre de Madame Pelletier. Mon mari et moi, on a dit, c’est vraiment du drôle de monde!” ↩︎
A recent example of a similar nature can be found in the TV series Orphan Black (2013), where one and the same actress, Tatiana Maslany, impersonates an endless horde of roles (the similarity of their appearances is justified by the fact that they are all clones) with the help of costume, makeup, a dose of method acting and the audience’s willing suspension of disbelief. Yet the voice clearly betrays the fact that all these characters are just one person. ↩︎
For an extended discussion on the issue of gunshot see Lastra, Sound Technology and the American Cinema: Perception, Representation, Modernity, 124. ↩︎
Kozloff in her study of dialogue in cinema proposes a verse mode, a prose mode, and the many other modes in between: there are dialogues that strive for casualness (Dardenne Brothers); there are also those that opt for carefully polished cadences (David Mamet’s House of Games (1987)), or a mixture of the two (Gus Van Sant’s My Own Private Idaho (1991)). ↩︎
Lastra, Sound Technology and the American Cinema: Perception, Representation, Modernity, 152. ↩︎
There are naturally intermittent silences, as Altman famously revealed. But as Gunning put it, it “differs enormously from the later reverent silence of the cinémathèque and formed a sort of oasis within a noisy environment.” “The Scene of Speaking: Two Decades of Discovering the Film Lecturer,” Iris 27 (1999): 68. ↩︎
The term is used by Germain Lacasse in his study (unpublished dissertation) of the practice of the Québec lecturer (called bonimenteur). It describes a period of marginality, when local bonimenteurs revolt against the institutional meaning of the filmic text and exercise their power of parody. See Germain Lacasse, “Le bonimenteur et le cinéma oral, le cinéma muet entre tradition et modernité” (PhD Thesis, Université de Montréal, 1997). ↩︎
The one voice scenario is complicated by two factors: first, a benshi can, like an impersonator, imitate other voices to a certain degree of success. This technique is known in Japan as Kowairo (声色), literally “sound color”. Second, several benshis are sometimes used in service of a domestic production, not to explain or comment, but simply to reenact the dialogue. See Joseph Anderson, “Spoken Silents in the Japanese Cinema: Essay on the Necessity of Katsuben,” Journal of Film and Video 40, no. 1 (1988): 18. Note this only applies to Japanese films; foreign films are still rendered in the mono fashion. This practice falls somewhere in between the two norms I discuss here and shows how diverse and dynamic pre-sound voice practice can be. A particular occasion, I want to emphasize, does not always fit neatly into a category, which is conceptual and conceived to allow optimal coverage of instances. ↩︎
Edison knew this well. But being obsessed with the superiority of his technology, he insisted that the voice should fit the recording technology instead of the other way around. Altman, Silent Film Sound, 147. ↩︎
Ibid., 170. ↩︎
Together with gesturality, orality is arguably the oldest form of narration in human civilization. As any parent telling a story to children can testify, orality can operate by itself or enlist the image into its service, as an aid to excite the imagination. Here lies the true origin of benshi. ↩︎
Following a similar line of thinking, Sheila Nayar makes an interesting juxtaposition between early cinema and Bollywood’s Masala films, and observes that the latter shares many traits with the former: the public space where movies are consumed, the low literacy of its audience, comparable plot construction, performance style, etc. A cinema of orality, therefore, can be used to characterize both. See Sheila Nayar, Cinematically Speaking: The Orality-Literacy Paradigm for Visual Narrative (Hampton Press, 2010). ↩︎
Anderson reports that Katsuben performances are also made available in radio and phonography, unaccompanied by the film. Anderson, “Spoken Silents in the Japanese Cinema: Essay on the Necessity of Katsuben,” 21. ↩︎
Burch famously argues that “Japanese silent film was the most silent of all” because the benshi’s voice works against and effectively inundates the imagined voice of the characters. Burch, To the Distant Observer: Form and Meaning in the Japanese Cinema, 78. ↩︎
Dubbing may give the impression that it refers exclusively to cases of translating between languages. In fact, spoken lines are often altered in the post-production stage, which inevitably results in a momentary loss of lip-sync. The modern audience has become increasingly tolerant of such loss. More about dubbing later. ↩︎
There are many reasons for this perceived inauthenticity. One reason may be that these impersonators need to shout through the “drop” to be heard clearly, which compromises the spontaneity of their voices. ↩︎
Altman, Silent Film Sound, 238–9. ↩︎
Stinger refers to a sharply attacked but not necessarily loud musical chord that is often used in classical score to indicate surprise or sudden revelation. This nature dictates that it must be precisely synchronized. See Gorbman 88-89. ↩︎
Arnheim, Film Essays And Criticism, 33–4. ↩︎
At least one film circumvents this dilemma and uses a non-bilingual actor to play in two versions. The miracle occurs through a brilliant alteration of the plot. This is Fritz Lang’s The Testament of Dr. Mabuse (1932). Insisting on using the same actor (Rudolf Klein-Rogge), who did not speak French (the other members of the cast did), to play Dr. Mabuse, Lang and his collaborators had to come up with an ingenious solution: to make Mabuse a mute figure. Mabuse speaks! Nevertheless we never see him speaking. This way, as Chion wonderfully put it, “the terrible Mabuse is divided into a mute body and a bodiless voice, only to rule all the more powerfully.” Chion, The Voice in Cinema, 1999, 31. ↩︎
This rather clichéd image of MLV production may only be applicable to Paramount’s Joinville operation, which according to Charles O’Brien was “unique even among American production companies.” “Multiple Versions in France: Paramount-Paris and National Style,” Cinema & Cie 4 (2004): 82. He also observes that UFA’s MLVs allowed more time for rehearsal. ↩︎
This is precisely what happened after 1932: the former site of MLVs was transformed into a post-synchronization center. ↩︎
From a technical point of view, perfect dubbing is only possible with the advent of multi-track sound technology. ↩︎
“Films had in fact been dubbed for abroad as early as 1929, for example Gustav Ucicky’s sound film-operetta Der unsterbliche Lump and Kurt Bernhardt’s Prussian ballad Die letzte Kompagnie”. Joseph Garncarz, “Making Films Comprehensible and Popular Abroad: The Innovative Strategy of Multiple-Language-Versions,” Cinema & Cie 4 (2004): 73. ↩︎
Abe Mark Nornes, Cinema Babel: Translating Global Cinema (U of Minnesota Press, 2008), 139–40. ↩︎
Ibid., 139. Italic mine. ↩︎
Ginette Vincendeau, “Hollywood Babel: The Multiple Language Version,” Screen 29, no. 2 (1988): 33. ↩︎
Jean Renoir, My Life And My Films (Da Capo Press, 2000), 106. ↩︎
Cinema Yesterday and Today, 136. ↩︎
“The Cinema After Babel: Language, Difference, Power,” Screen 26, no. 3–4 (1985): 52. ↩︎
Nornes, Cinema Babel: Translating Global Cinema, 149. ↩︎
Robert Stam, Subversive Pleasures: Bakhtin, Cultural Criticism, and Film (Johns Hopkins University Press, 1989), 76. I had a similarly traumatizing experience watching Crouching Tiger, Hidden Dragon in English. This is not even a case of dubbing, as the two main protagonists (Yeoh and Chow) speak their own English lines. But the perceived incompatibility between speaking English and the fictional Wuxia world ruined the film for me: Chow may speak English; but Li Mu Bai must not. ↩︎
Tom Gunning, “Now You See It, Now You Don’t: The Temporality of the Cinema of Attractions,” in Silent Cinema, ed. Richard Abel (London: The Athlone Press, 1996), 71–2. ↩︎
Is it the prohibitive cost (three times that of dubbing)? The failure to produce enough box office successes? Or something else we have yet to learn? This failure, like many others in the history of cinema (3D in the 1950s, for instance), is fascinatingly revealing about the complex and dynamic nature of cinema. ↩︎
Vincendeau, “Hollywood Babel: The Multiple Language Version,” 29. ↩︎
Ruth Vasey, The World According To Hollywood: 1918-1939 (Univ of Wisconsin Press, 1997). ↩︎
Chris Wahl, “Inside the Robot’s Castle: Ufa’s English-Language Versions in the 1930s,” in Destination London: German-Speaking Emigrés and British Cinema, 1925-1950, ed. Tim Bergfelder and Christian Cargnelli (Berghahn Books, 2008), 49. This essay also reveals that Erich Pommer first experimented with hybrid dubbing (the same cast trained to speak different foreign languages phonetically, and then dubbed by native speakers). Finding it lacking, he then became a firm advocate of MLVs throughout the 1930s. ↩︎
Mark Betz cites Renoir’s Le Carrosse d’or (1952), Elena et les hommes (1956) and Herzog’s Nosferatu (1979). See Mark Betz, “The Name above the (Sub)Title: Internationalism, Coproduction, and Polyglot European Art Cinema,” Camera Obscura 16, no. 1 (2001): 83. ↩︎
One example is Penélope Cruz in Abre los ojos (1997) and Vanilla Sky (2001). ↩︎
The term refers historically to the practice of having someone say the lines standing out of view but close to the microphone. ↩︎
How exactly do we know if a voice is dubbed? We rely on real-world plausibility and an intimate knowledge of an actor’s linguistic capacity. There are easy cases such as Janet Gaynor speaking Japanese, but there are also more difficult ones, where the potential authenticity cannot be ruled out in advance. For years I, like millions of others, believed that the dubbed Mandarin voice of Stephen Chow was his own until I read that it belongs to a professional DJ in Taiwan, Shi Banyu. My “false belief” is however well founded, for not only does Chow indeed speak Mandarin (admittedly not so well), but the perceived qualities of this voice also correspond perfectly with Chow’s screen persona. This personal anecdote seems to suggest that the audience would actually prefer an artfully dubbed voice such as this to an authentic yet linguistically implausible (and poorly faked) voice such as Okada Eiji’s French in Hiroshima mon amour. ↩︎
Quoted in Katherine Spring, “‘To Sustain Illusion Is All That Is Necessary’: The Authenticity of Song Performance in Early American Sound Cinema,” Film History: An International Journal 23, no. 3 (2011): 295. ↩︎
Ibid., 293. ↩︎
As a counterexample, the majority of Hollywood films featuring Chinese characters tend to use Cantonese, regardless of what the diegesis requires (two examples come to mind: The Bitter Tea of General Yen (1933) and The Cat’s-Paw (1934)). The complete linguistic inauthenticity contributes much to the ludicrous and grotesque effect of such representations on a Chinese audience. ↩︎
Quoted in Nornes, Cinema Babel: Translating Global Cinema, 143. ↩︎
Vincendeau, “Hollywood Babel: The Multiple Language Version,” 35. ↩︎
French cinema has an interesting “policy” regarding non-native speakers of the language: it demands perfect syntax and rich vocabulary yet allows foreigners, especially actresses, to keep their foreign accents, which I suppose sound exotically charming to French ears. Romy Schneider, Jean Seberg and Anna Karina are in this category. ↩︎
Vaud is a canton in Switzerland, Simon’s native country. Note that when he counts 90, he says nonante instead of quatre-vingt-dix. ↩︎
Chion hypothesizes that generations of the audiences of this film did not have an issue because the sound before restoration is really muffled and therefore obscured. He also remarks that in his numerous appearances in French cinema and television, except in this film, Michel Simon does not again use his original dialect—the underlying statement being: if he had done this often, it would have been noticed! See Chion, Le complexe de Cyrano, 21. ↩︎
One interesting work of social linguistics is Michaël Abecassis, The Representation of Parisian Speech in the Cinema of the 1930s (Peter Lang, 2005). The book studies five French films produced at the end of the 1930s (Fric-Frac, Circonstances atténuantes, Le Jour se lève, Hôtel du Nord, and La Règle du jeu). The method is largely statistical, and one of the primary objectives the author sets out to demonstrate is the quantitative difference between vernacular speech and the so-called upper class speech, or standard French. I would be interested to know if these films are representative of the period (they sound rather canon-like). More importantly, I want to know whether, compared with a study of a later period (for instance the 1940s or 1950s), the proportion of standard French spoken has increased. In a recent collection Abecassis contributes another essay that compares these 1930s films to the languages spoken in contemporary French cinema and coins the term “plurilingualism”. Michaël Abecassis, “The Voices of Pre-War French Cinema: From Polyphony to Plurilingualism,” in Polyglot Cinema: Migration and Transcultural Narration in France, Italy, Portugal and Spain, ed. Verena Berger and Miya Komori (LIT Verlag Münster, 2010), 33–48. ↩︎
Kozloff, Overhearing Film Dialogue, 25. ↩︎
Marie, “The Poacher’s Aged Mother: On Speech in La Chienne by Jean Renoir,” 220. ↩︎
It is important to add here that this phenomenon is not restricted to cinema but rather applicable to all media that have a speaking part (e.g., theater, radio). Also, in silent films dialects are already present in the form of intertitles. ↩︎
Jonathan Munby, Public Enemies, Public Heroes: Screening the Gangster from Little Caesar to Touch of Evil (University of Chicago Press, 1999), 44. ↩︎
Ibid., 59. ↩︎
Kozloff, Overhearing Film Dialogue, 151. ↩︎
Ibid., 113. ↩︎
The film is often remembered as a unique (if not entirely successful) pacifist sound film made in the Weimar period. The Nazi government believed it had destroyed all copies of the film. But it survived and was restored in 1969 by Maurice Zouary to 69 min, and it was recently further restored by the Archives françaises du film du CNC, Bois d’Arcy, to 81 min. ↩︎
Chris Wahl, “Discovering a Genre: The Polyglot Film,” Cinemascope 1 (April 2005). This is an adaptation of chapter five of his dissertation, Das Sprechen der Filme. Online via http://d-nb.info/970359101/34, accessed 2012-9-17. ↩︎
This seems to contradict my previous claim about MLVs’ lasting appeal. But what is essential in our discussion of MLVs is their role as a major solution to the sound film’s foreign distribution problem. They were once a paradigm to be followed en masse. Nowadays an MLV is mostly a rarity that acquires value from its very exceptionality, like shooting in black and white in the 1990s, or making a silent film in the 21st century. ↩︎
Max is at the gate of Lily’s apartment. Unwilling to leave, he cuts a button off his coat and asks Lily to mend it for him. The whole episode is shot without any diegetic sound or dialogue, although obviously he has to call her and explain the situation. ↩︎
Lost in Translation (2003) is a recent well-known example using this strategy. ↩︎
Noël Burch, Theory of Film Practice (Princeton, N.J.: Princeton University Press, 1981), 96. ↩︎
Gombrich quotes this phrase in both Art and Illusion (1977, 235) and The Image and the Eye (1994, 78). In the latter he adds immediately, “alas, painting not only lacks speech, it also lacks most of the resources on which human beings and animals rely in their contacts and interactions.” That the voice alone is mentioned therefore suggests the unusual urgency that the lack of voice may entail. ↩︎
It needs to be added that for a long period of time (1–3 years) a child may tend to regress frequently into the non-verbal stage, as if the acquired language had suddenly disappeared. Even adults can be “speechless.” ↩︎
