Hearing the World of Film

The Emergence of Audiovisual Diegesis

Once upon a time in a movie…

First we hear the creaking wooden door; then comes the always cringe-inducing chalk scraping on a blackboard. A man who appears to be the station agent stops his marking—“delays to/from Flagstone”—and turns his head to look. The camera, tilting upwards from a shot of a pair of boots, reveals in an intensely deliberate fashion a duster overcoat, a sawed-off shotgun and finally a greasy face. The wind, present since the very start, now seems to pick up its pace; we can almost imagine seeing clothes flapping furiously in the wind when the camera is about to find the first gunman’s face. There are more creakings from the metallic hinge. A series of slow and similarly deliberate movement sounds, such as the strident sound of boots scratching the floor, accompanies a wider shot of two other gunmen. And we start to notice other sounds—again, already there since the beginning—whose sources we have not yet seen: birds chirping (we do not see the bird until one man holds a birdcage in close-up); an eerie sound that we later realize comes from the windmill (we associate this sound with its source in a later shot where the squeaky sound becomes noticeably louder, signifying the proximity of its source); hens clucking—we never get to meet the hens.

The camera pans left to an establishing shot of the scene. Hollow footsteps accentuate one of the gunmen’s pacing forward. An Indian woman mumbles as she tries to exit, but gasps when stopped by the first gunman. The station agent now speaks in a ridiculously cheerful voice, breaking the menacing lack of speech. But the leader of the gang grabs his neck and squeezes out of him the sound of a rooster. The three intruders remain silent. Nevertheless they exert an almost unbearable sonic presence through the many close-up sounds carefully laid out on the soundtrack: the obviously exaggerated sound of tickets flipping in the air, the funny and questionable noise made by one man to the bird in the cage, the unbelievably loud shutting of the metal safe (with the station agent inside). This is followed by three whip cracks (sometimes referred to as non-diegetic sound effects) and a title card, “A Sergio Leone Film”.

Having secured the station, the three men disperse to take their positions: one sitting in a rocking chair beside the window; one under the water tower; the third at the opposite end of the platform, where a water trough is located. Together they form a triangle with the station in the middle. Each is given unique sonic motifs instantly recognizable even with eyes closed. The first man: the soothing pattern of the rocking chair is interrupted by the annoying clicking coming from the telegraph machine, eventually silenced in one violent gesture, only to be replaced by the fly, whose buzzing persists even after being captured in the pistol barrel. The second man: the splashing of condensed water from the bottom of the water tank on his forehead, which he immediately turns into a much more subdued tone, to his satisfaction, by putting on his hat. Eventually it is also the splashing sound that tells him the hat is full, and he drinks from it. The sounds associated with the third fellow form an almost poetic sequence: distant wind, gentle tapping of water, a dog’s whining, a man cracking his knuckles. When he stares at the horizon, there is a POV shot of the landscape that seems to show white smoke behind the mountain. The effect is such that in the meantime we seem to hear barely audible train sounds in the distance. Yet one cannot be sure—this could be an audiovisual illusion—and the character in the film seems to share the same suspicion, soon concluding that it is just clouds and wind.

With the arrival of the train, the sparse yet highly foreboding “musique concrète” of the previous nine minutes suddenly increases its intensity. Now the sonic background is dominated by the heavy breathing and thumping of the locomotive (the windmill no longer audible). The only sound that is able to penetrate this rhythm and consequently alert the three men is that of the sliding door of the mail car: it opens and closes, and a large box is thrown out, thumping to the ground. The three men no longer have their individual noises; now they (and we too) are united in one intense acoustic expectation. A loud whistle and release of steam: the train finally departs. The three men are about to go back to their waiting again when a harmonica tune emerges, playfully anchored inside the diegesis. It is completely free of environmental acoustics (usually indicating extradiegetic status); yet the three men’s apparent reaction testifies to its diegetic status. Finally the whole orchestra is brought in, which complicates the matter even more. A few words are thrown back and forth—this is the first time we hear the three men talk—and the voices sound surprisingly banal and much less expressive than the mumbles, squawks and grunts. Gunshots, bodies falling, a horse whinnying. The windmill sound triumphantly re-emerges from this chaotic burst of sounds of violence, as if nothing had happened.

What you have just “heard” is the opening of Once Upon a Time in the West (1968). As a canonical case of genre revision and innovation, the film has been widely recognized for its visual achievements. Clearly, its aural aspects are no less stylized and intriguingly effective: the sequence described runs almost thirteen minutes long with only several lines of dialogue and a tiny bit of music towards the end; yet it is able to not only sustain itself through sound, but to excel in its sonic dimension. Similar observations can also be made for other sequences later in the film.

Sound cinema, Michel Chion suggests, is overwhelmingly verbocentric. Nevertheless words surely cannot describe, or even begin to describe (witness my futile effort above), what sound cinema is all about. There are sequences in films, or even entire films, that either keep dialogue to a minimum or reject it entirely; yet they achieve the kind of acoustic intimacy and immediacy that only sound cinema is capable of. These instances (called laconic films or mute films by Chion) are not to be confused with silent (or more appropriately, deaf) yet visually loquacious cinema. The mute or laconic films justify their transgression of the verbocentric norm by featuring unusual protagonists who cannot speak (animals or prehistoric humans), do not want to speak, or alternatively are engaged in some “unspeakable” affair.¹ Consequently their soundtracks promote non-verbal communication that is carried out by music and sound effects.

The point of having an almost excruciatingly detailed description of how a classical laconic sequence sounds—many such close hearings will be offered throughout this dissertation—is not, however, to congratulate the sound team of Once Upon a Time for having done such a good job. Rather than the artistic choice to foreground sound effects, or the particular sonic style exhibited in that choice and its hyperbolic execution, this dissertation is more concerned with finding a way to understand how the audience experiences it. In other words, I am interested in the ways in which sounds and images come together to offer the audience a sense of presence, an intuitive access to the filmic world. It is to the concrete presence of this world in the Leone film, this palpable spatiotemporal continuum accentuated by sound, that we owe this intense yet strangely meditative atmosphere, this anxiety and anticipation of death.

How does the film offer its audience a world? I believe that this is at once an issue of film theory and film history. In this chapter I propose to use the term audiovisual diegesis to describe an aspect of this world. I shall discuss the term in detail in the section that follows. But briefly, it is a concept that builds on the notion of diegesis, with an important qualifier: audiovisual. By using this qualifier, I mean to not only reconceptualize the term in relation to the filmic world but also endow the term with a historical significance and offer it as a way to characterize the paradigm shift in cinematic experience initiated by the advent of sound. The chapter will first give a brief exegesis of the term diegesis and a recapitulation of its recent problematization—of which the notion of audiovisual diegesis can be partly regarded as a theoretical solution. Instead of an ahistorical notion of diegesis I argue that we need to acknowledge the dynamic nature of the term and a historical evolution of its meaning. Most importantly, we need to acknowledge sound’s pivotal role in this evolution. What does audiovisual diegesis mean in terms of sound-image relation? Here I explore the rhetoric of the sound-image with a new set of terminology. Finally I will offer a detailed comparative case study of M and Kameradschaft where both the historical and theoretical sense of the term can be better understood.

From Diegesis to Audiovisual Diegesis: a Short Exegesis

The term diegesis, its adjective diegetic, and its many prefixed forms have become part of the standard vocabulary in film studies. Despite its obvious pedagogical value and positive contribution to our understanding of cinema’s unique mode of engagement with fiction, the term has never been rigorously defined. We customarily speak of the diegesis of a film, of what belongs or doesn’t belong to this diegesis, yet the moment we start to investigate this diegesis with exactitude, it appears to collapse under its own weight, under the conflicting interests of its many connotations: the story, the space, the subjective/emotional life of characters. Each of these demands different things from the diegesis and formulates its coherency in different terms. Added to this difficulty is the fact that the diegesis doesn’t exist as a fact but as our mental construction. This is a process that relies heavily on the elusive cognitive faculty of inference; but it also unfolds only in time. Instead of the diegesis of the film, it makes more sense to speak of a diegesis that takes shape from moment to moment. Recently, as I shall show, the term has also been challenged repeatedly for its ineffectiveness or messiness in dealing with complex sound-image situations in contemporary films, notably when music is concerned.

Does the notion of diegesis still hold some value? To what extent has its meaning been revised to cater to the needs of different critical discourses? How is the idea connected to the central theme of this dissertation? In the following I shall give a brief review of how the term has been used and what my theoretical proposal intends to achieve.

The etymological history of the term diegesis has been traced extensively in recent scholarship.² Suffice it to reiterate several key points of this history and clarify where I stand. What the term currently means in the discipline of film studies bears little resemblance to its etymological origin: the Greek word used by Plato and Aristotle that refers to the act of telling as opposed to mimesis, the act of showing. The modern use of the term is commonly attributed to Étienne Souriau,³ who seeks to define the cinematic phenomenon in precise (what is understood as “scientific” by filmologists) terms. Souriau proposes that each film offers a unique “filmic universe” which structurally speaking consists of a concatenation of seven levels⁴ of which diégèsis is but one. For Souriau this level corresponds to “all that belongs, by inference, to the narrated story, to the world supposed or presupposed by the film’s fiction.”⁵

Despite the ambivalence of this definition, the term proves to be of high value in different kinds of discourses about cinema and literature. Earlier discussions of the term (as well as its Greek origin) focus on how the term is either related to narration or exemplifies a certain mode of narration. Gérard Genette’s appropriation of the term establishes its current status in modern narratology. In film studies David Bordwell defines the diegesis in narrative terms: “the diegesis is the total world of fabula—its spatio-temporal frame of references, its furnishings, and the characters that dwell and act within it.”⁶ Edward Branigan’s definition doesn’t include a Russian formalist shortcut, but the idea is nearly identical: “the implied spatial, temporal and causal system of a character— a collection of sense data which is represented as being at least potentially accessible to a character.”⁷ This definition remains a model of consensus on how diegesis is understood in cinema studies.

The term’s entrance into film sound studies, however, proves to be problematic from the start. When Daniel Percheron,⁸ Bordwell and Claudia Gorbman⁹ employ the term to describe the work of sound in cinema, they simply take the notion of diegesis as a given: diegesis is already defined; now how can sound be fitted in (and, preferably, conveniently classified)? In these mapping exercises, sound is first defined in relation to the image, and then to the diegesis. Percheron’s essay is explicit about this point: sound is first either on-screen or off-screen; and then, only when off-screen, diegetic or extradiegetic. This mapping effectively excludes the possibility of sound that is anchored to the screen but somehow doesn’t belong to the diegesis—precisely where problems abound.

More recently, a new generation of scholars whose primary concern is film music has entered the scene of debate. These scholars find the binary opposition of diegetic/non-diegetic increasingly problematic in contemporary films (and then realize that old films have the same problem!). How do we characterize music or sound that is narratively anchored to the diegesis (e.g., understood to come from the car radio) but clearly cannot possibly be emitted from the diegetic space (it sounds way too perfect)? How can a piece of music move smoothly, imperceptibly from a diegetic source to a full-fledged orchestral recording that cannot be justified diegetically? Where is the boundary? Is the distinction necessary? Consequently various solutions have been proposed. Caryl Flinn rejects the term without offering any alternative.¹⁰ Anahid Kassabian argues that we should instead adopt the terms used by the film industry (source music, source scoring, pure and dramatic scoring).¹¹ Robyn Stilwell proposes to blur the boundary by calling the vast gray area between diegetic and non-diegetic “the fantastical gap.”¹² Jeff Smith opts to maintain the current reading but advises a more careful use of it.¹³ David Neumeyer offers his refined model, which retains the dichotomy while differentiating between three stages of its application.¹⁴

Still others set out to revise the term with prefixes. Following Genette, Claudia Gorbman already uses the term meta-diegetic to describe the music “heard” (subjectively) by one character but not heard by the others.¹⁵ In the context of game audio Kristine Jørgensen proposes the term transdiegetic to account for the complex interactions between function, location and referentiality in computer games.¹⁶ Ben Winters tries, and to a certain extent succeeds, to revise what the term diegesis means in order to accommodate the music’s presence better (by going back to Souriau and drawing from Frampton’s filmosophy). Like Neumeyer, Winters offers a refined narratological model where an intra-diegetic level sits between the extra-diegetic and the diegetic.¹⁷

The current debate about the applicability of the diegetic qualifier to sound (especially music) testifies to the coming of age of film sound scholarship as it exposes the roughness of previous theoretical operations. I have said in the introduction that it is time to find sound a better place in film theory. This entails, for this chapter, a reconsideration of the meaning of diegesis as informed by sound. Instead of conceiving diegesis exclusively in a narratological sense, I inquire about the nature of the audiovisual experience that cinema presumably offers. Although the term is decidedly academic jargon,¹⁸ its success in pedagogy testifies to its intuitive applicability: it makes sense to talk about a filmic world that is the result of our mental efforts in collaboration with concrete images and sounds presented by the film. The diegesis is a concept that any moviegoer can relate to her own percept and, if it is shown to be helpful in articulating complex sound-image relations, accept to use. But that doesn’t mean diegesis has a fixed meaning across different media or across different historical periods in the evolution of one form of storytelling.

Notwithstanding the anachronism—the term diegesis was only reinvented in the 1950s by French filmologues—the benefit of applying the notion to sound/silent films, a Proust novel, or a Greek tragedy is obvious. Broadly speaking any form of storytelling—be it strictly oral, in written words, in mimetic gestures or in any hyper-futuristic way we have yet to encounter—would take this cognitive orientation as fundamental: it supplies a stage where time, space and causality can play their roles. We might therefore agree with Roland Barthes when he claims, “there exists a diegetic form common to different arts.”¹⁹ But in each art, the formation of diegesis needs to depart from different configurations of raw sensorial materials. Reading a novel on paper may allow the reader to construct a diegesis. But the sense of diegesis involved here is not identical with the one an audience (even if the same reader is involved) might get from looking at moving images and hearing sounds arranged in a certain way (even if the same story is involved). The nature of perception as well as that of inference has changed; so should the meaning of diegesis. The advent of sound therefore necessitates a reformulation of this sense of diegesis, because the audience’s cognitive faculties are radically reconfigured to adapt to what the sound cinema offers in terms of stimuli.

I propose that we call this newly forged mode of cinematic perception the audiovisual diegesis, for it draws from both visual and auditory perception. The first criterion of the audiovisual diegesis would be that it consists of non-imaginary objects imposed on the audience. In other words, what is involved here is a direct perception of images and sounds. The audience may still be able to pay selective attention to various details, but the nature of this form of mental engagement is decidedly distinct from phantasy, hallucination or dream. A sound/talking cinema, one might say, has always existed in the audience’s imagination.²⁰ But hearing actual sounds is different, for better or for worse, from imagining sounds while watching visual events that have strong sonic implications (a common phenomenon in silent cinema experience).²¹

Simply presenting these raw audiovisual materials (such as the ways in which sound is presented in silent cinema), however, does not necessarily result in an audiovisual diegesis. In order to arrive at this percept, the active cognitive efforts on the part of the audience need to be channeled in a certain way, namely, geared toward perceiving concrete entities—among which human figures and animals feature prominently—that exist in time and space and are connected by causality (a much more precise term than inference). When multiple sensorial faculties are involved, it is especially important to take into consideration how sights and sounds interact with each other to arrive at a common understanding.

The main point of my proposal is that diegesis may have a narratological component, but that it should be conceived first and foremost as an access to the filmic world and ultimately a form of worldhood.²² This understanding of the diegesis is not completely new, as Souriau’s definition already mentions it (the world supposed or presupposed). But Souriau’s definition wavers between the two: the world and the story. Christian Metz’s definition displays a similar indecisiveness but replaces the “world” with “denotation”: diegesis “designates the film’s represented instance… that is to say, the sum of a film’s denotation: the narration itself, but also the fictional space and time dimensions implied in and by the narrative, and consequently the character, the landscapes, the events and other narrative elements insofar as they are considered in their denoted aspect.”²³ The literary lineage of the term is so prominent that it may have obscured the fact that storytelling is not always essential to cinema. Take for instance the opening sequence of Once Upon a Time in the West, which I described at length in the opening of the chapter. Indeed, the example belongs to narrative fiction film where a story does exist. But I would argue that the sequence engages the audience in a worldly way that is prior to and independent of any narrative intervention—it gives pause to the narrative progression to facilitate the audience’s absorption of the filmic world. By saying that sequences like this offer us an access to the world, I mean to say that they offer an immediate, quasi-automatic sensorial gestalt. The act of inference, narrative or otherwise, may occur simultaneously or after; but it is not always required. In extreme cases, the narrative intervention may even be effectively evacuated from the audiovisual diegesis.²⁴

Once Upon a Time in the World: from Sonic Attraction to Full Diegetic Effect

The new arrives unnoticed.
--Shklovsky²⁵

Throughout cinema’s history, the circumstances of moviegoing have changed considerably, and what the cinematic experience means to the audience is by no means fixed. Yet the notion of diegesis as currently formed in its structural and narratological sense does not include, and is not even compatible with, a historical dimension. What does the term diegesis entail in terms of conceptualizing the filmic experience of the so-called silent era, regardless of whether any member of the audience or film industry had ever heard of or was inclined to use the term? How does that experience differ from what we now understand diegesis to mean? What is the sound effect’s role in this transformation? To be precise, what are the sound effect’s different roles in the historical evolution of cinema? These are the key questions that I raise in this section. By sketching out in broad strokes a trajectory of the different ways in which sound effects function in cinema I hope not only to affirm the notion of diegesis as a pragmatic conception of filmic experience but also to propose its indispensable role in the evolutionary history of cinema.

The topic of silent film sound is too expansive to be treated exhaustively in this dissertation, let alone in this chapter. What is undertaken here is a very brief historical account of the use of sound effects in early and silent cinema, which I believe is crucial in establishing the many different conceptions of how sound functions in relation to the diegesis, so as to make it clear that the advent of audiovisual diegesis does offer something that is significantly different. In the following I will give a brief survey of how sound effects were practiced in the late aughts, focusing on two modes of their functioning: the sonic attraction and the generic sound. These concepts will then be used in the more detailed studies of individual films that follow.

The first problem one encounters, if a history of the sound effect is to be undertaken, is where to start. While only recording technologies can assign sound effects a permanent seat in the soundtrack, thus making them into tangible objects for our study, sound effects are present in an ephemeral form in the so-called silent era. They may not have a presence comparable to that of music, but they are far from negligible. In fact, historical evidence indicates that there was a period when sound effects were widely used in movie exhibition, which I call the first golden age of the movie sound effect. Film historians have not reached a consensus on the precise boundary of this golden age. Rick Altman uses the term “late aughts” to vaguely define this prosperous time. Describing it as “a love affair,”²⁶ Carol Hamand claims a whole decade, from 1905 to 1915, approximately the entire Nickelodeon period. Stephen Bottomore suggests a much shorter one: beginning in 1906 and already past its high-water mark by 1908.²⁷ According to Bottomore, even in such a short period the movie audience had become habituated to sound effects due to overexposure. By 1909 there are already claims of their indispensability, that seeing events that have strong acoustic implications without accompanying sound effects has become unnatural—just as a modern-day audience would believe.

To use live sound effects in all kinds of entertainment venues, where cinema is but one among many in this raucous turn of the century, is of course not exactly a radical idea. But it pays to notice that the call for sound in the case of cinema has a special flavor. Gorky describes the first few Lumière screenings as “soundless spectre.” Deeply impressed, he nevertheless complains that “all this in strange silence where no rumble of the wheels is heard, no sound of footsteps or of speech. Nothing. Not a single note of the intricate symphony that always accompanies the movements of people.”²⁸ For Gorky, the lack of sound effects in these screenings is all the more intolerable precisely because the images are so real. This sentiment, I believe, is shared among many of his contemporaries, including Thomas Edison. Around 1888 Edison envisioned “an instrument which does for the eye what the phonograph does for the ear.” A decade later he reports back with a new research direction: “I have already perfected the invention so far as to be able to picture a prize fight—the two men, the ring, the intensely interested faces of those surrounding it—and you can hear the sounds of the blows.”²⁹

The prizefight seems to be a popular genre in nineteenth-century cinema, and the sound effect’s presence there is firmly requested. We might reasonably assume that the presence of sound effects is perceived as indispensable for some genres, superfluous for others, and a matter of indifference for the rest. Around the same time when Edison was busy experimenting with mechanically synchronized blows—notice how his work is at this point geared toward sound effects instead of the human voice or music—a film critic in Sydney suggests how a fight film might be improved:

The two phantom pluggers plug each other without making a sound like lost souls blackening each other’s eye on the plains of Acheron. If the management would hang up a piece of beef somewhere, and smite it with a bat every time a hit is made, it might make things more realistic, always provided the beef was smote at the right moment.³⁰

What is common among these three instances (Gorky in Russia, Edison in the US and the anonymous critic in Sydney) is the idea that the audience of early cinema had felt, from the very start and globally, the desire not only to hear sounds, but also to hear certain sounds in a certain way. However, the actual state of sound technology at the time had not been able to fully accommodate that need. Sound effects indeed prospered. But despite its apparent prosperity, the early practice of sound effects leaves a lot to be desired in terms of precision and believability. Gunshots are perceived as misfired if they come before the smoke; and the increasingly critical audience finds it unacceptably annoying that “both cars and trains were often given exactly the same sound effect of a motor running”³¹, or that “the quick, sharp ring of a hoofbeat on a hard road” is made the same as “the hoofbeat on a sandy road or on grass ground.”³² What really crushed the practice, however, is its acoustic obtrusiveness and lack of narrative relevance. A practitioner of the craft admits in 1909 that “too much noise gives your work the appearance of horse-play and it is far from pleasing to the ladies in the audience.”³³ George Beynon, an early proponent of film music, recalls ruefully: “Cowbells, sand-blocks, wind machines and traps of all descriptions are frequently brought in at every possible junction. In fact, a drummer is sometimes judged by his agility in handling, one after the other, every contraption around him.”³⁴ In order to demonstrate this “agility in handling”, the drummer—our very first soundman—has adopted what Noel Burch calls a “topographical”³⁵ approach that searches the entirety of each image for “a pretext for virtuoso displays of sound.”³⁶ Drawing on an amusing anecdote³⁷ about one excessively diligent soundman, Altman coins the term “the canary effect” or “the cowbell effect” to describe the unfortunate occasion where a bird or cow in the background, having little narrative significance, is given a tremendous sound that is perceived as doing violence to the story.

showcases its technological basis and emphasizes unfamiliarity as the source of thrill. Adding sound effects to the picture show, therefore, was simply adding attractions: the more the merrier. For movie houses, which were intentionally modeled after fairgrounds, being noisy is not a vice, but a virtue. In this sense, the sound effect does appear to have, to use Gianluca Sergi’s phrasing, a “bastard origin,”³⁸ as compared to speech and music’s nobler lineage (theater and concert hall).

In retrospect the abuse of sound effects in the late aughts can be regarded as a chaotic stage of experimentation from which a new “norm” is born. It is an institutionalization process by which the range of sound effects is narrowed down. But more important than this selective reduction is a change in the mode in which sound effects are supposed to function. Previously, in the attraction mode, the sound effect did not have a purpose other than to assert itself loudly; like a fairground barker, it needed to compete, instead of cooperating, with its neighboring attractions. It is not a coincidence that the taming of sound effects occurs at approximately the same time as the transition to narrative integration; it is part of the transition. As the Nickelodeon period’s answer to the question of necessity gravitates towards a limited set of sound effects that have “a psychological bearing on the situation as depicted on the screen”³⁹, for the first time, the sound effect is related not to a “picture”, but to a “situation”.

I propose to use the term acoustic protocol to describe this new mode of sound effect. An acoustic protocol differs from a sonic punctuation, which is yet another mode of functioning for the sound effect that film sound inherits from previous and concurrent stage sound practices, notably vaudeville. A sonic punctuation consists of simply highlighting the timing inherent in visual spectacles through acoustic means (think about how the vaudeville drummer works); what the sound is matters little. An acoustic protocol, on the other hand, actually suggests an indexical link between what we hear and what we see. Yet this audiovisual link is not presented as belonging to any specific, tangible event. Instead it has an abstract or generic nature, a sort of acoustic signifier if you wish. Indeed, my definition of the acoustic protocol bears some similarities to a term that has been proposed by Rick Altman:

generic sound: sound that clearly represents a specific, easily recognizable type of sound event, but without salient particularities. Usually used semi-sync with a generic long-shot image, (for instance, of a crowd, a street scene, a race, or a battle), generic sound is often chosen from a sound library and arranged as a sound loop.⁴⁰

While Altman’s definition seems to point to a very specific type of ambient sound that is commonly referred to in the US film industry as the walla,⁴¹ I have in mind a rather different idea of the term. A generic sound in my definition can be a precisely synced sonic event. What is generic here is not the fact that the sound can be reused in other circumstances (it certainly can), but that its mode of functioning shows an indifference to the sound’s concrete acoustic properties, a lack of concern for the sound’s particular shape in a specific diegetic space. A good example of this is the locus classicus of sound theory: the gunshot. When Christian Metz claims that “nothing distinguishes a gunshot heard in a film from a gunshot heard on the street,”⁴² he is definitely not listening: from the early, rather popcorn-like sound to the much sweetened resonance in a modern soundtrack, gunshots in film almost never sound the same as a gunshot heard on the street. The spaghetti western already pushes, as we have just heard, the volume of the gunshot through absurd amplification and extends it with long, echoing ricochets. A more recent example such as Johnnie To’s The Mission (1999) exemplifies the kind of process that has become habitual thanks to digital sound production.⁴³ Not only is every gun assigned a different sound profile, so that it becomes a unique acoustic entity, but in order to increase its impact, the sound of cannon fire is added to its tail, so that the phrase “bring out the big guns” becomes literally true.

Historically speaking, gunshots have been implemented with all kinds of contraptions: wood blocks, snare drum, nail gun and blank shot. To a modern audience these sounds, featured in numerous early sound films, often do not sound quite believable. This is not because such an audience (for instance the present author) has better direct perception of the real sound so as to be able to make such judgments of verisimilitude (in fact its sonic references now mostly come from movies); it is rather because this audience demands the sound in a different capacity—a gunshot in a modern soundtrack is one particular sound that is tailor-made to sound right for the particular gun being pictured and the particular space where it is fired. This is of course not to say the sound needs to be “faithful” to the actual sound. But undeniably a huge amount of effort is habitually put into the diegetic believability of the sound and its emotional character. In contrast, gunshots in early sound films are rather perfunctory—Thunderbolt (1929) contains a scene where the gun fight is heard off screen; the meaning of the scene therefore relies completely on the capacity of the sound, and no synchresis comes to help. The only way it can work is if its contemporary audience literally took the sound at face value. Instead of hearing a specific acoustic event in the diegesis, one is offered a sort of sonic caption that says: lo and behold, a gun is fired!

By definition a protocol is a mere convention agreed upon by all the parties in communication. Yet an acoustic protocol is not the equivalent of pure arbitrariness. A certain amount of acoustic verisimilitude is desirable. One does need, for instance, a pop sound to signify a gunshot; but it does not have to be acoustically accurate. As long as the sound can be easily recognized as what it intends to convey, with minimal training, it can be regarded as an acoustic protocol. Perhaps a good analogy might be the linguistic phenomenon known as onomatopoeia, which demands a certain degree of verisimilitude but still consists of a convention entirely determined by the specific language in question. When one’s finger is caught in the door, one can choose to say ouch, au, aïe, aiya, ite, or one of the hundreds of varieties offered by human culture. After all, it is an expression of the pain, not its actual sound.

The notion of acoustic protocol denotes a mode of functioning for sound independent of its implementation. As such it not only persists throughout the teens and the 1920s but also extends well into the transition-to-sound era. To understand the nature of this heritage, we need to return to the so-called “coming of sound” and to understand its many epistemological implications. What is coming, of course, is not sound per se; nor is it a quantitatively better technical solution to the perennial problem of synchronization. By itself synchronization does not dictate or prioritize any particular kind of sound-image relation. It merely facilitates the faithful execution of such intentions; it allows, for example, a shift of the responsibility for creating such intentions from the exhibition to the production, as the latter has always desired. Previously, before the “advent” of sound, the sound-image relation could only be suggested by the production; it remained for the exhibition to implement such intentions, to the extent it saw fit. What we call sound technology should therefore be more appropriately called synchronization technology, which offers a way to embed the sound-image relations permanently into the physical medium, which the exhibitor has only to reproduce in a completely mechanical fashion.

When the synchronization technology finally became reasonably acceptable for commercial use in the late 1920s, a small repertoire of acoustic protocols was already well established through the various stage practices (vaudeville, melodrama, opera, theater, etc.), waiting to be put on the new soundtrack. The limited range of such sounds and their generic nature had not escaped the attention of those filmmakers who aspired to salvage cinema from the onslaught of sound. One of the most successful in making the transition from silent to sound, and widely regarded as an innovative sound filmmaker, René Clair made the following disparaging remarks at the onset of the sound era:

If almost everyone is in agreement on the value of mechanically reproduced music…the same is not true for the noises that are added to the action. The usefulness of these noises is too often questionable. On first hearing, they are surprising and entertaining. Soon they grow tiring. When you have heard a number of sound films and the time of wonderment has passed, you discover, not without surprise, that the world of noises seems much more limited than you would have believed earlier…⁴⁴

He went on to list “the striking clock, the cuckoo calling the hours, the applause in the nightclub, the automobile motor and the dish breaking” among the sound effects that one hears over and over again in those days. To this list of acoustic protocols one might add walla and traffic noises (especially the car honking sound), featured in numerous early sound films to signify the general idea of “a crowd” or “a busy street.” Constrained by a narrow notion of what sound effect is and how it should relate to the diegesis, Clair’s disappointment is completely justified. Instead of opening up to a brave new sonic world offered by the technology, it seems, the so-called sound cinema is at this point content with mechanically duplicating the sounds of the previous two decades.

Nevertheless, there eventually emerged a new conception of sound: instead of commenting on the imagistic diegesis from outside, this sonic sensation positions itself resolutely inside the diegesis. Not only does the sound communicate, as an acoustic protocol does, the semantic meaning of a sonic event, it wishes to give in earnest the full sensorial extent of such an event. What results from this effort is a noticeably stronger sense of presence as conveyed by the film’s audiovisual means. This is what helps to forge a new form of diegesis that is audiovisual. Consider a scene in La petite Lise (1930). Berthier is waiting for his daughter in the hotel room. We hear the sounds of a train close by, but the sounds don’t really mean much—we are not sure if this is another acoustic protocol.⁴⁵ Now he moves to the window and suddenly we hear a loud release of steam—and we see the steam coming from under the window. At this moment an irreducible simultaneity seizes the audience: the scene crystallizes and becomes suddenly palpable. This kind of revelatory effect is reminiscent of Bazin’s praise of having one shot that includes the parents, the child and the lion. Bazin’s comment about this shot is in fact quite appropriate for the audiovisual moment in the Grémillon film: it “gives immediate and retroactive authenticity to the very banal montage that has preceded it” and “carries us at once to the height of cinematic emotion.”⁴⁶

This sonic sensation of releasing steam doesn’t call attention to itself. Instead it calls attention to the presence of the filmic world both viewed and heard. Instead of a redundant signifier (the notion of redundancy sounds strange here as the image of the steam and the sound of it carry very different weight: one light, the other heavy), this sound gives us a sensation that didn’t exist before. The radicalness of this shift of modes of experience cannot be overemphasized. The advent of certain kinds of sound, or to be precise certain uses of sound effect and human voice, has produced a double effect. On the one hand, it makes the image less malleable. Instead of functioning purely on an imagistic basis, images now acquire “volume,” so to speak, and are no longer the two-dimensional sheets of paper that one can shuffle easily in front of the mind’s eye. Bazin’s explanation of the evolution of the language of cinema sketches out a similar thesis: “The sound image,” Bazin observes, “far less flexible than the visual image, would carry montage in the direction of realism, increasingly eliminating both plastic expressionism and the symbolic relation between images.”⁴⁷ He singles out 1938 (it is important to acknowledge a belated effect of sound on the norm of editing) as a crucial year in which the expressionistic and symbolic mode of editing gives way to the analytic and dramatic mode of storytelling, which can be regarded as a mode of editing images dictated by the integrity of the audiovisual diegesis.

On the other hand, the advent of synchronized sound engenders a reassessment of the moving images. This move is comparable to the leap of faith involved in viewing still images that, as Gorky famously describes, “stir to life” with motion. In a careful appraisal of a much neglected essay of Christian Metz on the subject of the impression of reality, Gunning argues for the central role played by motion as “we experience motion on the screen in a different way than we look at still images, and this difference explains our participation in the film image, a sense of perceptual richness or immediate involvement in the image.”⁴⁸ Likewise we might say that we experience the sound-image in a different way than we look at moving images, and that this difference explains our enriched perception of the filmic world, a gestalt whole that leads to the audiovisual diegesis, forged by the cognitive fusion of images and sounds.

In arguing for a dynamic and fully historical understanding of the notion of diegesis, I am preceded by Noël Burch, whose phrase “there is much more to diegesis than narrative”⁴⁹ sounds surprisingly more cogent than ever. Burch’s position on the matter is first revealed in the few paragraphs shelved under the title of “some terminological indications” in his monograph on Japanese cinema and further explained (although still far from being pursued with rigor) in a polemical but now largely forgotten essay.⁵⁰ Once the non-essential elements are distilled out, the following insights of Burch’s resonate with this chapter’s central ideas. First, instead of taking diegesis as “a fixed, simple object,”⁵¹ Burch opts for terms such as “diegetic process” (a mental process of the spectator’s absorption and a process of the film’s “writing”) and “diegetic effect” (the result of the diegetic process “whereby spectators experience the diegetic world as environment”⁵²). Secondly, he posits that the effect has a “diachronic” dimension. Burch speculates that there are multiple “thresholds of emergence” in the history of cinema. These thresholds include the first time “the pictures stir to life”, the advent of lip-sync sound and then the introduction of color and other “indices of phenomenal reality.”⁵³ Perhaps to explain the rather slow adoption of color and 3D, Burch adds that these indices are “diegetically trivial”⁵⁴ or “non-pertinent” compared to the audio-visual. In comparison, the introduction of motion and the advent of synchronized sound constitute for Burch the only two major thresholds. In Burch’s phrasing, sync sound achieves for the first time in cinema a “full diegetic effect.”⁵⁵

The term full diegetic effect implies a partial effect that has existed in silent cinema. But Burch has even more ambitious plans for the term, as he includes in his discussion the literary world of Balzac, American television, and what he calls “the imperfectly or weakly diegetic film.”⁵⁶ He characterizes the experience of watching modern US commercial television as “a deliberate ‘detensification’ of the diegetic process in favour of a form of ‘induced disengagement,’ a fascinated non-involvement which is several removes in passivity away from ‘the spell of motion pictures’…”⁵⁷

Most intriguingly, although the synchronized voice is clearly a true game changer in the conversion to sound, Burch suggests that sound effects now play a non-secondary role in giving the diegetic world its full-fledged liveliness. About the lack of speech in The Thief, Burch says the following,

It shows that the presence of synch sound effects—even just background reverberation—is quite enough to raise the diegetic level to perfect fullness: we keep expecting these characters to talk, they obviously have that capacity, we constantly see consequences of their speaking, we are just never there to hear it happen, but the synch sound effects are the guarantee of that potentially manifest presence.⁵⁸

Although Burch’s essay is ultimately ambivalent about the nature of the diegetic effect and its applicability, the idea of serving one concept, the diegetic effect, with at once a historical dimension (the history of cinema as a history of the impression of reality) and a theoretical dimension (the medium specificity of cinema as compared to other media) resonates with mine. In fact, both ideas resonate with Bazin’s famous thesis that the evolution of cinema is driven by a mysterious force of or desire for total cinema, which Tom Gunning interprets as “an image of the world-beyond individual expression, communication of information, or the representation of a single viewpoint.”⁵⁹ Most importantly, if we follow Gunning’s reading of Bazin, this ideal of the “world in its own image” can only be approached by a dialectical process that involves “sublating the processes of illusion, absorbing its techniques, yet also transcending them.”⁶⁰ The emergence of audiovisual diegesis therefore signals an essential step in this dialectical process: it serves as a prerequisite condition for sound cinema’s absorption, sublation and transcendence.

Anchorage and Relay: Audiovisual Diegesis and Its Worldly Possessions

We have examined the notion of audiovisual diegesis mainly through a historical lens, namely, how the emergence of audiovisual diegesis constitutes a radically different conception of sound in the history of cinema. In this section we shall give close attention to the particular sound-image relations that are constitutive of this perception. I use the term audiovisual diegesis to emphasize the active role played by audiovisual perception in forming the diegesis, as opposed to its narrative-oriented definition. But the only thing that needs to be emphasized here is the sound; the images already have our attention. The exclusion of sound in contemporary film theory—what I earlier called epistemological deafness—is at the core of (and responsible for) the current debate over the utility of diegesis in regard to sound. While most scholars would agree in theory that the mental construction of diegesis takes input from the images and the sounds, in practice virtually everyone tends to refer to the diegesis solely in imagistic terms and then attempts to find a place for sound in this construction. Instead of acknowledging sound as part of the construction team, the current conception of the diegesis treats it as an awkward afterthought. Consequently one needs to argue endlessly whether a sound belongs or does not belong to the diegesis. All sorts of labels need to be manufactured in order to position a sound in relation to a diegesis whose existence has not consulted this very sound.

The first step toward solving this problem, as we have taken, is to dynamize the notion of diegesis in its many historical contexts. When we talk about the diegesis in the context of silent cinema, what we actually mean by it is quite different from what we mean in the context of sound cinema. The emergence of the audiovisual diegesis is a historically significant moment because it assigns sound a different role to play in relation to the diegesis. In return for this new role, the sound has revised what the term diegesis actually means. But the differences between the old and the new paradigms can also be approached from a microscopic level: they boil down to the kind of sound-image relations that are potentially characteristic of the different historical periods. To clarify the matter, I am not claiming that sound cinema has discovered any new sound-image relation from scratch—I believe instead that virtually all the possible sound-image relations in sound cinema can find precursors in the “silent” period—but I do hold the view that sound is conceived in radically different ways in cinema after the so-called advent of sound. Now the question becomes: how do we approach this radical shift in a formal way, that is, on the level of analysis?

Although the synchronization technology itself doesn’t necessarily dictate any particular sound-image relation, in reality I would argue that two kinds of such relations stand out as having a decisive impact on the evolution of cinema. These are what I take as two basic possibilities of sound-image relation in sound cinema: anchorage and relay. In retrospect, the advent of reliable synchronization does have the effect of favoring these two kinds of sound-image relations not only because they were previously unavailable (or available but not in a reliable form), but also because they speak to the specificity of the new, audiovisual cinematic medium. Again, I advocate that we adopt a truly dialectical and historically dynamic view of the notion of medium specificity: there is no room for an essentialist view if the medium, as well as its specificity, is conceived as constantly morphing in reaction to technological and cultural contexts. Broadly speaking, while the history of audiovisual media (of which the history of cinema makes one act) can be regarded as a continuous recycling process, admirable for its zero waste policy, different media or different stages of the evolution of a certain audiovisual medium such as cinema can indeed be conveniently identified (if we proceed cautiously) with the particular sound-image relation that each makes the most of. We might use Roman Jakobson’s notion of “the dominant” to characterize what is being favored here, without necessarily excluding all other possibilities brought by the diachronic continuity and the interpermeability of media practices.

I borrow the two terms from Roland Barthes’s analysis of an advertisement image of Panzani spaghetti,⁶¹ although the meaning of the terms is actually quite straightforward. Barthes offers in his essay an attempt at the rhetoric of the image; what is proposed here instead is an attempt at the rhetoric of the sound-image. I use the term rhetoric mainly in the sense that there exist systematic ways in which the image and the sound corroborate one another in weaving a continuous thread of signification, facilitating the audience’s mental construction of audiovisual diegesis. By discerning how sound and the image anchor or relay each other, I mean to provide a new analytical vocabulary for sound cinema’s medium specificity. Ultimately, I intend to offer here an alternative to the widely influential framework of thinking that has been applied to the issue of sound-image relation: parallelism and counterpoint.

The term anchorage identifies a category of sound-image relation where the sound is anchored by the image, and vice versa. Anchorage obviously can and should work both ways. Yet for the majority of sound theorists, the sound always comes to the aid of the image, not the other way around. Sound is, as Christian Metz famously claims, “of a basically adjectival nature.”⁶² Michel Chion, too, proposes that sound constitutes an “added value”⁶³ to the image. This is certainly not incorrect. But this characterization puts sound in a curious position, as if sound by itself were always unambiguous, and therefore ready to clarify vision. Nothing could be further from the truth. Listening to films with eyes closed, or attending what I call the “blind films,”⁶⁴ one finds it immediately clear that hearing the sounds alone is often not enough to decode their meaning with certainty. Without the anchorage of the image, the sound becomes a locus of polysemy, a “floating chain of signifieds.”⁶⁵

Consider the windmill sound mentioned in the opening example. One can indeed agree that the sound offers a third dimension to the image of the mill, which constitutes an added value. The mill becomes more concrete with the addition of sound, which makes it an object in the world instead of a mere visual symbol. The image of the mill, one might say, is anchored by a carefully chosen sound, so that it comes alive and distinguishes itself from the rest of the silent image. But it is also important to acknowledge what this image of the mill is doing to the sound—remember in this case the sound is already there when the image emerges: on the one hand, it offers a meaning by which the sound is to be interpreted (it is a mill, instead of whatever Foley tool the sound team may have used); on the other hand, any specific setting introduced by the image imposes upon the sound and reduces its ambivalence, which may be considered a subtraction of its polysemic value. To think that a sound can only add value, or that only the sound can add value to the image, is therefore problematic.

More importantly, what anchorage produces is not merely an effect of simultaneity, but rather an audiovisual gestalt that is perceptually more than the combinatory expression of both. Take the most banal form of anchorage, lip-syncing, which has often been denigrated as redundant and dismissed as a trivial form of realism. Elite filmmakers such as Eisenstein, Chaplin and René Clair hold (at least initially) considerable contempt for lip-syncing, horrified at the prospect of a future cinema filled with the likes of The Lights of New York. To avoid doing the same, while still offering the audience the benefit of sound (especially the human voice), Clair (the first of the three to jump into the water of making sound films) comes up with multiple rather ingenious solutions in his first sound film, Sous les toits de Paris: either you see people talking behind the glass door so you can’t hear anything, or you hear people talking in the darkness so you can’t see anything. Vision and audition, in other words, cannot be granted simultaneously, or the film will be accused of “parallelism.” The same contrivances can be found in Alone (1931), which Kristin Thompson praises as one of the few true instances of “early sound counterpoint.”⁶⁶

While the idea of counterpoint remains a cogent way to describe sound-image combinations, the persuasive power of the talkies demands further explication. How was it that The Lights of New York, together with the two Al Jolson vehicles (The Jazz Singer and The Singing Fool), was able to convert the US film industry (with the world following) in three years, with apparently no basis in artistic merit? Can lip-sync’s immense appeal be simply dismissed as the salaried mass’s lack of taste in art and insatiable appetite for novelty? With the benefit of hindsight, we may safely say that the extremely influential view that cinema (actually meaning the “cinematic art”) needs to avoid audiovisual synchronicity at all costs is largely mistaken.

The nature of the discursive objection to synchronized talking in cinema is a topic that I feel requires considerably more space to unravel than is allowed here. Suffice it to say that we need to acknowledge the different degrees of the impression of reality as a crucial factor in the historical development of cinema. If, thanks to Griffith, Porter and many others, the ways in which shots are connected—that which we retrospectively name continuity editing—facilitate the mental process of constructing a coherent spatiotemporal continuum where the story takes place, seeing an event with a plausibly corresponding sound considerably enhances the efficiency⁶⁷ of the process, to the extent that the construction of diegesis now becomes almost involuntary, automatic. Despite the frequently imperfect synchronization and crude sonic verisimilitude, an audiovisual approach to diegesis constitutes a leap of faith towards a perceptual present tense. It signifies a strengthened access to the filmic world. Watching a person talking and hearing the words at the same time is therefore hardly a redundant form of communication, for the simple reason that one experiences the person quite differently in vision and in audition. Contrary to what is commonly believed, the combinatory experience should not necessarily bore anyone, unless the talking itself fails to deliver anything interesting.

While the talking lips are for a limited time regarded as a towering (yet somewhat evil) achievement, sound cinema soon perfects another sound-image relation that is of equal prominence. I use the term relay to describe a situation where the sounds and the images work independently to produce a coherent audiovisual diegesis. In contrast to anchorage, relay hinges on non-simultaneity; it refers to the ways in which sound and image alternately come to the fore in the film’s world-building process. A relaying sound can be one that has been or is yet to be anchored, but it can also simply present itself without the need to be anchored. This is the reason why relay deserves its own category instead of being defined as a (temporary) lack of anchorage.

What relay brings to the picture is a unique contribution to the filmic world through sound. Some of these sounds are never intended to be on-screen. They enter into our consciousness as purely acoustic presence, without the need for any screen time. Such a sound is often charged exclusively with bringing a particular element of the filmic world to our attention: an element that we prefer to hear rather than see. Recall the clucking sound in the Leone film: the sound does not call for, nor is it called for by, the image. It simply is an acoustic realization of a diegetic plausibility—there are chickens at this train station, but not pigs, while both can be plausibly inferred from the images alone—that constrains our experience of the film by distinguishing concrete perceptions from the result of active imagination.

In addition to acting as ambience, relay can also have a more intense engagement with the images. In the Leone film the sound of the fly is a good example. It does not feel quite right to characterize this sound as either on-screen (we can’t really see the fly) or off-screen (it appears to be right there although we can’t see it). Here the case reveals the inherent problematics of the term “off-screen” sound, because to lump all the sounds that do not have a screen correspondence into this category does little to help clarify what the sound is really doing there.

In sound films the two functions—anchorage and relay—constantly alternate. A sound would often be anchored to an image, then relayed by it, and possibly be anchored again. It is important to note that this is a hard-earned achievement of sound cinema. A skillful alternation of these two is a formal achievement—we take it for granted nowadays—rarely found in early sound films. Dynamite (1929), a film that according to Barry Salt “seems almost to be searching for a whole new form for the medium,”⁶⁸ contains an early and admittedly belabored example of such an attempt. In the prison wedding scene, dialogue, diegetic singing and the sound of the scaffold under construction can all be heard at once. But to ease the audience into such a daring use of off-screen and layered sound, Cecil B. DeMille first uses a shot of hands waving hammers, then a shot of a prisoner in his cell singing “How Am I to Know” with guitar, and only in the third shot introduces the girl, the prisoner on death row, the priest and two prison officers in medium long shot. The sound of the hammers (implausibly continuing), the singing and finally the dialogue constitute respectively three layers of sound that unfold progressively in a very calculated but awkward fashion.⁶⁹ In the fourth shot, a closer two-shot of the marrying couple, DeMille outdoes himself by pushing the priest’s voice off screen, thereby having all three layers of sound in the relay position. This precarious burden of the heap of sounds external to the images soon becomes almost unbearable, and the film duly cuts back, after two medium close-ups of the two protagonists, to the establishing shot and then backtracks to the singing prisoner.

The sound’s careful hopscotching between the states of anchorage and relay may be regarded as an extraordinary maneuver before 1930,⁷⁰ but it is eventually absorbed into the arsenal of cinema and becomes a standard operating procedure. It is a sign of sound cinema’s maturity that a modern audience has become so adept at decoding the process that films today no longer feel the need to explicate the status of sound through bracketing shots depicting their sources. Consequently we might suggest that this rhetorical device has become a “classical” one for its simplicity and effectiveness.

As a theoretical lens, the pair of terms also offers some other advantages. Obviously, the pair unravels a basic sound-image relation without being partial to any party. Both anchorage and relay can be used to describe a mutual relation, instead of one simply being subordinated to the other. Chion’s formulation of film sound’s relation to the image is somewhat wanting in this respect, for his description of the sound’s functioning (e.g., the added value thesis) implies that sounds are semantically parasitic on the images. Despite his impressive work on film sound, unparalleled in scope and depth, Chion insists that sounds are immediately absorbed by the images, each in its own way, instead of being perceived as a whole, namely, as a soundtrack. To fully extend this logic of magnetic absorption—I sometimes visualize it as the screen absorbing a shower of sonic arrows—Chion arrives at his sensational conclusion: there is no soundtrack.⁷¹

Chion’s bold statement does contain some truth about the images’ great force exerted on the sounds, and his characterization of the irresistible nature of synchresis remains a seminal conception of the theory of audiovisual media, but to say there is no soundtrack simply because all the sounds have to work with the images somehow has the unfortunate effect of denying sound’s unique access to the filmic world outside the reach of the images. In essence, Chion’s characterization rehashes the untenable position of conceiving sound first in relation to the images and then to the diegesis. The benefit of placing sound at the center of the audiovisual diegesis, as my theorization proposes, is precisely to eliminate this kind of conceptual malfunctioning: sounds do relate to images, but it is more productive to take this relation as a consequence, not a condition, of sound’s role in the audiovisual diegesis. Instead of saying sound constitutes an added value to the image, it makes more sense to conceive the pulsating images as a site of meaning generation where the sounds orbit freely. Clearly it is not only by making contact with the images that a sound can be regarded as meaningful. Even when such a contact does take place, we need to acknowledge sound’s inherently volatile temperament. Thus anchorage describes a moment of penetration where a sound goes through a particular point or area on the image, instead of being absorbed and annexed to it. The notion of relay sets the sound free again, so it can hover around the periphery of the images, waiting for the next impact. Very often, some of these wandering sounds are never once docked within the images.

Finally, like apparatus theory, Chion’s formulation of the rhetoric of the sound is decidedly ahistorical: it describes a scenario where sounds are perfectly assimilated by their anchors, and has little to say about cases where some stray arrows miss their targets. Listening to almost any film of the transition-to-sound era will show precisely the opposite of a perfect marriage between sound and image: instead of being absorbed smoothly, instantly, and somewhat miraculously by the images, the sounds bump into the images in all sorts of embarrassing fashions and fall apart. For the long period of time leading up to sound cinema’s coming of age, the sounds were having a hard time rhyming with the images, regardless of how they were produced. The following section presents a case study from this critical stage where such difficulties are foregrounded and new relations between sound and image are forged.

M & Kameradschaft: a Comparative Analysis

The final section of this chapter offers a comparative analysis of two canonical films where the theoretical terminology developed in this chapter can be tested and substantiated in a concrete historical context. Conversely, this is also a theory-informed reading of a critical historical moment. Having established the notion of audiovisual diegesis, we might be in a better position to understand not only how exactly films in the transition era sound different but also what paradigms are implied in these differences. This mini case study takes its departure from (and is undoubtedly indebted to) Noël Carroll’s perceptive analysis of M and Kameradschaft.⁷² I agree with Carroll that the juxtaposition of the two films is extremely informative. Both are sound films, yet the way they sound is different. Acknowledging this difference, Carroll looks for larger aesthetic frameworks that could have informed the ways in which sound is used. Unlike in many other cases where the different stages of applying sound technology can account for sonic differences (it would be unfair, for example, to compare a Hollywood film made in 1930 to a Russian sound film made in the same year), here the two films are made in the same year, by the same studio (Nero Film), and share the same cinematographer (who certainly knows how to blimp his camera). It is thus reasonable to assume that Lang and Pabst had the same access to sound technology. The difference must therefore reside not in the toolset, but in the mindset.

Yet Carroll’s analysis leaves much to be desired from the perspective of sound. He describes in extremely vivid fashion the visual parameters of the two films and offers numerous insights on how the stylistic contrasts between the two films, especially the roles played by editing and camera movement, have profound theoretical implications. Yet the essay contains practically no information with regard to how these two films sound, even though it purportedly deals with early sound practice and is collected in a highly respected volume called Film Sound: Theory and Practice! Carroll does make a claim that M is a “silent sound film” while Kameradschaft is a “sound sound film”,⁷³ which may qualify as a theoretical gesture. Yet what really constitutes the “non-regressive” use of sound exemplified by Kameradschaft? Do the two films contain different sounds? Are the sounds used in different ways in relation to the images? Is there a new system of sound in place?

Carroll’s analysis is a delight to read; but it is heavily biased toward the visual. His visual memory serves him well by supplying a wealth of accurate and meaningful descriptive details concerning the editing and the camera movement. The auditory analysis of the two films, meanwhile, is incomplete at best. Most importantly, Carroll’s general statement about the paradigm shift exemplified by the two films needs a new assessment. According to Carroll, M subscribes to the silent paradigm that treats sound as an additional element of montage (just as Eisenstein and Jakobson envisioned), while Kameradschaft accentuates the physical recording capacity of the medium. Put this way, it is as if the aesthetic choice embedded in Kameradschaft is predicated on the direct recording of sound, a practice soon abandoned on a global scale.⁷⁴ What happens, one might ask, to a modern sound film that uses very little production track (or, in the case of animation, none at all)? How does non-direct sound contribute to cinematic realism? In what capacity can sound function within an entirely constructed soundtrack?

A thorough investigation of the aesthetic and rhetorical implications of direct sound remains to be done. But this rather naive notion that one can simply point the microphone towards reality, which would then be magically recorded on a medium of one’s choice, is far from tenable. Moreover, if a more progressive conception of sound cinema (Carroll never uses the term, but he almost implies it) entails “a commitment to recording”⁷⁵ then the history of sound cinema so far becomes inexplicably regressive, since it follows precisely the option to “reconstitute reality”, an option Carroll associates with silent cinema. The idea of associating or even attributing the impression of reality to the physical recording, which, if it does not faithfully record reality, at least re-presents it in a much less subjective way, is an extremely important yet anything but flawless proposal in film theory. Carroll’s thesis, although far from being thoroughly developed, seems to offer the sonic equivalent of an indexical theory of the image, which he then (mis)attributes to Bazin. It is conceivable that Bazin, too, may hold naive notions about sound recording technology; but at least he has not explicitly suggested it as ontological. This conceptual conflation is, I believe, the main impediment to Carroll’s analysis of sound—the sounds are simply what the reality is in front of the microphone! They record; they capture; and they produce “an awesome feeling of authenticity.”⁷⁶

My theory of the paradigm shift differs from Carroll’s in that the notion of the audiovisual diegesis is not predicated on the capacity to record. Instead it is a psychological effect that the film, or indeed any audiovisual medium, can produce on its audience through an increasingly sophisticated system. If a film or a moment in a film gives the audience a unique sense of tapping into the film’s world, it is not because what this audience sees and hears is the recording of something real, a mechanical duplication of another, past reality. What is real to the movie audience is a phenomenon rooted in the present moment of seeing and hearing. To supplement my critique of Carroll, I offer in this section some details of how the two films sound, which may serve as the basis of a full-scale comparative audiovisual analysis. Here I will dispense with any comments on the thematic and visual concerns, not because I regard them as unimportant, but because they have been extensively covered. Despite the canonical status of the two films, however, very little has been written about how they sound. This is especially true for the Pabst film.

According to Carroll, sound in M operates mainly under the principle of montage, which is more or less equated with asynchronicity in Carroll’s phrasing. Montage is proposed in Carroll’s essay as an organic principle that unites the practices of sound and editing. And he believes that the series of sounds that signifies Elsie’s absence is exemplary in this respect. As one of the most memorable and poignant moments in the film, this sequence demands more than a passing remark. And it happens to be the only point where sound is discussed in any detail in Carroll’s essay. Here Carroll’s description is remarkably different from what the current restored version (The Criterion Collection DVD is referenced here) shows. Carroll states that the shots are cut as “further and further away” in the order of the dinner table, the staircase, and a yard in the neighborhood, and claims that the voice is “audibly dropping.”⁷⁷ But in my examination the sequence shows the staircase, a penthouse of sorts for hanging clothes, the dinner table, a patch of earth where a ball rolls, and finally a toy balloon momentarily caught between the telephone wires. The spatial progression in the sequence is ambiguous and somewhat contradictory—one can hardly describe it as “further and further away.” The voice only carries over the first two shots and there is no audible dropping. The rest of the shots are accompanied by silence.

The point is not to fault Carroll for his “misdescription”—a different, unrestored version is very likely involved here—but rather to explore the reasons that might have caused this discrepancy, which I find significant and fascinating. In processing the series of shots of empty spaces, Carroll (or any perceptive modern viewer) may have intuitively grasped their spatial locations within the diegesis, although most of these locations are never shown before or after this sequence. Most crucially, he seems to have conceived the relative distances of these locations in relation to a sound, namely, the mother’s voice. Instead of a stable voice over a series of pillow shots (a pillow shot is a static shot that contains no human figure)—which is precisely what it is—the voice is perceived as a changing sound that is heard from points in space further and further away from its source in the diegesis. This audiovisual understanding is crucial in supplying meaning to the montage, which delivers a poignant message: Elsie has drifted far away from her mother; she is lost. Based on this meaning Carroll may have mentally corrected the actual sequence to its “ideal” form: the shot of the dinner table should come first, and the mother’s voice should drop “audibly” over these shots.

Why should the voice drop audibly? To raise this very question entails a new conception of sound’s immanent quality in relation to diegesis. A modern audience would understand the mother’s voice not as a commentary on the series of shots, that is, a voice over; instead it is understood as belonging to the spatial continuum that these shots suggest. More importantly, the sound builds this continuum together with the shots by contributing to their spatial progression—the effect is such that one can almost hear the echoes that are actually not there. The illusion belies our inculcated sense of audiovisual diegesis, that for us a sound should make sense in diegetic terms almost by default.

However, many of the sounds we hear in M seem to serve a different purpose. Instead of actively contributing to the audiovisual diegesis, these sounds are what we have described as acoustic protocols. One distinct difference between M and Kameradschaft is the abundant presence of these sounds in the former and their complete absence in the latter. The opening scene of M in front of Elsie’s school is a case in point. We may safely assume that this scene was shot silent. And when it comes to pasting sound onto it, Lang adheres to the silent practice of adding only the most significant sounds to the images. Therefore a honk is offered when we spot a car passing by; a few moments later, another honk startles Elsie. The only other sound offered by the scene is that of the bouncing ball (which apparently doesn’t even make an effort to sound believable), singled out from the rest of the busy street. The entire world that surrounds Elsie makes absolutely no sound, for it is regarded as irrelevant to the main action. This treatment (theoretically supported by Arnheim’s partial illusion thesis) can be found consistently throughout the film whenever a street scene is involved.

Earlier in the chapter I identified the genesis of the acoustic protocol and mentioned its extended use in the transition era. In films such as The First Auto (1927), Blackmail (1929), Deserter (1935) or Crossroad (1937)—these examples are from different national cinemas so as to signal an uneven progression of the timeline—such uses are now identified with sound being added only as an afterthought, justified by production circumstances unfavorable to sound. The case of M, however, exhibits a curious persistence of this sort of sound-pasting practice, almost perplexing if considered within the context of German cinema. Kameradschaft may have been released a few months later than M, but during the previous year, several Tobis-Klang films had already demonstrated the new possibilities of sound fully incorporated into the audiovisual diegesis. Der blaue Engel (Apr 1930), Westfront 1918 (May 1930), and Die Drei von der Tankstelle (Sep 1930) may be drastically different in their answers to the problem of sound aesthetics, but what is common among all these explorations is the consistent attempt to characterize the world of the film in solid audiovisual terms.

M is a remarkable exception to this trend. Take for instance the police raid sequence that happens a little over twenty minutes into the film. The scene immediately follows Lohmann’s highly graphic report. The previous shot shows a silent scene of the police asking for papers in a low-class cafeteria, accompanied only by Lohmann’s last sentence, “despite all these, the police haven’t succeeded in finding anything.” The sentence spills over a little to the next shot, which shows a desolate street where a man dismisses a prostitute’s offer. Now both of their footsteps are not only audible but also credible. The sudden emergence of this audiovisual scene from the silent concatenation of images accompanied by Lohmann’s lecture is perceptually shocking, as it effectuates a qualitative change in the perception of diegesis. Instead of a mere image, the space suggested by the diagonally composed frame, sparsely lit by street lamps, becomes all of a sudden a palpable space with a considerable effect of diegetic immersion. One feels instantly propelled into a concrete and specific location; all senses are being activated…only to be floored by the next shot, which depicts another street, presumably in the neighborhood, from overhead. We see people walking; a man is walking his bicycle; there is water on the ground—all these make no sound at all. The following shots are even more outrageous in their lack of sound: a car quickly passes by a door and two men jump off; another car, full of plainclothesmen, stops in front of the camera and unloads its police force. The raid sequence unleashes plenty of action, all in the characteristic manner of silent cinema, until out of nowhere a whistle and some honks are heard. A woman rushes down the stairs to a tavern, yelling “The police!” Sound cinema resumes.

Indeed, while many of M’s sound strategies can be considered innovative even today, the film also contains many vestiges of the modes of sonic accompaniment in the silent era. What is fascinating in M is to observe how some of these strategies are assimilated by the sound cinema and become part of its standard vocabulary. One interesting case comes with, as I already mentioned, the commissar Lohmann’s long telephone report to the minister, which is possibly one of the first genuine uses of voice over in sound cinema. Several years ahead of Sacha Guitry’s Le roman d’un tricheur (1936), Lang uses the voice and the subtle change of its quality to make smooth transitions between different locations in space and time. First anchored to Lohmann, the voice is then carried over to a series of shots of police investigation, maps, and documents. Inebriated with the power to conjure up those images, the voice becomes a voice over, and its tone subtly shifts from a conversational one (a report to one’s superior) to a fully assertive and almost triumphant one. With the minister’s interruption “this is not enough”, the voice audibly softens and changes back to its humble diegetic origin. The sequence, however, is halfhearted in its attempt to depict the world in audiovisual terms: most of these shots are presented without diegetic sounds, the only exceptions being a dog barking and some walla in the “flop-houses” and underground hangouts. The contingent addition of sound in some of these shots is a telltale gesture of the acoustic protocols in circulation at the time.

I have belabored the sound of M, but it is much more difficult to talk about that of the Pabst film, for it is much more “natural.” It is for this very reason, I would imagine, that Carroll said virtually nothing about the sound in Kameradschaft. In his review of the film the New York Times film critic Mordaunt Hall singles out sounds such as “the gurgle of water, the rush of coal and the final crumbling of the black mass and men beating on iron pipes to signal to the rescuers”⁷⁸ as “wonderfully natural” to a contemporary audience. But for a modern audience, an unnatural approach to sound such as the one found in M might have more aesthetic appeal. Indeed, the wonderful naturalness of the sounds in Kameradschaft can only be appreciated by contrasting it with the artificial M, the very juxtaposition proposed here.

What does the term “wonderfully natural” mean? It points to a perception of sound that is not only believable but also intensely savory—the gold standard of contemporary film sound. When comparing M and Kameradschaft on sonic terms, the most striking characteristic of the latter is a conception of sound that refuses to be an afterthought grafted onto the images, one that presents the world in its audiovisual fullness. The film does this through its narratively implicated use of sound, as M does; but it also presents a nascent form of audiovisual diegesis that goes beyond the beckoning of narrative imperatives.

In terms of using a sonic motif that guides the narrative progression, M and Kameradschaft share a common legacy: in M it is the sound of whistling that leads the blind beggar to identify the murderer; it is the sound of Beckert picking the lock that alerts Paul to his hiding location. In Kameradschaft it is the beating of the pipe that leads first a Frenchman and then a group of German miners trapped in the mine to be discovered by the rescue team. Both cases can therefore be traced back to the lineage of “sonic attraction films”⁷⁹ where the narrative is explicitly constructed around the circulation of sound.

The circulation of sound in Kameradschaft acquires symbolic meaning by having a character named Kasper play a key role in all events that can be classified as communicative: being the only bilingual worker in the film, he mediates constantly between the two antagonistic nations. It is he who leads the party to the bar; it is he who gets the idea of going to the rescue from underneath; he is the one who breaks the metal gate that serves as the symbolic border between the two mines/nations; and finally he is the one who makes the crucial sound that secures their rescue. What is most notable in this final rescue sequence is a gradual intensification of the beating sounds that are soon joined by muffled digging sounds. While rooted in the diegesis, these sounds constitute a piece of concrete music not unlike the opening of Love Me Tonight (1932). All rejoice and dance under the rhythmic sway of this music; and it is hardly a coincidence that the scene that immediately follows is the celebration ceremony where a military marching band is playing.

While certain sounds in the Pabst film are tightly woven into the narrative, most of them go well beyond it, as if to illustrate Noël Burch’s notion that “there is more to diegesis than narrative.” If one were to make an inventory of the sounds featured in Kameradschaft, virtually nothing on the list would be generic: the rhythmic pouring of the coal into the cart, the vibration of the drill,⁸⁰ the sound of a metal wrench beating on iron pipes, and other sounds that one doesn’t even have a name for. Even the more banal sounds, for instance the footsteps, are perceptibly validated in Kameradschaft. An attentive listener will be rewarded with different footstep sounds that vary according to the actual surface shown in the images. When the fire erupts, the families of the French miners rush to the site. Here a montage of the crowd running, a staple of silent cinema, is accompanied by their hurried footsteps, which vary subtly from shot to shot. These footsteps convey a solemn urgency that is inconceivable in Eisenstein or Lang. Given sound, the mass of human figures is no longer a set of abstract shapes that move across the screen; they become actual bodies that carry weight in space.

In the “Bal Kursaal” sequence, where three German workers almost pick a fight with Frenchmen because of a language problem, the footsteps become wonderfully excessive. The norm for shooting such scenes would be to eliminate footstep sounds, for they offer no narrative information whatsoever. In addition, this is no regular Foley step or generic footstep sound: here the sound of feet shifting according to the music has a unique rhythmic scratching quality to it, which upon first hearing perplexes us for its lack of immediate semantic value. The very presence of these footsteps contradicts a narrative-oriented definition of diegesis and produces an intriguing effect of non-illusionistic immersion. The case is somewhat similar to Danièle Huillet and Jean-Marie Straub’s Moses und Aron (1975), where the banal footstep sound of dancers on the sandy ground constantly calls attention to itself and competes with the far-off, idealistic German utterances and the dodecaphonic music of Schoenberg.

In serving a denotational purpose, these sounds are overkill; their uniqueness and concreteness defy the norm of a semantic use of sound; they call attention to themselves by going beyond what is required by the plot. In contrast with an acoustic protocol that is conceived to enhance the realism of an image, here the sound insists on its own birthright and demands that the images solve its mystery: it is a sound in search of images. In Kameradschaft there are multiple instances of this sort of acoustic puzzle. At one point, above the ground, rescue teams are coming out of the elevator. People are being carried out on stretchers. As the camera starts to pan left, we hear a curiously rhythmic sound that we cannot quite understand. This shot lasts 30 seconds without revealing the source of the sound, during which our sonic expectation grows ever more intense. The next shot, showing a man lying on the ground, hints at the source without revealing it. As the camera moves in we see that his face is covered with a mask and that his stomach moves according to the rhythm of the sound. It is only at the end of this shot that the camera pans left and finally grants a view of the respirator that produces this sound. The camera’s pilgrimage to reach the respirator serves little narrative purpose other than to delay the revelation of the source of the sound. The sound may not be meaningful plotwise, but it is valued by the diegetic world in terms of its concrete trajectory of anchorage. In building this acoustic puzzle and solving it, the audiovisual diegesis gains weight.

A similar situation occurs earlier, when under the influence of Wittkopp the German miners decide to help their French colleagues and form a rescue squad. Rather intriguingly, the scene is set in a huge bathhouse where metal chains are seen going up and down while producing strident sounds. The presence of these sounds calls attention to the puzzling custom of hanging clothes high up in the air. And it demands that we seek an understanding of this acoustic puzzle. In the final shot of the scene, which supposedly shows the members of the rescue squad leaving, a large portion of the frame is in fact taken up by those metal chains and the soundtrack is entirely filled by the said metallic sound. Initially perceived as a baffling disturbance, this sonic persistence eventually succeeds in bringing a unique and perceivably authentic acoustic presence to the audience.

One might get the impression that the Pabst film simply records what is in front of the camera. But upon close hearing, sound is used with the considerable ingenuity and complexity that we normally associate with modern sound cinema. One instance stands out here. It depicts a miner, Jean—a sort of leader figure on the French side—trapped in the mine. He beats rapidly on the pipe, which enables a German rescuer (Wittkopp) to approach him. Already in a state of hallucination, Jean seems to be confused by the very sounds he produces himself. He yells, “Come on fellas! Fire! Open up on them!” And the rapid beating of his wrench changes to the sound of a machine gun. This audio dissolve is accompanied by a cut from his intense face to a lamp that Wittkopp carries. Wittkopp lowers the lamp and advances in a fashion made menacing by his mask and the suggestive power of sound. The film then dissolves into a close-up of the French miner’s face, now a soldier on the battlefield of WWI. Grenades are thrown and explosions are heard nearby: the Germans are charging. In the trench, Jean, now a French soldier, is seen fighting with a German soldier who wears a similar mask. Eventually the film goes from this flashback triggered by sound to its present tense, where Jean and Wittkopp fight underground. Notice though that the sound of the two fighting in the mine precedes the end of the flashback. This means that, while the images are still showing the battlefield, the sound already brings back the film’s diegetic reality, as if somebody wakes us up from dreams or hallucinations by calling our names (a rhetoric of sound frequently found in contemporary cinema). Most intriguingly, when accompanying the battlefield shots of the fight, the sound of explosions exists in a diminished form as if to signal its unreal quality. The audience is able to recognize at this moment that the fighting sounds should belong to the mine instead of the battlefield, because their acoustic qualities indicate an enclosed space instead of an open one. This technique effectively articulates a sonic space of the mine, which is then superimposed onto the images of the battlefield. This kind of intricate sound work is rarely heard throughout the 1930s, let alone at the beginning of the decade.

Conclusion

By juxtaposing M and Kameradschaft, I hope to show how the many differences between the two indeed constitute a paradigm shift. Clearly, this shift didn’t happen overnight. Starting from the late 1920s, the process took more than a decade to complete. The transition period therefore exhibits at once a surprising continuity of practice and a radical break with its past. Both can sometimes be observed at different moments in a single film. The so-called transition to sound by no means signifies sound versus silent (fortunately I no longer need to make this point); but neither does it entail one type of sound film superseding another. What matters, instead, is to chart the complex trajectories through which these different uses of sound find their way into cinema.

Throughout its history, the constituents of cinema and its relation with the world have undergone a continuous evolution. Cinema has never stopped changing, even in its most stable period. While most of the time these changes are gradual, fine-grained and somewhat transparent to the moviegoing experience, the “advent of sound” has been by far the most traumatic event of cinema’s lifespan. Although the period has been approached from many different angles—indeed the period is the most beaten path in film sound scholarship—I feel that key questions concerning the period remain to be raised, let alone answered. This chapter revisits the scene of this traumatic event from a specific (and hopefully novel) angle—the same can also be said of other chapters of this dissertation—namely, the advent of sound as a reconceptualization of sound in relation to diegesis. I argue that in addition to, and partly as a result of, the wide availability of a reliable synchronization technology and the newly recognized immense popularity of the human voice, the advent of sound reformulates the terms and conditions under which all types of sounds are heard by movie audiences; it redefines the contexts in which these sounds are understood.

What I call the audiovisual diegesis is a new context of hearing with which existing sounds (especially music) need to be reconciled. Not only does it exert an invisible force on these sounds, asking “what is your place in the audiovisual diegesis,” but the same question is also posed to the images. The result is that the percept of diegesis, long existing in the silent era, becomes much less malleable. Previously not all images in a film were obliged to contribute to the diegesis—they roamed instead around a loosely defined imaginary territory called diegesis. Indeed, key techniques of the moving images such as Eisenstein’s montage trope (e.g., juxtaposition of sheep and factory workers in Modern Times) or intellectual montage (e.g., the display of gods in October) hinge on a blatant disregard for diegetic coherence. The advent of audiovisual diegesis made such practices questionable. In Fritz Lang’s Fury (1936), for instance, a shot of women chatting dissolves into a shot of clucking chickens. This juxtaposition, common in the silent cinema, is now perceived as out of place, precisely because the sense of diegesis has been reinforced: it now has little tolerance for extra-diegetic images and, at least for a while, rejects any form of extra-diegetic sound.

Audiovisual diegesis may have become a dominant (in the Jakobsonian sense) mode of perception in narrative cinema. But this doesn’t mean that a narrative fiction film (let alone films of other kinds) needs to promote it every second, or that every sound needs to “belong” to this diegesis. A modern sound film in fact embraces all the sound-image relations possible since the good old silent era. Ultimately, the emergence of audiovisual diegesis did not suffocate cinema as a form of artistic expression. Instead it opened up new aesthetic pathways to the filmic world. In Camera Lucida, Roland Barthes describes how, initially, photography seeks to defy the laws of probability and focuses on what is remarkable to the naked eye, but that soon it made “notable whatever it photographs. The ‘anything whatever’ then becomes the sophisticated acme of value.”⁸¹ I find here an apt description of the consequence of an audiovisual conception of what cinema is. The sounds are not necessarily remarkable, nor are the ways in which they are juxtaposed with the images. Yet the audiovisual diegesis shows us how remarkable and powerful even the most banal moments are, once transformed into their cinematic incarnation—a key technique, I believe, of the so-called “slow cinema”. The new soundtrack affords not only a renovated sense of diegesis that is distinctively audiovisual, but also a new sense of immersive auditorship, which is the subject of the next chapter.

Some examples that correspond to the above categories are: Jean-Jacques Annaud’s The Bear (1988) and Quest for Fire (1981), Themroc (1973), Naked Island (1960). For Libera Me (1993), any speech would seem to jeopardize the silent determination that amounts to a savage violence. Finally, the fact that nobody speaks adds palpable psychological tension to The Thief (1952), where the protagonist is visibly tortured by his treason. ↩︎
David Neumeyer, “Diegetic/Nondiegetic: A Theoretical Model,” Music and the Moving Image 2, no. 1 (April 1, 2009): 26–39. K. Jørgensen, “Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer Games Revisited,” Game Sound Technology and Player Interaction: Concepts and Developments. Hershey, PA: IGI Publications, 2010. Ben Winters, “The Non-Diegetic Fallacy: Film, Music, and Narrative Space,” Music and Letters 91, no. 2 (2010): 224–44. Clive Myer, “Theoretical Practice: Diegesis Is Not a Code of Cinema,” in Critical Cinema: Beyond the Theory of Practice (Columbia University Press, 2011), 11. ↩︎
Étienne Souriau, “The Structure of the Filmic Universe and the Vocabulary of Filmology,” The Structure of the Filmic Universe and the Vocabulary of Filmology, 1951. Eleftheria Thanouli has suggested that Souriau’s daughter, Anne Souriau, also needs to be given credit for the coinage of the term and her definition differs slightly from her father’s. See her entry in Edward Branigan and Warren Buckland, eds., The Routledge Encyclopedia of Film Theory (Routledge, 2013), 134. For a recent account of the French Filmologie school see D. N. Rodowick, Elegy for Theory (Harvard University Press, 2014), 112–131. ↩︎
These seven levels are: the afilmique (the real world), the profilmique (the part of the world taken by the camera), the filmographique (what can be seen from the physical medium of film, the film strip), the filmophanique (the projection system), the diégétique, the spectatoriel (the spectator’s subjective activities), and the créatoriel (which resides in the individual or collective mind of the filmmaker). This structure is elaborated in the 1953 book L’Univers filmique (where an additional category, écranique, is added to refer to what the screen contains), which is an edited volume under Souriau’s supervision: its twelve chapters were written by a total of eight authors (Souriau himself among them). It is important to point out that neither François Guillot de Rode’s chapter on sound nor Jean Germain’s chapter on music uses the word diégétique. ↩︎
Étienne Souriau, ed., L’univers filmique (Paris: Flammarion, 1953), 7. My translation. The original reads: “tout ce qui appartient, dans l’intelligibilité (comm dit M. Cohen Séat), à l’histoire racontée, au monde suppossé or présupposé par la fiction du film.” It is important to include the original here as this sentence is not always quoted correctly. One popular version (Edward Lowry, The Filmology Movement and Film Study in France (Ann Arbor, Mich.: UMI Research Press, 1985), 85.) goes “Everything which concerns the film to the extent that it represents something” plus “that type of reality supposed by the signification of the film.” This translation garbles almost all the important elements of this definition: the story told, the world, the “intelligibilité” (degree of comprehension). Another, by Claudia Gorbman, is nearly identical to mine except that she translates présupposé into “proposed,” therefore changes its original meaning of “require as a precondition of possibility or coherence.” See Claudia Gorbman, “Narrative Film Music,” Yale French Studies, no. 60 (1980): 195. ↩︎
Quoted in Elisabeth Weis and John Belton, eds., Film Sound: Theory and Practice (Columbia University Press, 1985), 197. ↩︎
Edward Branigan, Narrative Comprehension and Film (Routledge, 1992), 35. ↩︎
Daniel Percheron, “Sound in Cinema and Its Relationship to Image and Diegesis,” Yale French Studies, Cinema/Sound, no. 60 (1980): 16–23. Although the English translation of Percheron’s essay was published after the first edition of Film Art (1979), the original article appeared earlier, in Ça/Cinéma, no. 3 (January 1974). ↩︎
In her 1974 essay on Fellini/Rota, Gorbman uses Genette’s term extradiegetic. Then in her 1980 essay in Yale French Studies she changes to nondiegetic, as if to fall in line with Bordwell and Thompson, whose Film Art came out a year earlier. Although Gorbman is not the first to apply the term to film sound, her original contribution lies in matching the term against a previously existing distinction in the film music industry, namely, that between source music and underscore. ↩︎
Caryl Flinn, Strains of Utopia: Gender, Nostalgia, and Hollywood Film Music (Princeton, N.J.: Princeton University Press, 1992), 11–2. ↩︎
Anahid Kassabian, Hearing Film: Tracking Identifications in Contemporary Hollywood Film Music (New York: Routledge, 2001), 42–3. In a recent article sensationally titled “The End of Diegesis as We Know It,” Kassabian further announces that the many radical aspects of contemporary media culture have made the term largely obsolete. See Anahid Kassabian, “The End of Diegesis as We Know It,” in Oxford Handbook of New Audiovisual Aesthetics, n.d., 89–106. ↩︎
Robynn J. Stilwell, “The Fantastical Gap between Diegetic and Nondiegetic,” in Beyond the Soundtrack, ed. Richard Leppert, Lawrence Kramer, and Daniel Goldmark (Berkeley; Los Angeles: University of California Press, 2007). ↩︎
Jeff Smith, “Bridging the Gap: Reconsidering the Border between Diegetic and Nondiegetic Music,” Music and the Moving Image 2, no. 1 (April 1, 2009): 1–25. ↩︎
Neumeyer, “Diegetic/Nondiegetic.” Neumeyer offers a refined model that retains the dichotomous pair while differentiating between three stages of its application. ↩︎
Claudia Gorbman, Unheard Melodies: Narrative Film Music (Bloomington: Indiana University Press, 1987), 22–3. ↩︎
Jørgensen, “Time for New Terminology? Diegetic and Non-Diegetic Sounds in Computer Games Revisited,” 85. ↩︎
Winters, “The Non-Diegetic Fallacy: Film, Music, and Narrative Space,” 238. Ben Winters’ arguments are some of the most intriguing ones I have encountered in this debate. Why is it, he asks, that we refuse to categorize something we can easily remember on leaving the theater as belonging to the film? I am sympathetic to his efforts to find a place for music in the diegesis. Yet I think the diegesis is clearly not everything a film offers, and the need to find a place for music in the diegesis therefore somewhat misconstrues the problem. ↩︎
Randy Thom, whose writing now frequently appears in academic collections on sound aesthetics, made the following remark in a forum devoted to the notion of diegesis: “In the thirty years of conversations I’ve had with co-workers on feature films in the USA and Britain, nobody has ever used the word diegetic except to deride it as an academic term of little practical use.” See Randy Thom, “Acoustics of the Soul,” Offscreen 11, no. 8–9 (2007). ↩︎
Roland Barthes, “The Rustle of Language,” in Bruissement de La Langue, trans. Richard Howard (University of California Press, 1989), 83. ↩︎
And some believed that the imagined sound is always better than the actual sound. René Clair once said, “The imaginary words we used to put into the mouths of those silent beings in those dialogues of images will always be more beautiful than any actual sentences. The heroes of the screen spoke to the imagination with the complicity of silence. Tomorrow they will talk nonsense into our ears and we will be unable to shut it out.” Cinema Yesterday and Today (Dover Publications, 1972), 144. ↩︎
In an expertly written essay Szaloky argues that “silent films have never been silent” thanks to the audience’s active imagination, not because there are sounds to be heard! Melinda Szaloky, “Sounding Images in Silent Film: Visual Acoustics in Murnau’s Sunrise,” Cinema Journal 41, no. 2 (2002): 109–31. ↩︎
Both Heidegger and Merleau-Ponty use the term in their writings. Although I do wish to evoke this phenomenological sense of the term and propose its applicability in the studies of the cinematic experience, it is not my interest to pursue here a definition of it in its philosophical context. ↩︎
Metz, Film Language: A Semiotics of the Cinema, 98. ↩︎
Consider the two final shots of Stray Dogs (2013). I do not intend to make an essentialist proposal here, but it might be argued that this form of engagement is precisely where sound cinema excels. The same shots definitely could not exist without sound. ↩︎
Quoted in Tom Gunning, “Re-Newing Old Technologies: Astonishment, Second Nature, and the Uncanny in Technology from the Previous Turn-of-the-Century,” in Rethinking Media Change: The Aesthetics of Transition, 2003, 44. ↩︎
Martha Carol Hamand, “The Effects of the Adoption of Sound on Narrative and Narration in the American Cinema” (PhD Thesis, University of Wisconsin Madison, 1983), 68. ↩︎
Stephen Bottomore, “An International Survey of Sound Effects in Early Cinema,” Film History 11, no. 4 (1999): 485–98. Also Stephen Bottomore, “The Story of Perce Peashaker: Debates about Sound Effects in the Early Cinema,” in The Sounds of Early Cinema, ed. Richard Abel and Rick Altman, 2001. Bottomore’s evidence pertains explicitly to movie exhibition in Great Britain. ↩︎
Maxim Gorky, quoted in Colin Harding and Simon Popple, eds., In the Kingdom of Shadows: A Companion to the Early Cinema (Cygnus Arts, 1996), 13. ↩︎
Quoted in Scott Eyman, The Speed of Sound: Hollywood and the Talkie Revolution 1926-1930 (Simon and Schuster, 1997), 26. Edison’s words are ambiguous here: while the sounds of the blows might indeed have been represented to a satisfactory level, there is no mention of the extent to which they are synced to the images. “You can hear the sounds of the blows” loosely floating around is simply a very different effect from “every punch is accompanied by a sound.” The public that receives Edison’s words, however, would likely believe he is describing the latter. This goal was most likely not achieved; otherwise Edison would probably have boasted of it explicitly. After an interval of about fifteen years, Edison picks up the idea again in 1913 and presents his kinetophone with much better synchronization. ↩︎
Quoted in Bottomore, “An International Survey of Sound Effects in Early Cinema,” 487. The writing appeared in 1897. ↩︎
Bottomore, “The Story of Perce Peashaker: Debates about Sound Effects in the Early Cinema,” 131. ↩︎
“Sound Effects: Good, Bad, and Indifferent,” Moving Picture World 5, no. 14 (October 2, 1909): 441–42. ↩︎
Clyde Martin, “Working the Sound Effects,” Moving Picture World, September 23, 1911, 873. ↩︎
George W. Beynon, Musical Presentation of Motion Pictures (G. Schirmer, 1921), 86. ↩︎
Burch defines it as “a reading that could gather signs from all corners of the screen in their quasi-simultaneity, often without very clear or distinctive indices immediately appearing to hierarchise them, to bring to the fore what counts, to relegate to the background what doesn’t count.” Noël Burch, Life to Those Shadows (Berkeley: University of California Press, 1990), 154. ↩︎
James Lastra, Sound Technology and the American Cinema: Perception, Representation, Modernity (Columbia University Press, 2000), 105. ↩︎
The anecdote deserves to be read in its entirety. But as previous scholars have repeatedly told the story I shall not include it here. See Bottomore, “The Story of Perce Peashaker: Debates about Sound Effects in the Early Cinema,” 133; Rick Altman, Silent Film Sound, Film and Culture (New York: Columbia University Press, 2004), 238–9. Lastra paraphrases it in Sound Technology and the American Cinema: Perception, Representation, Modernity, 105. ↩︎
Gianluca Sergi, “In Defence of Vulgarity: The Place of Sound Effects in the Cinema,” Scope, no. 5 (n.d.). ↩︎
Stephen Bush, “When ‘Effects’ Are Unnecessary Noises,” Moving Picture World, 1911, 690. ↩︎
Rick Altman, Sound Theory, Sound Practice (Routledge, 1992), 250. ↩︎
The commonly recited story of its origin is that several people repeating the word “walla” would sound like a crowd. According to Wikipedia it is called rhubarb in the UK, rhabarber in Germany, rabarber in the Netherlands and Flanders (Belgium) as well as in Denmark, Sweden, and Estonia, and gaya in Japan. ↩︎
Metz, “Aural Objects,” 29. ↩︎
Bordwell gives an informative account of the practice of tinkering with the gunshot sound in this film. David Bordwell, “The Boy in the Black Hole,” Observations on Film Art, accessed July 25, 2015, http://www.davidbordwell.net/blog/2008/04/19/the-boy-in-the-black-hole/. ↩︎
Clair, Cinema Yesterday and Today, 133. ↩︎
Elsewhere in the film Grémillon does put the acoustic protocol to good use, in a later montage sequence where a series of shots of the awakening city (a young man asleep in the park, a view of the Seine, a construction site, a boulevard, etc.) is accompanied by city noises: church bells, barges on the river, heavy machinery, automobiles. ↩︎
Bazin, What Is Cinema? Vol. I, 49. Footnote. ↩︎
Ibid., 33. ↩︎
Gunning, “Moving Away from the Index: Cinema and the Impression of Reality,” 42. ↩︎
Noël Burch, To the Distant Observer: Form and Meaning in the Japanese Cinema (University of California Press, 1979), 19. ↩︎
Noël Burch, “Narrative/Diegesis-Thresholds, Limits,” Screen 23, no. 2 (1982): 16–33. For critiques of Burch’s position see Edward Branigan, “Diegesis and Authorship in Film,” Iris 7, no. 4 (Fall 1986): 37–44. Donald Kirihara, Patterns of Time: Mizoguchi and the 1930s (Univ of Wisconsin Press, 1992), 21–23. Myer, “Theoretical Practice: Diegesis Is Not a Code of Cinema.” All these critiques rightly take issue with the ideological overtone of Burch’s theory but ignore the “non-ideological” insights. ↩︎
Burch, To the Distant Observer: Form and Meaning in the Japanese Cinema, 19. ↩︎
Ibid. ↩︎
Burch, “Narrative/Diegesis-Thresholds, Limits,” 20. ↩︎
Ibid. ↩︎
Ibid. ↩︎
Ibid., 17. ↩︎
Ibid. ↩︎
Burch, Life to Those Shadows, 253. ↩︎
Tom Gunning, “The World in Its Own Image: The Myth of Total Cinema,” Opening Bazin: Postwar Film Theory and Its Afterlife, 2011, 124. ↩︎
Ibid. ↩︎
Roland Barthes, “Rhetoric of the Image,” in Image, Music, Text (New York: Hill and Wang, 1977), 32–51. I see the object of my study as quite comparable to that of Barthes: a banal object presented in the context of an intentional construct that seeks to maximize its persuasive power. And there is certainly a similarity in how received wisdom regards this kind of analysis. Barthes remarks at the beginning of his analysis: “there are those who think that the image is an extremely rudimentary system in comparison with language and those who think that signification cannot exhaust the image's ineffable richness.” (32) Both opinions easily find equivalents with respect to sound: one simply needs to replace the word “image” with “sound” in the above sentence. The same equivalency can also be found in the goal of analysis, which I share with Barthes: “How does meaning get into the image (sound)? Where does it end? And if it ends, what is there beyond?” (32) ↩︎
Metz, “Aural Objects,” 156. ↩︎
Chion introduces this notion in his Audio-Vision, 6–7. “By added value I mean the expressive and informative value with which a sound enriches a given image so as to create the definite impression, in the immediate or remembered experience one has of it, that this information or expression ‘naturally’ comes from what is seen, and is already contained in the image itself.” Although this definition is far from clear to me, I am inclined to think that both Chion and I are trying to articulate the particular magic that the image and the sound work on each other. ↩︎
A film with absolutely no image (not necessarily figurative) does not qualify as cinema. Nevertheless, films that occasionally take away or downplay images and accentuate the sound are quite plausible. Films of this category, which I call blind films, announce loudly that the most underexploited element of modern cinema is the sound. René Clair’s Sous les toits de Paris (1930) contains a sequence that supposedly shows a couple conversing in darkness. In Godard’s Letter to Jane (1972), the image is constantly switched off, sometimes for a lengthy period of time (Godard does this consistently in many of his films). At the end of the 1970s Marguerite Duras made a series of experiments of this sort which culminated in L’Homme Atlantique (1981), a film that features over twenty minutes of darkness (“it is in the dark that we discover ourselves”, Duras once said). More recently Derek Jarman’s Blue (1993) notoriously presents its viewer with a pulsating blue rectangle for the entirety of its duration. João César Monteiro’s Snow White (2000) revisits the same ground by rigorously patterning a muting of the images. In most of these cases (the Jarman film excepted), the sudden illumination and blacking out of the screen constitutes a strong sensorial division that induces the effect of rhythm. ↩︎
Barthes, “Rhetoric of the Image,” 156. ↩︎
Kristin Thompson, “Early Sound Counterpoint,” Yale French Studies, Cinema/Sound, no. 60 (1980): 115–40. ↩︎
Nelson Goodman makes an intriguing point on what he calls the “touchstone of realism”, that is, realism lies “not in quantity of information but in how easily it issues.” See Nelson Goodman, Languages of Art (Hackett Pub Co Inc, 1976), 36. ↩︎
Barry Salt, Film Style and Technology (London: Starword, 1983), 233. ↩︎
Curiously, in the second shot the sound of hammers is absent for the first two or three seconds and only comes in later and gains volume. The cut between the second and the third shot, however, is continuous, with no audible gaps in either the hammer sound or the singing. If a technical problem is not the culprit here, then this might be interpreted as an intentional means of conveying the different kinds of spatial transition involved: between the first and second shot, from exterior to interior; but between the second and third, a matter of adjacent cells. ↩︎
Pabst’s Westfront 1918 (1930), in this context, may be regarded as a towering achievement for its adept use of both techniques for an array of sounds (the clock ticking, the machine gun, the cannon fire) throughout the film. ↩︎
“There is no place of the sounds, no auditory scene pre-existing in the soundtrack—and therefore, properly speaking, there is no soundtrack.” Michel Chion, Audio-Vision: Sound on Screen, trans. Claudia Gorbman (New York: Columbia University Press, 1994), 68. For an early critique of this position, see Rick Altman, McGraw Jones, and Sonia Tatroe, “Inventing the Cinema Soundtrack: Hollywood’s Multiplane Sound System,” in Music and Cinema, 2000, 339–59. ↩︎
Noël Carroll, “Lang, Pabst and Sound,” Cinetracts 5 (1998): 15–23. ↩︎
Ibid., 16. ↩︎
As Charles O’Brien points out, although direct sound may have been an important option in those early years, German cinema is known for exemplifying the opposite approach from early on. See Charles O’Brien, Cinema’s Conversion to Sound: Technology and Film Style in France and the U.S. (Bloomington: Indiana University Press, 2005). ↩︎
Carroll, “Lang, Pabst and Sound,” 23. ↩︎
Ibid., 21. ↩︎
Ibid., 17. ↩︎
Mordaunt Hall, “Movie Review: Kameradschaft (1931),” New York Times, November 9, 1932. ↩︎
Similar to the phenomenon of constructing a film around visual attractions and special effects onto which the film hangs itself, in the Nickelodeon period films emerge that seek to encourage or even to require the use of sound effects, which we may call “sonic attraction films.” Rick Altman details many such films from Gaumont (The Irresistible Piano, 1907), Kalem (Dot Leedle Little German Band 1907, School Days 1907, Merry Widow 1908), Essanay (The Dancing Nig 1907), Lubin (Love’s Sweet Melody 1909), Biograph (Schneider’s Anti-Noise Crusade 1909). These films make sound (and listening) a critical element of the plot and show the extent to which a profusion of sound effects can be narratively justified. Finding narrative justification for auditory attraction will indeed become a central concern for many transitional films. See Altman, Silent Film Sound, 214. ↩︎
In M, there is a scene where Franz tries to drill a hole in the floor to get to the room below. This instance gives a glimpse of what the film would sound like if made in the Pabst way. But the rarity only proves the norm here. ↩︎
Roland Barthes, Camera Lucida: Reflections on Photography (New York: Hill and Wang, 1981), 34. ↩︎