Immersed in Sound

Deep Dive into the Second Chapter: Immersed in Sound
The Third Dimension of a Sonic World
Poets often introduce us into a world of impossible sounds, so impossible, in fact, that their authors may be charged with creating fantasy that has no interest.
—Gaston Bachelard1
One of the most striking differences between a film made in the 1930s (such as M or Kameradschaft) and a modern sound film (roughly from 1977 onward) lies in the sensorial density of the soundtrack: the sparseness of sound effects and their strict compartmentalization in the former, and the richly interwoven, almost infinite details in the latter. The changes are perceptually unmistakable, as even a casual listener would notice the sheer number of sound effects presented in a modern soundtrack and the astonishing range of their intensity and variation—what I am tempted to call “the rise of ambience”—which often goes beyond the capacities of human hearing. Yet a truly radical change lies elsewhere: in addition to offering its audience a quantitative increase in the amount of acoustic data, the modern soundtrack also seeks to redefine the moviegoing experience by positioning the audience at the center of a sound space. The phrase “immersed in sound” in the title of this chapter refers to this intensified auditory experience that characterizes contemporary cinema; but it also points to a desire of cinema to immerse the viewing/listening subject, a desire that can be traced back to the emergence of the audiovisual diegesis. In a sense, cinema has always already promised a sense of immersion in the filmic world as experienced through the multitude of sounds.
Yet in cinema, the world heard is never a faithful replication of the diegetic world in its sonic dimensions—although it does keep referring to it—but instead a fantasized reworking of our phenomenological grasp of this world. Nowhere is this point made clearer than in the many layers of intention embedded in sound’s spatialization. Sound may have a naturally spatial dimension in the real world, but in cinema, the sound space is nothing but a highly constructed entity. Natural sounds spill over the extent of space with their plenitudinous reverberations; they envelop us from all directions indiscriminately; many of the sonic snippets, owing to their lack of definition, serve more to confuse than to orient our largely regressive ears. In contrast, film sound has always belonged to a different pedigree: what we hear in today’s multiplex theaters are carefully packaged “good,” “clean,” larger-than-life sounds, characterized by the lack of unintentional decay and reverberation, and by the degree of articulation or magnification of what is thought to be sonically characteristic, emotionally expressive, or narratively significant. In recent decades, thanks to an ever more sophisticated sonic apparatus, film sound has also become increasingly spatial: sound is literally given more and more space on the recording medium as well as in the auditorium. Yet the effort to harness sound’s spatial dimension, to produce a hyperreal kind of depth, movement, contrast, and other immersive effects, testifies to this perennial need to contain and shape sound, to have complete control of the audience’s listening experience.
What do we mean exactly by sound space or spatialization? Isn’t sound always already spatial? The moving images, as we know, are contained by frames, which refer both to how images are separated into discrete units on the filmstrip (if only historically) and to the four edges of the screen. Being natively only two-dimensional, the moving images rely on depth cues, camera movement, staging, and constructive editing to help the spectator perceive a space beyond the surface of the screen. Stereoscopic cinema adds parallax to the equation, without changing the fact that the deepened or layered view still has a clear boundary. What we can say about sound is almost the opposite: there is no frame, no container for sound; sound is natively spatial. As Michel Chion put it, “the sound in itself is by definition a phenomenon that tends to spread out, like a gas, into whatever available space there is.”2
Yet simply saying that sound travels through space or spreads out into a space doesn’t mean sound is always evoking space—when it does, it becomes such an extraordinary phenomenon that we instantly perceive it as different. Impressed and intrigued by nine “purposefully oneiric” Dolby Digital promotional trailers, especially their promise to “open people’s ears in a new way,” Vivian Sobchack, in an essay provocatively titled “When the Ear Dreams,” muses on the phenomenological consequences of hearing hyperrealistic sounds and on how these trailers make our assumptions about seeing and hearing in the cinema “less certain, if not completely reversed.”3 “Throughout,” Sobchack describes, “emphasis is on sound emergent, moving, swelling and fading, on sounds separated, spatialized, and amplified to create an intensified sense of acoustical presence and sonic immersion.”4 The Dolby Digital trailers are prime examples of how extra-ordinary the sound space can become once untethered from a narrative context. These audiovisual vignettes may be, as Sobchack shrewdly observes, “made to foreground sound (as well as to corporately shape it).”5 But the basic characteristics of the sound space offered by these trailers are meant to be deployed in mainstream filmmaking. In other words, although a sound space that consists largely of pure sonic motion6 may be characterized as sonic avant-garde, the attempt is fueled by the same desire that has propelled cinema sound’s exploration of space, a space centered around the cinema auditor.
However, Sobchack’s fine perception and beautiful articulation also point to the exact opposite of what Chion has said about sound’s spatial quality:
At the same time, however, it also seems contained, indeed framed. That is, while the amplified, swelling sounds fill what seems like a vast space, that space seems nonetheless discrete, and doesn’t precisely coincide with—even as it intersects with—the actual space of the theater in which we sit. The space of the sounding, particularly in its discrete elements, seems ‘deep’ with a depth that has no discernible parameters and yet its breadth seems to have sharply defined edges and seems absolutely bounded in its clarity, its lack of sonic “leakage.” Furthermore, foregrounded in their clarity, the more isolated sounds and effects move from place to place (and speaker to speaker) around the listener and the theater but don’t so much “fill up” the space continuously as “describe” it in a kind of discontinuous “mapping.”7
How do we reconcile the above two descriptions of sound’s spatial qualities? How can sound shape space, the very medium in which it travels? How is it possible that sound can create a space that doesn’t coincide (but merely “intersects”) with the space of the auditorium? When we say sound is evoking space, what exactly is the space being referred to and where is it located? While the phenomenon of sound space is intuitive for everyone to understand, the precise nature of the term, it seems to me, presents us with nothing short of a conundrum. But let us step back a little. How could we talk about certain instances of spatialization in contemporary film soundtracks without asking how the practice came into being? And based on what ideas from the past? Is this exploration of space new to film sound? Or was it there in the beginning? What is the sound space, we might ask, of the so-called silent cinema? Where is its noisy, virtual soundtrack, densely populated by musical and vocal performances as well as a well-chosen repertoire of incidental sounds, located? To look at the issue from the angle of sound space might give us some clues as to why the various forms of synchrony attempted before and around 19078 are not successful. Without delving too much into the period, it seems that, despite all attempts at synchronicity (and synchronicity they indeed may have achieved), the space of sound and the space of image hardly ever overlap. The situation is dramatically changed by a few moments in The Jazz Singer. For the first time, the audience is able to catch a glimpse of how the two spaces collapse onto each other: there is nothing unheard or unseen; yet the effect of their fusion is overwhelming. Instead of mere synchronicity (abundantly present in all Vitaphone shorts), Al Jolson’s soliloquy is able to achieve its sensational effect partly because it presents for the first time a palpable sound space inhabited by real people conversing with each other.
[…] “the creation of a sound/image illusion was a highly tenuous process, and one whose success revolved around the parameter of space.”11
Today’s movie theaters, equipped with 3D digital projectors and loudspeakers spanning the walls as well as the ceiling, may be an entirely different world from what Arnheim had in mind when he wrote the above passages. But the problem remains the same: exactly how is space rendered by film sound? How does a film cue its audience to hear depth? This chapter aims to supply a historical narrative of film sound told from the perspective of sound space. In fact, not only is the notion of sound space highly productive for the periodization of film history; it is also crucial to film theory. One often talks about “the cinematic space” as if the film presents us with a singular, unified, ready-to-wear space wherein all the cinematic artifacts can be conveniently located. But in reality, the so-called cinematic space is itself the result of a successful fusion between sound space and image space. A cinematic space can exist only when, for better or for worse, the cinematic apparatus has fulfilled its designed illusionistic purpose. In a sense, the successful perception of a cinematic space has as its condition the invisibility of the very mechanisms of this fusion. Instead of talking about the cinematic space, therefore, it helps to engage with sound space and image space individually and investigate the ways in which they work together. A myriad of questions can be raised in this fashion. To what extent can sound space be attributed to the type of microphone used, or to the way sound is edited together? What work is being done by the number of channels and speakers, the size of the screen, and the pair of polarized glasses in front of our eyes? What do the sounds need to do to remain in line with the images? And vice versa? What is the nature of their agreement? Under what circumstances can sound breach (or merely poke at) this agreement? And what would be its effect on the audience?
If sound is more indicative of space (the “more” is certainly debatable here), what does surround sound add to that effect? Finally, what does the digital turn mean for film sound in its quest for space? If immersion is the ultimate goal of future development in cinema sound, has sound technology gone ahead of image technology in reaching out and enveloping the audience? Wouldn’t it be the case that the sound is already there while the image plays catch-up?12
While some of the above questions cannot be fully addressed in this study, I believe that raising them here shows clearly the importance, as well as the urgency, of historicizing/theorizing sound space. I propose that the issue of space can act as a focal point where the study of film sound interacts with contemporary audiovisual aesthetics, industrial norms, and technological history. This chapter therefore presents a preliminary attempt at unraveling the many theoretical intricacies of sound space in cinema; it offers a conceptual framework in which the mutual interference between sound technology and sound space can be understood; it also weaves a narrative of the historical evolution of how sound space is experienced by the cinema auditor. It does so by conceptually distinguishing two modes of sound’s spatialization: the embedded space and the embodying space. Such a distinction works as both a historical and a theoretical category. I shall first present some general observations on sound’s spatialization in cinema and introduce these two modes of spatialization. Then I will give each mode a more detailed account, engaging previous scholarship on the subject and offering concrete examples.
Spatialization and Its Discontents
In a seminal essay13 that explores the maddening term “voice-off,” Mary Ann Doane offers some thoughts on how a particular kind of sound, namely the voice, is related to space. Doane’s inquiry leads her to distinguish three kinds of space: 1) the space of the diegesis; 2) the visible space of the screen as receptor of the image (not audible); 3) the acoustical space of the theater or auditorium (not visible). This taxonomy, especially the parallel between the screen and the auditorium, serves as a good starting point: in order to serve as a receptor, both the screen and the auditorium first need to be neutralized with respect to the sensorial stimuli they will receive and reflect. The screen manages this task with ease—in order to receive light, it just needs to be blank and to possess a certain rate of reflection. The case of the auditorium, unfortunately, involves much more work, and the fact that it can actually do its job already deserves our admiration. The acoustic space of the auditorium is never quite blank, and it is actually difficult to make it so. With the advent of sound, “countless theaters across the nation were suddenly discovered to be acoustically deficient.”14 Their excessive reverberation suits one particular kind of sound (music) well but fares terribly with another (human speech). Sound-absorbing materials were invented and installed. But the problem didn’t go away. In fact, throughout the history of cinema, every new step made by sound technology seems to rekindle a particular acoustic attention to the auditorium, as the placement of additional speakers always poses new challenges and rewrites the rules by which sound is to be conceived in the physical space. The task of projecting images onto the screen looks comparatively straightforward considering sound’s constant need to negotiate with the acoustic space of the auditorium.
Every new sound system seems to manifest itself inside the auditorium in new ways, and in so doing must find new ways to avoid being contaminated by the spatial acoustics of its receptor.
What this effectively means is that although the space of the auditorium is the only physical space that exists for sound, it is not the sound space per se—just as the screen is not the image. In fact, the presentation of sound space hinges on the very neutralization of the physical space of the auditorium. The analogy between the screen and the auditorium is apt here: both apparatuses function on condition of their own invisibility, or, to use a less visually biased term, unobtrusiveness. Sometimes it appears that this sonic sterilization is performed only too well. As Chion noted, one of the effects created by Dolby’s hyperreal sounds is the “elimination” of the auditorium. “The real size of the auditorium is immaterial,” Chion observes, and “one no longer has the feeling of the real dimensions of the room, no matter how big it is.”15 In an almost absurd way, therefore, the listening experience in movie theaters becomes phenomenologically similar to that of stereo headphones. An “intimately immense”16 space is built across different audiovisual media, testifying to the same axes of control over sound spatialization and listening experience in a wide range of technologically mediated listening scenarios.
In one of the first essays to address the issue of sound space and technology, Rick Altman observes, “The ‘reality’ which each new technology sets out to represent is in large part defined by preexisting representational systems.”17 Like many other aspects of the cinema as a technologically mediated experience, the question of sound space is very much a question of representational systems. The fact that we may get the same kind of sound space going to a movie theater and putting on a pair of headphones suggests that the two very different listening scenarios may share one and the same representational mode. In fact, as anyone who has owned a modern home theater AV receiver capable of surround sound would know, such a receiver ostensibly offers a series of listening modes fashioned after the different kinds of sound space it claims to emulate: concert (it is fairly common to see a list of different musical genres such as Classical and Rock here), cinema, stadium, etc. It is therefore a reasonable assumption that sound space should be understood primarily as a mode of experience, one that can evolve in different historical periods, migrate across different media industries, and be implemented by different technologies.
Consider the early attempts to introduce mechanically synchronized sound in the movie theater. What is often highlighted is the ability (or the lack thereof) to reliably and precisely synchronize the playback of sound and image; what is less discussed, however, is the spatial dimension of the new apparatus. For these attempts literally bring a new issue into the room, namely, the question of speaker placement. Previously, of course, there exist sounds that accompany moving pictures; but the placement of their sources is largely determined by the (un)desirable visibility of the human subjects who produce these sounds. Impersonators and drummers need to be hidden from view, while the orchestra or pianist is allowed a partial view; a benshi or lecturer, on the other hand, is often granted a full view, as their performance (which is not exclusively sonic) is perceived as an invaluable part of the overall moviegoing experience. The locations of these sound producers are predetermined by the different nature of their practice in relation to the screen. The case of technologically reproduced sound, however, foregrounds the issue of placement, since from now on the source of sound (the speaker) no longer has any perceivable allure (except maybe for audiophiles or rock concert attendees). It should therefore be hidden from view, or mounted in positions previously impractical for human subjects.
The issue of placement generates for the first time not only an awareness of sound’s spatial location in the auditorium but also of its implied directionality. The technologically reproduced sound, because it has to take on three roles (music, human voice, and sound effects) at once, has inherited three different placements traditionally allocated to these sounds. Since it cannot be at three places at once, it has to choose the one location that corresponds to whichever role is perceived as dominant. Because the mechanical soundtrack is first designed to replace the orchestra, the loudspeaker (or horn) is installed in the now empty pit (sometimes covered with flowers!) facing upward. When spoken words come, however, it is soon discovered that speech demands minimal reverberation and is best transmitted facing the audience directly. Clearly inspired by the public address system, the speaker is then placed above the screen, a practice revived by Cinerama and still employed in today’s IMAX system. When the perforated screen arrives,18 the speakers shift their place again and move behind the screen. In fact, for a brief period of time, two of the above modes coexist—two horns, separately located and facing different directions, are dedicated to reproducing two different kinds of sound: music and dialogue. This awkward setup requires manual switching between music and dialogue, a practice soon abandoned for its impracticality. A decade later (1940), however, when Warner Bros.’ Vitasound revisits the scene, it resurrects the switching mechanism in the form of a control track printed between the sprocket holes, turning the music in the side speakers on and off.
While new technologies are conceived in ways that are obviously indebted to the old ones, the opposite is also quite true: new technologies, especially the radically new ones, often redefine preexisting representational systems in their own terms. The notion of channel, which is central in contemporary scholarship on surround sound, is a case in point. Originally referring to a geographical formation, it has morphed to mean a pathway of communication, the medium over which a signal is relayed from the transmitter to the receiver. Claude Shannon’s use of the term abstracts the physical medium, “a pair of wires, a coaxial cable, a band of radio frequencies, a beam of light, etc.,”19 into a virtual, mathematically definable one. The term’s adoption in audio technology signals a different kind of abstraction, namely, an encapsulation of multiple sound streams that can later be deployed in different spatial locations. The notion of channel was received with great ease among sound engineers, probably because the mental image of the term “channel” resonates well with the optical or magnetic track—one observable proof is that the term “multi-channel” is sometimes used interchangeably with “multitrack,” the term the industry had used until then.20
While the concept of channel has played a crucial role in the development of sound technology since the 1950s, it has the unfortunate effect of forging a teleological history of sound space. Several important previous historical accounts of cinema sound technology subscribe to this model.21 These accounts typically extrapolate the origin of multichannel sound back to the 19th century and then trace it via Blumlein’s 1933 patent, the series of stereophonic film experiments in the 1940s, and Dolby Stereo’s commercial triumph in the 1970s to the global proliferation of digital surround sound in the 1990s. This type of historiography starts from a contemporary notion of sound technology, namely, multichannel sound, and then tries to identify the events and technologies that led to the formation of that configuration: a sort of “let us now praise famous men”22 (an apt label, especially because Agee’s use of the phrase is decidedly ironic). By giving the history of sound technology a predestined purpose, it can then claim that the sound cinema as we know it has been channel-based throughout its century-long history. Needless to say, one of the necessary tasks of such historiography consists of identifying what is or is not truly a predecessor of the current culminating norm, the numerically prefixed multichannel sound. Mark Kerins, for instance, claims that “Vitasound was not a true multi-channel system but rather a multi-speaker one” while “the first true multi-channel sound film” came with Walt Disney’s Fantasia (1940).23
My account of the evolution of sound technology differs from previous ones in that it highlights what I believe are significant paradigms of sound space, namely, basic phenomenological categories of how space is experienced through sound. In my view, a historiography of sound’s spatialization needs a thorough understanding of the evolution of technology; this is why this chapter invests heavily in technology: it both reviews past technologies and takes the risk of anticipating new ones. Nevertheless, a history of sound space also needs to keep a certain distance from sound technology, especially from corporate slogans and their ideology of unstoppable technological progression. What needs to be meticulously examined is not always what the industry regards as significant steps (e.g., from Dolby Stereo to Dolby Digital), but how the audience experiences spatialization. It is here that a historiography of film sound needs to take off from the ground by supplying conceptualizations that are sufficiently detached from individual technological implementations.
A history of multichannel sound typically narrates how sound technology gets “better and better.” In contrast, a history of sound space traces how our ideas about sound’s relation to space change. For the same historical periods, previous scholars tend to foreground a technological distinction, namely, between monophonic sound and multichannel sound. In contrast, the distinction I propose to make is between two kinds of sound space that are not strictly successive; nor is one superior to the other. In fact, they almost complement each other by focusing on different ways sound can relate to space. The first mode consists of evoking space through the acoustic manifestations of a single sound source. It does this with the characteristics of the microphone, its positioning, and other technical parameters of rerecording. Inasmuch as the space thus created can be considered a magic spell cast from a point source, we may call it an embedded space. This mode of casting space is obviously indebted to radio. Yet conceptually speaking, the embedded space, as a way in which recorded sound can evoke space, is shared across different media industries, ranging from the phonograph24 to the Academy standard of the optical soundtrack.
A different line of research is needed in order to find out how speaker placement constructs space through a structurally spatial deployment of multiple sound sources. I propose to call the space thus formed the embodying space, mainly for the reason that it is a space centered on the auditor’s body. While the embedded space mainly involves production, the embodying space concerns exhibition. Essentially, the embedded space can convey the spatial dimension of sound only through volume variation and reverberation. What we hear as spatial in monophonic films is therefore the combinatory effect of images lending their spatial cues to the embedded space and, occasionally, of this embedded space reinforcing the visual perception of depth. Instead of being embedded inside a single sound source and unfolded in the theater of the auditor’s mind, the embodying space relies on a strategic distribution of acoustic information among somewhat different sound sources emitting from different spots in the auditorium. In the embedded space, we can recognize the spatial acoustics and their implication of space, but the space remains over there: it is an almost contemplative act, not unlike looking at a painting. In the embodying space the auditor is positioned amidst the perceived space; the fact that sounds come from different directions in a meaningful fashion suggests to the auditor a sensorial urgency and engages her involuntary bodily reactions—this space is right here.
One of the main differences between the embedded space and the embodying space lies in how the auditorium is activated. The embedded space treats the auditorium as nonexistent or irrelevant: all that matters is the space captured and rendered in a strictly past sense—hence the past participle. In contrast, the actual space of the auditorium becomes critical when the embodying space is conveyed through a careful coordination of sounds coming from different places in the auditorium. In comparison to the embedded space, where a holistic source takes full responsibility for the auditory part of the cinematic space, the embodying space breaks this space apart and precariously reassembles it in the auditorium, between the loudspeakers and the auditors. The embedded space exists already on the medium; the embodying space, on the other hand, remains in a sort of undetermined state until physically realized by the actual placement and type of speakers—thus the present participle—and the body of the auditor plays a central role in this realization.
An embodying space is always constructed. An embedded space, on the other hand, can either be captured by the microphone in a real space or be simulated through various kinds of technological means. On the one hand, we have the direct recording option, where a segment of the soundtrack may be recorded on location with minimal modification; on the other, a sonic world may be entirely generated by algorithms, namely, the rapidly emerging procedural audio processes such as “digital waveguide modeling,” “vector synthesis,” and digital Foley toolkits.25 This is why we may make a further distinction within the sphere of the embedded space: captured vs. simulated. Naturally, setting up these two categories doesn’t imply an either/or scenario. Although cases of both extremes do exist, most practical cases reside in the vast grey area between the two poles. A sound can be recorded “dry” (with minimal spatial acoustics) and then rerecorded in another space to acquire the desired spatial characteristics, or it can be manipulated through some form of simulation, in either hardware or software. It is customary for radio drama facilities, for instance, to simulate spatial effects through the use of simple yet ingenious contraptions. Along with the rapid development of a set of rerecording procedures in the first decade of sound film, simulating spatial acoustics by artificial means becomes increasingly desirable, as it affords a kind of malleability not afforded by location recording. An echo chamber, for instance, can add different types of reverberation to a “dry” recording of dialogue so that it becomes functionally equivalent to an actual direct recording made in the kind of environment suggested. But the amount of echo can be conveniently adjusted so that a fine balance can be achieved without sacrificing intelligibility.
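In present-day digital terms, the principle of the echo chamber can be sketched as a convolution: the dry signal is passed through an impulse response that stands in for the room, and an adjustable mix determines how much “room” is blended back in without drowning the intelligible signal. The following sketch is purely illustrative; the function name, the toy impulse response, and the mix parameter are hypothetical, not drawn from any historical or industrial system.

```python
# Illustrative sketch only: a dry signal is convolved with an impulse
# response standing in for the echo chamber, then blended back with
# the original. All names and values here are hypothetical.

def add_room(dry, impulse_response, wet_mix=0.3):
    """Blend a dry signal with a convolved (reverberant) copy of itself."""
    # Discrete convolution: each input sample triggers the whole
    # impulse response, scaled and delayed.
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, d in enumerate(dry):
        for j, h in enumerate(impulse_response):
            wet[i + j] += d * h
    # The wet_mix knob plays the role of the adjustable echo return:
    # more room, or more intelligibility.
    out = [wet_mix * w for w in wet]
    for i, d in enumerate(dry):
        out[i] += (1.0 - wet_mix) * d
    return out

# Toy impulse response: direct sound plus two decaying echoes.
ir = [0.0] * 8
ir[0], ir[3], ir[7] = 1.0, 0.5, 0.25

dry = [1.0, 0.0, 0.0, 0.0]          # a single click
wet = add_room(dry, ir, wet_mix=0.5)
# The reverberant tail outlasts the dry signal: len(wet) == 11.
```

The one design point worth noting is exactly the one the echo chamber dramatized: the spatial signature lives in the impulse response, while the balance between “authentic” room and intelligible signal is a separate, freely adjustable parameter.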
As radio drama (or podcast, as it is called nowadays) relies solely on the auditory channel to supply crucial narrative information (character recognition, setting, time, etc.), a skillful manipulation of space becomes essential, as it is often nothing but the spatial acoustics that tell the audience what, where, and how everything is happening. The sound space is therefore decidedly a point of cross-pollination between film sound and its many sibling practices.
On the other hand, there exists the option, and in some national cinemas a highly preferred one, to record the action as it is played out in front of the camera. To record sound directly, however, is no minor task. Jean-Marie Straub, in an interview, makes the following intriguing comment: “You can make a dubbed film, but it is necessary to use a hundred times more imagination and work to make a direct-sound film.”26 The so-called direct sound is not only a technique of sound recording, but also an aesthetics of sound and a philosophy of sound perception. The complex nature of this practice demands more theorization than is allowed here. But suffice it to say that a central issue at stake here is again space. What is perceived as superior in the French son direct tradition is not merely an ontological aura (the idea that this is authentic) of the recording, but also its acoustic imperfections, unforeseeable background disturbances, and other kinds of sonic contingencies that are highly indicative of space. A good direct sound recording does not offer an immaculate, prêt-à-manger sound space for the auditor. Instead, the impression of reality is conveyed through the very raw quality and the many inevitable “flaws” of the recording thus obtained. Precisely because it is not clean, and contains more than what the ideology of spatialization would recommend, it can afford unexpected discoveries that lead to a sense of authenticity.
It bears noting here that direct sound as an aesthetic choice cannot be equated with direct recording as a technical necessity at a historical moment of cinema sound. All the Vitaphone shorts, for example, are made with direct recording; yet the sound thus obtained is seldom indicative of space—a via negativa of how sound space works. This is the result of staging that carefully eliminates the effect of space (most of the time the Vitaphone performers stay put as a fixed sound source); but it is also due to an aesthetic choice made by the sound engineers, who prioritize the intelligibility of the recording (spatial acoustics being regarded as noise interfering with the signal). One notable exception to this norm is found in one of the first Vitaphone vignettes, Caro Nome (1926), where a rather interesting audiovisual discrepancy inadvertently exposes the conventions at work. When Marion Talley exits the set in the back (as is customary for her character to do at this point) and moves up the stairs, the microphone, fixed along the edge of the stage, is left behind. All of a sudden a space emerges in sound, a space that the soprano’s hitherto perfectly recorded voice must now traverse. The editing, however, partly destroys this effect by cutting to the telephoto lens that insists on framing her in full shots.
With the use of multiple microphones, live switching and the walk-and-deliver method, popular in the transition years, sound space is systematically eliminated. Sound, be it voice, music, or sound effects, is unimaginatively foregrounded. To say this is the norm of the 1930s wouldn’t be an exaggeration; yet rare cases of sound space do exist. The Movietone newsreel, for example, shows an affinity to spatial sound from the beginning, as its superb outdoor capacity both allows and necessitates. While Fox also produces musical numbers à la Warner Bros., for its official unveiling (Apr 29, 1927) its secret weapon is an outdoor recording of a military band marching at West Point. The sound of the band noticeably grows louder as the cadets approach the camera. On May 21, when Movietone opens at the Roxy in New York, the featured newsreel is that of the Lindbergh take-off shot five days earlier.27 In both cases, Movietone’s choice of material and the way it is recorded (in a way that reminds us of Lumière’s diagonal framing of the oncoming train) allow for a strong indication of spatial acoustics that in many ways prefigures the practice of son direct.
Embedded Space: the Case of Orson Welles
The acoustics perfect the illusion to such an extent that it becomes complete, and thus the edge of the picture is no longer a frame, but the demarcation of a hole, of a theatrical space: the sound turns the film screen into a spatial stage!
—Rudolf Arnheim28
To understand how the embedded space works in cinema (as well as in radio), there is no better way than to conduct a close listening of the works of Orson Welles. This admittedly limited scope is compensated by its exemplarity. The sound space in Welles’s films, especially Citizen Kane, is in my opinion a culmination of what the embedded space can achieve. Welles’s contribution to film aesthetics has not exactly gone without notice; yet scholarly work that deals with the sonic aspect of his films remains surprisingly scarce. Apart from a few notable exceptions, which I shall engage with in the following, the voluminous literature on Kane remains silent on its use of sound.
To understand Kane’s sound, we need to retrace the pathway taken by sound’s spatiality in the 1930s. Apart from a protest group against the talkie, a handful of film theorists begin to envision sound in ways that were vastly ahead of sound’s contemporary technical capacity. Béla Balázs, for instance, offers many thought-provoking observations on how sound renders space in film. “The business of the sound film,” Balázs writes, is “to reveal to us our acoustic environment, the acoustic environment in which we live, the speech of things and the intimate whisperings of nature; all that has speech beyond human speech, and speaks to us with the vast conversational power of life and incessantly influences and directs our thoughts and emotions, from the muttering of the sea to the din of a great city.”29 Balázs is particularly perceptive in describing how sound can reinforce the perception of seen space, giving it invisible depth:
The widest space is our own if we can hear right across it and the noise of the alien world reaches us from beyond its boundaries. A completely soundless space on the contrary never appears quite concrete, and quite real to our perception: we feel it to be weightless and unsubstantial, for what we merely see is only a vision. We accept seen space as real only when it contains sounds as well, for these give it the dimension of depth.30
Balázs’s observation points to an effective way to convey the immensity of space: to use ambient sounds that not only are invisible (yet plausible) but also seem to come from far away.31 Sound is essential, in my understanding of Balázs, in giving us a sense of inhabited space, and in so doing it transforms cinema into a deeply humane art. Yet at the time of Balázs’s writing, the primitive recording technology faces an almost insurmountable problem: the conflict of interest between dialogue intelligibility and a rendition of our acoustic environment. The limited dynamic range of the optical soundtrack cannot accommodate both except by sacrificing one or the other. Instead of benefiting from spatial acoustics in terms of immersive auditorship, the prevailing attitude is to consider them as unwanted noises that interfere with dialogue comprehension. Direct sound recording, admittedly a charming alternative for some,32 is by and large regarded as an inferior form of technology that Hollywood can’t wait to do without. The advent of Foley, multitrack recording, rerecording and other sound technologies facilitates the practice of breaking the soundtrack into pieces and reassembling them according to a certain preconceived hierarchy, in which spatial acoustics has no place. Despite some notable exceptions, the vast majority of movie soundtracks throughout the 1930s present a deadened, flat space, a space without spatial acoustics.
Around the same time a different story is told through radio drama. Although for more than a decade film sound shares a similar technological base with radio, and early film sound engineers almost all come with a background in radio, the aesthetic and rhetorical codes of the two media bifurcate decidedly from the very start due to the presence or absence of images. Without the visual aid, a radio drama has to rely heavily on background sounds that are easily recognizable to convey the sense of locale. To help the audience mentally picture the relative positioning of characters, spatial acoustics not only becomes necessary but also needs to be exaggerated to facilitate auditory perception. In contrast, although film sound’s technological infrastructure may be radio-based, its “superstructure” draws more from an enunciatory theater mode (to simplify the matter a bit). Considered within its own trajectory of development, cinema sound has never been more electrifying (all talking! all singing!); yet when compared to radio, there seems to be so much more film sound can and should do. The many constraints that film sound exhibits at this point are intriguing precisely because they are not entirely technical in nature (admittedly the technical facility, especially the dynamic range of recording, is extremely primitive).
In 1929 Rouben Mamoulian infuses the newly minted sound cinema with ideas fresh from the theater in his Applause; in 1941 Orson Welles, thanks to his extensive experience in radio drama, renovates the deadened cinema space with much livelier sounds. In a sense, Citizen Kane is able to transcend the technological limitations of the period thanks to its convergence of three identities: cinema, theater, radio. If we follow the film soundtrack’s evolution to this point, the cumulative gain of technological improvements certainly contributes to the increased scope and accuracy of effects, but this technical know-how is far from being extensively used to create anything that sounds remotely close to what we hear in Kane. Kane’s distinguished contemporaries such as How Green Was My Valley (1941), The Lady Eve (1941), The Maltese Falcon (1941), Suspicion (1941), and The Philadelphia Story (1940), to name only a few of the more prestigious ones, are dramatically different films in terms of genre and photography, yet they all sound strikingly similar: a genre-codified score (underscored according to budget), nonstop dialogue and theatrical delivery, selective incidental sounds from the repertoire of acoustic protocols that have been in circulation for more than two decades.
In contrast, Kane sounds as dazzling as it looks, even today. The use of the human voice (especially Welles’s portrayal of Kane) in Citizen Kane picks up the broken lineage from Applause and features extensively overlapping dialogue and stylistically varied vocal delivery that make characters sonically memorable. Kane also initiates an unprecedented degree of concreteness of speech (ungrammatical construction, hesitation, interruption) that boosts the perceived authenticity of vocal performance. In terms of sound effects, too, Kane is the first film that really sounds radically different since the stabilization of the soundtrack in the mid-1930s.
How do we understand the mode of the soundtrack of Kane? Where does it come from and what would be its context? “The remarkable thing about Welles’s films,” Walter Murch once claims, “is that you can turn off either the picture or the sound, and the films are still understandable.”33 Murch may be exaggerating a little here about the images’ sufficiency—I doubt the film would make enough sense shown without sound; but he is absolutely correct about the soundtrack’s independent sufficiency. François Truffaut was so impressed by the soundtrack of Citizen Kane that he taped it and listened to it while taking his bath.34 To anyone (myself included) who has listened to the soundtrack without looking at the images, the format of the soundtrack clearly resembles the radio drama of the time,35 especially Welles’s own work before he came to cinema. It is quite plausible that, at least in the case of Citizen Kane, Welles conceived the sounds intuitively first, as a radio drama. Walter Murch is apparently of this view when he says, “All of his interest in sound effects, in ‘realistic’ dialogue overlaps, in the manipulation of acoustic space, came from innovations that he developed for his own radio programs, and then imported wholesale into the films that he made.”36
Acknowledging sound’s genealogical precedence in Kane helps us understand the excessive effort, considering the norm of film sound of the period, that Welles went through to render the auditory space with unprecedented depth and detail. Only by recognizing the radio drama embedded in the film can we truly appreciate the rigorous patterning of sounds that dynamizes each scene. Unencumbered by images, a radio drama can excel in blending music, voice and sound effects together and achieve a fine balance between dialogue intelligibility and the construction of a mental image of diegetic space. Take for instance Susan’s suicide scene. According to Bazin’s classic description:
The screen opens on Susan’s bedroom seen from behind the night table. In close-up, wedged against the camera, is an enormous glass, taking up almost a quarter of the image, along with a little spoon and an open medicine bottle. The glass almost entirely conceals Susan’s bed, enclosed in a shadowy zone from which only a faint sound of labored breathing escapes, like that of a drugged sleeper. The bedroom is empty; far away in the background of this private desert is the door, rendered even more distant by the lens’ false perspectives, and, behind the door, a knocking. . . . The scene’s dramatic structure is basically founded on the distinction between the two sound planes. . . . A tension is established between these two poles, which are kept at a distance from each other by the deep focus.37
Clearly, sound is doing a lot of work in this scene. The sound of breathing not only gives a sign of life from Susan Alexander (she remains immobile), it further indicates the abnormal quality of this breathing (“like a drugged sleeper”). In the depth of the scene, where our vision is blocked by the door, it is sound that tells us what to see, or to imagine: how Kane impatiently knocks, how he exchanges alarmed looks with the others, how he gradually realizes the gravity of the situation, how he finally resolves to take the door down, etc. All this time, we remain curiously close to and aligned with Susan because of a sonic proximity, because we hear the breathing.
Now imagine the scene staged in silence. No breathing sound: no sign of life. A sense of urgency is lost if Susan appears to be dead already. The sound of pounding at the door, which happens to be a sound that we instantly recognize, fires up our imagination of what remains invisible. The scene could probably still work without changing its visual design if it found ways to accentuate the movement of shadows under the door. But that would reach only the most attentive audience. To make the scene sufficiently legible it would have to include a shot of Kane outside the door, which breaks the spell, or beauty, of the single-shot structure. Needless to say there would be no more dramatic structure “founded on the distinction between the two sound planes”—and here lies the central tenet of radio drama.
Welles’s acute sense of auditory space does not subscribe to a particular kind of spatial realism. Instead he sees fit to combine different models of sound representation to achieve the intended effect. Rick Altman in his aptly named essay “Deep Focus Sound” describes with unprecedented precision the dialogue in the Colorado sequence, focusing on volume and reverb.38 He shows that the sequence embraces spatial realism only at the beginning, and then slides into “discursively useful” mismatches39 between the image space and the sound space. By deploying spatial acoustics in the bracketing positions while focusing on intelligibility in the midsection, Welles bypasses the problem of deadened space that has plagued many early sound films. This strategy may be understood from a communication theory perspective: the space is conveyed in a “handshake” phase where the protocols of communication are established; in what follows, as long as the spatial acoustics doesn’t change during the sequence, it would be superfluous to constantly include this type of information. Instead of getting to know where it is said, we can now focus on what is being said; hence intelligibility takes the reins.
What is even more striking is how Welles uses the presence of images to grant the audio even more liberty and credibility. He solves the dilemma between spatial acoustics and dialogue intelligibility by enlisting the help of images. Because lighting, camera angle, the characters’ body language, and their relative spatial locations all help the audience to grasp the meaning of their verbal articulations, the words can now afford to be a little garbled. The many impressively layered visual shots gain credibility from carefully chosen corroborating sounds. But this visual information also serves to mask sound’s highly constructed nature. Rick Altman suggests that even the much-touted Colorado boarding house scene is not entirely recorded in direct sound, as the child Kane’s voice from outside the window is very likely to have been recorded separately. This may also be the case for Susan’s suicide scene. We have yet to uncover archival material to prove the “matte shot” nature of many of these recordings—and probably never will. But there are reasons to suspect, given Welles’s expertise and interest in manipulating sound, that many scenes may have a highly artificial soundtrack. This shouldn’t come as a surprise at all. For the set may look like a huge cave (the rally site, the Xanadu hall), but in fact it is just a carefully contrived optical illusion. The image may suggest a closed vault with hard reflective walls (the Thatcher library), but beyond the frame the ceiling doesn’t actually exist (or exists as a piece of cloth). Hence direct sound recording in these circumstances cannot actually produce the desired effect. Even if the set is precisely what it is supposed to be in the diegesis, the sound thus recorded may be either unusable (unintelligible speech) or not sufficiently exaggerated. The sound space in Kane, therefore, is a highly simulated space, as is absolutely necessary in radio drama.
In response to Hollywood’s preeminent emphasis on dialogue intelligibility, Welles presents ingenious solutions that do not entail a complete sacrifice of the spatial quality of sound. He also sets an excellent example for many films to come by consciously trading a bit of intelligibility for emotional effect. By experimenting with the amount of acoustic distortion, Welles can balance intelligibility and expressivity at a precise point—a concept that is virtually unheard of at the time. In fact, the coalescing of the image space and the sound space is so successful, despite their respective constructedness, precisely because of the high degree to which they mutually reinforce each other. In praising Kane, even a most perceptive viewer such as Bazin is led to believe that the spaces created by optical illusion do exist, because they are phenomenologically strengthened by sound space.
While Citizen Kane presents a textbook inventory of layered and purposefully vectorized radio space, Welles further explores the idea of choreographically combining camera movement with a continuous manipulation of the point of audition in the opening of Touch of Evil.40 Such an idea is not exactly foreign to filmmakers since the advent of sound. The opening of Sous les toits de Paris, for instance, already contains a rudimentary yet effective use of this technique. First we hear chants off screen, whose volume increases noticeably from the sixth shot. The camera, with tilt and pan, finally locates the source of the chorus visually. The ensuing crane movement is accompanied by a further gradual increase of volume, until it reaches a level that indicates close proximity. Conversely, in the camera’s returning ascent, the volume gradually decreases until it fades out. The technique is soon repeated in another “elevator shot,” where different tenants’ voices (and a piano) are heard dissolving into one another, all rehearsing the same tune.
The three decades that separate Clair’s work from Welles’s add much complexity to the same “spatial sound montage” idea, both visually and acoustically. According to Welles’s 58-page typed memo and 9 pages of “sound notes” jotted down during the filming, the music he intended to accompany the opening camera movement is to be composed of “a succession of different and contrasting Latin American musical numbers”41 blaring out of the various nightclubs, tourist traps and moving car radios. Welles’s comments show great concern that the style of music not violate what is considered ethnographically authentic; they also show a precise conception of the acoustical properties of sounds. “Loudspeakers,” he writes in the memo, “are over the entrance of every joint, large or small, each blasting out its own tune. The fact that the streets of these border towns are invariably loud with this music was planned as a basic device throughout the picture.”42 Ideally the snippets of music would float closer to the audience and momentarily depart again; they would hug the images, and sometimes they would linger at the border of the frame, as if waiting to be caught up by the camera again. The constantly changing point of audition makes the virtuoso camera movement more concrete by engaging the audience in a game of hearing and not hearing. The almost synesthetic effects of hearing and experiencing self-movement reinforce each other; the overall effect is not unlike that of the Dolby Digital trailers mentioned at the beginning of the chapter.
To accomplish this effect it would not be enough, as Clair did decades earlier, to simply alter the volume of the music and singing. Space itself needs to be embedded into the sound: not only does the music need to be authentic, but it also needs to possess certain acoustic qualities. The following sentence in the memo, therefore, is capitalized: “It is very important to note that in the recording of all these numbers, which are supposed to be heard through street loud speakers, that the effect be just that, just exactly as bad as that.”43 How does one achieve such a bad effect? By using captured space. Welles elaborates,
The music itself should be skillfully played, but it will not be enough in doing the final sound mixing to run this track through an echo chamber with a certain amount of filter. To get the effect we're looking for, it is absolutely vital that this music be played through a cheap horn in the alley outside the sound building. After this is recorded, it can be then loused up even further by the basic process of re-recording with a tinny exterior horn.… And since it does not represent very much in the way of money, I feel justified in insisting upon this, as the result will really be worth it.44
Many of the ideas pertaining to sound that Welles describes in this document anticipate the future direction in which the Hollywood soundtrack is moving. In fact, Walter Murch describes his 1998 encounter with the memo as one of being repeatedly “flabbergasted,”45 realizing that Welles had outlined things that Murch thought he invented himself in the 1970s.46 Little wonder that in the 1950s Welles’s ideas on sound run up against the compartmentalized notion of sound that Hollywood has internalized since the 1930s. Yet the biggest casualty in the film, the virtuoso 3-minute-20-second opening shot that we have just described, is almost collateral damage. As is customary for the period, especially for the designated B movie category in which Touch of Evil was slotted, this sequence has to serve as the opening credits, accompanied by sweeping credits music composed by Henry Mancini. Once this loud music is removed, thanks to a magnetic master that consists of separate tracks for dialogue, music and sound effects, the restoration team was able to discover a “comprehensive effects track, with traffic, footsteps, a herd of goats and everything.”47
Embodying Space: a World that Descends upon Us
When Arnheim, Balázs, or even Bazin—the so-called classical film theorists—discuss the spatial qualities of film sound, the space evoked is invariably the embedded kind. It is a space in perfect accordance with the world that the images suggest: both are obviously recorded in and of the past; both remain over there. The vast majority of contemporary film theories, too, treat sound space as if it were still functioning in the same mode. As the philosopher Stanley Cavell has summarized in pointed fashion, the screen serves as a barrier to the audience’s access to a world past—it presents the viewer with a world but in the meantime reminds her of her own absence from that world. Christian Metz apparently agrees:
The space of the diegesis and that of the movie theater (surrounding the spectator) are incommensurable. Neither includes or influences the other, and everything occurs as if an invisible but airtight partition were keeping them totally isolated from each other.48
If the world of the film is indeed isolated from the audience—thanks to the cinematic apparatuses that trivialize in various ways the world of the audience (e.g., by dimming the light and making the chair reasonably comfortable)—then the audience’s access to this world is ultimately limited. The so-called cinematic space is therefore a space of inference—that’s why the term is almost interchangeable with the space of the diegesis (recall Mary Ann Doane’s discussion). Both designate a closed space that is separated from the audience, a space that the audience can look and listen into, but to which she clearly does not belong. But the multiplication of sound sources around the periphery of the auditorium has the potential to forge a different mode of sound space and, consequently, a different mode of perception of the filmic world. If by the cinematic space we are referring to a sense of space that is the result of the audience’s active reconstruction of a world from bits of images and sounds culled from every corner of the movie theater, then Metz’s claim is no longer true: sound breaks this “airtight partition” that separates a world viewed through the window/keyhole that is the screen from a world heard from speakers around us, apparently targeting us and making us the center of a new acoustic presence. While for the visible world a taboo still exists that guarantees the viewer’s (or voyeur’s) own invisibility (often considered the essential condition of narrative cinema), the aural world exhibits its characteristic boldness by offering itself in a much more tangible form: it is not a sin to address the audience with sound. This new sound space is no longer a world apart: it surrounds us; it touches us; it says, hic et nunc.
While the distinction between embedded space and embodying space may be approached, on the technical level, as the difference between monophonic and surround sound, the technology itself cannot account for some crucial aspects of sound space, namely, the ways in which the audience’s sensorium is activated. The emergence of embodying space necessitates theorizations based on this new experiential contract between the evolving sound technology and the human sensorium. What is needed is a contemporary form of film sound theory—one that foregrounds the nature of the cinematic experience, as contemporary film theory does—that acknowledges the emergence of this auditor-centered conception of sound space.
Consider, as a simple example, how the two spaces can render a natural environment that features birds chirping. While an embedded space can go only as far as making the statement “such and such a bird chirps from approximately this far away,” an embodying space can provide the locations of individual chirps relative to the space of the auditorium, to the point of pinpointing them to individual speakers—something the technology now enables with an increasing degree of sophistication. But what exactly is the point of hearing sounds come from behind or above? Does that tell us more about the bird chirping? Is it simply a source of sonic attraction/distraction (it was certainly considered as such by many)? Or does that information “serve” (a favorite term among practitioners) the storytelling? Again, history repeats itself by presenting the same line of argument: attractions need to be incorporated into the narrative; sounds need to serve the story. But what does the call for narrative relevance mean? Is it really the narrative that such an auditor-centered sound space serves?
As I have argued in the introduction, incorporating sound into the core of a theory of cinema necessitates a new understanding of how cinema engages its audience in terms of the audiovisual contract. A question of key importance raised in this chapter is how the future of sound technology can revise the ways in which raw sensations are related to the narrative fiction form that is considered paramount in mainstream filmmaking. It is my belief that an intensified form of audiovisual exposure such as the one described here cannot be sufficiently accounted for by an understanding of cinema primarily as a form of storytelling. Although a theory of the cinematic worldhood remains to be fleshed out, what I want to emphasize are the effects of the embodying sound space: instead of being narratively oriented (i.e., every sound needs to serve the story, or the psychological depth of characters), they present a world for us to behold, a world that holds its own inexhaustible mysteries. If the sense of being surrounded by sound sometimes has the effect of distancing the audience from the narrative progression, of setting them into a contemplative mood, all the better. For these sounds reveal to us a hitherto undeveloped and certainly under-acknowledged aspect of the cinematic experience. Instead of asking, as filmmakers or sound designers routinely do, “how exactly can we make the seemingly excessive sonic information relevant to the story?” we might recognize other ways in which the sounds are relevant to us as film-world travellers: they offer an intensified acoustic presence that makes us more aware of our own bodily location and orientation. Since the auditor’s body is involved in the production of cinematic space, the embodying space constitutes a radical departure from the traditional notion of cinematic experience as we know it; it is a case where the filmic world descends upon us.
Radical as it is, this mode of listening brought forth by new sound technologies maintains considerable continuity with old ones. Historically speaking, cinema sound’s conception of the auditor’s bodily location and orientation has been surprisingly consistent. This consistency is the result of one simple convention, or condition, of cinema: the screen in front of the audience. The presentation of sound in cinema, therefore, has always observed a basic distinction: the front vs. the rest. Indeed, for the first half of sound cinema’s history, cinema sound has been a strictly frontal business (if we choose to ignore what happens in the rows behind us). The various stereophonic experiments from 1940 onward, with the exception of quadraphonic sound, all distribute considerably more power to the front of the auditorium. In contrast, the back is optional, puny in terms of volume and often subject to manual activation. It is regarded, even to this day, as suitable only for bonus effects. After all, it is initially called the “effect channel” instead of the “surround channel.” Dolby Stereo’s implementation plan is in line with this discrimination: the back channel is “kludged”49 together only upon the request of the producers of A Star is Born (1976), the first film released in that format.
Does this asymmetrical distribution of sound between the front and the back have any biological basis? Does it correspond to our natural listening? According to Ioan Allen, a key figure in the development of Dolby, the ear does appear to have an uneven distribution of what he calls “acoustic acuity” around a listener. On a vertical plane, sounds from the back may be as prominent as those from the front; but on the horizontal plane (which is where all the speakers are located) frontal acuity is far superior to that of the back. “A sound system to accompany picture,” Allen concludes, “should allocate most of its resources to the horizontal plane. Indeed, the greatest attention should be addressed to audio across the front of the listener.”50
Regardless of how sound is perceived in the natural environment and how future sound technologies may revise the rules of psychoacoustics, a fixation of visual attention on the front remains an essential condition in the case of cinema. This condition has an almost devastating impact on the evolution of sound technology. In Michel Chion’s formulation, the screen, and the bright moving images on it, acts as a giant magnet51 that attracts sounds; it pulls them in and anchors them at various points of the image: lips, footsteps, all that make a sound. When sounds cannot be anchored to the screen, namely, when they are off-screen, the fixation is still very much in effect: the sounds are perceived as hovering over the edge of the frame, their entrance anticipated at any moment (Chion calls it the “in-the-wings effect”52). Even the lofty voice-over, in Chion’s characterization, is located on “a sort of balcony”53 that protrudes from, but is not detached from, the edifice of the image. Such is the power of psychological localization: it answers the urgent question “where does the sound come from” in a most sensible, meaningful way, disregarding the low-level cues54 gathered by the auditory faculty.
When only one loudspeaker is actually used, this effect is a blessing: there is no need to arrange a beehive of speakers behind the screen and to switch the sound accordingly, which is precisely what the sound engineers of the 1930s naively believed to be necessary. Given a certain amount of habituation, they soon discover, sound can be perceived as coming from “appropriate” sources on the screen, regardless of the actual spatial location of the source. The coupling of visual space and auditory space becomes not only automatic but also irresistible. Like an optical illusion where the eyes insist on seeing things that are actually not there, the ear will attribute sounds to locations distinct from their actual points of source. The fixed location of sound actually facilitates visual spatialization because whatever the latter does, it will not be contradicted by the sound. Spatially speaking, then, the sound and the image are always in sync, albeit an asymmetrical sync in which the spatialization of sound relies on the power of visual imagination. But when loudspeakers begin to expand from behind the screen into the wings, eventually reaching the far end of the auditorium, this magnetization becomes a compromise, if not a curse: the spatialization of sound reactivates the space of the auditorium, but the magnetic field of the screen now constitutes a field of distortion.
The challenge facing surround sound, therefore, is largely one of creating a new field of acoustic presence that at once expands, builds on, and counteracts the constraining force of the screen. This dynamic is at work in the past as well as the future of sound spatialization. Indeed, current theories of surround sound can be said to be primarily concerned with two issues: sound’s entrapment and its emancipation. On the one hand, the magnetic screen is still very much in force, even in the age of surround sound, and often renders sound’s various escape plans futile. While a sound can occasionally escape this magnetic field by popping up distinctively in the surround channels, its invisibility may then no longer be interpreted as off-screen presence. By escaping the powerful magnetic field it may also escape the diegesis: it is no longer interpreted as an event in the film but rather as an event in the movie theater. The so-called “exit door” effect55 therefore illustrates the power of magnetization: it can be broken, but the price would be the filmic illusion itself.
On the other hand, it becomes increasingly urgent to articulate not only the relatively autonomous existence of sounds in relation to images, but also how this autonomy has, hypothetically, changed the ways in which images are constructed. Historically speaking, sound’s autonomy, which consists primarily of its ability to render space independently of what the images purport to show, has been increasing at a steady pace. Starting from purely acoustic protocols or punctuations that serve little more than to confirm the visual, sound effects acquired their first measure of independence by staying firmly away from the screen. By the mid-1930s characteristic sounds were used to convey the location of a scene, replacing or complementing the establishing shot. In films such as Applause (1929), Rope (1948), Rear Window (1954), and Vertigo (1958), the noise of the city outside the window not only opens up the claustrophobic visual space; specific sounds (a siren, a foghorn) can also be strategically timed to punctuate subtle meanings in the dialogue.
The noises of cars, birds, and other instantly recognizable acoustic settings are such that we do not demand to see what is actually producing those sounds; they are simply perceived generically as “city,” “countryside,” and so on. Recognizing their purpose of demarcating a locale, Chion calls this category “territory sound.”56 Notice that this category of sound already exists in the age of radio. Its perfect usage scenario, as we can sense from the above examples, consists of depicting a space that is over there, namely, beyond the screen-window. Surround sound, however, can potentially move this space back inside the auditorium: by targeting the audience with sonic waves coming from all directions, it gives the sensation that the audience is located within the sound space. Instead of suggesting a space that lies more or less behind the screen, as if we were listening through a keyhole, the sound space now unfolds in the space of the auditorium.
One of the first to theorize the new sound space enabled by surround sound technologies, Michel Chion coined the term “superfield.”57 He defines it as “the space created, in multitrack films, by ambient natural sounds, city noises, music, and all sorts of rustlings that surround the visual space and that can issue from loudspeakers outside the physical boundaries of the screen.”58 It is probably problematic to say that these sounds “surround the visual space.” What they surround is the audience; the visual space, represented by a two-dimensional screen, remains beyond the screen. However, we might say that these two spaces, now phenomenologically distinct, remain somewhat congruent. Indeed, one of the main uses of surround sound has been to provide a reserved seat for territory sounds, for which audiovisual congruence can be easily achieved. Technically speaking, the surround channels also liberate the front channels from rendering ambient effects and thereby eliminate the ambience’s potential contamination of the dialogue.
Clearly recognizing the emergence of a new sonic playground, Chion is nevertheless conservative in estimating its aesthetic value. He claims that the superfield “provides a continuous and constant consciousness of all the space surrounding the dramatic action.”59 This description seems similar to what he earlier characterizes as “passive offscreen sounds,” which “provides the ear a stable place.”60 Echoing Chion’s theory, Žižek articulates this view eloquently: “Bombarding us with details from different directions (Dolby stereo techniques, etc.), the soundtrack takes over the function of the establishing shot. The soundtrack gives us the basic perspective, the ‘map’ of the situation, and guarantees its continuity, while the images are reduced to isolated fragments that float freely in the universal medium of the sound aquarium.”61 William Whittington, following this line of thinking, goes to the extreme by claiming a complete reversal of the traditional image-sound hierarchy: “When ambient sound effects are deployed, cognitive geography is offered through echoes, reflections, and reverberations, which create spatial anchors or cues. Spaces, then, can exist without image-based referents. No image is necessary.”62
[…] not just these background sounds but the entire aural world of the film, including sound effects, dialogue, and diegetic music.”64
While the notion of the ultrafield makes a meaningful step towards theorizing surround sound, especially in its taking account of “a much broader array of sonic elements,” its reasoning is not entirely convincing. Clearly, the case of Saving Private Ryan is a special one, in which sound is designed precisely to express intense disorientation, that is, what the audience believes (or is made to believe by the film) a US soldier would have felt that day on the beach. The film does this by shifting aural perspectives abruptly. Not all films featuring surround sound need to convey that particular sensation to the audience. By basing itself on this special case, Kerins’s theory amounts to a radical proposition that characterizes the contemporary Hollywood soundtrack as essentially discontinuous. The case of Saving Private Ryan constitutes for him, ultimately, a norm instead of an exception. Discrediting Chion and Žižek for overestimating Dolby Stereo’s technological capacity and proclaiming that the multiple discrete channels of digital surround sound can do much more, Kerins nevertheless subscribes to the very idea proposed by the two: that a certain kind of sonic stability has led to the less structured image. Instead of the ambient sound’s wallpaper-like stability, Kerins posits that the stability of the ultrafield lies in its close coupling of sound space and image space. “The ultrafield,” he claims, “seeks not to provide a continuous aural environment, but rather to continuously provide an accurate spatial environment where aural and visual space match.”65 The purpose and meaning of this close matching Kerins attributes to the familiar ideological critique of apparatus theory: “while the ultrafield in one sense draws attention to cuts in the image track, it simultaneously hides the constructed nature of the filmic apparatus by creating a tighter cohesion between image and sound; this in turn ties the fragmented image track together.”66
Both the theory of the superfield and that of the ultrafield articulate specific ways in which sound can reactivate the auditorium space. In fact, both modes can find use in one film, or even in one sequence, including the D-Day sequence discussed above. Both theories’ attempts to establish a causal relation between sound and image, however, are less successful. The underlying assumption of both is that the construction of a coherent diegetic space is a must; whether this requirement is fulfilled more by the images or more by the sounds becomes the point of contention. Like certain psychological experiments that set out to test the perceptual validity of continuity editing, this assumption takes one particular function of continuity editing (the construction of space) for granted. Unfortunately, not every sequence in a film, even in classical Hollywood cinema, submits entirely to the principles of continuity editing. Likewise, although it is factually true that in post-classical commercial cinema the images are more fragmented (more close-ups, shorter average shot lengths) and less stable (constant camera movements), it is questionable whether such stylistic changes (what David Bordwell calls “intensified continuity”) are the consequence of surround sound’s enhanced storytelling capacity.
Bordwell has argued that intensified continuity developed from, and is still based on, a core of the continuity system. The same argument, it seems, is made in the realm of sound by Chion and Kerins: an intensified soundtrack, too, needs a core of continuous sonic presence. The embodying space offers new ways to map out sonic movements; but it also incorporates a baseline of stable ambience, a proven way for the sound space to work with the frontal images. The increasing tendency in today’s sound mixing to break the sound space into fragments and to use vectorized sounds that selectively activate the space of the auditorium can be understood as a spatial form of sonic attraction that embraces the need for constant perceptual roughening and acceleration. Much like the use of twentieth-century music in science fiction films, where unfamiliar tonality and orchestration serve to highlight the alien atmosphere, the intensified sound space capitalizes on our natural reaction to discontinuity and translates it into a pleasure based on dazzlement and disorientation. Yet the disorienting effect is constantly compensated by what is familiar, by stabilized modes of representation. That sound is a source of comfort and stability, that it helps us overcome the ghostly discontinuous images, is a thesis made over and over again in the history of cinema.67 The notions of superfield and ultrafield can therefore be regarded as its most recent reiterations.
More Space for Sound: Cinema Sound in the Digital Transition
Having examined the two dominant paradigms of sound space in cinema, I now take a closer look at what the still evolving sound technology aims to offer in the near future. More specifically, I want to explore how the industry-wide transition from the photochemical medium to digital bits, namely, the “digital turn,” has changed sound space. As dramatic and sweeping as it may look through the lens of a century, the transition in reality consists of a large number of asynchronous, chaotic, and almost self-contradictory tidbits of change emerging over a long span of time. What is sound’s role in this transition? Where does Dolby Atmos, Dolby’s new cinema sound platform, come from? What problems does it address? What kind of future does it predict and facilitate? What creative as well as practical options, previously unavailable, does the new sound technology afford? These are the questions this section seeks to raise, if not answer. I hope to show that, although Dolby Atmos may not yet present a radically different, let alone advantageous, cinema sound experience for the majority of audiences, there are enough incentives on the industrial side to ensure its eventual mainstream adoption. Even if Atmos were adopted only in limited but prestigious venues, it represents a no less than revolutionary expansion of the possibilities of representing sound in cinema, the impact of which will be felt for years to come. Ultimately, what Dolby Atmos and its competing platforms offer is a new contract between sound and space, a potentially radical reformation of the cinematic experience in its sonic dimension.
Why does the digital conversion have such transformative power for sound? To answer this we need to go back to an age-old problem of space, as in “storage space.” Despite the dazzling historical diversity of sound recording media (wax cylinder, phonographic disc, optical stripe, magnetic tape, DAT, CD, etc.), all representational media of sound have exhibited a similar physical inertia that resists change at every turn. New formats may offer better definition; but invariably they also require costly hardware upgrades that the exhibition side of the business is often unwilling to undertake. This creates a threshold of innovation: the benefit of a new technology has to be demonstrated as significant, or preferably overwhelming, before exhibitors will make the jump. In the realm of film sound, historically only two such innovations have penetrated both the production and the exhibition sectors on a global scale: the mass adoption of mechanically synchronized sound from 1927 onward and that of Dolby Stereo half a century later. On both occasions, the industry, especially the exhibition sector, had to wait for blockbusters such as The Jazz Singer or Star Wars68 to motivate it to invest in the future. Magnetic tape, in contrast, failed to reach the exhibition sector and remained primarily an internal process of the production industry until it was eventually replaced by various digital storage media.
What does digital cinema mean? Our answer sheet to this question should be divided into at least four quadrants: on one axis the industry and the audience, and on the other, image and sound. On the image side, the attitudes of the industry and those of the audience can be quite different, if not opposed. For the industry the benefits are immediate and far from negligible: the distributors see significant savings on making and shipping release prints,69 as well as tighter control of piracy;70 the exhibitors, too, benefit from reduced labor and maintenance costs.71 But for the viewer, digital imagery, at least in its current shape, suggests a suspicious surrogate. One sees a good deal of skepticism about the relentless march towards total digitalization. Gone are the warmth and softness, the exquisite tonal range, the almost imperceptible shaking and flickering of the image; gone is the swarm of grain that mesmerizes the attentive viewer at a subconscious level. The elegy sung by the cult followers of celluloid can go on and on.72 Digital imagery, one might say, does not yet possess a conclusive advantage over its analog counterpart and often needs to be sold in tandem with the attraction of 3D.
The story on the sound side is quite different. If for the eyes digital imagery has so far signified convenience versus quality, there has been little question about the perceived superiority of digital sound. No one wants to revert to Vitaphone discs; and I have yet to meet an aficionado of the baseline hiss of the pre-Dolby optical soundtrack. In fact, if going digital was inevitable, then sound was ahead of the image by some years. Since the 1990s several digital sound formats have been in circulation: Dolby Digital, Sony Dynamic Digital Sound (SDDS), and DTS. These discrete multi-channel sound encodings reside in previously unused parts of the filmstrip, side by side with the analog image. Together with the double wavy lines of analog Dolby Stereo, the quad-format 35mm print leaves only the space between the sprocket holes on the right side of the strip for future expansion. Indeed, the problem of space (pun intended) has always existed for sound, for the film medium was originally conceived for the image alone. When sound finally came to the celluloid strip in the late 1920s, this chatty guest had to be content with whatever room was left. It had to bunk on the side, between and outside the sprocket holes, and its expansions quickly hit the ceiling. Since sound and image require different optimal exposure and development, the soundtrack often suffers if the film is processed with the best image quality in mind. Over the years, several formats promising more space for sound were proposed, invariably involving either additional strips (Fantasound, Cinerama, IMAX73) or a physically larger strip (Todd-AO).
In all current accounts of the transition to digital cinema and the history of Digital Cinema Initiatives (DCI), there is virtually no mention of sound. This is partly because, for all of DCI’s successful or unsuccessful maneuvers, sound never seems to have been an issue, largely because sound was already digital, or at least digital-friendly. Although still printed as analog patterns on the film strip, the three digital formats of a quad soundtrack (Dolby Digital, SDDS, DTS) are essentially digital bits, much as if someone had written down a long string of 0s and 1s on a piece of paper. Yet the new format adopted by DCI, the Digital Cinema Package (DCP), gives the soundtrack what it has dreamed of for decades: native digital storage for sound is a godsend. Both Dolby Digital and SDDS present soundtracks that, owing to the lack of space on the optical strip, have to be compressed; the resolution of the scanner, as well as the space left on the celluloid (or polyester), determines the amount of auditory data that can be registered on the strip. The DCP, however, offers essentially unlimited space for sound, and at least theoretically this space need not be smaller than the image’s portion. It is therefore no coincidence that as soon as the DCP achieved wide acceptance, Dolby proposed a new format that radically expands the space reserved for the soundtrack and takes full advantage of cinema’s new digital offer.
The advent of Dolby Atmos can be interpreted in many different ways. Corporate marketing would of course have it signify Dolby’s commitment to the continuous advancement of sound technology. Alternatively, it could be said that sound came to the rescue of 3D, since by 2012 the initial excitement and impressive box-office results brought in by 3D had significantly waned. Just when it looked as if 3D had run its course once again and was on its way to repeating its failure of the 1950s, Dolby Atmos joined the digital revolution as a fresh incentive. Needless to say, Dolby’s opportunist strategy worked out exceedingly well in this case. DCI began its work in March 2002. After extensive testing and negotiation it finally cracked the problems of image resolution, security, and funding, and published its final report in July 2005. It took another six years and a series of 3D digital blockbusters promised by George Lucas, James Cameron, Peter Jackson, and Robert Zemeckis to convert the exhibition sector. By early 2012, over half of the 13,700 screens in the world had been converted.74 Almost immediately, in April 2012, Dolby announced its new sound platform, Dolby Atmos. The timing could not have been more perfect: it is as if sound had been waiting all this time for the image to conquer the theaters; the moment the feat was accomplished, sound jumped onboard and reaped the benefits.
Dolby’s more than clever marketing maneuver (as always) aside, this new iteration of sound technology is indeed potentially revolutionary. Dolby Atmos challenges previous sound representations on two fronts. First, while previous iterations of digital surround sound systems extended sound considerably beyond the shelter of the screen, all of these systems specify a horizontal plane (at ear level) onto which sound is delivered. To expand from this plane to the whole upper hemisphere seems a natural move. Overhead direct sound sources can be useful in many real-world scenarios (such as the many jungle scenes in Brave); in addition, overhead sonic movements (such as those of helicopters and dragons) can be significantly better implemented. But Dolby and its main competitor Barco have significantly different visions of how the expansion should be carried out. While Dolby recommends that two lines of speakers be installed on the ceiling, extending from the screen all the way to the back of the auditorium, Barco has proposed its own “immersive” sound platform, called Auro-3D, that boasts a three-layer approach. Above the baseline of the established 5.1 layout, the Auro system adds a height layer consisting of four channels, one in each corner; appends one more channel at the same height, above the center speaker (similar to IMAX); and finally puts a “voice of God” channel on the ceiling.
The most significant and controversial change brought by Atmos, however, is not more channels or more speakers: it is an object-based approach to sound representation. Object-based audio has existed in game audio for years; cinema sound gained access to the format only after its recent self-emancipation from the optical strip, as discussed above. All signs indicate that object-based sound is on its way to becoming a critical component of the new industrial standard for immersive sound.75 Whereas a channel-based representation depicts sounds in a composite fashion and then feeds each sonic composition to speakers located at specific points in the auditorium, an object-based representation isolates “sound objects” as they exist in the film world (a bullet, a voice, a plane, a bird, etc.). In terms of implementation, a sound object is an audio recording bundled with metadata that define its size, its location in the virtual space, and its trajectory of movement (if any). This metadata is composed in a virtual space independent of any auditorium and recorded as such in the DCI package. It is then rendered in real time, in the real auditorium, by a dedicated piece of software (the Rendering and Monitoring Unit) that takes into account the specific configuration of the speaker layout and the acoustic specificities of the auditorium.
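For readers who wish to picture what “an audio recording bundled with metadata” might look like in practice, the following minimal sketch may help. It is emphatically not Dolby’s actual format: the field names, the normalized room coordinates, and the keyframe interpolation are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Positions are given in a normalized "rectangular box" abstraction of the
# auditorium: x runs left (0) to right (1), y screen (0) to rear wall (1),
# z floor (0) to ceiling (1).
Position = Tuple[float, float, float]

@dataclass
class SoundObject:
    """An audio asset bundled with spatial metadata, authored
    independently of any concrete speaker layout."""
    audio_file: str             # the recording itself
    size: float = 0.0           # 0 = point source; larger values = more diffuse
    trajectory: List[Tuple[float, Position]] = field(default_factory=list)
    # trajectory holds (time in seconds, position) keyframes

    def position_at(self, t: float) -> Position:
        """Linearly interpolate the object's position at time t."""
        keys = self.trajectory
        if not keys:
            return (0.5, 0.5, 0.0)   # default: center of the room at ear level
        if t <= keys[0][0]:
            return keys[0][1]
        for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
            if t0 <= t <= t1:
                a = (t - t0) / (t1 - t0)
                return tuple(c0 + a * (c1 - c0) for c0, c1 in zip(p0, p1))
        return keys[-1][1]

# A hypothetical spaceship flying overhead from the rear wall to the screen
# over four seconds:
ship = SoundObject("ship_flyby.wav", size=0.2,
                   trajectory=[(0.0, (0.5, 1.0, 1.0)),
                               (4.0, (0.5, 0.0, 1.0))])
```

The point of the sketch is simply that the spatial data travel with the recording and remain legible as such, instead of being baked into six or eight channel signals.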
The notion of the sound object is not a whimsical idea in the service of self-perpetuating corporate interest. Rather, it represents a major step taken by the industry to solve the perennial technical problems of channel-based sound practice in its worldmaking effort. It offers a much more intuitive (hence superior) platform for sound designers to map sound onto the filmic world. Previously, it was as if one had to make a sculpture by poking through the six or eight openings (channels) of a giant box. Now the box is no longer needed, and one is free to move a sound wherever it needs to go: a sound is defined directly in relation to (an abstraction of) the auditorium space, without the intermediary of channel compositing. This said, the new technological representation of sound finds a way to reimagine its past. The Atmos system declares that the object-based model of sound, innovative as it is, is not enough to simulate our listening experience and needs to be complemented by other models. Notably, it needs a field-based representation, termed the “bed,” which corresponds to sounds that do not originate from specific points in space but rather from a certain direction, and that have an ambient quality to them. Coincidentally or not, the notion of the bed seems to be precisely what the now archaic notion of the channel can do best. The ingenuity of this conceptual maneuver consists in understanding backward compatibility not simply as a technical necessity, but also as an opportunity to recognize the value of previous iterations of the technology.
The object-based representation of sound takes a radical step towards redefining the relation between sound and space in cinema, and it deviates in fundamental ways from the established channel-based practice. Although channels eventually lead to spatialization, the concept of the channel has no spatial bearing in itself. To reproduce space through channels requires the precise coordination of sounds recorded in different channels so that they phase in and out. This practice is time-consuming and can function only through extensive codes and conventions that dictate how sounds are allowed to move in an illusionary space. In addition, any modification to the actual sound will necessarily affect its spatialization. Now spatialization becomes an independent parameter that can exist even before the sound object itself is well defined. This might create a division of labor between the sound designer (who specializes in coming up with a particular sound) and the panner, the latter profession referring to the labor-intensive work of panning a sound object in space according to what the image suggests at any given point of the film. Such a division is perhaps comparable to that between what the animation industry knows as the concept artist and the animator.
There are of course sound objects that remain fixed in space. But the notion of the sound object is most advantageous when addressing moving sound, which has always been a headache for multichannel practice. Suppose an airplane, or, as in the opening of Star Wars: A New Hope, a spaceship enters from the upper edge of the frame and moves towards the depth of the image. For sound to collaborate in this illusion of overhead movement it has to come from the rear speakers and move to the front of the auditorium. Yet the speakers through which the sound needs to traverse are most likely driven by different channels. Instead of one sound moving from one place to another, what actually happens is that the mixer manually attenuates the sound in one channel until it is completely gone, while raising it in another. It is a process reserved for very special occasions, not only because it is time-consuming, but also because its effect is so delicate that any misstep could ruin the audience’s experience instead of enhancing it. The difficulties and imperfections involved in sound panning deter the extensive deployment of the technique, or at least make it too labor-intensive for less prestigious productions. With the aid of digital tools, proper fading as well as phase effects can be added to compensate for the “ping-pong” effect.76 And with enough attention to detail a reasonably satisfactory panning effect may be achieved in the mixing room. But ultimately the success of panning depends on the extent to which this carefully crafted sonic trick can be mapped onto the target auditorium. Unless the target auditorium is reasonably similar to the mixing room, the effect of channel-based moving sound cannot be guaranteed. In reality, the speaker configuration (number, precise location, frequency characteristics) and room acoustics vary substantially from theater to theater, and the spatial dimension of sound suffers most from this kind of deviation.
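The manual crossfade described above can be sketched in a few lines. This is a generic illustration of the principle rather than any particular mixing console’s implementation: an “equal-power” law keeps the total acoustic power constant while a sound is faded out of one channel and into another.

```python
import math

def equal_power_pan(progress: float) -> tuple:
    """Equal-power crossfade between two channels.

    progress 0.0 = fully in channel A (say, a rear surround channel),
    progress 1.0 = fully in channel B (say, a screen channel).
    The two gains trace a quarter circle, so gain_a**2 + gain_b**2 == 1
    and the perceived loudness stays constant throughout the move."""
    progress = min(max(progress, 0.0), 1.0)
    gain_a = math.cos(progress * math.pi / 2)
    gain_b = math.sin(progress * math.pi / 2)
    return gain_a, gain_b

# The mixer's gesture, step by step: attenuate the sound in the rear
# channel while raising it in the front channel.
steps = [equal_power_pan(i / 4) for i in range(5)]
```

Note that such a crossfade produces a convincing trajectory only for listeners seated roughly between the two speakers; off-axis seats hear the “ping-pong” effect the text mentions, which is one reason channel-based panning travels so poorly from mixing room to auditorium.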
To address this problem, George Lucas initiated THX and the Theater Alignment Program (TAP), which aim to calibrate theater acoustics and subject them to stringent criteria, so that theaters sound similar to Hollywood mixing studios. An alternative solution is offered by IMAX.77 In the early days IMAX sound was often mixed on site,78 thereby creating an ideal situation for sound mixing. Even today an IMAX soundtrack may still be customized to a certain extent, so that a special sound mix is produced for an individual theater, as IMAX theaters often vary considerably in their acoustics. Needless to say, this is hardly a model for the rest of the world’s theaters. The unreliable reproduction of spatialization, I believe, is the root cause of the extremely conventional implementation of surround sound in cinema.
The object-based representation of sound proposes to deal with this problem by dividing the task into two parts. The first part strives to maintain a high-level, consistent spatial representation of sound in which the auditorium is abstracted into a rectangular box; a sound object’s position, size, and movement are all defined in relation to this virtual space. The second part then translates this three-dimensional map of sound into a deployment strategy for a real auditorium. If an actual speaker exists at the location specified, for instance, the sound will be sent to that particular speaker. Alternatively, a phantom image may need to be generated at that spot to create the illusion that a sound indeed emanates from the gap between speakers.
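This two-part logic, an abstract map plus a real-time translation, can be illustrated with a toy renderer. Everything here is an assumption made for illustration (the snap radius, the two-speaker phantom-imaging rule); the actual Atmos rendering unit is proprietary and far more sophisticated.

```python
import math

def render_object(obj_pos, speakers, snap_radius=0.1):
    """Map an object's virtual position onto a concrete speaker layout.

    If a speaker sits (nearly) at the specified spot, send the sound to it
    alone; otherwise split the sound between the two nearest speakers to
    create a phantom image in the gap between them.
    Returns a dict of {speaker index: gain}."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    ranked = sorted(range(len(speakers)), key=lambda i: dist(obj_pos, speakers[i]))
    nearest = ranked[0]
    if dist(obj_pos, speakers[nearest]) <= snap_radius:
        return {nearest: 1.0}            # a real speaker is close enough

    # Phantom image: weight the two nearest speakers inversely by distance,
    # then normalize the gains for constant power.
    i, j = ranked[0], ranked[1]
    wi = 1.0 / max(dist(obj_pos, speakers[i]), 1e-9)
    wj = 1.0 / max(dist(obj_pos, speakers[j]), 1e-9)
    norm = math.sqrt(wi * wi + wj * wj)
    return {i: wi / norm, j: wj / norm}

# Two speakers on the left and right walls; the object sits midway between
# them, so each speaker receives an equal gain and a phantom image appears
# in the center.
speakers = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
gains = render_object((0.5, 0.5, 0.0), speakers)
```

The design point the sketch captures is that the same authored position yields different speaker feeds in different rooms: the interpretation happens at playback time, in the auditorium, not at mixing time.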
A channel does not know what is being played through it, for it can only play all or nothing. Working almost like a photocopier, it makes no distinction as to what is actually in the image being copied. Hence there is only one way to evaluate its effectiveness: fidelity. The best sound is one that sounds identical to what the sound mixer hears in the mixing room. The only circumstance that can guarantee that effect, namely, the mixing room being acoustically identical to the actual theater, is nevertheless highly implausible. At best it is an approximation, the effect of which ranges from intolerable distortion to passable reproduction. The object-based representation of sound introduces an element of interpretation, and hence flexibility. A sound object needs to be interpreted, instead of simply being played. We may understand it as a speaker system that has acquired a grain of “intelligence” and knows how to rephrase, instead of simply repeating, what it is told to say. Notice that the word intelligence is in quotation marks: the interpretive unit (the RMU for Dolby Atmos) is essentially an algorithm that knows nothing but how to follow set rules. Its rate of success depends on many factors: the complexity of the algorithm, the radicalness of the designer’s attempt, the degree to which the auditorium is calibrated, and so on. But it would be all too convenient to fault the technology for realizing human intention less than perfectly. As in the case of self-driving cars, relinquishing human control to machines is often perceived as negative and dangerous, and it is very likely that the initial results obtained from this approach will be less than perfect. Nevertheless the algorithm can be constantly improved (at minimal cost), and the approach may soon be able to retain an unprecedented degree of the sound designer’s compositional intention, far more than its channel-based predecessor.
A most unexpected benefit of object-based sound authoring, one that is less interesting to engineers than to film students and scholars, is that the soundtrack’s release format contains the design information itself. The Atmos printmaster is essentially an ultra-container that packages the native Atmos mix together with backward-compatible mixes such as 5.1 and 7.1. Since there is no “flattening” process as in the case of channel-based sound, at least in theory the multiple sound objects and their metadata can be recovered under proper authorization. The original sound design can thus be examined and studied closely.
Clearly, Dolby envisions Atmos as multiple things at the same time. It is a theatrical attraction that boasts more immersive sound; but it is also an industrial innovation that is almost transparent to consumers. Even if the average moviegoer does not see the immediate benefit of paying more for sound, for sound designers Atmos certainly opens up new creative possibilities and automates the rather repetitive process of authoring for multiple release formats. The platform allows sound designers to compose sounds in spatial terms without even knowing, or worrying about, the speakers and rooms in which the sounds will be played. This can literally mean composing for the future, for a precise spatial definition of sound can be registered in the system even when the current generation of theater sound reproduction cannot fully render it. For instance, a sound object can be defined as traveling from left to right across the upper part of the auditorium. In the current state of the technology the object will inevitably have to jump from the surround plane to the overhead speakers, a huge acoustic gap that in the future might be progressively filled in to make the transition smoother. This case, albeit imaginary, recalls the restoration of the soundtrack of Touch of Evil. Because the restoration team was able to work with the original six-track magnetic tape, which registered sound at a higher resolution than the original theatrical release (on optical print), the restored soundtrack not only offers a closer rendition of Welles’s authorial intention; it also restores the magnetic-grade quality of the sounds. When we listen to the new Touch of Evil, therefore, we hear better-defined sounds than those heard by an audience in 1958.
Conclusion
“Whereas it was once a given that vision was the dominant and most nuanced (and hence poetic) element of cinematic experience,” Vivian Sobchack wrote a decade ago, “of late that dominance has been challenged by shifts of emphasis and attention in both sound technology and our sensorium.”79 This chapter takes the above observation seriously and gives a specific example of the coevolution of sound technology and the human sensorium: the sound space. I have explored here not only how a sense of space is constructed in cinema through sound, but also how that practice has evolved historically. In tracing sound technology’s historical negotiation with this third dimension, I believe it is of foremost importance to sketch out (even if only in the broadest strokes) the modes of experience that encapsulate film style and technology. Sharing some of the same concerns as sound studies in general, this chapter studies cinema’s particular blend of sound space, which hinges on a specific set of conditions: the frontal presence of moving images; collective viewing in a controlled and dedicated auditory space; and, finally, a specific cluster of audiovisual technologies that would eventually migrate to HDTV broadcasting, console gaming, and other unforeseen home and mobile audiovisual media.
Needless to say, sound’s quest in space is not a journey that belongs exclusively to cinema. Contemporary ecological soundscape projects, sound art installations, augmented reality games, and aural architecture have all embraced sound’s manifest multiplicity in space as a key element in unleashing their affective and rhetorical power. Consequently, spatialization has become a key issue in sound studies. The sonic qualities of space, the spatial properties of sound, and the myriad other links between space and sound have produced a rich vein of literature that spans the disciplines of sonic art, architecture studies,80 the cultural history of listening, the phenomenology of listening,81 radio studies82 and many others.
Yet the intersection between sound and space has a special significance in cinema. For more than half a century the development of cinema sound technology has been indebted (a euphemism for “derivative”) to the music industry, the radio industry, and many other industries with which cinema maintains perennial affinities; cinema sound is often perceived as significantly lagging behind what these other industries are capable of. But when it comes to exploiting sound’s third dimension, cinema was, and still is, a fearless harbinger. Limited by broadcast bandwidth and issues of portability, both the radio and music industries struggle to push the boundary of sound spatialization beyond stereo.83 In contrast, the development of cinema sound, after decades of stagnation, has taken off to an entirely new level of sophistication. Compared to the very limited reach of museums and art galleries, it is in movie theaters equipped with surround sound that today’s mass audience is daily exposed to carefully crafted and curated pieces of sonic installation. Admittedly, most of the time, walking into a multiplex and paying admission to a blockbuster movie won’t get us much more than a ninety-minute sonic massage. But occasionally a film raises the issue of sound and space in such an intriguing manner that it invites much broader questions. In the next chapter, I shall consider one of the most technically immersive films of the recent decade and contextualize its sonic achievements.
Gaston Bachelard, The Poetics of Space, trans. Maria Jolas (Boston: Beacon Press, 1994), 176. ↩︎
Chion, Audio-Vision, 79. ↩︎
Vivian Sobchack, “When the Ear Dreams: Dolby Digital and the Imagination of Sound,” Film Quarterly 58 (June 2005): 2, doi:10.1525/fq.2005.58.4.2. The phrase “when the ear dreams” is by Gaston Bachelard. ↩︎
Ibid., 8. ↩︎
Ibid., 3. ↩︎
Sobchack’s description of the sonic motions of these trailers recalls Germaine Dulac’s notion of pure cinema, that is, a cinema of pure motion: “a visual symphony, a rhythm of arranged movements in which the shifting of a line or of a volume in a changing cadence creates emotion without any crystallization of ideas” Quoted in Gunning, “Moving Away from the Index: Cinema and the Impression of Reality,” 38. ↩︎
Sobchack, “When the Ear Dreams,” 8. ↩︎
For a detailed account see Altman, Silent Film Sound, 157–8. ↩︎
The book was published in English by Faber in 1933 as Film. It contains a whole section called “Sound Film” that does not appear in the 1957 version. The titles of the essays in this section include: “The genesis of sound film; Is sound film the same as the stage? Language; Radio plays; The hundred-per-cent talkie; Should sound film be stereoscopic? From silent film to sound film; Parallelism and counterpoint; Asynchronism; Film and music; Not sound film but film.” ↩︎
Rudolf Arnheim, Film (Faber and Faber, 1933), 235. ↩︎
Lucy Fischer, “Applause: The Visual and Acoustic Landscape,” in Film Sound: Theory and Practice, 1985, 232. ↩︎
Indeed, Thomas Elsaesser seems to nod in Arnheim’s direction when he proposes the point as one of the four “counterintuitive claims around 3-D”: “At least since Dolby noise reduction systems were introduced, sound has been experienced as three-dimensional, “filling” the space the way that water fills a glass but also emanating from inside our heads, seemingly empowering us, giving us agency, even as we listen passively. In the cinema, the traditional hierarchy of image to sound has been reversed in favor of sound now leading the image or, at the very least, giving objects a particular kind of solidity and materiality.” Thomas Elsaesser, “The ‘Return’ of 3-D: On Some of the Logics and Genealogies of the Image in the Twenty-First Century,” Critical Inquiry 39, no. 2 (January 2013): 227, doi:10.1086/668523. ↩︎
Mary Ann Doane, “The Voice in the Cinema: The Articulation of Body and Space,” Yale French Studies, Cinema/Sound, no. 60 (1980): 33–50. ↩︎
Emily Thompson, The Soundscape of Modernity : Architectural Acoustics and the Culture of Listening in America, 1900-1933 (Cambridge, Mass.: MIT Press, 2002), 259. ↩︎
Chion, Audio-Vision, 100. ↩︎
Sobchack, “When the Ear Dreams,” 10. ↩︎
Rick Altman, “Sound Space,” in Sound Theory, Sound Practice, 1st ed. (Routledge, 1992), 46. ↩︎
An article titled “On the Recent Development in Dynamic Loud Speakers” (Transactions of the Society of Motion Picture Engineers 1928, 12:836-844) indicates that the perforated screen has been in use but has not become a standard. ↩︎
C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal 27, no. 3 (July 1, 1948): 379–423. ↩︎
It might be difficult to identify the first occasion on which the term channel was used instead of track. But the Journal of the SMPTE published an article in 1952 titled “Multichannel Magnetic Film Recording and Reproducing Unit,” describing a machine that records three magnetic tracks on one 35mm film and effectively eliminates crosstalk. ↩︎
Gianluca Sergi, The Dolby Era: Film Sound in Contemporary Hollywood, Inside Popular Film (Manchester: Manchester University Press, 2004). Jay Beck, “A Quiet Revolution: Changes in American Film Sound Practices, 1967–1979” (PhD Thesis, University of Iowa, 2003). Francis Rumsey, Spatial Audio (CRC Press, 2012). Mark Kerins, Beyond Dolby (Stereo): Cinema in the Digital Sound Age (Indiana University Press, 2010). ↩︎
See for example the section titled “the growth, decline and rebirth of multi-channel sound” in Jay Beck’s dissertation and Mark Kerins’ largely derivative but more sensationally titled “Cinema’s hidden multi-channel history and the origins of digital surround.” ↩︎
Kerins, Beyond Dolby (Stereo), 23. ↩︎
According to Jay Beck, around 1930 both A. C. Keller and Alan Blumlein independently attempted to record stereophonic sound on phonograph records. See Beck, “A Quiet Revolution: Changes in American Film Sound Practices, 1967–1979,” 64–66. ↩︎
See Vytis Puronas, “Sonic Hyperrealism: Illusions of a Non-Existent Aural Reality,” The New Soundtrack 4, no. 2 (2014): 181–94, doi:10.3366/sound.2014.0062. ↩︎
Jean-Marie Straub and Danièle Huillet, “Direct Sound: An Interview with Jean-Marie Straub and Danièle Huillet,” in Film Sound: Theory and Practice, 1985, 150. ↩︎
Variety reports that the screening featured a “cacophony of sound comprised of on location sound embedded in the newsreel, the applause of the audience and festival noise effects contributed by the Roxy orchestra.” Quoted in Ross Melnick, American Showman: Samuel “Roxy” Rothafel and the Birth of the Entertainment Industry, 1908-1935, Reprint edition (Columbia University Press, 2012), 285. ↩︎
Rudolf Arnheim, Film Essays And Criticism (Univ of Wisconsin Press, 1997), 30. ↩︎
Béla Balázs, Theory of the Film; Character and Growth of a New Art (New York: Dover Publications, 1970), 170. The passage, like many others (in quotation marks), has been directly lifted from The Spirit of Film, written in 1930. The same passage in different wording can be found in the recently published Béla Balázs, Béla Balázs: Early Film Theory: Visible Man and The Spirit of Film (Berghahn Books, 2010), 185. I have chosen the old translation out of personal preference. Like René Clair, Balázs justifies this extended quotation by claiming that sound film aesthetics made no significant progress in the two decades separating the two books. ↩︎
Balázs, Theory of the Film; Character and Growth of a New Art, 206–7. ↩︎
Some recent reports on the Dolby Atmos system point precisely at this kind of perception. See both http://designingsound.org/2012/11/my-impressions-of-dolby-atmos and http://designingsound.org/2012/11/ambiences-with-dolby-atmos. Accessed June 17, 2014. ↩︎
It has been argued that direct recording (son direct) was preferred in France throughout the 1930s. See O’Brien, Cinema’s Conversion to Sound. In a recent talk (SCMS 2014) O’Brien suggests that there is evidence indicating that this practice was also quite present at UFA in Germany. ↩︎
Walter Murch, “‘A Tremendous Piece of Filmmaking’: Walter Murch on ‘Touch of Evil,’” Parallax View, accessed July 2, 2015, http://parallax-view.org/2008/10/07/walter-murch-on-touch-of-evil/. ↩︎
See his preface to André Bazin, Orson Welles: A Critical View, trans. Jonathan Rosenbaum (Los Angeles: Acrobat Books, 1992), 11. ↩︎
Kane’s debt to radio aesthetics is identified in Rick Altman, “Deep Focus Sound: Citizen Kane and the Radio Aesthetic,” Quarterly Review of Film and Video 15, no. 3 (1994): 1–33. According to Altman the film exhibits an acoustic grammar of bookending that is characteristic of radio: each segment begins and ends with loud sound events. ↩︎
Walter Murch, “Touch of Silence,” in Soundscape, ed. Larry Sider (Wallflower Press, 2003), 87. Italics mine. ↩︎
Bazin, Orson Welles, 77–8. ↩︎
Though this is a huge step beyond the impressionistic and wishful previous descriptions of sound in this sequence, we still have to rely on Altman’s subjective judgment of these seemingly objective values. ↩︎
I have some reservations about the use of the term “mismatch” here. Altman notes an apparent conflict between the constant volume level of the dialogue and the changes in shot scale and character distance (especially the penultimate shot of the sequence, a close-up of Mrs. Kane moving down to that of the child Kane). But Altman seems to take the idea of matching all too strenuously. “Mismatch” would be a wrong description of either our experience or that of the film’s contemporary audience. While the images do construct space, the different shot scales also have other purposes. In my opinion the close-up serves as a sculptural highlight of sorts and is consequently temporarily relieved of its duty to construct space. It doesn’t contain any new information regarding the space and where characters are located within it. Most likely, Welles had to cut in mid-phrase “That’s why…he is going to be brought where you can’t get at him” instead of using a simple dolly-in and reframing because the dramatic effect of the shot demanded lighting adjustments (characters might also move from their positions in the previous shot). Instead of a mismatch, the dialogue stitches the two shots together rather seamlessly. ↩︎
Here I am referring to the restored version. For further information see Murch, “Touch of Silence.” Tim Tully, “The Sound of Evil,” accessed July 1, 2015, http://filmsound.org/murch/evil/. Tony Grajeda, “A Question of the Ear,” in Lowering the Boom: Critical Studies in Film Sound (University of Illinois Press, 2008). Several interviews with Murch are available online. My study here does not intend to cover all the changes made in this restoration but limits itself to the soundtrack that accompanies the opening shot. ↩︎
Murch, “Touch of Silence,” 86. ↩︎
Ibid. ↩︎
Ibid., 88. ↩︎
Ibid. ↩︎
Murch, “‘A Tremendous Piece of Filmmaking’: Walter Murch on ‘Touch of Evil.’” ↩︎
I believe Murch is referring here to the process, in American Graffiti, of recording source music that is in fact moving, to reflect the experience of listening to car radios. ↩︎
Tully, “The Sound of Evil.” In Murch’s description, the restoration entailed not only a major reshuffling of the soundtrack, but also an “upgrade” in terms of sound resolution. Murch was able to transfer the sound recorded on the magnetic strips to the DTS format, while the original audience had to listen to the inferior, reduced version of the sound on the optical soundtrack. ↩︎
Christian Metz, “On the Impression of Reality in the Cinema,” in Film Language: A Semiotics of the Cinema (Oxford Univ. Press New York, 1974), 10. ↩︎
Beck, “A Quiet Revolution: Changes in American Film Sound Practices, 1967–1979,” 44. ↩︎
Ioan Allen, Matching the Sound to the Picture (Dolby Laboratories, 1991). Francis Rumsey however makes the following intriguing observation: “most people, when played binaural recordings of sound scenes without accompanying visual information or any form of head tracking, localise the scene primarily behind them rather than in front. In fact obtaining front images from any binaural system using headphones is surprisingly difficult.” Rumsey, Spatial Audio, 34. ↩︎
Chion, Audio-Vision, 69–70. ↩︎
Ibid., 83. ↩︎
Ibid., 68. ↩︎
Rumsey details many of these cues: interaural time difference, amplitude and spectral cues, binaural delay and various forms of precedence effect. Rumsey, Spatial Audio, 21–33. ↩︎
The effect in question is observed when the surround channels play a sudden and loud sound that the audience (notably early surround sound audiences) would attribute to the exit door of the movie theater. Kerins discusses this issue in Kerins, Beyond Dolby (Stereo), 158–9. ↩︎
Chion, Audio-Vision, 75. ↩︎
The original French term is “superchamp,” which clearly indicates its association with the screen (champ). Like offscreen (hors-champ), it is defined in relation to the screen, which limits its autonomy. ↩︎
Chion, Audio-Vision, 150. ↩︎
Ibid., 151. ↩︎
Ibid., 85. ↩︎
Slavoj Žižek, Looking Awry: An Introduction to Jacques Lacan through Popular Culture (MIT Press, 1992), 40. ↩︎
William Whittington, Sound Design and Science Fiction (University of Texas Press, 2007), 126. The scene that Whittington refers to is the trash compressor scene in Star Wars: A New Hope. ↩︎
Kerins, Beyond Dolby (Stereo), 92. ↩︎
Ibid. ↩︎
Ibid. ↩︎
Ibid., 97. ↩︎
Theodor W. Adorno and Hanns Eisler, Composing for the Films (New York: Continuum, 2007). Drawing on Guy Rosolato, Chion uses the umbilical cord metaphor. Gorbman’s famous “unheard melodies” rehashes the same idea. ↩︎
Dolby’s success at this point can almost be considered cheating: the spectacular effect exhibited by the 70mm version of Star Wars is not supported by its standardized 35mm incarnation. For a detailed and informative account of the sound systems involved in this film see Beck, “A Quiet Revolution: Changes in American Film Sound Practices, 1967–1979,” 455–470. ↩︎
Even with the Virtual Print Fee of $800 per film passed on to the exhibitor, it is still considerably less than the $1,500 estimated for the physical print. ↩︎
Off-hour screenings, title switching and telecine have always been realistic objects of paranoia for distributors. Now such anxieties can be safely eliminated. In addition to the complex security measures that ensure a file can only be played after having been identified, authorized and monitored, the system includes a fancy way to trace, if not prevent, piracy: “a unique watermark, invisible onscreen, would be picked up by a camcorder and betray the site where the pirated video had originated.” David Bordwell, Pandora’s Digital Box: Films, Files, and the Future of Movies (Irvington Way Institute Press, 2013), 61. ↩︎
The task of the projectionist is fully automated and remote controlled. Scheduling is done through drag and drop, and maintenance is outsourced to a third-party Network Operations Center. ↩︎
Roger Ebert for instance writes, “Film carries more color and tone gradations than the eye can perceive. It has characteristics such as a nearly imperceptible jiggle that I suspect makes deep areas of my brain more active in interpreting it. Those characteristics somehow make the movie seem to be going on instead of simply existing.” http://www.rogerebert.com/rogers-journal/why-im-so-conservative. Accessed Apr 19 2014. ↩︎
Since 1988 IMAX has switched from the six-channel magnetic strip to an external CD- or hard-drive-based solution similar to DTS. ↩︎
To follow up on the report, a Screen Digest article published in June 2013 posits that 90% of the world’s screens will be digital by the end of 2013 and that the conversion is “approaching the end game.” ↩︎
SRS Labs, acquired by DTS in 2012, has also developed an object-based sound platform called Multi Dimensional Audio (MDA). MDA resembles Atmos in many ways, but boasts its “openness” as opposed to Dolby’s proprietary standards. But as of this writing no film has been mixed in the MDA format, nor does the platform have any installation base. Barco was initially clearly opposed to object-based sound (see Barco’s whitepaper for an extensive critique). But with SMPTE’s immersive sound study group actively promoting the concept, Barco’s stance is now rather ambiguous. ↩︎
Sound perceived as bouncing from one speaker to another, with a gap in the middle. ↩︎
IMAX has maintained the six-channel legacy of Cinerama since the 1960s. A top-center channel/speaker is added for the screen’s pronounced vertical dimension. Recently IMAX has reportedly been moving to a 9.1 format. http://www.homecinemachoice.com/news/article/the-sound-of-imax/16250. Accessed April 2014. ↩︎
Allison Whitney, “The Eye of Daedalus: A History and Theory of IMAX Cinema” (University of Chicago, 2005), 24, footnote 4. ↩︎
Sobchack, “When the Ear Dreams,” 2. ↩︎
Barry Blesser and Linda-Ruth Salter, Spaces Speak, Are You Listening?: Experiencing Aural Architecture. Colin Ripley, ed., In the Place of Sound: Architecture|Music|Acoustics. Thompson, The Soundscape of Modernity. ↩︎
Jean-Luc Nancy, Listening, trans. Charlotte Mandell (Fordham Univ Press, 2007). Bachelard, The Poetics of Space. ↩︎
Douglas Kahn and Gregory Whitehead, Wireless Imagination: Sound, Radio and the Avant-Garde. Neil Verma, Theater of the Mind: Imagination, Aesthetics, and American Radio Drama (University of Chicago Press, 2012). Especially chapter 2 “Producing Perspectives in Radio.” ↩︎
The quadraphonic recordings and the more recent Super Audio CD and DVD-Audio are curiosities that never had an impact on the mainstream market. ↩︎