Laura Feibush
University of Pittsburgh

This essay examines embodied listening behaviors—gestural listening—in the context of online video tutoring in university writing centers. Drawing on interviews and observations with writing center practitioners, as well as theoretical frameworks in sound and gesture studies, this essay examines the central role of listening in writing tutoring, and furthermore, how listening behaviors adapt to the virtual boundaries of video conferencing. The author argues that video tutoring highlights the expressivity of listening in the way that aspects of common video tutoring apparatuses (including the hardware of personal computers and software such as Skype or WCOnline) fragment and compromise the sensorium. In the face of this fragmentation, listening’s expressive qualities serve to overcome and cohere the disjunctures brought about by video tutoring technologies. Furthermore, cultivating the expressive aspects of gestural listening helps video tutoring succeed. The essay touches on eye contact, posture, and nonverbal backchanneling, among other embodied listening behaviors.


As more and more writing centers offer writing conferences conducted online via video conferencing software, the time is ripe for an investigation of how writing center pedagogies adapt to the screen and, moreover, how they are shaped by the rhetorics inherent to software and user interfaces. This essay focuses on aspects of listening in writing tutorials, particularly the physically enacted manifestations that I call “gestural listening.” I define gestural listening as the expressive, embodied ways that tutors and tutees manifest the otherwise silent, interior act of audition. I examine, for example, the way the participants in writing tutorials use their posture, eyes, hands, and even choices about where to sit as communicative tools, and as ways to both exert and subvert power. While many tutoring handbooks touch briefly on the importance of listening via backchanneling and nonverbal cues, or techniques that involve reading aloud, listening has yet to be more substantially theorized in both its palpable and metaphorical dimensions. A combination of sound studies and gesture studies allows me to articulate the central role of listening in writing tutoring, and examine how listening behaviors adapt to the virtual boundaries of video conferencing software.

Without relying on clichés about listening, like those laid out by Jonathan Sterne in his “audio-visual litany,” this article positions listening—especially gestural listening—as a force that coheres disjuncture. I argue that listening tends to “fill in” for the sensory gaps that come about in screen-mediated learning environments. It may seem counterintuitive to claim listening as a cohering force in the context of teaching via video-conferencing software. However, listening’s expressivity emerges as especially important because of the way video-chatting technologies deeply compromise gestural listening behaviors, e.g. eye contact. To illustrate and argue for this counterintuitive claim, I draw from interviews conducted with the directors and tutors of writing centers offering a remote, video-tutoring option and from observations of video tutorials at the University of Pittsburgh that I organized for the purposes of this essay. I present ethnographic vignettes that both convey instances of gestural listening and occasion methodological reflection.

In the first section of this essay, I provide a brief theoretical framework for my study of gestural listening. In part two, I move to an analysis of lived moments, based on observations I conducted primarily in the University of Pittsburgh Writing Center. Here, I focus on the construction of a frame in typical video tutoring set-ups before exploring how the face achieves primacy as a result of that apparatus. Moving then to a close reading of the screen as a materiality, I conclude with an exploration of eye contact as an important gestural listening behavior uniquely vulnerable to the disjunctures of online video tutoring apparatuses. In the last section, I give a brief list of practical takeaways with which practitioners may experiment in their own move towards online video tutoring.

Towards a Rhetoric of Listening: Gestural Listening in Context

Traditionally, rhetorical theory focuses primarily on speakers and writers, paying comparatively less attention to readers and listeners. Echoing Gemma Corradi Fiumara in her book, The Other Side of Language, I intervene on the proposition that speaking without listening is only half of a communicative act, what Fiumara calls a “reduced-by-half concept of language” (2). My research in this essay springs from an interest in acts of listening and their potential as a rhetorical force.

What I mean by listening as a rhetorical force is simply that although we commonly think about listeners being affected by speakers, and not the other way around, there are, in fact, things listeners do that influence what gets said. Listeners, in other words, can actually impact communicative situations. This “rhetoric of listening” turns the tables on a more traditional rhetorical paradigm, and understands listening not just as a mode of reception but as a formative, even expressive, component of communicative situations.

In order to begin identifying what a rhetoric of listening might look like, then, it becomes necessary to focus on the embodied actions of listeners. I proceed on the observation that people often put the act of listening into their bodies in ways that people may come to notice, and even use purposefully. The ways in which acts of listening manifest as sets of physically embodied behaviors I call “gestural listening.” Gestural listening posits listening as a productive and varied art, rather than a passive or monolithic one, troubling the line between reception and expression.

Writing center practitioners are likely to share a strong felt sense that forms of listening lie at the heart of writing center work. The gap in writing center literature with regard to listening probably exists for all the reasons that listening gets sidelined in general, the first among them being that it is tricky to measure. Gestural listening provides one way to make listening “visible.” I hope to show that aspects of gestural listening inform pedagogical and technological questions that arise with the growing practice of video tutoring, beginning with one of the most constitutive aspects of video tutoring: the frame.

Questioning the Frame

“Just yell out if you want me to stop!” the tutee says. The student commences reading her piece of writing aloud, interspersing her reading with verbal self-corrections and questions for the tutor as she goes along. At one point, the tutor breaks in to say: “You read ‘which look’ there; you have ‘who look’ in here [on the page].” The tutee makes the correction, and continues reading.

At first glance, the vignette above depicts a triumph of video tutoring. The classic writing center practice of having a tutee read their work aloud proceeds remarkably unhindered by the technological interferences, like echoes or delays, that sometimes plague video conferencing software. During the session, the tutor remarks on sentences that feel too long, the tutee catches and changes words that sound repetitive, and together, they improve the precision and flow of the student’s writing. Indeed, in many ways writing center work is well-supported by video conferencing software, and many centers throughout the United States have begun to offer video tutorials. In offering this option, however, writing center practitioners need to evaluate the interfaces that enable it, using the same care with which they interrogate aspects of conventional, in-person tutorials. After all, personal computers and their software are not neutral conduits for writing center pedagogies. Consider the following moment of capture from another one of my observations of a video tutorial:

While speaking, the tutor gestures gently with one hand, which hovers just above the laptop’s trackpad. Glancing at the computer screen, I note that her hand remains too far beneath the frame of the camera to be caught by it. I can see the tutor’s gesturing, but her tutee can’t. Later, beneath the desk, the tutor bounces one leg as though full of barely contained energy, another embodied signal the tutee will not be able to see.

As I note in this observation, the video-conferencing programs most commonly used in writing tutoring, including Skype, BlueJeans, and GoToMeeting, all rely on the construction of a frame. In a co-present classroom, it might be possible to argue that a lectern, the chalkboard, or the proxemic arrangement of students and instructor in the room creates something like a frame as well, but the logistical and felt experience of that co-present classroom suggests a kinesic flexibility not allowed by the small, rectangular frame of a computer’s camera. In Ambient Commons, Malcolm McCullough writes, “One core belief in media studies is that when a frame fixes a perspective, it also fixes a cultural position,” and that “to question the frame is to expose those conventions” (McCullough 155). Following McCullough, writing center practitioners need to ask what cultural position the frame that enables video tutorials fixes, and how it shapes pedagogical practices.

To begin analyzing the conventions of video tutoring, I turn to Reading Writing Interfaces, in which Lori Emerson examines and historicizes the idea of an interface. Emerson¹ traces the progression within the personal computer industry from the command-line interfaces of the early 1980s to the window-based interfaces introduced by Apple in the mid-1980s, and in doing so, she identifies a certain strain of rhetoric that has powerfully guided the development of both hardware and software. Two of the central ideas in this rhetoric are invisibility and user-friendliness, both of which have implications for video tutoring. Deeply associated with Apple products, although not limited to them, Emerson’s “invisibility” refers to the way that devices like personal computers have come to seem “hermetically sealed,” that is, almost completely unable to be opened or tinkered with by typical consumers (Emerson 31). Emerson argues that contemporary computing devices are defined by their “no-longer noticed closed architecture,” their sealed quality hardly ever questioned by most users, and, in fact, actually admired as a part of the products’ sleek, elegant aesthetic. The idea of invisibility also carries over from smooth hardware into seamless software. According to Emerson, software interfaces, too, have followed the trend towards invisibility, which, she writes, “now also implies inaccessibility” (6). The document-formatting conventions of Microsoft Word, for example, have become so deeply normalized that breaking out of its given options is a challenge at times, even discouraged. “We need not know how it works,” she writes, “or how it works on us rather than us on it” (Emerson 6). Emerson’s stance is clear, here: that the appeal of invisibility is also dangerous in its masking of personal computers’ inner workings, leaving most users profoundly unaware of how these technologies, so central to the working lives of so many people, actually function. 
The idea of user-friendliness works alongside invisibility, and refers to the way that designers of personal computing devices strive to make both hardware and software as easy to use as possible, their operation coming to feel deeply natural, almost inevitable. According to Emerson, however, qualities of invisibility and user-friendliness actually hide ideology about what the public’s role as users of hardware and software should be: referring to the smooth, intuitive operation of personal computing interfaces, she notes that “we no longer have access to [the] digital tools for making” (3). “Instead,” she writes, what we have are “predetermined choices” (Emerson 3, emphasis added).

Video-conferencing software offers what Emerson would describe as a predetermined set of choices. Even when options for customization exist, those customizations often appear in the form of a predetermined list of settings or preferences. Sometimes, this lack of choice even impinges upon closely held writing center practices. For instance, one common writing center practice is for tutor and tutee to sit next to one another, as opposed to across from each other, a proxemic arrangement which reflects the non-agonistic, non-hierarchical power dynamic that writing centers strive to construct. But the graphical user interface of most video conferencing programs effectively asks tutors and tutees to sit face to face; that is, the face-to-face quality is a powerful existing convention for how these programs are used. While it is possible to angle a camera differently, the “default” camera position is a direct, centered shot at the user’s face. Further, as of the writing of this article, many tutors and tutees are using laptops, which typically embed a camera into the top center of the screen. With this setup, the angle and height of the camera are givens due to their position in the hardware of the computer. In this case, the conventions of the software and hardware actually subordinate a long-standing pedagogical technique valued in many writing centers. Even this minor example illustrates an idea key to Emerson’s argument, which is that an interface, whether embedded in hardware or software, “does not simply lead from one space to another,” but rather inflects how meaning is made, and even what kind of communication is possible (132).

The way the face-to-face layout of video conferencing software overrides the writing center preference for sitting side by side emphasizes how users are asked, essentially, to submit to the set of communicative choices that current hardware and software interfaces allow. It would be easy, at this point, to take on Emerson’s skepticism and remark on how much is lost when instruction occurs online via video. As I point out in the vignettes above, certain communicative actions and important elements of gestural listening end up “lost” outside the parameters of the frame, which is one of the most fundamentally constitutive elements of online video instruction. But perhaps better than submission is simply the idea of constraint. Shifting away from an attitude of loss, we might simply say that the limitations of the interfaces shaping video tutoring ask participants to adjust their behaviors accordingly, and may even give rise to new gestural practices. In addition to whatever losses may be associated with the construction of a frame, the frame brings about a concentration or heightening of other sensorial elements. Either way, the construction of a frame in the graphical user interface of video-conferencing software results in certain concrete physical adjustments on the part of users. For example, one experienced online tutor noted, “I tend to talk with my hands a lot, so I sit back from the screen.” Here, the tutor adjusts in order for the better part of his gestural communication to remain intact. Another tutor responded differently, and said that she gestured less during her online tutorial, because she could see that her gestures were not inside the frame and that as such, the tutee could not see them.

The use of another technology often goes hand in hand with video tutoring software: headphones. Headphones are often necessary to minimize echoes or other auditory interference during online video appointments. Headphones, though, exemplify a technology that strongly circumscribes the movement and positioning of the body during their use. Due to their cords, they require the user to sit within a certain distance of the computer. Although wireless headphones are of course on the rise, many writing center tutors and tutees are literally leashed to the workstation by the use of older, corded headphones.

What comes from this questioning of the frame, so far, is the realization that personal computers used for video tutoring cause the bodies of users to be still and separate: still in the sense of held still, or contained, within the confines of the video camera, and separate in the sense of being placed and spaced apart from the bodies of others. To take this generalization a step further, personal computing devices, designed for use by humans, in turn shape the physical behaviors of their users. That is, a certain physical regime arises from the use of computers as objects. In her study of screen-reliant installation art, Screens, Kate Mondloch notes how, when interacting with video installations involving cameras that cause viewers to see themselves, “viewers [often] implement self-policing boundaries to keep themselves visible on the screen” (34). Although a range of interactive behaviors might be possible, Mondloch notices that viewers tend to adjust to keep themselves centered in the frame.

Listening, like all other communicative elements, adapts to the construction of a frame. To compensate for the limitations of the frame, participants in video tutoring rely more heavily on the aspects of gestural listening that do survive the digital in-between of the screen, and they also rely more heavily on verbal cues that reflect attention and responsiveness. One important way that this plays out stems, once again, from the construction of a frame, and gives rise to a prioritizing of the face.

The Primacy of the Face

The cameras flicker on, and tutor and tutee center their head and shoulders in the frame. After brief greetings, the tutor says, “So tell me about the assignment you’re working on today.”

“One second!” the tutee responds. “I was trying it without the headphones, but it’s not as good [. . . .] Okay, so this project is [. . .] I did an internship, and I have to sort of write a reflective paper [. . .]” Without further ado, the appointment is underway.

The frame that video-conferencing software produces leads to a focus on the faces of the participants. The face, after all, tends to be centered in the frame, and the frame itself captures only a small rectangle containing the face and what surrounds it. This aspect of selection, as “normal” as it may feel to those habituated to video conferencing software, nevertheless has implications for writing center practices and scholarship.

As in the vignette above, introductory aspects of face-to-face appointments tend to get shortened. The director of one writing center, whom I interviewed for this project, noted that video tutoring, as opposed to asynchronous online tutoring, helps in “making the reader very real,” but he then went on to remark upon the differences in the first few minutes of a tutorial between in-person and virtual set-ups. Online, he said, “we don’t have the back-and-forth of a greeting that feels natural.” Another interviewee (also a writing center director) agreed, noting that online sessions seem to just “hit the track running” and skip some of the early conversation that helps set up a writing center-style relationship between the tutor and tutee.² This aspect of the interface, then, most obviously affects aspects of rapport-building.

The quality of simply appearing in the frame of the camera, however, may have more far-reaching effects as well, due to information that can be attained by a person’s physical, in-person presence. Consider the choreography of a co-present setting: a student enters the writing center and often checks in at a front desk. They may sit, possibly in a waiting area, until their appointment time, when their tutor comes over, greets them, and leads them to a table to begin their tutorial. They arrange themselves at the table. These steps, as mundane as they may seem, get elided or eliminated by the interface of video conferencing, where tutor and tutee just “appear” with their faces in the frame.

As obvious as these aspects of co-presence may be here, from the perspective of gesture studies, their informative power should not be underestimated. Harry Denny explores aspects of this kind of “informative in-personness” when he takes up the idea of face in Facing the Writing Center: Towards an Identity Politics of One-to-One Mentoring. Denny deals with the idea of face by walking a line between its physical and metaphorical meanings. At times, a student’s physical appearance, their physical face, brings about a threat to their metaphorical face, or a sense of respect or belonging in the eyes of others in a given community. Denny recounts a time when a white international student from Russia sat down to have an appointment with Allia, a Black woman and graduate student tutor at St. John’s University. The tutee, Denny writes, “has inflected a current events paper with what the tutor perceives as racist rhetoric. When she pushes the student to think about her argumentation, the student says she thought her tutor was going to be one of the white tutors and questions her tutor’s qualifications” (32). Allia responds by explaining her qualifications to the tutee and continuing with the appointment. Allia’s literal face, in this instance, becomes part of a conflict in the tutorial, one that brings about a challenge to the tutor’s credibility and forces her to defend it.

Other times, Denny’s “face” is more strongly the metaphorical kind, although not without a dimension of the physical. Denny writes the following passage about a student named David who comes to the writing center for an appointment. David comes from a working-class Latino community in New York City, and Denny describes him as highly self-conscious about his own ability to fit, especially in his use of language, into St. John’s middle-class environment. David at one point has a tutorial with a “thoroughly middle-class white woman who had transferred to [Denny’s] school from an elite liberal arts college” (Denny 76). Observing them, Denny writes the following:

She performed the very all-American college affect that David sought to mirror. Watching them from my office was a curious ethnographic experience: From afar Eliza and David looked like an ad for Abercrombie and Fitch, Eliza more casual and effortless than David, whose performance of the college boy persona felt forced, too self-conscious at times. It was in this sense that he represented a failure to negotiate the complex rules of class: that to assimilate or cover requires a profound internalization and performance; and that success is almost always fleeting. (76)

Denny’s observations bring forward an idea within gesture studies: that bodies and faces can, at times, be a kind of liability—they can “give us away,” as David is given away here in his appointment with Eliza. Despite looking from afar “like an Abercrombie and Fitch ad,” an image that encapsulates a certain idealized vision of the American “college-kid” identity, upon closer examination, David’s performance is too “forced” and “self-conscious.” Being together in the room is what allows the appointment to even approximate the appearance of the idealized advertisement, and it also allows that unrealistic illusion to come up short. The reader sees this through Denny’s eyes, with Denny performing a kind of spectatorship in which he views the appointment between Eliza and David from a distance. His ability to draw the conclusions that he does about the two of them, and their respective sense of “belonging” in the academy, comes directly from being able to see them, their whole bodies, their postures, comportment, and self-arrangement in the space. In this passage, he gives a reading of a tableau, a tableau that would be fractured and stretched by the apparatus of video tutoring, which, as I argue above, stills and separates the bodies of participants in a way that disallows this kind of reading.

It is easy to see why the face, as one of the most concentrated sites of emotion and communication in the body, would be prioritized by designers of video-conferencing software. But simply being able to see the face of another person does not necessarily guarantee clarity of meaning or expression. One of the most significant scholarly efforts to study the complexities of facial expression comes from psychologist Paul Ekman. In Unmasking the Face, Ekman and his co-author Wallace C. Friesen outline the minute physical manifestations of six different common emotions, including surprise. Here, they describe when the overt features of surprised facial expressions become a kind of “performance” that actually comes to signify something else: “Although the surprise brow is usually joined by wide-open eyes and dropped jaw,” Ekman and Friesen write, “it sometimes appears in an otherwise neutral face. When this happens, the facial expression no longer signifies an emotion; it has different meanings, some of which are related to surprise” (39). They go on to describe this facial phenomenon, not real surprise but “related” to it, as a common one leveraged by listeners:

When the brow is held in place for a few seconds or more, this is an emblem which means doubt or questioning. Frequently it is shown by a person who is listening to what someone is saying; it registers without words a question or doubt about what is being said. The questioning or doubting may be serious or not; often this emblem will express mock doubt, the listener’s incredulity or amazement about what she has just heard. If joined by a head movement, sideways or backwards, it is an exclamation. If the surprise brow is joined by a disgust mouth, then the meaning of the emblem changes slightly to skeptical disbelief, or if the head rotates back and forth, incredulous exclamation. (39)

Two major points come out of this passage. First, this kind of performative facial expression represents one way that listeners commonly continue to exert communicative power in conversational situations even when not speaking. Notably, what Ekman and Friesen refer to here are not spontaneous, involuntary facial expressions, but rather strategic performances of them. According to Ekman, people often control their facial expressions because of “cultural display rules” (139). Cultural display rules are a useful way of thinking about how students and teachers use facial expressions in classroom environments, and especially the phenomenon of misleading facial expressions. Students and instructors can be said to operate within cultural rules of display in their classroom comportment.

Secondly, this excerpt reminds readers that simply being able to see the face of another person does not guarantee any kind of simple clarity or authenticity in a communicative encounter. While I devote significant space to analyzing aspects of screenic interference and affordance in this article, it is important to remember that the human face is said by some gesture theorists to be itself a kind of screen. At first, this realization may seem to undercut the value of being able to see the face without the veil of a screen in between participants. But Ekman’s research on detecting lies demonstrates the multi-layered complexity of the face, and why it may still be important to be able to see it. “Usually when a person is said to lie with his face or words, he lies to meet some need of the moment,” he writes (Ekman 139). But the controlling of facial expressions “can involve false messages or the omission of messages” as well (Ekman 139). “The word lying,” he continues, “may be itself misleading about what occurs. It suggests that the only important message is the true feeling that underlies the false message. But the false message is important as well, if you know it is false. Rather than calling the process lying, we might better call it message control” (139). What Ekman calls “message control” is one reason why centralizing the face in video conferencing software removes important communicative information. As can be seen in the excerpt from Denny, above, the bodies of participants allow for different communicative choices and the gathering of more embodied information.

What emerges, further, is the sense that the visual regime of personal computing and, by extension, of video-conferencing platforms, is complex, but that the gestural and audio regimes do not match that complexity. Moreover, this has implications for writing pedagogy. Often, for example, tutor and tutee will move through a piece of writing with the help of deictic gestures, pointing, grasping, framing, etc. But, as I have shown, the visual conventions of video tutoring give primacy to the face: the hands can make an appearance in the frame, but not in relation to the text. To some extent, the lack of the gestural can be made up for on programs like WCOnline, which allows tutor and tutee to interact with the written text on the screen, but the communicative flexibility of the hands is nevertheless reduced to the functions allowed by the screen: highlighting and typing, for example. The privileging of the face in most video conferencing software used for writing instruction reveals a lack of creativity in the way those programs are designed. Why not, for example, develop a type of software or camera orientation that privileges the hands? The hands, in many respects, are one of the primary bodily locations for writing. Giving primacy to the face makes the idea of dialogue and audience the main facet of writing tutoring. Giving primacy to the hands, in contrast, would centralize slightly different aspects of writing: perhaps aspects of craft and construction, like how much space on the page is given to a paper’s different ideas, or the importance of ordering and arrangement on the page. Better yet, why not be able to move between the face and the hands, or be able to see both at once? Greater flexibility and creativity in the design of video tutoring software will be necessary, but new interface choices should come from writing center pedagogies and priorities.

In Jackie Grutsch McKinney’s Peripheral Visions for Writing Centers, Grutsch McKinney writes, “Writing center practitioners were able to resolve themselves to online tutoring only when it looked more familiar” (17). Part of what this means is that online tutoring became acceptable as it became possible to look through rather than at its technological apparatus. I want to resist the momentum to only look through, in line with Emerson’s skeptical attitude toward invisibility and user-friendliness, and to dwell further on the ambivalent materiality of the screen.

The Materiality of the Screen: From Frame and Window to Barrier and Mirror

The willingness of writing center practitioners to look through rather than at the screen that enables online tutoring requires a kind of close material reading of the screen itself. Anne Friedberg does this in her book, The Virtual Window, in which she traces an exhaustive history, both material and intellectual, of windows, as architectural features and as objects with rich metaphorical function. As an art historian, her starting point is a famous quotation from the Renaissance painter Alberti that likens the frame of the painting to a window that frames the subject being painted. Alberti writes: “Let me tell you what I do when I am painting. First of all, on the surface on which I am going to paint, I draw a rectangle of whatever size I want, which I regard as an open window through which the subject to be painted is seen” (Friedberg 249).

According to Friedberg, this quotation set the stage for the development of genres of painting, which would in turn go on to shape future media as well, especially screen media. Personal computers fell in line with the same set of metaphors as their prevalent interfaces developed in the 1980s. Friedberg writes, “The computer screen is both a ‘page’ and a ‘window,’ at once opaque and transparent” (19). Furthermore, and of particular interest to this study, “it commands a new posture for the practice of writing and reading—one that requires looking into the page as if it were frame of a window” (19). She continues,

The computer screen adds a new depth to the perpendicular surface. Its overlay of ‘windows’—open to different applications for word-processing, Web browsing, emailing, downloading—transforms the screen surface into a page with a deep virtual reach to archives and databases, indexed and accessible with barely the stroke of a finger. (19)

For Friedberg, the coexistence of depth and flatness characterizes screen media:

The desktop metaphor of a stack of papers, in overlapping array, implies a view from above. The window metaphor implies looking into or out of an aperture, a “perspective” position facing an upright perpendicular surface. Stacking windows on top of each other, piling documents in layers, meant that the user could maximize the limited “real estate” of the relatively small screen. The space mapped onto the computer screen was both deep and flat. It implied a new haptics in the position of its user: in front of and above. (227)

Each user in a video writing tutorial, then, may look upon the other as though slightly from on high. These haptics are quite different from the haptics of being in a space together, more clearly dramatized by the excerpt from Harry Denny in Facing the Center, above. The “real estate” of the screen brings about questions of what can be seen at the same time (the student as well as the document they are working on), but also questions relating to visibility and opacity through and on the screen itself. For instance, the stacking windows that allow for the piling of documents mean that one user can quickly switch to a different window on the desktop and completely cover the face of the other participant.

By definition, then, screens raise a barrier, even as they convey the illusion of depth, or opening onto another space. This plays out in particular ways in the context of video tutoring. At centers where Skype and GoogleDocs are used simultaneously during online tutorials, or even if a shared document is simply on the screen alongside the Skype window, it is possible for the tutor or tutee to be looking at something else entirely on their own screen rather than at the other person in the Skype frame. An experienced online video tutor noted to me in an interview that although this may not be intended, he can always hear when a tutee is typing. In response, he told me that he too makes sure to let the tutee know if he is typing during an appointment, and why. The director of this tutor’s writing center, whom I also spoke to, picked up on the theme of the opacity of the computer screen. He remarked that there is a need to pay attention to

managing expectations about who’s writing during the session in that shared GoogleDoc; whether the expectation [. . .] is even higher electronically than it is in person that a tutor is going to write things on the document [. . .] so being explicit about those roles.

In a case like this, listening may manifest in doing something to the shared document. It becomes important to be particularly explicit about who will be doing what, and when.

After all, silence is a complex signifier, and can be unsettling in the context of a video tutoring session. The director I mention in the last paragraph went on to say that,

Then there are times when the writer seems to go missing or be quiet—we don’t know if connection—you know what it’s like with Skype sometimes [. . .] it just cuts out or you can’t hear something for a while or you’re not sure—did the writer step out of the room? Did they go get something to eat, are they watching a movie? Who knows how people are multitasking—I hear this from TAs on our staff [. . .] there are some things they have to get used to.

These observations show most clearly how the screen acts literally like a screen, that is, like a barrier.

In addition to constructing a frame, many video-conferencing programs make participants see themselves in unusual ways. Over Skype, both parties can see a small image of themselves, moving and talking, in the corner of the screen. Video conferencing platforms routinely include a function where participants can see themselves; in fact, it has become an expectation for this type of software. At times, the self-facing camera actually acts as the confirmation that the other caller can see you. That is, if you cannot see yourself, your contact cannot see you, either.

In a follow-up conversation, the tutor in the vignettes above noted how distracting it was to see herself on camera during the tutorial. “It does make you more self-conscious,” she said, noting that during the tutorial she had seen and fixed an out-of-place lock of hair in the self-facing camera. She also observed that the setup creates the problem of not knowing where to look: at the tutee, at the text, at herself? The tutor went on to say that the screen contained a lot of information for someone easily distracted like herself, and she wondered whether it might not be better to go back to just using telephones for distance conferencing.

Last fall, I interviewed an experienced video writing tutor who said: “Sometimes when I switch back over to Skype [after not being in that window for a while] I’ll realize that I’ve been [. . .] kind of hunched down to the point that I’m almost out of the frame [. . .] of vision.” She often responds by sitting up straighter. Describing pauses during which she is not looking at herself, one of the tutors I interviewed said, “I semi-regularly kind of forget how my face is coming across.” The tutor put this forward as a problem (similar to sitting up straighter upon seeing herself slouched in the camera frame), as though she should be aware at all times of whether or how her face is communicating. The self-facing Skype camera seemed to chasten her with the constant reminder of what her face was doing when she was paying the least attention.

The self-facing camera could be said to heighten participants’ awareness of how they might be communicating with their faces in any given moment. Essentially, these situations unique to the video conferencing apparatus ask tutors, teachers, tutees, and students to become more aware than they might usually be of how their bodies are or are not manifesting the act of listening. The anecdotes that I sketch in the last paragraphs start to suggest how. For example, when the first tutor is hunched down almost out of the frame, she would be likely to say, if asked, that that posture probably would not make the tutee feel that she was engaged in attentive listening.

The way participants in video tutoring sessions compensate for the barrier of the screen goes to show that there exist listening behaviors that have developed into expected conventions in communicative situations. Furthermore, participants tend to compensate for the diminishment of those behaviors in two ways. First, they rely on verbal cues that reflect listening. Second, they adjust listening’s gestural elements, keeping themselves within the frame of the camera and being sure to sit up straight, for example. This is a reminder that listening is an entrained behavior, so much so that certain elements of gestural listening prevail in spite of their detriment to mental processes. A major example of this is a physical behavior deeply tied to performances of listening: eye contact.

Screenic Listening and Eye Contact

Within the face, the eyes merit more attention as particularly concentrated communicative tools, for which video-conferencing software creates a special set of issues. One of the experienced online video tutors I interviewed said, “I always struggle with where to look.” Whether looking at the tutee’s face on the screen or at the computer’s camera, he told me, “something’s a little off.” Skype and its ilk, as they stand now, actually hinder the form of body language that might be the most commonly used to reflect attentive listening: eye contact. A person can choose to look at the eyes of the other person on the screen, or at the computer’s camera. Looking at the camera brings about the visual illusion, for one participant, that eyes are meeting, but in fact one person is looking not at the image of the other person but at the camera at the top of the computer frame. In other words, there is no way to make real eye contact.

Eye contact is a gestural listening behavior that is as important as it is misunderstood. Harboring both cultural and physiological components, eye contact is closely tied to cognition, intimacy, and power. Studies have shown that infants prefer faces that engage them in direct gaze, and that early experiences of eye contact may form the basis for future social skills (Senju et al.). But a fundamental misnomer lies at the heart of the term “eye contact”: what we tend to think of as eye contact is not really contact at all. The phrase “eye contact” lends itself to the sense that the eyes of two people meet and lock in a kind of beam stretching between them. This is a misleading image: in reality, a sustained gaze is composed of multiple, fleeting eye movements that encompass not just the eyes but also the nose, cheeks, and mouth, other areas of the face that inform about identity and emotion (Ekstein and Peterson). Photographs that track study participants’ gaze when looking at a face show concentrated clusters not on the eyes, but rather on and around the nose (Ekstein and Peterson). These rapid movements are what allow the visual system to put together information about the identity of a viewed person and what they may be thinking or feeling. Further, eye contact is socially preferred in many cultures, but as it turns out, this may be more socially preferred than cognitively optimal. Japanese studies have found that maintaining eye contact impedes participants’ ability to perform a cognitive task, such as naming the color of a word (Kajimura and Nomura). Eye contact, it seems, takes up large amounts of mental bandwidth, and may not actually foster the best cognitive environment for certain types of tasks.

In spite of this knowledge, however, the concept of eye contact in popular imagination is so strong that it remains important pedagogically, and it emerges as a key aspect in the success and failure of video tutoring. Because eye contact is compromised, other gestural listening conduits need to be leveraged more heavily, and more verbalizing is needed to help alleviate the disjunctures of the video apparatus. One tutor noted,

one way that I demonstrate that I’m listening is that I’m highlighting things or making marginal comments [. . .] [and] I’ll forewarn them that I’ll do that [. . .] so that they know I’m with them and engaged, that something is happening.

She is echoed by another one of my interviewees:

the removal of visual cues asks the consultant to be intentional about signaling their place in the conversation, and maybe kind of guiding things in a more explicit kind of way to make it clear that they’re still listening, that they’re following along, that they’re understanding, but since there’s not the normal nodding of the head or smiling [. . .] those sorts of moments can be more intentional, if the consultants are thinking about doing that. I think initially sometimes they forget.

Tutors and directors may want to consider what this writing center director calls “intentionality” here, being sure to respond to the need to replace the usual visual and auditory cues with verbal or textual ones. When the dialogic aspects of listening don’t get communicated, negative consequences can result, ranging from confusion to alienation to resentment.

There may also be a need to tolerate silences whose significations are unclear because, as this same tutor admits, the affective dimension is harder to read online. “Even with the video,” he said, “it’s a little harder sometimes to read facial expressions and visual cues.” One of the tutors I spoke to talked briefly about how this plays out, how she pays attention to “how I’m listening to [the tutee] and how I’m letting them kind of guide the conversation [. . .] making sure to take note of moments when they look like they want to say something or they start to say something right as I do when I cut them off and make sure to go back to that.” She went on: “so their face is also really important to me in the session [. . .] if they look confused, or disconnected, or like they want to say something.”

In “Contact with My Teacher’s Eyes,” Yin Yin investigates some of the affective dimensions of eye contact in pedagogical situations, attending to both students’ and teachers’ experiences. Using a vignette method similar to the one I used in this study, Yin Yin identifies pressure, belonging, and intellectual excitement all as possible results of making eye contact with one’s instructor during class. In one vignette, the author brings forth the sense of pressure that can arise from eye contact between student and teacher:

Andy [. . .] “hears” the teacher’s demand for concentration. He becomes so nervous and vigilant. His body is no more under the veil of pre-reflectivity; his nervousness gathers extra attention to hasten his hand to take note word for word. (72)

Here, the author prioritizes the student’s response to eye contact with the instructor, which is to become nervous and vigilant. The author writes that “his body is no more under the veil of pre-reflectivity.” Rather, it is pushed out from that veil, and stands exposed. Eye contact can be such a powerful choice that Yin Yin writes,

the value of knowing student’s experience of having eye contact with us lies in getting to know both the power and limit in the eye contact. We should not only know when to make eye contact with a student. We have to know when to not give it. (78)

According to Yin Yin and my interviewees, eye contact functions as a kind of two-way street: it can serve as a means of reception, but also as a means to provoke speech and build expectation. A distinctive aspect of eye contact is its exclusiveness: it can only be formed between exactly two people at a time, after all, and this sudden forming of a dyad within a group may account for the sense it creates of closeness, or pressure.

Implications and Recommendations

Identifying their own embodied habits, choices, or kinesic ways of being emerges as a worthwhile exercise for writing center tutors. While filming tutorials might seem like a natural choice, I suggest a different approach. An activity that I have conducted in writing center workshops involves having two tutors enact ten minutes or so of a mock tutorial. While they do so, two other tutors silently observe, taking notes on certain aspects of gestural listening that they are given secretly in advance on an index card (eye contact, posture and body positioning, how participants use their hands, for example). After the mock tutorial, the observers reveal what they have been looking for and share their findings with the group. Then, tutors switch roles and the new observers take notes on a different set of listening behaviors, followed once again by discussion.

Building from this, it will be important to become aware of cultural preferences around eye contact and other kinesic choices, as well as the ways these habits and choices may respond to aspects of identity like race and gender. Power dynamics often manifest in acts of listening, so tutors in ongoing training sessions might be encouraged to consider the following: in any given situation, who gets to be listened to and who gets to listen? Are certain individuals or groups more often forced into listening roles? On the other hand, when do the more powerful players in a communicative situation leverage that power to become a kind of listening “judge”?

Lastly, writing center directors might want to consider how they can help those facilitating online video tutorials to become sensitized to aspects of interface, and to the ways that layered, culturally and historically situated tutoring interfaces come to bear on writing center pedagogies. Directors may imagine different ways of going about this, asking tutors to reflect on elements of interface and embodiment.


1. Interface refers to human-to-hardware interfaces (keyboards, screens, etc.), and human-to-software interfaces (namely, graphical user interfaces).
2. I have preserved the anonymity of my interview subjects. All interview quotations come from telephone interviews I conducted from October 18-25, 2016.

Works Cited

Ekman, Paul and Wallace Friesen. Unmasking the Face: A Guide to Recognizing Emotions from Facial Expressions. Los Altos, Malor Books, 1975.

Ekstein, Miguel P., and Matthew Peterson. “Looking Just Below the Eyes Is Optimal Across Face Recognition Tasks.” Proceedings of the National Academy of Sciences of the United States of America, edited by Wilson S. Geisler, vol. 109, no. 48, 2012. doi:10.1073/pnas.1214269109.

Emerson, Lori. Writing Reading Interfaces: From the Digital to the Bookbound. U of Minnesota P, 2014.

Denny, Harry. Facing the Center: Toward an Identity Politics of One-to-One Mentoring. Utah State UP, 2010.

Fiumara, Gemma Corradi. The Other Side of Language: A Philosophy of Listening. New York, Routledge, 1996.

Friedberg, Anne. The Virtual Window: From Alberti to Microsoft. MIT P, 2009.

Kajimura, Shogo, and Michio Nomura. “When We Cannot Speak: Eye Contact Disrupts Resources Available to Cognitive Control Processes During Verb Generation.” Cognition, vol. 157, 2016, pp. 352-357. Science Direct, doi:10.1016/j.cognition.2016.10.002.

McCullough, Malcolm. Ambient Commons: Attention in the Age of Embodied Information. MIT P, 2013.

McKinney, Jackie Grutsch. Peripheral Visions for Writing Centers. Utah State UP, 2013.

Mondloch, Kate. Screens: Viewing Media Installation Art. U of Minnesota P, 2010.

Senju, Atsushi, et al. “Early Social Experience Affects the Development of Eye Gaze Processing.” Current Biology, vol. 25, no. 23, 2015, pp. 3086-3091. Science Direct, doi:10.1016/j.cub.2015.10.019.

Sterne, Jonathan. The Sound Studies Reader. New York, Routledge, 2012.

Yin, Yin. “Contact with My Teacher’s Eyes.” Phenomenology and Practice, vol. 7, no. 1, 2013, pp. 69-81.