The thorny path of philological reading: prosodic images

Written by: Dr. Elena Yakovleva, Lomonosov Moscow State University
The question of what happens when we, philologists, are faced with the task of giving a model reading of a literary text, or of getting our students interested in reading the voices of the characters, arises again and again. As far back as 1967, D. Abercrombie described at least three classes of markers that reveal personal characteristics of the speaker. Today we can safely refer to them as social, physical and psychological markers respectively:

a) those that mark social characteristics, such as regional affiliation, social status, occupation and social role;
b) those that mark physical characteristics, such as age, sex, physique and state of health;
c) those that mark psychological characteristics of personality and affective states.

Ten years later one further category was proposed: ‘those that reveal changing states of the speaker’. Lyons calls this category a ‘symptom’, recalling the diagnostic use of signs in medicine:

“Any information in a signal which indicates to the receiver that the sender is in a particular state, whether this be an emotional state (fear, anger, etc.), a state of health (suffering from laryngitis, etc.), a state of intoxication, or whatever, can be described as symptomatic of that state” (Lyons, 1977).

The notion of “symptom” may be useful because it refers to a transitional state of the speaker that is difficult to characterize precisely: in this case the acoustic information is bound to be “polysemantic”.
Although voice production and identification are our primary interest (how many voices, and of what kinds, can be produced by one speaker and easily identified by his listeners), it may be useful to recall some basic notions of speech production.
The respiratory system supplies the vocal tract with a stream of air; the vibrating vocal cords constitute the phonatory system; the pharyngeal system controls articulatory activity at the bottom of the vocal tract; the velopharyngeal system makes our voices nasal; the formation of sounds in the mouth cavity constitutes the lingual system; last but by no means least, the labial system (lip movements) and the mandibular system (movements of the jaw) play an important role in producing various voices.
On the perceptual level, speech is described in terms of pitch, loudness (intensity), tempo (length or duration) and quality (timbre). Here again, examples can be given to show how the different systems interact. Thus it is well known from the literature on the subject that pitch ‘jitter’ and loudness ‘shimmer’ (that is, aperiodic cycle-to-cycle variability of fundamental frequency or intensity around the mean value) are both heard as contributing to auditory quality, giving a ‘rough’, ‘harsh’ auditory texture. Loudness, in its turn, is the sum total of the amplitudes of all the constituent frequencies, and one must take into consideration that a stretch of whisper may sound “louder” at a short distance from the listener than a stretch of pure tone somewhere in the distance. Tones, as distinctive suprasegmental units, should therefore be understood on the acoustic level as combinations of fundamental frequency and overall intensity, since the latter can affect the former.
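Jitter and shimmer are usually quantified as cycle-to-cycle perturbation measures. The sketch below assumes that the durations of successive glottal periods (in seconds) and their peak amplitudes have already been extracted from a recording. It computes the so-called “local” variant of each measure: the mean absolute difference between consecutive values, divided by the mean value. The function names and the sample numbers are hypothetical and serve only as an illustration.

```python
from statistics import mean


def local_perturbation(values: list[float]) -> float:
    """Mean absolute difference between consecutive values,
    divided by the mean of all values (a relative measure)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s: list[float]) -> float:
    """Cycle-to-cycle variability of the glottal period (i.e. of F0)."""
    return local_perturbation(periods_s)


def local_shimmer(amplitudes: list[float]) -> float:
    """Cycle-to-cycle variability of the peak amplitude (i.e. of intensity)."""
    return local_perturbation(amplitudes)


# Hypothetical measurements for six consecutive glottal cycles (about 125 Hz).
periods = [0.0080, 0.0081, 0.0079, 0.0082, 0.0080, 0.0081]
amplitudes = [0.50, 0.48, 0.52, 0.47, 0.51, 0.49]

print(f"jitter:  {local_jitter(periods):.2%}")
print(f"shimmer: {local_shimmer(amplitudes):.2%}")
```

A perfectly periodic voice yields zero for both measures. Larger values correspond to the aperiodicity that is heard as a ‘rough’ or ‘harsh’ texture.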
Now, what does the phonetic literature regard as more or less established about the correlation between prosodic features and the above-mentioned physical, psychological and social characteristics?
Here the concept of voice quality arises, as described and named by G. Fairbanks in 1960. It should be emphasized at once, however, that the approach adopted by Grant Fairbanks is quite different from ours: we are interested, first and foremost, in sound production by a normal speech apparatus, whereas Fairbanks was primarily interested in speech pathology. He was quite explicit on the subject, identifying four types of voice quality disorder: harshness, breathiness and hoarseness (defects of tone generation) and nasality (a defect of transmission). All these qualities were viewed against the background of the neutral, or clear, voice, which is regarded as particularly praiseworthy.

“Irregular, aperiodic noise in the vocal-fold spectrum. A common cause is excessive laryngeal tension. Harsh speakers tend to initiate phonation abruptly, with obtrusive glottal attacks in which the clicks, or sharp transients are unduly prominent. Some harsh speakers, especially when fundamental pitch is very low, exhibit trains of such clicks that are ratchetlike in sound (This is what in our notation is called creak and, as practice shows, it appears not only at the bottom of one’s voice. The italics are ours). Probably more adjectives have been applied to this quality than to any other vocal characteristic. For instance: coarse, discordant, dissonant, grating, guttural, hard, metallic, noisy, rasping, raucous, rough, strident” (pp.175-177).


“In breathy quality the vocal folds vibrate, but the intermittent closure fails and air-flow is continuous. The firmness of the basic glottal closure is insufficient for a given airflow (or the force of the airflow is excessive for a given closure). Breathy quality is almost invariably accompanied by limited vocal intensity. Vocal attacks tend to be aspirate (the italics are ours), in contrast to the glottal attacks of harshness” (pp.177-178).


“Whispering, or voiceless speech, is often used as a means of restricting intensity to a small area. Some speakers develop a “delightfully confidential manner of speaking”, as one teacher calls it sarcastically. Such a speaker may use a low pitch and low intensity. This now-I’m-going-to-tell-you-a-big-secret kind of speech may be just the thing for a fairy-tale hour” (pp.179-182).


“Universally familiar as a symptom of acute laryngitis, hoarseness combines the features of harshness and breathiness. Some call this combination husky. The harsh element predominates in some hoarse voices, the breathy element in others” (p.182). (The term ‘husky’, as we see, seems to fall between two stools, and its aesthetic value is somewhat dubious: there are so many “creaky” voices, as will be seen a little below, that adding one more term, ‘husky’, is of no great help, though in future it might acquire a proper definition; the italics are ours.)


“Excessive nasality, or hypernasality is one of the most common voice problems, but mild nasality is heard in many good voices (the italics are ours). It may be a virtue, in fact, although the evidence is inconclusive. Nasality is imparted to the vowel spectrum by lowering the velum and coupling the nasal cavity into the system” (p.172).

Since in principle we see no reason why ‘whisper’ cannot be made a specific characteristic of a character’s voice, let us compare the definitions of breathy voice and whisper again, this time drawing on John Laver’s book “The Gift of Speech”.

“The mode of vibration of the vocal folds is inefficient, and is accompanied by slight audible friction. Muscular effort is low, the glottis is kept somewhat open along most of its length. There is a close auditory relationship between breathy voice and ‘whisper’ (the italics are ours)” (p.203).

Now, what about the above-mentioned correlation of qualities, on the one hand, and their functions in actual speech, on the other? Here, we are afraid, the situation is even more complicated.
Let us now concentrate on the correlations that have been established with a fair degree of certainty. We shall proceed from physical to social and psychological markers.
Physique and height are probably judged accurately because of the good correlation that seems to exist between these factors and the dimensions of the speaker’s vocal apparatus. A tall, well-built man will tend to have a long vocal tract and large vocal folds. His voice quality will show low formant frequencies and, correspondingly, a low fundamental-frequency range. His large respiratory volume will be reflected in a powerful loudness range (cf. Laver, p.242).
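The link between vocal-tract length and formant frequencies can be illustrated with a standard textbook approximation. In it, the vocal tract is treated as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of a quarter wavelength: F_n = (2n - 1)c / 4L. This is a deliberate simplification, since a real tract is not a uniform tube. The speed of sound used below is an approximate value, and the tract lengths are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # approximate speed of sound in warm, moist air


def uniform_tube_formants(tract_length_cm: float, count: int = 3) -> list[float]:
    """First `count` resonances (Hz) of a uniform tube closed at one end.

    Quarter-wavelength resonator model: F_n = (2n - 1) * c / (4 * L).
    """
    if tract_length_cm <= 0:
        raise ValueError("tract length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * tract_length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical tract lengths: the longer the tube, the lower every formant.
for length in (14.5, 17.5, 19.0):
    formants = ", ".join(f"{f:.0f}" for f in uniform_tube_formants(length))
    print(f"L = {length} cm -> F1-F3 = {formants} Hz")
```

Under these assumptions, a 17.5 cm tract yields resonances at 500, 1500 and 2500 Hz. Because every formant is inversely proportional to L, a longer tract lowers all of them together.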
Age can also be indicated by voice quality, although the correlation is fairly accurate only for the ‘breaking’ voice of puberty (vocal mutation may result in whispery voice) or for extreme old age (tissues become less elastic, which explains the appearance of shrill or thin voices). In the latter case greater effort has to be exerted to achieve adequate phonation, and as a result a harsh voice can rather often be heard. Hacking, coughing and throat-clearing, together with a tendency to chronic bronchitis, may more often than not indicate an old person; but these are not permanent characteristics, as harshness is, and are therefore less reliable.
The situation seems more encouraging in the case of sex differentiation: on average, women’s fundamental pitch is about one octave higher than men’s.
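An octave corresponds to a doubling of frequency. The sketch below expresses the difference between two fundamental frequencies in semitones (12 semitones make one octave). This logarithmic scale reflects pitch perception better than raw hertz do. The example frequencies are hypothetical and are not meant as measured averages.

```python
import math


def semitone_interval(f_low_hz: float, f_high_hz: float) -> float:
    """Interval between two frequencies in semitones: 12 * log2(f_high / f_low)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12 * math.log2(f_high_hz / f_low_hz)


def octaves(f_low_hz: float, f_high_hz: float) -> float:
    """The same interval expressed in octaves."""
    return semitone_interval(f_low_hz, f_high_hz) / 12


# Hypothetical speakers: a doubling of F0 is exactly one octave.
print(semitone_interval(110.0, 220.0))  # 12.0
print(octaves(100.0, 190.0))            # a little less than one octave
```

A negative result means that the second frequency is the lower one.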
Specialists are of the opinion that “because voice settings are under potential muscular control, they are learnable and imitable (the italics are ours).” The adoption of a particular voice setting often acts as an individuating marker when its use is idiosyncratic to a particular speaker. But voice settings often form part of the typical vocal performance of particular regional accents, and can thus also act as social markers.
It is well known that in the book “Sociolinguistic Patterns in British English” P. Trudgill clearly showed that the speech of working-class speakers, in contrast with that of middle-class speakers, is marked by the habitual use of ‘creaky’ phonation, a high pitch range, an increased loudness range, a particular type of nasality and a relatively high overall degree of muscular tension throughout the vocal tract (the italics are ours).
Obviously, if the problem of social markers boils down to imitating dialectal deviations, there is little hope that philologists will read a dialogue convincingly enough, although certain deviations are well known. Leaving this aspect of voice production aside for the time being, let us now turn to voice qualities as psychological markers.
We already know that the fundamental frequency of vibration of the vocal folds is perceived as “pitch”: the higher the frequency, the higher the perceived pitch, and vice versa. So during the pronunciation of a sound (a vowel or a resonant), to say nothing of larger segments (syllables, words, simple rhythm groups or feet, syntagms, sentences), pitch is raised, lowered or sustained, and in this way it can form various configurations. If this happens within a sound or a simple rhythm group (a foot), the configuration is referred to as a “tone”; if the unit is larger, the corresponding configuration is a “tune” (“contour” or “intonational contour”, “melody” or “melodic pattern”, etc.).
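The three basic pitch movements (raised, lowered, sustained) can be approximated by a crude classifier over a sequence of F0 samples. This is a sketch under stated assumptions. The function name is hypothetical. Unvoiced frames are assumed to be marked with 0. The 1-semitone threshold below which a movement counts as “sustained” is an arbitrary choice, not a value from the literature. Because only the first and last voiced samples are compared, complex configurations such as a rise-fall would require analysing the whole contour.

```python
import math


def classify_pitch_movement(f0_hz: list[float], threshold_st: float = 1.0) -> str:
    """Label an F0 track as 'rise', 'fall' or 'level' (sustained).

    Compares the first and last voiced samples on a semitone scale;
    movements smaller than `threshold_st` semitones count as level.
    """
    voiced = [f for f in f0_hz if f > 0]  # 0 marks unvoiced frames
    if len(voiced) < 2:
        raise ValueError("need at least two voiced samples")
    change_st = 12 * math.log2(voiced[-1] / voiced[0])
    if change_st >= threshold_st:
        return "rise"
    if change_st <= -threshold_st:
        return "fall"
    return "level"


print(classify_pitch_movement([120, 125, 140, 160]))  # rise
print(classify_pitch_movement([200, 0, 180, 150]))    # fall
print(classify_pitch_movement([150, 152, 149, 151]))  # level
```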
The term “range” (“diapason”) is used to refer to the whole band of frequencies that an individual can produce, from the lowest to the highest (some scholars prefer to speak of the “total diapason” but stratify it into “registers”).
It is of great interest that normally we speak within only the lowest third of our total pitch range. Men, for instance, can imitate women or even children by going beyond this third.
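The sketch below computes the upper boundary of the “lowest third” from a speaker’s lowest and highest producible frequencies. It assumes that the third is measured on a semitone scale; the text itself does not say how the range is divided. The function name and the example range are hypothetical.

```python
import math


def lowest_third_ceiling(f_min_hz: float, f_max_hz: float) -> float:
    """Upper edge (Hz) of the lowest third of a pitch range, measured in semitones."""
    if not 0 < f_min_hz < f_max_hz:
        raise ValueError("need 0 < f_min < f_max")
    range_st = 12 * math.log2(f_max_hz / f_min_hz)  # total range in semitones
    return f_min_hz * 2 ** ((range_st / 3) / 12)    # go one third of the way up


# Hypothetical speaker with a three-octave range of 80-640 Hz:
# the lowest third spans one octave, i.e. 80-160 Hz.
print(round(lowest_third_ceiling(80.0, 640.0), 6))  # 160.0
```

Measuring the third in hertz instead would place the boundary much higher (at about 267 Hz in this example), which is why the logarithmic scale is the more natural choice here.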
“Loudness” is (with some reservations, of course) the product of the amplitude of vibration of the vocal folds, brought about by differing intensity of air pressure from the lungs. Perceptually, loudness is the sum total of the amplitudes of all the constituent frequencies of a sound; pitch and loudness thus depend on each other, and only within certain limits can they be regarded (and consequently demonstrated) as independent parameters. From this point of view, the request “Speak up!” in actual fact often results merely in a raising of the pitch.
There are several other prosodic parameters, which play a very important role on both the syntactic and the suprasyntactic levels of English speech: “pauses (junctures, disjunctures)” of different length (the absence of voice or phonation, although one can also speak of “voiced pauses”); the “tempo (length, rate, duration)” of speaking; “stress (accent)”; and “rhythm” (the last two obviously being complexes of the preceding parameters).
In the case of the human voice, the note is given its quality (or timbre) by certain variations in the generation of the vocal-fold pitch and by its distribution among the resonators: the oral cavity, the nasal cavity, the larynx and the pharynx. The modulation of these resonators and the distribution of the note among them can assume an almost infinite variety of states. We shall never tire of repeating, however, that not every variety is functionally loaded, or even perceived.
What has been said above is meant to serve as an illustration of what students of philology deal with when they encounter the line of research popularly known as “Philological reading”, pursued by the scholars of the English Department at the Philological Faculty of Lomonosov Moscow State University.
