Daily Archives: December 29, 2013


Perception of Overtone Singing

Chen-Gia Tsai

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0. For normal voices, the bandwidths of formants are always so large that the formants merely contribute to the perception of timbre. For overtone-singing voices, the sharp formant Fk can contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0. In the autocorrelation functions the height of the peak at 1/9f0, which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk is audible once the strongest harmonic is larger than the adjacent harmonics by 10 dB.
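The bandwidth dependence described above can be illustrated with a minimal sketch. It does not reproduce the model or the synthesis behind Fig. 1: it assumes a flat harmonic source shaped by a single Lorentzian resonance, and it relies on the fact that the autocorrelation of a sum of harmonics is the cosine-weighted sum of their powers. The function names and the three bandwidth values are illustrative assumptions, not taken from the article.

```python
import math


def formant_gain(freq_hz, centre_hz, bandwidth_hz):
    """Power gain of an assumed Lorentzian resonance (-3 dB at centre +/- bandwidth/2)."""
    detune = (freq_hz - centre_hz) / (bandwidth_hz / 2.0)
    return 1.0 / (1.0 + detune * detune)


def pitch_strength(f0_hz, harmonic_number, bandwidth_hz, n_harmonics=40):
    """Normalised autocorrelation of the vowel at the lag 1 / (harmonic_number * f0)."""
    centre_hz = harmonic_number * f0_hz
    lag_s = 1.0 / centre_hz
    # Power of each harmonic after the (assumed) formant filter.
    powers = [formant_gain(n * f0_hz, centre_hz, bandwidth_hz)
              for n in range(1, n_harmonics + 1)]
    # Autocorrelation of a harmonic sum: each power weighted by cos(2*pi*f*lag).
    weighted = sum(p * math.cos(2.0 * math.pi * n * f0_hz * lag_s)
                   for n, p in enumerate(powers, start=1))
    return weighted / sum(powers)


F0_HZ = 150.0                           # fundamental used in the text
BANDWIDTHS_HZ = (50.0, 150.0, 450.0)    # assumed values, narrow to wide

strengths = [pitch_strength(F0_HZ, 9, bw) for bw in BANDWIDTHS_HZ]
for bw, strength in zip(BANDWIDTHS_HZ, strengths):
    print(f"bandwidth {bw:5.0f} Hz -> autocorrelation at 1/(9 f0) = {strength:.2f}")
```

Under these assumptions the autocorrelation value at 1/9f0 falls steadily as the formant widens, which is the trend the text attributes to Fig. 1.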

Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding f0 and listeners become indifferent to it. Conversely, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, by which the components radiated by the same object should be grouped and perceived as a single entity. Stream segregation occurs in the quasi-periodic voices of overtone-singing through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing always sounds extraordinary when we first hear it.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.



CHEN-GIA TSAI : Perception of Overtone Singing, TAIWAN

TRAN QUANG HAI & DENIS GUILLOU: Original Research and Acoustical Analysis in connection with the Xöömij Style of Biphonic Singing, FRANCE


Original Research and Acoustical Analysis in connection with the Xöömij Style of Biphonic Singing

Tran Quang Hai, Centre National de la Recherche Scientifique, Paris 1980

Denis GUILLOU, Conservatoire National des Arts et Métiers, Paris


The present article is limited in its scope to our own original research and to acoustical analysis of biphonic singing; this is preceded by a summary of the various terms proposed by different researchers. The first half of the article, concerning xöömij technique, was written by Tran Quang Hai. Guillou has written the second half, concerning acoustical analysis.


Until the present time it has not been possible to confirm that the centre of biphonic singing within Turco-Mongol culture is in fact Mongolia. Biphonic singing is also employed by neighbouring peoples such as the Tuvins (Touvins), Oirats, Khakass, Gorno-Altais and Baschkirs; it is called kai by the Altais and uzliau by the Baschkirs, and the Tuvins possess four different styles called sygyt, borbannadyr, ezengileer and kargyraa. A considerable amount of research is at present being carried out throughout the world into this vocal phenomenon, particularly as it is practised in Mongolia.


Research can be carried out in various ways: by means of observation of native performers after one or more visits to the country concerned, or by means of practical instrumental or vocal studies aimed at a better understanding of the musical structure employed by the population being studied. My own research does not belong to either of these two categories, since I have never been to Mongolia and I have never learned the xöömij style of biphonic singing from a Mongolian teacher. What I shall describe in this article is the result of my own experience, which will enable anybody to produce two simultaneous sounds similar to Mongolian biphonic singing.



Simultaneous two-part singing by a single person is known in the Mongol language as xöömij (literally “pharynx”). The manner in which the Mongol word is transcribed is by no means uniform: homi, ho-mi (Vargyas 1968); khomi, khöömii (Bosson 1964: 11); xomej, chöömej (Aksenov 1964); chöömij (Vietze 1969: 15-16; Walcott 1974); xöömij (Hamayon 1973). French researchers have used other terms to describe this particular vocal technique, such as chant biphonique or diphonique (Leipp 1971, Tran Quang Hai 1974), voix guimbarde, voix dédoublée (Helffer 1973, Hamayon 1973), and chant diphonique solo (Marcel-Dubois 1979). Several terms exist in English, such as split-tone singing, throat singing and overtone singing, and in German zweistimmiger Sologesang.


For convenience I have employed in this article the term biphonic singing to describe a style of singing realized by a single person producing simultaneously a continuous drone and another sound at a higher pitch issuing from a series of partials or harmonics resembling the sound of the flute.


Origin of My Research

In 1971, the date of my first contact with Mongolian music in the form of recordings made in Mongolia between 1967 and 1970 by Mrs. Roberte Hamayon, researcher at the Centre National de la Recherche Scientifique, and especially after listening to a tape on which were recorded three pieces in the biphonic singing style, I was struck by the extraordinary and unique nature of this vocal technique.


For several months I carried out bibliographical research into articles concerned with this style of singing with the aim of obtaining information on the practice of biphonic singing, but received little satisfaction. Explanations of a merely theoretical and sometimes ambiguous nature did nothing so much as to create and increase the confusion with which my research was surrounded. In spite of my complete ignorance of the training methods for biphonic singing practised by the Mongols, the Tuvins and other peoples, I was not in the least discouraged by the negative results at the beginning of my studies after even several months of effort.


Working Conditions

According to Hamayon, the xöömij, which exists throughout Mongolia but is gradually dying out, is practised exclusively by men. It represents an imitation, by means of a single voice, of two instruments, the flute and the Jew’s harp.


The xöömij refers to the simultaneous production of two sounds, one similar to the fundamental produced on the Jew’s harp (produced at the back of the throat), and the other resulting from a modification of the buccal cavity without moving the lips, which remain only slightly open; positioning the lips as for a rear vowel results in a low sound, whereas front vowel positioning produces a high sound (Hamayon 1973), a technique similar to that used by the Tuvins (Aksenov 1964). The cheeks are tightened to such a degree that the singer breaks out into a sweat. It is the position of the tongue which determines the melody. Anybody who possesses this technique is able to copy any tune (Hamayon 1973).


I worked entirely alone, groping my way through the dark for two years, listening frequently to the recordings made by Hamayon stored in the sound archives of the ethnomusicology department of the Musée de l’Homme. My efforts were however to no avail. Despite my efforts and knowledge of Jew’s harp technique, the initial work was both difficult and discouraging. I also tried to whistle while producing a low sound as a drone. However, checking on a sonagraph showed that this was not similar to the xöömij technique. At the end of 1972 I got to the stage where I was able to produce a very weak harmonic tone which, when recorded on tape, showed that I was still a long way from my goal.

Then, one day in November 1973, in order to calm my nerves in the appalling traffic congestion of Paris, I happened to make my vocal cords vibrate in the pharynx with my mouth half open while reciting the alphabet. When I arrived at the letter L and the tip of my tongue was about to touch the top of my mouth, I suddenly heard a pure harmonic tone, clear and powerful. I repeated the operation several times and each time I obtained the same result. I then tried to modify the position of the tongue in relation to the roof of the mouth while maintaining the low fundamental. A series of partials resonated in disorder inside my ears.


At the beginning I obtained the harmonics of a perfect chord. Slowly but surely, after a week of intensive work, by changing the fundamental tone upwards or downwards, I had managed to discover all by myself a vocal Jew’s harp technique or biphonic singing style which appeared to be similar to that used by the Mongols and the Tuvins.


Basic Techniques

After two months of research and numerous experiments of all kinds I was able to establish some of the basic rules for the realization of what I call biphonic singing.


1) Half open the mouth.

2) Emit a natural sound on the letter A without forcing the voice and remaining in the middle part of the vocal range (between F and A below middle C for men, and between F and A above middle C for women).

3) Intensify the vocal production while vibrating the vocal cords.

4) Force out the breath and hold it for as long as possible.

5) Produce the letter L. Maintain the position with the tip of the tongue touching the roof of the mouth.

6) Intensify the tonal volume while trying to keep the tongue stuck firmly against the palate in order to divide the mouth into two cavities, one at the back and one at the front, so that the air column increases in volume through the mouth and the nose.

7) Slowly pronounce the sounds represented by the phonetic signs “i” and “u” while varying the position of the lips.

8) Modify the buccal cavity by changing the position of the tongue inside the mouth without interrupting or changing the pitch of the fundamental already amplified by the vibration of the vocal cords.

9) In this way it is possible to obtain both the drone and the partials or harmonics, either in ascending or descending order, according to the desire of the singer.



For beginners the harmonics of the perfect chord (C, E, G, C) are easy to obtain. However, a considerable amount of hard work is necessary, especially to obtain a pentatonic anhemitonic scale. Every person has his favourite note which permits him to produce a large range of partials. This favourite fundamental tone varies according to the tonal quality of the singer’s voice and his windpipe. It often happens that two people using the same fundamental tone do not necessarily obtain the same series of partials.


Regular practice and the application of the basic techniques which I have just described permitted me to acquire a range of between an eleventh and a thirteenth according to the choice of the drone. Biphonic singing can also be practised by women and children, and several successful experiments have been carried out in this connection.


Other experiments which I have been carrying out recently indicate that it is possible to obtain two simultaneous sounds in two other ways. In the first method, the tongue may be either flat or slightly curved without actually at any stage touching the roof of the mouth, and only the mouth and the lips move. Through such variation of the buccal cavity, which this time forms a single cavity, it is possible to hear the partials faintly.


In the second method the basic technique described above is used. However, instead of keeping the mouth half open, it is kept almost completely shut with the lips pulled back and very tight. To make the partials audible, the position of the lips is varied at the same time as that of the tongue. The partials are very clear and distinctive, but the technique is rather exhausting and it is not possible to sing for a long time using it.


To the northwest of Mongolia, in the borderland area between Mongolia and Siberia, live the Tuvins, a people of Turkic origin numbering one hundred thousand. The Tuvins possess not only the biphonic singing style used by the Mongols, but four other different styles within this genre, called sygyt, ezengileer, kargyraa and borbannadyr. Table 1 will facilitate comparison between these four styles.


Biphonic singing is also practised by a number of ethnic groups in the republics of the Soviet Union bordering on Mongolia.


The late John Levy made a recording in Rajasthan in 1967 on which can be heard an example of biphonic singing similar to that practised by the Mongols and the Tuvins (1). The virtuoso performer in the recording imitates the double flute called the satara (an instrument producing simultaneously a drone and a melody) or the Jew’s harp with his voice. However, this may well be an exceptional example in that no mention is ever made of biphonic singing techniques in the musical traditions of Rajasthan or elsewhere in India.


Tibetan monks, particularly those in the monasteries of Gyume and Gyuto (2), make use of a technique using two simultaneous voices, although this technique is far less developed than that used by the Mongols and the Tuvins. The low register of the drone makes it impossible to produce harmonics as clear and resonant as those emitted by the Mongols and the Tuvins, and furthermore the production of harmonics is not the aim of Tibetan Buddhist chant.


In Western contemporary music groups of singers have also succeeded in emitting two voices at the same time and vocal pieces have been created in the context of avant‑garde music (3) and in recent years of electronic music (4).


An X-ray film was made for the first time in 1974 at the Centre Médico-chirurgical of the Porte de Choisy in Paris at the request of Professor S. Borel-Maisonny, speech therapist, and of Professor Emile Leipp, acoustician. This film, which was made with the cooperation of the present author, made it possible to examine closely the internal functioning and placement of the tongue during biphonic singing, and was thus of great interest. Thanks to this film the author has improved his biphonic singing technique, as a result of which he has been able to decrease the volume of the drone and increase that of the harmonics.


Table 1. Characteristics of the biphonic singing styles of the Tuvins

Pitch of the drone or fundamental
    sygyt: Changes in the course of singing.
    ezengileer: No change.
    kargyraa: No change, although sometimes lowered by a minor third.
    borbannadyr: No change.

Tonality
    sygyt: More intense and higher than that of the kargyraa style.
    ezengileer: Same as sygyt.
    kargyraa: Low.
    borbannadyr: Soft.

Position of the mouth
    sygyt: Half open.
    ezengileer: Half open.
    kargyraa: Half open.
    borbannadyr: Almost closed.

Harmonics or partials
    sygyt: 8, 9, 10 for uneven verses; 8, 9, 10, 12 for even verses.
    ezengileer: (6), 8, 9, 10, 11, 12, 13.
    kargyraa: (6), 8, 9, 10, 11, 12.
    borbannadyr: 6, 7, 8, 9, 10, 12, 13.

Special features
    sygyt: Harmonics used as an ostinato accompaniment, thus resulting in a narrow range. In the course of a song, at the end of each phrase a note is held (fundamental for uneven verses, or a descending tone for even verses).
    ezengileer: Alternation of strong and weak accents like a gallop rhythm.
    kargyraa: Each vowel corresponds to a partial. Psalmodic recitation with or without special text on 2 pitches, or drone in 2 positions rising and descending by a minor third. Called borbannadyr in cases when the borbannadyr is named khomei.
    borbannadyr: Occasionally three voices, with two used as a drone (tonic and fifth, in exceptional cases) and the third voice producing harmonics. Called khomei in certain areas.



Acoustical Analysis: Introduction


The present study is concerned with biphonic singing, its understanding and interpretation, and does not constitute a complete and definitive piece of research. In fact the discovery of certain phenomena permits us only to imagine what might be the reality, this being particularly true in relation to the mechanism involved in the production of biphonic singing. Thus it will be necessary to carry out further research in the following areas: psycho-acoustics, particularly the perception of pitch, and phonatory acoustics.

        Biphonic singing differs from so‑called natural singing on account of its sonority as well as of course the vocal technique involved. As its name indicates it consists of two sounds. On the basis of simple aural observation, it is possible to distinguish a first sound whose pitch is constant and which we shall call the drone and a second sound which takes the form of a melody which the singer can produce at will. It is basically possible for anybody to produce this biphonic sonority but to make the second voice dominate and to trace a melody with it depends upon the talent of the artist.

        Firstly, we shall examine the concept of pitch perception in terms of acoustics and psycho‑acoustics. Secondly we shall try to define biphonic singing, to differentiate it from other vocal techniques and to specify its scope. It will then be worthwhile to formulate several hypotheses concerning the mechanism whereby this style of singing is produced and finally to present a few examples of such a technique.


Pitch Perception

It is first of all necessary to comprehend exactly what is meant by the pitch of sounds or tonality. This concept presents a considerable amount of ambiguity and does not correspond to the simple principle of the measurement of the frequencies produced. The pitch of sounds is related more to psycho-acoustics than to physics.

          Our own proposals are based partially on the recent discoveries of certain researchers, and partially on observations which we have made ourselves with the help of a sonagraph machine.

The sonagraph makes it possible for us to obtain the image of the sound which we wish to study. A single sheet of paper gives information concerning time and frequency and, through the thickness of the line traced, information concerning intensity.

The classical manuals on acoustics tell us that the pitch of harmonic sounds, that is, sounds with, for example, a fundamental of frequency F and a series of harmonics F1, F2, F3, … that are multiples of F, is determined by the frequency of the fundamental F. This is not entirely correct, in that it is possible to suppress this fundamental electronically without thereby changing the subjective pitch of the sound actually perceived. If this theory were correct, an electro-acoustic chain not reproducing the lowest sound would change the pitch of the sounds. This is evidently not the case, since the tonal quality changes but not the pitch. Certain researchers have proposed a theory which would appear to be more coherent: the pitch of sounds is determined by the separation of the harmonic lines, that is, the difference in frequency between two adjacent harmonic lines. What, then, is the pitch of sounds whose spectra consist of “partials” (components that are not whole multiples of the fundamental)? In this case, the individual perceives an average of the separation of the lines in the zone which interests him. This in fact corresponds with the differences in perception which may be observed from one individual to the other (Fig. 1).
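The line-separation idea described above can be sketched in a few lines. This is only an illustration of the averaging rule stated in the text, not a model of hearing; the function name and the example frequencies (including the slightly inharmonic set) are made up for the purpose.

```python
def pitch_from_line_spacing(line_freqs_hz):
    """Estimate pitch as the average separation between adjacent spectral lines."""
    ordered = sorted(line_freqs_hz)
    gaps = [high - low for low, high in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Harmonic spectrum on an assumed 200 Hz fundamental.
full_spectrum = [200.0 * n for n in range(1, 7)]

# The same spectrum with the fundamental suppressed, as by a chain
# that does not reproduce the lowest sound.
without_fundamental = full_spectrum[1:]

# Hypothetical "partials": lines that are not whole multiples of anything.
stretched_partials = [402.0, 611.0, 795.0, 1004.0]

print(pitch_from_line_spacing(full_spectrum))        # 200.0
print(pitch_from_line_spacing(without_fundamental))  # 200.0
print(round(pitch_from_line_spacing(stretched_partials), 2))
```

Removing the lowest line leaves the estimate unchanged, and for the inharmonic set the rule returns an average spacing rather than any frequency actually present in the spectrum.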


Fig. 1 Sonagram representation of three types of sound


a) Harmonic spectrum: the harmonics are whole multiples of the fundamental.

b) Partials spectrum: the harmonics are no longer whole multiples of the fundamental.

c) Formant spectrum: two harmonics are intense and constitute a formant in the harmonic spectrum.

Formant spectrum: the accentuation in intensity of a group of harmonics constitutes a formant and is thus a zone of frequencies in which there is a large amount of energy.


Taking this formant into consideration a second concept of the perception of pitch comes to light. It has in effect been established that the position of the formant in the sonic spectrum results in the perception of a new pitch. In this case it is no longer a matter of the separation of the harmonic lines in the formant zone but of the position of the formant in the spectrum. This theory should be qualified however, since conditions also have to be considered.


Experiment: Tran Quang Hai sang two C’s an octave apart, making his voice carry as if he were addressing a large audience. We observed, using a sonagram, that the maximum energy was situated in the zone in which the human ear is sensitive (3 to 4 kHz) and that the formant was situated between 2 and 4 kHz. We then recorded two C’s an octave apart in the same tonality, but this time he used his voice as if addressing a small audience, and we observed the disappearance of this formant (Fig. 2-a, 2-b).

In this case the disappearance of the formant does not change the pitch of the sounds. We then rapidly observed that the perception of pitch through the position of the formant was only possible if the formant was very sharp, that is, if the sonic energy was divided among only two or three harmonics. Thus if the energy density of the formant is large and the formant is narrow, the formant gives information concerning the pitch as well as the overall tonality of the sonic item. Through this expedient we arrive at the biphonic vocal technique.


Fig. 4 Normal singing and biphonic singing

a) Sonagraph representation of normal singing. An octave passage is equivalent to a doubling of the gap between the harmonic lines and to a drone of double frequency. (The first bar represents the base line of the sonagram, and the drone is represented by the second bar.)

b) Sonagraph representation of biphonic singing. An octave passage is represented by a displacement of the formant. The harmonic lines of the formant are displaced in a zone in which the frequency is doubled.


Comparison between Biphonic Technique and Classical Technique


It may be said that biphonic singing consists, as its name indicates, of the production of two sounds, one a drone which is low and constant, and the other at a higher pitch consisting of a formant which displaces itself in the spectrum in order to produce a certain melody. The concept of pitch given by the second voice is moreover somewhat ambiguous. The Western ear may need a certain amount of training before becoming accustomed to the sound quality.


Evidence concerning the drone is relatively easy to obtain thanks to the sonagram: it can be seen clearly and is also very clear on an auditory level. The device in Fig. 3 also makes it possible to see that the drone is constant in amplitude and frequency.


Fig. 3. Device for providing evidence of perfect constancy of the drone in intensity and frequency. 

After having examined the fundamental tone we compared two spectra, one of biphonic singing and the other of the so-called classical singing style, the two being produced by the same singer. The sonagrams of these two types of singing are shown in Fig. 4. Classical singing is characterized by a doubling of the separation of the harmonic lines when the voice rises by an octave (a). Biphonic singing is characterized on the other hand by the fact that the separation of the lines remains constant (this was foreseeable since the drone is constant), and that the formant is displaced by an octave (b). In fact it is easy to measure the distance between the lines for each sound. In this case, the perception of the melody in biphonic singing works through the expedient of the displacement of the formant in the sonic spectrum.
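The contrast between the two ways of singing an octave can be written out numerically. The sketch below is schematic: the 100 Hz drone and the choice of harmonics 6 and 12 as formant positions are assumptions chosen for round numbers, not measurements from the sonagrams.

```python
def harmonic_lines(f0_hz, count=16):
    """Frequencies of the first `count` harmonic lines of a voiced sound."""
    return [n * f0_hz for n in range(1, count + 1)]


def line_spacing(lines_hz):
    """Separation between adjacent harmonic lines (equal to the fundamental)."""
    return lines_hz[1] - lines_hz[0]


DRONE_HZ = 100.0  # assumed drone frequency

# Classical singing: the octave is sung by doubling the fundamental,
# so the separation of the harmonic lines doubles.
classical_low = harmonic_lines(DRONE_HZ)
classical_high = harmonic_lines(2 * DRONE_HZ)

# Biphonic singing: the drone stays put and the formant moves from one
# harmonic to the harmonic with twice its number (here 6 -> 12, assumed).
biphonic = harmonic_lines(DRONE_HZ)
melody_low_hz = biphonic[6 - 1]
melody_high_hz = biphonic[12 - 1]

print("classical spacing:", line_spacing(classical_low), "->", line_spacing(classical_high))
print("biphonic spacing: ", line_spacing(biphonic), "(unchanged)")
print("biphonic melody:  ", melody_low_hz, "->", melody_high_hz)
```

In the classical case the line spacing itself carries the octave; in the biphonic case the spacing is fixed and only the emphasised line changes.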

         It should be stressed that this is only really possible if the formant is high, and this is obviously so in the case of biphonic singing. The sonic energy is divided principally between the drone and the second voice consisting of two or at the most three harmonics.

It has sometimes been stated that it is possible to produce a third voice. Using the sonagram we have in actual fact established that this third voice exists (see sonagrams of Tuvin techniques), but it is impossible to state that it can be controlled. In our opinion this additional voice results more from the personality of the performer than from any particular technique.

As a result of our work we have been able to establish a parallel between biphonic singing and the technique of the Jew’s harp. As in the case of biphonic singing the Jew’s harp produces several different voices, the drone, the main melody and a counter melody. We may consider this third voice as a counter melody which may be produced on a conscious level but can presumably not be controlled. As far as possibility of variation is concerned, biphonic singing is the same as normal singing except in connection with pitch range.

The time of execution is evidently a function of the thoracic cage of the singer and thus of breathing, since the intensity is related to the output of air. Possibility of variation with regard to intensity is on the other hand relatively restricted and the level of the harmonics is connected to the level of the drone. The singer has to try and retain a suitable drone and produce the harmonics as strongly as possible. We have already observed that the clearer the harmonics the more the formant is narrow and intense. We are able furthermore to observe connections between intensity, time and clarity. Possibility of variation in relation to tone quality may pass without comment, since the resulting sound is in the majority of cases formed from a drone and one or two harmonics. The most interesting question is that of pitch range.


It is generally accepted that, for a sensible tonality (in consideration of the performer and of the piece to be performed), a singer may modulate or choose between harmonics 5 and 13. This is true but should be stated more precisely. The range is a function of the tonality. If the tonality is on C2, the range represents nine harmonics from the fifth to the thirteenth, this involving a range of a major thirteenth. If the tonality is raised, for example to C3, the choice is made between six harmonics, numbers 3 to 8 (see Table 2), representing an interval of a seventh. The following remarks should be made in this context. Firstly, the pitch range of biphonic singing is more restricted than that of normal singing. Secondly, the singer theoretically selects the tonality which he wishes between C2 and C3. In practice however, he instinctively produces a compromise between the clarity of the second voice and the pitch range of his singing, since the choice of the tonality is also a function of the musical piece to be performed. Thus if the tonality is raised, for example to C3, the choice of harmonics is restricted but the second voice is very clear. In the case of a tonality on C2 the second voice is more indistinct while the pitch range is at a maximum. The clarity of the sounds can be explained by the fact that in the first case, the singer is only able to select a single harmonic, whereas in the second case, he may select almost two (see Fig. 5). As far as pitch range is concerned, it is known that the movement of the buccal resonators is independent of the tonality of the sounds produced by the vocal cords or, put another way, the singer always selects harmonics in the same zone of the spectrum whether the harmonics are broad or narrow.
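The remark that the singer works in the same zone of the spectrum whatever the tonality can be checked with simple arithmetic. The sketch below assumes a value of 65.41 Hz for the lower drone (the octave numbering used in the article may differ from this convention; only the 2:1 ratio between the two drones matters here) and a purely illustrative formant width of 100 Hz.

```python
def harmonic_zone(f0_hz, lowest_harmonic, highest_harmonic):
    """Frequency band (Hz) spanned by the selectable harmonics of a drone."""
    return lowest_harmonic * f0_hz, highest_harmonic * f0_hz


def harmonic_count(lowest_harmonic, highest_harmonic):
    """Number of harmonics available between the two limits, inclusive."""
    return highest_harmonic - lowest_harmonic + 1


def lines_inside_formant(f0_hz, formant_width_hz):
    """Average number of harmonic lines falling inside a formant of given width."""
    return formant_width_hz / f0_hz


LOW_DRONE_HZ = 65.41               # assumed frequency for the lower tonality
HIGH_DRONE_HZ = 2 * LOW_DRONE_HZ   # the tonality one octave higher
FORMANT_WIDTH_HZ = 100.0           # hypothetical formant width

low_zone = harmonic_zone(LOW_DRONE_HZ, 5, 13)    # harmonics 5 to 13
high_zone = harmonic_zone(HIGH_DRONE_HZ, 3, 8)   # harmonics 3 to 8

print("low drone: ", harmonic_count(5, 13), "harmonics in", low_zone, "Hz")
print("high drone:", harmonic_count(3, 8), "harmonics in", high_zone, "Hz")
print("lines per formant:",
      round(lines_inside_formant(LOW_DRONE_HZ, FORMANT_WIDTH_HZ), 2), "vs",
      round(lines_inside_formant(HIGH_DRONE_HZ, FORMANT_WIDTH_HZ), 2))
```

With these assumed numbers the two bands overlap over most of their extent, while the higher drone places fewer lines inside a formant of fixed width, which is consistent with the trade-off between range and clarity described above.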


It results from all this that the singer chooses the tonality instinctively in order to have the maximum range and clarity. For Tran Quang Hai, the best compromise exists between C2 and A2. He can thus obtain a range of between an octave and a thirteenth.


Mechanism for the Production of Biphonic Singing


It is always very difficult to know what is taking place inside a machine when we are placed outside it and can only watch it in operation. This is the case with the phonatory mechanism. The following remarks are only approximate and of a schematic nature and should not be assumed to be the final word on the subject. In dealing by analogy with the phonatory system we can get an idea of the mechanisms but surely not a complete explanation. Fig. 6 is a representation of the phonatory system which can be compared with Fig. 7, showing an excitation system producing harmonic sounds and a series of resonating systems amplifying certain parts of this spectrum.


A resonator is a cavity equipped with a neck capable of resonating in a certain range of frequencies. The excitation system, i.e., the pharynx and the vocal cords, emits a harmonic spectrum consisting of the frequencies F1, F2, F3, F4 …, and the resonators select certain of these frequencies and amplify them. The choice of these frequencies evidently depends upon the ability of the singer. This is the case when a singer projects his voice within a large hall, in that he instinctively adapts his resonators in order to produce the maximum energy within the area in which the ear is sensitive.

It should be noted that the amplified frequencies are a function of the volume of the cavity, the section of the opening and the length of the neck constituting the opening.

Through this principle it is possible to see already the action of the size of the buccal cavity, of the opening of the mouth, and of the position of the lips during singing.
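The dependence on cavity volume, opening section and neck length can be made concrete with the standard textbook formula for an ideal Helmholtz resonator, f = (c / 2π) · sqrt(S / (V · L)). Whether this is exactly the relation the authors had in mind is an assumption, and the mouth-like dimensions below are invented for illustration; the sketch also ignores the end correction of the neck.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees Celsius


def helmholtz_frequency(volume_m3, neck_area_m2, neck_length_m):
    """Resonance frequency (Hz) of an ideal Helmholtz resonator, no end correction."""
    return (SPEED_OF_SOUND_M_S / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (volume_m3 * neck_length_m)
    )


# Hypothetical cavity: 50 cm^3 volume, 1 cm^2 opening, 1 cm neck.
base = helmholtz_frequency(50e-6, 1e-4, 0.01)
wider_opening = helmholtz_frequency(50e-6, 2e-4, 0.01)   # opening section doubled
larger_cavity = helmholtz_frequency(100e-6, 1e-4, 0.01)  # volume doubled
longer_neck = helmholtz_frequency(50e-6, 1e-4, 0.02)     # neck length doubled

print(f"base cavity:    {base:7.1f} Hz")
print(f"wider opening:  {wider_opening:7.1f} Hz")
print(f"larger cavity:  {larger_cavity:7.1f} Hz")
print(f"longer neck:    {longer_neck:7.1f} Hz")
```

In this idealisation, widening the opening raises the resonance, while enlarging the cavity or lengthening the neck lowers it.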

However, this does not tell us anything about biphonic singing. In practice we need two voices. The first, the drone, is given to us simply by virtue of the fact that its production is intense, and that in any case, it does not undergo filtering by the resonators. Its intensity, higher than that of the harmonics, permits it to survive on account of buccal and nasal diffusion. We have observed that as the nasal cavity was closed, so the drone diminished in intensity. This occurs for two reasons: firstly, a source of diffusion is closed through the nose, and secondly, by closing the nose the flow of air is reduced, as is the sonic intensity produced at the level of the vocal cords.

The possession of several cavities is of prime importance. In practice, we have established that only coupling between several cavities has enabled us to have a sharp formant such as is required by biphonic singing.

For the purposes of this research we initially carried out investigations into the principle of resonators in order to determine the influence of the fundamental parameters. It was observed that the tonality of the sound rises if the mouth is opened wider. In order to investigate the formation of a sharp formant, we carried out the following experiment. Tran Quang Hai produced two kinds of biphonic singing, one with the tongue at rest, i.e., not dividing the mouth into two cavities, and the other with the mouth divided into two cavities. The observation which we made is as follows (an observation which could have been foreseen on the basis of the theory of coupled resonators).


In the first case the sounds were not clear: the drone could be heard distinctly but the second voice was difficult to hear. There was no clear distinction between the two voices and, furthermore, the melody was indistinct. The corresponding sonagrams bore this out: with a single buccal cavity the energy of the formant is dispersed over three or four harmonics, and so the sense of a second voice is very weak. On the other hand, when the tongue divides the mouth into two cavities, the formant reappears in a sharp and intense manner. In other words, with a single buccal cavity the harmonic sounds produced by the vocal cords are filtered and amplified only roughly, and the biphonic effect disappears. Biphonic singing thus necessitates a network of very selective resonators which filters only the harmonics required by the singer. Fig. 8 shows the frequency responses of the resonators, both simple and coupled. In the case of a tight coupling between the two cavities, these produce a single and very sharp resonance. If the coupling is loose, the formant has less intensity and the sonic energy is spread out in the spectrum. If the cavities are transformed into a single cavity, the pointed curve becomes even rounder, and one ends up with the first example, a very blurred type of biphonic singing (tongue at rest). The conclusion can be drawn that the mouth, along with the position of the tongue, plays the major role; it can be compared roughly to a sharply peaked filter which moves through the spectrum with the sole aim of selecting the harmonics of interest.


We should like to express our gratitude and sincere thanks to Research Team 165 of the Centre National de la Recherche Scientifique, directed by Mr. Gilbert Rouget, who allowed us access to valuable documents concerning biphonic singing stored in the sound archives of his department. Our thanks go also to Professor Claudie Marcel-Dubois, Head of the Department of Ethnomusicology at the Musée National des Arts et Traditions Populaires, who gave us a great deal of help and encouragement. We should like also to thank Professor Emile Leipp, Dr. Michèle Castellengo and Professor Solange Borel-Maisonny, who made it possible for us to examine the internal functioning of biphonic singing by means of the production of a radiographic film.


(Translated from French by Robin THOMPSON)



1. This tape is preserved in the Ethnomusicology Department of the Musée de l’Homme, Paris. Archive number BM 78 2, 1.

2. See the record “The Music of Tibet,” recorded by Peter Crossley-Holland, Anthology Records (30133) AST 4005, New York, 1970.

3. See the record “The Tail of the Tiger,” Ananda 2.

4. An example is the electronic music composition entitled “Ve nguon” (Return to the Source), composed by Nguyen Van Tuong, with Tran Quang Hai as soloist. The first performance was given in France in 1975. The third movement (25 minutes) uses biphonic singing.



CHEN-GIA TSAI : Kargyraa and meditation, TAIWAN


Kargyraa and meditation : Chen-Gia Tsai

Pipe model of a Kargyraa singer’s vocal tract

The melody pitch f1 (the centre frequency of the first formant) in Kargyraa voices is determined by the mouth opening. A perturbation method predicts the resonance shift caused by a bore enlargement at a position x0 of a pipe with an irregular geometry (e.g., Fletcher & Rossing 1991). During a performance of Kargyraa, the bore diameter of the vocal tract changes at the lips, a pressure node for all modes. Hence, an enlargement of the mouth opening leads to an increase in the centre frequencies of the first and second formants (Tsai 2001).



Figure: (a) Spectrogram of a Kargyraa song, “the far side of a dry riverbed”; (b) and (c) are two snapshot spectra of (a). They show f2 = 2f1.

This pipe model does not predict (1) the small bandwidth of the first and second formants, and (2) “mode-locking” f2=2f1. I hypothesize that periodic vorticity bursts at the diffuser-like supraglottal structures are responsible for producing the strong components at f1 and 2f1.
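To see why a purely passive pipe does not naturally give f2 = 2f1, consider the simplest idealisation: a uniform tube closed at the glottis and open at the lips. This is an assumption made for illustration only, not the irregular-geometry model used above, and the tract length and speed of sound are assumed values. Its resonances are odd multiples of c / 4L, so the first two stand in the ratio 1:3 rather than 1:2.

```python
def closed_open_resonances(length_m, n_modes=2, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]

# 17 cm is an assumed vocal tract length; c = 350 m/s is an assumed
# speed of sound in warm, humid air.
f1, f2 = closed_open_resonances(0.17)
print(round(f1, 1), round(f2, 1), f2 / f1)
```

Changing the geometry shifts both resonances, but nothing in such a linear model ties them to an exact 1:2 ratio, which is why an additional mechanism is hypothesized.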

Subharmonic generation

In Kargyraa, there is a nonlinear coupling between the two pairs of the vocal folds, which can lead to either entrainments or chaos. While 1:2 entrainment can produce beautiful voices of Kargyraa, pathological voices with the involvement of chaotic vibration of the ventricular folds have a hoarse quality (ventricular dysphonia).

Based on recordings of high-speed images of the laryngeal movement, Lindestad and colleagues (2001) reported that during Kargyraa singing the ventricular folds vibrated with complete but short closures at half the frequency of the true vocal folds, thus contributing to subharmonic generation.
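The spectral consequence of such 1:2 entrainment can be sketched with a toy pulse train. It is assumed here, purely for illustration, that the ventricular closure attenuates every second glottal pulse by a fixed factor; the attenuation value is hypothetical. Any difference between alternate pulses doubles the period of the signal and creates a component at half the frequency of the true vocal folds.

```python
import cmath

def pulse_train_component(amplitudes, freq_ratio):
    """Magnitude of the Fourier component of a pulse train at freq_ratio * f0.

    amplitudes[m] is the strength of the m-th glottal pulse, with one pulse
    per period of the true vocal folds.
    """
    total = sum(a * cmath.exp(-2j * cmath.pi * freq_ratio * m)
                for m, a in enumerate(amplitudes))
    return abs(total) / len(amplitudes)

def subharmonic_ratio(attenuation, n_pulses=64):
    """Ratio of the f0/2 component to the f0 component."""
    amps = [1.0 if m % 2 == 0 else attenuation for m in range(n_pulses)]
    return pulse_train_component(amps, 0.5) / pulse_train_component(amps, 1.0)

print(subharmonic_ratio(1.0))  # identical pulses: no subharmonic
print(subharmonic_ratio(0.5))  # every second pulse halved (assumed value)
```

For an attenuation a, the ratio works out to (1 − a) / (1 + a), so the subharmonic vanishes when all pulses are equal and grows as the ventricular closure becomes more effective.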

Autonomic functions

It seems that stiffness of the ventricular folds cannot be manipulated by will, because they contain very few muscle fibres. However, the constantly increased ventricular function and repetitive closure may lead to new functional and anatomical changes in the interior of the larynx (such as ventricular hypertrophy) and, possibly, to a new system of innervation.

On the other hand, evidence of psycho-emotional, cerebellar, or midbrain (e.g., Parkinsonism) types of ventricular dysphonia suggests sub-cortical influences on the ventricular folds.

It is interesting to note that Tibetan monks do not practice their vocalization. They improve the control of the ventricular folds through meditation! Meditation is a conscious mental process that induces a set of integrated physiologic changes termed the relaxation response. The elastic property of the ventricular folds may be affected by meditation through autonomic functions. They become so relaxed that they vibrate with complete closures at half the frequency of the true vocal folds. In contrast, emotional stress can lead to adduction and vibration of the stiff ventricular folds with incomplete closures. Because lower subharmonics are weak in such melancholic voices, they sound rough.

Tibetan monks stated repeatedly that while singing overtones one should always make a special effort to attune heart and mind to the meaning of the holy moment (Smith and Stevens 1967).

An overtone singer and researcher related the psychological mechanism underlying overtone singing during meditation to “a higher sound awareness”: When we meditate by way of singing, the need to make pleasant or even beautiful sounds moves to the background. It is not the singing that decides whether we enter a truly meditative state of mind. More important is that we listen to ourselves, that we search for the voice inside. We are not concerned with personal judgments about our voice or with the personality in our voice. Singing harmonics automatically focuses the mind more than most other types of singing, because we essentially sing just one tone and listen to its internal dynamics. Overtones demand from us a higher than normal sound awareness. They fulfil a service in certain spiritual traditions and have a built-in symbolic association with ‘things high’. They have the exceptional ability to unite voices to the highest degree and a tendency to unify the body and the mind. (van Tongeren 2002:207)

It is my hypothesis that overtone singing focuses the mind automatically on the weak pitch of the prominent nth harmonic. This form of meditation is designed to lead one to a subjective experience of absorption with the object of focus. From the viewpoint of neuroscience it seems appropriate that a model for this kind of meditation begins with activation of the prefrontal cortex and the cingulate gyrus. Brain imaging studies have suggested that tasks requiring sustained attention are initiated via activity in the prefrontal cortex, particularly in the right hemisphere, and the cingulate gyrus appears to be involved in focusing attention. In an excellent review paper on the neural basis of meditation, Newberg and Iversen (2003) proposed a neurophysiological network possibly underlying meditative states. They discussed the prefrontal cortex effects on thalamic activation, posterior superior parietal lobule deafferentation, hippocampal and amygdalar activation, hypothalamic and autonomic nervous system changes, autonomic-cortical activity, and neurotransmitter activity. Although their model may provide a general framework for studying the neural basis of meditation, it should be noted that there are categories and subcategories of meditation that may be associated with different neural activity. For example, overtone singing by Tibetan monks belongs to the meditation category in which the subjects focus their attention on a particular object. When the object is the melody composed of overtones, the mental task and thus the neural activity may differ from those of a meditation technique that focuses the mind on an image, phrase, or word, because of the involvement of supraglottal structures.

Nitric oxide mechanisms


Nonadrenergic, noncholinergic (NANC) nerves, which cause relaxation of airway smooth muscle, have been described in several species including man. Nitric oxide appears to account for all the NANC response in human central and peripheral airways in vitro. A recent review on meditation stressed the importance of the involvement of nitric oxide during meditation (Esch et al. 2004, see also Kim et al. 2005). Based on these findings I propose a model for Tibetan overtone chanting:

The loop underlying Tibetan overtone chanting can be described as: (1) a monk adducts and relaxes the ventricular folds; (2) he sings overtones; (3) he focuses his mind on the weak pitch of reinforced overtones; (4) this concentration triggers autonomic functions and nitric oxide mechanisms that in turn lead to a relaxation of the smooth muscles in the supraglottal structures.


Andersson K, et al. (1998) Etiology and treatment of psychogenic voice disorders: results of a follow-up study of thirty patients. J Voice 12: 96-106.

Doersten PG, Izdebski K, Ross JC, Cruz RM. (1992). Ventricular dysphonia: a profile of 40 cases. Laryngoscope 102: 1296-1301.

D’Antonio L, et al. (1987) Perceptual-physiologic approach to evaluation and treatment of dysphonia. Ann Otol Rhinol Laryngol 96: 187-190.

Esch T, Guarna M, Bianchi E, Zhu W, Stefano GB. (2004) Commonalities in the central nervous system’s involvement with complementary medical therapies: limbic morphinergic processes. Med Sci Monit. 10(6):MS6-17.

Hisa Y, Koike S, Tadaki N, Bamba H, Shogaki K, Uno T. (1999) Neurotransmitters and neuromodulators involved in laryngeal innervation. Ann Otol Rhinol Laryngol Suppl. 178:3-14.

Kim DH, Moon YS, Kim HS, Jung JS, Park HM, Suh HW, Kim YH, Song DK. (2005) Effect of Zen Meditation on serum nitric oxide activity and lipid peroxidation. Prog Neuropsychopharmacol Biol Psychiatry 29(2):327-31.

Lazar SW, Bush G, Gollub RL, Fricchione GL, Khalsa G, Benson H. (2000) Functional brain mapping of the relaxation response and meditation. Neuroreport 11(7):1581-5.

Newberg AB, Iversen J. (2003) The neural basis of the complex mental task of meditation: neurotransmitter and neurochemical considerations. Med Hypotheses 61(2):282-91.

van Tongeren, M. (2002) Overtone singing – physics and metaphysics of harmonics in East and West. Amsterdam, The Netherlands: Fusica.

Yuceturk AV, Yilmaz H, Egrilmez M, and Karaca S. (2003) Voice analysis and videolaryngostroboscopy in patients with Parkinson’s disease. Eur Arch Otorhinolaryngol. 2002 259(6):290-3.


Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao : False vocal fold surface waves during Sygyt singing: A hypothesis, TAIWAN


False vocal fold surface waves during Sygyt singing: A hypothesis

Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao


Overtone singing is a vocal technique found in Central Asian cultures, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0. The pitch of nF0 arises from a very sharp formant. Current physical modelling of overtone singing asserts that the harmonic at nF0 is emphasized by a resonance of the vocal tract. However, this approach could not explain the extraordinarily small bandwidth of this formant.

This paper offers a hypothesis that surface waves (Rayleigh waves) of the false vocal folds might actively amplify the harmonic at nF0 in a specific technique of overtone singing: Sygyt. We propose a loop for harmonic amplification, which is composed of (1) the vocal tract with resonance nF0, (2) surface waves of the false vocal folds, and (3) a varicose jet separating from the false folds. This model receives indirect support from an experimental study on a novel human vocalization, which is characterized by a prominent component at 4 kHz. During this pure tonal vocalization, false fold surface vibrations were detected by ultrasound colour Doppler imaging. High-frequency false fold surface waves may also occur during Sygyt singing.

1. Introduction

Overtone singing (or throat singing, biphonic singing) is a vocal technique found in Central Asian cultures such as Tuva and Mongolia, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0 (F0 is the fundamental frequency; n = 6, 7, …, 13 in typical performances). The voice of overtone singing is characterized by a sharp formant centered at nF0, as can be seen in Figs. 1 and 2. Traditional techniques of overtone singing include Khoomei, Sygyt, Kargyraa and others.

There are two approaches to the physical modelling of overtone singing: (1) the double-source theory [1], which asserts the existence of a second sound source that is responsible for the melody pitch; and (2) the resonance theory, which asserts that a harmonic is emphasized by an extreme resonance of the vocal tract. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone was regarded as robust support for the resonance theory [2].

Recent attempts at physical modelling of Sygyt were concerned with calculating the transfer function of the vocal tract using one-dimensional models, successfully predicting the formant frequency [2,3]. From a theoretical standpoint, however, this approach may not be suitable for a tract with a rapidly flaring bell section. A Sygyt singer raises the tongue so that the tract shape changes abruptly at the narrowing of the tongue (marked with a red dot in Fig. 1b), where the assumption of planar wave fronts breaks down, and evanescent cross-modes can be excited in this flaring section even at low frequencies [4]. This may lead to errors in transfer function calculations using one-dimensional models. An alternative approach using Matched Asymptotic Expansions for modelling a Sygyt singer’s vocal tract was proposed in [5].

In a two-resonator theory, a Sygyt singer’s vocal tract was modelled as a coupled system of a longitudinal resonator that was from the glottis to the narrowing of the tongue, and a Helmholtz resonator that was from the articulation by the tongue to the mouth exit. Experiments showed that for some Sygyt voices with a sharp formant two resonances were matched, while a melody pitch can be perceived even in the case of not exactly matched resonances [6]. Although the formant magnitude was shown to be increased by resonance matching [3], it is unclear whether resonance-matching will reduce the formant bandwidth.

From a psychoacoustic point of view, a small bandwidth of the prominent formant is critical to a clear melody in Sygyt singing. A preliminary study using an autocorrelation model for pitch extraction suggested that the pitch strength of nF0 increased along with the Q value of this formant, with the formant magnitude playing a secondary role [5]. The spectrum of the Sygyt voice shown in Fig. 1a has the 12th harmonic approximately 15 dB stronger than its flanking components. If the amplification of this harmonic cannot be explained in terms of vocal tract impedance, it should be attributed to the source signal.
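The dependence of pitch strength on Q can be illustrated with a small calculation. The sketch below is not the model of [5]; it assumes a flat source spectrum, a single second-order resonance centred on the nth harmonic, and takes the normalised autocorrelation at lag 1/(nF0) as the measure of pitch strength. The values F0 = 150 Hz and n = 9 follow the synthetic vowels described earlier on this page; the Q values and the number of harmonics are arbitrary.

```python
import math

def formant_gain_sq(f, fc, q):
    """Squared magnitude response of a second-order resonance centred at fc."""
    x = f / fc
    return 1.0 / ((1.0 - x * x) ** 2 + (x / q) ** 2)

def pitch_strength(n, q, f0=150.0, n_harmonics=40):
    """Normalised autocorrelation of a single-formant vowel at lag 1/(n*f0).

    For a sum of harmonics with powers p_k, the normalised autocorrelation
    at lag tau is sum(p_k * cos(2*pi*k*f0*tau)) / sum(p_k).
    A flat source spectrum is assumed for simplicity.
    """
    fc = n * f0
    powers = [formant_gain_sq(k * f0, fc, q) for k in range(1, n_harmonics + 1)]
    lag = 1.0 / (n * f0)
    num = sum(p * math.cos(2.0 * math.pi * k * f0 * lag)
              for k, p in enumerate(powers, start=1))
    return num / sum(powers)

for q in (2, 10, 50):
    print(q, round(pitch_strength(9, q), 3))
```

As Q grows, the power concentrates in the nth harmonic, whose cosine term equals 1 at this lag, so the autocorrelation peak approaches 1. With a broad formant the neighbouring harmonics contribute terms of mixed sign and the peak stays low.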

The insufficiency of the resonance theory is even more notable in another technique of overtone singing: Kargyraa. A Kargyraa singer uses his false vocal folds to produce a low-pitched drone, manipulating his mouth opening to change the vocal tract resonance. Spectra in Fig. 2 show that the centre frequencies of the first and second formants of Kargyraa voices always stand in the ratio of 1:2. This strange phenomenon suggests an unknown glottal source that produces the outstanding component at F1 and its second harmonic.

The goal of this study is to offer a physical model based on a nonlinear loop that explains the harmonic amplification in Sygyt. This model asserts that surface waves (Rayleigh waves) of the adducted false vocal folds can actively amplify a harmonic. We first discuss the interactions between the false vocal fold surface waves (FVFSWs), the glottal flow and acoustic waves. A preliminary experiment that provided indirect evidence of this model is then addressed.

2. Theory

2.1. Rayleigh surface waves

The Rayleigh surface wave is a specific superposition of a transverse wave and a longitudinal wave in an elastic solid (see, e.g. [7]). Its amplitude is significant only near the surface and attenuates exponentially with depth. The trajectories of material particles are ellipses. At the surface the normal displacement is about 1.5 times the tangential displacement. The velocity of Rayleigh waves, which is independent of the wavelength, is about 0.9 times the transverse wave velocity. Rayleigh’s theory of surface waves has been generalized to viscoelastic solids (see, e.g. [8]).

The assumption of Rayleigh surface wave on the false vocal folds is supported, although indirectly, by recent measurements of the medial surface dynamics of the vocal folds [9]. The trajectories of fleshpoints were approximately ellipses, with the length ratio of the two axes varying in the range of 1.5-2.0. This value is in remarkable agreement with Rayleigh’s theory of surface waves.
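The figure of about 0.9 times the transverse wave velocity can be checked from the classical Rayleigh equation for an isotropic elastic half-space. The sketch below solves that equation by bisection; treating the tissue as a purely elastic, isotropic half-space is an idealisation, and the Poisson ratios used are illustrative values (soft tissue is commonly taken to be nearly incompressible, i.e. close to 0.5).

```python
import math

def rayleigh_speed_ratio(poisson_ratio):
    """Ratio of Rayleigh wave speed to transverse (shear) wave speed.

    Solves the Rayleigh equation for an isotropic elastic half-space,
        s^3 - 8 s^2 + (24 - 16 k) s - 16 (1 - k) = 0,
    where s = (c_R / c_T)^2 and k = (c_T / c_L)^2 = (1 - 2 nu) / (2 (1 - nu)),
    by bisection on the interval 0 < s < 1.
    """
    k = (1.0 - 2.0 * poisson_ratio) / (2.0 * (1.0 - poisson_ratio))

    def f(s):
        return s ** 3 - 8.0 * s ** 2 + (24.0 - 16.0 * k) * s - 16.0 * (1.0 - k)

    lo, hi = 0.0, 1.0  # f(0) < 0 and f(1) = 1 > 0, so a root lies between
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(0.5 * (lo + hi))

for nu in (0.0, 0.25, 0.5):
    print(nu, round(rayleigh_speed_ratio(nu), 4))
```

Over the whole admissible range of Poisson ratios the result stays between roughly 0.87 and 0.96, consistent with the rule of thumb quoted above.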

2.2. Physical modelling of Sygyt

Here we propose a physical model that describes how FVFSWs absorb the energy of the glottal flow and acoustic waves.


The false folds are significantly adducted during Sygyt singing. Hence, the volume flow through them (UF) is sensitive to FVFSWs. FVFSWs are supposed to be triggered by the acoustic pressure, which is dominated by the vocal tract resonance at nF0. So we assume an FVFSW with a frequency of nF0.

Based on the assumption of elliptic movements of fleshpoints on the false folds, snapshots of this wave can be obtained. The ellipses in Figs. 3b and 3c represent the trajectory of fleshpoints. We estimate the energy exchange between the flow and the tissue occurs at one point. In Fig. 3b the work done by the viscous flow at this point is positive. In Fig. 3c the flow separates upstream, performing no work (or positive work, if back-flow appears) at this point. It can easily be seen that over a period the FVFSW absorbs energy from the flow in the vicinity of the flow separation point, which moves back and forth at a crest of the FVFSW, modulating the flow through the false folds at frequency of nF0. This induces varicose oscillations of UF, which produce the harmonic at nF0 in the source signal. This harmonic is in turn reinforced by the strong vocal tract resonance at nF0.

The net work done by the sinusoidal acoustic wave with frequency nF0 at a point on the false fold over a period can be positive or negative, depending on the phase relationship between the FVFSW and the acoustic pressure. We suppose that within a half wavelength of the FVFSW in the vicinity of the flow separation point, the FVFSW absorbs the acoustic energy of the harmonic at nF0. Away from this flow separation point, the FVFSW is expected to decay rapidly because of large viscous losses in the tissue during high frequency vibrations. We thus conclude that the total work done by the acoustic wave on the FVFSW is positive.

To sum up, a loop for Sygyt is established in terms of (1) a linear resonator: the vocal tract with resonance at nF0, (2) an energy source: the pressure difference across the false glottis, and (3) a nonlinear amplifier: a flow separating from curved walls with mucosal layers receiving acoustic feedback. This self-sustained oscillator differs from the true vocal folds in that the false fold mucosa does not vibrate at any intrinsic resonance, but rather responds to the acoustic pressure.

2.3. Discussion

The present model explains the crucial role of the adduction of the false folds in the Sygyt technique. Because of this adduction the flow velocity over their mucosal layers is high enough to supply the energy for sustaining FVFSWs. It is interesting to note that FVFSWs have been observed in patients suffering from ventricular dysphonia [10], although their frequencies appeared to be much lower than those during Sygyt singing.

From an empirical standpoint, learning Sygyt is much more difficult than is implied by the resonance theory. In workshops on overtone singing, it has been repeatedly observed that only very few people are able to produce voices with a clear melody pitch. The present model predicts that one cannot sing Sygyt well, even when manipulating the tract shape perfectly, if the false folds are not correctly adducted or their mucosal layers do not have the proper shape, thickness, and viscoelastic properties.

The loop described in our model tends to “unify” the double-source theory and the resonance theory of overtone singing. Whereas the true vocal folds and the vocal tract are, as usual, viewed as the independent source and filter, the false fold mucosa plays a key role in introducing acoustic feedback into the loop for harmonic amplification.

The present model for Sygyt might also shed new light on the production of high-frequency, whistle-like voice type of birds, dolphins, whales, and groaning dogs. In this regard, our model is an updated version of the double-source theory [1], which already drew parallels between the sounding mechanisms of overtone singing and the whistle-like voice type, which is produced with the false folds adducted.

It is interesting to compare the harmonic-amplification loop with the sounding mechanism of flute-type instruments, which is based on a loop composed of a vibrating jet and acoustic waves filtered by a resonator. In the case of flutes the jet separates from the musician’s lips, travelling along the mouth of the resonator towards a sharp edge. When the instrument produces a tone, the jet oscillates at one of the resonances of the pipe. The acoustic flow field near the flow separation point excites sinuous oscillations of the jet. At the sharp edge, the jet is directed alternately toward the inside and the outside of the resonator. This pulsing injection induces an equivalent pressure difference across the mouth that excites and maintains acoustic waves in the pipe [11]. The jet, like the false fold mucosa, does not vibrate at any intrinsic resonance. It should be noted that the acoustic flow induces sinuous oscillations of the jet at the mouth hole of a flute, whereas the acoustic pressure excites FVFSWs that induce varicose oscillations of the glottal flow.

While a varicose jet is essential for whistle-like sound production, the role of wall vibration is not fully understood. It has been suggested that the sounding mechanism of human whistling is a loop composed of the jet and the oral cavity with a prominent resonance. The pressure fluctuations due to the acoustic wave at the flow separation point could induce varicose oscillations of the jet without any wall vibration. This model is in an interesting contrast to our model of Sygyt, which assumes vibrations of the compliant walls. To examine the assumption of FVFSWs in our model of Sygyt, we measure surface vibrations during whistle-like singing in vivo.

3. Experimental Study

3.1. Whistle-like voice type


The present model of “varicose jet oscillations induced by surface waves of curved walls in the vicinity of the flow separation point” may provide insight into the production of the whistle-like voice type in birds and mammals. It has been suggested that the production mechanism of bird whistled song might be related to a retraction of the syringeal membranes while in oscillation so that they no longer completely close, leading to a great reduction in the harmonic content of the flow. An alternative explanation of whistled song is that it is produced by pure aerodynamic means without any vibrating surfaces [12]. However, recent experimental studies favour the sounding mechanism of vibrating surface [13,14].

After some practice, humans can imitate a dog’s groaning to produce high-frequency whistle-like voices, which have a prominent component at approximately 4 kHz, as shown in Fig. 4c. We hypothesize that the mechanism underlying this vocalization is a varicose jet induced by FVFSWs.

Medical ultrasound (US) provides an ideal non-invasive method for observing high-frequency surface vibrations with small amplitude, because the vibratory artefact of colour Doppler imaging (CDI) detects surface velocity rather than displacement. In previous studies, CDI was used to measure the frequency and the length of the vocal folds during normal phonation [15,16]. In the present experiment we employ this technique to detect FVFSWs during whistle-like singing.

3.2. Methods

A commercially available, high-resolution US scanner (HDI-5000, ATL, Bothell, WA) with a 5- to 12-MHz linear-array transducer (L12-5 38 mm, ATL) was used in this study. The frame rate in B-mode was about 25 Hz. In the colour mode, the pulse-repetition rate was 10,000 Hz and the measuring velocity range was set at 0 to 128.3 cm/s with baseline offset, which resulted in a frame rate of about 7 Hz. The US scan head was placed horizontally at the midline of the thyroid cartilage lamina on one side (Fig. 4a). The subject is the first author of this paper, who is a healthy man aged 33 with normal vocal function. For this experiment he had practiced the whistle-like vocalization for a week.
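The quoted velocity range follows from the pulse-repetition rate once a Doppler transmit frequency is fixed. The transmit frequency is not stated here; the sketch below assumes 6 MHz, which lies within the transducer's 5 to 12 MHz band, together with the conventional soft-tissue sound speed of 1540 m/s. Both are assumptions. With the baseline fully offset, the whole unaliased window is assigned to one flow direction.

```python
def doppler_velocity_span(prf_hz, tx_freq_hz, c=1540.0):
    """Width (m/s) of the unaliased velocity window of pulsed Doppler.

    The Doppler shift is f_d = 2 * v * f_tx / c for motion along the beam,
    and shifts are unambiguous over a band of width PRF, so the window
    spans c * PRF / (2 * f_tx).
    """
    return c * prf_hz / (2.0 * tx_freq_hz)

# 6 MHz is an assumed Doppler transmit frequency, not a value from the text.
span = doppler_velocity_span(10_000, 6.0e6)
print(round(span * 100, 1), "cm/s")
```

Under these assumptions the window is about 128.3 cm/s wide, matching the stated range of 0 to 128.3 cm/s.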

3.3. Results

CDI colour artefacts detected surface vibrations of the right false vocal fold during pure tonal singing (Fig. 4d). During warming up of this vocalization, surface vibrations of the right vocal fold and the false fold were observed (Fig. 4b).

The frequency of pure tonal singing was found to range from 3.7 kHz to 4.6 kHz. Outside this range the voice loses its pure tonal character, with breathy noise accumulating at the prominent resonance.

4. Concluding Remarks

The observation of false fold surface vibrations during pure tonal singing provides indirect support of our model for Sygyt. As FVFSWs may generate 4 kHz pure tonal voices with the second harmonic 30 dB (or more) weaker than the fundamental, it should be possible that a Sygyt singer amplifies a selected harmonic of the voice produced by the true vocal folds through FVFSWs.

The role of acoustic feedback in FVFSW generation is not fully understood. When the acoustic wave filtered by the resonator is strong enough to trigger FVFSWs, a loop for pure tonal vocalization may be established. If not, periodic FVFSWs may not occur. The laryngeal ventricle may be the Helmholtz resonator that is responsible for the prominent resonance at 3.7-4.6 kHz. However, this “resonance” model appears to be at odds with experimental results on pure tonal vocalization in birds [13,14]. If the frequency of surface waves is not determined by the tract resonance, it should be determined by the tissue curvature, elastic properties, and the flow speed. In the case of Sygyt singing, however, it has not been reported that a singer manipulates the false folds to change the melody pitch. Further research is needed to compare the sounding mechanisms of Sygyt singing and the pure tonal vocalization.

One implication of our surface wave model is that the vertical motion of fleshpoints on the true/false vocal folds may be critical to their self-sustained oscillation. The two-mass and three-mass models of the vocal folds [17,18] do not take into account the ellipse-like motion of vocal fold fleshpoints, which is consistent with Rayleigh’s theory of surface waves and has been demonstrated in excised canine larynx experiments [9]. We suggest that the vertical motion of fleshpoints near the flow separation point can absorb the kinetic energy of the glottal flow through viscous shear force.

The effect of surface viscous shear stress exerted by a flow also plays a central role in the system of a pair of fluttering flags in wind. This system shows some notable similarities to the glottis. When the inter-flag distance lies in a definite range, the flags flutter in an out-of-phase state and generate a pulsating flow, with striking similarities to the vocal fold vibration in the chest register. Flow visualizations showed significant shear stress on the flags exerted by the flow [19]. This finding suggests that viscous shear stress on the vocal fold mucosa should not be ignored, especially in vocalizations with a large open quotient.

Next to the viscosity effect, the surface shear stress may be attributed to the carrying-along of the varicose flow. It was observed in a pair of flags that the flag wave propagates along with the flow, while the wave of an isolated flag propagates in the direction opposite to the flow. Note that the surface shear stress dominates the system of a pair of flags but not an isolated flag [19]. It is likely that the surface shear stress is due to the effect that a varicose or sinuous flow carries along the flag wave. This approach may shed new light on the mechanism of the self-sustained oscillation of the vocal folds.

5. References

[1] Chernov, B.; and Maslov, V. 1987. Larynx double sound generator. Proc. XI Congress of Phonetic Sciences, Tallinn 6, 40-43.

[2] Adachi, S.; and Yamada, M. 1999. An acoustical study of sound production in biphonic singing, Xöömij. J. Acoust. Soc. Am. 105(5), 2920-2932.

[3] Kob, M. 2002. Physical modeling of the singing voice. PhD thesis, Aachen University (RWTH).

[4] Pagneux, V.; Amir, N.; and Kergomard, J. 1996. A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation. J. Acoust. Soc. Am. 100, 2034-2048.

[5] Tsai, C.G. 2004. Physics and perception of overtone singing. URL: http://jia.yogimont.net/overtonesinging/

[6] Kob, M.; and Neuschaefer-Rube, C. 2004. Acoustic properties of the vocal tract resonances during Sygyt singing. Proc. of the International Symposium on Musical Acoustics, Nara, Japan.

[7] Achenbach, J.D. 1984. Wave propagation in elastic solids. Elsevier, New York.

[8] Romeo, M. 2001. Rayleigh waves on a viscoelastic solid half-space. J. Acoust. Soc. Am. 110 (1), 59-67.

[9] Berry, D.A.; Montequin, D.W.; and Tayama, N. 2001. High-speed digital imaging of the medial surface of the vocal folds. J. Acoust. Soc. Am. 110(5), 2539-2547.

[10] Nasri, S.; Jasleen, J.; Gerratt, B.R.; Sercarz, J.A.; Wenokur, R.; and Berke, G.S. 1996. Ventricular dysphonia: a case of false vocal fold mucosal travelling wave. Am. J. Otolaryngol. 17(6), 427-431.

[11] Verge, M.P.; Caussé, R.; Fabre, B.; Hirschberg, A.; Wijnands, A.P.J.; and van Steenbergen, A. 1994. Jet oscillations and jet drive in recorder-like instruments. Acustica 2, 403-419.

[12] Gaunt, A.S.; Gaunt, S.L.L.; and Casey, R.M. 1982. Syringeal mechanics reassessed: evidence from Streptopelia. Auk 99, 474-494.

[13] Brittan-Powell, E.F.; Dooling, R.F.; Larsen, O.N.; and Heaton, J.T. 1997. Mechanisms of vocal production in budgerigars (Melopsittacus undulatus). J. Acoust. Soc. Am. 101, 578-589.

[14] Ballintijn, M.R.; and Cate, C.T. 1998. Sound production in the collared dove: a test of the ‘whistle’ hypothesis. J. Experimental Biology 201, 1637-1649.

[15] Shau, Y.W.; Wang, C.L.; Hsieh, F.J.; and Hsiao, T.Y. 2001. Noninvasive assessment of vocal fold mucosal wave velocity using color Doppler imaging. Ultrasound Med. Biol. 27, 1451-1460.

[16] Hsiao, T.Y.; Wang, C.L.; Chen, C.N.; Hsieh, F.J.; and Shau, Y.W. 2002. Elasticity of human vocal folds

measured in vivo using color Doppler imaging. Ultrasound Med. Biol. 28, 1145-1152.

[17] Ishizaka, K.; and Flanagan, J.L. 1972. Synthesis of voiced sounds from a two-mass model of the vocal cords.

Bell Syst. Tech. J. 51(6), 1233-1268.

[18] Story, B.H.; and Titze, I.R. 1995. Voice simulation with a body cover model of the vocal folds. J. Acoust. Soc. Am.97, 1249-1260.

[19] Zhang, J.; Childress, S.; Libchaber, A.; and Shelley, M. 2000. Flexible filaments in a flowing soap film as a model for one-dimensional flags in a two-dimensional wind. Nature 408, 835-839.



Physical Modelling of the vocal tract of a Sygyt singer

Chen-Gia Tsai

Source theory vs. Resonance theory

Two types of overtone-singing should be distinguished: Sygyt and Kargyraa. In Sygyt performances, the rising tongue divides the vocal tract into two cavities, which are connected by a narrow channel, whereas the tongue does not rise in Kargyraa performances.

Up until now, two major theories have been proposed on the production of the melody pitch: (1) The ‘double-source’ theory (Chernov & Maslov 1987), which asserts the existence of a second sound source such as a whistle-like mechanism formed by the narrowing of the false vocal folds (ventricular folds) in addition to the true vocal fold vibration; and (2) the ‘resonance’ theory, which asserts that only a glottal sound source exists, but that an upper harmonic is so emphasized by an extreme resonance of the vocal tract that it is segregated from the other components and heard as another pitch. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone supports the resonance theory (Adachi & Yamada 1999).

Physical modelling of the resonance of the vocal tract of Sygyt singers includes: (1) rear cavity theory, (2) front cavity theory, and (3) resonance-matching theory. The glottal sound source of Sygyt voices is rich in harmonics. This has been attributed to the short open duration of the glottis (Bloothooft et al. 1992, Adachi & Yamada 1999).

Rear cavity theory

Based on vocal tract shape measurements by MRI, Adachi and Yamada (1999) reported that the resonance of the rear cavity, that is, from the glottis to the narrowing of the tongue, produced the sharp formant Fk. The resonance of the front cavity, that is, from the articulation by the tongue to the mouth exit, was not critical to the production of the melody pitch. The length of the rear cavity decreases as fk increases.

Adachi and Yamada (1999) synthesized tones from transfer functions calculated with and without the front cavity, finding that the front cavity did not affect the formant frequencies, although the magnitude of Fk decreased due to the lack of the front cavity resonance. It is important to note that Adachi and Yamada calculated the transfer functions of a Sygyt singer’s vocal tract using a one-dimensional model, in which the tract shape was approximated as a succession of cones. While such models are widely used in speech research, I argue that the change in the tract shape at the articulation point is so abrupt that the assumption of planar-wave fronts clearly breaks down. Theoretically, one-dimensional models are unsuitable for a Sygyt singer’s vocal tract.

In practice, the rear cavity theory is not supported by a non-traditional technique of overtone-singing used by Tran Quang Hai, who calls it the ‘one-cavity technique’ because the tongue does not rise to divide the vocal tract into two cavities. There is, however, an articulation point at the soft palate, as in pronouncing the velar /ng/. The melody of fk is produced by manipulating the opening of the front cavity, while the rear cavity, that is, from the glottis to the soft palate, remains unchanged. This technique suggests that the front cavity may be more important for the production of fk.

Front cavity theory

Based on preliminary impedance measurements of vocal tract by a Jew’s harp, Tsai (2001) reported that the resonance of the front cavity determined fk. The author modelled the front cavity as a Helmholtz resonator driven by a flow source U1 at the articulation point. The transfer function can be calculated according to Eq. (6.65) in [Fletcher & Rossing 1991].
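
As a rough illustration of the Helmholtz picture (not the transfer function of Eq. (6.65) itself), the textbook lumped-element resonance formula f = (c/2π)·sqrt(S/(V·L)) can be evaluated for a small cavity. The function name and all dimensions below are hypothetical, chosen only so that the resonance falls in the range of typical melody pitches; they are not measurements from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def helmholtz_frequency(neck_area_m2, neck_length_m, volume_m3, c=SPEED_OF_SOUND):
    """Resonance frequency (Hz) of an ideal Helmholtz resonator."""
    return c / (2.0 * math.pi) * math.sqrt(neck_area_m2 / (volume_m3 * neck_length_m))


# Hypothetical front-cavity dimensions (illustrative only, not measured).
volume = 10e-6       # 10 cm^3 cavity
neck_length = 0.01   # 1 cm effective neck length, end corrections lumped in

for area_cm2 in (0.25, 0.5, 1.0):
    f = helmholtz_frequency(area_cm2 * 1e-4, neck_length, volume)
    print(f"mouth opening {area_cm2:.2f} cm^2 -> resonance {f:.0f} Hz")
```

In this simplified picture, widening the mouth opening raises the resonance as the square root of the opening area, which is consistent with a melody controlled by the opening of the front cavity.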

Owing to the tract shape at the articulation point, the flow U1 is presumed to be incompressible. It is known that in regions of fast change in pipe geometry, such as a tone hole or the pipe termination, the Helmholtz number He<<1 implies that the wave equation can locally be approximated by the Laplace equation, which describes an incompressible potential flow (Hirschberg & Kergomard 1995). In overtone-singing, the acoustic flow at the articulation point is therefore incompressible (compact region). This is not true for normal phonations.
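
The compactness condition can be checked numerically with the Helmholtz number He = 2πfL/c, the ratio of a region's length to the acoustic wavelength (times 2π). The sketch below is only an order-of-magnitude check: the melody pitch of 1350 Hz and both lengths are assumed values, not data from the source.

```python
import math


def helmholtz_number(frequency_hz, length_m, c=343.0):
    """He = k * L = 2*pi*f*L / c (dimensionless)."""
    return 2.0 * math.pi * frequency_hz * length_m / c


f_k = 1350.0  # hypothetical melody pitch in Hz

# Both lengths are assumptions made for illustration.
regions = (
    ("constriction at the articulation point, 0.5 cm", 0.005),
    ("whole vocal tract, 17 cm", 0.17),
)
for label, length in regions:
    print(f"{label}: He = {helmholtz_number(f_k, length):.2f}")
```

With these assumed values the constriction has He well below 1, so it can be treated as a compact region, whereas the tract as a whole cannot.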

The front cavity theory failed to explain the small bandwidth of Fk. Fig. 2 compares the matched theoretical spectral envelopes and recorded spectra of a Sygyt voice and a Jew’s harp tone, which were produced by me with the same front cavity. It can be seen that the Fk bandwidth of the voice is smaller than that of the Jew’s harp tone. The latter was produced without the rear cavity because the rising tongue completely closed the channel between the front and the rear cavities. This discrepancy suggests that the rear cavity may play a role in sharpening Fk.

Figure 2: Spectra of a Sygyt voice (left) and a Jew’s harp tone (right) produced with the same front cavity.

Resonance-matching theory

The resonance-matching theory takes into account the contributions of both the front and the rear cavities, whose resonances are more or less matched to produce a sharp Fk. Kob (2002) reported that matching the two resonance frequencies by manipulating the mouth opening enhanced the second resonance by about 15 dB. Although this theory appears to unify the rear-cavity and front-cavity theories, it should be noted that according to Table 6.1 in [Kob 2002], the resonance of the front cavity was merely close to the second resonance of the rear cavity; Fk could be sharp enough for pitch production without exact resonance matching.


Kob (2002) calculated the transfer functions of a Sygyt singer’s vocal tract using an improved method of continuous-time interpolated multiconvolution (Barjau et al. 1999), which was originally developed to calculate the impulse response of wind instruments with tone-hole discontinuities. However, this approach does not predict the flow field at the articulation point. Fig. 3 displays the shape of a Sygyt singer’s vocal tract and the potential field at the articulation point. As can be seen from the isobar (equal-potential) lines, the acoustic flow has a higher velocity near the tongue. This contradicts the assumption of planar-wave fronts in Kob’s calculation.

Figure 3: Shape of a Sygyt singer’s vocal tract (left) and the isobar lines at the articulation point (right).

The limitations of one-dimensional models of the vocal tract or the bore of wind instruments should be borne in mind: even at low frequencies evanescent cross-modes will be excited in the rapidly flaring bell section because of strong mode coupling (e.g., Pagneux et al. 1996). In a Sygyt singer’s vocal tract, one-dimensional models are suitable only for the rear cavity.

The vocal tract should be divided into four regions, in which the wave equation takes different approximate forms. In light of Matched Asymptotic Expansions, the global solution can be obtained by ‘gluing’ the local solutions together (Hirschberg & Kergomard 1995). The four regions are (1) the rear cavity, (2) the compact region at the articulation point, (3) the front cavity as a Helmholtz resonator, and (4) the compact region at the mouth opening. The rear cavity is approximated as a succession of cones, where the acoustic field is governed by the Webster equation for He<<1. At the articulation point and at the mouth opening, the incompressible air is approximated as a piston. The front cavity is a Helmholtz resonator with a short neck.

If the transfer function of a Sygyt singer’s vocal tract does not predict the small bandwidth of the second formant, one should consider the possible effect of acoustic feedback to the glottal source (Levin and Edgerton 1999). This may be related to the nonlinear effect of the adducted ventricular folds.


CHEN-GIA TSAI : Perception of Overtone Singing , TAIWAN


Perception of Overtone Singing : Chen-Gia Tsai

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0. For normal voices, the bandwidths of formants are always so large that the formants merely contribute to the perception of timbre. For overtone-singing voices, the sharp formant Fk can contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0. In the autocorrelation functions, the height of the peak at 1/9f0, which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk is audible once the strongest harmonic is larger than the adjacent harmonics by 10 dB.
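
The trend described above can be reproduced with a small numerical sketch. It is not the pitch model or the synthesis used for Fig. 1: the Lorentzian-shaped harmonic amplitudes, the three bandwidths, the sampling rate and the function name are all assumptions made for illustration. Only f0 = 150 Hz and the formant at 9f0 come from the text.

```python
import numpy as np


def pitch_strength(bandwidth_hz, f0=150.0, harmonic=9, fs=43200,
                   n_periods=30, n_harmonics=26):
    """Normalized autocorrelation at lag 1/(harmonic*f0) of a single-formant tone."""
    formant = harmonic * f0
    n_samples = int(round(n_periods * fs / f0))   # whole number of f0 periods
    t = np.arange(n_samples) / fs
    freqs = f0 * np.arange(1, n_harmonics + 1)
    # Assumed resonance shape: harmonic amplitudes fall off around the formant.
    amps = 1.0 / np.sqrt(1.0 + ((freqs - formant) / (bandwidth_hz / 2.0)) ** 2)
    x = (amps[:, None] * np.cos(2.0 * np.pi * freqs[:, None] * t[None, :])).sum(axis=0)
    lag = int(round(fs / formant))                # 1/(9*f0) expressed in samples
    # Circular autocorrelation; exact here because x is periodic over the window.
    return float(np.dot(x, np.roll(x, -lag)) / np.dot(x, x))


strengths = {bw: pitch_strength(bw) for bw in (50.0, 200.0, 800.0)}
for bw, s in strengths.items():
    print(f"formant bandwidth {bw:5.0f} Hz -> normalized peak at 1/(9*f0): {s:.2f}")
```

Under these assumptions the normalized peak height grows as the bandwidth shrinks, approaching 1 when nearly all the energy sits in the ninth harmonic.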

Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding f0, and listeners become indifferent to it. Conversely, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, by which the components radiated by the same object should be grouped and perceived as a single entity. Stream segregation occurs in the quasi-periodic voices of overtone-singing through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing always sounds extraordinary when we first hear it.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.

Ken-Ichi Sakakibara*1, Tomoko Konishi, Emi Zuiki Murano*2, Hiroshi Imagawa*2, Masanobu Kumada*3, Kazumasa Kondo*4, and Seiji Niimi*5 : Observation of Laryngeal Movements for Throat Singing Vibrations of two pairs of folds in the human larynx, JAPAN


First Pan-American/Iberian Meeting on Acoustics, Cancun



Observation of Laryngeal Movements for Throat Singing 
Vibrations of two pairs of folds in the human larynx

Ken-Ichi Sakakibara*1, Tomoko Konishi, Emi Zuiki Murano*2, Hiroshi Imagawa*2, Masanobu Kumada*3, Kazumasa Kondo*4, and Seiji Niimi*5 

*1 NTT Communication Science Laboratories, 3-1, Morinosato Wakamiya, Atsugi-shi, 243-0198, Japan
http://www.brl.ntt.co.jp/people/kis/, kis@brl.ntt.co.jp or k_i_s@hotmail.com
*2 The University of Tokyo, Japan
*3 National Rehabilitation Center for the Disabled, Japan
*4 Asian University, Japan
*5 International University of Health and Welfare, Japan

Popular version of paper 2pMUa1
Presented Tuesday Afternoon, December 3, 2002
144th ASA Meeting, Cancun, Mexico


1. Singing voices of the world

There are many styles of singing around the world, and they differ mainly in timbre. This diversity may have arisen from differences in climate, geography, language, physical characteristics, religion, musical structure, and so on. In fact, there are considerable differences between traditional or classical European singing voices, such as bel canto and German lied, and traditional Asian pressed singing voices, such as throat singing around the Altai mountains, Japanese Youkyoku, and Korean Pansori. For instance, traditional European singing styles developed through performance in stone-built acoustical environments and therefore require a constant timbre. Most Asian singing styles, on the other hand, developed through performance in acoustical environments built of softer materials such as wood and mud, and therefore require a rich and varied timbre. It is possible to infer that singing styles and musical structures (polyphonic in Europe and homophonic in Asia) have evolved by interacting with each other. Here, we study throat singing, one of the most sophisticated styles of pressed-type singing, and how its laryngeal voice is generated.

2. Throat singing

Throat singing is the traditional singing style of people who live around the Altai mountains. Khöömei in Tyva and Khöömij in Mongolia are representative styles. Throat singing is sometimes called biphonic singing or overtone singing, because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high above the drone. The term is sometimes used more broadly to cover all biphonic singing styles, not only those around the Altai mountains: e.g., Inuit, Xhosa, and so on. Here, however, we use the term “throat singing” for the common styles around the Altai mountains: Khöömei, Khöömij, Kai in Altai, and so on.

The production of the high-pitched overtone of throat singing is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract, which appears as the 2nd formant in its sound spectrum. On the other hand, the laryngeal voice of throat singing has a special pressed timbre and supports the generation of the overtone.

The laryngeal voices of throat singing can be classified into two voices, (i) squeezed voice and (ii) kargyraa voice, based on the listener’s impression, acoustical characteristics, and the singer’s personal observation of voice production. The squeezed (pressed) voice is the basic laryngeal voice in throat singing and is used as the drone. An equivalent voice is used in Japanese Naniwabushi. The kargyraa voice is a very low-pitched voice that lies outside the modal register. It is fundamental to Kai and perceptually identical to Tibetan chant.


Fig. 1: Coronal section of human larynx

3. Ventricular folds (or false vocal folds): Another pair of folds besides the vocal folds in the human larynx

The ventricular folds, or false vocal folds (VTFs), are a pair of soft, flaccid folds that lie above the vocal folds (Fig. 1). While the vocal folds (VFs) have a mechanism for changing their stiffness, thickness, and length through muscle action (mainly that of the thyroarytenoid muscle), the VTFs are incapable of becoming tense, since they contain very few muscle fibres. It seems that the VTFs can move with the arytenoid cartilages. They are also abducted and adducted by the action of certain laryngeal muscles. The VTFs, like the VFs, trap air from the lungs and prevent foreign substances from entering the lower respiratory tract. In normal phonation, the VTFs do not vibrate, but their vibration is sometimes observed in patients with dysphonia.

Fig. 2: High speed digital imaging system

4. Vocal fold and ventricular fold vibrations

We observed laryngeal movements in throat singing directly and indirectly by simultaneously recording high-speed digital images, EGG (electroglottography) signals, and sound waveforms (Fig. 2). The high-speed digital images were captured at 4500 frames/s through a flexible endoscope inserted through the singer’s nasal cavity.

Our observations yielded the following results. The squeezed and kargyraa voices share two features: an overall constriction of the supraglottal structures and vibration of the VTFs. They differ in the narrowness of the constriction and the manner of VTF vibration. In the squeezed voice, the VTFs vibrate at the same frequency as the VFs, and the two pairs vibrate in opposite phase (Fig. 3). In the kargyraa voice, the VTFs can be assumed to close once for every two periods of VF closure, contributing to the generation of the subharmonic tone of kargyraa (Fig. 4).
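
The link between "one VTF closure per two VF periods" and a subharmonic can be shown with a signal-level toy example. This is not the authors' laryngeal model. An idealized impulse train stands in for the VF closures, and every second pulse is weakened to mimic the slower VTF cycle; the function name, f0, the sampling rate and the modulation depth are all hypothetical.

```python
import numpy as np


def subharmonic_ratio(depth, f0=160.0, fs=8000, n_pulses=80):
    """Spectral magnitude at f0/2 relative to f0 for a pulse train whose
    every second pulse is scaled by (1 - depth)."""
    period = int(round(fs / f0))             # samples per VF cycle
    x = np.zeros(period * n_pulses)
    x[::period] = 1.0                        # one idealized pulse per VF cycle
    x[period::2 * period] *= (1.0 - depth)   # weaken every second pulse
    spectrum = np.abs(np.fft.rfft(x))
    bin_f0 = n_pulses                        # FFT bin that holds f0
    bin_half = n_pulses // 2                 # FFT bin that holds f0/2
    return spectrum[bin_half] / spectrum[bin_f0]


for depth in (0.0, 0.25, 0.5):
    print(f"modulation depth {depth:.2f} -> |X(f0/2)| / |X(f0)| = "
          f"{subharmonic_ratio(depth):.3f}")
```

With no modulation the component at f0/2 vanishes; as soon as alternate cycles differ, the waveform repeats only every two VF periods and energy appears one octave below the VF frequency.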

Fig. 3: High-speed images of the laryngeal movement for squeezed voice
Fig. 4: High-speed images of the laryngeal movement for kargyraa voice


5. What is a beautiful singing voice?

Throat singers are able to keep healthy, clear, and beautiful voices even though they use pressed-type voices, which are regarded as undesirable phonation in traditional European musical pedagogy. They are able to use the VTFs as well as the VFs and produce their preferred voices without hurting their phonatory organs. Moreover, anyone can become skilled at producing these laryngeal voices.

Thus, the phonation of throat singing is natural and not mysterious.



We would like to thank Kiyoshi Honda, Koichi Makigami, Caroline Menezes, Johan Sundberg, and Masahiko Todoriki for their helpful discussions.


  1. S.Adachi and M.Yamada, An acoustical study of sound production in biphonic singing, Xöömij. J. Acoust. Soc. Am., 105, pp.2920–2932, 1999.
  2. T.C.Levin and M.E.Edgerton, The throat singers of Tuva. Scientific American, Sep-1999, pp.80–87, 1999.
  3. L.Fuks, B.Hammarberg, and J.Sundberg, A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. KTH TMH-QPSR, 3/1998, pp.49–59, 1998.
  4. H.Imagawa, K.-I.Sakakibara, T.Konishi, and S.Niimi, Throat singing synthesis by a laryngeal voice model based on vocal fold and false vocal fold vibrations. Proc. of Study Group on Musical Info., 01-MUS-39, pp. 71–78, Info. Processing Soc. Jpn., in Japanese, 2001.
  5. P.-Å. Lindestad, M.Sodersten, B.Merker, and S.Granqvist, Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice, 15, pp. 78–85, 2001.
  6. K.-I.Sakakibara, S.Adachi, T.Konishi, K.Kondo, E.Z.Murano, M.Kumada, M.Todoriki, H.Imagawa, and S. Niimi, Analysis of vocal fold vibrations in throat singing. Tech Rep. Musical Acoust. of Acoust. Soc. Jpn., 19-4, pp. 41–48, in Japanese, 2000.
  7. K.-I.Sakakibara, T.Konishi, K.Kondo, E.Z.Murano, M.Kumada, H.Imagawa, and S.Niimi, Vocal fold and false vocal fold vibrations and synthesis of khoomei. Proc. of ICMC, pp. 135–138, 2001.
  8. K.-I.Sakakibara, H.Imagawa, S.Niimi, and N.Osaka, Synthesis of the laryngeal source of throat singing using a 2×2-mass model. Proc. of ICMC, pp. 5–8, 2002.




CHEN-GIA TSAI : articles on Overtone Singing , TAIWAN


TSAI Chen-Gia : Overtone Singing

Overtone Singing
Chen-Gia Tsai
* Perception of overtone singing
* Physical modeling of the vocal tract of a Sygyt singer
* False vocal fold surface waves during Sygyt singing: A hypothesis
* Kargyraa and meditation

The voice of overtone singing is characterized by a prominent formant. In this spectrum of a sound produced by a Taiwanese overtone singer, the 10th harmonic is stronger than its flanking components by more than 25 dB. It is not fully understood how the formant becomes so sharp.
Overtone singing, also known as throat singing, is a vocal technique found in Central Asian cultures, by which one singer produces two pitches simultaneously. When listening to the performance, a high pitch of n*f0 can be perceived along with a low drone pitch of f0.

Adachi, S., and Yamada, M. (1999). An acoustical study of sound production in biphonic singing, Xoomij. J. Acoust. Soc. Am. 105(5), 2920-2932.

Bloothooft, G., Bringmann, E., Capellen, M., Luipen, J., Thomassen, K. (1992). Acoustics and perception of overtone singing. J. Acoust. Soc. Am. 92(4), 1827-1836.

Chernov, B. and Maslov, V. (1987). Larynx double sound generator. Proc. XI Congress of Phonetic Sciences, Tallinn 6, 40-43.

Fletcher, N.H., and Rossing, T.D. (1991). The Physics of Musical Instruments. Springer-Verlag.

Hirschberg, A., and Kergomard, J. (1995). Aerodynamics of wind instruments. In: Mechanics of Musical Instruments. Springer-Verlag, 291-369.

Kob, M. (2002). Physical Modeling of Singing Voice. Dissertation, University of Technology Aachen, Logos Berlin.

Levin, T.C., and Edgerton, M.E. (1999). The throat singers of Tuva. Scientific American. Sep-1999, 80-87.

Lindestad, P.A., Sodersten, M., Merker, B., Granqvist, S. (2001). Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice 15(1), 78-85.

MacDonald, A.W., Cohen, J.D., Stenger, V.A., and Carter, C.S. (2000). Dissociating the role of dorsolateral prefrontal cortex and anterior cingulate cortex in cognitive control. Science 288, 1835-1837.

Pagneux, V., Amir, N., and Kergomard, J. (1996). A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation. J. Acoust. Soc. Am. 100, 2034-2048.

Tsai, C.G. (2001). Physical foundations of overtone-singing. Science Monthly 375, 209-216. [in Chinese]

* http://www.avantart.com/postcards/etuva.html
* http://www.scs-intl.com/cgi-bin/webzonetuva/zone.cgi?list

Nathalie Henrich, John Smith and Joe Wolfe: Harmonic singing (or overtone singing) vs normal singing


Harmonic singing (or overtone singing) vs normal singing

Harmonic singing shares techniques with diphonic singing, overtone singing, xoomi singing, sygyt singing, throat singing, Tuva singing etc. We explain some of the acoustics of this style of singing in terms of the measured acoustical response of the vocal tract. In this technique, the singer emphasises one high harmonic of the voice to such an extent that it is heard separately from the low pitched note being sung. Different notes in the harmonic series may be chosen by changing the frequency of the resonance in the vocal tract that gives rise to it.

For background information on speech and ordinary singing, see our Introduction to the acoustics of the vocal tract. For background about our research and techniques, see this link. On this page, we begin by looking at how the vocal tract behaves for a whisper, where the resonances of the tract are most clear, then for normal singing, then for harmonic singing. But first, some sound examples:


[two sound examples, wav]

In the first, Jer Ming Chen (a postdoctoral researcher in this lab) sings his own tune, called Desert Lullaby. In the second, he sings up a harmonic series, starting at the fourth harmonic. The recordings have not been treated. How does he do that? We’ll need to start with some background first.

Whisper. In the first figure, a subject whispers the vowel in ‘hoard’. We show the frequency response of the vocal tract. (For an explanation of the measurements, follow this link.) The sound of the whisper itself is masked by the injected signal used to measure the vocal tract resonances. The figure shows several peaks, indicated by the arrows. At these frequencies, the sound produced at the vocal folds is most effectively transmitted as sound in the external air. (Technically, these are peaks in the acoustic impedance of the vocal tract. At these resonant frequencies, the tract operates most effectively as an impedance transformer between the relatively high acoustic impedance of the tract and the low impedance of the radiation field at the mouth.)


graph showing the frequency response of the vocal tract for a whisper

Normal singing. In the figure below, the subject sings the same vowel at the pitch Bb2 (117 Hz). In this graph, you can see the harmonics of the voice, and you can see that the fourth and sixth harmonics appear stronger in the sound spectrum because they are near resonances of the tract.


graph showing the frequency response of the vocal tract for a sung vowel OR

Over the range shown and for this vowel, this subject’s vocal tract has six resonances, which are indicated by the arrows. Note that the subject changes the first two resonances a little between whispering and singing. The frequencies of these two resonances determine the vowel in a particular accent. It is not unusual for people to have different accents when whispering, speaking and singing. The higher resonances are also substantially changed, probably because rather different vocal mechanisms are used in whispering and singing.

Harmonic singing. The next graphs show two examples of harmonic singing. In this technique, one of the vocal tract resonances is made much stronger, while all the others are weakened. The strong resonance can be made so strong that it selects one of the harmonics and makes it so much stronger than its neighbours that we can hear it as a separate note. Here it is the eighth harmonic that is amplified. Although the fundamental is only 8 dB lower than the selected harmonic, the fundamental lies in a range in which our ears are much less sensitive, so it sounds much less loud.
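
The effect of hearing sensitivity can be estimated roughly with the standard A-weighting curve. This is an assumption made here for illustration, not the method behind the measurements on this page, and A-weighting is only a coarse proxy for perceived loudness. The 8 dB gap comes from the text; the 117 Hz fundamental is assumed, borrowed from the normal-singing example.

```python
import math


def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), standard analytic form."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0


f0 = 117.0              # assumed fundamental (Hz)
selected = 8 * f0       # eighth harmonic
physical_gap_db = 8.0   # harmonic minus fundamental, from the text

weighted_gap_db = physical_gap_db + a_weighting_db(selected) - a_weighting_db(f0)
print(f"A-weighting at {f0:.0f} Hz: {a_weighting_db(f0):.1f} dB")
print(f"A-weighting at {selected:.0f} Hz: {a_weighting_db(selected):.1f} dB")
print(f"gap after weighting: {weighted_gap_db:.1f} dB")
```

Under these assumptions the weighted gap is considerably larger than the physical 8 dB, in line with the observation that the fundamental sounds much less loud than the selected harmonic.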


graph showing the frequency response of the vocal tract for harmonic singing

How do you do it? With some difficulty! One way to strengthen the second resonance, at the expense of the others, is to make a small mouth opening and also a relatively tight constriction between the tongue and the roof of the mouth. But mainly it takes a lot of practice, using feedback. Usually the feedback comes from finding a reasonably reverberant environment (bathroom, stairwell) and listening for the individual harmonics. (Another type of feedback is to use a display of the spectrum, using your computer’s sound card. Yet another display uses the graphs shown here, but this last is not readily available.)

In traditional practice, some singers hold the sung pitch (fundamental) constant, and then tune the vocal tract resonances to choose one or another harmonic. They can therefore play the ‘instrument’ using the natural harmonics, just like players of the natural trumpet or horn. Skilled practitioners can vary the voice pitch and the resonant frequency independently. In the next graph, the fundamental has been lowered and the resonance has been raised, with the result that it is the twelfth harmonic that is amplified.
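
A minimal sketch of the selection rule implied here: the harmonic that gets emphasised is the one whose frequency n*f0 lies closest to the tuned resonance. The function name and all frequencies are hypothetical, chosen only to show a fixed drone with a rising resonance and then a lowered drone.

```python
def selected_harmonic(resonance_hz, f0_hz):
    """Index n of the harmonic n*f0 closest to the resonance frequency."""
    return max(1, round(resonance_hz / f0_hz))


# Fixed drone, resonance tuned upward (hypothetical values).
f0 = 117.0
for resonance in (700.0, 940.0, 1180.0, 1400.0):
    n = selected_harmonic(resonance, f0)
    print(f"f0 = {f0:.0f} Hz, resonance {resonance:.0f} Hz -> "
          f"harmonic {n} ({n * f0:.0f} Hz)")

# Lowering the drone while raising the resonance selects a higher harmonic.
print(selected_harmonic(1200.0, 100.0))
```

Because only whole multiples of f0 are available, a fixed drone yields the notes of a natural harmonic series, as on a natural trumpet or horn.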


graph showing the frequency response of the vocal tract for harmonic singing

For some harmonic singers, more complicated effects than those described here may be involved. It has been suggested that, for some sygyt singers, the strong resonance in the vocal tract may drive an oscillation in the false vocal folds. This could produce a stronger signal at the high pitch. Further, because the false vocal folds would be nonlinear oscillators, they would produce strong components at integral multiples of the high pitch frequency, i.e. at n*f0, 2n*f0, 3n*f0 etc. An example of such a spectrum and an explanation of the false vocal fold mechanism is given by Chen-Gia Tsai at this link.

This research is part of a project investigating the acoustics of singing in general. It is undertaken by Nathalie Henrich, John Smith and Joe Wolfe.





WIKIPEDIA : Overtone Singing


Overtone singing

From Wikipedia, the free encyclopedia

Overtone singing, also known as overtone chanting, or harmonic singing, is a type of singing in which the singer manipulates the resonances (or formants) created as air travels from the lungs, past the vocal folds, and out the lips to produce a melody.

The partials (fundamental and overtones) of a sound wave made by the human voice can be selectively amplified by changing the shape of the resonant cavities of the mouth, larynx and pharynx.[1] This resonant tuning allows the singer to create apparently more than one pitch at the same time (the fundamental and a selected overtone), while in effect still generating a single fundamental frequency with his/her vocal folds.

Another name for overtone singing is throat singing, but that term is also used for Inuit throat singing, which is produced differently.




Mongolia and Buryatia

It is believed that the art of overtone singing originated in southwestern Mongolia, in today’s Khovd and Govi-Altai region. Nowadays, overtone singing is found throughout the country, and Mongolia is often considered the most active centre of overtone singing in the world.[2] The most commonly practiced style, Khöömii (written in Cyrillic as Хөөмий), can be divided into the following categories:

  • uruulyn / labial khöömii
  • tagnain / palatal khöömii
  • khamryn / nasal khöömii
  • bagalzuuryn, khooloin / glottal, throat khöömii
  • tseejiin khondiin, khevliin / chest cavity, stomach khöömii
  • turlegt or khosmoljin khöömii / khöömii combined with long song

Mongolians also sing many other styles such as “karkhiraa” (literally “growling”) and “isgeree”.

Many of these styles are also practiced around neighboring regions such as Tuva and Altai.


Main article: Tuvan throat singing

Tuvan overtone singing is practiced by the Tuva people of southern Siberia. The history of Tuvan overtone singing reaches very far back. There is a wide range of vocalizations, including Sygyt, Kargyraa (which also uses a second sound source), Khoomei, Chylandyk, Dumchuktaar, and Ezengileer. Most of these styles are closely related to the styles and variations in neighboring Mongolia.

Altai and Khakassia

Tuva’s neighbouring states, the Altai Republic to the west and Khakassia to the northwest, have developed forms of throat singing called kai, or khai. In Altai, this is used mostly for epic poetry performance, to the accompaniment of the topshur. Altai narrators (“kai-chi”) perform in kargyraa, khöömei and sygyt styles, which are similar to the Tuvan ones. They also have their own style, with very high harmonics emerging from kargyraa. Variations of kai are called karkyra, sybysky, homei and sygyt. The first well-known kai-chi was Kalkin.

Chukchi Peninsula

The Chukchi people of Chukchi Peninsula in the extreme northeast of Russia also practice a form of throat singing.[3]


Tibetan Buddhist chanting is a sub-genre of throat singing. Most often the chants hold to the lower pitches possible in throat singing. Various ceremonies and prayers call for throat singing in Tibetan Buddhism, often with more than one monk chanting at a time. There are different Tibetan throat singing styles, such as Gyuke (Tibetan: རྒྱུད་སྐད་, Wylie: rgyud skad) – style with the lowest pitch of voice; Dzoke (Tibetan: མཛོད་སྐད་, Wylie: mdzod skad) and Gyer (Tibetan: གྱེར་, Wylie: gyer).

Uzbekistan and Kazakhstan

The oral poetry of Kazakhstan and the Uzbek region of Karakalpakstan sometimes enters the realm of throat singing.


Balochi Nur Sur is one of the ancient forms of overtone singing and is still popular in parts of Pakistan, Iran and Afghanistan, especially in the Sulaiman Mountains.


The Ainu of Hokkaidō, Japan, once practiced a type of throat singing called rekuhkara, which has since become extinct. The last singer of rekuhkara died in 1976, but some recordings exist.[3]



Main article: Tenores

In the Barbagia area on the island of Sardinia (Italy), one of the two different styles of polyphonic singing is marked by the use of a throaty voice. This kind of song is called a tenore. The other style, known as cuncordu, does not use throat singing. A tenore is practiced by groups of four male singers each of whom has a distinct role; the oche or boche (pronounced /oke/ or /boke/, “voice”) is the solo voice, while the mesu oche or mesu boche (“half voice”), contra (“against”) and bassu (“bass”) – listed in descending pitch order – form a chorus (another meaning of tenore). Oche and mesu oche sing in a regular voice, whereas contra and bassu sing with a technique affecting the larynx. In 2005, Unesco classed the canto a tenore among intangible world heritage.[4] Among the most well known groups who perform a tenore are Tenores di Bitti, Tenores de Orosei, Tenores di Oniferi and Tenores di Neoneli.

Northern Europe

The Sami people of the northern parts of Sweden, Norway, Finland and the Kola Peninsula in Russia have a singing genre called yoik. While overtone techniques are not a defining feature of yoik, individuals sometimes use overtones in the production of yoik.


The Bashkirs of Bashkortostan have a style of overtone singing called özläü (sometimes spelled uzlyau; Bashkort: Өзләү), which nearly died out. In addition, Bashkorts also sing uzlyau while playing the kurai, a national instrument. This technique of vocalizing into a flute can also be found in folk music as far west as the Balkans and Hungary.

North America


The resurgence of a once-dying Inuit tradition called katajjaq is currently under way in Canada. Inuit throat singing was a form of entertainment among Inuit women while the men were away on hunting trips. It was practiced primarily by women, though men also took part. In the Inuit language Inuktitut, throat singing is called katajjaq, pirkusirtuk or nipaquhiit, depending on the Canadian Arctic region. In Inuit culture it was regarded more as a vocal or breathing game, or a competition, than as a form of music. It is generally done by two individuals but can involve four or more people. Two women face each other, standing, sitting or crouching, while holding each other’s arms. One leads with short, deep rhythmic sounds, repeating them with short gaps in between; the other fills these gaps with her own rhythmic sounds. Sometimes both women make a dance-like movement, such as rocking from left to right, while throat singing.


South Africa

Xhosa women of South Africa have a low, rhythmic style of throat-singing called eefing that is often accompanied by call-and-response vocals.[5]

Non-traditional styles

Canada, United States and Europe

The 1920s Texan singer of cowboy songs, Arthur Miles, independently created a style of overtone singing, similar to sygyt, as a supplement to the normal yodelling of country western music. Blind Willie Johnson, also of Texas, is not a true overtone singer, according to National Geographic, but his ability to shift from guttural grunting noises to a soft lullaby is suggestive of the tonal timbres of overtone singing.[6]

Starting in the 1960s, some musicians in the West have collaborated with traditional throat singers, ventured into the realm of throat singing and overtone singing, or both. Some made original musical contributions and helped this art rediscover its transcultural universality. As harmonics are universal to all physical sounds, the notion of authenticity is best understood in terms of musical quality. Musicians of note in this genre include Collegium Vocale Köln (who first began using this technique in 1968), Michael Vetter, David Hykes,[7] Jill Purce, Jim Cole, Ry Cooder, Paul Pena (mixing the traditional Tuvan style with that of American blues), Steve Sklar and Kiva (specializing in jazz/world beat genres and composing for overtone choirs), and composer Baird Hersey and his group Prana with Krishna Das (overtone singing and Hindu mantra). Canadian songwriter Nathan Rogers has become an adept throat singer and teaches Tuvan throat singing in Winnipeg, Manitoba.[citation needed]

Paul Pena was featured in the documentary Genghis Blues, which tells the story of his pilgrimage to Tuva to compete in its annual throat-singing competition. The film won the documentary award at the 1999 Sundance Film Festival and was nominated for an Oscar in 2000.

Tuvan singer Sainkho Namtchylak has collaborated with free jazz musicians such as Evan Parker and Ned Rothenberg. Lester Bowie and Ornette Coleman worked with the Tenores di Bitti, and Eleanor Hovda has written a piece using the Xhosa style of singing. DJs and performers of electronic music like The KLF have also merged their music with throat singing, overtone singing, or with the theory of harmonics behind it.

A cappella singer Avi Kaplan has also used overtone singing during performances with his group, Pentatonix, merging throat singing with a cappella dubstep.

In Ireland, Anúna have revived a technique of overtone chanting mentioned in the 8th-century manuscript Cath Almaine. The technique uses one held drone with a shifting three- or four-note overtone series. Expert pitched overtone performer Aengus Ó Maoláin joins Anúna on several numbers.

Several contemporary classical composers have incorporated overtone singing into their works. Karlheinz Stockhausen was one of the first, with Stimmung in 1968. “Past Life Melodies” for SATB chorus by Australian composer Sarah Hopkins (b. 1958) also calls for this technique. In Water Passion after St. Matthew by Tan Dun, the soprano and bass soloists sing in a variety of techniques, including overtone singing of the Mongolian style.

Overtone singing is implemented to a fairly wide extent in the guttural vocals of many heavy and death metal bands. A clear example of this is the vocal delivery in Necrophagist’s “Stabwound”.




Ethnomusicologist John Levy recorded a Rajasthani singer utilizing overtones in imitation of either a jaw harp or a double-flute. There is no tradition of this style of singing there.[citation needed]

See also


  1. Titze 2008; Titze 1994; Pariser & Zimmerman 2004.
  2. Sklar 2005.
  3. 4.3.02. “Inuit Throat-Singing”. Mustrad.org.uk. Retrieved 2008-11-27.
  4. Bandinu 2006.
  5. Smithsonian Global Sound – Throat Singing. Retrieved 2009-03-13.
  6. Miller, Bruce. “Overtone Singing Music”. National Geographic. Retrieved February 20, 2012.
  7. Bellamy and MacLean 2005, 515.


  • Bandinu, Omar (2006). “Il canto a tenore: dai nuraghi all’Unesco”, Siti 2, no. 3 (July–September): 16–21.
  • Bellamy, Isabel, and Donald MacLean (2005). Radiant Healing: The Many Paths to Personal Harmony and Planetary Wholeness. Buddina, Queensland (Australia): Joshua Books. ISBN 0-9756878-5-9
  • Haouli, Janete El (2006). Demetrio Stratos: en busca de la voz-música. México, D. F.: Radio Educación—Consejo Nacional para la Cultura y las Artes.
  • Levin, Theodore C., and Michael E. Edgerton (1999). “The Throat Singers of Tuva”. Scientific American 281, no. 3 (September): 80–87.
  • Levin, Theodore, and Valentina Süzükei (2006). Where Rivers and Mountains Sing. Bloomington: Indiana University Press. ISBN 0-253-34715-7.
  • Pariser, David, and Enid Zimmerman (2004). “Learning in the Visual Arts: Characteristics of Gifted and Talented Individuals,” in Handbook of Research and Policy in Art Education, Elliot W. Eisner and Michael D. Day (editors). Lawrence Erlbaum Associates. p. 388. ISBN 978-0-8058-4972-1.
  • Saus, Wolfgang (2004). Oberton Singen. Schönau im Odenwald: Traumzeit-Verlag. ISBN 3-933825-36-9 (German).
  • Titze, Ingo R. (1994). Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall. ISBN 978-0-13-717893-3. Reprinted Iowa City: National Center for Voice and Speech, 2000. (NCVS.org) ISBN 978-0-87414-122-1.
  • Titze, Ingo R. (2008). “The Human Instrument”. Scientific American 298, no. 1 (July): 94–101. PMID 18225701.
  • Tongeren, Mark C. van (2002). Overtone Singing: Physics and Metaphysics of Harmonics in East and West. Amsterdam: Fusica. ISBN 90-807163-2-4 (pbk), ISBN 90-807163-1-6 (cloth).
  • Sklar, Steve (2005). “Types of throat singing”.

External links