Tag Archives: acoustics



overtone singing


This is a demonstration of overtone singing, an articulatory technique in which a single overtone is amplified. You will hear a whistle-like tone above the drone of the fundamental and lower harmonics.
The first fragment demonstrates a technique used mainly in Western countries. The tongue is moved very slowly from the position of the vowel /O/ to that of the vowel /i:/, producing a characteristic scale of overtones. The fundamental stays the same throughout. The careful, somewhat retroflex articulation brings the second and third formants together, which amplifies the nearby overtone. Overtones 5-16 can be heard. To produce overtones 3-5, one has to make a nasal sound and articulate from /o:/ to /a:/. The overtone is then amplified by the first formant, while the nasal anti-resonance creates the acoustic distinction between the amplified overtone and the lowest harmonics.
In the subsequent demonstrations, we hear song fragments containing overtone singing from the Tuva Republic, a part of Russia bordering Mongolia. The technique is not essentially different from the Western technique, but glottal adduction is much stronger (almost pressed singing). The Tuvan people know a number of techniques, which are distinguished by fundamental frequency. In these fragments you will hear Sygyt, which is typically produced at a pitch of 150-200 Hz. Another technique is Kargyra, with a characteristically low pitch of less than 60 Hz.

More information:

  • Bloothooft, G., Bringmann, E., van Cappellen, M., van Luipen, J.M., and Thomassen, K.P. (1992). Acoustics and perception of overtone singing, J. Acoust. Soc. Am. 92, 1827-1836.
  • Levin, Theodore C., and Edgerton, Michael E. (1999). The Throat Singers of Tuva, Scientific American, Sept.1999.
  • www.harmonx.com [19 Jul 2001] with sound clips!


overtone singing
Keeping a constant pitch of about 130 Hz, a professional Western performer sings a scale of the overtones 4 to 16. The narrow-band spectrogram shows the amplification of the successive overtones. The lowest overtones are amplified by the first formant, the stepwise increasing resonance is the combination of second and third formant. Higher formants are also visible. 
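As a rough illustration of the scale described above, the overtone frequencies over the drone can be listed directly. This is only a sketch: it assumes the overtone numbers count harmonics (n times the fundamental), takes the approximate 130 Hz pitch quoted above as exact, and uses an illustrative function name.

```python
def overtone_frequencies(fundamental_hz, first, last):
    """Return {harmonic number: frequency in Hz} for an ideal harmonic series."""
    return {n: n * fundamental_hz for n in range(first, last + 1)}


# Assumed values: the ~130 Hz drone and overtones 4 to 16 from the description.
scale = overtone_frequencies(130.0, 4, 16)
for n, freq in scale.items():
    print(f"overtone {n:2d}: {freq:6.0f} Hz")
```

Under these assumptions the scale spans 520 Hz to 2080 Hz, which is the region where the stepwise resonance appears in the narrow-band spectrogram.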


File size 505246 bytes
Format aiff
Keywords overtone throat singing

tuva overtone singing 1
A Tuva singer demonstrates his “throat singing” or overtone singing, in which individual overtones are amplified. These overtones are clearly visible in the spectrogram.


File size 363058 bytes
Format aiff
Keywords overtone throat singing

tuva overtone singing 2
A Tuva singer demonstrates his “throat singing” or overtone singing, in which individual overtones are amplified. These overtones are clearly visible in the spectrogram.


File size 329092 bytes
Format aiff
Keywords overtone throat singing

tuva overtone singing 3
A Tuva singer demonstrates his “throat singing” or overtone singing, in which individual overtones are amplified. These overtones are clearly visible in the spectrogram.


File size 398652 bytes
Format aiff
Keywords overtone throat singing

tuva overtone singing 4
A Tuva singer demonstrates his “throat singing” or overtone singing, in which individual overtones are amplified. These overtones are clearly visible in the spectrogram.


File size 407606 bytes
Format aiff
Keywords overtone throat singing


Universiteit Utrecht

LEONARDO FUKS : Sundry Sounds produced by Leonardo Fuks and other examples

 VVM phonation mode, physical model (by L.Fuks)


This is a simplified physical model that I propose for the larynx during the VVM phonation mode, which produces sounds similar to those of the Tibetan chant tradition. The vocal folds (m1, m2) are shown below and the false, or ventricular, folds (m3) above. In this example, the vocal folds oscillate at twice the frequency of the ventricular folds. The letter m stands for mass, k for stiffness, and r for damping coefficient. The indices r and l stand for the right and left sides, respectively.
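The 2:1 frequency ratio in this example can be illustrated with a toy signal. The sketch below is not the mass-spring model itself: it simply assumes that the slower ventricular folds attenuate every other vocal-fold pulse, and then checks how often the combined pattern repeats. The sample rate, the 160 Hz vocal-fold frequency, the attenuation factor, and all function names are assumptions made for the illustration.

```python
def pulse_train(n_cycles, samples_per_cycle, ratio, damping=0.5):
    """One pulse per vocal-fold cycle; all but every `ratio`-th pulse is
    attenuated, standing in for the slower ventricular-fold vibration."""
    signal = [0.0] * (n_cycles * samples_per_cycle)
    for cycle in range(n_cycles):
        amplitude = 1.0 if cycle % ratio == 0 else damping
        signal[cycle * samples_per_cycle] = amplitude
    return signal


def smallest_period(signal):
    """Smallest shift (in samples) that maps the signal onto itself."""
    n = len(signal)
    for shift in range(1, n):
        if all(signal[i] == signal[i + shift] for i in range(n - shift)):
            return shift
    return n


sample_rate = 8000                              # Hz, assumed
vocal_f0 = 160                                  # Hz, assumed vocal-fold frequency
samples_per_cycle = sample_rate // vocal_f0     # 50 samples per vocal-fold cycle
signal = pulse_train(n_cycles=12, samples_per_cycle=samples_per_cycle, ratio=2)
period = smallest_period(signal)
repetition_rate = sample_rate / period
print(repetition_rate)
```

With these assumed values the pattern repeats every two vocal-fold cycles, so the repetition rate is half the vocal-fold frequency, in line with the "f0/2" label used in the sound list on this page.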

Sundry Sounds produced by Leonardo Fuks and other examples

During my research work in music acoustics I created, recorded, and processed some gigabytes of sound files, most of them of no musical interest to the listener.
However, a few of them may be worth hearing for tolerant and attentive ears. They are presented below.
The first group of sound files relates to Paper VI of my thesis and covers the sounds identified with the Tibetan chant voice, along with other extended vocal effects investigated in the paper.


Vocal-ventricular sounds (used in Tibetan and Mongolian “undertone” singing):
0. Original sounds from the Gyuto Monastery, Tibet

1. Fixed fundamental and sweeping overtones
2. f0/2 and f0/3 VVM
3. An imitation of a Tibetan Chant context (rather similar to 0, above)

4. Popeye the Sailor used VVM! (an original recording of William Costello’s version)
5. VVM and flute improvisation
6. Overtone singing in VVM mode, melody of “Oh! Susanna” (see the spectrogram)

Periodic pulse register, see Paper VI
7. Alternation between pulse register (“fry”) and modal voice

8. “Vocal fry” at f0/1, f0/2, f0/3, f0/4, f0/5 and back to f0/1

Vocal growl (co-oscillation of vocal folds and epiglottis), similar to the mechanism used by Louis Armstrong

9. Periodic growl, in f0/2 and f0/3, with overtone singing

Tarogato (wooden saxophone from Hungary)
10. Tarogato (from the theme of Ravel’s La Valse)

A piece for OBOE called “My Six Marigaux 10499’s”, recorded in 6 channels

All recordings, except numbers 0 (Gyuto Monks, Tibet) and 4 (Popeye, W. Costello), are performed by Leonardo Fuks.

To the THESIS INTRODUCTION – FROM AIR TO MUSIC: Acoustical, Physiological and Perceptual Aspects of Reed Wind Instrument Playing and Vocal-Ventricular Fold Phonation

HTML by Leonardo Fuks

Last update 98.12.30








Piero Cosi, Graziano Tisato

Istituto di Scienze e Tecnologie della Cognizione – Sezione di Fonetica e Dialettologia
(ex Istituto di Fonetica e Dialettologia) – Consiglio Nazionale delle Ricerche
e-mail: cosi@csrf.pd.cnr.it tisato@tin.it
www: http://nts.csrf.pd.cnr.it/Ifd

I really like to remember that Franco was the first person I met when I approached the “Centro di Studio per le Ricerche di Fonetica”, and I still have a very pleasant and happy memory of our first warm and unexpectedly informal talk. It is quite obvious, and it may seem rhetorical, to say that I will never forget a man like Franco, but it is true, and that is, apart from his quite relevant scientific work, mostly for his great heart and sincere

For “special people”, scientific interests sometimes coincide with personal “hobbies”. I remember Franco talking to me about the “magic atmosphere” created by the voices of Demetrio Stratos, David Hykes or Tuvan Khomei singers, and I still have Franco’s attitude towards these “strange harmonic sounds” clear in my mind. It was more than a hobby, but it was also more than a scientific interest. I have to admit that Franco inspired my training in Overtone Singing, which I kept “almost hidden” from all but a few very close and “desperate” family members. This overview of this wonderful musical art, which does not aim to be a complete scientific work, is meant as a small descriptive contribution to honor and remember Franco’s wonderful friendship.


Graziano G. Tisato, Andrea Ricci Maccarini, Tran Quang Hai: Physiological and Acoustic Characteristics of Overtone Singing


Physiological and Acoustic Characteristics of Overtone Singing (Canto Difonico)
Graziano G. Tisato, Andrea Ricci Maccarini, Tran Quang Hai

Overtone singing (Canto Difonico, also called harmonic singing) is a singing technique that is fascinating from a musical point of view, but also particularly interesting scientifically. With this technique the vocal sound splits into two distinct sounds: the lower one corresponds to the normal voice, in the singer's usual register, while the higher one is a flute-like sound corresponding to one of the harmonic partials, in a high (or very high) register. Depending on the pitch of the fundamental, the style, and the singer's skill, the perceived harmonic can range from the 2nd to the 18th (and even beyond).
In the scientific literature, overtone singing first appears in a memoir presented by Manuel Garcia to the Academy of Sciences in Paris on 16 November 1840, concerning the diphonic singing he heard from Bashkir singers in the Urals (Garcia, 1847). In a treatise on acoustics published a few decades later (Radau, 1880), the reality of this kind of singing is called into question: “…We must class among the miracles what Garcia relates of the Russian peasants from whom he claims to have heard a melody sung in chest voice and, at the same time, another in head voice.”
Almost a century had to pass after 1840 before Garcia's report was objectively corroborated, by recordings made among the Tuvans in 1934 by Russian ethnologists. Faced with the evidence of the analysis that Aksenov carried out on those recordings in 1964, researchers began to take the problem of overtone singing seriously (Aksenov, 1964, 1967, 1973). Aksenov was the first to explain the phenomenon as the selective filtering of the glottal sound by the formant envelope of the vocal tract, and to compare it to the jaw harp (with the difference that the reed of that instrument can obviously produce only a fixed fundamental). In the same period an article appeared in the Journal of the Acoustical Society of America (JASA) on diphonic sound in the chant of certain Tibetan Buddhist sects, in which the authors correctly interpret the action of the formants on the glottal source, without however managing to explain how the monks can produce such low fundamentals (Smith et al., 1967).
From 1969, Leipp and the Musical Acoustics Group (GAM) of the University of Paris VI studied the phenomenon from an acoustic standpoint (Leipp, 1971). In that period Tran Quang Hai, of the Musée de l’Homme in Paris, undertook a series of systematic studies that led to the discovery of overtone singing in an unsuspected number of different cultural traditions (Tran Quang, 1975, 1980, 1989, 1991a, 1991b, 1995, 1998, 1999, 2000, and the website http://www.baotram.ovh.org). The distinctive feature of Tran Quang Hai's research is that he experiments with and verifies the various techniques and styles on his own voice, which has allowed him to develop easy learning methods (Tran Quang, 1989). In 1989 Tisato analyzed and synthesized overtone singing with an LPC model, showing in this way that the perception of the harmonics depends exclusively on the resonances of the vocal tract (Tisato, 1989a, 1991). In the same year, endoscopic examination of Tran Quang Hai's vocal folds also confirmed that laryngeal vibration was normal (Sauvage, 1989, Pailler, 1989). In 1992 a more thorough study appeared from the phonetic and perceptual standpoint, highlighting the role of nasalization in the perception of the overtone, the presence of very strong adduction of the vocal folds, and their prolonged closure (Bloothooft et al., 1992). The authors dispute Dmitriev's hypothesis that overtone singing is a diplophonia, with two sound sources produced by the true and the false vocal folds (Dmitriev et al., 1983). In 1999 Levin published on the Scientific American website an article that is particularly interesting for the musical examples that can be listened to, the filmed X-rays of the position of the articulators and the tongue, and its explanation of how the various styles of overtone singing are produced (Levin et al., 1999, http://www.sciam.com/1999/0999issue/0999levin.html).

The work presented here is the result of a recent working session with Tran Quang Hai (October 2001), in which we examined the production mechanisms of overtone singing by fibre-optic endoscopy. The equipment consisted of a flexible fibrescope connected to a stroboscopic light source, to assess what was happening at the level of the pharynx and larynx, and a rigid 0° optic connected to a halogen light source, to examine the oral cavity.

Fig. 1 Effect of articulation on the position of F1 and F2: opening the mouth shifts F1 and F2 in the same direction, while front-back movement of the tongue moves F1 and F2 in opposite directions. (adapted from Cosi et al., 1995)

The tradition of overtone singing
Overtone singing, ignored by the West for hundreds of years, has turned out to be far more widespread than could have been imagined in the first years after its discovery: it is practised throughout Central Asia, from Bashkiria (the European part of Russia near the southern Urals) to Mongolia, by way of the peoples of the Altai, the Tuva Republic (bordering Mongolia), and the Khakass (located north of Tuva).
The Tuvans have developed a multiplicity of styles that can essentially be reduced to five: Kargiraa (singing with very low fundamentals), Khomei (which means throat or pharynx and is the term generally used for overtone singing), Borbannadir (similar to Kargiraa, with somewhat higher fundamentals), Ezengileer (characterized by rapid rhythmic shifts between the diphonic harmonics), and Sigit (whistle-like, with a very weak fundamental and low harmonics).
In Mongolia most styles take their name from the area where the sound resonates: Xamryn Xöömi (nasal Xöömi), Bagalzuuryn Xöömi (throat Xöömi), Tseedznii Xöömi (chest Xöömi), Kevliin Xöömi (belly Xöömi), Xarkiraa Xöömi (corresponding to the Tuvan Kargiraa, a narrative Xöömi with very low fundamentals), and Isgerex (dental flute voice, a rarely used style). Mongolian singers themselves show some confusion over the exact names of their styles (Tran Quang, 1991a). A characteristic feature, though not a general one, is the rather marked vibrato of some Mongolian singers (for example, Sundui and Ganbold).
The Khakass practise three types of overtone singing (Kargirar, Kuveder or Kilenge, and Sigirtip), corresponding to the Tuvan ones (Kargiraa, Ezengileer, Sigit). The same three styles are found among the inhabitants of the Altai mountains, as Karkira, Kiomioi and Sibiski respectively. Finally, the peoples of Bashkiria use the Uzlau style (similar to the Tuvan Ezengileer) to accompany epic songs. A tradition of epic folk song that incorporates diphonic sound also exists in Uzbekistan and Kazakhstan (Levin et al., 1999).
In 1967 John Levy discovered a singer in Rajasthan who practised overtone singing and used it to imitate the jaw harp (Tran Quang, 1991a). It must be said, however, that this has remained the only example of overtone singing on Indian territory. In 1983 the ethnomusicologist Dave Dargie discovered a type of overtone singing traditionally practised by women of the Xhosa people of South Africa (Dargie, 1985).
A completely different tradition is that of the Tibetan monks of the Gyuto and Gyume Buddhist schools. Here the purpose of the chant is religious: in their view, sound is a faithful representation of the vibratory reality of the universe, epitomized in the sound Om (or rather Aum). The cosmos, according to Tibetan Buddhists, is an aggregate of interacting energies, none of which exists in itself, and which are represented pictorially as deities (peaceful or wrathful). A symbolism links the visual aspect (Yantra and Mandala) and the sonic aspect (Mantra) of all things. Knowledge of the mantric influence of sound makes it possible to act on the world and on people.
It appears to have been Tzong Khapa (1357-1419), the founder of Lamaism in Tibet, who introduced the practice of overtone singing into the Gyuto monasteries. Tradition says that he received this teaching from his protective deity, Maha Bhairava, an incarnation of Avalokiteshvara, the Lord of Compassion. Maha Bhairava is one of the terrifying deities (bhairava), symbolized as an enraged buffalo (Tran Quang, 2000). The chant of the Gyuto monks, which they compare to the bellowing of a bull, resembles the Tuvan Borbannadir style with low fundamentals. The pitch of the voice can go down to the A at 55 Hz, a fifth below the lowest note expected of a bass singer in our tradition. The articulation of the vowel /o/ and the rounding of the lips are intended to reinforce the 5th and 10th harmonics. The diphonic sound perceived is therefore a major third above the second octave (4th harmonic) of the drone. The chant is associated with the element fire. The monks of the Gyume school instead emphasize the 12th harmonic, a fifth above the third octave (8th harmonic) of the drone. In this tradition the chant symbolizes the element
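The interval arithmetic behind these statements can be checked with a short calculation. The sketch below is illustrative only: it measures each reinforced harmonic against the nearest octave of the drone at or below it (a power-of-two harmonic) and expresses the gap in cents, where a pure major third is about 386 cents and a pure fifth about 702 cents. The function name is hypothetical, and the 55 Hz drone is the value quoted in the text.

```python
import math


def interval_above_octave(harmonic):
    """Return (octave harmonic at or below `harmonic`, interval above it in cents)."""
    octave_harmonic = 1 << (harmonic.bit_length() - 1)   # largest power of two <= harmonic
    cents = 1200.0 * math.log2(harmonic / octave_harmonic)
    return octave_harmonic, cents


drone_hz = 55.0   # the low A quoted in the text
results = {}
for harmonic in (5, 10, 12):
    octave_harmonic, cents = interval_above_octave(harmonic)
    results[harmonic] = (octave_harmonic, cents)
    print(f"harmonic {harmonic:2d} = {harmonic * drone_hz:5.0f} Hz, "
          f"{cents:5.1f} cents above harmonic {octave_harmonic}")
```

The 5th harmonic comes out a pure major third above the 4th, and the 12th a pure fifth above the 8th, matching the description of the two schools. The 10th harmonic is simply the 5th an octave higher.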

Overtone singing in the West
Overtone singing has met with unexpected success in the West. It first spread in music, through the avant-garde's attempt to exploit all the expressive possibilities of the voice and through the influence of contact with cultural traditions different from our own. The very first to use a diphonic mode of the voice artistically was Karlheinz Stockhausen in the work Stimmung (Stockhausen, 1968). He was followed by a large group of artists, including the EVTE (Extended Vocal Techniques Ensemble) of the University of California, San Diego in 1972, Laneri and the group Prima Materia in 1973 (Laneri, 1981, 2002), Tran Quang Hai in 1975, Demetrio Stratos in 1977 (Stratos, 1978, Ferrero et al., 1980), Meredith Monk in 1980, David Hykes and the Harmonic Choir in 1983 (Hykes, 1983), Joan La Barbara in 1985, Michael Vetter in 1985, Christian Bollmann in 1985, Noah Pikes in 1985, Michael Reimann in 1986, Tamia in 1987, Bodjo Pinek in 1987, Josephine Truman in 1987, Quatuor Nomad in 1989, Iegor Reznikoff in 1989, Valentin Clastrier in 1990, Rollin Rachele in 1990 (Rachele, 1996), Thomas Clements in 1990, Sarah Hopkins in 1990, and Les Voix Diphoniques in 1997.
Special mention must go to the EVTE group for its systematic work in expanding the expressive vocabulary and compositional approaches for the voice, including in the diphonic field. The codified lexicon comprised a whole repertoire of vocal effects: reinforcement of harmonics, various kinds of ululation, Tibetan chant (also with diphonic effects), clicks and sizzles of varying intensity and pitch, multiphonic sounds, etc. (Kavash, 1980).
The spread of overtone singing in the Western world has been marked by an aura of mysticism that was absent from the original cultures (excluding, as noted, Tibetan Buddhism). This is not surprising, since this type of singing seems to transcend the usual dimension of sound. And when one personally succeeds in the “magic” of breaking one's own voice down into a harmonic melody, one experiences a feeling of euphoria. The strangeness of the phenomenon alone would not be enough to justify such great interest, were it not that mastering this singing technique actually requires developing powers of attention and perception to a degree that facilitates states of concentration and meditation. Nor is it surprising, given these considerations, that overtone singing is beginning to be used in music therapy (for example by Tran Quang Hai himself, by Dominique Bertrand in France, and by Jill Purce in England).
The formation of auditory images
But why does overtone singing turn out to be such a strange experience? The obvious answer is that we normally perceive a voice as having a single pitch and a characteristic timbre. As is well known, the pressure wave that reaches the eardrum is the resultant of the interaction of various sound events, each of which is in turn composed of an aggregate of sinusoidal partials. This sound stream is separated at the basilar membrane into frequency components, each with a definite amplitude and frequency envelope. The spectral decomposition is conditioned by three main phenomena:
1 – The sensitivity of the ear, which varies considerably with frequency (equal-loudness curves of Fletcher and Munson, 1933).
2 – The masking of higher-frequency components by low-frequency ones within the same critical band (Zwicker, 1957).
3 – The temporal factors involved in this process, whereby the detection of low-frequency components is delayed relative to higher ones (Whitfield, 1977).
At this point, the analytic data of the sound stream are organized (fused, integrated) into separate auditory images according to Gestalt psychological factors (Bregman, 1990). The process works by grouping together those sound partials that behave uniformly in amplitude, duration, and frequency. In a way essentially similar to visual perception, which aggregates the light quanta on the retina into simple figures (circle, rectangle, polygon, etc.), this perceptual fusion of sound quanta leads to unitary mental representations that we call voices, musical instruments of a certain type, noises, etc.
For sound, as for vision, perception has to work along a simultaneous dimension, which considers all the elements present in the (auditory or visual) scene at the same time, and a sequential dimension, which takes into account how the elements change over time.
This perceptual organization into mental categories is vital for survival, since it makes it possible to identify (auditory or visual) events and adopt an appropriate behavioural strategy.

The separation of the auditory image in overtone singing
As we have seen, our auditory system is conditioned to perceive a single fundamental in a complex sound, even when the sound is quasi-harmonic or inharmonic (a bell, for example) (Plomp, 1967). Normally, perceptual mechanisms make it difficult to hear the individual frequency components of a sound. In children, auditory sensitivity to individual components and articulatory possibilities are more developed than in adults, because learning processes eliminate many of these potentialities (Jakobson, 1968).
With the overtone singing technique one gains enough control over the articulation of the vocal tract to bring one of its resonances (generally the 1st or the 2nd) into exact coincidence with one of the harmonics. The energy of that component then increases considerably, by as much as thirty dB or so, and it can be heard as a pure tone distinct from the voice. In this case the partial in question is no longer masked by the low components, and moreover, by the fusion principles described above, its anomalous behaviour means it can no longer be grouped with the other harmonics, which share a “common fate”. The unitary auditory image thus separates into two distinct sounds.
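To give a sense of scale for a boost of about thirty dB, the decibel figure can be converted into linear ratios. This is a generic conversion, not a measurement from the study; the function name is illustrative.

```python
def db_to_ratios(gain_db):
    """Convert a level change in dB to (power ratio, amplitude ratio)."""
    power_ratio = 10 ** (gain_db / 10)
    amplitude_ratio = 10 ** (gain_db / 20)
    return power_ratio, amplitude_ratio


power_ratio, amplitude_ratio = db_to_ratios(30)
print(f"30 dB: power x{power_ratio:.0f}, amplitude x{amplitude_ratio:.1f}")
```

A 30 dB rise therefore corresponds to roughly a thousandfold increase in the power of that one partial, which helps explain why it stops blending with its neighbours.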
A period of training is obviously needed to succeed in this task. Our own musical tradition has something comparable in the technique of the so-called “singer's formant”, which consists of widening the pharynx and lowering the larynx, creating a resonator that boosts a group of partials between 2000 and 4000 Hz (Fig. 2-3). Singers probably developed this ability to make the best use of the ear's region of maximum sensitivity, so that their voice can be heard above the orchestra (whose overall energy profile is roughly triangular and weighted towards the low frequencies) (Sundberg, 1987).

Fig. 2 Spectral envelope for the uniform vocal tract. In the ideal case (lossless tube, 17 cm long) the formants lie at odd multiples of 500 Hz.
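The figure of 500 Hz follows from the standard quarter-wavelength model of a tube closed at one end (the glottis) and open at the other (the lips), whose resonances are F_n = (2n - 1) c / (4 L). The sketch below evaluates that formula for the 17 cm tube. The speed of sound of 340 m/s is an assumption chosen for the illustration (the text gives only the length and the 500 Hz result), and the function name is hypothetical.

```python
def tube_formants(length_m, count, speed_of_sound=340.0):
    """Resonances of a lossless uniform tube, closed at one end and open at the other."""
    quarter_wave = speed_of_sound / (4.0 * length_m)   # lowest resonance, F1
    return [(2 * n - 1) * quarter_wave for n in range(1, count + 1)]


# 17 cm vocal tract, as in the caption; 340 m/s is an assumed speed of sound.
formants = tube_formants(0.17, 5)
print([round(f) for f in formants])
```

A shorter tube raises every resonance in proportion, which is why lengthening or shortening the tract moves the formants together.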

Fig. 3 Spectral envelope resulting from a constriction of the tube: the first 3 formants shift upwards, while the 4th and 5th shift towards lower frequencies. This creates a region of boosted partials between 2000 and 4000 Hz, typical of the singer's formant.

Single-cavity overtone singing technique
This way of producing overtone singing is the simplest: one simply moves the lips as if pronouncing the vowel sequence from /u/ to /i/ (or from /o/ to /a/). The tongue stays flat on the floor of the oral cavity. Glottal vibration is normal, for both the vocal folds and the false folds. If the articulatory movement is slow and precise enough, the lowest harmonics can clearly be heard emerging one after another. In effect only the mouth opening is being changed, lengthening the overall vocal tract with /u/ or shortening it with /i/. The effect is to shift the first three resonances together, downwards (/u/) or upwards (/i/) (Fig. 4). As Fig. 1 shows, the position of the 1st formant for this kind of articulation is limited to between 250 and 1000 Hz, so the highest harmonic that can be perceived may reach the 12th, depending on the pitch of the starting fundamental. Indeed, this technique does not let very high harmonics be heard clearly, because of the loss of sound energy radiated from the wide-open mouth. The perception of the harmonic boosted by the resonance improves if an anti-resonance is created that attenuates the harmonics below the one to be heard. This effect is obtained naturally by nasalizing the sound, which introduces an anti-resonance that cannot, however, go much below 400 Hz (Stevens, 1998). The appearance of this anti-resonance also allows another interpretation of the diphonic effect, as the action of the 2nd formant (rather than the 1st), since the 1st could be attenuated by the anti-resonance itself. In any case, 350-400 Hz is a lower limit for the diphonic effect, which explains why the lowest harmonics cannot be clearly perceived (Bloothooft et al., 1992). Nasalization also suppresses the third formant, which may explain the weak energy in the high-frequency region with this technique (Fant, 1960).
The frequency range of the harmonics produced with this technique thus runs from 350 to 1000 Hz, and the number of diphonic notes available depends on the pitch of the fundamental. For example, starting from a pitch of F+ (90 Hz), the perceptible harmonics that Tran Quang Hai can produce run from the 4th (F+, 360 Hz) to the 12th (C#-, 1080 Hz). The scale available to the singer (transposed to C) is therefore C, E-, G, A#-, C, D, E-, F#-, G. If instead the fundamental moves up an octave to F+ (180 Hz), the available harmonics are reduced to the 3rd, 4th, 5th, and 6th, giving a scale with only 4 notes usable in the melody (G, C, E-, G). It follows that the female voice is at a disadvantage in overtone singing.
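The two scales quoted above can be reproduced by naming each harmonic after the nearest equal-tempered note over a drone transposed to C (Do). The sketch below is one way to do that: a trailing "-" or "+" marks a harmonic that is flatter or sharper than the tempered note by more than a tolerance. The 10-cent tolerance, the English note names, and the function name are assumptions made for the illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def harmonic_note(harmonic, tolerance_cents=10.0):
    """Name a harmonic as the nearest equal-tempered note above a drone on C."""
    cents = (1200.0 * math.log2(harmonic)) % 1200.0   # position within the octave
    semitone = round(cents / 100.0)                   # nearest tempered step
    deviation = cents - 100.0 * semitone              # negative means flat
    if deviation < -tolerance_cents:
        mark = "-"
    elif deviation > tolerance_cents:
        mark = "+"
    else:
        mark = ""
    return NOTE_NAMES[semitone % 12] + mark


low_drone_scale = [harmonic_note(n) for n in range(4, 13)]   # harmonics 4 to 12
high_drone_scale = [harmonic_note(n) for n in range(3, 7)]   # harmonics 3 to 6
print(low_drone_scale)
print(high_drone_scale)
```

The lower drone gives nine notes and the drone an octave higher only four, which is the reduction in melodic material that the text attributes to a higher fundamental.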

Two-cavity overtone singing technique
The “recipe” given by Tran Quang Hai for this technique is as follows:
1 – Sing with the throat voice (something like /ang/).
2 – Pronounce the letter /l/ or the sequence /li/. As soon as the tongue touches the centre of the roof of the palate, hold that position.
3 – Pronounce the vowel /u/, keeping the tongue pressed against that point between the hard palate and the soft palate.
4 – Contract the muscles of the neck and abdomen, as if trying to lift a very heavy object.
5 – Give the sound a very nasal timbre, amplifying the nasal cavities.
6 – Pronounce the vowel sequence /u/ and /i/ (or /o/ and /a/) legato, alternating them several times. This yields the drone and the harmonics in ascending or descending sequence, as the singer wishes.
7 – Vary the position of the lips or of the tongue to modulate the melody of the harmonics. Strong muscular concentration makes the diphonic effect emerge with more

Fig. 4 Articulation and position of the first three formants for a change in the cross-section of the vocal tract. A constriction at the lips shifts all three formants downwards together, darkening the timbre as in the vowels /u/ and /o/. A constriction at the glottis has the opposite effect, shifting the formants, and hence the energy, towards the high frequencies and making the sound brighter. (adapted from Stevens, 1998)

Fig. 5 Two-cavity technique: the tip of the tongue moves along the palate, dividing the vocal tract into 2 resonators.

With this technique the vocal tract is divided into two distinct resonators, each tuned to its own wavelength. As Fig. 1 and Fig. 4 show, the formants no longer shift in the same direction; their movement depends on where the tongue is placed. Suppose, for example, that the constriction lies at one third of the overall length (roughly 6 cm): the 1st formant then moves downward (always relative to its position in an ideal uniform tube, 500 Hz; see Fig. 2), while the 2nd formant moves markedly upward. This is the situation found in an /i/, for example. In this case, unlike the one-cavity technique, the diphonic harmonic is reinforced by the 2nd formant, so its range of variation can be much wider than in the previous case (Fig. 1). In theory (and this is confirmed experimentally), the audible harmonic can reach 2800 Hz, so the highest harmonic that can be perceived may be the 18th to 20th, depending on the pitch of the fundamental. This technique avoids the problem of sound energy radiating from the mouth, so there is not the same need to nasalize the vocal sound, except to attenuate the low partials further and improve the perception of the high ones. It remains necessary, however, for the sound components (especially the higher-frequency ones) to have enough energy to be distinctly audible. This explains the need for the muscular contractions just described. The larynx produces a pressed sound, with hypercontraction of the vocal folds and of the false folds (which come to cover the true folds, partly because the arytenoids move toward the foot of the epiglottis) (Fig. 15).
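The figures quoted above (a 500 Hz first formant for an ideal uniform tube, and an 18th to 20th harmonic at the 2800 Hz ceiling) can be checked with a short calculation. The sketch below is illustrative only: it assumes the textbook quarter-wavelength model of a tube closed at the glottis and open at the lips, a tract length of 17.5 cm and a sound speed of 350 m/s, none of which are stated in the text, and the function names are hypothetical.

```python
def uniform_tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of an ideal uniform tube closed at one end.

    Assumed model: F_n = (2n - 1) * c / (4 * L). The default length and
    sound speed are illustrative assumptions, not values from the article.
    """
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]


def highest_harmonic_below(f0_hz, ceiling_hz=2800.0):
    """Number of the highest harmonic of f0 that does not exceed the ceiling."""
    return int(ceiling_hz // f0_hz)


if __name__ == "__main__":
    print([round(f) for f in uniform_tube_formants()])
    for f0 in (140.0, 150.0):
        print(f0, highest_harmonic_below(f0))
```

Under these assumptions the first resonance comes out at about 500 Hz, and fundamentals of 140 Hz and 150 Hz give a top harmonic of 20 and 18 respectively, which is consistent with the range given in the text.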
The harmonic can be selected in three distinct ways:
1 – The tip of the tongue can be moved forward or backward along the palate (as in the Khomei style), without bulging it. Moving toward the teeth selects the higher harmonics, and moving toward the velum selects the lower ones (Fig. 5).
2 – The tip of the tongue can be kept fixed behind the teeth while the body and base of the tongue move, bulging toward the soft palate or lowering between the teeth (Sigit style).
3 – A third possibility is to move the root of the tongue at the level of the throat rather than along the palate. The base of the tongue is moved forward until the glosso-epiglottic valleculae appear (the spaces between the root of the tongue and the epiglottis), bringing out the mid-high harmonics. For the highest harmonics, the epiglottis swings forward and closes the valleculae (Levin et al., 1999).
In every case, slight movements of the lips allow the position of the formant to be adjusted more precisely onto the desired harmonic.
Tran Quang Hai has also discovered another method of producing harmonic scales, which consists of keeping the tongue fixed, pressed against the upper molars, and cyclically articulating the usual vowel transition between /u/ and /i/. The harmonics produced are very high and cover a range that can extend from 2000 to 3500 Hz. This method is of interest purely as a demonstration of what diphony makes possible, since it does not allow the desired note to be selected (Tran Quang, 1991a).

Fig. 6 In Kargiraa singing (left) the arytenoids enter into vibration, unlike in Tibetan chant (right).

Kargiraa Style
In this style of singing the fundamentals lie in an extremely low register (down to the A at 55 Hz, and even below). The sound produced is very intense and rich in harmonic components (Fig. 7). The singing uses the 6th, 7th, 8th, 9th, 10th and 12th partials, corresponding to G4 392 Hz, A#4- 457 Hz, C5 523 Hz, D5 588 Hz, E5- 654 Hz and G5 784 Hz when the fundamental is a C at 65.4 Hz. The harmonic is selected by articulating a particular vowel (/u/, /o/, //, /a/, etc.) that the singer has learned to associate with the desired note. In this kind of singing the supraglottic structures may also vibrate (the arytenoid cartilages, the false vocal folds, the aryepiglottic folds that connect the arytenoids to the epiglottis, and the foot of the epiglottis) (Levin et al., 1999), giving a fundamental one octave below the register of the normal voice, which can reach as far as an octave and a fifth below normal (Fuks et al., 1998). In the Kargiraa examples sung by Tran Quang Hai, fibreoptic endoscopy showed us that the arytenoids enter into vibration, pressed against each other and against the foot of the epiglottis, completely hiding the vocal folds (Fig. 6). The mucosal wave of the “new glottis” is produced in the gap between the two arytenoids (Fig. 6). A similar situation arises in patients who have undergone subtotal laryngectomy, in which the vocal folds and part of the epiglottis are removed, leaving only the arytenoids intact. Indeed, the timbre of the voice in Kargiraa recalls that of laryngectomized patients.
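The correspondence between partial numbers and note names given above for a 65.4 Hz fundamental can be reproduced numerically. The following is a minimal sketch, assuming equal temperament with A4 = 440 Hz and scientific pitch notation (neither is stated in the text); the helper names are hypothetical. A negative deviation in cents corresponds to the "-" suffix used in the text, a positive one to "+".

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) of the nearest tempered note.

    Assumes 12-tone equal temperament referenced to a4_hz.
    """
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = round(100 * (midi - nearest))
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


def harmonic_notes(f0_hz, harmonics):
    """Map each harmonic number to (frequency in Hz, note name, cents)."""
    return {n: (round(n * f0_hz, 1),) + nearest_note(n * f0_hz)
            for n in harmonics}


if __name__ == "__main__":
    for n, info in harmonic_notes(65.4, [6, 7, 8, 9, 10, 12]).items():
        print(n, info)
```

The 7th and 10th partials fall noticeably flat of the tempered A#4 and E5, which matches the "-" marks attached to those notes in the text.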

Fig. 7 Tuva: Vasili Chazir sings “Artii-sayir” in the Kargiraa style (CD Smithsonian/Folkways 18). The fundamental is a B1 at 61.2 Hz. The diphonic harmonics are the 6th (F#4- 367 Hz), 8th (B4 490 Hz), 9th (C#5 550 Hz), 10th (D#5- 612 Hz) and 12th (F#5- 734 Hz). The harmonics an octave above the diphonic ones are clearly visible between 950 and 1600 Hz. Around 2600-2700 Hz a further formant region can be seen, which amplifies the 43rd and 44th harmonics.
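The statement that a formant region at 2600-2700 Hz picks out the 43rd and 44th harmonics of a 61.2 Hz fundamental is simple arithmetic, and a small helper makes it easy to repeat for other bands. This is only an illustrative sketch; the function name is hypothetical.

```python
import math


def harmonics_in_band(f0_hz, low_hz, high_hz):
    """Harmonic numbers n such that low_hz <= n * f0_hz <= high_hz."""
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(harmonics_in_band(61.2, 2600.0, 2700.0))
```

For the values in the caption this gives harmonics 43 and 44, at roughly 2631.6 Hz and 2692.8 Hz.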

There are variants of Kargiraa in the Tuvan tradition: Mountain Kargiraa (Dag Kargiraa), practised in the mountains by producing an echo and singing with it, and Steppe Kargiraa (Xovu Kargiraa), used while riding, with the wind entering the corner of the mouth and amplifying the harmonics. Mountain Kargiraa uses the lowest register and adds nasalization of the sound. It is characterized by chest resonance and more moderate tension in the throat. Steppe Kargiraa differs in having higher fundamentals, greater contraction of the throat, and minimal chest resonance. A third type of Kargiraa is the so-called “fan” Kargiraa (Chelbig Kargiraa), named after the fan used to produce a flow of air in front of the mouth and generate various Kargiraa effects.
The Kargiraa of the Tuvans or Mongols must be distinguished from the chant of Tibetan monks, in which the low fundamental frequency (about 60 Hz) is instead obtained with the greatest possible relaxation or slackening of the vocal folds, and in which the supraglottic structures do not vibrate but are in fact contracted (Fig. 6). Tibetan chant can instead be placed in the stylistic category of Borbannadir with low fundamentals.
A further distinction must be made from the frying or crackling effect (vocal fry or creaky voice), characterized by a metallic timbre, which can be obtained with glottal pulses of various frequencies (including very low ones) but which does not exhibit diphony (see the full repertoire in Kavasch, 1980).

Borbannadir Style
This style is characterized by fundamentals in the bass or baritone register. It differs from Kargiraa in its somewhat higher fundamentals (Fig. 8), its more nasal resonance, and a rhythmic pulsation with which singers imitate the murmur of water in streams (Fig. 9), the chirping of birds, and so on. The term Borbannadir in fact means “rolling” and denotes both the trilling effect of the harmonics and the lowest sound in ancient texts. The singer manages to create an effect of triphony between the fundamental, a first level of harmonics in parallel fifths (reinforcing the 3rd harmonic), and a second level with the tremolo of the upper harmonics (Fig. 9). As for the glottal sound, the supralaryngeal structures are not involved. Precisely because of its kinship with Kargiraa, a singer can move from one style to the other within the same piece.

Fig. 8 – Tuva, Borbannadir style: the fundamental is a heavily attenuated F#2 at 92 Hz. The pulsation of about 6 Hz is most evident on the 8th, 9th and 11th harmonics.

Khomei Style
Khomei (which means throat or pharynx) is the term used for Diphonic Singing in general, but also for one mode distinct from the others. Many Tuvan singers consider it the oldest style, and it is the one that has taken hold throughout the West thanks to its ease and technical gentleness. Khomei singing is characterized by normal, relaxed glottal vibration, without hypercontraction of the arytenoids (as occurs, for example, in the Sigit style; see Fig. 11), and by relaxation of the abdominal muscles. Some singers also use embellishments such as the …

Fig. 9 – Tuva: Anatoli Kuular, Borbannadir style with a high fundamental (E3+ 169 Hz). This is a triphony made up of the fundamental, the very strong 3rd harmonic (B4+ 507 Hz, at the interval of a fifth with E3) and the tremolo, most evident on the 6th harmonic (B5+ 1014 Hz).

Fig. 10 – Tuva: Khomei style. The fundamental is a rather weak F#3+ at 189 Hz. The harmonics used here are the 6th, 7th, 8th, 9th, 10th and 12th, corresponding to C#6+ 1134 Hz, E6 1323 Hz, F#6+ 1512 Hz, G#6+ 1701 Hz, A#6+ 1890 Hz and C#7+ 2268 Hz.

Fig. 11 In diphonic singing with the Khomei technique the arytenoids sit further back than in the Sigit style (fig. Xxx). The glottal plane is visible and shows the vocal folds in the closing phase of the vibratory cycle.

Ezengileer Style
The word Ezengileer means “stirrup”, indicating that this style is characterized by rhythmic variations resembling the sound that metal stirrups make under the periodic pressure of the feet while galloping (Fig. 12). Ezengileer is a variant of the Sigit style, characterized by fast rhythmic oscillations between the diphonic harmonics. Timbre varies widely from one singer to another, but all share this common element, the “horse” rhythm. Today it is rarely heard and is regarded as a rather difficult style.

Fig. 12 – Tuva, Ezengileer style. The fundamental is an A#2 at 117 Hz.

Fig. 13 Tuva: Sigit style. The fundamental is a very weak E3+ at 167 Hz. The harmonics used here are the 8th, 9th, 10th and 12th, corresponding to E6+ 1336 Hz, F#6+ 1503 Hz, G#6+ 1670 Hz and B6+ 2004 Hz. Note the rhythmic articulation caused by fast vertical jumps between harmonics, with a variable period of about 900 ms. A second resonance region is present at high frequency, around 3000-3200 Hz.

Fig. 14 Mongolia: Ganbold sings a Kevliin Xöömi (belly Xöömi, similar to the Tuvan Sigit style). The fundamental is a rather weak G# at 208 Hz. The harmonics used here are the 6th, 7th, 8th, 9th, 10th and 12th, corresponding to D#6 1248 Hz, F#6- 1456 Hz, G#6 1664 Hz, A#6+ 1872 Hz, C7- 2080 Hz and D#7 2496 Hz. A very wide vibrato is present, with a frequency modulation of about 6 Hz.

Sigit Style
Sigit means “whistle”, and indeed this style is characterized by a diphony in which the fundamental and the low harmonics are greatly weakened and barely perceptible. The harmonic reinforced by the resonance rises above the drone with a flute-like sound (Fig. 13-14). A piece generally begins with a sung text, without perceptible harmonics. At the end of the phrase the singer intones the drone on a medium fundamental (from E3 165 Hz to A3 220 Hz), on which he builds the melodic line of the harmonics. The intervals sung generally correspond to the 9th, 10th and 12th harmonics, but melodies on the 8th, 9th, 10th, 12th and 13th partials are also heard.
This style requires considerable pressure on the diaphragm and hypercontraction of the glottis. Tongue placement is particularly critical, since it must select harmonics at high frequency (up to about 2800 Hz), which therefore lie very close together. Fibreoptic endoscopy of Tran's larynx showed the arytenoids in a very advanced position, almost covering the vocal folds (fig. 15). The effect of constricting the vocal tract at the glottis was illustrated in fig. 3: the spectral energy is shifted to the high frequencies, attenuating the fundamental and the low harmonics.

Fig. 15 Sigit style. The arytenoids move markedly forward until they hide the glottal plane. The spectral energy is distributed over the high frequencies, attenuating the fundamental and the low components.
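The remark that high harmonics lie very close together can be made concrete by expressing the step between neighbouring harmonics as a musical interval. The sketch below is illustrative and the function name is hypothetical; it relies only on the fact that harmonics n and n + 1 stand in the frequency ratio (n + 1)/n.

```python
import math


def step_cents(n):
    """Interval in cents between harmonic n and harmonic n + 1."""
    return 1200 * math.log2((n + 1) / n)


if __name__ == "__main__":
    for n in (8, 9, 12, 16):
        print(n, round(step_cents(n), 1))
```

Between the 8th and 9th harmonics the step is about a whole tone (roughly 204 cents); by the 16th harmonic it has shrunk to about a semitone (roughly 105 cents), so the formant must be placed with correspondingly finer precision.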

Aksenov, A.N. (1964). “Tuvinskaja narodnaja muzyka”, Moscow.
Aksenov, A.N. (1967). “Die stile der Tuvinischen zweistimmigen sologesanges”, Sowjetische
Volkslied- und Volksmusikforschung , pp. 293-308, Berlin.
Aksenov, A.N. (1973). “Tuvin folk music”, Journal of the Society for Asian Music, Vol. 4, n. 2,
pp. 7-18, New York.
Bregman, A. (1990). Auditory scene analysis: the perceptual organization of sound, MIT Press,
Dargie, D. (1985). “Some Recent Discoveries and Recordings in Xhosa Music”, 5th Symposium
on Ethnomusicology, University of Cape Town, International Library of African
Music (ed), pp. 29-35, Grahamstown.
Desjacques, A. (1985). “Une considération phonétique sur quelques techniques vocales
diphoniques mongoles”, Bulletin du Centre d’Etudes de Musique Orientale, 31, pp. 46-
55, Paris.
Dmitriev, L. – Chernov, B. – Maslow, V. (1983). “Functioning of the voice mechanism in double
voice Touvinian singing”, Folia Phoniatrica, Vol. 35, pp. 193-197.
Fant, G. (1960). Acoustic theory of speech production, Mouton, The Hague.
Ferrero F. – Croatto L. – Accordi M. (1980). “Descrizione elettroacustica di alcuni tipi di
vocalizzo di Demetrio Stratos”, Rivista Italiana di Acustica, Vol. IV, n. 3, pp. 229-258.
Ferrero, F., Ricci Maccarini, A., Tisato, G. (1991). “I suoni multifonici nella voce umana”, Proc.
XIX Convegno AIA, Napoli, pp. 415-422.
Fletcher, H., Munson, W.A. (1933). “Loudness, Its Definition, Measurement and Calculation”,
Vol. 5, 2, pp. 82-108.
Fuks L., Hammarberg B.,Sundberg J. (1998): “A self-sustained vocal-ventricular phonation
mode: acoustical, aerodynamic and glottographic evidences”, KTH TMH-QPSR 3/1998,
pp. 49-59, Stockholm
Garcia, M. (1847). Traité complet de l’art du chant, Paris.
Gunji, S. (1980): “An acoustical consideration of Xöömij”, Musical Voices of Asia, pp. 135-141,
The Japan Foundation (ed), Heibonsha Ltd, Tokyo.
Hamayon, R. 1980: “Mongol Music”, New Grove’s Dictionary of Music and Musicians 12, pp.
482-485, Stanley Sadie (ed), MacMillan Publishers, Londres.
Harvilahti, L. (1983). “A Two Voiced Song With No Word”, Suomalais-ugrilaisen seuran
aikakauskirja 78, pp. 43-56, Helsinki.
Kavasch D. (1980). “An introduction to extended vocal techniques”, Report of CME, Univ. of
California, San Diego, Vol. 1, n. 2, pp. 1-20, with a cassette of sound examples.
Jakobson, R. (1968). Child language, aphasia and phonological universals, The Hague, Mouton.
Laneri, R. (1983). “Vocal techniques of overtone production”, NPCA Quarterly Journal, Vol XII,
n. 2-3, pp. 26-30.
Laneri, R. (2002). La voce dell’arcobaleno, Ed. Il Punto d’Incontro, Vicenza.
Leotar, F. (1998). “Etudes sur la musique Touva”, maîtrise de l’Université de Nanterre – Paris X,
128 pages, 2 cassettes.
Leothaud, G. (1989). “Considérations acoustiques et musicales sur le chant diphonique”, dossier
n° 1, Le chant diphonique, pp. 17-43, Institut de la Voix, Limoges.
Levin, Th. – Edgerton, M. (1999). “The Throat Singers of Tuva”,
Pailler, J.P. (1989). “Examen video du larynx et de la cavité buccale de Monsieur Trân Quang
Hai”, dossier n°1, Le Chant Diphonique, pp. 11-13, Institut de la Voix, Limoges.
Pegg, C. (1992). “Mongolian conceptualizations of Overtone Singing (Xöömii)”, The British
Journal of Ethnomusicology (1), pp. 31-53, Londres.
Plomp, R. (1967). “Pitch of complex tones”, JASA, Vol 41 (6), pp. 1526-1533.
Rachele, R. (1996). “Overtone Singing Study Guide”, Cryptic Voices Productions (ed), pp. 1-
127, Amsterdam .
Sauvage, J.P. (1989). “ Observation clinique de Monsieur Trân Quang Hai”, dossier n° 1, Le
Chant diphonique, pp. 3-10, Institut de la Voix, Limoges.
Smith, H., Stevens, K.N., Tomlinson, R.S. (1967). “On an unusual mode of singing of certain
Tibetan Lamas”, JASA. 41 (5), pp. 1262-4, USA.
Stevens K. (1998), Acoustic Phonetics, MIT Press, Cambridge.
Sundberg, J. (1987). The science of the singing voice, Northern Illinois University Press, De
Kalb, Illinois.
Tisato, G. (1989a), “Analisi e sintesi del Canto Difonico”, Proc. VII Colloquio di Informatica
Musicale (CIM), Cagliari, pp. 33-51, 1989.
Tisato, G. (1989b), “Il canto degli armonici”, in Culture Musicali, Quaderni di Etnomusicologia,
Ed. La Casa Usher, Vol. 15-16, pp. 44-68.
Tisato, G. – Ricci Maccarini, A.R. (1991). “Analysis and synthesis of Diphonic Singing”,
Bulletin d’Audiophonologie, Vol. 7, n. 5&6, pp. 619-648, Besançon.
Tongeren, M. Van (1994). “Xöömij in Tuva: new developments, new dimensions”, Thèse de
maîtrise, Ethnomusicologisch Centrum “Jaap Kunst”, Universiteit van Amsterdam.
Tongeren, M. Van (1995). “A Tuvan perspective on Throat Singing”, Oideion, The Performing
Arts Worldwide, 2, pp. 293-312, Université de Leiden.
Tran Quang Hai (1975). “Technique de la voix chantée mongole: Xöömij”, Bulletin du CEMO,
n. 14 & 15, pp. 32-36, Paris.
Tran Quang Hai – Guilou, D. (1980). “Original research and acoustical analysis in connection
with the xöömij style of biphonic singing”, Musical Voices of Asia, pp. 162-173, The
Japan Foundation (ed), Heibonsha Ltd, Tokyo.
Tran Quang Hai (1989). “Réalisation du chant diphonique”, dossier n°1, Le Chant diphonique,
pp. 15-16, Institut de la Voix, Limoges.
Tran Quang Hai – Zemp, H. (1991a). “Recherches expérimentales sur le chant diphonique”,
Cahiers de Musiques traditionnelles, Vol. 4, pp. 27-68, Genève.
Tran Quang Hai (1991b). “New experimental about the Overtone Singing style”, Bulletin
d’Audiophonologie, Vol. 7, n. 5&6, pp. 607-618, Besançon.
Tran Quang Hai (1995). “ Le chant diphonique: description, historique, styles, aspect acoustique
et spectral”, EM, Annuario degli Archivi di Etnomusicologia dell’Accademia
Nazionale di Santa Cecilia, n. 2, pp. 123-150, Roma.
Tran Quang Hai (1997a). “Recherches introspectives sur le chant diphonique et leurs
applications”, Penser La Voix, (ed) La Licorne, pp. 195-210, Poitiers.
Tran Quang Hai (1997b). “ Overtones in Central Asia and in South Africa (Xhosa Vocal Styles),
Proceedings of the First South African Music and Dance Conference and 15th
Symposium on Ethnomusicology, pp. 422-432, Univ. de Cape Town, Afrique du Sud.
Tran Quang Hai (1998). “ Survey of overtone singing style”, Die Ausdruckswelt der Stimme, 1-
Stuttgarter Stimmtage/ Horst Gunderman, (ed) Hüthig, pp. 77-83, Allemagne.
Tran Quang Hai (1999). “Overtones used in Tibetan Buddhist Chanting and in Tuvin
Shamanism”, Ritual and Music, Lithuanian Academy of Music, Department of
Ethnomusicology, pp. 129-136, Vilnius.
Tran Quang Hai (2000). “Musique Touva”, http://www.baotram.ovh.org\tuva.html
Vlachou, E. (1985). “Recherches vocales contemporaines: chant diphonique”, Thèse de maîtrise,
Université de Paris VIII-Saint Denis, direction de D. Charles, 90 pages, Paris.
Walcott, R. (1974). “The Chöömij of Mongolia – A spectral analysis of overtone singing”,
Selected Reports in Ethnomusicology 2 (1): 55-59, UCLA, Los Angeles.
Whitfield, I. C. (1978). “The neural code”. In Handbook of perception, (ed) Carterette &
Friedman, Academic, Vol IV, 5, New York.
Zue, V. (1989). Acoustic theory of speech production, preliminary draft, Dep. Electrical Eng. &
Computer Science, MIT, Cambridge.
Zwicker, E., Flottorp, G., Stevens, S. S. (1957). “Critical bandwidth in loudness summation”.
JASA, Vol. 29 (5), pp. 548-557.

Discography
“Tuva: Voices from the center of Asia”, Smithsonian Folkways CD SF 40017, Washington,
USA, 1990.
“Tuva: Voices from the land of the eagles”, Pan Records, PAN 2005 CD, Leiden Hollande,
“Tuva- Echoes from the spirit world”, Pan Records, PAN 2013CD, Leiden, Hollande, 1992.
“Tuvinian singers and musicians – Ch’oomej: Throat singing from the center of Asia”, World
Network, vol.21, USA, 1993.
“Huun Huur Tu/ Old songs and tunes of Tuva”, Shanachie 64050, USA, 1993.
“Huun Huur Tu / The orphan’s lament”, Shanachie 64058, USA, 1994.
“Shu-De, Voices from the distant steppe”, Womad production for RealWorld, CD RW 41, Pays
Bas, 1994.
“Musiques traditionnelles d’Asie centrale/ Chants harmoniques Touvas”, Silex Y 225222, Paris,
France, 1995.
“Shu-de / Kongurei/ Voices from Tuva”, New Tone NT6745, Robi Droli, San Germano, Italia,
“Chirgilchin: The wolf and the kid”, Shanachie Records, USA, 1996.
“Deep in the Heart of Tuva”, Ellipsis Arts, USA, 1996.
“Huun Huur Tu – If I’d been born an eagle”, Shanachie Records, USA, 1997.
“Mongolie: Musique et Chants de tradition populaire”, GREM G 7511, Paris, France, 1986.
“Mongolie : Musique vocale et instrumentale”, Maison des Cultures du Monde, W 260009,
collection INEDIT, Paris, France, 1989.
“Mongolian Music”, Hungaroton, HCD 18013-14, coll. UNESCO, Budapest, Hongrie, 1990.
“White Moon, traditional and popular music from Mongolia”, Pan Records, PAN 2010CD,
Leiden, Hollande, 1992.
“Folk Music from Mongolia / Karakorum”, Hamburgisches Museum für Völkerkunde,
Hambourg, Allemagne, 1993.
“Vocal & instrumental of Mongolia”, Topic, World Series TSCD909, Londres, Grande Bretagne,
“Jargalant Altai/ Xöömii and other vocal and instrumental music from Mongolia”, Pan Records
PAN 2050CD, Ethnic Series, Leiden, Hollande, 1996
“Uzlyau : Guttural singing of the Peoples of the Sayan, Altai and Ural Mountains”, Pan Records
PAN 2019CD, Leiden, Hollande, 1993.
“Chant épiques et diphoniques: Asie centrale, Sibérie, vol 1”, Maison des Cultures du Monde, W
260067, Paris, France, 1996.
“The Gyuto Monks: Tibetan Tantric Choir”, Windham Hill Records WD-2001, Stanford,
Californie, USA, 1987.
“The Gyuto Monks: Freedom chants from the roof of the world”, Rykodisc RCD 20113, Salem,
Maryland, USA, 1989.
“Tibet: The heart of Dharma/ Buddha’s teachings and the music they inspired”, Ellipsis Arts
4050, New York, USA, 1996.
“Le chant des femmes Xhosa”, The Ngqoko Women’s Ensemble, VDE, CD 879, 1996.



Perception of Overtone Singing

Chen-Gia Tsai

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0. For normal voices, the bandwidths of formants are always so large that the formants merely contribute to the perception of timbre. For overtone-singing voices, the sharp formant Fk can contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0. In the autocorrelation functions the height of the peak at 1/9f0, which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk is audible once the strongest harmonic is larger than the adjacent harmonics by 10 dB.
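The trend described here can be reproduced with a small numerical sketch. It is not the model used for Fig. 1: it assumes a simple single-resonance (Lorentzian) power envelope for the formant, 30 harmonics, and the three bandwidths 50, 200 and 800 Hz, all of which are illustrative choices, and the function names are hypothetical. It uses the fact that the autocorrelation of a sum of harmonics with powers P_k is the sum of P_k cos(2π k f0 τ), whatever the phases.

```python
import math


def harmonic_powers(f0_hz, formant_hz, bandwidth_hz, n_harmonics=30):
    """Power of each harmonic under an assumed Lorentzian formant envelope."""
    half = bandwidth_hz / 2.0
    return [1.0 / (1.0 + ((k * f0_hz - formant_hz) / half) ** 2)
            for k in range(1, n_harmonics + 1)]


def pitch_strength(f0_hz, harmonic_number, bandwidth_hz, n_harmonics=30):
    """Normalized autocorrelation at lag 1 / (harmonic_number * f0)."""
    formant_hz = harmonic_number * f0_hz
    powers = harmonic_powers(f0_hz, formant_hz, bandwidth_hz, n_harmonics)
    lag = 1.0 / formant_hz
    peak = sum(p * math.cos(2 * math.pi * k * f0_hz * lag)
               for k, p in enumerate(powers, start=1))
    return peak / sum(powers)


def prominence_db(f0_hz, harmonic_number, bandwidth_hz, n_harmonics=30):
    """Level of the strongest harmonic above its stronger neighbour, in dB."""
    formant_hz = harmonic_number * f0_hz
    powers = harmonic_powers(f0_hz, formant_hz, bandwidth_hz, n_harmonics)
    centre = powers[harmonic_number - 1]
    neighbour = max(powers[harmonic_number - 2], powers[harmonic_number])
    return 10 * math.log10(centre / neighbour)


if __name__ == "__main__":
    for bw in (50.0, 200.0, 800.0):
        print(bw, round(pitch_strength(150.0, 9, bw), 3),
              round(prominence_db(150.0, 9, bw), 1))
```

With f0 = 150 Hz and the formant on the 9th harmonic, the autocorrelation peak grows as the bandwidth shrinks. Under this assumed envelope, only the narrowest of the three bandwidths lifts the 9th harmonic more than 10 dB above its neighbours.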

Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Next to the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding f0 and listeners become indifferent to it. On the contrary, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, by which the components radiated by the same object should be grouped and perceived as a single entity. Stream segregation occurs in the quasi-periodic voices of overtone-singing through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing always sounds extraordinary when we first hear it.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.



CHEN-GIA TSAI : Perception of Overtone Singing, TAIWAN

TRAN QUANG HAI & DENIS GUILLOU: Original Research and Acoustical Analysis in connection with the Xöömij Style of Biphonic Singing, FRANCE


Original Research and Acoustical Analysis in connection

with the Xöömij Style of Biphonic Singing

Tran Quang Hai, Centre National de la Recherche Scientifique, Paris, 1980

Denis GUILLOU, Conservatoire National des Arts et Métiers, Paris


The present article is limited in scope to our own original research and to the acoustical analysis of biphonic singing; this is preceded by a summary of the various terms proposed by different researchers. The first half of the article, concerning xöömij technique, was written by Tran Quang Hai. Guillou wrote the second half, concerning acoustical analysis.


Until the present time it has not been possible to confirm that the centre of biphonic singing within Turco-Mongol culture is in fact Mongolia. Biphonic singing is also employed by neighbouring peoples such as the Tuvins (Touvins), Oirats, Khakass, Gorno-Altais and Baschkirs; it is called kai by the Altais and uzliau by the Baschkirs, and the Tuvins possess four different styles called sygyt, borbannadyr, ezengileer and kargyraa. A considerable amount of research is at present being carried out throughout the world into this vocal phenomenon, particularly as it is practised in Mongolia.


Research can be carried out in various ways: by means of observation of native performers after one or more visits to the country concerned, or by means of practical instrumental or vocal studies aimed at a better understanding of the musical structure employed by the population being studied. My own research does not belong to either of these two categories, since I have never been to Mongolia and I have never learned the xöömij style of biphonic singing from a Mongolian teacher. What I shall describe in this article is the result of my own experience, which will enable anybody to produce two simultaneous sounds similar to Mongolian biphonic singing.



Simultaneous two-part singing by a single person is known in the Mongol language as xöömij (literally “pharynx”). The Mongol word is transcribed in many different ways: homi, ho-mi (Vargyas 1968), khomi, khöömii (Bosson 1964: 11), xomej, chöömej (Aksenov 1964), chöömij (Vietze 1969: 15-16; Walcott 1974), xöömij (Hamayon 1973). French researchers have used other terms to describe this particular vocal technique, such as chant biphonique or diphonique (Leipp 1971, Tran Quang Hai 1974), voix guimbarde, voix dédoublée (Heitfer 1973, Hamayon 1973), and chant diphonique solo (Marcel-Dubois 1979). Several terms exist in English, such as split-tone singing, throat singing and overtone singing, and in German, zweistimmigen Sologesang.


For convenience I have employed in this article the term biphonic singing to describe a style of singing realized by a single person producing simultaneously a continuous drone and another sound at a higher pitch, issuing from a series of partials or harmonics and resembling the sound of the flute.


Origin of My Research

In 1971, the date of my first contact with Mongolian music in the form of recordings made in Mongolia between 1967 and 1970 by Mrs. Roberte Hamayon, researcher at the Centre National de la Recherche Scientifique, and especially after listening to a tape on which were recorded three pieces in the biphonic singing style, I was struck by the extraordinary and unique nature of this vocal technique.


For several months I carried out bibliographical research into articles concerned with this style of singing with the aim of obtaining information on the practice of biphonic singing, but received little satisfaction. Explanations of a merely theoretical and sometimes ambiguous nature did nothing so much as to create and increase the confusion with which my research was surrounded. In spite of my complete ignorance of the training methods for biphonic singing practised by the Mongols, the Tuvins and other peoples, I was not in the least discouraged by the negative results at the beginning of my studies after even several months of effort.


Working Conditions

According to Hamayon, the xöömij, which exists throughout Mongolia but is gradually dying out, is practised exclusively by men. It represents an imitation, by means of a single voice of two instruments, the flute and the Jew’s harp.


The xöömij refers to the simultaneous production of two sounds, one similar to the fundamental produced on the Jew’s harp (produced at the back of the throat), and the other resulting from a modification of the buccal cavity without moving the lips, which remain only slightly open; positioning the lips as for a rear vowel results in a low sound, whereas front vowel positioning produces a high sound (Hamayon 1973), a technique similar to that used by the Tuvins (Aksenov 1964). The cheeks are tightened to such a degree that the singer breaks out into a sweat. It is the position of the tongue which determines the melody. Anybody who possesses this technique is able to copy any tune (Hamayon 1973).


I worked entirely alone, groping my way through the dark for two years, listening frequently to the recordings made by Hamayon and stored in the sound archives of the ethnomusicology department of the Musée de l’Homme. My efforts were, however, to no avail. Despite my knowledge of Jew’s harp technique, the initial work was both difficult and discouraging. I also tried to whistle while producing a low sound as a drone. However, checking on a sonograph showed that this was not similar to the xöömij technique. At the end of 1972 I reached the stage where I was able to produce a very weak harmonic tone which, when recorded on tape, showed that I was still a long way from my goal.


Then, one day in November 1973, in order to calm my nerves in the appalling traffic congestion of Paris, I happened to make my vocal cords vibrate in the pharynx with my mouth half open while reciting the alphabet. When I arrived at the letter L and the tip of my tongue was about to touch the top of my mouth, I suddenly heard a pure harmonic tone, clear and powerful. I repeated the operation several times and each time I obtained the same result. I then tried to modify the position of the tongue in relation to the roof of the mouth while maintaining the low fundamental. A series of partials resonated in disorder inside my ears.


At the beginning I obtained the harmonics of a perfect chord. Slowly but surely, after a week of intensive work, by changing the fundamental tone upwards or downwards, I managed to discover all by myself a vocal Jew’s harp technique or biphonic singing style which appeared to be similar to that used by the Mongols and the Tuvins.


Basic Techniques

After two months of research and numerous experiments of all kinds I was able to establish some of the basic rules for the realization of what I call biphonic singing.


1) Half open the mouth.
2) Emit a natural sound on the letter A without forcing the voice, remaining in the middle part of the vocal range (between F and A below middle C for men, and between F and A above middle C for women).
3) Intensify the vocal production while vibrating the vocal cords.
4) Force out the breath and hold it for as long as possible.
5) Produce the letter L. Maintain the position with the tip of the tongue touching the roof of the mouth.
6) Intensify the tonal volume while trying to keep the tongue stuck firmly against the palate in order to divide the mouth into two cavities, one at the back and one at the front, so that the air column increases in volume through the mouth and the nose.
7) Slowly pronounce the sounds represented by the phonetic signs “i” and “u” while varying the position of the lips.
8) Modify the buccal cavity by changing the position of the tongue inside the mouth without interrupting or changing the height of the fundamental already amplified by the vibration of the vocal cords.
9) In this way it is possible to obtain both the drone and the partials or harmonics either in ascending or descending order according to the desire of the singer.



For beginners the harmonics of the perfect chord (C, E, G, C) are easy to obtain. However, a considerable amount of hard work is necessary, especially to obtain a pentatonic anhemitonic scale. Every person has his favourite note which permits him to produce a large range of partials. This favourite fundamental tone varies according to the tonal quality of the singer’s voice and his windpipe. It often happens that two people using the same fundamental tone do not obtain the same series of partials.


Regular practice and the application of the basic techniques which I have just described permitted me to acquire a range of between an eleventh and a thirteenth according to the choice of the drone. Biphonic singing can also be practised by women and children, and several successful experiments have been carried out in this connection.


Other experiments which I have been carrying out recently indicate that it is possible to obtain two simultaneous sounds in two other ways. In the first method, the tongue may be either flat or slightly curved without at any stage touching the roof of the mouth, and only the mouth and the lips move. Through such variation of the buccal cavity, which this time forms a single cavity, it is possible to hear the partials faintly.


In the second method the basic technique described above is used. However instead of keeping the mouth half open it is kept almost completely shut with the lips pulled back and very tight. To make the partials audible, the position of the lips is varied at the same time as that of the tongue. The partials are very clear and distinctive, but the technique is rather exhausting and it is not possible to sing for a long time using it.


In the northwest of Mongolia, in the borderland area between Mongolia and Siberia, live the Tuvins, a people of Turkic origin numbering one hundred thousand. The Tuvins possess not only the biphonic singing style used by the Mongols, but four other different styles within this genre, called sygyt, ezengileer, kargyraa and borbannadyr. Table 1 will facilitate comparison between these four styles.


Biphonic singing is also practised by a number of ethnic groups in the republics of the Soviet Union bordering on Mongolia.


The late John Levy made a recording in Rajasthan in 1967 on which can be heard an example of biphonic singing similar to that practised by the Mongols and the Tuvins (1). The virtuoso performer in the recording uses his voice to imitate the double flute called the satara (an instrument producing simultaneously a drone and a melody) or the Jew’s harp. However, this may well be an exceptional example in that no mention is ever made of biphonic singing techniques in the musical traditions of Rajasthan or elsewhere in India.


Tibetan monks, particularly those in the monasteries of Gyume and Gyuto(2), make use of a technique using two simultaneous voices, although this technique is far less developed than that used by the Mongols and the Tuvins. The low register of the drone makes it impossible to produce harmonics as clear and resonant as those emitted by the Mongols and the Tuvins, and furthermore the production of harmonics is not the aim of Tibetan Buddhist chant.


In Western contemporary music groups of singers have also succeeded in emitting two voices at the same time and vocal pieces have been created in the context of avant‑garde music (3) and in recent years of electronic music (4).


An X-ray film was made for the first time in 1974 at the Centre Médico‑chirurgical of the Porte de Choisy in Paris at the request of Professor S. Borel‑Maisonny, speech therapist, and of Professor Emile Leipp, acoustician. This film, which was made with the cooperation of the present author, made it possible to examine closely the internal functioning and placement of the tongue during biphonic singing, and was thus of great interest. Thanks to this film the author has improved his biphonic singing technique, as a result of which he has been able to decrease the volume of the drone and increase that of the harmonics.


Table 1 Characteristics of the biphonic singing styles of the Tuvins


Pitch of the drone or fundamental
    sygyt: Changes in the course of singing
    ezengileer: No change
    kargyraa: No change, although sometimes lowered by a minor third
    borbannadyr: No change

Tonality
    sygyt: More intense and higher than that of the kargyraa style
    ezengileer: Same as sygyt
    kargyraa: Low
    borbannadyr: Soft

Position of the mouth
    sygyt: Half open
    ezengileer: Half open
    kargyraa: Half open
    borbannadyr: Almost closed

Harmonics or partials
    sygyt: 8, 9, 10 for uneven verses; 8, 9, 10, 12 for even verses
    ezengileer: (6), 8, 9, 10, 11, 12, 13
    kargyraa: (6), 8, 9, 10, 11, 12
    borbannadyr: 6, 7, 8, 9, 10, 12, 13

Special features
    sygyt: Harmonics used as an ostinato accompaniment, thus resulting in a narrow range. In the course of a song, at the end of each phrase a note is held (fundamental for uneven verses, or a descending tone for even verses).
    ezengileer: Alternation of strong and weak accents like a gallop rhythm.
    kargyraa: Each vowel corresponds to a partial. Psalmodic recitation with or without special text on 2 pitches, or drone in 2 positions rising and descending by a minor third. Called borbannadyr in cases when the borbannadyr is named khomei.
    borbannadyr: Occasionally three voices, with two used as a drone (tonic and fifth, in exceptional cases) and the third voice producing harmonics. Called khomei in certain areas.



Acoustical Analysis: Introduction


The present study is concerned with biphonic singing, its understanding and interpretation, and does not constitute a complete and definitive piece of research. In fact the discovery of certain phenomena permits us only to imagine what might be the reality, this being particularly true in relation to the mechanism involved in the production of biphonic singing. Thus it will be necessary to carry out further research in the following areas: psycho‑acoustics, particularly the perception of pitch, and phonatory acoustics.

        Biphonic singing differs from so‑called natural singing on account of its sonority as well as of course the vocal technique involved. As its name indicates it consists of two sounds. On the basis of simple aural observation, it is possible to distinguish a first sound whose pitch is constant and which we shall call the drone and a second sound which takes the form of a melody which the singer can produce at will. It is basically possible for anybody to produce this biphonic sonority but to make the second voice dominate and to trace a melody with it depends upon the talent of the artist.

        Firstly, we shall examine the concept of pitch perception in terms of acoustics and psycho‑acoustics. Secondly we shall try to define biphonic singing, to differentiate it from other vocal techniques and to specify its scope. It will then be worthwhile to formulate several hypotheses concerning the mechanism whereby this style of singing is produced and finally to present a few examples of such a technique.


Pitch Perception

It is first of all necessary to comprehend exactly what is meant by the pitch of sounds or tonality. This concept presents a considerable amount of ambiguity and does not correspond to the simple principle of the measurement of the frequencies produced. The pitch of sounds is related more to psycho-­acoustics than to physics.

          Our own proposals are based partially on the recent discoveries of certain researchers, and partially on observations which we have made ourselves with the help of a sonagraph machine.

         The sonagraph makes it possible for us to obtain the image of the sound which we wish to study. A single sheet of paper gives information concerning time and frequency and, through the thickness of the line traced, information concerning intensity.

The classical manuals on acoustics tell us that the pitch of harmonic sounds, that is, sounds with, for example, a fundamental with the frequency F and a series of harmonics F1, F2, F3, …, multiples of F, is determined by the frequency of the fundamental F. This is not entirely correct, in that it is possible to suppress this fundamental electronically without thereby changing the subjective pitch of the perceived sound. If this theory were correct, an electro‑acoustic chain not reproducing the lowest sound would change the pitch of the sounds. This is evidently not the case, since the tonal quality changes but not the pitch. Certain researchers have proposed a theory which would appear to be more coherent: the pitch of sounds is determined by the separation of the harmonic lines, that is, the difference in frequency between two adjacent harmonic lines. What, then, is the pitch of sonic spectra with “partials” (where the components are not whole multiples of the fundamental)? In this case, the individual perceives an average of the separation of the lines in the zone which interests him. This in fact corresponds with the differences in perception which may be observed from one individual to the other (Fig. 1).
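The line-separation idea can be illustrated numerically. The following is only a minimal sketch, assuming that "pitch" is approximated by the mean gap between adjacent spectral lines; the function name `spacing_pitch` and all frequency values are hypothetical and are not taken from the study.

```python
from statistics import mean


def spacing_pitch(line_frequencies_hz):
    """Approximate the perceived pitch as the mean gap between adjacent lines."""
    lines = sorted(line_frequencies_hz)
    gaps = [high - low for low, high in zip(lines, lines[1:])]
    return mean(gaps)


# Hypothetical harmonic spectrum on a 200 Hz fundamental.
full_spectrum = [200.0 * n for n in range(1, 7)]
# The same spectrum with the fundamental suppressed.
without_fundamental = full_spectrum[1:]
# Hypothetical "partials" spectrum: lines are not whole multiples of anything.
partials_spectrum = [400.0, 610.0, 815.0, 1030.0]

print(spacing_pitch(full_spectrum))
print(spacing_pitch(without_fundamental))
print(spacing_pitch(partials_spectrum))
```

In this sketch the first two spectra give the same value, which is the point made above: removing the lowest line changes the tone quality but not the line separation. For the third spectrum the result is an average of unequal gaps.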


Fig. 1 Sonagram representation of three types of sound


a)         Harmonic spectrum: the harmonics are whole multiples of the fundamental.

b) Partials spectrum: the harmonics are no longer whole multiples of the fundamental.

c) Formant spectrum: two harmonics are intense and constitute a formant in the harmonic spectrum.

Formant spectrum: the accentuation in intensity of a group of harmonics constitutes a formant and is thus a zone of frequencies in which there is a large amount of energy.


Taking this formant into consideration a second concept of the perception of pitch comes to light. It has in effect been established that the position of the formant in the sonic spectrum results in the perception of a new pitch. In this case it is no longer a matter of the separation of the harmonic lines in the formant zone but of the position of the formant in the spectrum. This theory should be qualified however, since conditions also have to be considered.


Experiment: Tran Quang Hai sang two C’s an octave apart, making his voice carry as if he were addressing a large audience. We observed, using a sonagram, that the maximum energy was situated in the zone to which the human ear is sensitive (3 to 4 kHz) and that the formant was situated between 2 and 4 kHz. We then recorded two C’s an octave apart in the same tonality, but this time he used his voice as if addressing a small audience, and we observed the disappearance of this formant (Fig. 2‑a, 2‑b).

In this case the disappearance of the formant does not change the pitch of the sounds. We then rapidly observed that the perception of pitch through the position of the formant was only possible if the formant was very sharp, that is, if the sonic energy was divided among only two or three harmonics. Thus if the energy density of the formant is large and the formant is narrow, the formant gives information concerning the pitch as well as the overall tonality of the sonic item. Through this expedient we arrive at the biphonic vocal technique.


Fig. 4 Normal singing and biphonic singing

a) Sonagraph representation of normal singing. An octave passage is equivalent to a doubling of the gap between the harmonic lines and to a drone of double frequency. (The first bar represents the base line of the sonagram, and the drone is represented by the second bar.)

b) Sonagraph representation of biphonic singing. An octave passage is represented by a displacement of the formant. The harmonic lines of the formant are displaced in a zone in which the frequency is doubled.


Comparison between Biphonic Technique and Classical Technique


It may be said that biphonic singing consists as its name indicates, of the production of two sounds, one a drone which is low and constant, and the other at a higher pitch consisting of a formant which displaces itself in the spectrum in order to produce a certain melody. The concept of pitch given by the second voice is moreover somewhat ambiguous. The Western ear may need a certain amount of training before becoming accustomed to the sound quality.


Evidence concerning the drone is relatively easy to obtain thanks to the sonagram: it can be seen clearly and is also very clear on an auditory level. The device in Fig. 3 also makes it possible to see a pure amplitude frequency of a constant nature.


Fig. 3. Device for providing evidence of perfect constancy of the drone in intensity and frequency. 

After having examined the fundamental tone we compared two spectra, one of biphonic singing and the other of the so‑called classical singing style, the two being produced by the same singer. The sonagrams of these two types of singing are shown in Fig. 4. Classical singing is characterized by a doubling of the separation of the harmonic lines when an octave is exceeded (a). Biphonic singing is characterized on the other hand by the fact that the separation of the lines remains constant (this was foreseeable since the drone is constant), and that the formant is displaced by an octave (b). In fact it is easy to measure the distance between the lines for each sound. In this case, the perception of the melody in biphonic singing works through the expedient of the displacement of the formant in the sonic spectrum.
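The contrast between the two sonagrams can be sketched in a few lines. This is an illustration under stated assumptions, not a model from the study: the drone of 130 Hz, the formant centres and the function names are hypothetical, and the melody note is simply taken to be the harmonic nearest the formant centre.

```python
def line_spacing(drone_hz):
    """Gap between adjacent harmonic lines, equal to the drone frequency."""
    return drone_hz


def selected_harmonic(drone_hz, formant_centre_hz):
    """Number of the harmonic of the drone lying closest to the formant centre."""
    return max(1, round(formant_centre_hz / drone_hz))


# Classical singing: going up an octave doubles the drone, so the gap doubles.
classical_low, classical_high = line_spacing(130.0), line_spacing(260.0)

# Biphonic singing: the drone stays put and the formant moves up an octave.
drone = 130.0
biphonic_low = selected_harmonic(drone, 1040.0)
biphonic_high = selected_harmonic(drone, 2080.0)

print(classical_low, classical_high)   # gap between lines doubles
print(biphonic_low, biphonic_high)     # harmonic number doubles, gap unchanged
```

With these hypothetical values the classical octave changes the line spacing from 130 Hz to 260 Hz, whereas the biphonic octave keeps the spacing at 130 Hz and moves the emphasis from harmonic 8 to harmonic 16.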

         It should be stressed that this is only really possible if the formant is high, and this is obviously so in the case of biphonic singing. The sonic energy is divided principally between the drone and the second voice consisting of two or at the most three harmonics.

         It has sometimes been stated that it is possible to produce a third voice. Using the sonagram we have in actual fact established that this third voice exists (see sonagrams of Tuvin techniques), but it is impossible to state that it can be controlled. In our opinion this additional voice results more from the personality of the performer than from any particular technique.

As a result of our work we have been able to establish a parallel between biphonic singing and the technique of the Jew’s harp. As in the case of biphonic singing the Jew’s harp produces several different voices, the drone, the main melody and a counter melody. We may consider this third voice as a counter melody which may be produced on a conscious level but can presumably not be controlled. As far as possibility of variation is concerned, biphonic singing is the same as normal singing except in connection with pitch range.

The time of execution is evidently a function of the thoracic cage of the singer and thus of breathing, since the intensity is related to the output of air. Possibility of variation with regard to intensity is on the other hand relatively restricted, and the level of the harmonics is connected to the level of the drone. The singer has to try to retain a suitable drone and produce the harmonics as strongly as possible. We have already observed that the clearer the harmonics, the narrower and more intense the formant. We are able furthermore to observe connections between intensity, time and clarity. Possibility of variation in relation to tone quality may pass without comment, since the resulting sound is in the majority of cases formed from a drone and one or two harmonics. The most interesting question is that of pitch range.


It is generally accepted that, for a sensible tonality (in consideration of the performer and of the piece to be performed), a singer may modulate or choose between harmonics 5 and 13. This is true but should be stated more precisely. The range is a function of the tonality. If the tonality is on C2, the range represents nine harmonics from the fifth to the thirteenth, this involving a range of a major thirteenth. If the tonality is raised, for example to C3, the choice is made between six harmonics, numbers 3 to 8 (see Table 2), representing an interval of a seventh. The following remarks should be made in this context. Firstly, the pitch range of biphonic singing is more restricted than that of normal singing. Secondly, the singer theoretically selects the tonality which he wishes between C2 and C3. In practice, however, he instinctively produces a compromise between the clarity of the second voice and the pitch range of his singing, since the choice of the tonality is also a function of the musical piece to be performed. Thus if the tonality is raised, for example to C3, the choice of harmonics is restricted but the second voice is very clear. In the case of a tonality on C2 the second voice is more indistinct while the pitch range is at a maximum. The clarity of the sounds can be explained by the fact that in the first case the singer is only able to select a single harmonic, whereas in the second case he may select almost two (see Fig. 5). As far as pitch range is concerned, it is known that the movement of the buccal resonators is independent of the tonality of the sounds produced by the vocal cords or, put another way, the singer always selects harmonics in the same zone of the spectrum, whether the harmonics are widely or narrowly spaced.
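The trade-off between range and clarity follows from the fact that the resonators work in a fixed zone of the spectrum while the spacing of the harmonics depends on the drone. The sketch below is only an illustration: the band limits (500 to 1300 Hz) and the drones (100 and 200 Hz) are hypothetical round numbers chosen for the example, not the values of Table 2, and `harmonics_in_band` is an invented helper name.

```python
import math


def harmonics_in_band(drone_hz, band_low_hz, band_high_hz):
    """List the harmonic numbers of the drone that fall inside a fixed band."""
    first = max(1, math.ceil(band_low_hz / drone_hz))
    last = math.floor(band_high_hz / drone_hz)
    return list(range(first, last + 1))


# Hypothetical fixed zone in which the mouth resonance can be moved.
BAND_LOW_HZ, BAND_HIGH_HZ = 500.0, 1300.0

low_drone = harmonics_in_band(100.0, BAND_LOW_HZ, BAND_HIGH_HZ)
high_drone = harmonics_in_band(200.0, BAND_LOW_HZ, BAND_HIGH_HZ)

print(low_drone)    # many closely spaced harmonics: wide choice, less clarity
print(high_drone)   # fewer, widely spaced harmonics: narrow choice, more clarity
```

Raising the drone by an octave leaves the zone unchanged but roughly halves the number of harmonics available in it, which is the compromise described above.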


It results from all this that the singer chooses the tonality instinctively in order to have the maximum range and clarity. For Tran Quang Hai, the best compromise exists between C2 and A2. He can thus obtain a range of between an octave and a thirteenth.


Mechanism for the Production of Biphonic Singing


It is always very difficult to know what is taking place inside a machine when we are placed outside it and can only watch it in operation. This is the case with the phonatory mechanism. The following remarks are only approximate and of a schematic nature and should not be assumed to be the final word on the subject. In dealing by analogy with the phonatory system we can get an idea of the mech­anisms but surely not a complete explanation. Fig. 6 is a representation of the phonatory system which can be compared with Fig. 7, showing an excitation system producing harmonic sounds and a series of resonating systems amplifying certain parts of this spectrum.


A resonator is a cavity equipped with a neck, capable of resonating in a certain range of frequencies. The excitation system, i.e., the pharynx and the vocal cords, emits a harmonic spectrum consisting of the frequencies F1, F2, F3, F4, …, and the resonators select certain of these frequencies and amplify them. The choice of these frequencies evidently depends upon the ability of the singer. This is the case when a singer projects his voice within a large hall, in that he instinctively adapts his resonators in order to produce the maximum energy within the area in which the ear is sensitive.

It should be noted that the amplified frequencies are a function of the volume of the cavity, the section of the opening and the length of the neck constituting the opening.

Through this principle it is possible to see already the action of the size of the buccal cavity, of the opening of the mouth, and of the position of the lips during singing.
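The dependence on cavity volume, opening section and neck length can be sketched with the classical Helmholtz resonator relation, f = (c / 2π) · sqrt(S / (V · L)). Using this relation here is an assumption, since the text does not reproduce a formula; the dimensions below are hypothetical, and end corrections to the neck length are ignored.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # in air at about 20 degrees Celsius


def helmholtz_frequency(volume_m3, opening_area_m2, neck_length_m):
    """Resonance frequency of an ideal Helmholtz resonator (no end correction)."""
    ratio = opening_area_m2 / (volume_m3 * neck_length_m)
    return SPEED_OF_SOUND_M_S / (2.0 * math.pi) * math.sqrt(ratio)


# Hypothetical mouth-like dimensions: 50 cm^3 cavity, 1 cm neck.
narrow_opening = helmholtz_frequency(50e-6, 1.0e-4, 0.01)   # 1 cm^2 opening
wide_opening = helmholtz_frequency(50e-6, 2.0e-4, 0.01)     # 2 cm^2 opening
large_cavity = helmholtz_frequency(100e-6, 1.0e-4, 0.01)    # 100 cm^3 cavity

print(round(narrow_opening), round(wide_opening), round(large_cavity))
```

Under this idealization a wider opening raises the resonance and a larger cavity lowers it, which matches the qualitative roles given above to the opening of the mouth and the size of the buccal cavity.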

However, this does not tell us anything about biphonic singing. In practice we need two voices. The first, the drone, is given to us simply by virtue of the fact that its production is intense, and that in any case it does not undergo filtering by the resonators. Its intensity, higher than that of the harmonics, permits it to survive on account of buccal and nasal diffusion. We have observed that when the nasal cavity was closed, the drone diminished in intensity. This occurs for two reasons: firstly, a source of diffusion through the nose is closed, and secondly, closing the nose reduces the flow of air and with it the sonic intensity produced at the level of the vocal cords.

The possession of several cavities is of prime importance. In practice, we have established that only coupling between several cavities has enabled us to have a sharp formant such as is required by biphonic singing.

For the purposes of this research we initially carried out investigations into the principle of resonators in order to determine the influence of the fundamental parameters. It was observed that the tonality of the sound rises if the mouth is opened wider. In order to investigate the formation of a sharp formant, we carried out the following experiment. Tran Quang Hai produced two kinds of biphonic singing, one with the tongue at rest, i.e., not dividing the mouth into two cavities, and the other with the mouth divided into two cavities. The observation which we made is as follows (an observation which could have been foreseen on the basis of the theory of coupled resonators).


In the first case the sounds were not clear: the drone could be heard distinctly but the second voice was difficult to hear. There was no clear distinction between the two voices, and, furthermore, the melody was indistinct. The corresponding sonagrams bore this out: with a single buccal cavity the energy of the formant is dispersed over three or four harmonics and so the sense of a second voice is very much on the weak side. On the other hand, when the tongue divides the mouth into two cavities, the formant reappears in a sharp and intense manner. In other words, with a single buccal cavity the harmonic sounds produced by the vocal cords are filtered and amplified only in a rough manner, and the biphonic effect disappears. Biphonic singing thus necessitates a network of very selective resonators which filters only the harmonics required by the singer. Fig. 8 shows the responses in frequencies of the resonators, both simple and coupled. In the case of a tight coupling between the two cavities, these produce a single and very sharp resonance. If the coupling is loose, the formant has less intensity and the sonic energy in the spectrum is spread out. If the cavities are transformed into a single cavity, the pointed curve becomes even rounder, and one ends up with the first example, a very blurred type of biphonic singing (tongue at rest). The conclusion can be drawn that the mouth along with the position of the tongue plays the major role, and it can be compared roughly to a sharply peaked filter which changes its place in the spectrum with the sole aim of selecting the interesting harmonics.
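The effect of a rounded versus a pointed resonance curve can be sketched with a single resonance of adjustable sharpness. This is a simplified stand-in, not the coupled-resonator model of Fig. 8: the gain formula is the usual narrow-band approximation of a simple resonator, and the drone, centre frequency and quality factors Q are hypothetical.

```python
import math


def resonance_gain(freq_hz, centre_hz, q):
    """Relative gain of a simple resonance (narrow-band approximation)."""
    detuning = 2.0 * q * (freq_hz - centre_hz) / centre_hz
    return 1.0 / math.sqrt(1.0 + detuning * detuning)


def emphasized_harmonics(drone_hz, centre_hz, q, n_max=30):
    """Harmonics whose gain lies within 3 dB of the resonance peak."""
    threshold = 1.0 / math.sqrt(2.0)
    return [n for n in range(1, n_max + 1)
            if resonance_gain(n * drone_hz, centre_hz, q) >= threshold]


# Hypothetical values: 100 Hz drone, resonance centred on the 10th harmonic.
blurred = emphasized_harmonics(100.0, 1000.0, q=3.0)    # rounded curve
sharp = emphasized_harmonics(100.0, 1000.0, q=20.0)     # pointed curve

print(blurred)   # [9, 10, 11]
print(sharp)     # [10]
```

With the rounded curve the energy is shared among several neighbouring harmonics and no single one stands out; with the pointed curve a single harmonic is picked out, which is the condition described above for a clear second voice.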


We should like to express our gratitude and sincere thanks to Research Team 165 of the Centre National de la Recherche Scientifique, directed by Mr. Gilbert Rouget, who allowed us access to valuable documents concerning biphonic singing stored in the sound archives of his department. Our thanks go also to Professor Claudie Marcel‑Dubois, Head of the Department of Ethnomusicology at the Musée National des Arts et Traditions Populaires, who gave us a great deal of help and encouragement. We should like also to thank Professor Emile Leipp, Dr. Michèle Castellengo and Professor Solange Borel‑Maisonny, who made it possible for us to examine the internal functioning of biphonic singing by means of the production of a radiographic film.


(Translated from French by Robin THOMPSON)



1. This tape is preserved in the Ethnomusicology Department of the Musée de l’Homme, Paris. Archive number BM 78 2, 1.

2. See the record “The Music of Tibet,” recorded by Peter Crossley‑Holland, Anthology Records (30133) AST 4005, New York, 1970.

3. See the record “The Tail of the Tiger,” Ananda 2.

4. An example is the electronic music composition entitled “Ve nguon” (Return to the Source), composed by Nguyen Van Tuong, with Tran Quang Hai as soloist. The first performance was given in France in 1975. The third movement (25 minutes) uses biphonic singing.



CHEN-GIA TSAI : Kargyraa and meditation, TAIWAN


Kargyraa and meditation : Chen-Gia Tsai

Pipe model of a Kargyraa singer’s vocal tract

The melody pitch f1 (the centre frequency of the first formant) in Kargyraa voices is determined by the mouth opening. A perturbation method predicts the resonance shift caused by a bore enlargement at a position x0 of a pipe with an irregular geometry (e.g., Fletcher & Rossing 1991). During a performance of Kargyraa, the bore diameter of the vocal tract changes at the lips, a pressure node for all modes. Hence, an enlargement of the mouth opening leads to an increase in the centre frequencies of the first and second formants (Tsai 2001).



Figure: (a) Spectrogram of a Kargyraa song “the far side of a dry riverbed” (b) and (c) are two snapshot spectra of (a). They show f2=2f1.

This pipe model does not predict (1) the small bandwidth of the first and second formants, and (2) “mode-locking” f2=2f1. I hypothesize that periodic vorticity bursts at the diffuser-like supraglottal structures are responsible for producing the strong components at f1 and 2f1.

Subharmonic generation

In Kargyraa, there is a nonlinear coupling between the two pairs of the vocal folds, which can lead to either entrainments or chaos. While 1:2 entrainment can produce beautiful voices of Kargyraa, pathological voices with the involvement of chaotic vibration of the ventricular folds have a hoarse quality (ventricular dysphonia).

Based on recordings of high-speed images of the laryngeal movement, Lindestad and colleagues (2001) reported that during Kargyraa singing the ventricular folds vibrated with complete but short closures at half the frequency of the true vocal folds, thus contributing to subharmonic generation.
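The link between a 1:2 vibration pattern and subharmonics can be illustrated with a toy signal. This is a crude sketch under stated assumptions, not the mechanism measured by Lindestad and colleagues: the true vocal folds are reduced to a train of unit pulses, and the closure of the ventricular folds is represented simply by attenuating every second pulse. The sample rate, the 200 Hz fold frequency and the attenuation factor are hypothetical.

```python
import cmath


def pulse_train(n_samples, period, alternate_gain=1.0):
    """Unit pulses every `period` samples; every second pulse is scaled."""
    signal = [0.0] * n_samples
    for k, start in enumerate(range(0, n_samples, period)):
        signal[start] = 1.0 if k % 2 == 0 else alternate_gain
    return signal


def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of the Fourier component of `signal` at freq_hz."""
    total = sum(x * cmath.exp(-2j * cmath.pi * freq_hz * i / sample_rate)
                for i, x in enumerate(signal))
    return abs(total) / len(signal)


SAMPLE_RATE = 8000            # Hz, hypothetical
F0 = 200.0                    # Hz, hypothetical true vocal fold frequency
PERIOD = int(SAMPLE_RATE / F0)  # 40 samples per cycle of the true folds
N_SAMPLES = 800               # 0.1 s, a whole number of double cycles

normal_voice = pulse_train(N_SAMPLES, PERIOD)
kargyraa_like = pulse_train(N_SAMPLES, PERIOD, alternate_gain=0.5)

for name, signal in (("normal", normal_voice), ("kargyraa-like", kargyraa_like)):
    at_half = magnitude_at(signal, F0 / 2.0, SAMPLE_RATE)
    at_f0 = magnitude_at(signal, F0, SAMPLE_RATE)
    print(f"{name}: F0/2 component {at_half:.5f}, F0 component {at_f0:.5f}")
```

In the unmodified train the component at F0/2 vanishes; once every second cycle is altered, the signal repeats only every two cycles and a component at half the frequency appears.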

Autonomic functions

It seems that stiffness of the ventricular folds cannot be manipulated by will, because they contain very few muscle fibres. However, the constantly increased ventricular function and repetitive closure may lead to new functional and anatomical changes in the interior of the larynx (such as ventricular hypertrophy) and, possibly, to a new system of innervation.

On the other hand, evidence of psycho-emotional, cerebellar or midbrain (e.g., Parkinsonism) types of ventricular dysphonia suggests sub-cortical influences on the ventricular folds.

It is interesting to note that Tibetan monks do not practice their vocalization. They improve the control of the ventricular folds through meditation! Meditation is a conscious mental process that induces a set of integrated physiologic changes termed the relaxation response. The elastic property of the ventricular folds may be affected by meditation through autonomic functions. They become so relaxed that they vibrate with complete closures at half the frequency of the true vocal folds. In contrast, emotional stress can lead to adduction and vibration of the stiff ventricular folds with incomplete closures. Because lower subharmonics are weak in such melancholic voices, they sound rough.

Tibetan monks stated repeatedly that while singing overtones one should always make a special effort to attune heart and mind to the meaning of the holy moment (Smith and Stevens 1967).

An overtone singer and researcher related the psychological mechanism underlying overtone singing during meditation to “a higher sound awareness”: When we meditate by way of singing, the need to make pleasant or even beautiful sounds moves to the background. It is not the singing that decides whether we enter a truly meditative state of mind. More important is that we listen to ourselves, that we search for the voice inside. We are not concerned with personal judgments about our voice or with the personality in our voice. Singing harmonics automatically focuses the mind more than most other types of singing, because we essentially sing just one tone and listen to its internal dynamics. Overtones demand from us a higher than normal sound awareness. They fulfil a service in certain spiritual traditions and have a built-in symbolic association with ‘things high’. They have the exceptional ability to unite voices to the highest degree and a tendency to unify the body and the mind. (van Tongeren 2002:207)

It is my hypothesis that overtone singing focuses the mind automatically on the weak pitch of the prominent nth harmonic. This form of meditation is designed to lead one to a subjective experience of absorption with the object of focus. From the viewpoint of neuroscience it seems appropriate that a model for this kind of meditation begins with activation of the prefrontal cortex and the cingulate gyrus. Brain imaging studies have suggested that tasks requiring sustained attention are initiated via activity in the prefrontal cortex, particularly in the right hemisphere, and the cingulate gyrus appears to be involved in focusing attention. In an excellent review paper on the neural basis of meditation, Newberg and Iversen (2003) proposed a neurophysiological network possibly underlying meditative states. They discussed the prefrontal cortex effects on thalamic activation, posterior superior parietal lobule deafferentation, hippocampal and amygdalar activation, hypothalamic and autonomic nervous system changes, autonomic-cortical activity, and neurotransmitter activity. Although their model may provide a general framework for studying the neural basis of meditation, it should be noted that there are categories and subcategories of meditation that may be associated with different neural activity. For example, overtone singing by Tibetan monks belongs to the meditation category in which the subjects focus their attention on a particular object. When the object is the melody composed of overtones, the mental task and thus neural activity may differ from the meditation technique that focuses the mind on an image, phrase, or word, because of the involvement of supraglottal structures.

Nitric oxide mechanisms


Nonadrenergic, noncholinergic (NANC) nerves, which cause relaxation of airway smooth muscle, have been described in several species including man. Nitric oxide appears to account for all the NANC response in human central and peripheral airways in vitro. A recent review on meditation stressed the importance of the involvement of nitric oxide during meditation (Esch et al. 2004, see also Kim et al. 2005). Based on these findings I propose a model for Tibetan overtone chanting:

The loop underlying Tibetan overtone chanting can be described as: (1) a monk adducts and relaxes the ventricular folds; (2) he sings overtones; (3) he focuses his mind on the weak pitch of reinforced overtones; (4) this concentration triggers autonomic functions and nitric oxide mechanisms that in turn lead to a relaxation of the smooth muscles in the supraglottal structures.


Andersson K, et. al. (1998) Etiology and treatment of psychogenic voice disorders: results of a follow-up study of thirty patients. J Voice 12: 96-106.

Doersten PG, Izdebski K, Ross JC, Cruz RM. (1992). Ventricular dysphonia: a profile of 40 cases. Laryngoscope 102: 1296-1301.

D’Antonio L, et al. (1987) Perceptual-physiologic approach to evaluation and treatment of dysphonia. Ann Otol Rhinol Laryngol 96: 187-190.

Esch T, Guarna M, Bianchi E, Zhu W, Stefano GB. (2004) Commonalities in the central nervous system’s involvement with complementary medical therapies: limbic morphinergic processes. Med Sci Monit. 10(6):MS6-17.

Hisa Y, Koike S, Tadaki N, Bamba H, Shogaki K, Uno T. (1999) Neurotransmitters and neuromodulators involved in laryngeal innervation. Ann Otol Rhinol Laryngol Suppl. 178:3-14.

Kim DH, Moon YS, Kim HS, Jung JS, Park HM, Suh HW, Kim YH, Song DK. (2005) Effect of Zen Meditation on serum nitric oxide activity and lipid peroxidation. Prog Neuropsychopharmacol Biol Psychiatry. 29(2):327-31.

Lazar SW, Bush G, Gollub RL, Fricchione GL, Khalsa G, Benson H. (2000) Functional brain mapping of the relaxation response and meditation. Neuroreport 11(7):1581-5.

Newberg AB, Iversen J. (2003) The neural basis of the complex mental task of meditation: neurotransmitter and neurochemical considerations. Med Hypotheses 61(2):282-91.

van Tongeren, M. (2002) Overtone singing – physics and metaphysics of harmonics in East and West. The Netherlands: Fusica, Amsterdam.

Yuceturk AV, Yilmaz H, Egrilmez M, and Karaca S. (2003) Voice analysis and videolaryngostroboscopy in patients with Parkinson’s disease. Eur Arch Otorhinolaryngol. 2002 259(6):290-3.


Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao : False vocal fold surface waves during Sygyt singing: A hypothesis, TAIWAN


False vocal fold surface waves during Sygyt singing: A hypothesis

Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao


Overtone singing is a vocal technique found in Central Asian cultures, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0. The pitch of nF0 arises from a very sharp formant. Current physical modelling of overtone singing asserts that the harmonic at nF0 is emphasized by a resonance of the vocal tract. However, this approach could not explain the extraordinarily small bandwidth of this formant.

This paper offers a hypothesis that surface waves (Rayleigh waves) of the false vocal folds might actively amplify the harmonic at nF0 in a specific technique of overtone singing: Sygyt. We propose a loop for harmonic amplification, which is composed of (1) the vocal tract with resonance nF0, (2) surface waves of the false vocal folds, and (3) a varicose jet separating from the false folds. This model receives indirect support from an experimental study on a novel human vocalization, which is characterized by a prominent component at 4 kHz. During this pure tonal vocalization, false fold surface vibrations were detected by ultrasound colour Doppler imaging. High-frequency false fold surface waves may also occur during Sygyt singing.

1. Introduction

Overtone singing (or throat singing, biphonic singing) is a vocal technique found in Central Asian cultures such as Tuva and Mongolia, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0 (F0 is the fundamental frequency, n = 6, 7, …, 13 in typical performances). The voice of overtone singing is characterized by a sharp formant centered at nF0, as can be seen in Figs. 1 and 2. Traditional techniques of overtone singing include Khoomei, Sygyt, Kargyraa and others.
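As a small numerical illustration of the relation between the drone and the melody pitch, the sketch below lists the harmonics n = 6 to 13 of a drone together with their musical distance from it in cents. The drone frequency of 150 Hz is an assumed example value, and the function name is hypothetical.

```python
import math


def harmonic_table(f0_hz, n_min=6, n_max=13):
    """Return (n, frequency in Hz, interval above F0 in cents) per harmonic."""
    rows = []
    for n in range(n_min, n_max + 1):
        freq = n * f0_hz                 # melody pitch candidate at n * F0
        cents = 1200.0 * math.log2(n)    # interval between harmonic n and F0
        rows.append((n, freq, cents))
    return rows


if __name__ == "__main__":
    # 150 Hz is an assumed drone pitch, used only for illustration.
    for n, freq, cents in harmonic_table(150.0):
        print(f"n={n:2d}  {freq:7.1f} Hz  {cents:7.1f} cents above F0")
```

Because the available melody pitches are fixed multiples of F0, the steps between neighbouring harmonics become smaller in musical terms as n grows.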

There are two approaches to the physical modelling of overtone singing: (1) the double-source theory [1], which asserts the existence of a second sound source that is responsible for the melody pitch; and (2) the resonance theory, which asserts that a harmonic is emphasized by an extreme resonance of the vocal tract. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone was regarded as robust support of the resonance theory [2].

Recent attempts at physical modelling of Sygyt were concerned with calculating the transfer function of the vocal tract using one-dimensional models, and successfully predicted the formant frequency [2,3]. From a theoretical standpoint, however, this approach may not be suitable for a tract with a rapidly flaring bell section. A Sygyt singer raises the tongue so that the tract shape changes abruptly at the narrowing of the tongue (marked with a red dot in Fig. 1b), where the assumption of planar wave fronts breaks down, and evanescent cross-modes can be excited in this flaring section even at low frequencies [4]. This may lead to errors in transfer function calculations using one-dimensional models. An alternative approach, using Matched Asymptotic Expansions to model a Sygyt singer’s vocal tract, was proposed in [5].

In a two-resonator theory, a Sygyt singer’s vocal tract was modelled as a coupled system of a longitudinal resonator extending from the glottis to the narrowing of the tongue, and a Helmholtz resonator extending from the articulation by the tongue to the mouth exit. Experiments showed that for some Sygyt voices with a sharp formant the two resonances were matched, while a melody pitch could be perceived even when the resonances were not exactly matched [6]. Although the formant magnitude was shown to be increased by resonance matching [3], it is unclear whether resonance matching reduces the formant bandwidth.

From a psychoacoustic point of view, a small bandwidth of the prominent formant is critical to a clear melody in Sygyt singing. A preliminary study using an autocorrelation model for pitch extraction suggested that the pitch strength of nF0 increased along with the Q value of this formant, with the formant magnitude playing a secondary role [5]. The spectrum of the Sygyt voice shown in Fig. 1a has the 12th harmonic approximately 15 dB stronger than its flanking components. If the amplification of this harmonic cannot be explained in terms of vocal tract impedance, it should be attributed to the source signal.
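To give a feeling for how sharp such a formant must be, the following sketch estimates the Q value that a single resonance would need in order to leave the neighbouring harmonics 15 dB below the 12th. It assumes a second-order (single-peak) resonance centred exactly on nF0 and a source spectrum that is flat across the three harmonics. Both are simplifying assumptions made for illustration and are not the model of [5]. The drone frequency of 150 Hz is likewise an assumed value, and the function names are hypothetical.

```python
import math


def resonance_gain_db(f_hz, fc_hz, q):
    """Gain of a second-order resonance relative to its peak value, in dB."""
    detune = f_hz / fc_hz - fc_hz / f_hz
    return -10.0 * math.log10(1.0 + (q * detune) ** 2)


def q_for_rejection(n, neighbour, rejection_db):
    """Q that places harmonic `neighbour` `rejection_db` below harmonic n."""
    detune = abs(neighbour / n - n / neighbour)
    return math.sqrt(10.0 ** (rejection_db / 10.0) - 1.0) / detune


if __name__ == "__main__":
    f0 = 150.0   # assumed drone pitch in Hz
    n = 12       # emphasized harmonic, as in Fig. 1a
    # Take the larger Q so that both flanking harmonics are at least 15 dB down.
    q = max(q_for_rejection(n, n - 1, 15.0), q_for_rejection(n, n + 1, 15.0))
    print(f"required Q      : {q:.1f}")
    print(f"3 dB bandwidth  : {n * f0 / q:.1f} Hz at {n * f0:.0f} Hz")
```

Under these assumptions the required Q is roughly 35, which corresponds to a 3 dB bandwidth of roughly 50 Hz at 1.8 kHz.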

The insufficiency of the resonance theory is even more notable in another technique of overtone singing: Kargyraa. A Kargyraa singer uses his false vocal folds to produce a low-pitched drone, manipulating his mouth opening to change the vocal tract resonance. Spectra in Fig. 2 show that the centre frequencies of the first and second formants of Kargyraa voices always stand in the ratio of 1:2. This strange phenomenon suggests an unknown glottal source that produces the outstanding component at F1, and its second harmonic.

The goal of this study is to offer a physical model based on a nonlinear loop that explains the harmonic amplification in Sygyt. This model asserts that surface waves (Rayleigh waves) of the adducted false vocal folds can actively amplify a harmonic. We first discuss the interactions between the false vocal fold surface waves (FVFSWs), the glottal flow and acoustic waves. A preliminary experiment that provided indirect evidence of this model is then addressed.

2. Theory

2.1. Rayleigh surface waves

The Rayleigh surface wave is a specific superposition of a transverse wave and a longitudinal wave of an elastic solid (see, e.g. [7]). Its amplitude is significant only near the surface and attenuates exponentially with depth. The trajectories of material particles are ellipses. At the surface the normal displacement is about 1.5 times the tangential displacement. The velocity of Rayleigh waves, which is independent of the wavelength, is about 0.9 times the transverse wave velocity. Rayleigh’s theory of surface waves has been generalized to viscoelastic solids (see, e.g. [8]).

The assumption of Rayleigh surface wave on the false vocal folds is supported, although indirectly, by recent measurements of the medial surface dynamics of the vocal folds [9]. The trajectories of fleshpoints were approximately ellipses, with the length ratio of the two axes varying in the range of 1.5-2.0. This value is in remarkable agreement with Rayleigh’s theory of surface waves.
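The two figures quoted in this subsection (a speed of about 0.9 times the transverse wave speed, and an axis ratio of about 1.5) can be checked against the classical theory for a homogeneous, isotropic elastic half-space. The sketch below solves the Rayleigh secular equation for a given Poisson ratio and returns the wave speed relative to the transverse wave speed together with the ratio of normal to tangential displacement at the surface. The Poisson ratios used (0.25 as a common textbook value, 0.5 as the incompressible limit often assumed for soft tissue) are assumptions for illustration, and viscoelasticity and the finite thickness of the mucosa are ignored.

```python
import math


def rayleigh_surface_wave(poisson):
    """Return (c_R / c_T, normal/tangential surface displacement ratio)."""
    # kappa = (c_T / c_L)^2 for an isotropic elastic solid
    kappa = (1.0 - 2.0 * poisson) / (2.0 * (1.0 - poisson))

    def secular(xi):
        # Rationalised Rayleigh equation in xi = (c_R / c_T)^2
        return (xi ** 3 - 8.0 * xi ** 2
                + (24.0 - 16.0 * kappa) * xi - 16.0 * (1.0 - kappa))

    # secular(0) < 0 < secular(1), so bisection brackets the Rayleigh root.
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if secular(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)

    q = math.sqrt(1.0 - kappa * xi)  # depth-decay factor, longitudinal part
    s = math.sqrt(1.0 - xi)          # depth-decay factor, transverse part
    axis_ratio = 2.0 * q / (1.0 + s * s)
    return math.sqrt(xi), axis_ratio


if __name__ == "__main__":
    for nu in (0.25, 0.5):  # assumed Poisson ratios
        speed, ratio = rayleigh_surface_wave(nu)
        print(f"nu={nu:.2f}  c_R/c_T={speed:.3f}  normal/tangential={ratio:.2f}")
```

For Poisson ratios between 0.25 and 0.5 this idealised calculation gives a speed ratio of about 0.92 to 0.96 and a displacement ratio of about 1.5 to 1.8.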

2.2. Physical modelling of Sygyt

Here we propose a physical model that describes how FVFSWs absorb the energy of the glottal flow and acoustic waves.


The false folds are significantly adducted during Sygyt singing. Hence, the volume flow through them (UF) is sensitive to FVFSWs. FVFSWs are supposed to be triggered by the acoustic pressure, which is dominated by the vocal tract resonance at nF0. We therefore assume an FVFSW with a frequency of nF0.

Based on the assumption of elliptic movements of fleshpoints on the false folds, snapshots of this wave can be obtained. The ellipses in Figs. 3b and 3c represent the trajectory of fleshpoints. We assume that the energy exchange between the flow and the tissue occurs at a single point. In Fig. 3b the work done by the viscous flow at this point is positive. In Fig. 3c the flow separates upstream, performing no work (or positive work, if back-flow appears) at this point. It can easily be seen that over a period the FVFSW absorbs energy from the flow in the vicinity of the flow separation point, which moves back and forth at a crest of the FVFSW, modulating the flow through the false folds at a frequency of nF0. This induces varicose oscillations of UF, which produce the harmonic at nF0 in the source signal. This harmonic is in turn reinforced by the strong vocal tract resonance at nF0.

The net work done by the sinusoidal acoustic wave with frequency nF0 at a point on the false fold over a period can be positive or negative, depending on the phase relationship between the FVFSW and the acoustic pressure. We suppose that within a half wavelength of the FVFSW in the vicinity of the flow separation point, the FVFSW absorbs the acoustic energy of the harmonic at nF0. Away from this flow separation point, the FVFSW is expected to decay rapidly because of large viscous losses in the tissue during high frequency vibrations. We thus conclude that the total work done by the acoustic wave on the FVFSW is positive.

To sum up, a loop for Sygyt is established in terms of (1) linear resonator: the vocal tract with resonance at nF0, (2) energy source: pressure difference across the false glottis, and (3) nonlinear amplifier: a flow separating from curved walls with mucosal layers receiving acoustic feedback. This self-sustained oscillator differs from the true vocal folds in that the false fold mucosa does not vibrate at any intrinsic resonance, but rather responds to the acoustic pressure.

2.3. Discussion

The present model explains the crucial role of the adduction of the false folds in the Sygyt technique. Because of this adduction the flow velocity over their mucosal layers is high enough to supply the energy for sustaining FVFSWs. It is interesting to note that FVFSWs have been observed in patients suffering from ventricular dysphonia [10], although their frequencies appeared to be much lower than those during Sygyt singing.

From an empirical standpoint, learning Sygyt is much more difficult than is implied by the resonance theory. In workshops of overtone singing, it has been repeatedly observed that only very few people are able to produce voices with a clear melody pitch. The present model predicts that a singer cannot produce Sygyt well, even when shaping the tract perfectly, if the false folds are not correctly adducted or if their mucosal layers lack the proper shape, thickness, and viscoelastic properties.

The loop described in our model tends to “unify” the double-source theory and the resonance theory of overtone singing. Whereas the true vocal folds and the vocal tract are, as usual, viewed as the independent source and filter, the false fold mucosa plays a key role in introducing acoustic feedback into the loop for harmonic amplification.

The present model for Sygyt might also shed new light on the production of high-frequency, whistle-like voice type of birds, dolphins, whales, and groaning dogs. In this regard, our model is an updated version of the double-source theory [1], which already drew parallels between the sounding mechanisms of overtone singing and the whistle-like voice type, which is produced with the false folds adducted.

It is interesting to compare the harmonic-amplification loop with the sounding mechanism of flute-type instruments, which is based on a loop composed of a vibrating jet and acoustic waves filtered by a resonator. In the case of flutes the jet separates from the musician’s lips, travelling along the mouth of the resonator towards a sharp edge. When the instrument produces a tone, the jet oscillates at one of the resonances of the pipe. The acoustic flow field near the flow separation point excites sinuous oscillations of the jet. At the sharp edge, the jet is directed alternately toward the inside and the outside of the resonator. This pulsing injection induces an equivalent pressure difference across the mouth that excites and maintains acoustic waves in the pipe [11]. The jet, like the false fold mucosa, does not vibrate at any intrinsic resonance. It should be noted that the acoustic flow induces sinuous oscillations of the jet at the mouth hole of a flute, whereas the acoustic pressure excites FVFSWs that induce varicose oscillations of the glottal flow.

While a varicose jet is essential for whistle-like sound production, the role of wall vibration is not fully understood. It has been suggested that the sounding mechanism of human whistling is a loop composed of the jet and the oral cavity with a prominent resonance. The pressure fluctuations due to the acoustic wave at the flow separation point could induce varicose oscillations of the jet without any wall vibration. This model is in an interesting contrast to our model of Sygyt, which assumes vibrations of the compliant walls. To examine the assumption of FVFSWs in our model of Sygyt, we measure surface vibrations during whistle-like singing in vivo.

3. Experimental Study

3.1. Whistle-like voice type


The present model of “varicose jet oscillations induced by surface waves of curved walls in the vicinity of the flow separation point” may provide insight into the production of the whistle-like voice type in birds and mammals. It has been suggested that the production mechanism of bird whistled song might be related to a retraction of the syringeal membranes while in oscillation so that they no longer completely close, leading to a great reduction in the harmonic content of the flow. An alternative explanation of whistled song is that it is produced by pure aerodynamic means without any vibrating surfaces [12]. However, recent experimental studies favour the sounding mechanism of vibrating surface [13,14].

After some practice, a human can imitate a dog’s groaning to produce high-frequency whistle-like voices, which have a prominent component at approximately 4 kHz, as shown in Fig. 4c. We hypothesize that the mechanism underlying this vocalization is a varicose jet induced by FVFSWs.

Medical ultrasound (US) provides an ideal non-invasive method for observing high-frequency surface vibrations with small amplitude, because the vibratory artefact of colour Doppler imaging (CDI) detects surface velocity rather than displacement. In previous studies, CDI was used to measure the frequency and the length of the vocal folds during normal phonation [15,16]. In the present experiment we employ this technique to detect FVFSWs during whistle-like singing.

3.2. Methods

A commercially available, high-resolution US scanner (HDI-5000, ATL, Bothell, WA) with a 5- to 12-MHz linear-array transducer (L12-5 38 mm, ATL) was used in this study. The frame rate in B-mode was about 25 Hz. In the colour mode, the pulse-repetition rate was 10,000 Hz and the measuring velocity range was set at 0 to 128.3 cm/s with baseline offset, which resulted in a frame rate of about 7 Hz. The US scan head was placed horizontally at the midline of the thyroid cartilage lamina on one side (Fig. 4a). The subject is the first author of this paper, who is a healthy man aged 33 with normal vocal function. For this experiment he had practiced the whistle-like vocalization for a week.
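The colour-mode settings can be related to one another through the aliasing limit of pulsed Doppler. In the sketch below, the Doppler carrier frequency of 6 MHz and the sound speed of 1540 m/s in soft tissue are assumptions (the text gives only the 5 to 12 MHz band of the transducer), and the motion is assumed to be parallel to the beam by default. With the baseline fully offset, the unaliased velocity span in one direction is c·PRF/(2·f0). The function name is hypothetical.

```python
import math


def doppler_velocity_span(prf_hz, carrier_hz, sound_speed=1540.0, angle_deg=0.0):
    """Total unaliased velocity span (m/s) of a pulsed-Doppler measurement.

    The Doppler shift is 2 * f0 * v * cos(angle) / c, and the unaliased
    band of shifts has a total width equal to the pulse-repetition rate.
    """
    cos_angle = math.cos(math.radians(angle_deg))
    return sound_speed * prf_hz / (2.0 * carrier_hz * cos_angle)


if __name__ == "__main__":
    # 10 kHz pulse-repetition rate from the text; 6 MHz carrier is assumed.
    span = doppler_velocity_span(10_000.0, 6.0e6)
    print(f"velocity span: {span * 100.0:.1f} cm/s")
```

With these assumed values the span is about 1.28 m/s, which is consistent with the 0 to 128.3 cm/s range quoted above.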

3.3. Results

CDI colour artefacts detected surface vibrations of the right false vocal fold during pure tonal singing (Fig. 4d). During warming up of this vocalization, surface vibrations of the right vocal fold and the false fold were observed (Fig. 4b).

The frequency of pure tonal singing was found to range from 3.7 kHz to 4.6 kHz. Outside this range the voice loses its pure tonal character, with breathy noise accumulating at the prominent resonance.

4. Concluding Remarks

The observation of false fold surface vibrations during pure tonal singing provides indirect support of our model for Sygyt. As FVFSWs may generate 4 kHz pure tonal voices with the second harmonic 30 dB (or more) weaker than the fundamental, it should be possible that a Sygyt singer amplifies a selected harmonic of the voice produced by the true vocal folds through FVFSWs.

The role of acoustic feedback in FVFSW generation is not fully understood. When the acoustic wave filtered by the resonator is strong enough to trigger FVFSWs, a loop for pure tonal vocalization may be established. If not, periodic FVFSWs may not occur. The laryngeal ventricle may be the Helmholtz resonator that is responsible for the prominent resonance at 3.7-4.6 kHz. However, this “resonance” model appears to conflict with experimental results on pure tonal vocalization in birds [13,14]. If the frequency of surface waves is not determined by the tract resonance, it should be determined by the tissue curvature, elastic properties, and the flow speed. In the case of Sygyt singing, however, it has not been reported that a singer manipulates the false folds to change the melody pitch. Further research is needed to compare the sounding mechanisms of Sygyt singing and the pure tonal vocalization.

One implication of our surface wave model is that the vertical motion of fleshpoints on the true/false vocal folds may be critical to their self-sustained oscillation. The two-mass and three-mass models of the vocal folds [17,18] do not take into account the ellipse-like motion of vocal fold fleshpoints, which is consistent with Rayleigh’s theory of surface waves and has been demonstrated in excised canine larynx experiments [9]. We suggest that the vertical motion of fleshpoints near the flow separation point can absorb the kinetic energy of the glottal flow through viscous shear force.

The effect of surface viscous shear stress exerted by a flow also plays a central role in the system of a pair of fluttering flags in wind. This system shows some notable similarities to the glottis. When the inter-flag distance lies in a definite range the flags flutter in an out-of-phase state and generate a pulsating flow, with striking similarities to the vocal fold vibration in the chest register. Flow visualizations showed significant shear stress on the flags exerted by the flow [19]. This finding suggests that viscous shear stress on the vocal fold mucosa should not be ignored, especially in vocalizations with a large open quotient.

Next to the viscosity effect, the surface shear stress may be attributed to the carrying-along of the varicose flow. It was observed in a pair of flags that the flag wave propagates along with the flow, while the wave of an isolated flag propagates in the direction opposite to the flow. Note that the surface shear stress dominates the system of a pair of flags but not an isolated flag [19]. It is likely that the surface shear stress is due to the effect that a varicose or sinuous flow carries along the flag wave. This approach may shed new light on the mechanism of the self-sustained oscillation of the vocal folds.

5. References

[1] Chernov, B.; and Maslov, V. 1987. Larynx double sound generator. Proc. XI Congress of Phonetic Sciences, Tallinn 6, 40-43.

[2] Adachi, S.; and Yamada, M. 1999. An acoustical study of sound production in biphonic singing, Xöömij. J. Acoust. Soc. Am. 105(5), 2920-2932.

[3] Kob, M. 2002. Physical modeling of the singing voice. PhD thesis, Aachen University (RWTH).

[4] Pagneux, V.; Amir, N.; and Kergomard, J. 1996. A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation. J. Acoust. Soc. Am. 100, 2034-2048.

[5] Tsai, C.G. 2004. Physics and perception of overtone singing. URL: http://jia.yogimont.net/overtonesinging/

[6] Kob, M.; and Neuschaefer-Rube, C. 2004. Acoustic properties of the vocal tract resonances during Sygyt singing. Proc. of the International Symposium on Musical Acoustics, Nara, Japan.

[7] Achenbach, J.D. 1984. Wave propagation in elastic solids. Elsevier, New York.

[8] Romeo, M. 2001. Rayleigh waves on a viscoelastic solid half-space. J. Acoust. Soc. Am. 110 (1), 59-67.

[9] Berry, D.A.; Montequin, D.W.; and Tayama, N. 2001. High-speed digital imaging of the medial surface of the vocal folds. J. Acoust. Soc. Am. 110(5), 2539-2547.

[10] Nasri, S.; Jasleen, J.; Gerratt, B.R.; Sercarz, J.A.; Wenokur, R.; and Berke, G.S. 1996. Ventricular dysphonia: a case of false vocal fold mucosal travelling wave. Am. J. Otolaryngol. 17(6), 427-431.

[11] Verge, M.P.; Caussé, R.; Fabre, B.; Hirschberg, A.; Wijnands, A.P.J.; and van Steenbergen, A. 1994. Jet oscillations and jet drive in recorder-like instruments. Acustica 2, 403-419.

[12] Gaunt, A.S.; Gaunt, S.L.L.; and Casey, R.M. 1982. Syringeal mechanics reassessed: evidence from Streptopelia. Auk 99, 474-494.

[13] Brittan-Powell, E.F.; Dooling, R.F.; Larsen, O.N.; and Heaton, J.T. 1997. Mechanisms of vocal production in budgerigars (Melopsittacus undulatus). J. Acoust. Soc. Am. 101, 578-589.

[14] Ballintijn, M.R.; and Cate, C.T. 1998. Sound production in the collared dove: a test of the ‘whistle’ hypothesis. J. Experimental Biology 201, 1637-1649.

[15] Shau, Y.W.; Wang, C.L.; Hsieh, F.J.; and Hsiao, T.Y. 2001. Noninvasive assessment of vocal fold mucosal wave velocity using color Doppler imaging. Ultrasound Med. Biol. 27, 1451-1460.

[16] Hsiao, T.Y.; Wang, C.L.; Chen, C.N.; Hsieh, F.J.; and Shau, Y.W. 2002. Elasticity of human vocal folds measured in vivo using color Doppler imaging. Ultrasound Med. Biol. 28, 1145-1152.

[17] Ishizaka, K.; and Flanagan, J.L. 1972. Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell Syst. Tech. J. 51(6), 1233-1268.

[18] Story, B.H.; and Titze, I.R. 1995. Voice simulation with a body cover model of the vocal folds. J. Acoust. Soc. Am. 97, 1249-1260.

[19] Zhang, J.; Childress, S.; Libchaber, A.; and Shelley, M. 2000. Flexible filaments in a flowing soap film as a model for one-dimensional flags in a two-dimensional wind. Nature 408, 835-839.



Physical Modelling of the vocal tract of a Sygyt singer

Chen-Gia Tsai

Source theory vs. Resonance theory

Two types of overtone-singing should be distinguished: Sygyt and Kargyraa. In Sygyt performances, the rising tongue divides the vocal tract into two cavities, which are connected by a narrow channel, whereas the tongue does not rise in Kargyraa performances.

Up until now, two major theories have been proposed on the production of the melody pitch: (1) The ‘double-source’ theory (Chernov & Maslov 1987), which asserts the existence of a second sound source such as a whistle-like mechanism formed by the narrowing of the false vocal folds (ventricular folds) in addition to the true vocal fold vibration; and (2) the ‘resonance’ theory, which asserts that only a glottal sound source exists, but that an upper harmonic is so emphasized by an extreme resonance of the vocal tract that it is segregated from the other components and heard as another pitch. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone supports the resonance theory (Adachi & Yamada 1999).

Physical modelling of the resonance of the vocal tract of Sygyt singers includes: (1) rear cavity theory, (2) front cavity theory, and (3) resonance-matching theory. The glottal sound source of Sygyt voices is rich in harmonics. This has been attributed to the short open duration of the glottis (Bloothooft et al. 1992, Adachi & Yamada 1999).

Rear cavity theory

Based on vocal tract shape measurements by MRI, Adachi and Yamada (1999) reported that the resonance of the rear cavity, that is, the section from the glottis to the narrowing of the tongue, produced the sharp formant Fk. The resonance of the front cavity, that is, the section from the articulation by the tongue to the mouth exit, was not critical to the production of the melody pitch. The length of the rear cavity decreases as fk increases.

Adachi and Yamada (1999) synthesized tones from transfer functions calculated with and without the front cavity, finding that the front cavity did not affect the formant frequencies, although the magnitude of Fk decreased due to the lack of the front cavity resonance. It is important to note that Adachi and Yamada calculated the transfer functions of a Sygyt singer’s vocal tract using a one-dimensional model, in which the tract shape was approximated as a succession of cones. While such models are widely used in speech research, I argue that the change in the tract shape at the articulation point is so abrupt that the assumption of planar-wave fronts clearly breaks down. Theoretically, one-dimensional models are unsuitable for a Sygyt singer’s vocal tract.

In practice, the rear cavity theory is not supported by a non-traditional technique of overtone-singing used by Tran Quang Hai, who calls it the ‘one-cavity technique’ because the tongue does not rise to divide the vocal tract into two cavities. However, there is an articulation point at the soft palate, as when pronouncing the velar /ng/. The melody of fk is produced by manipulating the opening of the front cavity, while the rear cavity, that is, the section from the glottis to the soft palate, remains unchanged. This technique suggests that the front cavity may be more important for the production of fk.

Front cavity theory

Based on preliminary impedance measurements of vocal tract by a Jew’s harp, Tsai (2001) reported that the resonance of the front cavity determined fk. The author modelled the front cavity as a Helmholtz resonator driven by a flow source U1 at the articulation point. The transfer function can be calculated according to Eq. (6.65) in [Fletcher & Rossing 1991].
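For orientation, the resonance frequency of an ideal lumped Helmholtz resonator can be estimated as in the sketch below. This is the standard textbook lumped-element formula and not the transfer function of Eq. (6.65) cited above. The cavity volume, opening area, effective neck length and sound speed are hypothetical values, chosen only so that the result lands in the frequency range of a melody pitch.

```python
import math


def helmholtz_frequency(volume_m3, neck_area_m2, neck_length_m, sound_speed=343.0):
    """Resonance frequency (Hz) of an ideal lumped Helmholtz resonator.

    neck_length_m is the effective neck length, i.e. it should already
    include any end corrections.
    """
    return (sound_speed / (2.0 * math.pi)
            * math.sqrt(neck_area_m2 / (volume_m3 * neck_length_m)))


if __name__ == "__main__":
    # Hypothetical front-cavity dimensions: 10 cm^3 volume, 1 cm effective neck.
    for area_cm2 in (0.5, 1.0, 2.0):
        f = helmholtz_frequency(1.0e-5, area_cm2 * 1.0e-4, 0.01)
        print(f"opening {area_cm2:.1f} cm^2 -> {f:7.1f} Hz")
```

In this idealisation a wider opening raises the resonance, which matches the qualitative picture of the mouth opening steering the melody pitch.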

Owing to the tract shape at the articulation point, the flow U1 is presumed to be incompressible. It is known that in regions of fast change in pipe geometry, such as a tone hole or the pipe termination, the Helmholtz number He<<1 implies that the wave equation can locally be approximated by the Laplace equation, which describes an incompressible potential flow (Hirschberg & Kergomard 1995). In overtone-singing, the acoustic flow at the articulation point is therefore incompressible (compact region). This is not true for normal phonations.
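As a quick check of the compactness condition, the sketch below evaluates the Helmholtz number for a short constriction and, for comparison, for the whole tract. The convention He = kL = 2πfL/c is one common definition and is assumed here. The constriction length of 5 mm, the tract length of 17 cm and the frequency of 1.8 kHz are hypothetical example values.

```python
import math


def helmholtz_number(freq_hz, length_m, sound_speed=343.0):
    """Helmholtz number He = k * L = 2 * pi * f * L / c (assumed convention)."""
    return 2.0 * math.pi * freq_hz * length_m / sound_speed


if __name__ == "__main__":
    f = 1800.0  # assumed melody-pitch frequency in Hz
    print(f"constriction (5 mm): He = {helmholtz_number(f, 0.005):.2f}")
    print(f"whole tract (17 cm): He = {helmholtz_number(f, 0.17):.2f}")
```

With these assumed dimensions the constriction is acoustically compact (He well below 1), whereas the tract as a whole is not.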

The front cavity theory failed to explain the small bandwidth of Fk. Fig. 2 compares the matched theoretical spectral envelopes and recorded spectra of a Sygyt voice and a Jew’s harp tone, which were produced by me with the same front cavity. It can be seen that the Fk bandwidth of the voice is smaller than that of the Jew’s harp tone. The latter was produced without the rear cavity because the rising tongue completely closed the channel between the front and the rear cavities. This discrepancy suggests that the rear cavity may play a role in sharpening Fk.

Figure 2: Spectra of a Sygyt voice (left) and a Jew’s harp tone (right) produced with the same front cavity.

Resonance-matching theory

The resonance-matching theory takes into account the contributions of both the front and the rear cavities, whose resonances are more or less matched to produce a sharp Fk. Kob (2002) reported that an improvement of the second resonance by about 15 dB was achieved by matching two resonance frequencies, which was accomplished by manipulating the mouth opening. Although this theory appears to ‘unify’ the rear and front cavity theories, it should be noted that according to Table 6.1 in [Kob 2002], the resonance of the front cavity was merely close to the second resonance of the rear cavity; Fk could be sharp enough for pitch production without exact resonance matching.


Kob (2002) calculated the transfer functions of a Sygyt singer’s vocal tract using an improved method of continuous-time interpolated multiconvolution (Barjau et al. 1999), which was originally developed to calculate the impulse response of wind instruments with tone-hole discontinuities. However, this approach does not predict the flow field at the articulation point. Fig. 3 displays the shape of a Sygyt singer’s vocal tract and the potential field at the articulation point. As can be seen from the isobar (equal-potential) lines, the acoustic flow has a higher velocity near the tongue. This contradicts the assumption of planar-wave fronts in Kob’s calculation.

Figure 3: Shape of a Sygyt singer’s vocal tract (left) and the isobar lines at the articulation point (right).

The limitations of one-dimensional models of the vocal tract or the bore of wind instruments should be borne in mind: even at low frequencies evanescent cross-modes will be excited in the rapidly flaring bell section because of strong mode coupling (e.g., Pagneux et al. 1996). In a Sygyt singer’s vocal tract, one-dimensional models are suitable only for the rear cavity.

The vocal tract should be divided into four regions, in which the wave equations have different forms for approximation. In light of Matched Asymptotic Expansions, the global solution can be obtained by ‘gluing’ the local solutions together (Hirschberg & Kergomard 1995). The four regions are (1) the rear cavity, (2) the compact region at the articulation point, (3) the front cavity as a Helmholtz resonator, and (4) the compact region at the mouth opening. The rear cavity is approximated as a succession of cones, where the acoustic field is governed by the Webster equation for He<<1. At the articulation point and at the mouth opening, the incompressible air is approximated as a piston. The front cavity is a Helmholtz resonator with a short neck.

If the transfer function of a Sygyt singer’s vocal tract does not predict the small bandwidth of the second formant, one should consider the possible effect of acoustic feedback to the glottal source (Levin and Edgerton 1999). This may be related to the nonlinear effect of the adducted ventricular folds.
