To test the hypothesis that chromatic scale intervals are specifically embedded in the frequency relationships in voiced speech sounds (i.e., phones whose acoustical structure is characterized by periodic repetition), we analyzed the spectra of different vowel nuclei in neutral speech uttered by adult native speakers of American English, as well as a smaller database of Mandarin.
In other words, the 12-note scale isn't so arbitrary after all. Interestingly, there's a preference for tuning systems in speech as well:
... [We calculated] the distribution of all F2/F1 ratios derived from the spectra of the 8 different vowels uttered by the 10 English-speaking participants (i.e., the relationships in 1,000 utterances of each of the vowels). Sixty-eight percent of these ratios fall on intervals of the chromatic scale (red bars), and all 12 chromatic intervals are represented over a span of 4 octaves.
In so far as the observations here inform this argument, the observed ratios in speech spectra accord most closely with a just intonation tuning system. Ten of the 12 intervals generated by the analysis of either English or Mandarin vowel spectra are those used in just intonation tuning, whereas 4 of the 12 match the Pythagorean tuning and only 1 of the 12 intervals matches those used in equal temperament. The two anomalies in our data with respect to just intonation concern the minor second and the tritone.
That minor-second/tritone anomaly brings up a good chicken-egg question, given that composers who work with more chromatic than diatonic sounds tend not to explore alternate tunings so much: does a preference for crunchy dissonance mean that just intonation sounds "wrong"? Or is it that, in our predominantly equal-temperament world, it's those clashing seconds that sound the most "natural," so that's where the preference comes from? As someone who likes the sound of diatonic music in pure ratios, but opts for equal-tempered dissonance in my own, I'm inclined towards the latter, but I would imagine this is a highly personal impression.
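For anyone who wants to see how far apart the three tuning systems actually sit, here is a rough Python sketch that converts each interval to cents (1200 times the base-2 logarithm of the frequency ratio). The ratio tables are my own assumption, taken from commonly cited 5-limit just and Pythagorean values; the paper may well use a different set, and the minor second and tritone in particular have several competing "just" ratios. The names `JUST`, `PYTHAGOREAN`, `cents`, and `comparison_table` are made up for illustration.

```python
from fractions import Fraction
import math

# Interval names for the 12 chromatic steps above the tonic.
NAMES = [
    "minor 2nd", "major 2nd", "minor 3rd", "major 3rd",
    "perfect 4th", "tritone", "perfect 5th", "minor 6th",
    "major 6th", "minor 7th", "major 7th", "octave",
]

# ASSUMPTION: one commonly cited set of 5-limit just intonation ratios.
# Other sets exist (e.g. 7/5 for the tritone, 16/9 for the minor 7th).
JUST = [
    Fraction(16, 15), Fraction(9, 8), Fraction(6, 5), Fraction(5, 4),
    Fraction(4, 3), Fraction(45, 32), Fraction(3, 2), Fraction(8, 5),
    Fraction(5, 3), Fraction(9, 5), Fraction(15, 8), Fraction(2, 1),
]

# ASSUMPTION: Pythagorean ratios built from stacked 3/2 fifths,
# using the augmented fourth (729/512) for the tritone.
PYTHAGOREAN = [
    Fraction(256, 243), Fraction(9, 8), Fraction(32, 27), Fraction(81, 64),
    Fraction(4, 3), Fraction(729, 512), Fraction(3, 2), Fraction(128, 81),
    Fraction(27, 16), Fraction(16, 9), Fraction(243, 128), Fraction(2, 1),
]


def cents(ratio):
    """Convert a frequency ratio to cents (100 cents = 1 equal-tempered semitone)."""
    return 1200 * math.log2(float(ratio))


def equal_tempered(semitones):
    """Frequency ratio of an interval in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


def comparison_table():
    """Return (name, just, pythagorean, equal) rows, all measured in cents."""
    rows = []
    for step, name in enumerate(NAMES, start=1):
        rows.append((
            name,
            cents(JUST[step - 1]),
            cents(PYTHAGOREAN[step - 1]),
            cents(equal_tempered(step)),
        ))
    return rows


if __name__ == "__main__":
    print(f"{'interval':<14}{'just':>9}{'pyth.':>9}{'equal':>9}")
    for name, just, pyth, equal in comparison_table():
        print(f"{name:<14}{just:9.1f}{pyth:9.1f}{equal:9.1f}")
```

With these tables, the just major third (5/4) lands roughly 14 cents below the equal-tempered one, while the Pythagorean major third (81/64) lands roughly 8 cents above it. Differences of that size are what make it possible to say a measured ratio matches one tuning system and not another, though how close counts as a match is a threshold this sketch does not try to guess.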
Anyway, turns out Harold Hill was right: singing is just sustained talking.