Third round of Mark Liberman's language quiz on Language Log. Copies of the three sound files (mp3 format) are here: quiz3_1, quiz3_2, quiz3_3. This one looks even harder than the second, and much harder than the first. For the record, I am also referring to this on my blog.
The language sounds quite a bit less exotic than the last one (which turned out to be Somali), “less exotic” in the sense of “closer to home”, ie Europe. Still, nothing really rings a bell.
Listening to the three recordings several times elicits the following observations:
One thing the previous exercises have shown me is that an IPA transcription, especially a close one, does not help much in identifying the language. A transcription that uses an ad-hoc phonetic convention, based on the Latin alphabet with diacritics, makes morphemes easier to spot and lets allophones be spelled the same way, even where there is an audible difference in actual speech. As an example, consider the English sentence “Who are you going to send this to?” The first to would have to be transcribed [tə] (reduced vowel), the second [tu] (full vowel). Using the same spelling for both makes it easier to identify this bit as an English word. Of course, this approach involves some guessing about which variations to gloss over, and it can introduce errors.
As my search on the language map will start somewhere east of Italy (see music remark above), I have adopted the following conventions:
With this, I obtain:
To come up with a good transcription, I used Audacity, reducing the speed by 30% and raising the pitch by 6 semitones. Very helpful. Audacity (the latest version in the Debian sid repositories) doesn't work well with aRts, the KDE sound daemon. But once the daemon releases the sound device, Audacity is very usable. A GTK+ version would be ideal. I tried Transcriber too. It looks very handy for transcribing speech, but unfortunately it kept crashing on my box. So I ended up committing suicide.
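For anyone repeating the Audacity trick, the two settings roughly cancel each other out in pitch. The sketch below does the arithmetic in Python, assuming the 30% slowdown was a plain speed change (which, like a slowed-down record, lowers every frequency by the same factor) rather than a tempo-only change; the function and variable names are illustrative.

```python
import math

def semitones(ratio: float) -> float:
    """Pitch shift in semitones for a frequency ratio (12 semitones per octave)."""
    return 12 * math.log2(ratio)

def ratio(semis: float) -> float:
    """Frequency ratio corresponding to a shift of `semis` semitones."""
    return 2 ** (semis / 12)

# Assumption: the 30% slowdown resamples the audio, so every frequency
# is multiplied by the same factor as the playback rate.
speed_factor = 1 - 0.30      # 0.7x playback rate
pitch_raise = 6              # semitones added afterwards

slowdown_shift = semitones(speed_factor)        # about -6.17 semitones
net_ratio = speed_factor * ratio(pitch_raise)   # about 0.99
net_shift = semitones(net_ratio)                # about -0.17 semitones

print(f"slowdown alone: {slowdown_shift:+.2f} semitones")
print(f"after +{pitch_raise} semitones: {net_shift:+.2f} semitones (ratio {net_ratio:.3f})")
```

On that assumption, slowing to 0.7× drops the pitch by about 6.17 semitones, so raising it by 6 semitones brings the voices back to within a fifth of a semitone of their original pitch, which may be why the combination sounds so natural.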
This is apparently from a music radio program, geared towards a younger crowd. The repeated passage might be the network’s or program’s slogan or jingle. Ljuba Gatóssova might be the name of the singer of the next song. Simjóna sounds like a name, too.
Based on Ljuba, -ossova (possibly a women’s last name suffix) and the general music style, I first thought of the South and West Slavic languages, starting with Slovenian and then checking Sorbian. (It’s obviously not Russian or Czech; Polish sounds quite different, too. Besides, any of those would have been too simple.) However, these just don’t fit. First of all, the stressed [ɨ] sound doesn’t seem to exist in these, and they have a higher frequency of [ʃ], [ko], [ski] and other sounds and combinations.
What words or morphemes can we identify?
The language we are looking for, be it in Eastern Europe or further east (Caucasus), is not unlikely to use the Cyrillic alphabet. I transcribed these “little words” into Cyrillic as best I could and ran the result through Google. Excluding Russian, this leaves Bulgarian as a likely candidate.
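The Google step presupposes turning the Latin-based transcription into Cyrillic. Here is a minimal Python sketch of such a converter, assuming a letter-to-letter mapping inferred from the word list below (the post's own convention table is not reproduced, so `LATIN_TO_CYRILLIC` is hypothetical): stress and length marks are dropped, carons are kept, and digraphs such as ja are matched before single letters.

```python
import unicodedata

# Hypothetical mapping, inferred from the word list only.
# Two-letter keys are tried first so "ja" becomes я rather than й + а.
LATIN_TO_CYRILLIC = {
    "ja": "я", "ju": "ю",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "ž": "ж", "z": "з", "i": "и", "j": "й", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "t": "т", "u": "у", "f": "ф", "x": "х", "c": "ц", "č": "ч",
    "š": "ш", "y": "ъ",
}
LONGEST_KEY = max(len(key) for key in LATIN_TO_CYRILLIC)

# Combining acute (stress) and macron (length) are dropped; the caron on
# č, ž, š is kept because it distinguishes letters.
DROPPED_MARKS = {"\u0301", "\u0304"}

def strip_marks(word: str) -> str:
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in DROPPED_MARKS)
    return unicodedata.normalize("NFC", kept)

def to_cyrillic(word: str) -> str:
    text = strip_marks(word).lower()
    out = []
    i = 0
    while i < len(text):
        # Greedy longest match: try two-letter keys before single letters.
        for size in range(LONGEST_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in LATIN_TO_CYRILLIC:
                out.append(LATIN_TO_CYRILLIC[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no mapping: keep the symbol unchanged
            i += 1
    return "".join(out)

for word in ["melja", "žitni", "čuvam", "kīr", "pyti", "as"]:
    print(word, "->", to_cyrillic(word))
```

The last example shows the limit of a letter-by-letter mapping: as comes out as ас, while the dictionary form is аз, because Bulgarian devoices word-final з. Spellings like these still have to be checked by hand against a dictionary.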
I then tried to find the words in online dictionaries. The highly idiosyncratic Webster's The Rosetta Edition and this more user-friendly but very Bulgarian dictionary helped. I found an astonishing number of words and possible morphemes:
aз (as) – self; пъти (pyti) – time(s), as in many times; зъл (syl) – bad; див (in divydinén?) – dark, harsh, barbarous; кир (kīr) – dirt, filth, grime; вас (vas) – you; ние (in nië-xé?) – we; ние ще (nië če) – we will; меля (melja) – grind, mill, chatter, nag; житни (žitni) – wheat; чувам (čuvam) – hear, lend an ear; толеранс (tolerans) – tolerance, limit; чар (čar) – charm; ухо (uxo, in wónna?) – ear.
The presence of words from the semantic field of hearing, and of personal pronouns, should have clinched the case for Bulgarian. Still, there are problems. There ought to be a numeral or quantifier in front of as, for example. Elusive words are dīr and baljú-baljó-paljú-paljó. And then there's another problem: it just doesn't sound right. There's quite a bit of internet radio in Bulgarian around, and while some recordings came close, most had a much more “Russian-sounding” quality. For example, the following sounds and combinations are common in Bulgarian but absent from the recordings: [nja] [vnjo] [od] [ska] [ʃta] [dʒnju] [dʒja] [ʒda]. The same is true for Macedonian, and even more so for the East Slavic languages west of Russia (Ukrainian and Belarusian).
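The “sounds Bulgarian or not” argument is essentially a frequency check on sound combinations. Here is a minimal Python sketch of that check, assuming a transcription is available as plain text in the same IPA-like spelling; `count_markers` and `missing_markers` are hypothetical helper names, and the marker list is simply the one given above, spelled with standard IPA ʃ and ʒ.

```python
from collections import Counter

# Combinations listed above as common in Bulgarian but missing from the recordings.
BULGARIAN_MARKERS = ["nja", "vnjo", "od", "ska", "ʃta", "dʒnju", "dʒja", "ʒda"]

def count_markers(transcription: str, markers: list[str]) -> Counter:
    """Count occurrences (overlaps included) of each marker in a transcription."""
    text = transcription.lower()
    counts: Counter = Counter()
    for marker in markers:
        start = 0
        while (pos := text.find(marker, start)) != -1:
            counts[marker] += 1
            start = pos + 1
    return counts

def missing_markers(transcription: str, markers: list[str]) -> list[str]:
    """Markers that never occur in the transcription."""
    counts = count_markers(transcription, markers)
    return [m for m in markers if counts[m] == 0]

# Made-up test string, NOT the quiz transcription (which is not reproduced here).
sample = "nja od ska od"
print(count_markers(sample, BULGARIAN_MARKERS))
print(missing_markers(sample, BULGARIAN_MARKERS))
```

Applied to a real transcription, a long list of missing markers would be the numerical counterpart of “it just doesn't sound right”, although with only three short clips, absence on its own proves little.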
Where does this leave us?
For all of them, there's quite a bit of on-demand online radio available. I even installed RealPlayer 10 for Linux (which, along with Opera, is the only fully non-free software I have). The US-sponsored Radio Free Europe and the Australian Special Broadcasting Service are good sources of audio material from Eastern Europe (I'm boldly annexing the Caucasus here). (The BBC has foreign-language services too, but they use the obscure “pnm” protocol, which my software can't handle; unfortunately, most local stations have chosen it as well.)
Albanian has some similarities, though not enough to be really convincing. And I could not find (here, for example) any of the words identified above, except as.
Armenian sounds different. Because of the Armenian alphabet, I can't make more detailed comparisons. (No time to learn another alphabet, pretty as it is.)
This quiz is definitely more difficult than the last one. I am quite thoroughly lost in the woods. So here is what I did to reach some sort of conclusion: I listened to radio broadcasts in the following languages: Georgian, Afghan – definitely no. Turkmen, Kyrgyz, Kazakh, Tatar – a bit better, but far from convincing to my ear. Uzbek – better still, and it has the y-sound. By far the best fit, acoustically: Kurdish. The broadcast I used is from this Swedish radio station. Kurdish has words that end in -in, -ižin, -em (stressed on the last syllable), all the sounds we need, including [x], and three-letter words with an ī sound in the middle. I can't find an online dictionary to check this hypothesis further. But there it is: I opt for Kurdish (non-Slavic, non-Turkic, but there we are). No idea which dialect, though. And the Slavic elements, ie the presumed names of the singers, remain unexplained.
A much better idea of what Eastern European, Caucasian and Turkic languages sound like. And the ability to read Cyrillic almost fluently. For your amusement, here is a list of Bulgarian words. Just read them out loud:
Велосипед автобус автостоп дифтонг дисквалификация диуретичен дякон байонет баобаб джудо
The veil was finally lifted: the recording is in Yakut. Yakut is a Turkic language spoken by about 365,000 people, mainly in the Sakha Republic (Yakutia), ie in Siberia.
That we were dealing with a Turkic language came as no surprise. Language Hat and caelestis at sauvage noble had come to the same conclusion but were, like me, unable (judging from what they wrote online) to decide which Turkic language it was. My desperate shot in the dark (Kurdish), made after I had ruled out by ear all the Turkic languages available to me, shouldn't be taken as anything more.
Stefano Taschini’s “everyone could have found it” explanation (referenced above) hinges on not missing a number of subtle points and at the same time possessing the necessary resources:
Given the dearth of online resources in Yakut, though, deciding by ear wasn’t a viable strategy.