Mark Liberman at Language Log has launched a new challenge: transcribe (broadly) three sentences from an unknown language, and identify it. The .wav files (which I have taken the liberty of copying) are here: first sentence, second sentence, third sentence.
Listening to the files, my first reaction is that I haven’t the foggiest. I’m embarrassed to say that my auditory memory is bad and I don’t easily recognize sounds and languages other than Romance and Germanic ones, and maybe a few spoken around me. It’s not Arabic, at least not the variety spoken in Paris, France. :)
The only distinctive thing I connect this language with is a documentary I recently saw about women in Tajikistan. But the recollection is so vague that it could have simply been triggered by intonational similarities.
So let’s attack the problem scientifically.
The mystery language doesn’t have a huge number of different sounds. In particular, the number of consonants is quite low. There are several velar/uvular/pharyngeal fricatives and stops, but no [f], and only once something like a very soft [ʃ] or [ʂ]. On the other hand, we can hear a lot of diphthongs, and the vowels in general are relatively tense.
It is not easy to decide if the i sound in the diphthongs should be transcribed as [i] or [ɪ]: it is mostly quite tense, but obviously less so in the [ɔi] combination. For consistency and ease, I will use [i] in all these cases.
Other transcription problems:
Here is a list of the sounds I hear:
At this point, the result of a Google search for languages with a lot of diphthongs and such characteristics is that we are dealing with Proto-Indo-European. Google isn’t infallible, especially when the search terms are vague.
So much for IPA. Identifying this specimen might be simpler, though, if we opted for a laxer convention, using the Latin alphabet and not too many diacritics (Mark Liberman wanted a “broad transcription” anyway). Letters map to sounds the following way:
With these spelling conventions, the transcription reads:
We find that words end either in a vowel/diphthong or in n (can’t be sure it’s not m in some places), d, s and maybe p. Of course, not knowing the language, it is impossible to be sure where the words actually end, so this point must be taken with a grain of salt.
Which words or potential morphemes are present more than once?
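For anyone who would rather not do this by eye, here is a minimal sketch of the same bookkeeping: count every letter sequence inside the transcribed words and keep those that turn up more than once. The function name `repeated_ngrams`, its length limits, and the sample string are all my own assumptions for illustration; the sample is just two words from this post strung together, not the actual transcription.

```python
from collections import Counter


def repeated_ngrams(text, min_len=3, max_len=6):
    """Return (substring, count) pairs for letter sequences seen more than once.

    Substrings are taken inside whitespace-separated words only, so uncertain
    word boundaries in the transcription still matter. Results are sorted
    longest first, then by count, then alphabetically.
    """
    counts = Counter()
    for word in text.lower().split():
        for n in range(min_len, max_len + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    repeated = [(gram, c) for gram, c in counts.items() if c > 1]
    return sorted(repeated, key=lambda kv: (-len(kv[0]), -kv[1], kv[0]))


# Hypothetical input: NOT the real sentences, only a placeholder.
sample = "guriga dumarka guriga"
for gram, count in repeated_ngrams(sample)[:5]:
    print(gram, count)
```

Since the word boundaries are guesses anyway, running the same count over the text with the spaces removed would be a reasonable variation.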
Conclusion? Hm.
Mark Liberman provided a hint: the recording had something to do with a recent European event. Now, did anything recently happen in Europe? Apart from a politically motivated murder and some developments about power in the EU, I’m coming up blank. My brother, who was quite taken with the task :), was at first convinced this was about Mr Arafat falling ill. So I listened through some samples from U Heidelberg's Semitic Sound Archive, but am still convinced that there is no way this is Arabic, and Aramaic sounds different, too.
Looking at languages in eastern Europe (those I least know how to place) isn’t conclusive either. Still looking into Indo-European languages, I had some hopes for Lithuanian and Latvian, but they lack the [x] and [h] sounds, or at least they are not as frequent; they also have a lot more sibilants (like [ʒ], which is absent from our samples) and too many words that end in -as. There is a page in the educational part of the BBC site that has small sound snippets from the major European languages. It’s none of those.
Now we could check phonetic inventories for languages in the Middle East (back to Tajik, Farsi and other Indo-Iranian languages). There are vague similarities, but still too many sibilants. Here is a text sample in Tajik, in Latin transcription. Hum. Kurdish? I think I’m quite on the wrong track.
Last attempt: If (which is not very likely) this language uses the Latin alphabet, then Google could help. I searched for “guriga” and found that this word exists in Hungarian (which this language isn’t) and in [drumroll] Somali. For example, this patient information page of a hospital proposes a Somali version. Guriga means home, by the way.
All right, so what’s Somali like? The best description (that I could find on the web) of its phonological inventory comes from none other than Prof. Liberman (the plot thickens).
Reading the children’s hospital page and more googling give us:
Frankly, I’m still not quite convinced. Somali shouldn’t have a voiced s ([z]), but there are a lot of occurrences in the samples. This could be explained away, of course: it might not be a phoneme but just an allophone of s, or a feature of this particular dialect. Still, Somali should have [ʒ] and [f], which our samples don’t. A recording presumably in Somali is here. Quite similar … I’m back to my bad auditory memory. My brother finds it similar enough to be convinced; maybe I ought to listen to him.
Then there is the “European Event” hint, which I take to mean that a speaker of the language has a particular occasion right now to make a statement on this topic (rather than talking about, say, nuclear physics or the weather forecast for Mogadishu). Well, the EU recently said they would provide help to the Somali people, caught in an awful situation for some time now, but that’s hardly a “European event”, is it?
If the mystery language is Somali, the word home coming up twice in three sentences could point to either refugee questions, or the question whether there should be an exile government (Britain was against it).
Time is running out, otherwise I’d have nipped down to the laundromat with my mp3 player and asked a few of the people there (my neighbourhood has several refugee homes); I have to do my laundry anyway this weekend.
Let’s just wait for Mark Liberman’s verdict.
So it was indeed Somali. As I documented on my blog (and not here, since I didn’t want to edit this page before the solution was published), I understood the oblique reference to a European event in the end. The assassination of Theo van Gogh, a Dutch film maker, was linked to threats against Ayaan Hirsi Ali, his colleague in the making of the film Submission, a Somali refugee and now a member of parliament.
Here, for the record, the three sentences in standard Somali spelling:
I believe that mine isn’t so bad, even though it is a bit overtranscribed (“índzumaxká” instead of in dumarka). The aural a/o distinction needs fine-tuning, but confounding d/ds/z/s/dz/ts, b/v/p and k/x/q/c is probably difficult to avoid. What I’m surprised about is the l/s confusion.
When working on and solving the first of Mark Liberman’s language quizzes, I was convinced that for a non-European and/or non-Indo-European language the problem would be too hard. It would indeed have been, were Somali not written in the Latin alphabet. Google came into it at a crucial point, like it did for Jonathan Mayhew (see Mark Liberman’s solution details), whose method was rather more whimsical, though.
Thus, if the language employs a common enough alphabet, such puzzles are solvable. Next time, we have to expect one that has an ideographic writing system, or none at all.
Interesting, too, that the Google method allows us to actually understand a word, guriga, from an obscure bit of language. And had I transcribed in dumarka correctly, I would even have found out the topic of the sample (domestic violence), as becomes clear by clicking on the Google link above.
To learn about Somali, Prof. Liberman’s pages from his 1998 field linguistics course are excellent. So, yes, -k(g)a is a morpheme (definite article), and so are some of the -o endings (plural) and, I believe, -yihiin and -tahay. There is also a page of proverbs, accessible by clicking on morphological exercise.
The proverb that sparked the entire quiz is the following:
Naag ha kaga jirto guri ama god.
(A woman’s place is in the home or in the grave.)
(Remember, the base of all this is about violence against women and those men who are on their side.) Strangely, or interestingly, the word for woman here, naag, is different from the one in the samples, dumar. A distinction between the generic and the particular?
Oh, and Somali has at least 46 words for camel, even if “Eskimo” doesn’t have a mind-boggling number of words for snow. (But then, visiting exactly two web pages gives me about 155 English terms for dog, and don’t get me started on car.)