Where do languages come from?
Almost half the world speaks languages descended from a lost language known as Proto-Indo-European.
A few weeks ago, in my last etymology-related letter, I made reference to a language known as Proto-Indo-European (PIE). Perhaps I should have also mentioned that there is no direct evidence (i.e., written records) for any such language. Proto-Indo-European is a “reconstructed” language. That is to say, it’s a theory; but it’s a well-constructed theory. If you’re interested in the history of language, you should find the story of Proto-Indo-European to be fascinating.
Disclaimer: My account below is high-level and not very technical. There are two reasons for this:
I don’t want to bore anybody with too much detail, and (more to the point),
I lack the expertise to get into many technicalities anyway.
If you want more detail, you might start with the Wikipedia article on PIE and follow the links from there.
Language Families—and Extended Language Families
You’re already familiar with the idea of language families. In Western Europe, the big families are the Romance languages and the Germanic languages. The Romance languages descend from the Vulgar Latin spoken by Roman soldiers posted throughout the Roman Empire. In different localities the language evolved differently, but it’s not hard to see (and hear) how Spanish, French, Italian, Portuguese, and Romanian derive from a common source. Latin canis (“dog”) gives us cane (Italian), câine (Romanian), chien (French). (Spanish is an outlier, with perro.)
To the north (mostly) of the Romance languages developed the Germanic languages, which include English,[1] German, Dutch, Swedish, Norwegian, Danish, Icelandic, and others. They grew from a now-extinct Proto-Germanic language that diverged as its speakers migrated around Europe. So Germanic words for “dog” or “hound” include hund (German, Swedish, Norwegian, Danish), hond (Dutch), and hundur (Icelandic).
So we aren’t surprised when neighboring people-groups speak similar languages. Nor are we surprised when languages that developed thousands of miles apart are completely different. Chinese is nothing like Italian; Arabic is nothing like Dutch; Xhosa is nothing like Sanskrit.
However, as the people of Europe came into contact with the people of India, they were surprised at some of the similarities between Sanskrit on the one hand and Greek and Latin (and Latin’s descendant languages) on the other. Nobody would mistake Sanskrit for a Romance language, but the lexical parallels between Latin (and Greek) and Sanskrit seem too numerous to be coincidental.
Here are some examples (in each row, the first word is transliterated Sanskrit, the second is transliterated Greek, and the third is Latin):
manas - menos - mens (mind)
naman - onoma - nomen (name)
nau - naus - navis (boat or ship)
pad - pous - ped (foot)
str - aster - astra (star)
At least as far back as the sixteenth century, travelers on the Silk Road noted these similarities between European languages and Indo-Iranian languages. In 1786, a British philologist named William Jones published an article making the case that Sanskrit, Greek, Latin, Gothic, the Celtic languages, and Old Persian all derived from the same source language. In the centuries since, scholars have added the Balto-Slavic and Germanic language families to the extended family of “Indo-European” languages. So then, the great majority of the languages spoken in Europe, as well as in much of West and South Asia, are Indo-European languages. Here’s a map of the reach of these languages in Europe and Asia (not counting the spread to other continents via colonialism):
To clarify, the Indo-European languages are extant languages that people all over the world are speaking and writing. This letter is written in the Indo-European language known as English. Proto-Indo-European, on the other hand, is the long-dead ancestor of all these extant languages (as well as some extinct languages). Everything we know about this language is inferred by working backward from words in different languages that are believed to be related to one another.
In my letter from three weeks ago I mentioned the Proto-Indo-European root *dghem-, meaning “earth.” (Note the asterisk: when you see that, you know that a word has been reconstructed. I mistakenly left off the asterisk last time.) Nobody has ever found an ancient manuscript or carving of the word *dghem-. I don’t know who decides what the Proto-Indo-European root words are, or how they should be spelled (and I suspect there is plenty of disagreement on these points), but in the most simplistic terms, “they” look at related words like gumo, gomo, and gumi (from Germanic languages), homo, uomo, and hombre (from Romance languages), zmuo (from Old Lithuanian), and smoy (from Old Prussian), plus who knows how many other words from who knows how many other languages, and figure out some commonality, taking into account various sound shifts that are far beyond my understanding, much less my ability to explain.
Grimm’s Law and Other Sound Shifts
The most famous of the aforementioned sound shifts is described in Grimm’s Law, put forth by Jacob Grimm in 1822.[2] (You may know him as one of the Brothers Grimm. Besides being a linguist, Grimm was also a collector of folk tales and fairy tales with his brother Wilhelm.) According to Grimm’s Law, in the Proto-Germanic language, “stop consonants” underwent a shift that didn’t occur in other Indo-European languages. When you account for that shift, you begin to see connections between Germanic and other Indo-European languages that weren’t obvious before.
I’m going to try not to get into the weeds here. There is a whole chain of consonant changes that makes sense if you really dig into it and pay attention to the difference between voiced stops and unvoiced stops and voiced fricatives and unvoiced fricatives. But I’m just going to give you some examples that help show how Germanic words, which seem so different from Latinate words, actually do have a place in the same (extended) family tree as Spanish, French, and Italian.
p —> f
Words that start with p in other Indo-European languages often start with f in Germanic languages: pisces —> fish, pater —> father, ped —> fot (foot), pyro- —> fire.
t —> th
In pater—>father, the p—> f shift explains the first consonant. The t —>th shift explains the second consonant. This shift also explains the connection between thou/thee in English and the tu second-person pronouns in Romance languages.
k —> h
This shift accounts for some of the more unexpected connections between Germanic and Latin and Greek words. It’s how we get from canis/cane/chien to hund/hond/hound. Greek kardia and Latin cor correspond to herz in German and heart in English. And did you ever wonder what the connection between cannabis and hemp might be? Change that initial /k/ sound to an /h/ sound, then switch the b to a p (another sound change in the Grimm chain), and you can hear the connection.
kw —> hw
This one explains why so many of our wh- words (who, what, when, where, why) have qu- equivalents in Latin, Spanish, and French (quid, quare, qui, que, etc.).
d —> t
Greek deka, Latin decem correspond to English ten. Latin duo, Spanish dos, French deux correspond to English two. This shift also helps explain the connection between tooth and the Latin dens/dentis. The initial d becomes a t, and, in another shift in Grimm’s chain, the t in dentis becomes th.
As I suggested above, this is a simplification of Grimm’s Law. Mostly I wanted to give you an idea of the kinds of phonological shifts that a) drive linguistic change and b) disguise some of the connections between words in different languages. Hemp and cannabis, for instance, seem like two unrelated words for the same thing until you apply the principles of Grimm’s Law.
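For readers who like to see rules written out mechanically, here is a minimal sketch of the shifts listed above as a lookup table in Python. Everything about it is an assumption made for illustration: the function name grimm_shift is made up, the input words are typed in a rough phonetic spelling (k for Latin c, kw for qu), and Latin and Greek forms stand in for the Proto-Indo-European originals, just as they do in the examples above. It ignores Verner’s Law, vowel changes, and every other complication, so the output only hints at the Germanic word rather than reproducing it.

```python
import re

# Simplified Grimm's Law correspondences: only the shifts mentioned in
# this letter. Keys are sounds in a rough phonetic spelling (an
# illustrative convention, not standard linguistic notation).
GRIMM_SHIFTS = {
    "kw": "hw",
    "p": "f",
    "t": "th",
    "k": "h",
    "d": "t",
    "b": "p",
}

# Try longer keys first so that "kw" is matched before plain "k".
_PATTERN = re.compile("|".join(sorted(GRIMM_SHIFTS, key=len, reverse=True)))


def grimm_shift(word: str) -> str:
    """Apply every shift in a single pass over the word.

    A single pass matters: d becomes t, but that new t must not then
    be shifted again to th.
    """
    return _PATTERN.sub(lambda m: GRIMM_SHIFTS[m.group(0)], word.lower())


if __name__ == "__main__":
    # Latin-ish inputs; compare the outputs with the English words noted.
    print(grimm_shift("pater"))     # father
    print(grimm_shift("dent"))      # tenth    (compare "tooth")
    print(grimm_shift("kwod"))      # hwot     (compare "what")
    print(grimm_shift("kannabis"))  # hannapis (compare "hemp")
```

The one design choice worth noting is that all the substitutions happen at once. If you applied the rules one after another, the t produced by the d rule would get caught by the t rule and turn into th, which is not what Grimm described.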
Grimm’s Law describes only one of many phonological shifts that gave rise to the huge variety within the Indo-European extended language family. I’m sure linguists have stated similar “laws” for other languages. (And there’s also Verner’s Law, which explains some of the exceptions to Grimm’s Law in the Germanic languages.)
Who were the Proto-Indo-Europeans?
If there was a Proto-Indo-European language, there must have been Proto-Indo-European speakers. But given that the language itself is a theory, its speakers are a theory about a theory. Still, from looking at the map above you can see why most theories place the Proto-Indo-Europeans on the steppes of Central Asia.
For the most part, further speculation about PIE culture is based on what words are shared and not shared by Indo-European languages. For instance, scholars look at what trees and animals are in the PIE lexicon and make inferences therefrom. I’m having trouble putting my hand to the specifics (and I’m running out of time), but it will be this kind of thing (which I am making up, so don’t do any historical speculation on the basis of this evidence): Indo-European languages seem to share a root word for oak tree, but the word for birch tree is completely unrelated. That tells us that the Proto-Indo-European people lived somewhere that has oak trees but not birch trees. (Their descendants would have encountered birch trees only after they migrated away from the original homeland.) Then somebody notices that there’s a shared word for bear but not for wolf. So you go looking for some area of Central Asia that has oak trees and bears but not birch trees and wolves. But then you have to take into account that the range of oak trees and wolves changes through history, depending on climate. Oh, and as it turns out, in some languages wolf is a taboo word, kind of a “he-who-must-not-be-named” situation, so that makes you feel less confident in your assessment that the Proto-Indo-Europeans didn’t know about wolves. It’s possible that they said so little about wolves that their word got replaced by new words among their descendants.
Along the same lines, you’ll see people say things like this: “The Proto-Indo-Europeans had a word for gold and a word for silver, but not a word for lead. That tells us that they traded with gold and silver, but they didn’t smelt it for themselves, since lead is a byproduct of smelting gold- and silver-bearing ore.” All that kind of thing is interesting and fun, but you can see how speculative it all is.
Gregg Hecimovich solved a literary mystery.
My guest on this week’s episode of The Habit Podcast is my old graduate-school friend and housemate, Gregg Hecimovich. In 2001, Henry Louis Gates announced the discovery of an unpublished novel called The Bondswoman’s Narrative, written in the 1850s by an enslaved woman named Hannah Crafts. If Gates had the authorship right, it would be the oldest known novel by an African-American woman. But many people doubted the book’s authorship. In 2013, however, Gregg Hecimovich produced evidence that The Bondswoman’s Narrative was indeed written by a black woman in the 1850s. Hannah Crafts, he demonstrated, was the pen name of Hannah Bonds, who escaped from slavery in North Carolina. Dr. Hecimovich’s new book is The Life and Times of Hannah Crafts: The True Story of The Bondswoman’s Narrative. It’s a biography of Hannah Bonds. It’s also a detective story, telling how Gregg Hecimovich and many others uncovered the fascinating true story behind Hannah Bonds’s fictional story.
[1] Though English is a Germanic language, about 2/3 of its lexicon is Latinate. It can be confusing, I know. You can thank William the Conqueror and the other Normans who invaded Britain in 1066.
[2] A linguist named Rasmus Rask actually made Grimm’s discovery a few years before Grimm did. But, still, it’s called Grimm’s Law, not Rask’s Law.
You reminded me that I have Carl Darling Buck's "A Dictionary of Selected Synonyms in the PIE Languages" on my shelf. I am looking forward to regular reading of it in retirement. There are so many connections to see!
My adopted daughter, Desi, born in Bulgaria of Roma (Gypsy) heritage, has been hunting for her biological parents for years. She has computer-generated color-dot maps of her genetic relatives, near and far, and they sweep like a rainbow out from India, across Iran, up into the Caucasus Mountains, and on into Western Europe. Her ancestry matches the march of the Proto Indo-European language.
Troy A. Thompson, M.D.