So You Want To / Create a Conlang
So you've decided to create a conlang, whether to add some depth to your own setting or because you were paid to do so for someone else (lucky!). Maybe you love to read The Lord of the Rings and watch stuff like Star Trek, Game of Thrones, and Thor: The Dark World, and now you want to craft a language of your own.
First, be sure to check out Write a Story for basic advice that holds across all genres. Then, look over a rundown of the genre-specific tropes that will help you, hurt you, and guide you on your way.
A few notes before we begin:
- The focus of this article is on naturalistic conlangs, that is, languages that aim to look realistic, especially as compared to other languages of the world. You are of course free to create non-naturalistic languages (Klingon is one, as is Lojban), although this article will mostly not focus on them.
- This article uses the International Phonetic Alphabet, or IPA, to represent sounds.
- The user who started this article, Iura Civium, will probably make reference to some of his languages over the course of the article. If you wish to expand this article, feel free to use examples from your own languages.
You will probably want to start with the phonology, the sound system of the language. You can have ideas for other aspects of the language before you start, but you need the sounds before you can actually build the words and affixes of the language.
Once that's done, you can move on to the grammar. To a first approximation, this is how the language works internally: how you put the pieces together to encode relations between words. This covers things like word order, syntactic alignment, typology, and grammatical number.
Next (or concurrently), you can work on morphology: the shapes of the functional bits of the words in your language, such as prefixes, suffixes, infixes, function words, consonant mutation, and things of that nature.
Should you want to, you can also try working on a writing system.
- Don't try to do or have everything. There are a lot of different sounds and concepts in linguistics; don't try to shove all of them into one language. This is not to say you can't have languages that involve a lot of sounds or concepts; just don't bite off more than you can chew.
- Think about the pieces you have and how they fit together. You may find that, instead of adding a new piece to the language, you can express an idea using pieces you already have. If sounds change along with the grammar, think about what logical patterns they might fit.
- It is a good idea to avoid Translation: "Yes", where a short utterance somehow translates into a long passage (or vice versa). (Of course, if you're making, say, a joke language, feel free to ignore this.)
- Don't be afraid to make mistakes or change your language. It is good to have ideas as to where you're going with a conlang, but don't let them crush your creativity. If something happens to be unwieldy, be open to changing it. If something happens to present an interesting option for your language, consider seeing where it goes.
Suggested Themes and Aesops
Phonology is the sound set of your language and how the sounds work together. There are two major types of sounds, consonants and vowels, as well as things like stress and phonation. In addition, there are things like syllables and sandhi (sound changes that happen where words or morphemes meet).
International Phonetic Alphabet
The IPA is a valuable tool for showing how words in a conlang are pronounced. It works by giving each phonetic sound its own symbol. For instance, /g/ represents a hard g sound, and never a soft g (or English j) sound, which is represented by /dʒ/. Some sounds (like the previously mentioned /dʒ/) are instead represented by two symbols because they are actually made up of more than one 'pure' phonetic sound. For instance, the 'ch' sound is represented as /tʃ/ because it's actually made up of a 't' sound and a 'sh' sound. The 'ow' sound (as in now) is represented as /aw/ because it's actually a combination of an 'ah' sound and a 'w' sound. A compound vowel sound like this is called a diphthong.
IPA is fairly intuitive for consonants. The main oddity for English speakers is /j/, which represents the 'y' in yes. A few other symbols are brought in, such as /ʃ/ for the 'sh' sound. The vowels are trickier. English uses 5 vowel letters (and sometimes 'y' as well) to represent roughly 4 times as many vowel sounds, about 12 of which aren't diphthongs. Not only that, but the IPA vowel symbols are based on how the vowels were pronounced in Latin, and English has deviated significantly from this, thanks to the Great Vowel Shift. For instance, the vowel sound in see is represented by /i/, because that's how "i" was pronounced in Latin.
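If you write your conlang in the Latin alphabet, it helps to keep a table mapping each spelling to its IPA value. Here is a minimal Python sketch of such a converter. The spelling table is a hypothetical romanization, not any standard; the one real point it encodes is the one made above, that a digraph like "ch" stands for a single sound (/tʃ/), so longer spellings are matched before shorter ones.

```python
# Hypothetical spelling-to-IPA table for an imaginary conlang's romanization.
# Only spellings that differ from their IPA symbol need an entry.
ROMANIZATION_TO_IPA = {
    "ch": "tʃ",  # one sound, written with two letters
    "sh": "ʃ",
    "j": "dʒ",
    "y": "j",    # IPA /j/ is the "y" in "yes"
}


def to_ipa(word: str) -> str:
    """Transcribe a romanized word into IPA, matching the longest spelling first."""
    # Longest keys first, so a digraph is consumed before its letters are read singly.
    keys = sorted(ROMANIZATION_TO_IPA, key=len, reverse=True)
    sounds = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                sounds.append(ROMANIZATION_TO_IPA[key])
                i += len(key)
                break
        else:
            # Letters missing from the table are assumed to be IPA already.
            sounds.append(word[i])
            i += 1
    return "/" + "".join(sounds) + "/"


print(to_ipa("shaya"))  # /ʃaja/
```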
Consonants, vowels, and the sonority hierarchy
There are many, many possible consonants, and somewhat fewer possible vowels. A realistic conlang is unlikely to use the exact same set of sounds as English. A conlang that is designed to sound exotic or alien, like Klingon, may include some unusual consonants while lacking common sounds like /k/.
To keep your phonology naturalistic, don't just dump a pile of random consonants and vowels into your language. Consonant inventories tend to show some form of symmetry, though you don't have to have complete symmetry; gaps in consonant systems are perfectly naturalistic as long as you don't go overboard. (Dutch, for example, has /b/ and /d/ but, in native words, no /g/.)
Some things to consider about consonants:
- Voiced or Voiceless... On a consonant chart, many consonants come in pairs. For instance, /s/ is the voiceless counterpart of the voiced consonant /z/. Languages often swap one member of such a pair for the other depending on the neighboring sounds. Notice how the "s" in "cabs" turns into a "z" when it comes after the voiced "b" consonant, and compare that to how it sounds in "maps". Most languages don't bother having this contrast for nasal consonants, but Icelandic and Welsh, among others, do contrast the voicing of nasals. Voicing contrast of laterals (L-like sounds) and rhotics (R-like sounds) is also rare, but does occur in some languages.
- ...or aspirated? Stop consonants have a third option: they can be aspirated, pronounced with a puff of air. English has both the aspirated /pʰ/ sound and the unaspirated voiceless /p/; this is the difference between the 'p' sounds in pie and spy. English speakers don't notice, because English doesn't treat the two sounds as contrasting. But some languages do. Mandarin Chinese, in particular, distinguishes aspirated from unaspirated stops rather than voiced from voiceless ones. This is why Beijing is written with a b, even though English speakers might think the sound is more like a p. The older spelling Peking reflects this.
- The "r" sound: The weak (approximant) 'r' sound found in most varieties of English, represented as /ɹ/, is actually quite rare among the world's languages. Languages more often have a tapped r /ɾ/, employ Trrrilling Rrrs with /r/, or use a guttural /ʁ/. Spanish has two r sounds: the tap /ɾ/ in pero 'but' and the trill /r/ in perro 'dog'.
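The "cabs"/"maps" alternation is a rule you can state precisely, and writing such rules down is a good habit when designing your own conlang's suffixes. Below is a minimal Python sketch, assuming words are already written in IPA and using a deliberately simplified set of voiceless consonants; the function name add_suffix_s is made up for illustration. Real English also inserts a vowel after sounds like /s/ and /ʃ/ (as in buses), which this sketch ignores.

```python
# Simplified, assumed set of voiceless consonants (IPA); every other sound,
# including all vowels, is treated as voiced. Sibilant-final stems are not handled.
VOICELESS = {"p", "t", "k", "f", "θ"}


def add_suffix_s(stem: str) -> str:
    """Attach a sibilant suffix that agrees in voicing with the stem's last sound."""
    return stem + ("s" if stem[-1] in VOICELESS else "z")


print(add_suffix_s("kæb"), add_suffix_s("mæp"))  # kæbz mæps
```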
Vowels are often divided into front vowels ("ee", "eh"), back vowels ("oo", "aw"), and central vowels (which tend to sound duller). "Ah" sounds can be classed as front or back, though most often as back vowels. Central vowels are rarer, and languages usually have an equal or greater number of front vowels than back vowels.
- The number of vowel sounds a language has varies considerably. As mentioned already, English has quite a lot of monophthongs (i.e., vowel sounds that aren't diphthongs). Most other Germanic languages have fairly big inventories, with Dutch having 19 monophthongs and 4 diphthongs. Others have fewer; Spanish really only has 5, while the now-extinct Ubykh got by with 2.
- Languages that are stress-timed tend to have more vowels. In a stress-timed language, syllables vary in length depending on stress, and unstressed syllables often use duller vowel sounds than stressed ones. In a syllable-timed language, every syllable is roughly the same length. This is partly why Spanish only needs 5 vowel sounds. Compare the English pronunciation of "fajita" (fuh-HEE-tuh) with the Spanish pronunciation (fah-hee-tah).
- Despite having so many vowel sounds, English still lacks a few. These include the "ö" and "ü" sounds of German (actually two pairs of sounds, one long and one short).
- Some languages, such as Mandarin Chinese, use the tone (pitch) a syllable is said with to tell words apart. Thus "ma" can mean "mother", "hemp", "horse", or "scold", depending on its tone. A loose English analogy would be the difference between "What." and "What (the hell)?!", although in English the intonation changes the attitude rather than the word itself.
If you still can't decide on an inventory, it might help to know that the most common consonants are /p, t, k, s, m, n, l/ and the most common vowels are /a, i, u/. Nearly all languages have most of these sounds, and many languages have all of them.
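One common way to get a first batch of words is to generate them at random from your inventory and a syllable template. The Python sketch below uses the common sounds listed above; the (C)V(C) template and the onset and coda probabilities are arbitrary assumptions to tweak, not rules. The template makes onsets more likely than codas, as many languages do.

```python
import random

# The most common consonants and vowels, as listed above.
CONSONANTS = ["p", "t", "k", "s", "m", "n", "l"]
VOWELS = ["a", "i", "u"]

# Assumed, adjustable odds: onsets are common, codas less so.
ONSET_CHANCE = 0.8
CODA_CHANCE = 0.3


def make_syllable(rng: random.Random) -> str:
    """Build one syllable on a (C)V(C) template."""
    onset = rng.choice(CONSONANTS) if rng.random() < ONSET_CHANCE else ""
    coda = rng.choice(CONSONANTS) if rng.random() < CODA_CHANCE else ""
    return onset + rng.choice(VOWELS) + coda


def make_word(rng: random.Random, max_syllables: int = 3) -> str:
    """Join between one and max_syllables syllables into a candidate word."""
    return "".join(make_syllable(rng) for _ in range(rng.randint(1, max_syllables)))


rng = random.Random(1)  # fixed seed so the output is repeatable
print([make_word(rng) for _ in range(5)])
```

Generated words are raw material; you'll still want to pick, reject, and adjust them by hand.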
A syllable can be divided into two parts: the onset, or everything before the vowel (or syllabic consonant, if you want to get fancy), and the rhyme (also spelled rime), which is everything from the vowel to the end. The rhyme can be further divided into two parts: the nucleus, which is typically a vowel, and the coda, which can be one or more consonants or nothing at all. Across languages, onsets tend to be larger than codas. This is because, generally speaking, a consonant is easier to hear when it comes before a vowel than after it.
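The onset/nucleus/coda split can be made concrete with a small Python sketch. It assumes a single syllable spelled with a, e, i, o, and u as its only vowel letters and does not handle syllabic consonants; the function name split_syllable is hypothetical.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: vowels are spelled with these letters only


def split_syllable(syllable: str) -> dict:
    """Split a single syllable into onset, nucleus, and coda (plus the rhyme)."""
    start = next((i for i, ch in enumerate(syllable) if ch in VOWEL_LETTERS), None)
    if start is None:
        # Syllabic consonants (the "fancy" case) are not handled in this sketch.
        raise ValueError(f"no vowel found in {syllable!r}")
    end = start
    while end < len(syllable) and syllable[end] in VOWEL_LETTERS:
        end += 1  # the nucleus may be a run of vowel letters, e.g. a diphthong
    onset, nucleus, coda = syllable[:start], syllable[start:end], syllable[end:]
    return {"onset": onset, "nucleus": nucleus, "coda": coda, "rhyme": nucleus + coda}


print(split_syllable("strand"))
# {'onset': 'str', 'nucleus': 'a', 'coda': 'nd', 'rhyme': 'and'}
```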
Prosody is, roughly, the rhythm and intonation of a language as it is spoken. American English, for example, tends to raise its pitch at the end of questions, and stress can distinguish words (think próceeds, the noun, versus procéeds, the verb).
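Given three main constituents (subject, verb, and object), there are six possible basic word orders, listed here roughly from most to least common:
- SOV (e.g., Japanese, Turkish)
- SVO (e.g., English, Mandarin)
- VSO (e.g., Welsh, Irish)
- VOS (e.g., Malagasy)
- OVS (e.g., Hixkaryana)
- OSV (very rare)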
A language can also have no dominant word order. Latin, especially Latin poetry, is an example; most of the work is done by the inflectional endings on the words.
There is an additional class called topic-prominent languages, in which the topic (what the sentence is about) tends to come first; the subject and topic need not be the same thing. Wikipedia has examples of possible word orders in such languages.
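The six basic word orders are simply the six ways of arranging S, V, and O, which a few lines of Python can show. The English words are placeholders, and the sketch ignores everything a real language would add (case endings, agreement, articles, and so on).

```python
from itertools import permutations


def arrange(order: str, subject: str, verb: str, obj: str) -> str:
    """Put subject, verb, and object in the order given, e.g. "SOV"."""
    slots = {"S": subject, "V": verb, "O": obj}
    return " ".join(slots[letter] for letter in order)


# Every permutation of S, V, and O is one of the six basic word orders.
ALL_ORDERS = ["".join(p) for p in permutations("SVO")]

for order in ALL_ORDERS:
    print(order, "->", arrange(order, "dog", "bites", "man"))
```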
An animacy hierarchy is a set of rules governing which subjects can act on which objects, or which participants can fill certain roles in a sentence. Some languages, such as English, don't really have one, but others, such as Navajo, do. Wikipedia gives the typical hierarchy, from highest to lowest, as:
- 1 > 2 > 3
- proper names
- humans
- nonhuman animates
- inanimates
where "1", "2", and "3" stand for those respective grammatical persons.
In essence, a noun can appear in the agent role only if the patient, if there is one, is in the same tier of the hierarchy or a lower one. For example, a general word for a person can appear as the agent when an inanimate object is the patient, but not the other way around. You can get around this in one of several ways: you can have suppletive words that rank higher on the hierarchy than their referents "should"; you can invent morphological processes that keep the roles or order of the words the same but change the meaning; or you can forbid it entirely and have such concepts handled with semantics or circumlocutions.
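The same-tier-or-lower rule boils down to comparing ranks. In this Python sketch the tiers follow a typical hierarchy, but the numeric ranks, the tier names, and the function name agent_allowed are assumptions for illustration. A pairing it rejects is exactly where you would reach for one of the workarounds above.

```python
from typing import Optional

# Hypothetical numeric ranks for the tiers of a typical animacy hierarchy
# (higher number = higher on the hierarchy).
ANIMACY_RANK = {
    "1": 6, "2": 5, "3": 4,
    "proper name": 3, "human": 2, "nonhuman animate": 1, "inanimate": 0,
}


def agent_allowed(agent: str, patient: Optional[str] = None) -> bool:
    """True if `agent` may act on `patient`: the patient must be in the same tier or lower."""
    if patient is None:
        return True  # intransitive: no patient to compare against
    return ANIMACY_RANK[patient] <= ANIMACY_RANK[agent]


print(agent_allowed("human", "inanimate"))  # True
print(agent_allowed("inanimate", "human"))  # False
```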
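It is useful to list several relevant terms for this section:
- A (agent): the subject of a transitive verb, like she in "She broke the vase".
- E (experiencer): the sole argument of an intransitive verb, like she in "She slept". (Linguists often call this S.)
- P (patient): the object of a transitive verb, like the vase in "She broke the vase".
There are several possible ways to align your syntax: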
- Nominative-accusative. A and E are marked the same (nominative), P is marked differently (accusative).
- Ergative-absolutive. P and E are marked the same (absolutive), A is marked differently (ergative).
- Transitive-intransitive. A and P are marked the same (transitive), E is marked differently (intransitive).
- Tripartite. A (agent), E (experiencer), and P (patient) are all marked differently.
- Split-ergative. Appears as ergative-absolutive in some contexts and nominative-accusative in others, typically depending on the tense or aspect of the verb.
- Austronesian (a.k.a. Philippine-type or trigger alignment). One noun in the clause takes a "direct" marking and the others take an "indirect" marking, and the verb tells you what role the directly marked noun plays.
- Active-stative. The sole argument of an intransitive verb (E) is marked either like A or like P, depending on certain conditions. There are two subtypes:
- Split-S, where the marking is a quirk of the particular verb in question.
- Fluid-S, where you can use either, but there is a difference in connotation depending on whether the noun is marked like A or like P.
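One way to keep the alignment types straight is to write each one as a table from role to case label. This Python sketch covers the four types whose marking doesn't depend on extra conditions; the case labels are glosses chosen for illustration, and split-ergative or active-stative systems would need extra logic to pick between tables.

```python
from itertools import combinations

# Case label each alignment assigns to roles A, E, and P (labels are illustrative).
ALIGNMENTS = {
    "nominative-accusative": {"A": "NOM", "E": "NOM", "P": "ACC"},
    "ergative-absolutive": {"A": "ERG", "E": "ABS", "P": "ABS"},
    "transitive-intransitive": {"A": "TR", "E": "INTR", "P": "TR"},
    "tripartite": {"A": "ERG", "E": "INTR", "P": "ACC"},
}


def mark(noun: str, role: str, alignment: str) -> str:
    """Gloss a noun with the case its role receives under the given alignment."""
    return f"{noun}-{ALIGNMENTS[alignment][role]}"


def shared_roles(alignment: str) -> set:
    """Return the pairs of roles that the alignment marks identically."""
    cases = ALIGNMENTS[alignment]
    return {frozenset(pair) for pair in combinations(cases, 2) if cases[pair[0]] == cases[pair[1]]}


print(mark("dog", "A", "ergative-absolutive"))  # dog-ERG
print(shared_roles("ergative-absolutive"))
```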
The morphological typology of a language describes how its words are built out of morphemes (the smallest meaningful pieces). It's best to imagine it as a triangle. In one corner are isolating languages (Chinese dialects, Hawaiian, most Southeast Asian languages, etc.), which have very few inflections and rely instead on word order and function words. In another corner are agglutinating languages (Japanese, Nahuatl, Turkish, etc.), which have a one-to-one ratio between morphemes and meanings (e.g., one affix for case and one for number). In the third corner are fusional languages (Romance languages, Germanic languages, Semitic languages, etc.), which use one morpheme to express multiple meanings (e.g., one affix for both case and number). No natural language is 100% isolating, agglutinating, or fusional; Mandarin Chinese has an agglutinating plural (wǒ "I" but wǒmen "we"), and Turkish pronouns show some fusion (ben "I" and biz "we", but o "he/she/it" and onlar "they").
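The difference between agglutinating and fusional morphology shows up clearly if you build the same noun forms both ways. Every suffix in this Python sketch is invented for a hypothetical language; the only point is that the agglutinating version uses one suffix per meaning, while the fusional version uses one suffix for case and number together.

```python
# Hypothetical suffixes for an invented language; none are taken from a real one.
NUMBER_SUFFIX = {"sg": "", "pl": "ri"}  # agglutinating: one suffix for number...
CASE_SUFFIX = {"nom": "", "acc": "ka"}  # ...and a separate one for case
FUSED_SUFFIX = {                        # fusional: one suffix carries both
    ("nom", "sg"): "",
    ("acc", "sg"): "em",
    ("nom", "pl"): "i",
    ("acc", "pl"): "os",
}


def agglutinate(stem: str, case: str, number: str) -> str:
    """Stack one suffix per meaning: number first, then case."""
    return stem + NUMBER_SUFFIX[number] + CASE_SUFFIX[case]


def fuse(stem: str, case: str, number: str) -> str:
    """Attach a single suffix that expresses case and number together."""
    return stem + FUSED_SUFFIX[(case, number)]


for case in ("nom", "acc"):
    for number in ("sg", "pl"):
        print(case, number, agglutinate("tal", case, number), fuse("tal", case, number))
```

In the agglutinating forms you can see "ri" (plural) and "ka" (accusative) separately in talrika; in the fusional form talos, there is no piece that means just "plural".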
Head directionality is a fancy way of asking, "does the main part of the concept come at the beginning or the end of the phrase?" In head-initial languages, the important word tends to come at the beginning of the phrase (for instance, nouns precede adjectives and verbs precede adverbs). In head-final languages, the opposite is true (nouns tend to follow adjectives, and verbs tend to follow adverbs). According to Wikipedia, few languages, if any, are strictly head-initial or head-final in every category; it is a pattern, not an exceptionless rule.
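Once you pick a head direction, you can apply it consistently across phrase types. This tiny Python sketch uses English glosses as placeholders and a hypothetical head_initial flag; real languages, as noted above, usually mix directions across categories.

```python
def phrase(head: str, dependent: str, head_initial: bool) -> str:
    """Place a phrase's head before its dependent (head-initial) or after it (head-final)."""
    return f"{head} {dependent}" if head_initial else f"{dependent} {head}"


# (head, dependent) pairs, using English glosses as placeholders.
PAIRS = [("house", "red"), ("run", "quickly")]

for head_initial in (True, False):
    print([phrase(head, dep, head_initial) for head, dep in PAIRS])
# ['house red', 'run quickly']
# ['red house', 'quickly run']
```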
Language change (or, putting languages through a wood chipper for fun and profit)
As most people are aware, the Romance languages (the big five being French, Spanish, Italian, Portuguese, and Romanian) are those languages that descended from Latin. They are distinct from Latin. They are also distinct from each other. How did they get like that?
- Sound change: As Latin developed into Romance, the way people pronounced words began to change in (usually-)systematic ways. For example, Latin had long /l/ and /n/ sounds. On the way to Spanish, these sounds palatalized and ended up as /ʎ/ and /ɲ/, respectively—compare Latin annum 'year' > Spanish año, but anus 'ring' > Spanish ano 'anus'. (Don't look at me like that. It was the first example that came to mind and it illustrates the principle nicely.) Latin annum also yielded French an, which didn't palatalize, but turned into a nasalized vowel through a different process.
- Grammatical change: The grammar changed too, as features were developed and lost. Latin had a distinction between vel 'and/or' and aut 'either/or' which didn't survive Vulgar Latin (e.g., French ou 'or' < Latin aut). Romance languages also mostly did away with the infamous case system of Latin (though it lingers on in a reduced form in Romanian). In terms of gains, some Romance languages developed new verbal inflections: the French future tense developed out of a construction of the form [infinitive] + habere, as can be seen in, for example, mangerai '(I) will eat'. Spanish usted is a shortening of vuestra merced 'your grace', a respectful form of address that didn't exist in Latin. Romance languages also gained stricter word order. In Latin, the case endings carried the information about a word's role in the phrase, so you could shuffle words around a lot, as readers who have the misfortune of being familiar with Cicero know all too well. As sound change destroyed the case system, speakers of Romance needed some other way to show how the words fit together in a sentence, and they did this by making word order more important. (Something similar actually happened on the way from Old English to Modern English because Middle English said "Word-final case markings? lolno".)
- Lexical change: Words changed meaning, fell out of use entirely, or were borrowed. One prominent example is the word for "horse": in Latin, this was originally equus. It fell out of use in Vulgar Latin, from which Romance developed, and was replaced by caballus, a word often said to be borrowed from Celtic (whence, say, French cheval). Famously, Portuguese saudade (look up the definition, it's nuanced) came from Latin solitatem, which just meant 'solitude'. The word "admiral" doesn't come from Latin, although it looks like it does; it's from Arabic amiir al- 'emir of'.
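Because sound change is (usually) systematic, you can model it as an ordered list of find-and-replace rules and run your proto-language's words through them to get a daughter language. The Python sketch below uses toy rules loosely based on the Latin-to-Spanish changes described above; they are a simplified assumption for illustration, not a full historical account.

```python
import re

# Toy sound changes, applied in order. A simplified assumption loosely based on
# the Latin-to-Spanish developments described above, not a full historical account.
RULES = [
    (r"nn", "ɲ"),      # long /n/ palatalizes
    (r"ll", "ʎ"),      # long /l/ palatalizes
    (r"u[ms]$", "o"),  # word-final -um and -us become -o
]


def evolve(word: str, rules=RULES) -> str:
    """Run a word through each rule in sequence, like successive historical changes."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


for latin in ("annum", "anus"):
    print(latin, ">", evolve(latin))
# annum > aɲo
# anus > ano
```

Rule order matters: each rule only sees words as they stand at that point in the sequence, just as each historical change applied to the language of its own time.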
- J.R.R. Tolkien. Tolkien's so-called "secret vice" led to The Lord of the Rings with its many languages. Why does Middle-earth feel so alive? In large part because of its languages.
- David J. Peterson, full stop. The former president of the Language Creation Society, he created Dothraki for Game of Thrones as well as the languages used in Thor: The Dark World, Defiance, Star-Crossed, and The 100.
- Marc Okrand's Klingon, created for the Star Trek films. While not really a naturalistic language, its cultural influence is considerable, and looking at how utterly different it is can be instructive and inspiring.
The Epic Fails
- David J. Peterson's The Art of Language Invention deals with conlanging and features many relevant examples from both natural and constructed languages.
- The Concepticon is a good way to build a starting vocabulary for a conlang.
- The New Conlang Bulletin Board, a community for language creators.
- Mark Rosenfelder's A Conlanger's Lexipedia provides ideas on coining words with etymologies attested in natural languages.
- Etymonline, a database about the origins of words in English.
- "Ergativity", by David J. Peterson. Notes on how to derive (split-)ergative languages from languages that were originally nominative-accusative.
- FrathWiki, a conlanging wiki.
- The Index Diachronica is a resource for those interested in naturalistic sound changes.
- Mark Rosenfelder's Language Construction Kit is a good starting place for those who want to build their own languages. Rosenfelder literally wrote the book on conlanging: the Kit is available in two volumes.
- Omniglot, a source for looking at the various writing systems of the world (and invented ones).
- redditor /u/yaesen's "On Generating Ideograms" can help with creating ideograms.
- A Survey of some Vowel Systems is a good resource for making your vowels naturalistic.
- Wikipedia has a large amount of information on linguistics topics.
- The World Atlas of Language Structures (WALS) is a database of the features of natural languages. One can find data sets and examples there and can even compare multiple features across languages, which can be useful in getting ideas on how to proceed if a language already has certain features.
- Mark Rosenfelder's page on yingzi is a good place to start for those of you who want to create logograms.
- The Zompist Bulletin Board, run by Mark Rosenfelder, is a community of conlangers (as well as a forum for his own projects, such as the world of Almea).