Useful Notes: Grammar In Foreign Languages
Unlike what you would see in many works of fiction, languages of the real world can work in wildly different ways, enough to make them sound like Starfish Language to a non-native. In fact, for every property that has ever been proposed as a "universal" characteristic of human language, there is at least one known non-artificial human language that doesn't have it, or has its exact opposite.

Western audiences and authors generally find the Indo-European language family the most familiar in terms of grammar and vocabulary. This family includes most (but not all) of the languages spoken in modern Europe (already quite diverse; compare Russian to English to Italian) but also roughly half of the many languages spoken in India and what used to be called the "Near East" (Turkey, Persia, etc). And Indo-European is only one of dozens of such families. Wikipedia has more details.

Real human languages very often differ from what Benjamin Whorf has called "Standard Average European" in that they can:

  • Lack articles like a, an, or the, as Russian and Latin (IE) and Japanese and Chinese (non-IE) do.
    • Have definite articles but no indefinite articles, such as Irish and Icelandic (both IE), Esperanto (a Con Lang based mostly on IE languages), and (all forms of) Arabic (non-IE).
    • Have finicky rules about when things can be definite or indefinite (Literary Arabic: not "a leader of the community," but rather "one leader among the leaders of the community"); even closely related languages such as English and German sometimes use inverse rules when it comes to abstracts, for example.
    • Place articles after the word modified instead of before (Romanian, Bulgarian, Macedonian and the Scandinavian languages have an enclitic definite article, while the Romanian indefinite article follows rules closer to English).
    • Have many more articles than English. German articles change according to gender, number, and case of the noun, resulting in 16 possible combinations for the definite article (although those are only expressed through 6 forms).
  • Have no direct or single equivalent of verbs like 'to be', 'to have', or 'to do' which are kind of a defining feature of IE languages. It's often not just non-Indo-European languages. Irish, the Ibero-Romance languages (Spanish, Portuguese, Galician, etc.) as well as Catalan (Gallo-Romance) have two copulas ('be') (one of the Romance ones usually deriving from the Latin word for "to stand"). Irish and Russian have no auxiliary verb "have".
    • Arabic, meanwhile, has both "to be" and "to have" (in the possessive sense), but uses them far less frequently than English does. "To be" is almost always omitted in the present tense; you would say "I Egyptian" rather than "I am Egyptian". The equivalent of "to have" is almost never used for normal possession, because it implies not just possession, but sovereignty. You would say "to/at me there is an umbrella," not "I have an umbrella."
    • Polish uses "to have" much like English, but adds an extra sense that literally translates as "it doesn't have X", meaning "X isn't here".
  • Do not mark nouns for number (Japanese), or, alternatively, have more number markers than simply singular and plural. Many languages have separate dual or even trial ('three') numbers. There is even at least one language that has marks for zero (I have no cookies), fractional (I have half of a cookie), singular (I have one cookie), dual (I have two cookies), paucal (I have a few cookies), and large-scale plural (I have lots of cookies)! Most Indo-European languages have lost their duals; Sanskrit, Ancient Greek, and Old Church Slavonic had them, and there are still traces of them in some of the Balto-Slavic languages (usually in a unique declension for the number two, and different noun forms used with certain numbers). English's use of the word both (rather than *all two) may be a remnant of this as well. Latin also had one, which survived in the irregular declension of the word "duo", while Slovene still makes full use of it. Old English possessed the vestiges of a dual, but only in the pronouns. Come Middle English, this dual number was gone.
  • Have a more limited set of cardinal numbers — the so-called "one-two-many" phenomenon, although some languages may hit "many" at a point other than three. Note that this does not necessarily prevent accurate counting above "many"; it may just change the nomenclature. The Trolls of the Discworld, for instance, have a cardinality based on powers of 4: "one" (1), "two" (2), "three" (3), "many" (4) and "lots" (16), which can then be combined to express other quantities (like English does for concepts like "twenty-one" and "one hundred fifty-two"). Then again, a culture that is truly innumerate may not be able to distinguish between different quantities of "many".
    • Conversely, linguistic evidence suggests that many languages started out with "one-two-many" cardinals before gaining more terms for numbers above two; one of the telling pieces of such evidence is that the first two ordinal numbers in most languages ("first" and "second", in English) are not related to their corresponding cardinals ("one" and "two"), whereas ordinals for three and above ("third", "fourth", etc.) are clearly constructed from their cardinals. An alien language might well go further into the ordinals before one encounters the first ordinal derived from a cardinal, suggesting a larger range of early numeracy than humanity generally demonstrated.
    • You may think a race with an inherent grasp of mathematical concepts might never derive ordinals from cardinals, but you can't just bust out a new word whenever you need a high enough number; at some point you're gonna have to start building your numbers on earlier numbers (say, twenty-one; that way you can also round 3104393 to three million). That said, an alien language might follow a completely different repeating pattern in ordinals and cardinals.
  • Have nouns with grammatical gender. French has two (masculine and feminine), German has three (masculine, feminine, neuter), and some languages assign "gender" according to whether the referent is visible, known to be near, or far away. Some languages have a simple animate vs. inanimate distinction. Some confusingly combine these (e.g. Arabic, which arbitrarily divides non-human objects into masculine and feminine, and proceeds to ignore that division by making all inanimate plurals "singular feminine"; Unfortunate Implications aside, it's really confusing). Other languages differentiate gender by properties of the noun: Swahili has different genders for people, animals, tools, liquids and so on. Or, alternatively, are more gender-neutral than English, like the Uralic languages. Imagine having "he" and "she" be the same word, as well as "him" and "her."
    • It's also possible for languages not to distinguish gender or animacy in their pronouns. Basically, everything is "it", whether it's a man, a woman, a dog or a bit of navel lint.
  • There can also be grammatical gender for numbers. In Hebrew, there is a male and female form (the latter is the one commonly used for plain numbers - probably because the male form is often a syllable longer). Sometimes, it's worse, when there are further divisions due to the object type. There is a story about a Nivkh child who had trouble subtracting five buttons from thirty and adding six trees to seven - because the shape of the buttons and the size of the trees weren't specified.
  • Mark verbs for categories that English either doesn't have or marks periphrastically, such as voice, aspect, mood, and so on. Or don't mark verbs for categories that English does; Mandarin Chinese has no tense, and conveys temporal information through aspect, instead.
  • Differentiate between the inclusive and exclusive 'we'. Compare the English "We are in disagreement" (inclusive) with "We do not like you" (exclusive). The inclusive includes the person being addressed, while the exclusive does not.
  • Have a different concept of "word" than what you expect. There is no agreement among linguists on what constitutes a "word", or even on whether there is a universal concept of "word" that can be applied to all languages. Again, Japanese provides an example — are the particles (wa, ga, o, etc) part of the word or separate words themselves? Most linguists say they're separate, but there's no shortage of transliterations that don't have a space there. (Japanese itself avoids the issue by not having spaces between words at all.)
  • Are ergative-absolutive instead of nominative-accusative. Take two similar sentences that differ in verb transitivity (such as 'He slept.' and 'She ate them.'). A nominative-accusative language (like English) case-marks the subjects 'he' and 'she' the same in both sentences (that is, as 'he'/'she', the nominative case, instead of as 'him'/'her', the accusative case) and case-marks the object 'them' (perhaps some apples?) in the accusative (as opposed to in the nominative 'they'). In an ergative-absolutive language, the subject of the intransitive sentence 'he' would be case-marked the same as the object of the transitive sentence 'them' — in the absolutive case. The ergative case only shows up marking the subject of the transitive sentence 'she'. Total ergativity is extremely uncommon, with Basque, a language isolate spoken in Spain and France, being one of the few languages to be almost completely ergative. Most languages considered ergative have split-ergativity instead, which means they only behave like an ergative-absolutive language in some contexts, and use another alignment (usually nominative-accusative, as in English) in others. Several Indo-Iranian languages such as Kurdish and Hindi are split-ergative. They appear to have borrowed this feature from neighbouring languages like the Dravidian languages, the Caucasian languages, etc.
    • There are a lot of different kinds of morphosyntactic alignment, besides nominative-accusative and ergative-absolutive. Some languages are transitive, marking both the subject and object of a transitive sentence the same, but the subject of an intransitive sentence differently. Some are tripartite (marking the subject of a transitive sentence, the subject of an intransitive sentence, and the object of a transitive sentence all differently). Some are various kinds of active-stative (marking subject case based on whether or not the subject actively does something, so case marking is dependent on the meaning of the verb rather than grammar), and then there's "Austronesian alignment", which is, well, very confusing.
    • Then there is the fun case of finished versus unfinished action, for example the distinction between passé composé and imparfait in French. Another such case is the object cases in Finnic languages. An example from Finnish: "Söin kalaa" (I ate some fish) vs. "Söin kalan" (I ate a whole fish). The idea is similar to the French one, but it applies specifically to transitive sentences and is marked on the object rather than the verb.
  • Have wildly different syntax (word order). English generally places the subject of a sentence first, the verb second, and the object last, a very common word order. However, in just as many languages, the subject is placed first, the object second, and the verb last. A minority of languages even do things like place the verb or the object first, the subject last, or any other possible combination. Some languages, usually those that are highly inflected, don't even have a hard and fast word order at all. Latin, for instance, generally prefers SOV outside of poetry, but is so inflected that the word order can be changed without changing the meaning of the sentence. The old forms of Semitic languages (like Classical Arabic and Biblical Hebrew) historically preferred VSO, but left SVO as an option because of their inflection—the latter of which became dominant in the contemporary colloquial forms.
  • Then there's the question of whether to put adjectives before or after the words that they modify, where to put determiners, what types of clauses or sentences change word order, how to construct relative clauses, etc.
  • Are not nearly-isolating languages like English, where word use is determined by position, and there are lots of particles — small words with purely grammatical functions (like English prepositions). Some languages, like Japanese and Turkish, are agglutinative, where word use and other such markers are affixes that combine in a string. Some languages, like Latin and its descendants, are fusional, where word use and other morphemes are marked by affixes that are all mutually exclusive (so there's one affix in Latin where Turkish might have a string of three or four, but you need a completely different affix in Latin for a small change in meaning, while Turkish can just switch out one of its affixes). Agglutinative languages are rather famous for their ability to cram very large amounts of information onto single words. For example, in Hungarian, the common toast "Egészségünkre!" is literally "To our health!"; a phrase which takes three words to say in English, but in Hungarian, one word does the job. Some languages really take the ball and run with it — in Inuit, "he said he wouldn't be able to arrive first" is "tikitqaagminaitnigaa," while in Yaghan, "the look shared by two people too shy to do anything about it" is "mamihlapinatapai." It gets even worse when you get to polysynthetic languages, where several distinct words get mashed together: archaic Ainu "usaopuspe aejajkotujmasiramsujpa" means "I keep swaying my heart afar and toward myself over various rumors."
  • Or perhaps they're more isolating than English is. Plurals and past tense forms may be expressed using distinct words that in some cases can be used alone: "did walk" instead of "walked", with "did" alone as a possible answer to a question. Chinese, for instance, has one morpheme per syllable and close to one morpheme per word.
  • Have adjectives that act like verbs instead of or along with acting like nouns (kind of). For example, some Japanese adjectives can be conjugated just like verbs — shirokunakatta ie = the house that was not white (white-NEG.PAST house). Sometimes this situation is described as "the language has no adjectives," which confuses the uninitiated — what is meant is not that the language doesn't have words like "red" or "large," but rather that words like that follow the same rules as verbs.
    • The Wolof language of Senegal conjugates pronouns. Maa ngi dem means "I am going" or "I go." Dinaa dem means "I will go [soon]." In this case, dem is the verb (go), and cannot be changed. Maa ngi and dinaa are both pronouns.
  • Have prepositions that can be used independently as verbs, or rather, have verbal grammar such that subordinate verb phrases are used when English would use prepositional phrases. In such a language, one word may serve as the verb "go" and the preposition "toward".
  • Use noun cases to convey the same meaning as English prepositions. In Finnish, for instance, there are fifteen distinct noun cases (kind of makes the three in English look simple, doesn't it?) to express various different meanings, but the use of prepositions is severely limited. For example, "talo" means "house," but "talossa" means "in the house," "talolla" means "at the house," "taloksi" means "(transform) into a house," etc.
    • Hungarian has at least eighteen cases, and that's without counting the rarely used ones. A fellow Uralic language, Komi, has over twenty as well.
  • Differentiate between alienable and inalienable possession: "my wrist" is "wrist of me", but "my watch" is "watch on me".
  • Have something other than two degrees of demonstratives — English has just this and that (but it used to have yon[der] as a third, and the other is commonly used as a third but decidedly less standard), Japanese has three (kore, sore, are), some languages have one, some have as many as five. Alaskan Yup'ik has thirty. They are sorted by five layers of location, three layers of visibility and two layers of accessibility. So for example one demonstrative means "partially visible 'that,' near and accessible to the listener but not necessarily to the speaker." Another demonstrative means "completely visible 'that' which is above the speaker and inaccessible to him/her."
    • German, by contrast, has only one used in common speech, dies-. Technically there is a second, jen-, cognate with English yon—and used just about as frequently.
  • Mark the relationship between speaker and audience (register), and occasionally also between speaker and subject, whether through pronouns or verb forms or sentence markers. Most Indo-European languages have this, actually; for example, in French there's 'tu' (informal) and 'vous' (formal). English is one of the few IE languages that doesn't do this, although it used to and a few dialects still do. Some languages get very elaborate; Japanese marks for formal/informal, plain/polite, and humble/honorific, in any combination of the three (though formal/informal are pretty similar). Korean has about seven degrees of politeness and formality, each of which also has a humble and an honorific form—though a few of them aren't used much anymore.
    • Or just have a different world view on pronouns altogether. Vietnamese is often described as "having no universal pronoun". (Which is untrue, as it actually does have some.) In practice, this means that in most conversations, the language requires its speaker to choose a kinship word (let's call it Kinship Term A) to refer to themselves where English would say "I", and pick another one (Kinship Term B) for the listener, where English says "you". Here's where it gets interesting: when it is the other person's turn to speak, the kinship words stick to the respective parties they represent, so now Term A becomes "you" and Term B becomes "I". What the address terms actually do is convey the expected social relation between you and the other person. You don't stop being your mom's child just because it's your turn to speak. Confused yet? That is how an Anguished Declaration of Love by a man to a woman in Vietnamese could translate to "older brother love younger sister a lot", and then the woman would reply "younger sister love older brother a lot too." Working out the I's and you's in Vietnamese can require (and reveal) a ridiculous amount of contextual info: the other person's sex, age, your own sex and age, relationship between you and them if any, their attitude towards you, your attitude towards them... And that's just for one-on-one convos. People's first names can take on the role of pronouns; in fact, any noun can, under the right circumstances. Needless to say, this makes for a sociological minefield even for native speakers.
  • Have words that don't directly and perfectly translate into English. Sure, there can be some of the whole "showing culture through vocabulary" thing, but also more mundane instances — for example, English divides temperature into cold, cool, warm and hot, but other languages may have only two or three of those, or maybe more.
    • Similarly, many non-English languages divide up colors differently from the Western standard "ROY G. BIV", with some having as few as just two basic colors (black and white). Quite a few make no distinction at all between blue and green. On the other hand, some Asian languages have dozens if not hundreds of distinct color names. An author writing a race with a different visual range from humans (such as demihumans from D&D, who frequently possess vision in the infrared range) may forget to create terms for colors humans can't see at all, not even "squant" or "octarine".
    • Other languages may also have fundamentally different conceptual metaphors. For example, while in most languages the past is "behind" us and the future lies "in front" of us, in Quechua and Aymara it is the other way round. Rather than likening the passage of time to the ego's journey from the past toward the future, these languages liken it to a movement of events in a queue: the events of the future are lined up behind the events that have already occurred (this metaphor is also present in English and other languages with words like "before" and "after", but it is only used to relate events to other events, when the ego is not involved).
  • A language might not have a general term for a group of objects or actions that English takes for granted. For example, a speaker of an Australian Aboriginal language cannot say "twenty birds" referring to a group of ten sparrows and ten ostriches. For such a speaker it would be like adding rocks and dogs together. In Russian, there are no words meaning "bring" and "put" - you can only say that you carried or rolled something in, or that you laid or stood something in front of a person.
  • Lack relative constructions ("the one that does X" etc.), and have to substitute adjective phrases ("the X-doing one"), or have correlatives: "This is the man who my wife has been sleeping with him!"
    • In Romance languages, the opposite is true; there is no adjective phrase with verbs. To say "The talking dog" in French, one must say "The dog that talks." (Le chien qui parle.)
      • Actually, even in Romance languages adjective phrases exist: in French for example, one could as well say "Le chien parlant" ("The talking dog"). It's true that English has way more occurrences of those, though, as many of them can only be translated with relative constructions.
  • Treat relative clauses like adjectives. For example, in Mandarin Chinese, using the attributive particle de, one can just as easily say "red de car" as "drives down the street de car," using actual Chinese words of course. The former would simply be "red car," but the latter would have to be translated as "the car driving down the street."
  • Are topic-prominent instead of subject-prominent (Japanese). In English, the subject is understood to be the topic of the sentence (which the passive voice facilitates). In Japanese, topic and subject do not have to be the same.
  • Have no element in a sentence that corresponds straightforwardly to what Europeans would call the "subject." The Japanese topic marker -wa is a good example, as are dozens of academic papers in Linguistics debating whether sentences in Tagalog (the most common language of the Philippines) can be properly said to have subjects or not. (Short version: the properties that a subject has in English can often be split up between two noun phrases, the "topic" and the "agent", in other languages.)
  • Are written using logograms (Chinese), abjads (Arabic, Hebrew), syllabaries (Inuktitut), abugidas (the languages of India and Ethiopia), or a hodgepodge of everything (ancient Egyptian and modern Japanese), instead of an alphabetic writing system. And not all writing systems include the concepts of upper and lower case, cursive writing and/or punctuation, and if they have them, they may not use them the same way.
    • Korean Hangul is a very fascinating one: it's a syllabary where each syllable character is a combination of the characters for the sounds it contains, and each sound character is actually a "code" describing that sound phonetically, making it both a syllabary and an alphabet. Sounds complicated, but it's very logical in use.
  • Use different methods for dividing words other than spaces. Many, such as Japanese and Chinese, have no divisions at all. Other options include interpuncts (Classical Latin), special characters at the beginnings of words (Hebrew), or even elevating the first character in each new word (Persian). German is also famous for not having spaces in its noun compounds — though in reality, these compounds are grammatically more or less the same as English phrases like magical girl anime fan; the main difference is orthography (where you put spaces in writing), not grammar proper.
  • Possess writing directionalities different from the most common left-to-right and top-to-bottom, such as right-to-left and top-to-bottom (Arabic, Hebrew), left-to-right in vertical lines that run from top to bottom first (Mongolian, Uyghur), or even right-to-left in vertical lines (Chinese, Japanese). Beyond that would be boustrophedon (changing direction with each line), which while common in antiquity is used by no (natural) modern language. Then there are languages that can be written in multiple ways, or are leaning more towards left-to-right and top-to-bottom as a result of western influence.
  • Follow a different syllabic stress pattern than English. A case in point: when faced with an unfamiliar word of more than two syllables, English speakers tend to stress the next-to-last syllable, with a secondary stress on the second syllable prior to that, if the word is long enough. Other languages may prefer other stress patterns. Word stress patterns are particularly ingrained habits, and it is sometimes quite difficult to adapt to a different language's "defaults"; writers creating a language will rarely choose stress patterns they find difficult or "unnatural".
  • Use pitch and changes thereof as elements of meaning in words. While Mandarin Chinese is the most famous example, numerous African languages also possess this property, where changing the pitch at which you pronounce a set of phonemes can completely change the meaning of those phonemes.
  • Form compound nouns differently. Most languages put the base noun at the back, but there are languages which put it at the front. As an example, control CENTER would be translated as PUSAT kawalan in Malay.
  • Have idioms and allusions that make no sense to a non-native speaker. Even languages that are closely related to English have turns of phrase that are completely incomprehensible without a native to explain their use, such as the French avoir les dents longues ("to have long teeth", meaning "to be ambitious") or the German Ich werde dir die Daumen drücken ("I'll squeeze my thumbs for you", meaning "I wish you luck"). Languages of vastly different derivation, evolving in a wildly foreign cultural matrix, can (and do!) have idioms that make even less sense to the outsider — and nonhuman/alien idioms may be utterly impenetrable even with native help.
  • Similarly, have a different concept of what constitutes "blasphemous", "obscene" or "offensive" language. Different body parts, functions or gestures (or none at all) may be offensive to native speakers; other obscenities will be culturally based, derived from the religious, social and/or political matrix in which the language evolved. This can be seen even between English-speaking cultures: it was noted once that Catholics tended toward religious-based oaths, while Protestants swore by bodily functions. And Americans generally have no idea why some Brits consider "bloody" such an offensive adjective that in the Victorian era it was frequently replaced with "ruddy", and its use still gets reprimands in some quarters today. Further, a dialect may encode a language's obscenities into unrecognizability; see the "Cockney Rhyming Slang" section of the British English page. And some obscenities may well be fossils: words or usages which carry offense only because "everybody knows they're dirty", despite the reason for this common knowledge being long forgotten. In more extreme cases, entire tenses, moods or categories may be offensive, perhaps under complex rules governing time, place and speaker.
    • While most languages have words that are considered obscene in any and every situation (for example, it is impossible to use the f-word "politely" in English), swearing in other languages is a much more context-dependent matter. In Japanese, for example, registers of politeness are encoded directly into the grammar, and failure to employ the polite verb conjugation when speaking to a social superior is occasion for great offence; however, using exactly the same sentence when speaking to a social inferior could be construed as tactless, but not technically rude.
  • And above all, do not have only and all of the sounds that are found in English. The pronunciation of even closely related languages like French and German can only be approximated by English sounds, let alone more distant languages, and vice versa: this is of course where foreign accents come from. Even a lot of conlangs still use English's horribly complicated tense/lax vowel system (yet many claim to have five vowels, while English generally has 12 or more), and some of the worse-done relexes and such employ English orthographic conventions as well — writing reed or rede when the speaker says /r\i:d/. And few if any conlangs employ more consonants than English possesses (which do exist — Xhosa and related African languages, for instance, have three entire groups of click-based consonants which have no counterparts in Indo-European tongues, and the glottal stop — which while present in English is generally not even noticed as a separate "sound" — is a common element in many others).
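
For the curious, the Hangul point made in the writing-systems bullet above (each syllable character is built out of the characters for its sounds) can be poked at with a little arithmetic. This is only a rough sketch, and it leans on something the list itself never mentions: the way Unicode lays out its precomposed Hangul syllables, starting at U+AC00 and ordered as 19 initial consonants by 21 vowels by 28 finals (final number 0 meaning "no final"). The function names split_syllable, join_syllable and letter_names are invented for this illustration.

```python
import unicodedata

# Assumption (from the Unicode layout, not from the article): precomposed
# Hangul syllables start at U+AC00 and are ordered as
# 19 initials x 21 vowels x 28 finals, where final 0 means "no final".
BASE, N_INITIALS, N_VOWELS, N_FINALS = 0xAC00, 19, 21, 28


def split_syllable(syllable):
    """Return the (initial, vowel, final) index triple for one syllable block."""
    offset = ord(syllable) - BASE
    if not 0 <= offset < N_INITIALS * N_VOWELS * N_FINALS:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, N_VOWELS * N_FINALS)
    vowel, final = divmod(rest, N_FINALS)
    return initial, vowel, final


def join_syllable(initial, vowel, final=0):
    """Inverse of split_syllable: rebuild the syllable block from its indices."""
    return chr(BASE + (initial * N_VOWELS + vowel) * N_FINALS + final)


def letter_names(syllable):
    """Unicode names of the letters inside the block (canonical decomposition)."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", syllable)]


if __name__ == "__main__":
    block = "한"  # the "han" of "Hangul"
    print(split_syllable(block))
    print(letter_names(block))
```

The indices are just positions in Unicode's ordering of the letters, not anything native to Korean spelling; the takeaway is simply that every block can be taken apart into its sound letters and put back together again.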
