This article will primarily focus on Standard Chinese (commonly called "Mandarin Chinese"), the sole official language and the most widely spoken language in China. This specification is used to differentiate it from the numerous "dialects" of Chinese (e.g. Cantonese) and from older Chinese languages (such as Old Chinese, the ancestor of all these "dialects"), which differ widely in pronunciation, vocabulary, and even grammar. Information on other dialects can be found on Chinese Dialects and Accents. The specification also distinguishes it from Classical Chinese, the written language of Imperial China, which uses a different grammar.
The history of Chinese is a long one; pictograms carved into tortoise shells more than 3,000 years ago have been discovered in China. Chinese is generally divided into the Classical and the Modern period, with Classical being everything before the fall of the Qing Dynasty in 1912 and Modern being everything after. The needs of administering such a large empire largely prevented the official Classical or Literary Chinese from changing. Bureaucrats and poets self-consciously modeled their writing on the grammar and style of the Spring and Autumn period (771-476 BCE) and on the writings of Confucius in particular. New words and grammar were continually introduced to the Literary language from the spoken dialects, but at a glacial pace (kind of like how modern Church Latin is different from the Latin of Caesar, but still looks Latin). But in the waning years of the Qing, reformists criticized the increasingly impenetrable Literary Chinese for creating a gulf between the largely illiterate masses and their literate overlords, and held it up as one of the reasons China had failed so miserably to modernize in the face of colonialist encroachment. Reformist authors like Lu Xun advocated the use of spoken Chinese as the basis of the written language and popularized Baihua (literally "plain speech", a written vernacular based on the Beijing dialect) through their novels and essays; many of these authors would go on to advocate for the simplification of the Chinese script and the use of Pinyin. In the modern era, both the Nationalist and Communist administrations have encouraged the use of Baihua, bringing written Chinese even closer to the spoken language.
Mandarin is generally considered a 'difficult' language for Westerners to learn. There are a number of factors behind this, but one of the primary reasons is probably the standard writing system, which has far more symbols than most other writing systems. These are known as 汉字/漢字 hànzì, literally meaning "Han characters", or "characters of the Han people", the Han being the largest ethnic group within China (and, indeed, the world); these same characters remain at the core of the Japanese Writing System (where they are known as kanji), and once formed the principal writing systems for the Korean and Vietnamese languages (where the same word is pronounced hanja and hán tự, respectively). There are two main variants of the character set: the "simplified" set used in Mainland China, Malaysia and Singapore, and the "traditional" set, used in Taiwan, Hong Kong, and Macau. (These differences are covered in more detail later on the page; suffice it to say, choosing the appropriate set to use can become Serious Business.) On this page, characters which differ between the two are presented with the simplified version first, for example: 发/發 fā. (If only one character is given, it can be assumed that it's the same in both sets.)
Chinese is written with a logographic writing system (though writing the language in the Latin alphabet has been proposed) of thousands of characters, each of which represents a syllable, though many characters are pronounced the same while having different meanings. While it is sometimes said the characters resemble the objects they describe, only the most basic characters, such as the ones for 'sheep' (羊) or 'door' (Simplified: 门, Traditional: 門), bear even a vague resemblance to physical things.
The logographic writing system is ancient, dating back more than three thousand years in one form or another, making it the oldest writing system still in common use. Because characters are not tied to one particular pronunciation, it also masks differences in pronunciation, both between dialects and across the centuries. This means that a high schooler should be able to understand a text written in about 150 B.C. with some extra training in ancient grammar and vocabulary, since such texts use mostly the same set of characters with broadly similar grammar and vocabulary, and changes in pronunciation do not matter on the page. With more effort, this extends back to texts from about 600 B.C. This continuity gave Chinese some special features that are still retained to the present day. The earliest Chinese scripts used no word separators or punctuation at all. Modern Chinese still has no spaces between words, and text with the punctuation stripped out can usually still be understood, though it is considered poor form. As one might expect, many, many styles of writing the characters have been developed over the years; the split between Traditional and Simplified Chinese is the most recent.
When written in vertical columns, Chinese is generally read from top to bottom, right to left. When written horizontally, as on a shop sign or in most modern books, it is generally read from left to right. Traditional-style signs (on temples, for example) are read from right to left, which is actually a special case of right-to-left vertical column writing where each column consists of a single character.
Despite the daunting number of characters, there is a certain twisted logic to their construction. Many of the more complex characters are composed of simpler characters, and these often give a clue to the meaning or pronunciation of the whole. Indeed, when describing a character that has several homophones, such as a surname, one often lists its component pieces if precision is necessary or the character is uncommon. Just to add to the confusion, some characters have multiple readings, and the pronunciation 'hints' may not be valid in modern speech. Unlike in Japanese, Alternate Character Readings are rarely drawn on for puns.
Generally, upon meeting an unknown character, one looks at the uppermost or leftmost element for some clue to its meaning. There are a number of standard 'heads' and 'sides' (known as "radicals") that indicate general categories, like the 'Grass' radical for plants (other than trees) or the 'Walking' radical for movement. For example, the words for 'to flee' and 'peach' are homophones, and both characters are composed of two components: 'peach' (桃 táo) has the 'wood' radical on the left, while 'flee' (逃 táo) has the walking radical; both share a common element on the right (兆 zhào, which gives some indication of the pronunciation).
In writing hanzi, the strokes that make up each character have a set order. In general, a character is built up from left to right and top to bottom, though certain radicals are always written after the rest of the character is complete. Stroke order matters most in calligraphy and has become less relevant in everyday writing since the introduction of pinyin-based typing.
Each character generally corresponds to a single syllable in spoken Chinese, which means that even a relatively short line of dialogue can span the entire screen when closed-captioned. While each character can have an intrinsic meaning, many 'words' are short phrases consisting of multiple characters, and similar phrases can have widely different meanings. For example, 火车/火車 huǒchē (lit. "fire vehicle") means train, while 救火车/救火車 jiùhuǒchē (lit. "help fire vehicle") means fire truck. One particularly cute compound is the word for panda: 熊猫/熊貓 xióngmāo, which has the literal translation of "bear cat".
Further difficulties with Chinese stem in part from the fact that Mandarin includes a number of phonemes (sounds, basically) not found in, say, English. For example, Standard Mandarin has two distinct sh sounds where English has only one. This can work the other way as well, creating that 'flied rice' accent. While some people may think that r and l are allophones, as they are in Japanese, it isn't quite that simple. Standard Mandarin does have two distinct sounds corresponding to the English r and l, although the Mandarin r can be quite different from the English one and has a 'buzzy' quality, sounding like something between a French r and the s in measure, depending on the dialect; in Standard Mandarin, it is usually the voiced retroflex sibilant, which speakers of Polish should recognize as the consonant "rz". Most Mandarin speakers can perceive l and r as distinct sounds, but they may have difficulty pronouncing them in consonant clusters, which are common in English but don't occur in Mandarin at all. So it's quite possible for a Mandarin speaker to struggle with saying words such as flight and fright distinctly, but not lice and rice.
The consequences of this for romanizing Chinese are discussed in another article. The several different romanization systems created for Chinese over the years complicate learning to read the language quite a bit; for example, "kung-fu" and "gōngfu" are different romanizations of the same two-character word (功夫).
Further, Chinese is a tonal language. Mandarin Chinese uses four tones and a neutral tone. A student not accustomed to tonal speech can easily mishear what is intended or produce strange malapropisms just by not paying attention to tone. As an example, the words for 'mother' (妈/媽 mā), 'hemp' (麻 má), 'horse' (马/馬 mǎ), and 'to scold' (骂/罵 mà) are distinguished only by tone (now there's an international incident just waiting to happen). On top of that, normal sentence intonation still exists, twisting the original tones in subtle ways.
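To show how pinyin writes those four tones, here is a minimal Python sketch that converts "numbered" pinyin (ma1, ma2, and so on, a common keyboard-friendly convention that this article does not itself use) into tone-marked pinyin. The function name `mark_tone` is hypothetical. The sketch assumes lowercase, single-syllable input with `v` standing in for ü, and applies the commonly taught placement rule: the mark goes on a or e if present, on the o of ou, and otherwise on the last vowel.

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_DIACRITICS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is ü


def mark_tone(syllable: str) -> str:
    """Hypothetical helper: turn 'ma3' into 'mǎ'.

    Assumes lowercase input; tone 0 or 5 (or no digit) means the neutral
    tone and gets no mark; 'v' is accepted as a stand-in for 'ü'.
    """
    body, tone = syllable, 5
    if syllable[-1:].isdigit():
        body, tone = syllable[:-1], int(syllable[-1])
    body = body.replace("v", "\u00fc")
    if tone in (0, 5):
        return body  # neutral tone: no mark
    if tone not in TONE_DIACRITICS:
        raise ValueError(f"unknown tone number: {tone}")
    # Placement rule: 'a' or 'e' takes the mark; in 'ou' the 'o' does;
    # otherwise the last vowel does.
    if "a" in body:
        idx = body.index("a")
    elif "e" in body:
        idx = body.index("e")
    elif "ou" in body:
        idx = body.index("ou")
    else:
        positions = [i for i, ch in enumerate(body) if ch in VOWELS]
        if not positions:
            return body  # e.g. a syllabic 'm' or 'ng': nothing to mark
        idx = positions[-1]
    marked = body[: idx + 1] + TONE_DIACRITICS[tone] + body[idx + 1 :]
    # NFC composes vowel + combining mark into one character such as 'ǎ'.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    print([mark_tone(f"ma{t}") for t in range(1, 6)])
    # ['mā', 'má', 'mǎ', 'mà', 'ma']
```

Building the marks from combining characters and then normalizing avoids hard-coding a separate table of accented vowels; a real input method would also need to handle capital letters and multi-syllable words, which this sketch leaves out.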
In addition to confusion between tones, Mandarin has true homophones, which actually do sound alike, including tone. In one slightly odd case, the words for "needle" (针/針), "gizzard" (胗), "rare" (珍), and "true" (真) are pronounced alike (zhēn). There are even multi-syllable true homophones; for example, qīngdàn can mean "light in flavor/faint" (清淡), or "h-bomb" (氢弹/氫彈) depending on the characters used to write it. With insufficient context or attention, even native speakers can sometimes mistake one another's meaning, but no more often than in any other language. Having an idea of which words and phrases are most commonly used by Mandarin speakers, and in which contexts, goes a long way toward getting a better understanding of what's being said (although again, this is probably true for other languages as well).
As the writing system lacks an alphabet, Chinese dictionaries tend to be organized by radical and stroke count (as well as other commonly used elements), though the index may also use the sound, often in pinyin or bopomofo. Ironically, this can make simple characters harder to look up, since it is not always obvious which part of a simple character counts as its radical. However, one could argue that smartphone dictionary apps such as Pleco, which let the user look up a character by drawing it by hand, are bound to make traditional paper dictionaries obsolete before long. It should also be noted that the common view of Chinese as next to impossible to master has rested not so much on the difficulty of learning the meanings of so many characters as on the time it takes to look them up; in that sense, mastering Chinese in the 2010s is a much easier task than at any time before.
Chinese is almost completely uninflected. There are no verb or noun endings to reflect tense, number, or grammatical case. One exception is the 'word' 们/們 men, which is attached to pronouns to indicate a plural: 我 wǒ ('I') becomes 我们/我們 wǒmen ('we'). A verb's tense is indicated by context, usually by stating when something was done or will be done; loosely speaking, then, Chinese distinguishes only past, present, and future, with various offshoots, and does so with words rather than verb forms. Aside from this idiosyncrasy, word order is usually similar to English's subject-verb-object order. In fact, a sentence written in English and translated word-for-word into Mandarin might look a bit odd to a native speaker, but would probably be perfectly understandable. To give an idea of this, the sentence 我跟朋友走去公园/我跟朋友走去公園 wǒ gēn péngyǒu zǒu qù gōngyuán would translate word-for-word into the odd-sounding "I with friend walk go park" ("I'm walking to the park with my friend").
The third-person pronoun has separate written forms for male, female, and neuter, which sound exactly alike when spoken. Before modern times there was only one written form of the third-person pronoun, 他 tā; the separate forms were introduced under the influence of Western languages. This can lead to Pronoun Trouble for native speakers of Chinese who learn languages with gendered spoken pronouns.
The 'classifier' or 'measure word' is yet another feature likely to give trouble to students of Chinese. These are a class of words which can have very general meanings, and in many cases can simply be omitted when translating Chinese. Nevertheless, they are still an essential part of Mandarin grammar. Simply put, they indicate the class of objects to which a number refers. English does it on occasion; you say you have "four loaves of bread" instead of just "four breads." Well, in Chinese, you have to do that with everything, which is simultaneously more nitpicky and more precise. Four trees would be 四棵树/四棵樹 sì kē shù while four cars would be 四辆车/四輛車 sì liàng chē. 四 sì means four, 树/樹 shù means tree, and 车/車 chē means car; 棵 kē and 辆/輛 liàng are the measure words. Using the wrong measure word for something can be a bit embarrassing ("I have four terabytes of bread"), especially if one uses one of the words for animals on people instead ("I have four flocks of priests"). Thankfully, the measure word 个/個 gè can stand in for nearly any other measure word in a pinch, functioning in many ways as a generic measure word (e.g. 我有四个朋友/我有四個朋友, "I have four [units of] friends").
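To make the "number + measure word + noun" pattern concrete, here is a minimal Python sketch. The `count_noun` function and its two-entry classifier table are hypothetical illustrations built only from the examples above, not a real lexicon; any noun missing from the table falls back to 个, mirroring its role as the generic measure word. The sketch uses simplified characters only and also assumes the standard rule that "two" is written 两 liǎng rather than 二 èr before a measure word.

```python
# Hypothetical, deliberately tiny classifier table (simplified characters only),
# covering just the examples in the text above.
CLASSIFIERS = {
    "树": "棵",  # kē: trees and other plants
    "车": "辆",  # liàng: vehicles
}

# Numerals 1-9; index 0 is 一 (one).
DIGITS = "一二三四五六七八九"


def count_noun(n: int, noun: str) -> str:
    """Build a counted phrase, e.g. count_noun(4, '树') -> '四棵树'.

    Nouns missing from CLASSIFIERS fall back to the generic measure word 个.
    """
    if not 1 <= n <= 9:
        raise ValueError("this sketch only handles the numbers 1 to 9")
    # Before a measure word, 'two' is 两 (liǎng) rather than 二 (èr).
    numeral = "两" if n == 2 else DIGITS[n - 1]
    measure = CLASSIFIERS.get(noun, "个")
    return numeral + measure + noun


if __name__ == "__main__":
    print(count_noun(4, "树"))    # 四棵树
    print(count_noun(4, "车"))    # 四辆车
    print(count_noun(4, "朋友"))  # 四个朋友 (generic fallback)
    print(count_noun(2, "车"))    # 两辆车
```

A plain dictionary is only a starting point: a realistic lookup would need thousands of entries, and many nouns accept more than one measure word depending on nuance.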
Interestingly enough, the job of exclamation points and question marks can be done by words in the sentence. In ancient times even full stops had word equivalents. These words are known as particles and are typically added to the end of a sentence; they're a feature kept almost intact from ancient Chinese. The a (啊) (pronounced 'ah!') sound that Chinese people supposedly make expresses surprise, doubt, agreement, or affirmation depending on the tone used. ma (吗/嗎) (yes, another word to mix up with 'mother' and 'horse') is used to mark a question. There are many other useful particles, including 吧 ba, which is used to soften suggestions politely, and 呢 ne, which roughly means "How about...?" and is commonly used when responding to "你好吗/你好嗎? (nǐ hǎo ma)" ("How are you?/Are you well?"): "好,你呢? (hǎo, nǐ ne)" ("I'm fine, how about you?").
One of the most notable features of Chinese that sets it apart from Japanese (with which it is often grouped, despite the two sharing no common linguistic origin) is its resistance to loanwords. While many Chinese words for modern technology and society have meanings equivalent to their Western counterparts, they hardly ever sound anywhere close. That is probably because, unlike Japanese, Chinese has no secondary script for writing such words, and relying solely on characters to convey pronunciation could leave the average reader struggling to tell whether a string of characters is supposed to mean something or is just an English loanword. There are exceptions, though, such as "麦克风" (màikèfēng, "microphone") or "模特" (mótè, "model"), and names of people or places from regions Chinese people were rather unfamiliar with until the last few centuries are usually rendered more or less phonetically (again, with exceptions; for instance, the Chinese name for Iceland is "冰岛"/Bīngdǎo, literally "ice island").
When transcribing English into Chinese, because each character carries its own meaning, a poorly chosen transcription can result in a word that is phonetically similar to the original but less than appropriate in meaning. (That's where Bite the Wax Tadpole came from.) Conversely, a well-chosen transcription can add a layer of meaning on top of being phonetically similar. This process is known as phono-semantic matching and is very commonly used in Chinese borrowings. Most transcriptions of proper nouns such as names use an informally agreed-on set of characters with limited semantic associations, and tend to follow Chinese pronunciation habits rather than the original pronunciation.
Note that pinyin is not used as a tool for transcribing foreign words, and it rarely appears in ordinary Chinese writing at all. It is a romanization system (i.e. it renders Chinese in the Latin script), and is generally used only to illustrate the sound of a Chinese character.
Since transcription into Chinese is so awkward, it is extremely common for words to be calques, i.e. translated literally, piece by piece. More interestingly, proper nouns are also sometimes translated instead of transcribed, depending on the translator's stylistic choice. Some of the meaningful place names in A Song of Ice and Fire, for example, are translated rather than transcribed. Unlike in almost every other country on the planet, Microsoft is known in China by a translated name, 微软/微軟 wēiruǎn, the two characters literally meaning "micro" and "soft".
In spite of its tough relationship with Western languages, Chinese has a much closer and more interesting relationship with the other East Asian languages its linguistic ancestors influenced, particularly Japanese. As Japanese also uses Chinese characters, a word made of hanzi/kanji can be borrowed more or less seamlessly from Japanese into Chinese and (with some difficulty) vice versa. The borrowing has been so seamless that many Chinese are unaware that a portion of modern Chinese vocabulary was coined or redefined in Japan (such coinages are known as wasei-kango). Many more words arrived as internet slang when Japanese pop culture, particularly Anime and Manga and the like, entered China in The New '10s.
Chinese speakers read Japanese names using the Chinese readings of their kanji, so a name like Tarou Tanaka (田中太郎) would become Tianzhong Tailang in Chinese (with the family name first, as is usual in both languages). If a name is written in kana rather than kanji, it will most likely be assigned kanji when rendered in Chinese, usually whichever kanji spelling of that name is most common. This also applies to mixed-heritage names that include a Japanese element. This naturally causes problems when people have no idea which Chinese characters to use and therefore use them inconsistently. Phonetic transcription is extremely rare and usually requires a very good reason. Nico Yazawa from Love Live! notably has a phonetically transcribed given name in Chinese (in the most commonly used version, at least), because her catchphrase "Nico-nico-nii" only works if the sound of her name is preserved, and because all of its possible kanji forms are semantically awkward.
Seeing how few people in the West are familiar with Chinese characters to any degree, one might expect Western artists not to feel much pressure to render them faithfully. Surprisingly, that is hardly ever the case: while you will sometimes (though not as often as you might think) run into strings of characters pieced together at random without any care for grammatical (or even semantic) accuracy, the characters themselves are almost always genuine. This may be because people of Chinese background make up a significant portion of the Western entertainment industry (such as Hollywood or video game development companies), so getting linguistic consultation in that particular field doesn't usually require much effort.
For some aid with language, the MediaGlyphs Project is a cross-cultural language that uses Chinese grammar and easily recognized pictures to aid in translation from one natural language to another.
This language provides examples of:
- Alternate Character Reading:
- Some characters have multiple meanings or sounds, used differently in different words. Sort of like the English stock or lead or maybe minute. Some of this can be attributed to differences between literary and colloquial readings of Chinese characters: historically, readings from the more prestigious regions were borrowed into other areas as literary readings. An important note is that most of these differences are nowhere near as extreme as the reading differences in Japanese. The dialects do share a common origin, after all (unlike Japanese, which also applied native readings, completely unrelated to Chinese, to the characters), and it is still possible to see some resemblance between the different readings.
- While many dialects are mutually unintelligible, their writing systems mostly use the same Chinese characters. Thus, it is possible for a character, or better yet, a name, to be pronounced differently in different regional dialects. Note that a character can also be read in multiple ways within a single dialect (again, in different contexts), usually for the same reason of literary and colloquial differences as above.
- Hanyu Pinyin has shades of this too, since it uses the Latin alphabet to represent the pronunciation of Chinese words. In most cases, this is fairly straightforward: for example, ping and ban are pronounced pretty much as you'd expect. However, a few letters are used to represent non-English sounds, and syllables like ci or quan aren't pronounced at all like their spelling might suggest.
- Broken Base: The supporters of Traditional versus Simplified characters. The Simplified supporters like the fact that they can write a paragraph in half the time and not have the characters turn into illegible inkblots when the font gets too small, while the traditionalists like the hints to meaning and pronunciation that the old-style characters contain and the link to history they provide. Which one is easier to learn and remember is the subject of much debate, which will not be settled here. Also, as the simplification scheme was promulgated by the mainland Communist government, some people in Taiwan and in overseas Chinese communities object to it for political or ideological reasons rather than linguistic ones.
- Insistent Terminology: The official Chinese term for Traditional characters in Taiwan means "Standard/Orthodox characters", while elsewhere in the Sinophone world they are generally referred to as "Complex characters" (the character used, 繁, carries the connotation of "frustratingly" complex). Some Simplified supporters insist on using the term "Complex" in English as well, considering "Traditional" a misnomer, since some characters became more elaborate over time and many Simplified characters are based on abbreviated forms that were themselves in traditional use.
- Common Tongue: As indicated by two of the names cited below, Mandarin serves this purpose in a linguistically diverse China.
- Four Is Death: Either Old Chinese (the ancestor of all Chinese/Sinitic languages, including Mandarin) or its descendant Middle Chinese is the Trope Maker, since 'four' (四 sì) and 'death' (死 sǐ) sound nearly alike.
- Foreigners Write Backwards: As mentioned above, Chinese can be written vertically, with columns read top-to-bottom from right to left. Seals may even be read in a circle, as shown by the first image at the Other Wiki. This flexibility in writing direction is largely owed to the block-like construction of individual characters.
- Fun with Homophones:
- Mandarin Chinese (as well as other Chinese/Sinitic languages) has a lot of homophones, though communication usually isn't a problem thanks to heavy reliance on context and the writing system. Case in point, the Cihai dictionary lists 149 different characters pronounced yì, each with its own meaning. Beyond that, many common Chinese puns extend to the same syllable in other tones, multiplying the possibilities and making the language an absolute gold mine of puns.
- Lion-Eating Poet in the Stone Den (施氏食狮史/施氏食獅史, pinyin: Shī Shì shí shī shǐ) is a well-known 92-character Classical Chinese poem by Yuen Ren Chao. The poem is best known for the fact that every single syllable in it is pronounced shi (with different tones) when read in Mandarin. It is an example of a one-syllable article, fully exploiting the number of homophones in Chinese to create a work that is completely and utterly incomprehensible when romanized or read aloud, but understandable when written in its original form.
- I Have Many Names: The most common are Putonghua ("Common Speech") and Guoyu ("National Language"), used on the Mainland and Taiwan, respectively. In parts of the diaspora, Huayu ("Chinese Language", Hua being a name for Chinese culture) is common. Finally, the word "Mandarin" is a rendering of Guanhua, "the speech of officials" from a time when it was the language of government functionaries based around Beijing.
- Loads and Loads of Characters: The writing system has over 40,000 characters, with a college graduate knowing about 5,000. A few hundred of the most common characters already cover the bulk of everyday text, though reading a newspaper comfortably takes around 2,000 to 3,000. As described above, there are rules for deducing the pronunciation of an unknown character, though they are not completely foolproof.
- In fact, there are characters that have no standard Chinese pronunciation at all, like 込, which was coined in Japan and is used only in Japanese.
- New Media Are Evil: Many old fogeys have claimed that the advent of pinyin-based character input has led to a loss of literacy among the younger generation. Whether this technology actually has any impact on literacy skills has not been proven.
- Pronoun Trouble: Primarily in translation, since the spoken third-person pronoun is gender-neutral.
- In written Chinese, "he" (他) and "she" (她) are fairly intuitive, with the left-side radical being "person" for "he" and "woman" for "she" while both share the same base. However, the character for "it" (它) does not resemble the others in any way, as it was originally a word meaning "other". (As an aside, in Traditional Chinese, even "you" has a female-specific written form, 妳.)
- When the overzealous language reformers made up gendered third-person pronouns, they also made up a tā for animals (牠), a tā for inanimate objects (它), and a tā for gods (祂).
- On the other hand, when translating into Mandarin, you could say Pronoun Trouble is averted for many of the same reasons. If you know four syllables, 我 wǒ (I/me), 你 nǐ (you), 他 tā (he/him), and 们 men (the plural suffix for making we/us, (all of) you, and they/them), then you know every pronoun you commonly need. And if we throw in the possessive suffix 的 de, we can also express 'mine', 'ours', 'yours', 'his/hers/its' and 'theirs' as well.
- Pun: Those four tones and the sheer number of true homophones make for loads and loads of these. There's an entire class of jokes called xiehouyu whose punchlines often rely on wordplay.
- The character for "spring" written upside-down is sometimes seen around the Chinese New Year because this was traditionally considered the start of spring. As it happens, the words for upside-down (倒) and "to arrive" (到) are homophones (dào). So "spring" upside-down = "Spring has arrived."
- A ridiculous number of Chinese superstitions are based on homophones. Some examples include: Four Is Death, as mentioned above; pears should never be served at a wedding because the Mandarin for pear (梨, lí) sounds the same as the word for separation (离, also part of the compound word meaning "divorce"); and fish is usually eaten at New Year's Eve dinner because the word for fish (鱼, yú) is a homophone of the word for surplus (余), i.e. ending the year with more than you started with.
- These are also exploited to avoid censorship: homophones and near-homophones are used to get around government filtering on certain character combinations. The government sometimes cottons on to particularly widespread workarounds, but all in all it's a game of cat and mouse in which the Chinese Internet is always two to three steps ahead of the censors.
- They Changed It, Now It Sucks!:
- Some supporters of the traditional characters consider Simplified Chinese to be an example, even though, with a bit of practice, traditional characters are perfectly legible to those who learned simplified, and vice versa.
- When it comes to the spoken language, linguists have noted that Mandarin Chinese has strayed quite far from older forms of Chinese (such as Middle Chinese), leaving Mandarin speakers unable to fully appreciate older poetry and folk songs... while Han Chinese groups that speak other "dialects" often can, because these "dialects" are, in many cases, actually closer to older Chinese languages. There is no shortage of Chinese people who would rant about this issue if given the chance.
- And on the mainland, the second round of simplified characters was eventually withdrawn, after causing 9 years of widespread confusion and disagreement.
- What Could Have Been: Throughout the 20th century (especially the first half of it), many political and intellectual figures in China proposed abolishing characters in favor of a syllabary, akin to the Japanese kana scripts, or the Western alphabet. While most of them acknowledged that characters formed an important part of Chinese culture, they argued that because characters take forever to learn, China would never become a fully literate country. Some also claimed that characters were difficult and time-consuming to type on a typewriter (which, by the way, is actually possible) and thus presented an obstacle to the country's development. Mao Zedong himself believed that pinyin would eventually replace characters as the sole means of writing Chinese, but he did little to actually make that happen. These days, however, hardly anyone talks about the idea anymore: China is almost as literate as the countries of the West (over 90% of the population can read and write, and illiteracy is basically unheard of among teenagers; it should be noted that most of the world's least literate countries actually use the Western alphabet), and characters are easy to input on modern computers.