Tone mapping between Sino languages: Mandarin, Vietnamese and Cantonese
For the last few weeks I’ve been learning Chinese — that’s another interesting topic that I will definitely write about.
Chinese (or more accurately Mandarin), like my native language Vietnamese, is a tonal language. The difference is that Vietnamese has 6 tones (sắc, huyền, hỏi, ngã, nặng, không) while Chinese has only 5 tones (mā, má, mǎ, mà, ma). I’ll be using these example syllables as the names of the Mandarin tones, since in my opinion the official names like rising, falling-rising, etc. do not do them justice. Vietnamese has a large number of loanwords with Chinese roots; the relationship is akin to the one between English and Latin, except that historically it stretches much further back.
Naturally, I have been curious about how the tones of words in a 5-tone language map to those in a 6-tone language. Over the weekend, inspired by this Reddit post, I wrote a simple Python program to find out.
Taking a cue from the same Reddit post, I used the Unihan database (the Unicode Han character database) as my Chinese dictionary. At the time of writing, it includes:
- 4357 characters with both Mandarin and Vietnamese pronunciations
- 3941 characters with both Cantonese and Vietnamese pronunciations
- 21278 characters with both Mandarin and Cantonese pronunciations
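Unihan distributes its readings as tab-separated text: in the Unihan_Readings.txt file each line has the form `U+XXXX<TAB>field<TAB>value`, and the pronunciations used here live in the kMandarin, kCantonese and kVietnamese fields. Below is a minimal sketch of how characters with two pronunciations could be collected and counted, assuming that file layout and keeping only the first reading when a field lists several. The function names are hypothetical and not taken from the author's notebook, and the sample lines are illustrative rather than copied from the real file.

```python
from collections import defaultdict

# Reading fields of interest in Unihan_Readings.txt.
FIELDS = ("kMandarin", "kCantonese", "kVietnamese")


def parse_unihan_readings(lines):
    """Map each character to {field: first reading} for the fields above.

    Assumes the Unihan_Readings.txt layout 'U+XXXX<TAB>field<TAB>value',
    with comment lines starting with '#'.
    """
    readings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field not in FIELDS:
            continue
        char = chr(int(codepoint[2:], 16))  # 'U+4E00' -> '一'
        # Some entries list several space-separated readings; keep the first.
        readings[char][field] = value.split()[0]
    return dict(readings)


def count_with_both(readings, field_a, field_b):
    """Count characters that have a reading in both fields."""
    return sum(1 for r in readings.values() if field_a in r and field_b in r)


# Illustrative lines in the assumed format (not an excerpt of the real file).
sample = [
    "# sample",
    "U+4E00\tkCantonese\tjat1",
    "U+4E00\tkMandarin\tyī",
    "U+4E00\tkVietnamese\tnhất",
    "U+4E01\tkMandarin\tdīng",
]
readings = parse_unihan_readings(sample)
print(count_with_both(readings, "kMandarin", "kVietnamese"))
```

On the real data, the same function could be fed an open file handle, e.g. `parse_unihan_readings(open("Unihan_Readings.txt", encoding="utf-8"))`, since iterating a text file yields its lines.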
This is the mapping from Vietnamese to Mandarin:
We can observe that:
- Vietnamese không mostly becomes Mandarin mā.
- Vietnamese sắc and nặng mostly become Mandarin mà.
- Vietnamese huyền mostly becomes Mandarin má.
- Vietnamese hỏi and ngã mostly become Mandarin mǎ.
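Classifying a syllable's tone only needs its diacritics. After Unicode NFD normalization, each Vietnamese tone mark becomes a separate combining character (acute for sắc, grave for huyền, hook above for hỏi, tilde for ngã, dot below for nặng), and so do the pinyin tone marks (macron, acute, caron, grave). A minimal sketch of the classification and the pair tally behind a mapping like the one above could look like this; the function names are hypothetical, and a syllable without a tone mark falls back to không or to the neutral ma. Marks such as the circumflex, breve, horn or the ü diaeresis are simply ignored because they are not in either table.

```python
import unicodedata
from collections import Counter

# Combining tone marks as they appear after NFD normalization.
VIETNAMESE_TONES = {
    "\u0301": "sắc",    # acute
    "\u0300": "huyền",  # grave
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}
MANDARIN_TONES = {
    "\u0304": "mā",  # macron
    "\u0301": "má",  # acute
    "\u030c": "mǎ",  # caron
    "\u0300": "mà",  # grave
}


def tone_of(syllable, marks, unmarked):
    """Return the tone name for a syllable, or `unmarked` if it has no tone mark."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in marks:
            return marks[ch]
    return unmarked


def tally(pairs):
    """Count (Vietnamese tone, Mandarin tone) pairs from (vi, zh) readings."""
    return Counter(
        (tone_of(vi, VIETNAMESE_TONES, "không"), tone_of(zh, MANDARIN_TONES, "ma"))
        for vi, zh in pairs
    )


# Sample (Sino-Vietnamese, pinyin) readings for 天, 時, 海, 大.
counts = tally([("thiên", "tiān"), ("thời", "shí"), ("hải", "hǎi"), ("đại", "dà")])
for (vi, zh), n in sorted(counts.items()):
    print(vi, "->", zh, n)
```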
This is the reverse mapping, from Mandarin to Vietnamese:
Similarly, we can observe that:
- Mandarin mā mostly becomes Vietnamese không.
- Mandarin má mostly becomes Vietnamese huyền.
- Mandarin mǎ mostly becomes Vietnamese hỏi, followed by sắc, then ngã.
- Mandarin mà mostly becomes Vietnamese sắc, followed by nặng.
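Observations like "mostly becomes X, followed by Y" amount to grouping the pair counts by source tone and ranking the target tones by their share, in either direction. Here is a small sketch of that step, assuming the counts are held in a `Counter` keyed by (Vietnamese tone, Mandarin tone). The function name and the toy numbers are made up for illustration and are not the author's results.

```python
from collections import Counter, defaultdict


def ranked_targets(pair_counts, reverse=False):
    """Group (vietnamese, mandarin) counts by source tone and rank target tones.

    With reverse=True, Mandarin is treated as the source language.
    Returns {source: [(target, share), ...]} sorted by descending share.
    """
    grouped = defaultdict(Counter)
    for (vi, zh), n in pair_counts.items():
        source, target = (zh, vi) if reverse else (vi, zh)
        grouped[source][target] += n
    ranking = {}
    for source, targets in grouped.items():
        total = sum(targets.values())
        ranking[source] = [(t, n / total) for t, n in targets.most_common()]
    return ranking


# Toy counts for illustration only; these are not the real Unihan figures.
toy = Counter({("hỏi", "mǎ"): 6, ("sắc", "mǎ"): 3, ("ngã", "mǎ"): 1, ("sắc", "mà"): 5})
for tone, share in ranked_targets(toy, reverse=True)["mǎ"]:
    print(f"mǎ -> {tone}: {share:.0%}")
```

Normalizing by the source tone's total (rather than by the grand total) is what makes "mostly" comparable across tones that occur with very different frequencies.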
Conclusion
With the exception of Mandarin mǎ mapping to Vietnamese sắc, I think language learners can remember this rule of thumb:
- Vietnamese không is Mandarin mā.
- Vietnamese sắc and nặng are Mandarin mà.
- Vietnamese huyền is Mandarin má.
- Vietnamese hỏi and ngã are Mandarin mǎ.
Everything else can be treated as an exception.
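The rule of thumb fits in a small lookup table, and the same kind of pair counts can then tell how much of the data it covers. A sketch, assuming the counts are kept in a `Counter` keyed by (Vietnamese tone, Mandarin tone); the names `RULE_OF_THUMB`, `predict_mandarin_tone` and `rule_coverage` are hypothetical, and the toy counts are invented, so the printed coverage says nothing about the real data.

```python
from collections import Counter

# The rule of thumb as a Vietnamese -> Mandarin tone lookup.
RULE_OF_THUMB = {
    "không": "mā",
    "sắc": "mà",
    "nặng": "mà",
    "huyền": "má",
    "hỏi": "mǎ",
    "ngã": "mǎ",
}


def predict_mandarin_tone(vietnamese_tone):
    """Guess the Mandarin tone for a Sino-Vietnamese tone using the rule of thumb."""
    return RULE_OF_THUMB[vietnamese_tone]


def rule_coverage(pair_counts):
    """Fraction of (vietnamese, mandarin) pairs that the rule of thumb predicts."""
    total = sum(pair_counts.values())
    hits = sum(n for (vi, zh), n in pair_counts.items() if RULE_OF_THUMB.get(vi) == zh)
    return hits / total if total else 0.0


# Toy counts for illustration only.
toy = Counter({("hỏi", "mǎ"): 6, ("sắc", "mǎ"): 3, ("ngã", "mǎ"): 1, ("sắc", "mà"): 5})
print(f"coverage: {rule_coverage(toy):.0%}")
```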
I also ran the program on two other language pairs: Vietnamese-Cantonese and Cantonese-Mandarin. Below are the full results:
You might have noticed that I’ve arranged the Vietnamese tones in an unusual order (the usual order is không, sắc, huyền, hỏi, ngã, nặng). This is to show the neat correlation with Cantonese. The correlation suggests some closeness between the two languages (more accurately, between Cantonese words and Sino-Vietnamese words). Historically this seems to be the case as well, since both are more closely related to Middle Chinese than Mandarin is.
As one might expect, the tone mapping between Cantonese and Mandarin looks quite similar to the one between Vietnamese and Mandarin.
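Pairing with Cantonese needs one more parser. Unihan's kCantonese field gives readings in Jyutping, where the tone is a trailing digit from 1 to 6, so extracting it is a small regular-expression job. The sketch below, with hypothetical function names, tallies Cantonese-Mandarin tone pairs the same way as for Vietnamese. It does not try to separate the checked (stop-final) syllables that traditional nine-tone analyses of Cantonese count as extra tones.

```python
import re
import unicodedata
from collections import Counter

# A Jyutping syllable: lowercase letters followed by one tone digit 1-6.
JYUTPING = re.compile(r"([a-z]+)([1-6])")
# Pinyin tone marks after NFD normalization.
MANDARIN_TONES = {"\u0304": "mā", "\u0301": "má", "\u030c": "mǎ", "\u0300": "mà"}


def cantonese_tone(jyutping):
    """Return the Jyutping tone number (1-6), e.g. 'jat1' -> 1."""
    match = JYUTPING.fullmatch(jyutping)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    return int(match.group(2))


def mandarin_tone(pinyin):
    """Return the Mandarin tone name from pinyin diacritics (neutral -> 'ma')."""
    for ch in unicodedata.normalize("NFD", pinyin):
        if ch in MANDARIN_TONES:
            return MANDARIN_TONES[ch]
    return "ma"


def tally_cantonese_mandarin(pairs):
    """Count (Cantonese tone, Mandarin tone) pairs from (jyutping, pinyin) readings."""
    return Counter((cantonese_tone(yue), mandarin_tone(zh)) for yue, zh in pairs)


# Sample readings for 天, 時, 海, 大.
counts = tally_cantonese_mandarin(
    [("tin1", "tiān"), ("si4", "shí"), ("hoi2", "hǎi"), ("daai6", "dà")]
)
print(sorted(counts.items()))
```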
Other areas that I may explore more in the future:
- Mapping of initial consonants and vowels between the languages.
- Mapping of tones between Vietnamese and other Southern Chinese dialects like Hokkien, Teochew, etc.
The JupyterLab notebook for this project can be found at: https://github.com/ryanphung/sino-tones-pairing