Photo credit: Salty Justice

Tones mapping between Sino languages: Mandarin, Vietnamese and Cantonese

Ryan Phung
May 23, 2021

For the last few weeks I’ve been learning Chinese — that’s another interesting topic that I will definitely write about.

Chinese (or more accurately Mandarin), like my native language Vietnamese, is a tonal language. The difference is that Vietnamese has 6 tones (sắc, huyền, hỏi, ngã, nặng, không) while Chinese has only 5 tones (mā, má, mǎ, mà, ma, the last being the neutral tone). I’ll be using these names for the tones because, in my opinion, the official names like rising, falling-rising, etc. do not do them justice. Vietnamese has a large number of loanwords with Chinese roots; the relationship is akin to the one between English and Latin, except that historically it stretches much further back.

Naturally, I have been curious about how the tones of words in a 5-tone language map to the tones of the corresponding words in a 6-tone language. Over the weekend, inspired by this Reddit post, I wrote a simple Python program to find out the answer.
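One building block such a program needs is a way to read the tone off a romanized syllable. The sketch below is an illustration, not the code from the notebook. It assumes Mandarin readings are written in pinyin with tone diacritics and Vietnamese readings in standard quốc ngữ spelling, and it uses Python's `unicodedata.normalize("NFD", ...)` to split each letter into a base letter plus combining marks. The names `mandarin_tone` and `vietnamese_tone` are hypothetical.

```python
import unicodedata

# Combining marks (after NFD decomposition) that carry Mandarin pinyin tones.
PINYIN_TONE_MARKS = {
    "\u0304": "mā",  # macron: first tone
    "\u0301": "má",  # acute: second tone
    "\u030c": "mǎ",  # caron: third tone
    "\u0300": "mà",  # grave: fourth tone
}

# Combining marks that carry Vietnamese tones in quốc ngữ spelling.
VIETNAMESE_TONE_MARKS = {
    "\u0301": "sắc",    # acute
    "\u0300": "huyền",  # grave
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}


def _tone_from_marks(syllable: str, marks: dict, default: str) -> str:
    """Return the tone of the first tone mark found in the decomposed syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in marks:
            return marks[ch]
    return default  # no tone mark at all


def mandarin_tone(pinyin: str) -> str:
    # An unmarked pinyin syllable is treated as the neutral tone "ma".
    return _tone_from_marks(pinyin, PINYIN_TONE_MARKS, "ma")


def vietnamese_tone(syllable: str) -> str:
    # An unmarked Vietnamese syllable has the level tone "không".
    return _tone_from_marks(syllable, VIETNAMESE_TONE_MARKS, "không")
```

Decomposing first means letters carrying two marks, such as Vietnamese `ấ` (circumflex plus acute) or pinyin `ǚ` (diaeresis plus caron), need no special cases: marks that are not tone marks are simply skipped.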

Taking a cue from the same Reddit post, I used the Chinese character data from the Unihan database (Unicode's database of Han characters). At the time of writing, it includes:

  • 4357 characters with both Mandarin and Vietnamese pronunciations
  • 3941 characters with both Cantonese and Vietnamese pronunciations
  • 21278 characters with both Mandarin and Cantonese pronunciations
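To reproduce counts like these, the readings first have to be loaded. Here is a minimal sketch, assuming the tab-separated layout of `Unihan_Readings.txt` from the Unihan download (code point such as `U+4E00`, field name, value, with `#` comment lines) and its `kMandarin`, `kCantonese` and `kVietnamese` fields. The function names are hypothetical, and the exact counts depend on the Unihan version used.

```python
from collections import defaultdict

# Reading fields of interest in Unihan_Readings.txt.
FIELDS = {"kMandarin", "kCantonese", "kVietnamese"}


def parse_unihan_readings(lines):
    """Map each character to {field: [readings]} from Unihan-style lines.

    A data line looks like 'U+4E00<TAB>kMandarin<TAB>yī'; lines starting with
    '#' are comments. Multiple readings in one value are space-separated.
    """
    readings = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field in FIELDS:
            char = chr(int(codepoint[2:], 16))  # 'U+4E00' -> '一'
            readings[char][field] = value.split()
    return readings


def count_with_both(readings, field_a, field_b):
    """Number of characters that have readings in both fields."""
    return sum(1 for r in readings.values() if field_a in r and field_b in r)
```

With the real file, `parse_unihan_readings(open("Unihan_Readings.txt", encoding="utf-8"))` followed by `count_with_both(readings, "kMandarin", "kVietnamese")` gives the Mandarin-Vietnamese count for that Unihan release.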

This is the mapping from Vietnamese to Mandarin:

Tones mapping from Vietnamese to Mandarin

We can observe that:

  • Vietnamese không mostly becomes Mandarin mā.
  • Vietnamese sắc and nặng mostly become Mandarin mà.
  • Vietnamese huyền mostly becomes Mandarin má.
  • Vietnamese hỏi and ngã mostly become Mandarin mǎ.
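A table like this can be built by collecting one (Vietnamese tone, Mandarin tone) pair per character, counting the pairs, and dividing each Vietnamese tone's row by its total so the row sums to 1. This is a sketch under that assumption; the names `tone_mapping` and `most_likely` are hypothetical.

```python
from collections import Counter, defaultdict


def tone_mapping(pairs):
    """Row-normalised mapping from (source_tone, target_tone) pairs.

    Returns {source_tone: {target_tone: share}}: of the characters with a
    given source tone, the share that has each target tone.
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    mapping = {}
    for source, row in counts.items():
        total = sum(row.values())
        mapping[source] = {target: n / total for target, n in row.items()}
    return mapping


def most_likely(mapping):
    """For each source tone, the target tone with the largest share."""
    return {source: max(row, key=row.get) for source, row in mapping.items()}
```

For example, with the made-up pairs `("không", "mā")`, `("không", "mā")`, `("không", "má")`, `tone_mapping` gives `mā` a share of 2/3 in the `không` row, and `most_likely` picks `mā`. Observations like the ones above amount to reading off the largest shares in each row.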

This is the reverse mapping, from Mandarin to Vietnamese:

Tones mapping from Mandarin to Vietnamese

Similarly, we can observe that:

  • Mandarin mā mostly becomes Vietnamese không.
  • Mandarin má mostly becomes Vietnamese huyền.
  • Mandarin mǎ mostly becomes Vietnamese hỏi, followed by sắc, then ngã.
  • Mandarin mà mostly becomes Vietnamese sắc, followed by nặng.
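The two directions need not be mirror images, because each share is divided by a different total. Assuming each chart shows shares per source tone, both can come from one table of counts: dividing by each Vietnamese tone's total gives the Vietnamese-to-Mandarin view, and dividing by each Mandarin tone's total gives the reverse. The function `normalise` below is a hypothetical illustration of that idea.

```python
from collections import Counter


def normalise(counts, by):
    """Convert {(viet_tone, mandarin_tone): count} into shares.

    by="viet" divides by each Vietnamese tone's total (Vietnamese -> Mandarin);
    by="mandarin" divides by each Mandarin tone's total (Mandarin -> Vietnamese).
    """
    if by not in ("viet", "mandarin"):
        raise ValueError("by must be 'viet' or 'mandarin'")
    idx = 0 if by == "viet" else 1
    totals = Counter()
    for key, n in counts.items():
        totals[key[idx]] += n
    return {key: n / totals[key[idx]] for key, n in counts.items()}
```

With made-up counts of 2 for each of `("sắc", "mà")`, `("nặng", "mà")` and `("sắc", "mǎ")`, the pair `("nặng", "mà")` gets a share of 1.0 per Vietnamese tone (every nặng character is mà) but only 0.5 per Mandarin tone (half of the mà characters are nặng).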

Conclusion

With the exception of the mapping from Mandarin to Vietnamese sắc, I think language learners can remember this rule of thumb:

Vietnamese không is Mandarin mā.
Vietnamese sắc and nặng are Mandarin mà.
Vietnamese huyền is Mandarin má.
Vietnamese hỏi and ngã are Mandarin mǎ.

Everything else can be treated as an exception.
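The rule of thumb fits in a six-entry lookup table, which makes it easy to measure how often it holds. The sketch below assumes the rule maps không to mā, sắc and nặng to mà, huyền to má, and hỏi and ngã to mǎ, as in the observations above; `RULE_OF_THUMB` and `rule_accuracy` are hypothetical names.

```python
# The rule of thumb as a lookup from Vietnamese tone to Mandarin tone
# (assumed from the Vietnamese -> Mandarin observations).
RULE_OF_THUMB = {
    "không": "mā",
    "sắc": "mà",
    "nặng": "mà",
    "huyền": "má",
    "hỏi": "mǎ",
    "ngã": "mǎ",
}


def rule_accuracy(pairs):
    """Share of (vietnamese_tone, mandarin_tone) pairs the rule predicts correctly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for viet, mandarin in pairs if RULE_OF_THUMB.get(viet) == mandarin)
    return hits / len(pairs)
```

On the made-up pairs `("không", "mā")`, `("sắc", "mǎ")`, `("huyền", "má")`, `("ngã", "mà")` the rule gets two of four right, so `rule_accuracy` returns 0.5. Run on pairs built from the Unihan readings, it would put a number on how common the exceptions are.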

I also ran the program on two other language pairs: Vietnamese-Cantonese and Cantonese-Mandarin. Below are the full results:

Tones mapping between Vietnamese and Cantonese
Tones mapping between Cantonese and Mandarin
Tones mapping between Vietnamese and Mandarin
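Adding Cantonese mostly means one more tone reader. Unihan's `kCantonese` field is written in Jyutping, where the tone is a trailing digit from 1 to 6 (as in `jat1`), so a regular expression is enough. The generic `tone_pairs` helper, a hypothetical name, then pairs the tones of any two fields; using only the first listed reading when a character has several is an assumption of this sketch, not necessarily what the notebook does.

```python
import re

# A Jyutping syllable: lowercase letters followed by a tone digit 1-6, e.g. "jat1".
JYUTPING = re.compile(r"([a-z]+)([1-6])")


def cantonese_tone(jyutping: str) -> int:
    """Tone number (1-6) of a single Jyutping syllable."""
    match = JYUTPING.fullmatch(jyutping)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    return int(match.group(2))


def tone_pairs(readings, field_a, tone_a, field_b, tone_b):
    """(tone in field_a, tone in field_b) for every character that has both fields.

    `readings` maps each character to {field: [readings]}; only the first
    listed reading of each field is used (a simplifying assumption).
    """
    return [
        (tone_a(fields[field_a][0]), tone_b(fields[field_b][0]))
        for fields in readings.values()
        if field_a in fields and field_b in fields
    ]
```

Passing a Vietnamese or Mandarin tone reader as `tone_b` yields the pairs behind the Vietnamese-Cantonese and Cantonese-Mandarin tables.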

You might have noticed that I’ve arranged the Vietnamese tones in an unusual order (the usual order is không, sắc, huyền, hỏi, ngã, nặng). This is to show the nice correlation with Cantonese. This correlation does suggest some closeness between the two languages (more accurately, between Cantonese words and Sino-Vietnamese words). Historically this seems to be the case as well, since both preserve more of Middle Chinese than Mandarin does.

As one might expect, the tone mapping between Cantonese and Mandarin looks quite similar to the one between Vietnamese and Mandarin.

Other areas that I may explore more in the future:

  • Mapping of initial consonants and vowels between the languages.
  • Mapping of tones between Vietnamese and other Southern Chinese dialects like Hokkien, Teochew, etc.

The JupyterLab notebook for this project can be found at: https://github.com/ryanphung/sino-tones-pairing
