Cantonese Linguistics and NLP
This is another development release towards v3.1.0. Compared to v3.1.0.dev2, it fixes more word segmentation issues in order to improve the part-of-speech tagger under development.
Installing this version from the GitHub source requires Git LFS to be installed on your system.
Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev3/
This is a development release to tag some unreleased features, particularly a part-of-speech tagger under development. (Installing this version from the GitHub source likely requires Git LFS on your system.)
Corresponding PyPI release: https://pypi.org/project/pycantonese/3.1.0.dev2/
New functions, equivalent to their deprecated x2y counterparts:
characters_to_jyutping
jyutping_to_tipa
jyutping_to_yale
jyutping_to_yale: The default value of the keyword argument as_list has been changed from False to True, so that this function is now more in line with the other "jyutping_to_X" functions for returning a list.
characters_to_jyutping: The returned value is now a list of segmented words, where each is a 2-tuple of (Cantonese characters, Jyutping). Previously, it was a list of Jyutping strings for the individual Cantonese characters.
The following x2y functions have been deprecated in favor of their counterparts named x_to_y:
characters2jyutping
jyutping2tipa
jyutping2yale
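The changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's own code. SAMPLE_WORDS is hand-written data in the new (Cantonese characters, Jyutping) shape, flatten_to_syllables is a hypothetical helper for code that still expects a flat list of Jyutping syllables, and x2y / x_to_y are placeholder names showing one common way a deprecated alias can forward to its replacement.

```python
import re
import warnings

# Hand-written sample in the new return shape: a list of
# (Cantonese characters, Jyutping) 2-tuples, one per segmented word.
# Illustrative data only, not captured library output.
SAMPLE_WORDS = [("香港", "hoeng1gong2"), ("人", "jan4")]

# Assumption: each syllable is lowercase letters followed by a tone digit 1-6.
_SYLLABLE = re.compile(r"[a-z]+[1-6]")


def flatten_to_syllables(words):
    """Hypothetical helper: turn [(characters, jyutping), ...] into a flat
    list of Jyutping syllables, similar to the older return shape."""
    syllables = []
    for _characters, jyutping in words:
        # Assumption: the Jyutping slot may be None when no romanization is known.
        if jyutping:
            syllables.extend(_SYLLABLE.findall(jyutping))
    return syllables


def x_to_y(value):
    """Placeholder for a function with the new-style name."""
    return value.upper()


def x2y(value):
    """Placeholder old-style alias: emit a warning, then forward to x_to_y."""
    warnings.warn(
        "x2y is deprecated; use x_to_y instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return x_to_y(value)
```

Splitting on tone digits works for well-formed Jyutping strings; anything that does not match the assumed syllable pattern is silently skipped, so stricter validation may be preferable in real code.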
Major update: Shift to the CHAT transcription format for HKCanCor and custom corpus datasets.