Under Community Review

Change to default Japanese segmentation rules regarding double quotation marks after full stops

Currently, with the default segmentation rule for Japanese, any quotation mark (") that follows a full stop (such as 。) is included at the end of the previous segment, rather than at the beginning of the next segment.

However, this behavior is often inappropriate, as in the following case:

Source

これは1つ目の例文です。"2つ目の例文"

Current default segmentation results

これは1つ目の例文です。"
2つ目の例文"

Unlike in European languages, segment boundaries in Japanese cannot be determined from spaces, because no spaces are placed between words. In addition, straight single and double quotes (' and ") cannot be identified as opening or closing quotation marks. It is therefore inappropriate to always place such a quote at the end of the segment when it follows a full stop (such as 。).

So, it would be better to change the default Japanese segmentation rule so that double quotation marks following a full stop are not included in the previous segment.
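To make the difference concrete, here is a minimal Python sketch that contrasts the current behavior with the requested one. It is only an illustration under stated assumptions: the `segment` function, the `attach_quotes` flag, and the character sets are hypothetical and do not represent the product's actual segmentation rules or settings.

```python
import re

# Assumed, simplified character sets for illustration only.
FULL_STOPS = "。！？"      # sentence-ending marks
STRAIGHT_QUOTES = "\"'"   # straight quotes: opening vs. closing is ambiguous


def segment(text: str, attach_quotes: bool) -> list[str]:
    """Split text into segments after each full stop.

    attach_quotes=True  mimics the current behavior: straight quotes that
                        follow a full stop stay with the previous segment.
    attach_quotes=False mimics the requested behavior: the segment ends
                        right after the full stop.
    """
    trailing = f"[{STRAIGHT_QUOTES}]*" if attach_quotes else ""
    pattern = f"[^{FULL_STOPS}]*[{FULL_STOPS}]{trailing}|[^{FULL_STOPS}]+"
    return re.findall(pattern, text)


source = 'これは1つ目の例文です。"2つ目の例文"'

print(segment(source, attach_quotes=True))
# ['これは1つ目の例文です。"', '2つ目の例文"']

print(segment(source, attach_quotes=False))
# ['これは1つ目の例文です。', '"2つ目の例文"']
```

Because a straight quote does not indicate whether it opens or closes a quotation, neither variant can be correct for every input. The sketch only shows that the requested variant handles the example above as intended.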

Comment
  • Incidentally, in our testing, adjusting the "Terminating punctuation (full stop, ...)" rule for Japanese in the Language Resources settings of the translation memory did not change this behavior.
    Please note that this issue stems from Japanese-specific linguistic characteristics rather than being a purely technical one.