Paragraph Segmentation for TM via API

Does anyone know how to implement paragraph segmentation (as opposed to sentence-based segmentation)?

I have tried accessing the SegmentationRules through the LanguageResourceBundle in the translation memory's LanguageResourceBundleCollection, and I've also looked through the SegmentationRules class in the API. However, it all seems to be focused on sentence-based segmentation. I also tried using System.Reflection to inspect the segmentation rules of a TM that is already set up for paragraph-based segmentation, but it just returned the standard/old sentence-based segmentation rules.

Thanks in advance.

  • Hi, paragraph segmentation is performed by the File Type that is associated with the native file. Sentence segmentation is performed according to the rules defined in the TM, as you noted above.

    Typically, if you don't run a pre-translation automated task during project creation, the bilingual files will not be (sentence) segmented. You can confirm this by opening the bilingual SDLXLIFF files in an editor other than the Studio editor (a plain text editor, for example). However, if you open a non-segmented bilingual SDLXLIFF file in the Studio editor, (sentence) segmentation will always occur. The Studio editor uses the rules defined in the project's TM, or the default rules for that language if no project TM is loaded.

    Can you give me an example of what you are trying to achieve? Are you simply trying to load content in the Studio editor without sentence segmentation applied to the paragraphs, or are you interested in creating a new File Type to parse a native file in a particular way?
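
For what it's worth, the check for whether a bilingual SDLXLIFF file has been sentence-segmented can also be scripted instead of done by eye. Below is a minimal sketch in Python, not part of the Studio API. It assumes the file follows the XLIFF 1.2 convention of putting segment markers (`<mrk mtype="seg">`) inside a `<seg-source>` element of each `<trans-unit>`. The `count_segments` helper and the `SAMPLE` document are hypothetical and hand-written for illustration; a real SDLXLIFF file carries a lot of additional SDL-specific markup.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace (assumed to be the default namespace of the file)
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def count_segments(root):
    """Map each trans-unit id to the number of segment markers it contains.

    A count of 0 means the paragraph unit has not been sentence-segmented.
    """
    counts = {}
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        seg_source = unit.find(f"{{{XLIFF_NS}}}seg-source")
        if seg_source is None:
            counts[unit.get("id")] = 0
            continue
        marks = [
            m for m in seg_source.iter(f"{{{XLIFF_NS}}}mrk")
            if m.get("mtype") == "seg"
        ]
        counts[unit.get("id")] = len(marks)
    return counts


# Hand-written, heavily trimmed sample: p1 is an unsegmented paragraph unit,
# p2 is the same paragraph after sentence segmentation.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
<file original="demo.txt" source-language="en-US" target-language="de-DE" datatype="plaintext">
<body>
<trans-unit id="p1"><source>First sentence. Second sentence.</source></trans-unit>
<trans-unit id="p2"><source>First sentence. Second sentence.</source>
<seg-source><mrk mtype="seg" mid="1">First sentence.</mrk> <mrk mtype="seg" mid="2">Second sentence.</mrk></seg-source>
</trans-unit>
</body>
</file>
</xliff>"""

if __name__ == "__main__":
    for unit_id, n in count_segments(ET.fromstring(SAMPLE)).items():
        state = "not segmented" if n == 0 else f"{n} segment(s)"
        print(unit_id, state)
```

To run it against a file on disk, something like `count_segments(ET.parse("myfile.sdlxliff").getroot())` should work, where the file name is a placeholder. If every paragraph unit comes back with 0, the file has not been sentence-segmented yet.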
