Segmentation Rule

Dear all,

I would like to ask the following questions.

  1. I created a TM with a full-stop segmentation rule and translated a file against it. Can I export the TM as TMX and create a new TM from the exported TMX, but with a paragraph-mark segmentation rule? If yes, how can I see the changes? I can't see any change, and the segments are still split at each full stop.
  2. Can we change the segmentation of a translated bilingual file from full stop to paragraph mark?

Best Regards,

Samar

  • Hi  

    A TMX file doesn't hold segmentation rules.

    If you want to use paragraph segmentation for all future files then creating a new TM with this option will work for all future files. But if you import your TMX into this new paragraph based TM the segments will still only be sentence based as these are already defined in the TMX. I'm not aware of any tools that can go from sentence to paragraph... only a few that go the other way around.  Part of the problem I guess is that a TM is not a true reflection of the original documents so making sure the paragraphs were really correct would be tricky if not impossible and technically the TMX puts all segments, whether sentence based or paragraph based into a single TU.  So there is nothing in the TMX to tell any tool whether the TUs were part of a larger entity or not.
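To illustrate the point about TMX, here is a minimal Python sketch. The sample TMX string is invented and simplified (it is not schema-complete), and `read_translation_units` is a hypothetical helper name, not part of any tool. It only shows that each `<tu>` is read as a self-contained unit, with nothing linking it to its neighbours or to a paragraph.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented, simplified sample: two sentences that may once have formed one paragraph.
TMX_SAMPLE = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>First sentence.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Erster Satz.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Second sentence.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Zweiter Satz.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            unit[tuv.get(XML_LANG)] = text
        units.append(unit)
    return units


units = read_translation_units(TMX_SAMPLE)
for unit in units:
    # Each unit stands alone; no field says which paragraph it came from.
    print(unit)
```

Importing such a file into a paragraph-based TM would, under this reading, still yield one entry per `<tu>`, which matches the behaviour described above.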

    What may be useful is that, if you do this, the fragment matching feature can pick out the TUs. So whilst you won't get proper pretranslation leverage, at least you would still be able to leverage the work interactively.

    Once you have converted your bilingual file that's it.  You can't change the segmentation at this point, you need the source file for that.  Perhaps a potential solution would be to align the source and target files with a TM set up for paragraph based segmentation instead of trying to change the bilingual files... although I wouldn't hold my breath!

    If there is a solution for this out there I'd also be interested to learn.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Thank you for your reply.

    I would like to ask you a question. When I create an SDL project with a new paragraph-based TM, the newly created file is not segmented by paragraph, as the screenshots below show.

    I expected each highlighted paragraph to be presented as a single segment in Studio, but this didn't happen, and the text is still segmented at each full stop.
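For readers unfamiliar with the two modes, the difference in expected output can be sketched in Python. This is a toy illustration only: the function names are hypothetical, and the full-stop regex is a stand-in assumption, not Studio's actual segmentation rules.

```python
import re


def segment_by_paragraph(text):
    """One segment per non-empty line (paragraph mark = line break here)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def segment_by_sentence(text):
    """Split each paragraph after a full stop followed by whitespace."""
    segments = []
    for paragraph in segment_by_paragraph(text):
        segments.extend(s for s in re.split(r"(?<=\.)\s+", paragraph) if s)
    return segments


sample = "One. Two.\nThree."
paragraph_segments = segment_by_paragraph(sample)
sentence_segments = segment_by_sentence(sample)
print(paragraph_segments)
print(sentence_segments)
```

With paragraph-based segmentation the first line stays together as one segment, while sentence-based segmentation splits it in two.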

     

     

    Best Regards,

    Samar

  • Unknown said:
    This is a bit of a tricky problem because only the source language will be segmented correctly.  There is no way to segment the target based on a target TM as you can only set a TM to act on the source.

    You can set segmentation rules for BOTH the source and target language in a TM.

    And if I remember correctly, the last time I experimented with paragraph-based segmentation in alignment (autumn 2017), it worked pretty well.

  • Hi,

    I'm waiting for your instructions... I really hope you're right as this would be very useful indeed. I have no idea how to do that for two reasons based on what I "thought I knew".

    - segmentation rules only apply to the source on a TM
    - if you align EN -> DE, for example, you select your own EN -> DE TM, which segments the source file, but the DE file is segmented using a TM created by Studio in the background, DE to something.

    So very happy to be educated by you Evzen.

    Paul Filkin | RWS Group

  • It could be that the source format played a big role in my case... it was MadCap Flare XML/HTML, so the segmentation is pretty much defined by the file type, rather than the TM-defined rules.

    Of course I cannot be expected to know how Studio works internally... all I know is that I:

    - created a new empty TM where I changed the segmentation to paragraph-based for both the source and target languages

    - used this TM for running the alignment

    That's all. I don't (and can't) know exactly which "magic" (or coincidence) made it align just as one would expect ;-). Perhaps the internal "reversed" TM is created by reversing the actual TM (similar to what AnyTM does)? That would make sense...

    I didn't explore it any deeper as we ended up not going further with paragraph-based segmentation and went the harder way of sentence-based segmentation.

  • Hi  

    I would never have thought of doing that, as I usually select an existing TM and it's too late at that point. But you are absolutely right... and I'm really happy to see this.

    Thank you for sharing this information.... something we should definitely document somewhere as I'm sure it will be useful to many users.  Or maybe I was the only one who didn't know this!!

    Thanks

    Paul

    Paul Filkin | RWS Group

  • I believe that not many users actually know this... because the TM creation/settings GUI hides from the user the fact that there are more languages than just the source one, and even discovering the dropdown content does not make it immediately clear what the consequences are.
    I'm not quite sure this is intentional... though one may presume that 'not making it too complex for the user' might have been the driver, but in that case I would expect some elementary settings (at least the segmentation type, for sure) to be "synchronized" between the source and target language automatically in the background.
  • Dear all,

    I followed the steps below, but the generated file is still sentence-based and the paragraph segmentation is not applied.

    1. Go to Align Documents in the Welcome menu.
    2. Select Align Single File Pair.
    3. Create a new file-based translation memory.
    4. Enter the TM name and select the source and target languages on the General page.
    5. Select paragraph-based segmentation under Segmentation Rule and click Finish.
    6. Select the source and target files and click Finish.

    Please let me know the correct steps to create an alignment project with paragraph segmentation for the source, the target and the TM.

    Best Regards,

    Samar

     

  • Hi,

    I spent some time testing this tonight and I have to say I think we were lucky with the paragraph segmentation rule. Try it with any other kind of segmentation rule and the effect is mind-blowing... at least I'm really struggling to see any logic in how this works. I certainly think my original assumption was correct in every other case I tested this evening. I don't think you can affect the target segmentation rules in the way described at all. I also looked at retrofit and this seems to do something else again.

    I can only conclude that whilst it seemed to make perfect sense, and I really wanted that to work, it does not. At least not in every case. I'm going to try and get to the bottom of this so I can understand what's going on.

    Paul Filkin | RWS Group

  • Hi  

    Unknown said:
    • Write the TM Name and Select the Source and Target Language in the General Page.
    • Select paragraph based segmentation in the Segmentation Rule and Click Finish.

    Can I just clarify that this means the following:

    - select the source language in the general page

    - select paragraph segmentation rules

    - select the target language in the general page

    - select paragraph segmentation rules

    It's not clear to me what you have done, but when this worked for me I selected each language one at a time and set paragraph segmentation for both of them. Then it worked.
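The check being described can be modelled with a small Python sketch. Everything here is hypothetical: the dictionary is only a mental model of per-language TM settings, not a Studio API, and the "sentence" default for a language that was never touched is an assumption based on the behaviour reported in this thread.

```python
def languages_missing_paragraph_rule(source_lang, target_lang, rules):
    """Return the languages whose segmentation is not set to 'paragraph'.

    `rules` maps a language code to its segmentation type. A language
    that was never configured is assumed to fall back to 'sentence'.
    """
    return [
        lang
        for lang in (source_lang, target_lang)
        if rules.get(lang, "sentence") != "paragraph"
    ]


# Only the source language was switched to paragraph segmentation.
source_only = languages_missing_paragraph_rule(
    "en-US", "de-DE", {"en-US": "paragraph"}
)

# Both languages were selected in turn and switched.
both_set = languages_missing_paragraph_rule(
    "en-US", "de-DE", {"en-US": "paragraph", "de-DE": "paragraph"}
)

print(source_only)
print(both_set)
```

In this model, setting the rule once on the General page would leave the target language on its default, which would explain a sentence-based result on one side of the alignment.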

    Paul Filkin | RWS Group

  • Unknown said:
    Try it with any other kind of segmentation rules and the effect is mindblowing... at least I'm really struggling to see any logic in how this works.

    Hmmmm... just out of curiosity, you are testing it with the latest version, I suppose... the one with god-knows-how broken segmentation rules...
    Would you mind running the same tests with the "last (kind of) sensibly behaving version", i.e. 2017 CU5 (last pre-SR1) and 2015 SR3 (vanilla, w/o CUs)?
    I feel that it may behave differently...

  • Hi Evzen,

    I have actually found a few more interesting things... albeit embarrassing!

    1. I was aligning a completely different target file (same name, different location) and didn't notice
    2. Studio 2017, current version, actually handles the custom rules exactly as you said and works as expected
    3. Studio 2015 completely ignores the use of custom rules so actually 2017 SR1 CU9 works correctly for me. It's an improvement.
    4. Retrofit alignment works differently and won't apply any custom rules

    Paul Filkin | RWS Group
