Extract hyperlink content from .docx in Studio 2021

Hi there,

I am processing .docx files exported from a Confluence environment, and they contain various hyperlinks. Some are extracted correctly, with both the address and the display text translatable, but others are not, and I could use some help customizing the file type settings to make it work.

This is what I see for hyperlinks that are extracted correctly:

Screenshot of Trados Studio showing correctly extracted hyperlinks with full addresses and display text highlighted in purple.

And this is what I see for the hyperlinks that are not extracted correctly:

Screenshot of Trados Studio showing a hyperlink not extracted correctly, only displaying 'V-Vendor' without a full address.

The difference is that this hyperlink only references "V-Vendor" and doesn't contain a full address. When we import the file back into the Confluence Wiki, that works; we just need to be able to change "V-Vendor" to match the first letter of the translation of 'vendor', which obviously won't start with a 'V' in every language.

I have looked at the Embedded Content settings of the docx file type, and this is what I have come up with so far. The manual doesn't really help in getting the right entries in here, so I was wondering if anyone has an idea based on the above. Ideally, only those 'peculiar' links would be processed and externalized; the normal ones work well, and it would be a shame to get all hyperlink tags as plain text instead of tags.

Screenshot of Trados Studio file type settings for docx Embedded Content with fields for Tag Type, Regular Expression, Start Tag, and End Tag.
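Before typing a pattern into the Embedded Content settings, it can help to test it outside Studio. The sketch below is a minimal, hypothetical example that assumes the problem link targets always look like "V-Vendor" (one capital letter, a hyphen, then the term) while the working links carry full addresses. The pattern and the names `IN_PAGE_ANCHOR` and `is_in_page_anchor` are illustrative only. Studio uses the .NET regex engine, so details may differ from Python's `re`, although this simple pattern only uses syntax common to both. It is also not confirmed that Embedded Content rules ever see these link targets, since they may not be treated as links at all.

```python
import re

# Hypothetical pattern for same-page glossary anchors such as "V-Vendor":
# one uppercase ASCII letter, a hyphen, then a term without spaces,
# slashes, colons or '#'. Full addresses ("https://...") do not match.
# This is an assumption based on the single example in this thread.
IN_PAGE_ANCHOR = re.compile(r"[A-Z]-[^\s/:#]+")


def is_in_page_anchor(target: str) -> bool:
    """Return True if the whole link target looks like a same-page anchor."""
    return IN_PAGE_ANCHOR.fullmatch(target) is not None


if __name__ == "__main__":
    samples = [
        "V-Vendor",
        "https://wiki.example.com/display/GLOSS/Vendor",  # hypothetical full address
        "variance",
    ]
    for sample in samples:
        print(f"{sample!r}: {is_in_page_anchor(sample)}")
```

Using a full match (rather than a search) keeps the rule from firing on longer addresses that merely contain something like "V-Vendor". In a Studio regex field, the equivalent would probably need explicit `^` and `$` anchors.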

Thanks so much!

Janneke



[edited by: Trados AI at 5:37 AM (GMT 0) on 29 Feb 2024]
  • I've never seen a Word file containing links like these before. Can you share a small sample so we can investigate it? I see your problem and imagine it's because the "Vendor" links are not handled as links at all, so we may not be able to do anything about this in the software right now. But we can at least discuss it with the file type development team.

    Paul Filkin | RWS Group

  • Thanks for your reply, Paul. I've copied a small sample below, because I don't think I can attach a .docx here. If there is a better way to share it, let me know.

    It comes from a Word export of a Confluence Wiki glossary, and I can imagine the markup is not very common. We had problems processing the glossary as XML, also because of link issues, so we decided to try the Word export option to see if it would work better. I just found out that the issue only occurs with terms that start with the letter of the page they are on (in this case the 'V' page of the glossary), so I think the impact is limited. It must have to do with the Wiki writers using a different link function when they link to the page they are already on.

    variable nucleotide tandem repeat (VNTR) 

    Markers similar to a short tandem repeat (STR).

    variance 

    A nonsignificant difference in result values from the perspective of quality assurance (QA) peer review.

    VAT exempt 

    Goods and services that are exempt from VAT, meaning that you cannot reclaim any VAT on your business purchases or expenses.
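One way to check whether the same-page links really are stored differently is to look inside the .docx package itself. This is a rough diagnostic sketch, not a Trados feature. A .docx file is a ZIP archive, and in WordprocessingML a `w:hyperlink` normally points either to an external target through an `r:id` relationship or to a bookmark in the same document through `w:anchor`; some exporters write `HYPERLINK` field codes instead. The script lists which form each link uses. The function names are hypothetical, and which form the Confluence export actually produces is exactly what is unknown here.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespace URIs (ECMA-376).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def _text(elem):
    """Concatenate the text of all w:t elements inside an element."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))


def parse_hyperlinks(document_xml, rels_xml=None):
    """Classify hyperlinks in word/document.xml as (kind, target, display_text)."""
    rels = {}
    if rels_xml is not None:
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG}}}Relationship"):
            rels[rel.get("Id")] = rel.get("Target")

    doc = ET.fromstring(document_xml)
    found = []
    for link in doc.iter(f"{{{W}}}hyperlink"):
        rid = link.get(f"{{{R}}}id")
        anchor = link.get(f"{{{W}}}anchor")
        if rid is not None:
            # External target, resolved through the relationships part.
            found.append(("external", rels.get(rid, "?"), _text(link)))
        elif anchor is not None:
            # Link to a bookmark inside the same document.
            found.append(("internal-anchor", anchor, _text(link)))
        else:
            found.append(("unknown", "", _text(link)))

    # Field-code links, e.g. HYPERLINK \l "V-Vendor" (\l = bookmark in this document).
    # A field code split over several w:instrText runs is only partially shown here.
    for instr in doc.iter(f"{{{W}}}instrText"):
        code = (instr.text or "").strip()
        if code.startswith("HYPERLINK"):
            found.append(("field", code, ""))
    for fld in doc.iter(f"{{{W}}}fldSimple"):
        code = (fld.get(f"{{{W}}}instr") or "").strip()
        if code.startswith("HYPERLINK"):
            found.append(("field", code, _text(fld)))
    return found


def list_hyperlinks(path):
    """Open a .docx (a ZIP package) and classify the hyperlinks in its main part."""
    with zipfile.ZipFile(path) as package:
        document_xml = package.read("word/document.xml")
        rels_name = "word/_rels/document.xml.rels"
        rels_xml = package.read(rels_name) if rels_name in package.namelist() else None
    return parse_hyperlinks(document_xml, rels_xml)


if __name__ == "__main__" and len(sys.argv) > 1:
    for kind, target, text in list_hyperlinks(sys.argv[1]):
        print(f"{kind:16} {target!r:50} {text!r}")
```

Run it as, for example, `python list_links.py glossary.docx`. If the "V-" links show up as internal anchors or field codes while the working links are external relationships, that would be consistent with Paul's guess that Studio does not handle them as regular links, and it would be useful detail to pass on to the file type team.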
