How do I set up the Document structure for an .md file?

I actually posted on another thread, but I figured maybe I'd get more help if I started my own post. I'm trying to set up a new file type for an .md file. I'm a veteran Trados user, but new to file types and regex, so forgive my extremely basic question, but how do I configure the Document structure so that Trados knows what is translatable text? I've read Paul's post about the inline tags and I think I might be able to figure those out (I'm sure I'll be back if I can't), but I can't even get Trados to display any text at all if I attempt to process the file. 

  • Thank you so much! If it's possible, it would be the easiest solution here. I think that's really the biggest thing that needs to be hidden from these files. Everything else is just a matter of tags and placeholders that follow pretty straightforward regex rules.

    Btw, do you think SDL is eventually going to incorporate this file type into its default selections? I spoke to a programmer involved in this project yesterday and he indicated that *.md files are becoming more popular in web development and that we are likely to see more of these in the future.
  • Hi Paul,

    Just wanted to follow up and see if you managed to find out whether it's possible to set up the document structure so that the ``` code tags and everything between them are omitted from my *.md documents. I'd like to know either way. If there's no way around it, I may have to go back to the programmers who designed the original files and see whether we can make them easier to feed into Trados. Thanks again for all your help so far!

    Beatriz
  • Hi Beatriz,

    Sorry for the late reply. I have not had a response from development on this yet, which tells me it's not so easy! It may be that the best approach is to have a developer create a custom file type specifically for your needs. If you expect to have a lot of these files, I think that's worth investigating, and it may not be too hard using the API.

    Alternatively, your idea to see if the files themselves can be simplified as you are generating them is probably a very good one!

    Either way, if I get a response I will share it in here.

    Regards

    Paul

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub

  • Paul,

    Thanks so much for following up. I've spent the last few days playing around with different regex for the document structure, with help from some colleagues who are better versed in regex than I am, but this seems to go beyond our scope. Because the text between the code breaks does not always follow the same pattern, and in fact sometimes follows a pattern that looks just like translatable text, it's extremely challenging to write regex rules that will gather them all. I've been able to process the document in a way that will filter out the first instance of the ``` code breaks, but I can't figure out how to get Trados to detect them as a repeating pattern. And I also noticed that when I use the "Multiline" feature in order to get it to pick up line breaks, it messes with the entire document's segmentation, something we don't want. I do appreciate your follow up and indeed, if you do get a response (whether positive or negative), please let me know.

    Regarding your information about using a developer to create a custom file type... do you mean a developer from SDL? If so, how do we make such a request? I do suspect we will see more of these file types in the future, as Markdown documents are becoming more popular in web development, according to some conversations I've had with a few different programmers.

    Thanks,
    Beatriz
  • Unknown said:
    Regarding your information about using a developer to create a custom file type... do you mean a developer from SDL? If so, how do we make such a request? I do suspect we will see more of these file types in the future, as Markdown documents are becoming more popular in web development, according to some conversations I've had with a few different programmers.

    Hi Beatriz,

    I meant your developer, not one of ours. I could also help you find a developer to work on this if you like, one who has experience with creating file types. Alternatively, you could use SDL via our Professional Services, as they could also help you with this. So three options, I guess.

    I think Markdown documents in themselves are ambiguous... in fact I believe there are at least nine different "flavours" already and probably a host of user-defined variants. So creating a "standard" would be tricky. I think your guys seem to be following the original syntax rules, but even here handling the code blocks with hard returns on every line is tricky, and this is where a developer could do a better job.

    Regards

    Paul

  • Paul,

    Got it. Makes sense, thanks! I will go back to our developers with this information and see if they could help, then.

    We had come up with one single regex that actually captures the text we want and does not capture what we don't want: (?s)((.+?)(?:```.*?```)+?)+

    Unfortunately, it doesn't work in Trados, because the Document structure needs separate opening and closing patterns, and Trados then marks everything between those patterns as translatable. I managed to write a pair that starts at the beginning of the document, skips the code blocks, and marks the text between them as translatable, which is perfect... except that it also leaves out the remaining translatable text in the document. All in all, I keep thinking there is a way to write several different rules that might work, but it's beyond my grasp. If we end up stumbling into it, I'll post it here for future reference.
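
    For checking outside Trados which text ought to come out as translatable, a minimal Python sketch (not a Trados setting) can split a file on its fenced code blocks. The name split_markdown and the sample text are made up for illustration, and the sketch assumes every code block is opened and closed by three backticks.

```python
import re

# Three backticks, built at runtime so this snippet can itself sit in a fenced block.
FENCE = "`" * 3

# A fenced code block: fence, anything (including newlines), next fence.
CODE_BLOCK = re.compile(FENCE + r".*?" + FENCE, re.DOTALL)


def split_markdown(text):
    """Return (translatable_chunks, code_blocks) for a Markdown string."""
    code_blocks = CODE_BLOCK.findall(text)
    # re.split keeps the text before, between, and after the code blocks,
    # so text that follows the last code block is not lost.
    chunks = [c.strip() for c in CODE_BLOCK.split(text) if c.strip()]
    return chunks, code_blocks


sample = (
    "Intro text.\n"
    + FENCE + "\nprint('hi')\n" + FENCE + "\n"
    + "Middle text.\n"
    + FENCE + "\nx = 1\n" + FENCE + "\n"
    + "Closing text.\n"
)

chunks, blocks = split_markdown(sample)
print(chunks)       # ['Intro text.', 'Middle text.', 'Closing text.']
print(len(blocks))  # 2
```

    Comparing this list with what Trados extracts from the same file is one way to see exactly which pieces a given pair of rules is dropping.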

    But I suspect I may have to follow your advice and talk to the developers instead to find a way around it.

    I do appreciate all your help! This has been quite the learning experience. Have a great day!

    Beatriz
  • Hi again, Paul!

    So, I actually found a partial solution to this issue. In the Document structure, I wrote the following regex:

    Opening pattern: ^
    Closing pattern: (?s)(```.*?```)

    This works to filter out the codebreaks, but only as long as the document ENDS in a code break. If this is not the case, then it stops after the last codebreak and does not start again where it left off. Does this make sense? I think I might be able to make this rule work if I can alter the closing pattern regex to take this into account OR if it's possible to write another rule that will account for this. So far, I have not had any luck, but I will continue to investigate.
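
    The behaviour described above can be reproduced with a toy Python model of an opening/closing pattern pair. This is only an assumption about how such a pair might be applied, not a description of Trados's actual parser, and the function name extract and the sample text are hypothetical. The model suggests one thing to try: letting the closing pattern also match the end of the file. \Z is Python's end-of-input anchor; in .NET regex syntax the absolute end-of-input anchor is \z. Whether Trados accepts such a pattern is untested here.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime

OPENING = r"^"
# The closing pattern from the post: (?s) plus a fenced block in a group.
CLOSING = "(?s)(" + FENCE + ".*?" + FENCE + ")"
# Variant to try: a fenced block OR the end of the input.
CLOSING_OR_END = CLOSING + r"|\Z"


def extract(text, opening, closing):
    """Toy model: text between an opening match and the next closing match is kept."""
    open_re = re.compile(opening, re.MULTILINE)
    close_re = re.compile(closing, re.MULTILINE)
    pos, chunks = 0, []
    while pos < len(text):
        start = open_re.search(text, pos)
        if start is None:
            break
        end = close_re.search(text, start.end())
        if end is None:
            break  # no closing match: the remaining text is dropped
        chunk = text[start.end():end.start()].strip()
        if chunk:
            chunks.append(chunk)
        if end.end() <= pos:
            break  # guard against patterns that make no progress
        pos = end.end()
    return chunks


SAMPLE = (
    "Intro.\n" + FENCE + "\ncode A\n" + FENCE + "\n"
    "Middle.\n" + FENCE + "\ncode B\n" + FENCE + "\n"
    "Tail.\n"
)

print(extract(SAMPLE, OPENING, CLOSING))         # ['Intro.', 'Middle.']
print(extract(SAMPLE, OPENING, CLOSING_OR_END))  # ['Intro.', 'Middle.', 'Tail.']
```

    In this model the text after the last code block is lost only because nothing closes the final region, which matches the symptom of the rule working only when the document ends in a code block.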

    If this is successful, there is another issue that comes up. It's relatively minor, but I was wondering if you knew a way around it. Because I have to set these rules as Multiline in order to account for line breaks, it seems to mess up my usual segmentation rules of cutting the segments by line breaks. I can have the line breaks show up as tags, which is one solution, but I'd much rather segment them instead. Is there any way to do this that you know of?

    I attempted to add an Inline tag rule that catches line breaks (\n) and then set the Advanced rules to mark them as "Is Word Stop", but the pre-translation failed, which means the rule didn't work. I also tried \r\n (for CR LF, which is how the breaks appear in Notepad++) with the same result. Do you know if this is possible to do, or if there is simply no hope of segmenting by line break if I use Multiline?
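
    As a point of comparison outside Trados, a single pattern can cover both kinds of line ending: \r?\n matches CR LF as well as a bare LF. The short Python sketch below only illustrates that pattern on a plain string. The function name segment_by_line is made up, and nothing here shows whether a Trados inline tag rule will segment on it.

```python
import re

# Optional carriage return followed by a line feed: matches both CR LF and LF.
LINE_BREAK = re.compile(r"\r?\n")


def segment_by_line(chunk):
    """Split a block of text into one segment per non-empty line."""
    return [line for line in LINE_BREAK.split(chunk) if line.strip()]


mixed = "First line.\r\nSecond line.\nThird line.\r\n"
print(segment_by_line(mixed))  # ['First line.', 'Second line.', 'Third line.']
```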

    Thanks again.
    Beatriz