How to exclude placeholders enclosed in single or double curly brackets in HTML files from translation?

I'm trying to define text enclosed in single or double curly brackets in an HTML file as placeholders. For example:

<p>Do not translate this {{ variable }} and don't translate that { variable } either.</p>


The following expression works fine in Notepad++, but it doesn't work in SDL Studio:

\{+[^}]+\}+


1. Why doesn't the expression work?

2. What regex flavor(s) does SDL Studio support?
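
For reference, this minimal sketch checks what the pattern itself matches on the sample line. It uses Python's `re` module, not Studio. The pattern only relies on a negated character class and the `+` quantifier, which Notepad++ and other mainstream regex engines interpret the same way, so the difference is more likely caused by where Studio applies the rule within the HTML than by the regex syntax (the replies below point in that direction).

```python
import re

# The pattern from the question: one or more "{", then a run of
# characters other than "}", then one or more "}".
PLACEHOLDER = re.compile(r"\{+[^}]+\}+")

sample = ("<p>Do not translate this {{ variable }} and "
          "don't translate that { variable } either.</p>")

matches = PLACEHOLDER.findall(sample)
print(matches)  # ['{{ variable }}', '{ variable }']
```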

  • The file is apparently a Shopify Liquid template file. Here's an example taken from the official website:

    https://shopify.github.io/liquid-code-examples/example/call-to-action

    {%- if cart.item_count > 0 -%}
    
    <form action="/cart" method="post">
    
      {%- for item in cart.items -%}
        <a href="{{ item.url | within: collections.all }}">
          <img src="{{ item | img_url: '200x200' }}" alt="{{ item.image.alt | escape }}">
          {{ item.product.title }}
        </a>
    
        {%- unless item.variant.title contains 'Default' -%}
          <p>{{ item.variant.title }}</p>
        {%- endunless -%}
    
        {%- assign property_size = item.properties | size -%}
        {%- if property_size > 0 -%}
          <ul>
    
            {%- for p in item.properties -%}
              {%- assign first_character_in_key = p.first | truncate: 1, '' -%}
              {%- unless p.last == blank or first_character_in_key == '_' -%}
                <li>
                  {{ p.first }}:
    
                  {%- if p.last contains '/uploads/' -%}
                    <a href="{{ p.last }}">{{ p.last | split: '/' | last }}</a>
                  {%- else -%}
                    {{ p.last }}
                  {%- endif -%}
    
                </li>
              {%- endunless -%}
            {%- endfor -%}
    
          </ul>
        {%- endif -%}
    
        <p>
          <a aria-label="Remove {{ item.variant.title }}" href="/cart/change?line={{ forloop.index }}&amp;quantity=0">Remove</a>
        </p>
      {%- endfor -%}
    
      <input type="submit" name="checkout" value="Checkout">
    </form>
    
    {%- else -%}
      <p>The cart is empty. <a href="/collections/all">Continue shopping</a></p>
    {%- endif -%}
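
    Note that the question's pattern `\{+[^}]+\}+` would also catch Liquid tags such as `{%- if ... -%}`, since they start with `{` and end with `}`. If output expressions `{{ ... }}` and tags `{% ... %}` need to be told apart, a more explicit pattern helps. The sketch below is Python for illustration only, not a Studio setting; the function name `liquid_tokens` is hypothetical.

```python
import re

# Liquid has output delimiters "{{ ... }}" and tag delimiters "{% ... %}",
# both optionally with "-" whitespace control (e.g. "{%- ... -%}").
# The non-greedy ".*?" stops at the first closing delimiter on the line.
LIQUID = re.compile(r"\{\{-?(?P<output>.*?)-?\}\}|\{%-?(?P<tag>.*?)-?%\}")


def liquid_tokens(template: str) -> list[tuple[str, str]]:
    """Return (kind, content) pairs for every Liquid delimiter in the text."""
    tokens = []
    for m in LIQUID.finditer(template):
        # Only the group of the alternative that matched is not None.
        kind = "output" if m.group("output") is not None else "tag"
        tokens.append((kind, m.group(kind).strip()))
    return tokens


snippet = '''{%- if cart.item_count > 0 -%}
<a href="{{ item.url | within: collections.all }}">{{ item.product.title }}</a>
{%- endif -%}'''

for kind, content in liquid_tokens(snippet):
    print(kind, content)
```

    Limitations of this sketch: a `}}` or `%}` inside a quoted string ends the match early, and delimiters spanning several lines would need `re.DOTALL`.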


  • The same approach is required, but you need to know which element is the parent of the script. You don't show this in your example, and this could be problematic. For example, assume I have this:

    <!DOCTYPE html>
    <html>
    <body>
    
    <h1>Testing {variables} in html</h1>
    
    <div>
      <h2>{{ section.settings.text-box }}</h2>
    
      <a href="{{ section.settings.link }}">
        {{ section.settings.linktext }}
      </a>
    </div>
    
    {% schema %}
    {
      "name": "Call to action",
      "settings": [
        {
          "id": "text-box",
          "type": "text",
          "label": "Heading",
          "default": "Title"
        },
        {
          "id": "link",
          "type": "url",
          "label": "Link URL"
        },
        {
          "id": "linktext",
          "type": "text",
          "label": "Link text",
          "default": "Click here"
        }
      ]
      ,
      "presets": [
        {
          "name": "Call to Action",
          "category": "Promotional"
        }
      ]
    }
    {% endschema %}
    
    <p>Do not translate this {{ variable }} and don't translate that { variable } either.</p>
    
    <time>Testing more  {{variables}} inside {variable} html.</time>
    
    </body>
    </html>
    
    
    
    

    The Shopify JSON (I know you don't have this...) is under the body element in my made-up example. So I can create an embedded parser to handle it and do this:

    [Screenshot: Trados Studio parser rules settings showing 'body' as Parser Rule Name and Condition Path with 'Shopify_JSON' as Embedded Processor ID.]

    This will get me this:

    [Screenshot: Trados Studio preview window displaying extracted text for translation from Shopify HTML with fields like 'Call to action', 'text-box', 'link', and 'linktext'.]

    However, since this is at the body level, nothing else is parsed... and this is clearly not helpful. I can't see how to target only this script in this location without being able to make the parser rule condition path more specific. Maybe it's just my lack of knowledge, so I will investigate this more... but so far it's a problem.

    If your actual file has the variables at the body level of the HTML, then you'll have the same problem. But what you could do is this:

    1. Translate the parts you need in the script.

    2. Save the target file.

    3. Remove the rule and translate the target file in the normal way, with just the HTML file.

    So two passes, but it would work.

    Paul Filkin | RWS Group

    ________________________
    Design your own training!

    You've done the courses and still need to go a little further, or still not clear? 
    Tell us what you need in our Community Solutions Hub



    Generated Image Alt-Text
    [edited by: Trados AI at 4:26 AM (GMT 0) on 5 Mar 2024]
  • Thanks for your reply, but the method that you suggested is rather cumbersome.
    Is there really no way to tell SDL Studio to treat all text enclosed in {} or {{}} in HTML files as placeholders no matter where the text occurs in the HTML file?

  • As I just explained, there is. But since you still haven't provided a file that clearly shows where that script sits, I don't actually know whether it's going to work. It cannot possibly be outside all elements... so which element is it in?

    I created the example shown above, placing it in the body element (because you have not told me where it should go), and there it seems to override all other rules. But if you come back and tell me that it's actually not in there, then maybe there is some way of tackling this.

    But to answer this:

    Is there really no way to tell SDL Studio to treat all text enclosed in {} or {{}} in HTML files as placeholders no matter where the text occurs in the HTML file?

    Of course there is. You can create your own regex-based filetype that just treats the entire file as text. Then you can do whatever you like.
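
    Conceptually, a text-based approach means you decide which runs are translatable and which are protected. The Python sketch below is not how Studio implements its filetypes; it only illustrates the idea by splitting a line into text runs and `{...}` / `{{...}}` placeholders. The name `segment_line` is hypothetical.

```python
import re

# The capturing group makes re.split() keep the placeholders in its output.
PLACEHOLDER = re.compile(r"(\{\{.*?\}\}|\{[^{}]*\})")


def segment_line(line: str) -> list[tuple[str, str]]:
    """Split a line into ("text", ...) and ("placeholder", ...) pieces."""
    pieces = []
    for part in PLACEHOLDER.split(line):
        if not part:
            continue  # re.split yields "" around adjacent or edge matches
        kind = "placeholder" if PLACEHOLDER.fullmatch(part) else "text"
        pieces.append((kind, part))
    return pieces


print(segment_line("Testing more  {{variables}} inside {variable} html."))
```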

    But if you want to use html as the basis of this then you need to follow some rules and we can only validate those rules if you actually give us a real file that is complete instead of these partial snippets that force us to keep guessing.

    but the method that you suggested is rather cumbersome.

    Perhaps... but maybe still preferable to the alternatives.

    You could also hire a developer and create your own filetype specifically for handling Shopify files.

    Paul Filkin | RWS Group

  • Thanks for looking into this!

    It looks like I'll have to create a regex-based filetype, which is a bummer, because the file is more than 70% HTML.

    FYI, the code in curly braces occurred before and after the opening <body> tag as well as after the closing </body> tag. In other words, it would be impossible to define tag-based rules.

    That's why I was asking whether there's a way to globally mark these tags as placeholders.

  • OK.

    It is difficult because the HTML parser expects files in which text always comes inside the elements.

    The embedded content parser looks at what's inside the elements and treats it as text, which is what allows you to handle your variables.

    If you now inject script outside these elements, then you can only define an embedded processor to handle the content of the highest parent element... in your case the <html> element, by the sound of it. So in effect you are now saying: treat the entire file as text, and you'll define the rules for everything.

    This will be quite tricky, but if you're happy to handle everything as placeables, it's easy enough:

    [Screenshot: HTML code with variables and conditional statements in Trados Studio.]

    [Screenshot: Trados Studio inline tags configuration window showing rules for converting email addresses into inline tags.]

    Gets you this:

    [Screenshot: Preview of translated text in Trados Studio with variables and conditional text highlighted.]

    It's not too bad, because structural elements will be moved out of the segments and you won't make a mistake with them. Inline tags, which I don't have in this example, will be trickier: they won't be handled as opening and closing pairs, so the translator will have to be very diligent to make sure they are placed in the right order.
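
    The ordering risk with unpaired placeables can be checked mechanically. A minimal Python sketch, assuming a hypothetical QA step run outside Studio: it treats each HTML tag and Liquid delimiter as a standalone protected item (mirroring placeables that are not paired) and flags a target whose items are missing or out of order. The names `protected_items` and `same_order` are illustrative only.

```python
import re

# Protected items: HTML tags first (so Liquid inside an attribute stays part
# of its tag), then Liquid output, Liquid tags and single-brace variables.
# Each match is standalone: "<a ...>" and "</a>" are not linked as a pair.
PROTECTED = re.compile(r"<[^>]+>|\{\{.*?\}\}|\{%.*?%\}|\{[^{}]*\}")


def protected_items(segment: str) -> list[str]:
    """Return the protected items of a segment in reading order."""
    return PROTECTED.findall(segment)


def same_order(source: str, target: str) -> bool:
    """True if the target has exactly the source's items, in the same order."""
    return protected_items(source) == protected_items(target)


src = 'Click <a href="{{ link }}">here</a> to {{ action }}.'
good = 'Cliquez <a href="{{ link }}">ici</a> pour {{ action }}.'
bad = 'Cliquez </a>ici<a href="{{ link }}"> pour {{ action }}.'

print(same_order(src, good))  # True
print(same_order(src, bad))   # False
```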

    Frankly, I think that if this type of file is something we're going to see more and more of, it would be worth developing a filetype to handle this sort of embedded content differently. Perhaps you should raise this here and see if anyone else thinks it's a good idea:

    http://ideas.sdl.com

    Paul Filkin | RWS Group

  • Thanks for all your help! I really appreciate it.
    If we get these files more often, I'll definitely raise this at http://ideas.sdl.com.

  • Although this is only indirectly related to the question, it would be a good "idea" to extend file tagging to filetypes other than TXT. I will submit the request at the link above.

  • When you do, can you elaborate a little on why you want this and for which filetypes?

    Paul Filkin | RWS Group

  • I often need to translate software manuals that contain many "tag" names (usually written in CamelCase or some variant). It would be desirable to convert these to tags and so prevent them from being translated and, more importantly, exclude them from the Spell Checker (I have had cases of hundreds of false positives). Principally MS Word and Excel.
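
    Whichever filetype setting is used, the rule needs a pattern that recognizes such names. A minimal Python sketch, assuming that a "CamelCase name" means a word containing a lowercase letter or digit immediately followed by an uppercase letter. The pattern uses only basic syntax that should carry over to other regex flavors, but test it in the tool you actually use. It deliberately skips all-caps acronyms such as `HTML`, and it also misses names like `HTMLParser`.

```python
import re

# Assumed definition: a word with a lowercase letter or digit immediately
# followed by an uppercase letter (camelCase, PascalCase, iPhone, ...).
CAMEL = re.compile(r"\b[A-Za-z0-9]*[a-z0-9][A-Z][A-Za-z0-9]*\b")

sample = "Set the MaxRetryCount and timeoutMs values before calling Init."

# "Set" and "Init" have only an initial capital, so they are not matched.
print(CAMEL.findall(sample))  # ['MaxRetryCount', 'timeoutMs']
```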

  • Principally MS Word and Excel.

    Excel and Word already support this within the filetype.  Why can't you use it in the way it is currently implemented?  Just asking because your idea needs to be clear on this point.

    Paul Filkin | RWS Group