Help with a multiple text replacement script

Even though this forum so far has been for sharing scripts and not for writing them, I was wondering if I could pick the brains of AHK experts ( and  come to mind right now), to figure something out.

I'm trying to put together a script to do multiple text replacements at a segment level, i.e., for the active segment only, so I can easily see what has been changed, without calling up the Find & Replace window.

I've managed to put together this script (silly examples included):

#r::
ClipSaved := ClipboardAll
Clipboard =
SendInput, ^a^c
ClipWait, 30
FixString := Clipboard
vList := " ;continuation section
(
dog perro

house casa
¿ ¿
/ /
, ,
? ?
. .
pie 2 pie2
m 2 m2
)"
Loop, Parse, vList, `n
{
oTemp := StrSplit(A_LoopField, "`t")
FixString := StrReplace(FixString, oTemp.1, oTemp.2)
}
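oTemp := ""
Clipboard := FixString ; load the new string to clipboard
Sleep 200
Send ^v
Sleep 200 ; give the paste time to finish before the clipboard changes again
Clipboard := ClipSaved ; restore the clipboard contents saved at the start
ClipSaved := "" ; free the memory held by the saved clipboard
Return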

This works fine in segments with no tags, but when there are tags, they get stripped at some point during the replacement operation and the text that is pasted back into the segment has all the necessary replacements but no tags. Is there any way of preserving the tags in the clipboard?

I came up with a very clumsy workaround for this, which involves using Studio's Delete to Next Tag shortcut, so instead of Select All-Copy, the script would do Delete to Next Tag-Undo-Copy:

#r::
ClipSaved := ClipboardAll
Clipboard =
;SendInput, ^a^c
;ClipWait, 30
Send ^+D ;delete to next tag
Sleep 100
Send ^z ;undo
Sleep 50
Send ^c
ClipWait, 30
FixString := Clipboard
vList := " ;continuation section
(
organisation organization
¿ ¿
/ /
, ,
? ?
. .
pie 2 pie2
m 2 m2
)"
Loop, Parse, vList, `n
{
oTemp := StrSplit(A_LoopField, "`t")
FixString := StrReplace(FixString, oTemp.1, oTemp.2)
}
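oTemp := ""
Clipboard := FixString ; load the new string to clipboard
Sleep 200
Send ^v
Sleep 200 ; give the paste time to finish before the clipboard changes again
Clipboard := ClipSaved ; restore the clipboard contents saved at the start
ClipSaved := "" ; free the memory held by the saved clipboard
Return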

While this also works in segments with no tags, I would like to optimize it.

My second question is: how would I go about creating a list of all these replacements (CSV? Excel?) and getting the script to take them from there instead of having to add them manually to the script? I've been reading up on arrays but I'm still far away from being able to implement what I need.

I have another simpler script attempt with just multiple StringReplace lines (see below), but again, that would require creating possibly hundreds of replacement lines and I imagine it's not the best solution.

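#p::
Clipboard = ; empty the clipboard so ClipWait can detect the copy
Send, ^a
Send, ^c
ClipWait, 2 ; wait until the copied text has arrived
StringReplace, clipboard, clipboard, dog, perro, All
StringReplace, clipboard, clipboard, cat, gato, All
StringReplace, clipboard, clipboard, raining, lloviendo, All
Send ^v
Return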

So, any help with this would be greatly appreciated.

Thank you!

  • Hi Nora,

    I'll start with some suggestions for your second question, about how to better implement the search and replacement items:

    Separating the search and replacement parts with a tab is smart, but you don't need to put the pairs into your script. It is much easier to maintain a simple text file where you add them, and then read that text file into a variable like this:

    FileRead, vList, vListFile.txt

    In order to run your search and replace, just use a simple loop:

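    Clipboard := ""            ; empty the clipboard so ClipWait can detect the copy
    Send, ^a
    Send, ^c
    ClipWait, 2
    FixString := Clipboard     ; the text the loop below works on

    Loop, Parse, vList, `n, `r ; split on line feeds, dropping carriage returns
    {
        oTemp := StrSplit(A_LoopField, "`t")
        FixString := StrReplace(FixString, oTemp.1, oTemp.2)
    }

    SendInput, %FixString%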

    As for the tag question, I need to look a bit deeper into this, but I fear it won't be easy without (again) the help of Studio APIs.

    Kind regards,
    Raphaël

  • Hi Raphaël,

    So, a couple of questions:

    - Where should I store the text file, i.e., how will AHK know where to look for it? And what happens if I move it to a different location later on?

    - Should the format of the contents in the file be:
    Old string TAB New string
    or should I use a different separator and not a tab?

    Thanks!
  • 1) If you give only the file's name in the script rather than its complete path, the file needs to be in the same folder as the script. With a complete path, it can be stored wherever you want:

    • FileRead, vList, vListFile.txt → the file "vListFile.txt" must be in the same folder as the script
    • FileRead, vList, C:\Users\ndia\Documents\vListFile.txt → the file is stored in the folder "My Documents"

    If you move the file later on, then you have to update the file location accordingly in your script.

    2) That is up to you too, since you define the separator yourself in the script:

    • StrSplit(A_LoopField, A_Tab) → a tab is used, but you could replace that with a pipe character or whatever you like, both here and in the file
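
    To illustrate both points, here is a minimal sketch in AHK v1 syntax, as used elsewhere in this thread. It assumes the file is still called "vListFile.txt" and that FixString already holds the copied segment text. The names vListFile and vSep and the pipe separator are only examples, not something the script requires:

    ```autohotkey
    ; Sketch only: vListFile and vSep are illustrative names.
    vListFile := A_ScriptDir . "\vListFile.txt"  ; build the full path from the script's own folder
    vSep := "|"                                  ; must match the separator used inside the file

    FileRead, vList, %vListFile%
    if ErrorLevel                                ; FileRead sets ErrorLevel when the file cannot be read
    {
        MsgBox, Could not read %vListFile%
        return
    }

    Loop, Parse, vList, `n, `r
    {
        if (A_LoopField = "")                    ; skip empty lines in the list
            continue
        oTemp := StrSplit(A_LoopField, vSep)
        FixString := StrReplace(FixString, oTemp.1, oTemp.2)
    }
    ```

    Building the path from A_ScriptDir means the list travels with the script: if both are moved together, nothing in the script needs to change.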

     

    Edit: typos

  • Thank you Raphaël, a couple more questions:

    - Should SendInput, %FixString% paste the modified contents of the clipboard back to the segment or do I need to add a Ctrl+V there at the end?

    It seems that SendInput, %FixString% by itself won't paste the contents back, but if I add a Ctrl+V, both the modified contents and the original segment are pasted in. I'm not sure what I need to add, or where, to make it work reliably.

    - Should the text file have a special encoding to allow Spanish characters such as ñ and á? They're not being passed through correctly.

    - If I want to include characters such as question marks, commas, periods, etc. in the text file, do they need to be escaped somehow or can they just be added literally?
  • Hi Nora,

    About the special characters ñ and á, I had a similar problem when I processed strings in a Dragon command.

    The problem occurred because the string was being converted from Unicode to ASCII.

    I was using the Windows registry to pass strings between Dragon commands. Unfortunately strings in the Windows registry are ASCII only, so when another command retrieved the string the special characters had been converted to regular characters without the accents.

    I had to write a more complicated interface that converted each Unicode character in the string into three permissible "ASCII characters" (excluding ASCII zero, which is used as the terminating byte for the registry string) and then reversed the conversion when the string was retrieved.

    Maybe a similar Unicode to ASCII conversion is happening somewhere in your situation.

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • I guess that's possible. I also had something similar in KnowBrainer, but my workaround was to write all my commands involving accented and special characters directly in Dragon (no string manipulation, though, just simple commands like copy-paste).
  • Hi Nora,

    I just noticed that you are using a text file. (Sorry, I am not following all the details of what you are doing...)

    Are you saving strings to a text file and then retrieving them?

    If so, maybe check the encoding of the text file.

    Check by opening the file, and then opening the "Save As" dialog (File->Save As).

    At the bottom of the dialog box you will see a drop-down list for "Encoding:"

    I think "ANSI" is the default. Try changing it to "Unicode" and see whether this helps.

    Best regards,
    Bruce Campbell
    ASAP Language Services

  • Hi Bruce,

    That was it! I had created the file in Notepad++ without making any changes to the encoding. I've now saved it as Unicode (was UTF-8 originally) and it's working fine. Thank you!
  • Good morning, Nora!

    Here we go:

    1. When you use SendInput, %FixString%, AHK sends the content of the variable directly to the currently active control in the active window, so there is no need to first copy the content of the variable to the clipboard and then the clipboard content to Studio by sending [Ctrl]+[V].
      It is even possible to send the content of a variable directly to a control that is in a non-active window via the command ControlSend, but in the case of Studio, this wouldn't work too well because of the changing control names.
      In principle, SendInput, %FixString% should paste the corrected string back into the target segment, overwriting the existing target since it would still be selected by the initial Send, ^a command. If you add a Send, ^v after that, it is normal that the original unmodified target also gets pasted, since the clipboard still contains only the unmodified target copied by the command Send, ^c.
    2. As already pointed out by Jesus, if you need to support special characters like accents, diacritic marks or other alphabets like Cyrillic or Greek, both the AHK script and any external file need to be in Unicode (UTF-8, ideally saved with a byte order mark so that AHK recognises the encoding instead of falling back to ANSI), otherwise you risk encountering corrupted characters. Advanced text editors like Notepad++ usually use UTF-8 without a byte order mark as the default file encoding, but the standard Windows Notepad uses ANSI.
    3. Since we are not talking about a CSV file, the only character that would need to be escaped or that might cause trouble would be the tab character itself since it is used as a delimiter. All other characters should be handled correctly when added literally.
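
    Points 1 and 2 could be combined roughly as follows. This is only a sketch in AHK v1 syntax, under a few assumptions: the list is the "vListFile.txt" from earlier in the thread stored next to the script, the hotkey is the #r from the original script, and the 2-second timeout is arbitrary. The *P65001 option asks FileRead to decode the file as UTF-8 even when it has no byte order mark, and {Raw} makes SendInput type characters such as !, +, ^, # and braces literally instead of treating them as modifier keys:

    ```autohotkey
    #r::
        FileRead, vList, *P65001 %A_ScriptDir%\vListFile.txt  ; decode the list as UTF-8
        if ErrorLevel
            return                      ; the list could not be read, so do nothing
        Clipboard := ""                 ; empty the clipboard so ClipWait can detect the copy
        Send, ^a^c
        ClipWait, 2
        if ErrorLevel
            return                      ; nothing arrived on the clipboard in time
        FixString := Clipboard
        Loop, Parse, vList, `n, `r
        {
            oTemp := StrSplit(A_LoopField, A_Tab)
            if (oTemp.1 != "")          ; ignore empty lines
                FixString := StrReplace(FixString, oTemp.1, oTemp.2)
        }
        SendInput, {Raw}%FixString%     ; type the result over the still-selected segment
    return
    ```

    Because the result is typed rather than pasted, no Send, ^v is needed at the end.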

    Don't hesitate to get back to me if anything is still unclear ;-)

     

    Have a great day!

    Raphaël
