Segmentation rule | exceptions | full stop rule

Question

Hi all, 
 Does anyone know whether it is possible to stop segments being split at a full stop that is not the end of a sentence? For example in people's names, like 'F.G. de Groot', but also after a common abbreviation when the following word starts with an uppercase letter. I've noticed that it doesn't split the segment when the full stop is followed by a word starting with a lowercase letter. I've also noticed that Trados Studio 2022 sometimes breaks up hyperlinks after the :\, which then have to be merged again manually. 
 I hope you can see what I'm getting at and have some solutions I haven't tried yet. I've tried \p{Lu} before the break as an exception to the full stop rule (as found in another post on this forum), which seems to work for people's names (thank god), but that appears to be only part of the problem. I'm not exactly an expert on what every bit of a regular expression means, so I'm not sure what I need to add or delete to get exactly what I'm after. 
 Thanks in advance, 
 Charley

Paul Filkin · Answer

Charley van der Salm 
 I created a small test file to have a play with:

 segment.md 
 
 ### 🔹 **Names and Initials (should \*not\* segment):**

1. The manuscript was signed by F.G. de Groot.
2. Please refer to the comments from A.J. Smith and C.P. Haanstra.
3. The case was ruled on by M.L. King Jr. in 1964.

------

### 🔹 **Abbreviations followed by Uppercase (should \*not\* segment):**

1. This was confirmed in the meeting with Prof. Andrew Marks.
2. The project will begin in Jan. 2025, as planned.
3. The goods were delivered by DHL Exp. Services.

------

### 🔹 **Abbreviations followed by lowercase (correctly handled):**

1. The error occurred at approx. 4pm yesterday.
2. This was agreed upon by e.g. several key stakeholders.

------

### 🔹 **Standard sentence endings (should segment):**

1. The client approved the text. We may proceed with publication.
2. I contacted the team. They responded within the hour.

------

### 🔹 **Problematic hyperlink/URL splitting (should \*not\* segment):**

1. Please visit www.example.com/.../start.html for more details.
2. This is hosted at http:\server.domain.local\shared\folder\file.txt
3. The tool can be found at downloads.example.org/.../index.zip 

------

### 🔹 **Other edge cases (optional, for thoroughness):**

1. “Etc.” is not a reason to stop being precise. This should be clear.
2. The company is based in the U.S. It operates globally.
3. Refer to para. 3 in the contract. This outlines your obligations.

------

I then opened it against a default TM in Trados Studio 2022. 
 
 Observations: 
 
 Names and initials: these segment on A.J., C.P. and M.L. when I don't want them to, so I add these as abbreviations. 
 Abbreviations followed by uppercase: this segments on Exp., so I repeat the exercise for that. 
 Abbreviations followed by lowercase are all good. 
 Standard sentence endings are all good. 
 Problematic hyperlink/URL splitting examples are all good. 
 Other edge cases: it shouldn't have segmented on "para.", so I add that as well. 
 
 So no segmentation rules are needed for any of these. Everything is either handled correctly by default or fixed by adding to the abbreviations list. 
 If you have done this already, then I think your problem is either that your source document contains more than just plain text, or that you have competing rules that conflict.
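
 For anyone who wants to see the logic rather than the settings, here is a rough Python sketch of how an abbreviation list can override a simple full stop rule. It is only an illustration, not how Trados Studio implements segmentation internally: the `segment` function, the `BREAK` pattern and the `ABBREVIATIONS` set are all made up for this example, and "Prof." is included on the assumption that a default list would already cover it. Python's built-in `re` module does not support `\p{Lu}`, so `[A-Z]` stands in for "uppercase letter" here.

```python
import re

# Hypothetical abbreviation list: the entries added in the test above,
# plus "Prof." as an assumed default entry.
ABBREVIATIONS = {"A.J.", "C.P.", "M.L.", "Exp.", "para.", "Prof."}

# Candidate break: a full stop, then whitespace, then an uppercase letter.
# The lookahead keeps the uppercase letter out of the match itself.
BREAK = re.compile(r"\.\s+(?=[A-Z])")


def segment(text, abbreviations=ABBREVIATIONS):
    """Split text at candidate breaks unless the token before the
    break is a listed abbreviation."""
    segments = []
    start = 0
    for match in BREAK.finditer(text):
        # Text of the current segment up to and including the full stop.
        candidate = text[start:match.start() + 1]
        # Last whitespace-separated token, e.g. "A.J." or "text."
        token = candidate.split()[-1]
        if token in abbreviations:
            continue  # listed abbreviation: do not break here
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    samples = [
        "Please refer to the comments from A.J. Smith and C.P. Haanstra.",
        "The goods were delivered by DHL Exp. Services.",
        "The client approved the text. We may proceed with publication.",
        "The company is based in the U.S. It operates globally.",
    ]
    for sample in samples:
        print(segment(sample))
```

 With this list the first two samples stay as one segment each and the last two split in two. Removing "Exp." from the set makes the second sample split after "Exp.", which is the same effect as the entry missing from the abbreviations list.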
