I'm trying to create a regex-based HTML parser for a template language that contains HTML fragments.
(I can't use the built-in HTML file types for this.)
The settings are:
When I tested this with the following HTML file:
<!DOCTYPE html>
<html>
<head>
<title>test</title>
</head>
<body>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis interdum est ut bibendum
rutrum. Vestibulum luctus, nibh ac viverra molestie, tortor neque fringilla justo, ut venenatis
arcu sapien quis lectus.
</p>
</body>
</html>
I noticed that the line breaks inside the <p> element were preserved even though Remove Line Breaks was selected.
Which setting do I need to change so that Studio splits the <p> contents into only three segments, one per sentence?
(I'm trying to duplicate the segmentation algorithm of the built-in HTML file types.)
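To make the expected result concrete, here is a minimal Python sketch of the behavior I'm after. It is not Studio's actual segmentation algorithm: the function name `expected_segments` and the naive sentence-splitting rule are assumptions used only for illustration. It collapses the line breaks in the <p> contents and then splits on sentence-ending punctuation.

```python
import re


def expected_segments(paragraph_text: str) -> list[str]:
    """Hypothetical helper: approximate the segmentation I expect for one <p>."""
    # Collapse every run of whitespace, including line breaks, into one space.
    normalized = re.sub(r"\s+", " ", paragraph_text).strip()
    # Naive sentence split (an assumption, not Studio's rules):
    # break after '.', '!' or '?' when whitespace follows.
    return re.split(r"(?<=[.!?])\s+", normalized)


# Contents of the <p> element from the test file, line breaks included.
sample = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis interdum est ut bibendum
rutrum. Vestibulum luctus, nibh ac viverra molestie, tortor neque fringilla justo, ut venenatis
arcu sapien quis lectus.
"""

segments = expected_segments(sample)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

With the sample above this yields three segments, each a full sentence with no embedded line break, which is what I would like Studio to produce.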
[edited by: Trados AI at 4:39 AM (GMT 0) on 5 Mar 2024]