Seeking guidance on preserving formatting

Hello everyone! I hope you are all doing well.

I am developing a web app that translates an input file and generates the target output file in .sdlxliff format.

The problem I am facing is that text is extracted from the input file as plain text, so the formatting is not preserved. I want the extracted text to keep the formatting of the input file, and I want the same formatting applied in the final .sdlxliff output file. I am sharing my text extraction and .sdlxliff generation logic here; it is Python code.

Could someone guide me toward the right text extraction and .sdlxliff creation logic?

My current logic for text extraction is




Everything apart from text formatting preservation is perfect.
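
To make the goal concrete, here is a minimal sketch of the kind of output I am after. It is not my current code. It assumes the extractor can return text as a list of runs with bold/italic flags; the `Run` class and `build_source` function are hypothetical names made up for illustration. Formatted runs are wrapped in XLIFF 1.2 `<g>` inline elements, and any SDLXLIFF-specific metadata around those tags is not covered here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical container: one extracted span of text plus its formatting."""
    text: str
    bold: bool = False
    italic: bool = False


def build_source(runs):
    """Build an XLIFF 1.2 <source> element, wrapping formatted runs in <g> tags."""
    source = ET.Element("source")
    last = None   # last element appended directly under <source>
    next_id = 1   # <g> requires an id attribute; ids are assigned sequentially
    for run in runs:
        ctypes = [name for name, on in (("bold", run.bold), ("italic", run.italic)) if on]
        if not ctypes:
            # Plain text goes into <source>.text or the tail of the previous <g>.
            if last is None:
                source.text = (source.text or "") + run.text
            else:
                last.tail = (last.tail or "") + run.text
            continue
        # Nest one <g> per active format, outermost first.
        parent = source
        for ctype in ctypes:
            g = ET.SubElement(parent, "g", {"id": str(next_id), "ctype": ctype})
            next_id += 1
            if parent is source:
                last = g
            parent = g
        parent.text = run.text
    return source


runs = [
    Run("Hello "),
    Run("bold", bold=True),
    Run(" and "),
    Run("both", bold=True, italic=True),
    Run("."),
]
xml_text = ET.tostring(build_source(runs), encoding="unicode")
print(xml_text)
```

The idea is that the translated target text would carry the same `<g>` wrappers, so the formatting can be reapplied when the output file is written.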


Moved to code block.
[edited by: Paul at 5:10 PM (GMT 0) on 14 Mar 2025]