Markdown Segmentation Not Segmenting Sentences using the Full Stop Rule

Hi, 
I'm working on some Markdown files in Trados Studio. In my language resources, I used the default Full Stop segmentation rule. But when I created a project from one of the MD files, I noticed the following:

Screenshot of Trados Studio showing text from a markdown file with incorrect segmentation, where multiple sentences are included in a single segment.

First, as you can see in the image above, both segments contain multiple sentences, even though I used the default Full Stop segmentation rule.

Screenshot of Trados Studio displaying a markdown file where ordinals 1, 2, and 3 are segmented into individual segments, separate from the following actions.

Second, as you can see from the image above, the ordinals 1, 2, and 3 are segmented into individual segments. I wanted them to be kept in the same segment as the actions that follow. How can I do that? Please advise. Thank you!
By the way, I presume you may want the content of the file, so here it is:

# Contributing

So you're looking to contribute to Dify - that's awesome, we can't wait to see what you do. As a startup with limited headcount and funding, we have grand ambitions to design the most intuitive workflow for building and managing LLM applications. Any help from the community counts, truly.

We need to be nimble and ship fast given where we are, but we also want to make sure that contributors like you get as smooth an experience at contributing as possible. We've assembled this contribution guide for that purpose, aiming at getting you familiarized with the codebase & how we work with contributors, so you could quickly jump to the fun part.

This guide, like Dify itself, is a constant work in progress. We highly appreciate your understanding if at times it lags behind the actual project, and welcome any feedback for us to improve.

In terms of licensing, please take a minute to read our short [License and Contributor Agreement](github.com/.../LICENSE). The community also adheres to the [code of conduct](github.com/.../CODE_OF_CONDUCT.md).

### Before you jump in

[Find](github.com/.../issues) an existing issue, or [open](github.com/.../choose) a new one. We categorize issues into 2 types:

#### Feature requests:

* If you're opening a new feature request, we'd like you to explain what the proposed feature achieves, and include as much context as possible. [@perzeusss](https://github.com/perzeuss) has made a solid [Feature Request Copilot](udify.app/.../MK2kVSnw1gakVwMX) that helps you draft out your needs. Feel free to give it a try.
*   If you want to pick one up from the existing issues, simply drop a comment below it saying so.

    A team member working in the related direction will be looped in. If all looks good, they will give the go-ahead for you to start coding. We ask that you hold off working on the feature until then, so none of your work goes to waste should we propose changes.

    Depending on whichever area the proposed feature falls under, you might talk to different team members. Here's rundown of the areas each our team members are working on at the moment:

    | Member                                                                                  | Scope                                                |
    | --------------------------------------------------------------------------------------- | ---------------------------------------------------- |
    | [@yeuoly](https://github.com/Yeuoly)                                                    | Architecting Agents                                  |
    | [@jyong](https://github.com/JohnJyong)                                                  | RAG pipeline design                                  |
    | [@GarfieldDai](github.com/GarfieldDai)                                          | Building workflow orchestrations                     |
    | [@iamjoel](https://github.com/iamjoel) & [@zxhlyh](https://github.com/zxhlyh)           | Making our frontend a breeze to use                  |
    | [@guchenhe](https://github.com/guchenhe) & [@crazywoola](https://github.com/crazywoola) | Developer experience, points of contact for anything |
    | [@takatost](https://github.com/takatost)                                                | Overall product direction and architecture           |

    How we prioritize:

    | Feature Type                                                 | Priority        |
    | ------------------------------------------------------------ | --------------- |
    | High-Priority Features as being labeled by a team member     | High Priority   |
    | Popular feature requests from our [community feedback board](github.com/.../ideas) | Medium Priority |
    | Non-core features and minor enhancements                     | Low Priority    |
    | Valuable but not immediate                                   | Future-Feature  |

#### Anything else (e.g. bug report, performance optimization, typo correction):

*   Start coding right away.

    How we prioritize:

    | Issue Type                                                                          | Priority        |
    | ----------------------------------------------------------------------------------- | --------------- |
    | Bugs in core functions (cannot login, applications not working, security loopholes) | Critical        |
    | Non-critical bugs, performance boosts                                               | Medium Priority |
    | Minor fixes (typos, confusing but working UI)                                       | Low Priority    |

### Installing

Here are the steps to set up Dify for development:

#### 1. Fork this repository

#### 2. Clone the repo

Clone the forked repository from your terminal:

```
git clone git@github.com:<github_username>/dify.git
```

#### 3. Verify dependencies

Dify requires the following dependencies to build, make sure they're installed on your system:

* [Docker](https://www.docker.com/)
* [Docker Compose](docs.docker.com/.../)
* [Node.js v18.x (LTS)](http://nodejs.org)
* [npm](https://www.npmjs.com/) version 8.x.x or [Yarn](https://yarnpkg.com/)
* [Python](https://www.python.org/) version 3.10.x

#### 4. Installations

Dify is composed of a backend and a frontend. Navigate to the backend directory by `cd api/`, then follow the  [Backend README](github.com/.../README.md) to install it. In a separate terminal, navigate to the frontend directory by `cd web/`, then follow the [Frontend README](github.com/.../README.md) to install.

Check the [installation FAQ](docs.dify.ai/.../install-faq) for a list of common issues and steps to troubleshoot.

#### 5. Visit dify in your browser

To validate your set up, head over to [http://localhost:3000](http://localhost:3000) (the default, or your self-configured URL and port) in your browser. You should now see Dify up and running.

### Developing

If you are adding a model provider, [this guide](github.com/.../README.md) is for you.

If you are adding tools used in Agent Assistants and Workflows, [this guide](github.com/.../README.md) is for you.

> **Note** : If you want to contribute to a new tool, please make sure you've left your contact information on the tool's 'YAML' file, and submitted a corresponding docs PR in the [Dify-docs](github.com/.../tool-configuration) repository.

To help you quickly navigate where your contribution fits, a brief, annotated outline of Dify's backend & frontend is as follows:

#### Backend

Dify’s backend is written in Python using [Flask](flask.palletsprojects.com/.../). It uses [SQLAlchemy](https://www.sqlalchemy.org/) for ORM and [Celery](docs.celeryq.dev/.../introduction.html) for task queueing. Authorization logic goes via Flask-login.

```
[api/]
├── constants             // Constant settings used throughout code base.
├── controllers           // API route definitions and request handling logic.           
├── core                  // Core application orchestration, model integrations, and tools.
├── docker                // Docker & containerization related configurations.
├── events                // Event handling and processing
├── extensions            // Extensions with 3rd party frameworks/platforms.
├── fields                // field definitions for serialization/marshalling.
├── libs                  // Reusable libraries and helpers.
├── migrations            // Scripts for database migration.
├── models                // Database models & schema definitions.
├── services              // Specifies business logic.
├── storage               // Private key storage.      
├── tasks                 // Handling of async tasks and background jobs.
└── tests
```

#### Frontend

The website is bootstrapped on [Next.js](https://nextjs.org/) boilerplate in Typescript and uses [Tailwind CSS](https://tailwindcss.com/) for styling. [React-i18next](https://react.i18next.com/) is used for internationalization.

```
[web/]
├── app                   // layouts, pages, and components
│   ├── (commonLayout)    // common layout used throughout the app
│   ├── (shareLayout)     // layouts specifically shared across token-specific sessions 
│   ├── activate          // activate page
│   ├── components        // shared by pages and layouts
│   ├── install           // install page
│   ├── signin            // signin page
│   └── styles            // globally shared styles
├── assets                // Static assets
├── bin                   // scripts ran at build step
├── config                // adjustable settings and options 
├── context               // shared contexts used by different portions of the app
├── dictionaries          // Language-specific translate files 
├── docker                // container configurations
├── hooks                 // Reusable hooks
├── i18n                  // Internationalization configuration
├── models                // describes data models & shapes of API responses
├── public                // meta assets like favicon
├── service               // specifies shapes of API actions
├── test                  
├── types                 // descriptions of function params and return values
└── utils                 // Shared utility functions
```

### Submitting your PR

At last, time to open a pull request (PR) to our repo. For major features, we first merge them into the `deploy/dev` branch for testing, before they go into the `main` branch. If you run into issues like merge conflicts or don't know how to open a pull request, check out [GitHub's pull request tutorial](docs.github.com/.../collaborating-with-pull-requests).

And that's it! Once your PR is merged, you will be featured as a contributor in our [README](github.com/.../README.md).

### Getting Help

If you ever get stuck or got a burning question while contributing, simply shoot your queries our way via the related GitHub issue, or hop onto our [Discord](discord.com/.../8Tpq4AcN9c) for a quick chat.



Generated Image Alt-Text
[edited by: RWS Community AI at 2:10 AM (GMT 0) on 3 Mar 2025]

    I think the numbers are handled correctly, and in most cases I think users would prefer this behaviour, as the numbers are handled automatically and you may get better leverage from the text that is separated out. But if you want to change that, you'd need to add an exception to the full stop rule in your segmentation rules, which should be trivial.
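To illustrate what such an exception does, here is a minimal sketch in Python. It is not Studio's rule engine and the patterns are not the ones shipped with the default Full Stop rule. The names `BREAK`, `ORDINAL_ONLY`, and `segment` are hypothetical, and the break pattern is a deliberately simplified assumption (sentence-final punctuation, whitespace, then an uppercase letter or digit). The idea is only that a candidate break is skipped when everything before it is a bare list ordinal such as `1.`.

```python
import re

# Simplified stand-in for a full stop rule (an assumption, not Studio's pattern):
# break at whitespace that follows . ! or ? and precedes an uppercase letter or digit.
BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")

# Exception: the text before the candidate break is nothing but an ordinal like "1."
ORDINAL_ONLY = re.compile(r"^\s*\d+\.$")


def segment(text, keep_ordinals=True):
    """Split text at candidate breaks, optionally keeping ordinals with their sentence."""
    segments = []
    start = 0
    for match in BREAK.finditer(text):
        candidate = text[start:match.start()]
        if keep_ordinals and ORDINAL_ONLY.match(candidate):
            # Exception applies: do not break, the ordinal stays attached.
            continue
        segments.append(candidate)
        start = match.end()
    segments.append(text[start:])
    return segments


sample = "1. Fork this repository. 2. Clone the repo."
print(segment(sample, keep_ordinals=False))  # ordinals split off on their own
print(segment(sample, keep_ordinals=True))   # ordinals kept with the action
```

In Studio itself the equivalent would be configured as a regular-expression exception on the Full Stop rule in the language resources, so treat the patterns above as a starting point to adapt and test rather than something to paste in as-is.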

    On the first part, relating to this in your sample file:

    So you're looking to contribute to Dify - that's awesome, we can't wait to see what you do. As a startup with limited headcount and funding, we have grand ambitions to design the most intuitive workflow for building and managing LLM applications. Any help from the community counts, truly.

    The segmentation here is rather odd and I cannot see why it's behaving this way.  I actually get this:

    #1: So you're looking to contribute to Dify - that's awesome, we can't wait to see what you do. As a startup with limited headcount and funding, we have grand ambitions to design the most intuitive workflow for building and managing LLM applications.

    #2: Any help from the community counts, truly.

    I played around with a few things but cannot get this to behave correctly.  I don't think this is a segmentation that can be controlled by rules because I think it looks like a bug.  However, it would not be the first time I was wrong so I have logged a case with the technical support team to validate - CS0002555.  I'll update the post when I get the feedback.

    Paul Filkin | RWS Group
