What is TrainAI?
- Offers data collection, annotation (labelling) and validation services for all types of AI training data, in any language, at any scale, based on the principles of responsible AI.
- Provides text, audio, image, video, multilingual and synthetic data to train a broad range of AI applications, from search and virtual assistants to facial recognition, voice control, content moderation systems and more.
- Delivers responsible AI training data that clients can depend on. TrainAI data is ethically sourced, accurate, fair and inclusive, based on a privacy- and security-first approach, and built on human-in-the-loop methodologies.
- Ensures data meets required quality standards, eliminating the need to redo the data and retrain the AI at additional cost.
Who is TrainAI for?
TrainAI is a service built for organizations of any size that aim to improve their user experience, efficiency or general capabilities by deploying artificial intelligence, specifically machine learning.
Teams of AI researchers, ML engineers, product owners or data scientists will sooner or later run into a shortage of high-quality AI training data for specific domains or languages. Partnering with RWS through TrainAI services is their best way to bridge this gap, whether they need data collected, annotated or validated.
What does a typical request for TrainAI look like?
In the last several years, the AI data services space has exploded in size and application potential, which has led to new and exciting areas for data annotation and collection.
Our clients often look to acquire datasets in underrepresented languages to build AI models that better support the next billion users of technology. Requests can also involve today's most widely used languages, in which case the difficulty often lies in collecting a very specific type of data (e.g. commands for a voice assistant, speech recorded with various emotions, bilingual requests, etc.).
For data annotation, clients often request transcription of audio or images, annotation of entities in natural-language text, or data classification projects that can involve any content type.
However, many of the partnerships we have established are with R&D groups who are working on cutting-edge research, which may involve experimental approaches to machine learning – and especially in these cases, every single request tends to be unique.
What is the TrainAI Community?
The TrainAI Community includes the following resources:
Description | Total
Raters, Annotators and Linguists (In-House and Freelance) | 40,958
AOP Connect Researchers | 43,000
Partner Network Resources | 20,000+
Total Size of the TrainAI Community | 103,958+
How are TrainAI Community members sourced?
TrainAI Community members are recruited using the following tools/strategies on a project-by-project basis:
- Job postings on:
  - Local job websites
  - LinkedIn Recruiter
- Social media posts (with a visual and hashtags) on:
  - Official RWS social media channels
  - RWS employee personal profiles
  - Relevant LinkedIn and Facebook groups
- Articles published in our vendor newsletter, ‘Stay Tuned’
- RWS Campus
- Referrals
The TrainAI VRM team is currently setting up an ongoing recruiting campaign to grow the TrainAI Community.
How are TrainAI Community members vetted?
TrainAI Community members are evaluated by completing:
- A background profile covering their skills, interests and hobbies, language proficiency, the types of services they can provide based on their educational background and industry experience, desired pay rate, etc.
- Language knowledge tests to establish their language level(s)
- General machine learning tests which evaluate attention to detail, critical thinking and performance on typical data annotation tasks
- Additional tests based on project-specific requirements
A member’s test results then determine the AI data projects/tasks they will be invited to work on.
Additional verification may be completed based on client requirements.
How are TrainAI Community members trained?
TrainAI engagements often vary from client to client, and therefore require project-specific training and testing.
TrainAI Community members receive project-specific training facilitated by the TrainAI production team and must pass several training rounds before they can start work on a project.
On moderated data collection or data annotation projects, a project manager also works with each participant to ensure compliance with the scenario, instructions, training and other project requirements.
How much are TrainAI Community members paid?
TrainAI Community members are compensated fairly, with:
- Pay rates based on applicable in-country minimum wage, inflation rate, member skills and desired pay
- Annual reviews which consider economic factors
- Transparent payment terms that are clearly outlined during registration
- Regular and timely payments
How do you set pay rates for the TrainAI Community?
Pay rates for members of our TrainAI Community are set based on:
- In-country minimum wage
- In-country average salary
- In-country cost of living
- In-country inflation rate
- Member skills
- Member’s desired pay
- Annual reviews that consider economic factors
How is the TrainAI Community different from other providers?
Unlike crowdsourcing, i.e. using anyone for the job and hoping for the best, TrainAI selects only the vetted, qualified and skilled resources best suited to the specific needs of each AI data project.
RWS’s TrainAI Community consists of 100,000+ participants who are:
- Vetted, skilled and qualified raters, annotators and linguists
- Proficient in 400+ language variants covering 175+ countries
- Supported by RWS’s global footprint
What type of contract do members of the TrainAI Community have in place?
TrainAI Community members can work with us as freelancers or as private individuals. As such, they are not bound by an employment contract and can work whenever they like. Their relationship with RWS is, however, governed by our T&Cs and a non-disclosure agreement (NDA).
What data protection is in place?
TrainAI is built on RWS’s privacy- and security-first approach:
Privacy:
- Benchmarks for data protection: EU General Data Protection Regulation (GDPR) and UK Data Protection Act 2018
- Focused on protection of Confidential Business Information (CBI) and Personally Identifiable Information (PII)
- Transparency in data collection / management
- Compliance with country-specific policies on the use of biometric data
Security:
- NIST Cybersecurity Framework
- Information Security Management System (ISMS)
- Multi-factor authentication (MFA)
- ‘Defence in depth’ security controls
- Risk management
- Physical security
- Incident management
How does TrainAI define and approach responsible AI?
Definitions of responsible AI vary slightly, but in summary, responsible AI is:
- Ethical, accountable and transparent
- Fair and unbiased
- Accurate and reliable
- Private and secure
Responsible AI models need responsible AI data – this is where RWS TrainAI comes in.
TrainAI data is prepared in accordance with the principles of responsible AI as shown in the following table.
Responsible AI is: | TrainAI data is:
Ethical, accountable and transparent | Ethically sourced
Fair and unbiased | Accurate, inclusive and targeted
Accurate and reliable | Built on human-in-the-loop methodologies
Private and secure | Based on a privacy- and security-first approach
How does TrainAI ensure responsible AI, remove bias and provide trusted data?
TrainAI deploys the processes below to ensure the AI data we provide to clients is responsible:
- SmartSourcing:
  - Ethical recruitment and fair compensation
  - Pairing the right people with the right skills to the right projects
- Custom program setup tailored to specific project needs to ensure we prepare the targeted data required to train the client’s AI
- Bias detection and removal at every stage of the project, including review of:
  - Team composition
  - Exploitability
  - Demographics
  - Language variants
  - Task design
  - Incentive system
  - Representation
  - Annotation ontology
  - Class imbalance
  - Usage of protected classes
- A combination of human-in-the-loop and automated quality assessments to ensure data accuracy and quality:
  - Human review – annotation manually checked by an expert
  - Consensus review – the same data point is annotated multiple times and annotations are compared
  - Golden dataset – annotation is compared to a data point with a known answer
- Privacy and security measures in accordance with RWS company policies
TrainAI data meets or exceeds quality targets – done right the first time or we’ll fix it.
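To illustrate how consensus review and golden-dataset checks can be scored, here is a minimal Python sketch. It is not TrainAI's actual tooling: the function names, the majority-vote rule and the 0.8 agreement threshold for escalating a data point to human review are all hypothetical choices made for illustration.

```python
from collections import Counter


def consensus_label(labels):
    """Consensus review (sketch): majority label and the share of annotators who chose it."""
    if not labels:
        raise ValueError("at least one annotation is required")
    # most_common(1) returns [(label, count)]; ties resolve to the label seen first
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def needs_human_review(labels, threshold=0.8):
    """Hypothetical rule: escalate a data point to expert review when agreement is low."""
    _, agreement = consensus_label(labels)
    return agreement < threshold


def golden_accuracy(annotations, gold):
    """Golden dataset (sketch): fraction of known-answer items the annotator got right."""
    scored = [item for item in gold if item in annotations]
    if not scored:
        return 0.0
    correct = sum(annotations[item] == gold[item] for item in scored)
    return correct / len(scored)


# Hypothetical example: three annotators label the same data point.
label, agreement = consensus_label(["positive", "positive", "negative"])
print(label, round(agreement, 2))  # positive 0.67
print(needs_human_review(["positive", "positive", "negative"]))  # True

gold = {"q1": "cat", "q2": "dog", "q3": "bird"}
annotator = {"q1": "cat", "q2": "dog", "q3": "cat"}
print(round(golden_accuracy(annotator, gold), 2))  # 0.67
```

In this sketch, consensus agreement and golden-set accuracy feed the same decision: low-agreement items and low-accuracy annotators are candidates for the expert human review described above.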