- Basic competence with machine learning technology
- Understanding of the broad range of data needed to build and develop an MT system
- Understanding of proper data preparation and data optimization processes
- Ability to understand, measure and respond to success and failure in model building as an integral part of the development process
- Understanding of the additional support tools and connected data flow infrastructure needed to make MT deployable at enterprise scale
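The data-preparation competence above is worth making concrete. A minimal sketch of the kind of parallel-corpus cleaning pass that precedes MT training — empty-segment removal, deduplication, length capping, and length-ratio filtering to catch misaligned pairs. The function name and thresholds here are illustrative, not from any particular toolkit:

```python
# Illustrative sketch of a parallel-corpus cleaning pass for MT training.
# Thresholds (max_ratio, max_len) are hypothetical defaults.

def clean_parallel_corpus(pairs, max_ratio=3.0, max_len=200):
    """Filter (source, target) sentence pairs before MT training."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # drop empty segments
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        s_len, t_len = len(src.split()), len(tgt.split())
        if s_len > max_len or t_len > max_len:
            continue  # drop over-long segments
        if max(s_len, t_len) / max(1, min(s_len, t_len)) > max_ratio:
            continue  # drop likely misaligned pairs (word-count ratio)
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```

Real pipelines add many more stages (language identification, encoding repair, domain filtering), but even this skeleton shows why data preparation is a discipline in its own right rather than a one-off script.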
Why is relying on open source difficult for the enterprise?
- Building engineering teams that understand which research is most valid and relevant, and then upgrading and refreshing existing systems accordingly, is a significant, ongoing, long-term investment.
- Keeping up with the evolution in the research community requires constant experimentation and testing that most practitioners will find hard to justify.
- Practitioners must know why and when to change as the technology evolves or risk being stuck with sub-optimal systems.
What are the key requirements for enterprise MT?
- Data-preparation expertise for training and building MT engines, acquired through experience building thousands of engines across many language combinations and use cases.
- Deep machine-learning expertise to assess which research in the NLP community is most useful and relevant in the enterprise context.
- Development of tools and architectural infrastructure that allow rapid adoption of research breakthroughs while maintaining existing capabilities in widely deployed systems.
- Productization of breakthrough research for mission-critical deployability, which is a very different process from typical experimentation.
- Pre- and post-processing infrastructure, tools and specialized capabilities that add value around core MT algorithms and enable systems to perform optimally in enterprise deployment settings.
- Ongoing research to adapt MT technology for optimal enterprise use, e.g., running inference on CPUs rather than GPUs to reduce deployment cost and system footprint.
- Long-term efforts on data collection, cleaning, and optimization for rapid integration and testing with new algorithmic ideas that may emerge from the research community.
- Close collaboration with translators and linguists to identify and solve language-specific issues, enabling dedicated processes to be developed for problems unique to closely related languages.
- Ongoing interaction with translators and informed linguistic feedback on error patterns provide valuable information to drive ongoing improvements in the core technology.
- Development of unique language combinations with very limited data availability (e.g., ZH to DE) by maximizing the impact of the data that does exist. Zero-shot translation (between language pairs the MT system has never seen in training) tends to produce low-quality output because the shared interlingual representation is very basic, but it can be augmented and improved by intelligent, informed data-supplementation strategies.
- Integration with translation management software and processes to allow richer processing by linguistic support staff.
- Integration with other content management and communication infrastructure to allow the pervasive and secure implementation of MT capabilities in all text-rich software infrastructure and analysis tools.
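The zero-shot point above rests on a simple mechanism worth illustrating. In multilingual NMT systems of the kind that enable zero-shot translation, a single model is trained on many language directions at once, with each training example prefixed by a token naming the desired output language. The tag format and function below are illustrative, not the convention of any specific system:

```python
# Sketch of the target-language-token convention used in multilingual NMT.
# Prefixing each source sentence with a tag like "<2de>" tells one shared
# model which output language to produce. Tag naming here is hypothetical.

def tag_example(src_sentence, tgt_lang):
    """Prepend a target-language token, e.g. '<2de>' for German output."""
    return f"<2{tgt_lang}> {src_sentence}"

# A model trained on tagged EN->DE and ZH->EN data can be asked for ZH->DE
# at inference time ("zero-shot") simply by supplying the <2de> tag with a
# Chinese source sentence -- but, as noted above, quality is usually poor
# unless the direct pair is supplemented with additional (often synthetic,
# e.g. back-translated) ZH-DE data.
print(tag_example("How is the weather today?", "de"))
# -> <2de> How is the weather today?
```

The mechanism explains both the appeal of zero-shot (no direct parallel data required) and its weakness (the model has never actually seen the requested direction), which is why informed data supplementation matters so much in practice.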