Language Weaver Edge 8.6.5.0 has been released

Hi all, 

I’m happy to announce that Language Weaver Edge 8.6.5.0 has been released and is now available from our download servers.

Language Weaver Edge 8.6.5.0 is a minor feature release that introduces support for Automatic Speech Recognition, new adaptation modes, Edge-Cloud scalability enhancements, and multiple fixes.

What's New?

  • Integration of an Automatic Speech Recognition (ASR) Module
    • By integrating an open source ASR module, Language Weaver Edge can now process audio content and automate the transcription and translation of offline audio files.
    • The ASR module is optional and requires a separate installation. It should be installed on all worker hosts where an Audio Transcription Engine will be enabled.
      • The ASR installers are available on our download servers in a dedicated 'ASR Models' folder, alongside the regular 'Application' folder that contains the Edge installer.
      • ASR installation is supported on both Windows and Linux, with a separate installer for each operating system.
      • ASR should be installed after the regular Language Weaver Edge application has been deployed on the host.
    • Once the ASR model has been deployed on a host, it is possible to create an Audio Transcription Engine.
      • The Audio Transcription Engine provides access to all languages supported by the ASR model.
    • Language Support
      • Supported languages are those supported by both Language Weaver Edge and the open source ASR model (OpenAI Whisper).
      • The current list of supported languages includes: Arabic, Armenian, Azerbaijani, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese.
      • Transcription quality varies by language, and Word Error Rates (WER) may differ from one language to another. Please consult the OpenAI Whisper GitHub page to learn about the evaluation of the models and the estimated WER per language.
    • Each Audio Transcription Engine requires a supported GPU for maximum performance.
      • 5 GB of GPU RAM is required for each Audio Transcription Engine.
      • A minimum of 2 CPU cores is required on the host.
      • The average transcription Real-Time Factor (RTF) is 0.25: on average, it takes 1 s to process 4 s of audio content.
        • This is an average; actual performance may vary depending on the GPU, the language, and the quality of the source audio file.
        • We recommend performance testing for each type of source content.
    • Audio Transcription Engines and Training Engines can't share the same GPU:
      • A Training Engine requires exclusive access to its GPU, so a Training Engine and an Audio Transcription Engine cannot run on the same host using the same GPU.
      • Multiple Audio Transcription Engines can, however, share the same GPU.
    • Audio Transcription Engines can also run on CPU only, but transcription is slower than on a GPU.
      • Minimum hardware requirements per Audio Transcription Engine when running on CPU only are:
        • 16 CPU cores
        • 16 GB of RAM
      • With the above CPU configuration, the average transcription Real-Time Factor is 3.2: on average, it takes 3.2 s to process 1 s of audio content.
        • This is an average; actual performance may vary depending on the CPU, the language, and the quality of the source audio file.
        • We recommend performance testing for each type of source content.
    • Audio Transcription doesn't require a separate license and can be enabled with any existing 8.6 license, provided that Processing Units (PUs) are available in the license.
      • Each Audio Transcription Engine requires 1 PU to be allocated from the Language Weaver Edge license.
      • Multiple Audio Transcription Engines can run in parallel on hosts for concurrent processing. Each engine requires a PU to run.
  • Adaptation Improvements
    • It is now possible to adjust the generic data mix used during adaptation, letting linguists choose between limiting generic-domain degradation, maximizing in-domain improvements from the training data, or striking a good balance between the two.
    • The adjustment is made in the UI with a slider, now available when creating an Adaptive LP (both manual and auto).
    • The three adaptation modes are:
      • Generic: With this mode, the adapted model retains more of the generic nature of the baseline LP. This is the recommended mode for use across multiple domains, including the one covered by the training data. Note that this mode corresponds to the default introduced in Language Weaver Edge 8.6.4; it requires a longer adaptation time, so a GPU is highly recommended.
      • Balanced: This mode offers a good balance between generic and domain-specific content. It is the new default in Language Weaver Edge 8.6.5.
      • Domain Specific: With this mode, the adapted model retains more of the domain-specific characteristics of the training set. It is recommended for use with a single domain that matches the provided training set.
    • The new adaptation modes are available for both manual and auto adaptation.
  • Edge-Cloud Improvements
    • The Edge-Cloud service has been enhanced and optimized for scalability.
    • Configuration is simplified: it is now possible to access all Language Pair combinations available in the Cloud without having to create an Edge-Cloud Engine for each LP in Edge.
    • Only a single Edge-Cloud Engine is needed, which optimizes hardware resource consumption on the Edge hosts and reduces TCO.
    • Multiple Edge-Cloud Engines can be enabled on different hosts to provide load balancing and high availability.
    • Edge-Cloud Engine hardware requirements are:
      • 1 CPU core
      • 1 GB of RAM
    • Edge-Cloud Engines don't require any PUs, but a Language Weaver subscription is required.
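The sizing figures in the Automatic Speech Recognition section above can be combined into a rough capacity estimate. The sketch below is a back-of-the-envelope Python calculator, not an official sizing tool. It assumes that processing time scales linearly with audio duration (duration × RTF), that GPU memory is the only limit on how many Audio Transcription Engines share one GPU, and that work spreads evenly across parallel engines. The function names and the 24 GB GPU in the example are hypothetical.

```python
# Figures quoted in the ASR section of these release notes.
GPU_RTF = 0.25              # ~1 s of processing per 4 s of audio on a supported GPU
CPU_RTF = 3.2               # ~3.2 s of processing per 1 s of audio on 16 CPU cores
GPU_RAM_PER_ENGINE_GB = 5   # GPU memory required by each Audio Transcription Engine
PUS_PER_ENGINE = 1          # Processing Units allocated per Audio Transcription Engine


def processing_seconds(audio_seconds: float, rtf: float, engines: int = 1) -> float:
    """Estimated wall-clock time: audio duration x RTF, split evenly across engines.

    Assumes perfectly even, linear scaling; real workloads will not match exactly.
    """
    if engines < 1:
        raise ValueError("at least one engine is required")
    return audio_seconds * rtf / engines


def engines_per_gpu(gpu_ram_gb: float) -> int:
    """How many 5 GB Audio Transcription Engines fit in one GPU (memory limit only)."""
    return int(gpu_ram_gb // GPU_RAM_PER_ENGINE_GB)


def pus_needed(engine_count: int) -> int:
    """PUs to reserve in the license before starting the engines."""
    return engine_count * PUS_PER_ENGINE


if __name__ == "__main__":
    hour = 3600
    print(processing_seconds(hour, GPU_RTF))     # 900.0 s, i.e. 15 minutes on GPU
    print(processing_seconds(hour, CPU_RTF))     # about 11520 s, i.e. 3.2 hours on CPU
    gpu_engines = engines_per_gpu(24)            # hypothetical 24 GB GPU
    print(gpu_engines, pus_needed(gpu_engines))  # 4 engines, 4 PUs
```

Keep in mind that a GPU used by a Training Engine cannot host Audio Transcription Engines, so exclude such GPUs from the count, and validate any estimate with performance tests on your own audio.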

Enhancements

  • Dictionary enhancements
    • For Language Pairs that support Fluent Terminology, it is now possible to disable Fluent Terminology at the term level in dictionaries. This can be useful if you want to prevent the model from inflecting a term in the output. When Fluent Terminology is disabled, the term is handled in the same way as in an LP that doesn't support Fluent Terminology.
    • It is also possible to enforce case matching at the term level (this only works for LPs that don't support Fluent Terminology or for terms where Fluent Terminology is disabled). When enabled, the dictionary term will only be matched when the exact casing is found in the source.
  • Monitoring enhancements
    • Added a new Language Pair WPM chart that aggregates the WPM of all TEngines serving the same LP or the same source and target languages.
    • Added a "WPM - Total" row to the Activity table to show the real-time aggregated WPM of the entire cluster.
    • Added all available engine types to the Deployments table.
    • Changed the default charts shown to "Translation Engine WPM" and "Host Memory Usage".
    • Hovering over a chart label now automatically hides all other plots on the chart, making it quick and easy to focus on a plot of interest.
    • Added new Prometheus metrics for document count, word count, and character count of successful translations. These are available in the REST API /api/v2/metrics endpoint.
  • Added support for the BMP format for image translation.
  • Added ability to show/hide masked passwords in UI edit fields.
  • Added tooltip explanations for PDF smart selection options.
  • Added a new "List Dictionaries" role that allows users to list and use Dictionaries even if they don't have the "View Dictionaries" permission.
  • Added an Edge installer option to make installation of the ABBYY PDF Converter optional.
  • Added digital signatures for new Windows LP installers.
  • Improved handling of formatted bold/italics words for Asian language MS Word files.
  • Accessibility improvements for keyboard navigation and other miscellaneous updates.
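The new document, word, and character count metrics are exposed by the REST API /api/v2/metrics endpoint. Assuming the endpoint returns the standard Prometheus text exposition format, a minimal Python sketch for summing one metric across all of its label combinations could look like the following. The metric names in the example are hypothetical (check the endpoint output for the real names), authentication is left to the caller through `headers` because these notes don't describe it, and the regular expression handles only simple sample lines rather than every edge case of the format.

```python
import re
import urllib.request
from typing import Dict, Optional

# Simple sample line: metric_name{optional="labels"} value [timestamp]
# (label values containing "}" are not handled by this simplified pattern)
_SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def fetch_metrics_text(base_url: str, headers: Optional[Dict[str, str]] = None) -> str:
    """GET <base_url>/api/v2/metrics; any authentication headers must be supplied by the caller."""
    url = base_url.rstrip("/") + "/api/v2/metrics"
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of `metric_name`, ignoring comments and blank lines."""
    total = 0.0
    for line in exposition_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match and match.group(1) == metric_name:
            total += float(match.group(3))
    return total


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    sample = (
        'edge_translated_words_total{engine="a"} 1200\n'
        'edge_translated_words_total{engine="b"} 800\n'
        "edge_translated_documents_total 7\n"
    )
    print(sum_metric(sample, "edge_translated_words_total"))  # 2000.0
```

In practice you would usually let Prometheus itself scrape the endpoint; a helper like this is only handy for quick checks or scripts.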
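To illustrate what the new term-level case matching option changes for LPs without Fluent Terminology, here is a simplified Python model of whole-word term lookup with and without case matching. It is only an illustration of the described behavior, under the assumption that terms match on word boundaries; it is not Language Weaver Edge's actual dictionary matching logic, and `find_term` is a hypothetical helper.

```python
import re
from typing import List


def find_term(source: str, term: str, match_case: bool) -> List[int]:
    """Return the start offsets where `term` occurs as a whole word in `source`.

    With match_case=True, only occurrences with the exact casing of the
    dictionary term are matched, mirroring the new case matching option.
    """
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags)
    return [match.start() for match in pattern.finditer(source)]


if __name__ == "__main__":
    text = "Apple released a new apple pie recipe."
    print(find_term(text, "Apple", match_case=False))  # [0, 21]
    print(find_term(text, "Apple", match_case=True))   # [0]
```

Case matching is useful when the same word is a proper noun in one casing and a common noun in another, as with "Apple" and "apple" here.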

Fixes

  • Fixed Arabic RTL text alignment on Feedback editing page and in Microsoft Excel spreadsheets.
  • Fixed language detection of Chinese Traditional text.
  • Fixed XLIFF translation failure when the file starts with a UTF-8 Byte Order Mark (BOM).
  • Fixed SMTP sender reset on upgrades.
  • Fixed UI Translation History sort order to be stable even when changes are made to the Translation Settings panel.
  • Fixed UI partial highlighting of feedback entry when the target word contains a dash.
  • Fixed an unexpected new browser tab opening whenever a user clicked to download a translation.
  • Fixed a rare crash when starting the API Gateway process, caused by a race condition that corrupted the configuration during updates.
  • Fixed occasional PDF translation failure on Windows when processing multiple PDFs in parallel.
  • Fixed improper escaping of special password characters entered during the Edge installation.
  • Prevented users from managing higher-privileged roles.
  • Fixed detection of newer operating systems in the generated myhost.json license profile.
  • Security fix to prevent Open Redirection.
  • Security fix to prevent user enumeration with failed credentials.
  • Security fix to update 3rd party libraries to newer versions: RabbitMQ, OpenSSL, Tesseract, LibreOffice, etc.

Documentation Updates

The online Language Weaver Edge documentation has been enhanced with the following new or updated sections:

Deprecated features

Good to know 

If you are already using Language Weaver Edge 8.6.0, 8.6.1, 8.6.2, 8.6.3, or 8.6.4, no new license is needed, and you can upgrade your existing instance directly. 

If you are upgrading from Language Weaver Edge 8.5.x or earlier, a new license will be required, and you can request it through your gateway account. 
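The licensing rule above can be expressed as a small check, for example in an internal upgrade script. This is a hypothetical Python helper based only on the two cases stated in these notes (8.6.0 to 8.6.4 upgrade with the existing license; 8.5.x or earlier need a new one); it deliberately rejects versions these notes don't cover.

```python
def needs_new_license(current_version: str) -> bool:
    """Return True if upgrading from `current_version` to 8.6.5.0 requires a new license."""
    parts = current_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if (major, minor) < (8, 6):
        return True   # 8.5.x or earlier: request a new license through the gateway account
    if (major, minor) == (8, 6):
        return False  # 8.6.x: upgrade the existing instance with the current license
    raise ValueError(f"version {current_version} is not covered by these release notes")


if __name__ == "__main__":
    for version in ("8.6.4", "8.6.0", "8.5.2"):
        print(version, needs_new_license(version))
```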

If you want to deploy Automatic Speech Recognition, download the separate installer (Linux or Windows) from the dedicated download folder, perform the additional installation on each host where you will run an Audio Transcription Engine, and make sure you have enough free PUs in your license to run them before starting the engines.

The Kubernetes deployment package will be updated shortly, and we will post a specific announcement when new images are available on our registry. 

If you have any questions, please reach out to your Account Manager or the Language Weaver support team. 

We hope you’ll enjoy this new release!  

Thank you!