When Translators Play God: Creating a Neural Network in My Own Image

In the labyrinthine world of machine translation, my journey has been nothing short of a Herculean saga, punctuated with moments of enlightenment, frustration, and sheer awe at the power of modern technology. This narrative delves deeper into the technical intricacies and challenges I faced, painting a vivid picture of a relentless quest for linguistic perfection through the lens of software development and deep learning.

Awakening

It all began with a realization that sent ripples through my previously skeptical stance towards machine translations. Once a source of amusement, these tools had quietly evolved, becoming sophisticated enough to significantly streamline the translation process. This epiphany prompted me to explore ModernMT, a neural network capable of adapting its style and terminology over time—a crucial feature for maintaining consistency in my translations, particularly in the nuanced realm of video game localizations.

The Linux Chronicles

My initial foray into this world was marked by a challenging installation on a Linux system—a task that tested my patience and technical acumen. The endeavor required a dedicated machine, extensive maintenance, and the training of language models for each language pair, a testament to the complexity and resource-intensive nature of cutting-edge MT software. Despite these hurdles, the results were promising, leading me to believe in the potential of these tools. However, the subsequent acquisition of the software by a commercial entity marked the end of support for the open-source version I had come to rely on, propelling me towards alternative solutions.

ModernMT running on Ubuntu

A Quest for Customization and Control

Switching to ModernMT Online, a commercial machine translation platform, brought forth a fresh batch of hurdles. Despite its user-friendly nature, this online service fell short in offering the tailor-made experience provided by its local counterpart. It leaned heavily on a broad, one-size-fits-all database, significantly diluting the impact of my meticulously trained custom models. This experience ignited a fervent wish to reclaim mastery over my translations, prompting me to delve into the realms of local implementation and bespoke adjustments. While the model did indeed adjust to my unique style, it did so with a notably lesser degree of precision and personalization than the local version, which was trained exclusively on my own translations.

Far less usage of ModernMT in March, not only because only one week of March had passed, but also because we threw English-Dutch out of the equation.

The Genesis of My Own Neural Network

Armed with an improved understanding of Python and the public domain’s wealth of knowledge on neural network programming, I embarked on creating my own neural network. The initial prototype was modest, limited by the capabilities of my existing hardware. Yet, a serendipitous failure of my CPU cooling system forced me to invest in a more powerful setup, a 5,000 euro Acer equipped with 64GB of RAM and an RTX 4090 GPU. This upgrade facilitated the development of a much larger network, boasting 785 million parameters—a tenfold increase from the original model.

Meet Takanokaze, the newest addition to our server park

A Technological Renaissance

This newfound computational power unlocked unprecedented potential in my translations. The neural network, trained exclusively on my previous translations, began to exhibit remarkable adaptability, seamlessly integrating my stylistic preferences and terminology. A particularly striking example of its capabilities was observed in the translation of a sports game franchise, where the network learned to correctly position player scores according to Dutch conventions after being fed just a few sentences, in real time: the network literally learns as you go.

Between 10:03 and 15:15, this network learned a lot. Note how it very quickly learned to translate “97 OVR Tre Jones” as “Tre Jones (97 ALG)”.

Mastery Through Machine Learning

Today, my system stands as a testament to the power of machine learning, capable of delivering personalized, high-quality translations that reflect my unique style. It’s a heavy-duty setup that pushes the limits of my hardware but offers unparalleled speed and accuracy. The continuous learning and adaptation of the network are visualized through dynamic graphs, providing real-time feedback on its performance.

This journey has underscored the importance of customization in machine translations. By developing and refining my own system, I’ve created a tool that not only enhances my productivity but also preserves the integrity of my work. The experience has reaffirmed my belief in the potential of neural networks and the value of personalized, high-quality translations. As I look to the future, I’m excited about the possibilities that lie ahead in further refining this technology, ensuring that it remains a powerful ally in my ongoing quest for translation excellence.

Confronting the Cloud Conundrum

The exhilaration of crafting a neural network with the finesse to navigate the nuances of language translation was tempered by a sobering realization—the computational behemoth I had unleashed was not just powerful but voraciously power-hungry. The cost of harnessing such might in an online environment was astronomical. To run this advanced machine learning model for each language pair required a hardware setup that was both rare and expensive. Even a conservative estimate placed the cost price at a steep 20 euros per hour per language pair per user, a figure that starkly highlighted the impracticality of deploying this solution in a cloud-based format.

This economic barrier has confined the operation of my neural network masterpiece to a local incarnation of Cattitude, a version whose source code is a closely guarded secret. The decision to withhold this technology from public release was not taken lightly. It is a reflection of the intricate balance between innovation and practicality, the understanding that while we stand on the cusp of a new era in machine translation, the journey there is fraught with challenges both technical and financial.

A Paradigm Shift in Translation Economics

This journey, with its highs and lows, has crystallized a crucial insight into the economics of translation work, particularly in dealings with translation agencies. The industry norm, where agencies offer “pretranslated” texts using generic and often subpar machine translation engines, followed by a request for discounts from professional translators, is a paradigm I now challenge with newfound conviction.

Why should we, the translators, accede to discounts on prework that not only compromises the quality of the final product but also undermines our expertise? My foray into the realm of custom neural networks demonstrates a pivotal truth: equipped with the right tools and technical prowess, we can undertake this preliminary work ourselves, achieving results that far surpass the generic output of commercial MT engines. All you need is a good machine, a Python IDE and too much time on your hands. This realization calls for a reevaluation of our value proposition to translation agencies. It’s not just about the translation anymore; it’s about the added value of personalized, high-quality machine translation that reflects the translator’s unique style and expertise.

A Glimpse into the Future

As I stand at this crossroads, armed with a tool of my own making that marries the art of translation with the science of machine learning, I am both a guardian of linguistic integrity and a pioneer in a digital frontier.

This odyssey through the realm of machine translation, fraught with technical challenges and marked by groundbreaking achievements, serves as a clarion call to my fellow translators. It is a call to recognize and assert the value of our work, to challenge the status quo, and to envision a future where technology enhances our craft, not commoditizes it. As we navigate this ever-evolving landscape, let us hold fast to the principle that our work, enriched by personal touch and technical innovation, is not just a service but an art form worthy of fair compensation.

Doraemon II: specs

The software outlined below, called Doraemon II, is a sophisticated Python application designed for language translation tasks, leveraging deep learning models. It’s built on Flask for web server functionality, PyTorch for tensor computations and deep learning, the Transformers library for accessing pre-trained models and pipelines for tasks like sequence-to-sequence language translation, and several other libraries for data manipulation and plotting.

Here’s a breakdown of its core functionalities:

Web Server Setup: Utilizes Flask to create a web server that can receive HTTP requests. It has endpoints to add new sentence pairs for translation and to perform translations.
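
As a rough illustration, a Flask setup along these lines could look like the sketch below; the endpoint names and payload fields are my own assumptions, not the actual Doraemon II API.

# Minimal sketch of a Flask server with two endpoints: one to queue a new
# sentence pair and one to translate. Endpoint names are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

# In the full application a background thread would consume this list.
pending_pairs = []

@app.route("/add_pair", methods=["POST"])
def add_pair():
    """Accept a new source/target sentence pair for later fine-tuning."""
    data = request.get_json(force=True)
    pending_pairs.append((data["source"], data["target"]))
    return jsonify({"status": "queued", "pending": len(pending_pairs)})

@app.route("/translate", methods=["POST"])
def translate():
    """Translate a sentence with the current model (stubbed out here)."""
    data = request.get_json(force=True)
    translated = data["source"]  # placeholder: the real model call goes here
    return jsonify({"translation": translated})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)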

Model Loading and Translation: Loads a pre-trained neural network model for translating text from one language to another, specifically designed for English to Dutch translations. It can dynamically adjust the token length for input sequences to optimize performance based on the input text length.
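
A minimal sketch of this step, assuming a Hugging Face sequence-to-sequence checkpoint (the model name below is only an example, not necessarily the one used) and scaling the generation budget with the input length:

# Load an English-to-Dutch seq2seq model and translate with a token budget
# that grows with the input instead of a fixed maximum.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "Helsinki-NLP/opus-mt-en-nl"  # example checkpoint, not the author's
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

def translate(text: str) -> str:
    inputs = tokenizer(text, return_tensors="pt").to(device)
    input_len = inputs["input_ids"].shape[1]
    max_new = max(16, int(input_len * 1.5))  # dynamic output length
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(translate("The quick brown fox jumps over the lazy dog."))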

Data Preparation and Training: Contains a custom dataset class to handle the input and output sequences for training the model. It supports dynamic updates to the model by adding new sentence pairs and re-training the model in the background using these pairs.
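
One possible shape for such a dataset class, sketched with the Transformers tokenizer’s text_target argument (available in recent versions); the field names are illustrative, not taken from the actual code:

# Holds source/target pairs and tokenizes them on demand for fine-tuning.
from torch.utils.data import Dataset

class SentencePairDataset(Dataset):
    def __init__(self, pairs, tokenizer, max_length=128):
        self.pairs = pairs          # list of (source, target) strings
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        source, target = self.pairs[idx]
        enc = self.tokenizer(
            source,
            text_target=target,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "labels": enc["labels"].squeeze(0),
        }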

Dynamic Plotting: Features functionality to dynamically plot and update a graph showing the total points (a metric of performance or progress) over time, saving both the plot and the data points to disk.
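
A sketch of a plotting routine of this kind; the metric and file names are assumptions:

# Append the latest value, save the raw data, and rewrite the PNG on disk.
import json
import matplotlib
matplotlib.use("Agg")  # headless backend, safe to call from a server process
import matplotlib.pyplot as plt

POINTS_FILE = "total_points.json"
PLOT_FILE = "total_points.png"

def update_plot(points, new_value):
    points.append(new_value)
    with open(POINTS_FILE, "w") as fh:
        json.dump(points, fh)
    plt.figure(figsize=(8, 4))
    plt.plot(points, marker="o")
    plt.title("Total points over time")
    plt.xlabel("Update")
    plt.ylabel("Total points")
    plt.tight_layout()
    plt.savefig(PLOT_FILE)
    plt.close()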

Threading and Concurrency: Employs threading to handle background tasks such as model updates and sentence pair processing without blocking the main thread, ensuring the web server remains responsive.
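
The pattern could look roughly like this, with a daemon worker thread draining a queue of sentence pairs; the function names are placeholders:

# Keep training off the request thread so the web server stays responsive.
import queue
import threading

pair_queue = queue.Queue()

def fine_tune_on_pair(source, target):
    pass  # the actual model update would happen here

def training_worker():
    while True:
        source, target = pair_queue.get()      # blocks until a pair arrives
        try:
            fine_tune_on_pair(source, target)
        finally:
            pair_queue.task_done()

# daemon=True so the worker does not keep the process alive at shutdown
threading.Thread(target=training_worker, daemon=True).start()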

CUDA Integration: Includes setup for CUDA, allowing the application to utilize NVIDIA GPU resources for faster computation if available, improving the model’s training and inference times.
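
The usual PyTorch device selection, for reference:

# Pick the GPU when one is visible, otherwise fall back to the CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found, running on CPU")
# model.to(device) and tensor.to(device) calls would follow elsewhere.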

Symbol Substitution: Implements a mechanism to substitute symbols with placeholders during processing to handle special characters or symbols in texts, which might be problematic for the model.
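
A guess at how such a placeholder mechanism might work; the symbol table and token format are invented for the example:

# Swap fragile symbols out before the model sees them, restore them afterwards.
import re

SYMBOLS = {"&": "__AMP__", "<": "__LT__", ">": "__GT__", "%": "__PCT__"}
REVERSE = {token: symbol for symbol, token in SYMBOLS.items()}

def protect(text):
    for symbol, token in SYMBOLS.items():
        text = text.replace(symbol, token)
    return text

def restore(text):
    pattern = "|".join(map(re.escape, REVERSE))
    return re.sub(pattern, lambda m: REVERSE[m.group(0)], text)

protected = protect("Score < 97% & rising")
print(restore(protected))  # round-trips back to the original string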

Logging and Debugging: Uses logging for monitoring the application’s status and debugging purposes, along with basic configurations for the logging system.
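
For completeness, a basic configuration of this kind; the format string and level are arbitrary choices here:

# One-line logging setup shared by the whole application.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(threadName)s %(message)s",
)
log = logging.getLogger("doraemon")
log.info("Server starting up")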

Environment Configuration: Manages environmental variables and paths, particularly for integrating CUDA toolkit for GPU acceleration.
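
One plausible way to point the process at a CUDA toolkit before importing torch; the paths below are examples, not the author’s setup:

# Prepend the toolkit's bin and lib64 directories to the relevant variables.
import os

cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
os.environ["CUDA_HOME"] = cuda_home
os.environ["PATH"] = os.path.join(cuda_home, "bin") + os.pathsep + os.environ.get("PATH", "")
os.environ["LD_LIBRARY_PATH"] = (
    os.path.join(cuda_home, "lib64") + os.pathsep + os.environ.get("LD_LIBRARY_PATH", "")
)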

Utility Functions: Provides a suite of utility functions for tasks such as dynamic tokenization, dataset creation from queues, sentence pair persistence, translation with dynamic token adjustment, and more.
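
Two small helpers of the sort listed here, sketched under the assumption of a JSON-lines file and a standard queue:

# Persist sentence pairs across restarts and drain whatever is currently queued.
import json
import queue

def save_pairs(pairs, path="pairs.jsonl"):
    with open(path, "a", encoding="utf-8") as fh:
        for source, target in pairs:
            fh.write(json.dumps({"source": source, "target": target}) + "\n")

def drain_queue(pair_queue):
    pairs = []
    while True:
        try:
            pairs.append(pair_queue.get_nowait())
        except queue.Empty:
            return pairs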

Model Updates: Facilitates updating the model with new data in real-time, locking mechanisms to prevent concurrent updates, and an event system to signal completion of updates.
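
A sketch of the lock-and-event pattern described, with the training step itself stubbed out:

# Only one update runs at a time; waiters can block on the event.
import threading

update_lock = threading.Lock()
update_done = threading.Event()

def update_model(new_pairs):
    if not update_lock.acquire(blocking=False):
        return False            # another update is already in progress
    update_done.clear()
    try:
        # real code would build a DataLoader from new_pairs and run a few
        # optimizer steps here
        pass
    finally:
        update_lock.release()
        update_done.set()       # signal that the refreshed model is ready
    return True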

This application combines web development, machine learning model deployment, real-time data updating and visualization, and GPU computing, showcasing an advanced use case of integrating various Python libraries and frameworks for a specialized task such as language translation.

2 thoughts on “When Translators Play God: Creating a Neural Network in My Own Image”

  1. Guilherme Gama

    Interesting writeup. Could you go into more detail regarding the NN itself? What kind of NN did you use? Did you build it manually? Did you use pretrained weights? How did you preprocess the data? Thanks!

    • admin (post author)

      I need to be cautious since our competitors might also be following this, but the neural network I used is a standard one from Hugging Face, with default settings. Pre-processing the data was straightforward because it solely comprised my own high-quality translations from the past 30 years. Currently, the network is being trained exclusively on my new translations, with the learning rate set to twice the usual speed. This approach is making it increasingly resemble my own translation style.
