Tafsiri- Outerbase's Multilingual Data Translation Plugin

Enhancing Data Translation with Google Translate API Integration

In our increasingly interconnected world, communication knows no boundaries. Whether you're running a global business, managing a multicultural team, or simply trying to connect with people from diverse linguistic backgrounds, language should never be a barrier to understanding and collaboration. That's where the power of technology comes into play, and at the forefront of this linguistic evolution is Tafsiri - the multilingual translation plugin.

Tafsiri is a Swahili word which refers to the act of translating. I thought it would be a suitable name for the multilingual translation plugin I had decided to build for the Outerbase hackathon in conjunction with Hashnode. Why name a plugin? Well, naming the plugin was like a fun adventure for me because, you see, I'd never ventured into the plugin-building territory before. Consequently, there were moments when I felt like throwing in the towel, but then I thought, why not inject some personality into it? That's how it got its quirky name!


Plugin?

All bickering and banter aside, I think that it is essential to define what a plugin is. Plugins (or add-ons) are software components that enhance the functionality of a larger software application, allowing users to customize and extend their software experience according to their needs and preferences.

Characteristics of Plugins

  • Modularity: Plugins are typically standalone units of code that can be added or removed from the host software without affecting its basic functionality.

  • Enhancement: Plugins add new features or improve existing ones.

  • Customization: Users can choose which plugins to install based on their requirements, tailoring the software to their specific needs.

  • Isolation: Plugins are often isolated from the core software to ensure that they do not cause conflicts or crashes. If a plugin encounters an issue, it should not disrupt the host application.
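To make modularity and isolation a little more concrete, here is a minimal sketch of a host application that loads plugins. It is purely illustrative: PluginHost, register, unregister, and run are hypothetical names made up for this example, not part of Outerbase or any real library.

```javascript
// Hypothetical host application; none of these names come from Outerbase.
class PluginHost {
  constructor() {
    this.plugins = new Map(); // plugin name -> plugin function
  }

  // Modularity: plugins can be added or removed without touching the host.
  register(name, plugin) {
    this.plugins.set(name, plugin);
  }

  unregister(name) {
    this.plugins.delete(name);
  }

  // Isolation: a failing plugin is reported, and the host keeps running.
  run(input) {
    const results = {};
    for (const [name, plugin] of this.plugins) {
      try {
        results[name] = plugin(input);
      } catch (error) {
        results[name] = `plugin failed: ${error.message}`;
      }
    }
    return results;
  }
}

const host = new PluginHost();
host.register("shout", (text) => text.toUpperCase());
host.register("broken", () => {
  throw new Error("oops");
});

console.log(host.run("habari"));
```

The "broken" plugin throws, yet the "shout" plugin still produces its result, which is the behaviour the isolation point describes.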


What's the Plugin for Again? Outerbase?

Outerbase, think of it as the cosmic control center for your data galaxy! Companies hop aboard the Outerbase spaceship to effortlessly gaze upon, tweak, and transform their data. Plus, they can whip up stunning visual dashboards without ever needing to summon the SQL sorcery. It's like data management with a sprinkle of stardust! Say goodbye to the days when your database resembled a dead, dull spreadsheet! Consider it a dynamic playground where you not only create data but also dance with it under the disco lights of innovation. Your data has a new beat, and it's time to dance with the features you didn't realize you needed!

PS: I think that we should at least acknowledge that there's a literal spaceship on Outerbase's homepage- so that companies can hop on- literally!

Let us now list Outerbase's main features.

  • EZQL, your AI companion even as you create and customize your database tables, is the one feature that drew my attention when I first logged on to Outerbase. EZQL can also help in querying your database which will help you draw valid conclusions backed up with code.

  • Dashboards - Remember when dashboards were left to the professionals and you could not visualize your data unless you were above a beginner level in utilizing data visualization software? Well, guess what? Outerbase comes with a built-in dashboard!

  • Connecting a database - Maybe you have your database, hosted somewhere else. It doesn't matter whether it is a Postgres, MySQL or MSSQL database, all are welcome! Outerbase allows you to connect, query and visualize your hosted databases. Currently, in the beta version, Postgres and MySQL are fully supported.

  • Commands - Outerbase commands are a flexible way to make different tasks easier in Outerbase. These commands are based on WebAssembly, which means you can create them using the programming language you like the most.

  • Plugins - Finally, plugins! Good thing this article is all about them.


How Tafsiri Works

The simplified flowchart below gives some insight into how Tafsiri works.

From the above flowchart, we can deduce that:

  • There is some form of choice in selecting the target language. How do those target languages get there? What criterion was used in selecting them?

  • Translation occurs pretty much after the target language is selected. What technology is used in translation? Why?

  • Tafsiri, even as it translates column values, can do it cell-by-cell or in batch mode. Why not just pick one mode and stick to it?


Let's See the Code

To answer the above questions, let's look at the code. Being a visual learner myself, I thought I would resort to the use of a more detailed flow chart- again- in explaining my code. Worth noting is that Outerbase provided us third-party developers with the layout of the plugin code in the Outerbase docs.

In subsequent parts, we shall be delving even deeper into the flow chart above.

Privileges

  • Privileges declare the information your plugin needs in order to function. They ensure that the plugin receives only the data it requires, no more and no less. Several privileges are available, including tableValue, configuration, cellValue, and rowValue. For Tafsiri, we shall be using the cellValue and configuration privileges.

Google Translate API

Initially, I had decided to use a fully-fledged LLM. My understanding at the time was that an LLM would have to be created from scratch. Having researched NLP in the past, I knew that this would be no easy feat. If anything, accuracy would have given me a run for my money. I later realized that there are kind souls out there who have trained massive LLMs and made them available to the public. Some of them are paid, but many are free to use. This made me think of GPT-3, an example of an existing LLM. I got my hands on its documentation and several articles and got down to work. Unfortunately, I wasn't successful, and to this day I am still trying to figure out why. Close to losing hope, I looked up Google Translate.

LLM Model Examples

    GPT-3 with Fine-tuning:

    Developers and researchers have fine-tuned models like GPT-3 for specific translation tasks. By training the model on a bilingual dataset, it can be used for translation. For example, you can fine-tune GPT-3 for English-to-French translation and vice versa.

    T5 (Text-to-Text Transfer Transformer):

    T5 is a versatile LLM developed by Google that is designed to handle a wide range of text generation tasks, including translation. By conditioning the model with a "translate English to French: " prefix, you can use it for translation tasks.

    mBART (Multilingual BART):

    mBART is a multilingual variant of Facebook's BART model. It's designed to handle translation and other language generation tasks across multiple languages. You can use it to translate between various language pairs.

    MarianMT:

    MarianMT is a family of dedicated machine translation models trained with the Marian NMT toolkit. The models are specifically designed for translation tasks and cover a wide range of language pairs.

    OpenNMT:

    OpenNMT is an open-source neural machine translation framework. It is not an LLM itself, but you can use it to train custom translation models on your own parallel data.

    Fairseq:

    Fairseq is another open-source sequence-to-sequence learning toolkit that can be used for translation tasks. It provides pre-trained models and tools for custom model training.

    Multilingual Transformers:

    Some models, like mBERT (Multilingual BERT) or XLM-R (XLM-RoBERTa), are pre-trained on text in many languages. They are encoder-only models, so they do not translate on their own, but they can serve as building blocks for cross-lingual tasks, including translation systems.

    Commercial Translation APIs:

    Companies like Google, Microsoft, and Amazon offer commercial translation APIs that utilize LLMs in the background. You can integrate these APIs into your applications for translation tasks.

It turns out that Google Translate is built on Neural Machine Translation (NMT), a deep learning approach. While its sheer capacity is mind-blowing, at the time I was only interested in how to link it to my application. Luckily, there was an API, and it came with a free tier for anyone with a Google Cloud account.

Take a short detour via this article link and look into how exactly to get your Google Translate API key if you don't already have one.

Significance of using Google Translate API

    The Perks of using Google Translate API

    High-Quality Translation:

    Google Translate is known for its accuracy and high-quality translations. It supports a wide range of languages and is constantly improving its translation algorithms.

    Ease of Integration:

    Google provides comprehensive documentation and libraries for integrating the API into various programming languages and platforms. Integration is relatively straightforward.

    Fast and Scalable:

    The API is designed to handle a high volume of translation requests quickly, making it suitable for applications with a large user base or heavy translation needs.

    Multilingual Support:

    Google Translate supports a vast number of languages, which makes it suitable for applications targeting a global audience or dealing with multilingual content.

    Customization Options:

    You can customize the translation model by specifying glossaries, custom translation models, and more to tailor translations to your specific domain or terminology.

    Cloud-Based:

    Being a cloud-based service, you don't need to worry about infrastructure management. Google handles the underlying infrastructure and maintenance.

    Cons:

    Costs:

    While Google offers a free tier with limited usage, heavy or commercial usage can become costly. You pay per character translated, and costs can add up, particularly for large-scale projects.

    Rate Limits:

    Google imposes rate limits on the API, which can affect the responsiveness of your application if you have a high volume of translation requests. You may need to upgrade to a premium plan for higher rate limits.

    Privacy and Data Usage:

    When you use the Google Translate API, your text data is sent to Google's servers for translation. Depending on your use case, this may raise privacy concerns, especially for sensitive or confidential content.

    Language Variability:

    While Google Translate is generally accurate, it may struggle with languages that have complex grammar or context-dependent meanings. Human review may be necessary for critical translations.

    Dependency on Google:

    Using the Google Translate API ties your application to Google's services. If there are changes to the API or its availability, it could impact your application's functionality.

    Limited Control:

    You have limited control over the translation process. If you require fine-grained control over translations, you might find the API's automation limiting.


The Main Code

  • A huge disclaimer, before we start, is that my code has a monolithic structure: all of its code is in one file. Also, we are going back to basic JavaScript. Remember document.createElement() and document.addEventListener() in DOM manipulation? Well, we are going back to that. No fancy libraries or any form of installation. Just us and our classes, methods and loads of functions.

  • I use VS Code, though any editor works. Create a new folder somewhere suitable on your PC, then create a new file inside it. Name it anything you like, as long as the file extension is .js.

    The following section will address the intricacies of my code.

Privileges Definition

    var privileges = ["cellValue", "configuration"];
    const googleTranslateAPI = "YOUR_GOOGLE_TRANSLATE_API_KEY";

privileges is an array containing two strings: "cellValue" and "configuration." These may be used as observed attributes for the custom element.

googleTranslateAPI is a constant that holds your Google Translate API key. You should replace "YOUR_GOOGLE_TRANSLATE_API_KEY" with your actual API key.


Creating our HTML element

    var templateCell_$PLUGIN_ID = document.createElement("template");

templateCell_$PLUGIN_ID is a variable that creates an HTML template element. This template defines the structure of your plugin's user interface.


Batch Mode Checkbox

    var batchTranslationCheckbox = document.createElement("input");
    batchTranslationCheckbox.type = "checkbox";
    batchTranslationCheckbox.id = "batch-translation";
    batchTranslationCheckbox.name = "batch-translation";
    batchTranslationCheckbox.value = "batch";
    batchTranslationCheckbox.textContent = "Batch Translation";
    batchTranslationCheckbox.style.marginTop = "10px";
    batchTranslationCheckbox.style.position = "absolute";
  • batchTranslationCheckbox is a checkbox input element created dynamically in JavaScript. It's configured with various attributes and styles.

It is used to enable or disable batch translation functionality.


Our Plugin's UI

    templateCell_$PLUGIN_ID.innerHTML = `
    <style>
        /* CSS styles for our plugin's UI */
    </style>
    <div id="container">
        <form id="form-group">
            <input id="input">
            <select class="language-select"></select>
        </form>
        <div id="translated-column">
            <div id="translated-list"></div>
        </div>
    </div>
    `;
  • templateCell_$PLUGIN_ID is being populated with an HTML structure using a template literal.

  • This structure includes CSS styles and the layout for our plugin's UI. It contains a form with an input field, a select element, and a div to display translated content.

Looking at our plugin's UI definition, you might be wondering why I did not include the batch mode checkbox here. Well, I did and quickly realized that anything in the container div is rendered on every cell of every column of my table. Thus, I only needed one checkbox hence defining it outside the container div.


Configuration

    class OuterbasePluginConfig_$PLUGIN_ID {
      constructor(object) {
        this.targetLanguage = "en";
        this.columnName = "";
        this.cellId = "";
        if (object) {
          this.targetLanguage = object.targetLanguage || this.targetLanguage;
          this.columnName = object.columnName || this.columnName;
          this.cellId = object.cellId || this.cellId;
        }
      }

      toJSON() {
        return {
          targetLanguage: this.targetLanguage,
          columnName: this.columnName,
          cellId: this.cellId,
        };
      }
    }
  • This class defines a configuration object for our plugin. It has properties such as targetLanguage, columnName, and cellId, which can be set based on user configuration.
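As a quick illustration of how this class behaves, the sketch below repeats the class so that it runs on its own, then builds a configuration from a JSON string and serializes it back. The sample values ("sw" and "description") are made up for the example, and the assumption that the configuration arrives as a JSON string comes from how connectedCallback parses the configuration attribute.

```javascript
// Same shape as the configuration class above, repeated so the snippet is self-contained.
class OuterbasePluginConfig_$PLUGIN_ID {
  constructor(object) {
    this.targetLanguage = "en";
    this.columnName = "";
    this.cellId = "";
    if (object) {
      this.targetLanguage = object.targetLanguage || this.targetLanguage;
      this.columnName = object.columnName || this.columnName;
      this.cellId = object.cellId || this.cellId;
    }
  }

  toJSON() {
    return {
      targetLanguage: this.targetLanguage,
      columnName: this.columnName,
      cellId: this.cellId,
    };
  }
}

// Hypothetical attribute value; a real one would be supplied by the host.
const configAttribute = '{"targetLanguage":"sw","columnName":"description"}';
const config = new OuterbasePluginConfig_$PLUGIN_ID(JSON.parse(configAttribute));

console.log(config.targetLanguage); // "sw"
console.log(config.cellId); // "" (the default is kept because the key was missing)
console.log(JSON.stringify(config)); // JSON.stringify picks up toJSON()
```

Missing keys fall back to the defaults set at the top of the constructor, so a partial configuration is still safe to load.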

Our Main Class

    class OuterbasePluginCell_$PLUGIN_ID extends HTMLElement {
      //Houses our plugin's functions
    }
  • This is the main class for our web component. It extends HTMLElement, indicating that it represents a custom HTML element.

  • The class defines properties like config, supportedLanguages, batchTranslationLanguage, batchData, and lastTranslatedText to manage Tafsiri's state and data.

  • The plugin functions essential for the functioning of this plugin are defined here. Here are some of them:

    Initializing the element's state and initial rendering

    async connectedCallback() {
      const configAttribute = this.getAttribute("configuration");
      if (configAttribute) {
        this.config = new OuterbasePluginConfig_$PLUGIN_ID(
          JSON.parse(configAttribute)
        );
      }

      this.shadow.getElementById("input").value = this.getAttribute("cellValue");

      const translatedList = this.shadow.getElementById("translated-list");
      const languageSelect = this.shadow.querySelector(".language-select");

      this.populateLanguageDropdown(languageSelect);
      // THE REST OF THE CODE
    }

async connectedCallback(): This function is part of the OuterbasePluginCell_$PLUGIN_ID class and is called when the custom element is connected to the DOM. It initializes the element's state, sets event listeners, and performs the initial rendering.
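connectedCallback() reads from this.shadow, so something must have created that shadow root earlier. The sketch below shows one way the surrounding class skeleton could look. It is an assumption rather than Tafsiri's exact code: ExamplePluginCell and its inline template are hypothetical, and the BaseElement fallback exists only so that the snippet can load outside a browser.

```javascript
const privileges = ["cellValue", "configuration"];

// Outside a browser there is no HTMLElement, so fall back to an empty class.
// This fallback only exists to let the sketch load anywhere.
const BaseElement = typeof HTMLElement !== "undefined" ? HTMLElement : class {};

class ExamplePluginCell extends BaseElement {
  // Assumption: the privileges double as the element's observed attributes.
  static get observedAttributes() {
    return privileges;
  }

  constructor() {
    super();
    if (typeof document !== "undefined") {
      // Hypothetical one-line template standing in for the real plugin UI.
      const template = document.createElement("template");
      template.innerHTML = '<input id="input">';

      // Attaching an open shadow root and cloning the template into it is
      // what makes this.shadow.getElementById(...) usable later on.
      this.shadow = this.attachShadow({ mode: "open" });
      this.shadow.appendChild(template.content.cloneNode(true));
    }
  }
}

console.log(ExamplePluginCell.observedAttributes);
```

Keeping the UI inside a shadow root also means the plugin's styles and element ids stay scoped to the plugin instead of leaking into the host page.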


Populating our dropdown element with supported languages

async populateLanguageDropdown(selectElement): This asynchronous function populates a dropdown element with supported languages for translation. It fetches language data from the Google Translate API. At first, it receives language codes so it is essential to translate the codes to readable language names. That is why we have the following piece of code:

    option.textContent = new Intl.DisplayNames(["en"], {
      type: "language",
    }).of(option.value);
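To show the code-to-name step in isolation, here is a small sketch that turns a list of language codes into dropdown-ready entries. It assumes the response has the shape { data: { languages: [{ language: "sw" }, ...] } }, which is how the v2 languages endpoint is documented. The helper names languageLabel and toLanguageOptions are hypothetical, and the sample payload is written by hand rather than captured from the API.

```javascript
// Turn a language code into a readable name, falling back to the code itself.
function languageLabel(code, displayNames) {
  try {
    return displayNames.of(code) || code;
  } catch (error) {
    // Intl.DisplayNames throws a RangeError for malformed codes.
    return code;
  }
}

// Assumption: `payload` looks like { data: { languages: [{ language: "sw" }, ...] } }.
function toLanguageOptions(payload, locale = "en") {
  const displayNames = new Intl.DisplayNames([locale], { type: "language" });
  const languages = payload?.data?.languages ?? [];
  return languages.map(({ language }) => ({
    value: language,
    label: languageLabel(language, displayNames),
  }));
}

// Hand-written sample payload, including one deliberately malformed code.
const samplePayload = {
  data: {
    languages: [{ language: "sw" }, { language: "fr" }, { language: "not a code" }],
  },
};

console.log(toLanguageOptions(samplePayload));
```

Each entry can then become an option element, with value as the option's value and label as its textContent. The try/catch is there because a single unexpected code would otherwise stop the whole dropdown from being filled.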

Batch translation

async translateBatch(targetLanguage, listElement): This asynchronous function handles batch translation. It takes the targetLanguage and a listElement where the translated text will be displayed. It translates a batch of text entries and updates the UI with the translated results.
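The sketch below illustrates the data handling that a batch translation needs, without touching the network. It relies on the v2 API accepting an array of strings in q. The helper names (chunk, buildBatchBody, extractTranslations) are hypothetical, the cap of 128 strings per request should be treated as an assumption to verify against the current API reference, and the sample response is a hand-written placeholder rather than real API output.

```javascript
// Assumption: a per-request cap on the number of strings; check the API reference.
const MAX_STRINGS_PER_REQUEST = 128;

// Split the column values into chunks small enough for one request each.
function chunk(values, size) {
  const chunks = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Build the JSON body for one batch: `q` carries an array of strings.
function buildBatchBody(texts, targetLanguage) {
  return JSON.stringify({ q: texts, target: targetLanguage });
}

// Pull the translated strings out of a response, assumed to be in request order.
function extractTranslations(payload) {
  const translations = payload?.data?.translations ?? [];
  return translations.map((entry) => entry.translatedText);
}

const columnValues = ["Hello", "Good morning", "Thank you"];
console.log(chunk(columnValues, MAX_STRINGS_PER_REQUEST).length); // 1
console.log(buildBatchBody(columnValues, "sw"));

// Placeholder response used only to exercise extractTranslations.
const placeholderResponse = {
  data: { translations: [{ translatedText: "first" }, { translatedText: "second" }] },
};
console.log(extractTranslations(placeholderResponse)); // [ 'first', 'second' ]
```

Sending one request per chunk instead of one per cell is where batch mode saves time, since a whole column usually fits in a handful of requests.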

Significance of Batch Translation

    Batch translation, which involves translating multiple pieces of content simultaneously, has its own set of advantages and disadvantages. Here are the pros and cons of batch translation:

    Pros:

    Efficiency:

    Batch translation is highly efficient for translating large volumes of content in a single operation. It can save a significant amount of time compared to translating each piece of content individually.

    Scalability:

    It's well-suited for applications with extensive translation requirements, such as e-commerce websites with a vast product catalog or multilingual documentation.

    Consistency:

    Batch translation ensures a consistent translation approach across all content items, reducing the risk of translation errors or inconsistencies.

    Cost-Effective:

    In cases where automated translation services are used, batch translation can be cost-effective, especially for large-scale projects, as it can be less expensive per unit of content compared to manual translation.

    Faster Content Updates:

    When content updates are frequent, batch translation allows for quick updates across multiple languages without the need for manual intervention.

    Cons:

    Loss of Context:

    Batch translation may not capture the specific context or nuances of individual pieces of content. It can result in less precise translations, especially for content that requires context-awareness.

    Limited Control:

    Translating content in bulk may not provide the level of control that cell-by-cell or manual translation offers. It might not be suitable for highly sensitive or specialized content.

    Quality Variability:

    Automated translation services, commonly used for batch translation, may produce translations of varying quality, depending on the source language, target language, and content complexity.

    Initial Setup:

    Setting up batch translation processes, especially for custom or specialized applications, can require time and technical expertise.

    Post-Translation Review:

    Depending on the quality of automated translations, post-translation review and editing may be necessary to ensure accuracy and context preservation, adding to the overall time and effort.

    Language Support:

    Automated translation services may not support all languages equally, and some languages may have lower translation quality.


Cell-by-cell translation

async translateCell(targetLanguage, listElement): This asynchronous function handles cell-by-cell translation. It takes the targetLanguage and a listElement where the translated text will be displayed. It translates the content of an input field and updates the UI with the translated result.
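The class keeps a lastTranslatedText property, and one plausible use for that kind of state is to avoid sending the same cell to the API twice in a row. The sketch below shows the idea. It is a guess at the intent rather than Tafsiri's actual logic: createCellTranslator is a hypothetical helper, and fakeTranslate stands in for the real API call so that the example runs offline.

```javascript
// Hypothetical helper: wraps any async translate function with a one-entry memo.
function createCellTranslator(translate) {
  let lastKey = null;
  let lastResult = null;

  return async function translateCell(text, targetLanguage) {
    const trimmed = text.trim();
    if (trimmed === "") return ""; // nothing to translate

    const key = `${targetLanguage}\u0000${trimmed}`;
    if (key === lastKey) return lastResult; // same text, same language

    lastResult = await translate(trimmed, targetLanguage);
    lastKey = key;
    return lastResult;
  };
}

// Stand-in for the real API call; it only tags the text with the language.
let apiCalls = 0;
const fakeTranslate = async (text, targetLanguage) => {
  apiCalls += 1;
  return `[${targetLanguage}] ${text}`;
};

const translateCell = createCellTranslator(fakeTranslate);
translateCell("Hello", "sw")
  .then(() => translateCell("Hello", "sw"))
  .then((result) => console.log(result, "| API calls:", apiCalls));
```

Because the second call reuses the stored result, only one request is made, which matters when every character sent to the API counts towards the bill.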

Significance of Cell-by-Cell translation

    Cell-by-cell translation involves translating column cell values one after the other. When I started working on Tafsiri, I was hell-bent on doing cell-by-cell translation only. Why? Because it was the first element of my plugin that worked! After all, "if the code works, don't touch it". However, its applicability in the real world is also essential.

    Pros:

    Precision and Control:

    Cell-by-cell translation allows for precise control over which text elements are translated. You can choose specific cells or content to translate, ensuring that only relevant text is processed.

    Contextual Understanding:

    It provides the opportunity to review and refine translations for each text element, ensuring that the context and meaning are accurately preserved. This is especially important for content that requires a high degree of accuracy.

    User-Friendly:

    Users can choose when and what to translate, making it a user-friendly approach. This allows users to read content in their preferred language selectively.

    Cost-Efficiency:

    For applications with relatively low translation needs, cell-by-cell translation can be more cost-effective compared to batch translation, as you only translate what's necessary.

    Cons:

    Manual Effort:

    Cell-by-cell translation can be time-consuming and labor-intensive, especially when dealing with a large amount of content. Each cell or text element must be translated individually.

    Limited Scalability:

    It may not be suitable for applications with extensive translation requirements. When dealing with a massive amount of content, manual cell-by-cell translation becomes impractical.

    Inconsistent User Experience:

    If not all content is translated, users may experience an inconsistent language experience within the same application, which can be confusing.

    Maintenance Challenges:

    Ongoing maintenance can be challenging as updates or changes to content may require manual re-translation, which can be error-prone.

    Resource Intensive:

    For multilingual applications with frequent updates or changes, dedicating resources to continuously manage cell-by-cell translations can be resource-intensive.


Hiding the selected language

hideSelectedLanguage(targetLanguage): This function hides the selected language from the language dropdown to prevent users from selecting the same language for translation.

    hideSelectedLanguage(targetLanguage) {
      const languageSelect = this.shadow.querySelector(".language-select");
      const options = languageSelect.options;

      for (let i = 0; i < options.length; i++) {
        if (options[i].value === targetLanguage) {
          options[i].style.display = "none";
        } else {
          options[i].style.display = "block";
        }
      }
    }

Translating our text

async translateText(text, targetLanguage): This asynchronous function translates a single text using the Google Translate API. It sends a request to the API with the text and target language and returns the translated text.

    async translateText(text, targetLanguage) {
      const endpoint = `https://translation.googleapis.com/language/translate/v2?key=${googleTranslateAPI}`;

      try {
        console.log("Text to translate:", text);
        console.log("Target language:", targetLanguage);

        const response = await fetch(endpoint, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            q: text,
            target: targetLanguage,
          }),
        });

        if (!response.ok) {
          // Read the error body once; fall back to the status text if it is not JSON.
          const errorResponse = await response.json().catch(() => null);
          const message = errorResponse?.error?.message ?? response.statusText;
          throw new Error(
            `Translation request failed with status: ${response.status}, Error Message: ${message}`
          );
        }

        const data = await response.json();
        const translatedText = data?.data?.translations?.[0]?.translatedText;

        if (!translatedText) {
          throw new Error("Invalid response from translation API");
        }
        return translatedText;
      } catch (error) {
        // `response` is scoped to the try block and its body can only be read
        // once, so only the error itself is logged here.
        console.error(`Error translating text: ${error}`);
        throw error;
      }
    }
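One detail that may be worth handling: as far as I can tell from the v2 reference, the endpoint treats the input as HTML unless told otherwise, so translated text can come back with HTML entities such as &#39; in place of an apostrophe. Since the plugin displays results through textContent, asking for plain text is probably the better fit. The sketch below builds the request options with format set to "text". buildTranslateRequest is a hypothetical helper, not part of Tafsiri's code.

```javascript
// Hypothetical helper that builds the fetch options for one translation request.
function buildTranslateRequest(text, targetLanguage, sourceLanguage) {
  const body = {
    q: text,
    target: targetLanguage,
    format: "text", // ask for plain text instead of the HTML default
  };

  // `source` is optional; when it is left out, the API detects the language.
  if (sourceLanguage) {
    body.source = sourceLanguage;
  }

  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const exampleRequest = buildTranslateRequest("Don't panic", "sw");
console.log(exampleRequest.body);
// {"q":"Don't panic","target":"sw","format":"text"}
```

The returned object can be passed straight to fetch as its second argument, in place of the inline options used in translateText.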

Displaying a single translated text

displayTranslatedText(translatedText, listElement): This function displays a single translated text entry in the specified listElement.

    displayTranslatedText(translatedText, listElement) {
      listElement.innerHTML = "";
      const listItem = document.createElement("li");
      listItem.textContent = translatedText;
      listElement.appendChild(listItem);
    }

Displaying a batch of translated text

displayTranslatedBatch(translatedBatch, listElement): This function displays a batch of translated text entries in the specified listElement. It iterates through the translatedBatch array and adds each entry to the list.

    displayTranslatedBatch(translatedBatch, listElement) {
      listElement.innerHTML = "";

      for (const value of translatedBatch) {
        const listItem = document.createElement("li");
        listItem.textContent = value;
        listElement.appendChild(listItem);
      }

      listElement.scrollTop = listElement.scrollHeight;
    }

Handling changes in translation mode

handleTranslationModeChange(isBatchTranslation): This function handles changes in the translation mode (batch or cell-by-cell). It enables or disables the input field based on the `isBatchTranslation` parameter.

    handleTranslationModeChange(isBatchTranslation) {
      const inputElement = this.shadow.getElementById("input");

      if (isBatchTranslation) {
        inputElement.setAttribute("disabled", true);
      } else {
        inputElement.removeAttribute("disabled");
      }
    }

Registering our Custom Element

    window.customElements.define(
      "outerbase-plugin-cell-$PLUGIN_ID",
      OuterbasePluginCell_$PLUGIN_ID
    );
  • window.customElements.define is a method provided by the Web Components API that registers a new custom HTML element. It takes two arguments:

    • The first argument is the name of the custom element. In our code, it's "outerbase-plugin-cell-$PLUGIN_ID". Custom element names must contain a hyphen (e.g., <my-element>), which distinguishes them from standard HTML elements. Once registered, the element can be used in HTML markup just like any other element, such as <div> or <p>.

    • The second argument is the JavaScript class that defines the behavior and structure of the custom element. In our code, it's OuterbasePluginCell_$PLUGIN_ID, which extends HTMLElement, the built-in class that represents an HTML element. This class contains the logic and behavior of the custom element.

By using window.customElements.define, you are essentially registering your custom element with the browser so that it knows how to handle it when it encounters the custom element in your HTML markup. This allows you to create reusable and encapsulated components in your web application.

If you do not add the line of code above, our plugin will be ignored.
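One related gotcha: calling customElements.define twice with the same name throws an error, which can happen if the plugin script is loaded more than once. The sketch below guards against that. defineOnce and fakeRegistry are hypothetical names; the fake registry only exists so that the snippet runs outside a browser, where window.customElements would be passed instead.

```javascript
// Hypothetical helper: `registry` is anything with get() and define(),
// such as window.customElements in a browser.
function defineOnce(registry, name, elementClass) {
  if (registry.get(name)) {
    return false; // already registered, nothing to do
  }
  registry.define(name, elementClass);
  return true;
}

// Tiny in-memory stand-in for the browser's custom element registry.
const fakeRegistry = {
  elements: new Map(),
  get(name) {
    return this.elements.get(name);
  },
  define(name, elementClass) {
    if (this.elements.has(name)) {
      throw new Error(`${name} has already been defined`);
    }
    this.elements.set(name, elementClass);
  },
};

class DemoCell {}

console.log(defineOnce(fakeRegistry, "outerbase-plugin-cell-demo", DemoCell)); // true
console.log(defineOnce(fakeRegistry, "outerbase-plugin-cell-demo", DemoCell)); // false
```

In the plugin itself, the equivalent call would pass window.customElements as the registry together with the element name and class shown above.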

That's it for our code explanation. There's a lot more to talk about but I think that is a great place to stop.


Demo

The full code can be accessed in my GitHub Repository. Test it out, create your version with even more features, and share it with the world!


Update...plus demo

Update! I finally got the select language mode to work in batch mode. Here's a demo video. The code for this feature can also be found on my GitHub Repository. Specifically, this file.


Conclusion

Accessibility matters, and so does embracing other people's cultures, especially the languages they speak. Our goal, ultimately, should not be to make everyone conform to one way of thinking or speaking but to embrace the fact that our differences make the world a very interesting place. Accommodating others where we can, especially with the advent of technology, is the way to go. Different is not bad; different is special. Tafsiri is my contribution to embracing what is different. I hope it inspires you to do so too, in your own way.