AI Manga Translator: Combining Programming, AI, and Manga Translation

April 7, 2024

About

At our MVP-building agency, we thrive on turning ambitious ideas into functional, scalable products. Here’s a glimpse of how we approach challenges, shown through a recent project that combined programming, AI, and manga translation.

The Challenge: AI Manga Translator

When we discovered the Spheron Network’s bounty challenge, we looked through its tasks and found one that resonated deeply with us as anime enthusiasts: building a manga translator. We had no prior experience in this domain, but we were determined to figure it out as we went along.

The Approach

The plan was simple yet systematic. We would mimic the process a human follows to translate a manga:

Identify speech bubbles in the manga panels.
Extract the text and record which bubble each piece of text came from.
Translate the extracted text.
Reinsert the translated text into the speech bubbles after removing the original Japanese text.
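
The four steps above can be read as a small pipeline. The sketch below is only an illustration under assumptions: the Bubble type, the (x1, y1, x2, y2) box convention, and the stage callables (detect, read_text, translate, erase, draw) are hypothetical names, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; an assumed convention


@dataclass
class Bubble:
    box: Box
    source_text: str = ""
    translated_text: str = ""


def translate_page(
    image,
    detect: Callable[[object], List[Box]],
    read_text: Callable[[object, Box], str],
    translate: Callable[[str], str],
    erase: Callable[[object, Box], object],
    draw: Callable[[object, Box, str], object],
):
    """Run the four steps in order; every stage is a hypothetical callable."""
    # Step 1: identify speech bubbles.
    bubbles = [Bubble(box=box) for box in detect(image)]
    for bubble in bubbles:
        # Step 2: extract the text and keep it tied to its bubble.
        bubble.source_text = read_text(image, bubble.box)
        # Step 3: translate the extracted text.
        bubble.translated_text = translate(bubble.source_text)
    for bubble in bubbles:
        # Step 4: remove the original text, then reinsert the translation.
        image = erase(image, bubble.box)
        image = draw(image, bubble.box, bubble.translated_text)
    return image, bubbles
```

Passing the stages in as callables keeps each step replaceable, so a simple in-painting routine could later be swapped for a stronger one without touching the rest.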

The Execution

Extracting Speech Bubbles

For those unfamiliar, manga panels typically place dialogue inside "speech bubbles." The first step was to locate these bubbles in the artwork, which is an object detection task in computer vision.

Influenced by a prior surveillance project where our team used machine learning for object detection, we decided to apply a similar approach to detect speech bubbles.

Dataset: We used a specialized manga speech bubble dataset (source).
Model: YOLO (You Only Look Once) is a widely used object detection model, but it needed fine-tuning to perform well on speech bubbles.

Training the Model

Fine-tuning the YOLO model meant training it further on the manga dataset. The training environment was unconventional: we ran it on a decentralized GPU network for cost-effective compute power. After fine-tuning, the model detected speech bubbles far more reliably.
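
A minimal sketch of what such a fine-tuning run could look like. It assumes the ultralytics Python package and a "yolov8n.pt" starting checkpoint; the post does not say which YOLO version or tooling was used, so treat both as placeholders. The dataset paths, the speech_bubble class name, and the helper names are hypothetical as well.

```python
from pathlib import Path


def write_dataset_yaml(root: str, out_path: str = "bubbles.yaml") -> str:
    """Write a one-class dataset description in the Ultralytics YAML layout."""
    text = (
        f"path: {root}\n"
        "train: images/train\n"
        "val: images/val\n"
        "names:\n"
        "  0: speech_bubble\n"  # hypothetical class name
    )
    Path(out_path).write_text(text, encoding="utf-8")
    return text


def fine_tune(data_yaml: str, weights: str = "yolov8n.pt", epochs: int = 50):
    """Fine-tune pretrained weights on the bubble dataset (needs `ultralytics`)."""
    # Imported lazily so the YAML helper above works without the package.
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed checkpoint; start from pretrained weights
    model.train(data=data_yaml, epochs=epochs, imgsz=640)
    return model


def detect_bubbles(model, image_path: str, min_conf: float = 0.5):
    """Return [(x1, y1, x2, y2), ...] for detections at or above min_conf."""
    result = model.predict(image_path, conf=min_conf)[0]
    return [tuple(int(v) for v in xyxy) for xyxy in result.boxes.xyxy.tolist()]
```

The epoch count, image size, and confidence threshold here are arbitrary starting points and would need tuning against a validation split.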

Translating the Text

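For translation, we used deep-translator, an open-source Python package that wraps translation services such as Google Translate. This kept the step straightforward: Japanese text goes in and English text comes out, with quality that depends on the underlying translation service.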
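
A minimal sketch of this step, assuming the GoogleTranslator backend of deep-translator with the language codes "ja" and "en". The helper names and the small cache for repeated lines are illustrative additions, not something the project is known to have used.

```python
from typing import Callable, Dict, List, Optional


def make_google_translator(source: str = "ja", target: str = "en") -> Callable[[str], str]:
    """Build a translate function backed by deep-translator's GoogleTranslator."""
    # Requires `pip install deep-translator` and network access at call time.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source, target=target).translate


def translate_bubbles(
    texts: List[str],
    translate: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Translate each bubble's text, skipping blanks and reusing repeated lines."""
    if translate is None:
        translate = make_google_translator()
    cache: Dict[str, str] = {}
    results: List[str] = []
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            results.append("")  # nothing to translate in an empty bubble
            continue
        if cleaned not in cache:
            cache[cleaned] = translate(cleaned)
        results.append(cache[cleaned])
    return results
```

Because the translate function is passed in, the same loop can be exercised offline with a stub, for example translate_bubbles(["a", "a"], translate=str.upper).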

In-Painting

In-painting fills in the regions left behind once the original Japanese text is removed. While advanced methods like LaMa (Resolution-robust Large Mask Inpainting with Fourier Convolutions) are available, we opted for a simpler approach using the OpenCV Python package for this project.
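
One way the simpler OpenCV route could look is sketched below. The post does not describe how the text mask was built, so the approach here is an assumption: treat dark pixels inside each bubble box as text, grow that mask slightly, and pass it to cv2.inpaint. The function names, threshold, and margin are hypothetical.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), an assumed convention


def shrink_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Pull a box inward by `margin` px and clamp it to the image bounds."""
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 + margin), max(0, y1 + margin)
    x2, y2 = min(width, x2 - margin), min(height, y2 - margin)
    # Collapse to an empty box rather than returning inverted coordinates.
    return (x1, y1, max(x1, x2), max(y1, y2))


def erase_text(
    image,
    boxes: Iterable[Box],
    margin: int = 4,
    dark_threshold: int = 128,
    radius: int = 3,
):
    """Inpaint dark pixels inside each bubble box of a BGR uint8 image."""
    import cv2  # provided by the opencv-python package
    import numpy as np

    height, width = image.shape[:2]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    mask = np.zeros((height, width), dtype=np.uint8)
    for box in boxes:
        x1, y1, x2, y2 = shrink_box(box, margin, width, height)
        region = gray[y1:y2, x1:x2]
        # Assumption: text is dark on a light bubble background.
        mask[y1:y2, x1:x2][region < dark_threshold] = 255
    # Grow the mask slightly so anti-aliased glyph edges are covered too.
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=1)
    return cv2.inpaint(image, mask, radius, cv2.INPAINT_TELEA)
```

The inward margin is there so that the bubble outline, which is also dark, is less likely to be erased along with the text. Bubbles with dark backgrounds or light text would need a different mask.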

Text Placement

The final step was reinserting the translated text into the speech bubbles. This required:

Measuring each speech bubble’s dimensions to determine space constraints.
Adjusting text size, spacing, and wrapping to fit within the bubble boundaries.
Centering the translated text for a natural appearance.
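
The three requirements above can be sketched as a shrink-to-fit loop: try the largest font size, wrap the text to the bubble width, and step down until the wrapped block fits. Everything below is an assumption for illustration. In particular, approx_width is a crude stand-in for real font metrics (roughly 0.6 em per character), and the padding and line spacing values are arbitrary.

```python
from typing import Callable, List, Tuple


def approx_width(text: str, size: int) -> float:
    """Rough width estimate: about 0.6 em per character (not a real font metric)."""
    return 0.6 * size * len(text)


def wrap(text: str, size: int, max_width: float, measure: Callable[[str, int], float]) -> List[str]:
    """Greedy word wrap: start a new line when the next word would overflow."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate, size) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fit_text(
    text: str,
    box: Tuple[int, int, int, int],
    max_size: int = 32,
    min_size: int = 8,
    padding: int = 6,
    line_spacing: float = 1.2,
    measure: Callable[[str, int], float] = approx_width,
):
    """Return (font_size, [(line, x, y), ...]) with the text block centered in box."""
    x1, y1, x2, y2 = box
    avail_w = (x2 - x1) - 2 * padding
    avail_h = (y2 - y1) - 2 * padding
    # Fallback if nothing fits: use the smallest size and accept overflow.
    size, lines = min_size, wrap(text, min_size, avail_w, measure)
    for candidate in range(max_size, min_size - 1, -1):
        wrapped = wrap(text, candidate, avail_w, measure)
        widest = max((measure(line, candidate) for line in wrapped), default=0.0)
        if widest <= avail_w and len(wrapped) * candidate * line_spacing <= avail_h:
            size, lines = candidate, wrapped
            break
    line_h = size * line_spacing
    top = y1 + ((y2 - y1) - len(lines) * line_h) / 2  # vertical centering
    placed = []
    for i, line in enumerate(lines):
        left = x1 + ((x2 - x1) - measure(line, size)) / 2  # center each line
        placed.append((line, left, top + i * line_h))
    return size, placed
```

In a real renderer, the measure argument would be replaced by a function that asks the drawing library for the actual rendered width of the chosen font.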

Building the Web App

To make the solution accessible, we built a web app that communicates with the inference server. Users upload a manga image, and the app returns the translated version, complete with English text in the original speech bubbles.
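
The upload-and-return flow can be sketched with Python's standard library alone. This is not the project's actual server: the /translate path, the image/png response type, and the process_image callable (standing in for the whole detection, translation, and rendering pipeline) are all assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Tuple


def handle_upload(
    path: str,
    body: bytes,
    process_image: Callable[[bytes], bytes],
) -> Tuple[int, str, bytes]:
    """Pure request logic: returns (status, content_type, payload)."""
    if path != "/translate":  # hypothetical endpoint name
        return 404, "text/plain", b"unknown endpoint"
    if not body:
        return 400, "text/plain", b"empty upload"
    return 200, "image/png", process_image(body)


def make_handler(process_image: Callable[[bytes], bytes]):
    """Build a request handler class that runs process_image on each upload."""

    class TranslateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, content_type, payload = handle_upload(
                self.path, self.rfile.read(length), process_image
            )
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return TranslateHandler


def make_server(process_image: Callable[[bytes], bytes], host: str = "127.0.0.1", port: int = 8000):
    """Create (but do not start) the inference server."""
    return ThreadingHTTPServer((host, port), make_handler(process_image))
```

Calling make_server(process_image).serve_forever() would start it. A production inference server would also need upload size limits, error handling around the model call, and authentication.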

The Outcome

This project, nicknamed "Otakuverse," demonstrated the power of combining AI, computer vision, and agile MVP development. It wouldn’t have been possible without the wealth of research and resources available in the open-source community.

Why MVP Matters

For our agency, this project captures the essence of MVP building:

Iterative Development: Starting with a basic plan and refining it along the way.
Resourcefulness: Leveraging existing tools and datasets for faster results.
Focus on Usability: Ensuring the solution is functional and accessible to users.

Conclusion

If you’re looking to transform your idea into reality, our agency can help you build innovative MVPs like the AI Manga Translator. Whether it’s leveraging cutting-edge AI or crafting user-friendly apps, we bring your vision to life.