Cross-Language Mappings: A TIC Collaboration with voxANN

Funded by Innovate UK and the Greater Manchester Combined Authority (GMCA), Research IT has played a key role in the Turing Innovation Catalyst (TIC) programme. As part of the TIC collaboration, Research Software Engineers (RSEs) from RIT, together with academics from the Department of Computer Science at the University of Manchester, partnered with local AI company voxANN. The company is developing an AI-powered localisation platform for the media and entertainment industry, using machine translation and expert input to dub content for international audiences.

The Challenge

The project's core challenge was to use AI to automatically identify semantically equivalent text chunks between languages using different grammar, structure, or word order, initially focusing on English and Spanish. This is essential for accurately transferring annotations — such as tone, pace, and emphasis — from the original language to the correct corresponding segments in the target language, enabling high-quality automated dubbing and voice synthesis.

This task is difficult for several reasons:

Manual effort is impractical: Manually translating and re-annotating text is too slow, expensive, and error-prone to be a scalable solution.
AI model limitations: Large language models (LLMs) often struggle to align small, annotated text fragments, especially between languages with different grammatical structures.
Data scarcity: Specialised, human-reviewed datasets needed to build and evaluate such a pipeline are not readily available.
Semantic sensitivity: Minor changes in translation—such as reordering words—can alter the intended meaning and tone of annotations.

The software pipeline was designed to automatically map an annotated English phrase (e.g., "this realm and burn") to its corresponding Spanish sub-chunk (e.g., "este reino e incendiarlo"), as illustrated in Figure 1.

A sentence translated from Spanish to English with a section highlighted in red

Figure 1
Sub-chunk Identification in a translation

What Did We Do?

Benito Matischen and Chandima Samarakoon led the technical execution, delivering a flexible and reproducible pipeline which integrates multiple large language models (LLMs) from OpenAI, Hugging Face, and Ollama using the LangChain framework to manage prompts and model seamlessly. Key contributions included:

Versatile Model Pipeline: A system using LangChain to seamlessly switch between different AI models (including OpenAI's GPT-4o, Hugging Face, and local Ollama models), optimised with few-shot learning and GPU acceleration on the UoM’s Computational Shared Facility (CSF).
Automated Quality Checks: An automated process uses a pre-trained SBERT model to assess the semantic similarity between the original English chunk and its Spanish translation.
"Round-Trip" Retranslation: For a deeper quality check, the pipeline translates the Spanish chunk back into English and compares it with the original to measure any alteration in the meaning, as depicted in Figure 2.
A "Traffic Light" System: Based on the similarity scores, each translation is categorised as Green (a good match), Yellow (a reasonable match), or Red (needs manual review), ensuring only high-quality alignments are accepted automatically. See Figure 3 for an example.

A short string of Spanish words are translated into English and then back into Spanish and compared

Figure 2
Chunk Similarity Evaluation

A table demonstrating the semantic and lexical similarity of 2 pieces of text

Figure 3
Lexical vs Semantic Similarity in Chunk Pairs

The Impact and Outcomes

The project delivered a functional prototype that successfully aligns annotated English and Spanish text. The primary outcomes include:

Reduced manual effort: The system streamlines multilingual workflows, significantly reducing the need for manual annotation.
A scalable approach: We demonstrated a repeatable alignment method that can be expanded to other language pairs.
Reusable assets: The project generated a valuable codebase, datasets, and evaluation tools to support future enhancements.
Deeper AI insights: We gained a better understanding of model limitations, guiding the development of more trustworthy AI.
Stronger collaboration: The project enhanced the shared capability of our researchers and RSEs by blending linguistic reasoning with technical execution.

Reflections and Lessons Learned

Fundamentally, the project's success was rooted in the close, continuous collaboration between our RSE team, voxANN, and the Department of Computer Science.

Key learnings from the project highlight the factors that most influenced its success. Regular check-ins and a co-design approach proved fundamental, showing that collaboration drives better outcomes. Re-translation was an effective way to validate chunk alignment, but it also revealed the limitations of current large language models and evaluation methods, making self-validation a double-edged sword. Carefully crafted, few-shot prompts significantly boosted model performance, reinforcing the importance of prompt engineering. The use of LangChain enabled rapid experimentation and development, demonstrating how the right tools can maximise flexibility. Finally, sourcing high-quality Spanish data emerged as a major challenge, emphasising the need to secure robust multilingual capacity for future projects.

Working on a real-world task was an excellent opportunity to test the boundaries of different LLMs, compare different models including open-source and commercial, and develop efficient evaluation strategies and data that can provide confidence in the pipeline’s output. He also praised the collaborative and agile delivery of the pipeline by the RSE team. The project demonstrated a model for efficient and successful partnership between research and RSE teams, that we should utilize in future AI-related projects with industry partners.

Prof. Goran Nenadic, Dept. Computer Science, University of Manchester and Project Academic Lead

Our collaboration with TIC came at a formative time for both organisations: TIC, a new government-backed initiative positioning the UK as a world leader in AI, was establishing itself as a hub for academic–industry partnerships, while voxANN was evolving rapidly, focused on building AI-powered tools that enhance human workflows and expand localisation capacity.
The project explored cross-language mapping of annotations – a future-facing R&D area for voxANN – and, while distinct from Align, our first product to market, it shared enough technological overlap to yield immediate platform-wide benefits.
From the outset, the project was clearly scoped, well-communicated, and underpinned by a strong sense of shared purpose. Communication was open, tools transparent, and regular check-ins ensured issues were surfaced early and collaboratively resolved. The final deliverables stood out for their clarity and usability, and crucially, voxANN retained ownership of the IP and proof-of-concept – a strategically valuable outcome we’re already building on. We’re grateful to TIC and The University of Manchester team for their collaborative spirit and technical rigour; it was a rewarding experience that helped bring the future of localisation a step closer.

Robin Tong, Technical Co-Founder, voxANN

Looking Ahead

Future work will focus on extending the pipeline to support multiple languages and continuously benchmarking its performance against new LLMs. Further gains could be achieved through deeper optimisation, such as fine-tuning models or developing more advanced custom prompts to evolve the prototype into an even more robust and versatile localisation solution.

If you’re interested in working with the RSE department on an AI-related project (large or small), get in touch with us via Connect and we’ll organise a meeting to discuss how we can work together. If you’re interested in getting involved in TIC’s CR&D projects, drop them an email for more information.