Adventures in Localization

19 Dec 2023

Garden Witch Life is the first title I worked on that got localized into multiple languages. I was solely responsible for providing the to-be-translated text to the publisher and integrating the translations that came back into the game. Over 40,000 words were translated into 12 languages. I learned a lot during this process, and in this article I have tried to write down the things I wish I had known before going into it.

Thanks

Before getting started I want to give thanks and credit to the following people:

I also want to make it clear that the problems documented in this article stem solely from my inexperience at the time. I don’t want to put blame on anyone else. Everyone involved in the process acted to the best of their knowledge and had nothing but understanding for any technical difficulties.

Other Resources

This article is not meant as a full tutorial on how to localize; there are already plenty of resources out there. Instead, I’m trying to fill in the blanks I still had after reading those resources.

Expectations vs. Reality

For the longest time I was aware that Unreal had support for localization and assumed it was mostly automated. As long as I used FText variables (or simply Text in BPs) I could go into the Localization Editor, gather up all the words and translate them or send them to translators.

However, when the time came to prepare translations to send to translators everything fell apart and I had to work long hours to fix it in time to meet deadlines.

Unreal automatically works with .po files, which from my initial research seemed to be a common interchange format for translations. The file is essentially a long list of entries like this one:

#. Key:    C9AD374C498E67CC9B9F9F98E41D87A5
#. SourceLocation:
#. /Game/MagicItemShop/Blueprints/Items/BP_Potion.Default__BP_Potion_C.FriendlyName
#: /Game/MagicItemShop/Blueprints/Items/BP_Potion.Default__BP_Potion_C.FriendlyName
msgctxt ",C9AD374C498E67CC9B9F9F98E41D87A5"
msgid "Potion"
msgstr "Zaubertrank"

The really important parts are:

  • msgctxt: This is an identifier in the form “Namespace,Key” for a given piece of text
  • msgid: The original text
  • msgstr: The translated text

The other lines are comments that you can see in the Localization Editor in Unreal (and supposedly in other software that uses this format). From my understanding translators would be able to work with this info to sort through the text. This was right at the start of development in 2019.
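
To make the “Namespace,Key” form more concrete, here is a minimal sketch in Python (standard library only) that splits a msgctxt value into its two parts. The function name is hypothetical, and it assumes the first comma always separates namespace and key, which holds for the entries shown in this article.

```python
from typing import Tuple


def split_msgctxt(msgctxt: str) -> Tuple[str, str]:
    """Split a msgctxt of the form "Namespace,Key" into (namespace, key)."""
    # Assumption: the first comma is the separator. An entry without a
    # namespace starts with the comma, so its namespace is an empty string.
    namespace, _, key = msgctxt.partition(",")
    return namespace, key


# The auto-generated entry from above has no namespace, only a key
print(split_msgctxt(",C9AD374C498E67CC9B9F9F98E41D87A5"))
# A hand-assigned entry carries both parts
print(split_msgctxt("UI,CommunityBox_Report"))
```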

When it came time to initially discuss translations with our publisher in 2021 I learned that the way data would be passed around was going to be via an Excel Sheet. So I went back to Google and found Denis Maltsev’s Translation Editor for UE4. I did a few tests with it and it worked very well. It provides a way to handle Excel files and the import / export process worked like a charm.

Edit: After reading this section our producer remarked that we actually could have used po files. Neither of us could remember who mandated the Excel Sheet back then.


Development continued until 2023 when the milestone for translations was next. At this point I was given a pre-formatted Excel sheet by our publisher. And it was not compatible with the Translation Editor. So I took the Excel export from the editor and tried to manually copy over the lines into their respective columns. It was at this point I realized that only the text keys remain to identify and sort the text.

All of them look something like this: “C9AD374C498E67CC9B9F9F98E41D87A5”.

My mistake was assuming that the auto generated keys would have some sort of grouping or order to them. In hindsight this is something I could have caught early, but there were too many other things requiring my attention before translations became the top priority.

Now I had to go through all of the text variables in our game and assign proper keys to over 6000 entries strewn all over the project. Overall this process took weeks.
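
In hindsight, a small check like the following could have told me early how many entries were still running on auto-generated keys. This is only a sketch: the function names are made up, and it assumes that auto-generated keys are always 32 uppercase hexadecimal characters like the example above.

```python
import re

# Assumption: auto-generated keys look like C9AD374C498E67CC9B9F9F98E41D87A5,
# i.e. exactly 32 uppercase hexadecimal characters.
AUTO_KEY_PATTERN = re.compile(r"[0-9A-F]{32}")


def is_auto_generated_key(key):
    """Return True if the key matches the assumed auto-generated pattern."""
    return AUTO_KEY_PATTERN.fullmatch(key) is not None


def find_unnamed_entries(msgctxts):
    """Return all "Namespace,Key" values whose key still looks auto-generated."""
    unnamed = []
    for msgctxt in msgctxts:
        # The key is everything after the first comma
        _, _, key = msgctxt.partition(",")
        if is_auto_generated_key(key):
            unnamed.append(msgctxt)
    return unnamed


contexts = [",C9AD374C498E67CC9B9F9F98E41D87A5", "UI,Container_Chest"]
print(find_unnamed_entries(contexts))
```

Fed with the msgctxt values of a po file, this gives a to-do list of texts that still need a proper key.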

Tip: Do a translation test run early. It might seem wasteful to translate portions of your game that are still subject to change, but making sure the pipeline works as intended can save you a lot of time.


Generating Text Keys

A lot of assets in the game are id based. Items, Mails, Quests utilize Primary Asset Ids. That means these objects already have a unique (Type, Name) pair that they are addressed by. I wrote a small piece of C++ code that updates a Text namespace and key to match given values and I made these values dependent on the asset id.

This is the snippet inside a Blueprint Function Library that changes the keys. Technically only FText::ChangeKey is required. However I update these values whenever an asset is saved and this can happen while cooking content. To avoid warnings I made sure to only update the data if needed.

MisUtilityFunctionLibrary.h
/**
 * @brief Changes the namespace and key of a given Text property, but only if they are different from the given ones.
 * @param InOutText Text property to update namespace and key for
 * @param NewKey Key to set for text property
 * @param Namespace Namespace to set for text property
 */
UFUNCTION(BlueprintCallable, Category="FreetimeStudio|Editor")
static void UpdateTextKeyIfDifferent(UPARAM(Ref) FText& InOutText, const FString& NewKey, const FString& Namespace);


MisUtilityFunctionLibrary.cpp

void UMisUtilityFunctionLibrary::UpdateTextKeyIfDifferent(FText& InOutText, const FString& NewKey,
								const FString& Namespace)
{
#if WITH_EDITORONLY_DATA
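	// Read the current key and namespace; either one may be unset
	const TOptional<FString> ExistingKey = FTextInspector::GetKey(InOutText);
	const TOptional<FString> ExistingNamespace = FTextInspector::GetNamespace(InOutText);
	// Only change the text if something actually differs, to avoid warnings while cooking
	if(!ExistingKey.IsSet() || ExistingKey.GetValue() != NewKey || ExistingNamespace != Namespace)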
	{
		InOutText = FText::ChangeKey(Namespace, NewKey, InOutText);
	}
#endif
}

Updating the text happens when an asset is saved. I’m not guaranteeing that this is the best place to do this, but from my tests this is where it works:

  • Actors: PreSave
  • Data Assets: PreSaveRoot

Alternatively the function can also be called during Construction Scripts or other initializers.

void UMisMailMessageDataAsset::PreSaveRoot(FObjectPreSaveRootContext ObjectSaveContext)
{
	Super::PreSaveRoot(ObjectSaveContext);

#if WITH_EDITORONLY_DATA
	//Get Asset Id
	const FPrimaryAssetId AssetId = GetPrimaryAssetId();
	
	//Convert Asset Id Type,Name to "Type_Name" prefix for key
	const FString TextKeyPrefix = AssetId.PrimaryAssetType.ToString() + FString("_") 
							+ AssetId.PrimaryAssetName.ToString();

	//Change Text data to (Mail, Type_Name_Title)
	const FString TitleKey = TextKeyPrefix + FString("_Title");
	UMisUtilityFunctionLibrary::UpdateTextKeyIfDifferent(Title, TitleKey, "Mail");

	//Change Text data to (Mail, Type_Name_Message)
	const FString MessageKey = TextKeyPrefix + FString("_Message");
	UMisUtilityFunctionLibrary::UpdateTextKeyIfDifferent(Message, MessageKey, "Mail");
#endif
	
}

With this in place key data gets updated automatically and requires a simple “Resave All” for existing assets.

Changed Text Key and Namespace

There is one drawback to this approach: because the keys are derived from the asset id, renaming an asset also changes its text keys. If translations are already done, they no longer match the new keys and the assigned translations break.
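
One way to at least notice such breakage is to compare the ids that already have a translation with the ids of the latest text gather. The following is a rough sketch, not something from my actual pipeline; the function name and the sample ids are hypothetical.

```python
def find_broken_translations(translated_contexts, gathered_contexts):
    """Return ids that have a translation but no longer exist in the gathered text."""
    # Anything that was translated under an id the gather no longer produces
    # has most likely been renamed or removed.
    return sorted(set(translated_contexts) - set(gathered_contexts))


# Hypothetical example: a mail asset was renamed from "Welcome" to "Greeting"
translated = ["Mail,Mail_Welcome_Title", "UI,Container_Chest"]
gathered = ["Mail,Mail_Greeting_Title", "UI,Container_Chest"]
print(find_broken_translations(translated, gathered))
```

Every id in the result needs its translation moved over to the new key by hand.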


String Tables

Purely from a data organization viewpoint String Tables are the way to go for every text. However in practice there were multiple people involved in the creation of items, mails, quests and dialogue and I decided against trying to consolidate everything into one or more tables. Instead I went with the automatically generated key approach for those assets.

Another area that needed a lot of text was UI though. Here I was the only one adding text and a lot of elements share the same texts. The setup was a bit more involved because each text value needs to be assigned manually, but the benefit is that translators don’t have to translate the exact same text multiple times.

Screenshot of a String Table editor in Unreal

It is also possible to load string tables directly in C++ using csv files. However no matter what I did I was not able to get the text gather to pick up on those files. I ended up importing all csv entries into a string table asset.


Automating Import

After realizing that I would not be able to use the Translation Editor to import translations I wrote a Python script to take care of this step. It was quite a bit of trial and error, but thanks to OpenPyXL for reading xlsx and Babel being able to handle po files, I managed to automate the process.

The exact Excel sheet is under wraps, but the format was roughly like this: A text id per row with languages in columns. The file also had multiple work sheets for different categories.

Key                     ...  en        de         ...
UI,CommunityBox_Report  ...  Earnings  Ertrag     ...
UI,CommunityBox_Sell    ...  Sell      Verkaufen  ...
UI,Container_Chest      ...  Chest     Kiste      ...

The goal was to make the corresponding entries in the po files look like this:

#. Key:    CommunityBox_Report
#. SourceLocation:
#. /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
#: /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
msgctxt "UI,CommunityBox_Report"
msgid "Earnings"
msgstr "Ertrag"

#. Key:    CommunityBox_Sell
#. SourceLocation:
#. /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
#: /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
msgctxt "UI,CommunityBox_Sell"
msgid "Sell"
msgstr "Verkaufen"

#. Key:    Container_Chest
#. SourceLocation:
#. /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
#: /Game/MagicItemShop/Localization/StringTables/ST_UserInterface.ST_UserInterface
msgctxt "UI,Container_Chest"
msgid "Chest"
msgstr "Kiste"

The idea of the import script is straightforward. Read a row in the sheet, find the corresponding key in the different language files and assign the value found in that language’s cell.

#!/usr/bin/env python3
# Extra modules required for this script
# pip install openpyxl
# pip install babel

from io import BytesIO

from babel.messages import pofile
from openpyxl import load_workbook

# Set localization sheet path:
localization_sheet_path = "some/where/in/the/cloud/Localization/"
localization_sheets = [
    "Garden Witch Life - Main Localization.xlsx", 
    "Garden Witch Life - Achievements Localization.xlsx",
    "Garden Witch Life - Additional Localization 1.xlsx", 
    "Garden Witch Life - Additional Localization 2.xlsx"
]

# Path to the localization target folder inside the Unreal project's Content folder
unreal_loc_path = "some/where/local/UnrealProject/Content/Localization/Game"

# Map all Unreal cultures to columns in the translation sheet
culture_map = [
    ["en", "F"], # E before proofreading, F after
    ["de", "G"], 
    ["..", ".."] # Additional language mappings
]

# Sheet includes header info that is not part of the translation data
first_value_row = 4

# Worksheet Ids, name of each sheet
worksheet_id_UI = "Menu"
worksheet_id_Characters = "Characters"
worksheet_id_Quests = "Quests"
worksheet_id_Dialogue = "Dialogue"
worksheet_id_Items = "Objects"

# Put them all in a list for iteration
worksheet_ids = [
    worksheet_id_UI, 
    worksheet_id_Characters, 
    worksheet_id_Quests, 
    worksheet_id_Dialogue, 
    worksheet_id_Items
]

# All texts in Unreal are identified by a combination of "Namespace,Key"
# Map all Unreal namespaces to sheet ids, e.g. all text with the namespace UI 
# will be found in the work sheet worksheet_id_UI
namespace_map = {
    "UI" : worksheet_id_UI,
    "Character" : worksheet_id_Characters,
    "Creature" : worksheet_id_Characters,
    "Quest" : worksheet_id_Quests,
    "Mail" : worksheet_id_Quests,
    "Dialogue" : worksheet_id_Dialogue,
    "Item" : worksheet_id_Items,
}


# This is where all the localization data from the sheets will be stored,
# in the form { "en": { "id1": "text", "id2": "text2" }, "de": { ... } }
loc_database = {}

# Populate the table with all the cultures
for culture in culture_map:
    loc_database[culture[0]] = {}

# First step is to iterate over all localization excel files 
# and gather all the localization data
for localization_sheet in localization_sheets:
    workbook_path = localization_sheet_path+localization_sheet
    
    print("Loading "+workbook_path)
    
    workbook = load_workbook(filename=workbook_path)

    # Iterate through all worksheets
    for worksheet_id in worksheet_ids:
	
        # Some of the sheets only contain partial data, 
        # so skip reading if one is missing
        if worksheet_id not in workbook:
            print("Skipping missing sheet Id "+worksheet_id)
            continue
        
        # Get the sheet and read all rows
        sheet = workbook[worksheet_id]
        
        # Start reading data after header row
        row = first_value_row
        more_data = True
        
        while more_data:
            # Check if the id cell has a value, after a certain point 
            # there are no more values in the table
            row_str = str(row)
            key = sheet["A"+row_str].value
            if key is None:
                more_data = False
                break

            # Iterate through all cultures in map and read the cell values into the database
            for culture in culture_map:
                translation = sheet[culture[1]+row_str].value
                if translation is not None:
                    loc_database[culture[0]][key] = translation
                else:
                    # Empty cell
                    loc_database[culture[0]][key] = ""
            
            row += 1

# Next step is to write the read data into the po files
for culture in culture_map:
    culture_id = str(culture[0])
    po_path = unreal_loc_path+"/"+culture_id+"/Game.po"

    # Open the file, make sure to use UTF-8 encoding, 
    # otherwise some languages will error out
    with open(po_path, 'r', encoding='utf-8') as fh:
        catalog = pofile.read_po(fh)
        
        # Each entry in the po file is a message, and the key is called 'context'
        for message in catalog:
            if message.context in loc_database[culture_id]:
                translation = loc_database[culture_id][message.context]
                if translation:
                    message.string = str(translation)
                else:
                    message.string = ""
            
        
        # Write po file into bytes first to avoid any encoding problems
        buf = BytesIO()
        pofile.write_po(buf, catalog)

        # Write buffer into file, overwriting po with new data
        print("Writing file "+ culture_id)
        with open(unreal_loc_path+"/"+culture_id+"/Game.po", 'wb') as out:
            out.write(buf.getbuffer())
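
The script above silently skips anything that does not line up between sheet and po file. A small report like the one sketched below can make those gaps visible. It is not part of the original script; the function name and the sample data are hypothetical, and the input is assumed to have the shape of loc_database["de"] plus the list of msgctxt values of the matching po file.

```python
def report_unmatched(sheet_translations, po_contexts):
    """Compare the ids read from the sheet with the msgctxt values of a po file."""
    sheet_keys = set(sheet_translations)
    po_keys = set(po_contexts)
    return {
        # Ids in the sheet that the text gather does not know (renamed or removed text)
        "only_in_sheet": sorted(sheet_keys - po_keys),
        # Gathered text that never made it into the sheet
        "only_in_po": sorted(po_keys - sheet_keys),
        # Ids present on both sides whose translation cell was empty
        "empty": sorted(key for key in sheet_keys & po_keys if not sheet_translations[key]),
    }


# Hypothetical sample data
german = {"UI,CommunityBox_Sell": "Verkaufen", "UI,Container_Chest": "", "UI,Old_Entry": "Alt"}
contexts = ["UI,CommunityBox_Sell", "UI,Container_Chest", "UI,CommunityBox_Report"]

for category, ids in report_unmatched(german, contexts).items():
    print(category, ids)
```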
            

Fallback Fonts

When defining a font for text Unreal has the option to define fallbacks for characters that are not included in the primary font. Honestly I don’t have much to say about this, it mostly worked as intended.

I stumbled over one curious case though: From what I understand Japanese and Chinese share some charsets. However when configuring the character ranges I either had the case where Chinese was missing symbols because the font would switch to the Japanese fallback or Japanese was missing symbols because it was using the Chinese fallback.

Constraining the Japanese font to the ‘ja’ culture fixed it. This is the feature working as intended, I was just not aware of its existence until I hovered over the text box.

Screenshot showing a Fallback Font in Unreal. The culture override ar is highlighted

Tip: When in doubt, Google Noto has all the glyphs.


Pitfalls

In this section I’ll list any other smaller issues that came up during production.

Other platforms, other terminology

If you plan to release your game on consoles you might need to adhere to much stricter guidelines than if you release just on PC. Simple things like “Press Any Key” can get flagged as improper terminology on a console, because it should be “Button” instead. Sadly specific examples are under NDA, so make sure you read up on the available documentation.

In such cases I added a TMap<FString,FText> PlatformOverrides property. It allowed me to add a different text for a specific platform that can be queried with GetPlatformName.


The Localization System picks up on Niagara assets

Make sure to not include folders with Niagara Particle System assets. Otherwise there will be some system text in the gathered localization.


Languages other than English care about gender a lot more

English is very relaxed when it comes to gendering, almost everything is neutral or can be expressed in a neutral way. However it is very important to let translators know which gender characters have in text referencing them. A lot of things can change based on that and translators need to be able to know who is talking, who they are talking to and what about. If your player can pick their character’s gender make sure to pass along a variable translators can use.

Let them know how to use variables by showing them the documentation or by providing your own notes tailored to the specific use cases. There is a great article on how to pass player gender to text.

For Garden Witch Life’s dialogue I provided the following list of variables. Leaving the player gender as simply g was done to make the translators’ lives easier, since a lot of dialogue required its use.

{GetPlayerName} The name the player chose during character creation. They are free to type their own.
{GetSpeakerName} Name of the person speaking the current dialogue line. Might be an NPC, might be the player.
{g} Player’s chosen gender / pronoun.
{GetSpeakerGender} Gender of the person speaking the current dialogue line. Might be an NPC, might be the player.
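
Since translators type these variables by hand, typos are easy to make. Below is a minimal sketch of a check that flags any variable in a translated line that is not on the list above. The function name and the sample lines are hypothetical, and the regular expression assumes that variables are always written as a name in curly braces; anything following the closing brace (such as Unreal's argument modifiers) is ignored.

```python
import re

# The variables handed to the translators, taken from the list above
ALLOWED_VARIABLES = {"GetPlayerName", "GetSpeakerName", "g", "GetSpeakerGender"}

# Assumption: a variable is any name enclosed in a single pair of curly braces
VARIABLE_PATTERN = re.compile(r"\{([^{}]+)\}")


def unknown_variables(text):
    """Return all variable names used in the text that are not on the allowed list."""
    used = set(VARIABLE_PATTERN.findall(text))
    return sorted(used - ALLOWED_VARIABLES)


# A typo in the variable name ("Playername" instead of "PlayerName") gets flagged
print(unknown_variables("Hallo {GetPlayername}!"))
# A line that only uses known variables passes
print(unknown_variables("{g}|gender(Lieber,Liebe) {GetPlayerName}"))
```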

Be prepared to answer questions

The translators will most likely not know what a piece of text is referring to, and they will ask about it. Probably 90% of the questions we got were about what something looked like. I ended up writing an editor utility script to export all the items into a long table that contained their icon, id and display name.

A table showing some cinnamon themed tables and cinnamon powder with their respective ids and names


Be wary of missing text

At the start I wanted to do everything very cleanly. I added each folder with Blueprints and Data Assets in it individually so the Text Gather would pick them up. Later I realized that this makes it much easier to miss something: an asset could be moved, or a new folder created, without the path being added.

Having a clean folder structure is a good goal, however in reality it’s better to be on the safe side. I added the entire Content folder to the Include Path Wildcards and only selectively removed folders using Exclude Path Wildcards when something turned up in the text export that was not supposed to be there.

If you add plugin folders through the localization folder it will do so without complaint, but the text gather will silently fail and not include assets in plugins.

Tip: You can use -leet as a launch parameter to leetify any text that is part of the gathered text.


In my case however there was so much text that only shows up in very specific states of the game I would never get to test it all in a reasonable time frame.


Same keys = Same text

Not so much a problem, but it’s good to be aware that Unreal will consider two pieces of text the same if they have the same namespace and key. This can be useful when copy pasting text to different places in the game. However in such a case it is probably better to use a String table entry instead.
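
To spot places where this shared identity might be accidental, the entries can be grouped by namespace and key and checked for differing source texts. The following is only a sketch with made-up names and sample data; it assumes the entries are available as (namespace, key, source text) tuples, for example read from a po file.

```python
from collections import defaultdict


def find_conflicting_entries(entries):
    """Return every (namespace, key) that is used with more than one source text."""
    texts_by_identity = defaultdict(set)
    for namespace, key, source_text in entries:
        texts_by_identity[(namespace, key)].add(source_text)
    # Keep only the identities that map to several different texts
    return {
        identity: sorted(texts)
        for identity, texts in texts_by_identity.items()
        if len(texts) > 1
    }


# Hypothetical sample: "Open" was copy pasted and then edited in one place
entries = [
    ("UI", "Sell", "Sell"),
    ("UI", "Sell", "Sell"),
    ("UI", "Open", "Open"),
    ("UI", "Open", "Open chest"),
]
print(find_conflicting_entries(entries))
```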


You may need to update localization over and over

The ideal path for translation was

  1. Export English texts
  2. Get proof reading done
  3. Get translations done
  4. Import to Unreal

The actual path was more like:

  1. Export English Texts
  2. Send to be proof read
    • 2.1 Answer Questions for proof readers
    • 2.2 Get proof read results
  3. Send to translators
    • 3.1 Answer questions from translators
  4. Notice there was a miscommunication whether or not it should be American or British English
    • 4.1 Send back to proof reading
    • 4.2 Receive updated results
  5. Get first part of translations
    • 5.1 Send update to translators
    • 5.2 Answer questions from translators
  6. Import first batch of translations
  7. Realize some pieces of text are missing
  8. Export new pieces in separate sheet
  9. Get new sheet proofread
  10. Send new sheet to translators
    • 10.1 Answer questions from translators
  11. Get feedback from LQA testers that some things are worded strangely
  12. - ???. End up sending lots of documents back and forth, have multiple corrections and now there are four sheets.

Conclusion

So there you have it. Writing it down after the fact makes it all feel very straightforward. In reality I thought I was prepared and quickly realized I wasn’t and it got very chaotic. But that’s game dev so what else is new? :D

I hope that my experiences and solutions can help someone else to avoid these issues.