A New Way of Playing Japanese and English Games Side-By-Side

31 Comments
NOTE: This article is quite old, and I’ve developed new and better software in the years since. My newer software handles everything listed here, but much better and with more functionality – see here!

For the past two months, I’ve been streaming a variety of games every weekday to gather screenshots, info, and more for all the things I write about. It’s hard to analyze a game’s localization without playing the original version and its localized version, though, and playing the same game twice takes twice as much time and work. So, to save time, I developed some “stream magic” that displays a game’s Japanese text while I play the English version on stream.

Stream Magic Examples

Here’s an example from our Breath of Fire II streams:

I didn't have high hopes when I started this game, but by the end my opinion had changed 100%

And an example from our A Link to the Past streams:

I was too lazy to have the script displayer implement the player's name too, although that'd be easy enough

In all, it’s super-useful for comparing lines of text side-by-side, in context, and live for others to discuss!

How It Works

I regularly get questions on how this all works, but it’s pretty complicated to explain quickly. I’ll cover the details in a bit, but the simple answer is this: it’s a bunch of custom, experimental programming built on a shaky foundation and is barely held together by a bunch of virtual string and tape. It also has to be reprogrammed to work with each game, and that reprogramming is tough.

There’s been some interest from language students – a tool like this would be a really nice and handy learning tool. As it is right now, though, the program isn’t ready for public use. There are potential security concerns, it crashes a lot, and it could possibly ruin your OS. I’ll explain more below, but this is why I haven’t released a version for others yet.

The rest of this article is about the more technical side of things, but I’ve tried to present it in a simpler way without all the computer science jargon. If it still sounds like crazy talk, though, it’s because it is crazy! I fully admit that I’m a weak programmer and that my design decisions will make most programmers laugh and/or cry, but I also hope that it might get the creative juices flowing for other programmers out there.

The Technical Concept

Let’s take a look at how the Breath of Fire II text displayer came about. The core idea here is actually pretty simple:

  1. When an emulator is running a game, detect when a line of text is being displayed on the screen
  2. Identify the current line of text being displayed
  3. Take that identifier, load up the original, untranslated game, and then locate the equivalent line in the game’s data
  4. Extract the untranslated line’s data, convert whatever needs to be converted, then display it on screen
This could be applied to non-emulator games too, of course, but for now I’m only going to focus on console emulators.

In all, it’s not a complicated process – Steps #1, #3, and #4 are pretty much straight out of ROM Hacking 101. Step #2 is the trouble spot, though – we need a way to identify the line and then communicate that info to our text displayer program. Ideally, whenever an English line is being displayed, its ID number would be sitting in RAM somewhere. Then we’d just need to find it and check it when needed. To accomplish this, we need a way to read the game’s RAM and CPU status from outside the game.

Reading Memory

There are a couple options for reading an emulator’s info live. First, I looked into Super NES emulators that include LUA scripting capabilities, as that would make things infinitely easier and more accurate. I checked all kinds of emulators, including different versions of the same emulator, but was surprised by how poor the documentation was and/or how poor the support was. I even looked at non-English emulators I’d never heard of before. Sadly, this turned out to be a dead-end for me, at least for Super NES stuff. NES support looks much more promising, though.

Another option I investigated was taking an open source emulator and modifying it as I needed. This would’ve been great, but I’ve never, ever had a good experience with this sort of thing. This would’ve been a nice solution too, but in the end I dropped the idea. I did use some of what I learned for later work, however.

Eventually I decided to go with a far less accurate solution, based on things I made a while back for my old personal streams. Some examples:

Ninja Gaiden Data Overlay

I once made some stream overlay stuff that read an NES emulator’s memory and used it to create an automatic death counter, game play timer, and death taunter. I also added some bells and whistles, like showing what items are in each item container:

I know that LUA emulators and other programs can do this stuff and much much more, which is why I'm gonna look into those more when I do NES stuff. I do like the freedom and flexibility my custom system allows for, though

Final Fantasy VI T-Edition Data Overlay

I once created a stream overlay that showed enemy stats. Since I was playing a Japanese ROM hack, I also had the overlay display simple terms in English, like spell and enemy names.

Seriously, this was one of the best games and hacks I've ever played, it's gonna be hard for me to go back to vanilla FF6

These are very simple things, though; I never intended for my old program to handle more complex data. But I decided to see how far I could take it.

It sounds crazy, but it’s possible to have a program that can read other programs’ RAM. My old stream programs used this to hook into an emulator’s RAM, and then did some magic to locate the simulated SNES’ RAM within the emulator’s RAM. For the curious programmers out there, look into the “ReadProcessMemory” function – I’d never heard of it until just a year or two ago and thought it wasn’t even possible!

Messing with other programs’ memory is obviously a security and stability concern, which is one reason why I’m hesitant to release these tools just yet.

Identifying a Line

The key to everything was finding the current line’s unique ID. Ideally, the game might have a line ID number stored in RAM somewhere. Unfortunately, after a lot of reverse engineering, I learned that Breath of Fire II’s text system isn’t so generous. As a result, I spent a day or so poking at the game’s code to get as close to a line ID as I could. I wasn’t familiar with SNES programming code, but my time with MOTHER 3’s translation helped a lot. I also needed to learn the English ROM’s layout, particularly the script’s location. And then I needed to do all of this over again, but with the Japanese game.

There’s a major flaw with my approach, though: the game’s RAM changes thousands of times a second, but my program can only check it a few times a second. That means things in RAM can change between checks, which makes things even tougher and more unreliable. If you’ve ever heard of ROM hacking before, this is like that, but imagine that the ROM is changing thousands of times a second. I call it “RAM hacking”.

I didn't mention it in the main text, but the SNES ASM hacking/tracing tools are such a weird mess right now - I look forward to whatever that fork of bsnes can conjure up!

Eventually, I managed to locate a “quasi ID” number that I could depend on some of the time, but not all of the time. Then I had to figure out what to do when I couldn’t trust the number. And then I had to repeat that a few more times.

Many days were spent staring at stuff like this to learn the game's programming and RAM usage

After enough of this, I could tell which line of text was being printed most of the time. And sometimes, even when I would get an ID number I could rely on, the data would change before I could use it. My quick and dirty fix to this was to have the program “refresh” everything and try it again when a certain button was pressed. It’s basically the equivalent of kicking the computer to make it work right when it messes up.

This is the key moment where the text begins to get loaded, but the 20 pages before it is full of tough nonsense I had to figure out. It felt like trying to decipher old hieroglyphs from scratch

In all, this was a very messy process and solution for Breath of Fire II. Luckily, it’s been a bit cleaner with A Link to the Past.

Text Conversion

With the line ID in hand, the next step is to load the original, untranslated game and locate the equivalent line in it. This is standard ROM hacking, so it wasn’t too tough with Breath of Fire II. Once the line is located in the game, it’s time to read that data.

Older games rarely used a standard text format, so I needed to convert all of the Japanese characters into a modern, standard format that computers can use. This is also a standard part of ROM hacking, so I looked online to see if Breath of Fire II’s Japanese text format had already been documented somewhere. Luckily, I found the needed info on a Japanese site, so I plopped it into my program. But I soon learned that you can’t 100% rely on outside sources – the info on the Japanese site had many mistakes. So I then had to go back and decipher the game’s Japanese text format myself from scratch. I’ve posted my documentation here.

I feel like one of these days I should talk more in depth about retro games, their fonts, and their text encoding

There’s another hurdle in the process, though: game scripts are filled with text data, but they’re also filled with non-text information known as “control codes”. I found rough documentation online about Breath of Fire II’s Japanese control codes too, but quickly learned that it was incomplete and wrong in places, so I had to document them all from scratch as well. Basically, by the end of all this, I had re-created Breath of Fire II’s text engine in my own program.

Text Display

Now that I could locate an equivalent Japanese line, read it, and convert it, all that remained was to display it on the screen. I used to program many games in XNA/MonoGame years ago, so I was most comfortable with using that to draw things to a window. But here was another problem: I had no experience with displaying Japanese text using XNA/MonoGame. My stream overlay design only has room for a small window pane, so formatting/wrapping the text as needed was also important. I wasn’t sure what to do at this point, but then I realized I could let the stream software handle that for me.

OBS (the streaming software I use) can display text files, and it will even refresh what’s shown whenever a text file gets updated. So I decided I’d have my program write the Japanese text to a text file, and then I could have it display the text file “on top” of my program’s window pane. This saved me a lot of time and worked pretty well. Of course, this meant that the program itself doesn’t do any of the text display, which made for a weird, user-unfriendly setup.

Laziness is the mother of ingenuity

But now there was a new, unexpected problem: I personally couldn’t see the Japanese text very well when streaming – it appeared very tiny in the streaming preview window. Eventually, I realized I could use a plugin for Notepad++ that would refresh a text file whenever it got updated elsewhere, and then just have Notepad++ off to the side while playing a game. This worked okay, but the plugin only checked for text file changes every 3 seconds, which was too long for my purposes. I had to figure out how to hack the plugin to drop it from 3 seconds to 1 second or something less.

With all of these pieces in place, I could display the Japanese line of text on a stream whenever an English line appeared. It didn’t work all of the time, but I could kick it to make it work properly whenever it messed up. My Frankenstein experiment worked!

Enemy Data

A few days later, I realized it would also be nice to see the Japanese enemy names in battle somehow. Enemy name localizations can be fascinating, so I got to work. I essentially had to do most of the above steps all over again, but things were even worse this time: the enemies’ IDs never stayed in the SNES’ RAM for more than a millisecond – too short for my program to reliably read them. And once the enemies were loaded into RAM, there was no way to identify who was what. They just became a blob of stats.

It felt like I had hit a dead-end – if I couldn’t even identify the enemies in the English game, there’d be no way to find their names in the Japanese game. Then a weird solution hit me: I could “create” my own IDs for every enemy! Although the enemies’ IDs don’t stay in RAM, the enemies’ stats do. So I could glue a lot of these numbers together to create a long string of numbers, sort of like “MaxHP-MaxMP-Exp-Defense-Offense”, but with even more stats. Under ideal conditions, this would give every enemy a unique identifier… assuming no enemies shared the exact same stats.

I whipped up a quick tool to read an enemy’s stats from the Japanese ROM, write its Japanese name to a text file, and then name the text file whatever that enemy’s “stat ID” was. This resulted in several hundred individual text files that I put in a dedicated folder.

In fancier terms this ID creation thing is a form of 'hashing'

I then updated my stream overlay program to detect which enemy is currently selected during battle. It would read that enemy’s stats from RAM, combine them all into the crazy “stat ID”, and then try to load a text file by that name. If it succeeded, it would then load the Japanese name stored inside the text file. If it didn’t succeed, then there was no other way to figure out the Japanese name, so the program would simply state that the enemy “doesn’t exist” in the files.

Ego Orb is that you

As always, a new problem appeared: I had already hacked the English game to increase the money and experience that enemies dropped. I had done this to get through the game more quickly, since my main goal was to get screenshots and info as efficiently as possible. I had to take this into account and redump all the enemy files.

After this, everything worked pretty well! This solution for giving enemies a unique ID had several risks though:

  • It assumed that absolutely no enemy stats were changed during localization, which is a big assumption to make
  • It also assumed that the stats in RAM were static and reliable – but that’s a big assumption to make when it’s common for enemies to have multiple forms and have stat buffs/debuffs at unexpected times
  • If any enemies happened to share the exact same stats, then they’d end up sharing the same text file… and thus at least one of them would be misidentified

Luckily, things worked out ~95% of the time:

I didn't expect this game's enemy name translations to be as bad as they were - I wish I had implemented this sooner to see them all in action!

The remaining issues involved a bunch of enemies that I later realized went unused in the game. I also learned how to display Japanese text with XNA/MonoGame by this point, and since the short, single-line names didn’t require much text formatting, I used that knowledge here.

Bring It All Together

All of these programs and files are strewn about, so to make it all look seamless, I loaded up the streaming software and aligned everything just right. It’s a bit difficult to explain in words, so here it is in action:

Not seen: a few other programs and files needed to make it all look like it's a simple, single program

Everything that’s magenta gets turned transparent, kind of like a green screen. This means I could also display stuff right on top of the game if I wanted, although I haven’t used that functionality for Legends of Localization streams yet.

As mentioned, I also had to fiddle with text file stuff to display Breath of Fire II’s Japanese script. By the time I moved on to A Link to the Past, I knew how to directly display Japanese text with XNA/MonoGame, which has simplified things and made it look nicer.

SPOILER: GANON IS THE BAD GUY ALL ALONG

Conclusion

This was a general look at how my on-stream Japanese text displayer works, and the basic process for developing it. The setup is still very messy, complicated, dangerous, and unreliable, and my implementation is scream-worthy, but I’m hoping that as I continue to polish it I’ll eventually be able to simplify the process and maybe even release some of it for others to enjoy. It’s already proven to be an extremely handy tool for me, and I bet it’d super-helpful for studying languages, playing old games in new ways, and other things I haven’t even thought of yet.

Also, I didn’t realize it until now, but this is a really good chronicle of how designing a program is often a bunch of problem-solving, which gives rise to new problems, which require more solutions, which brings even more unexpected obstacles. Programming is a grueling back-and-forth battle of problems and solutions.

Anyway, this turned out to be pretty long, but hopefully all this technical mumbo jumbo was enlightening and interesting. If you want to check out this stuff live, you can catch our streams here. See you there!

Update: By popular demand I’ve uploaded the program for others to try. It’s very unstable and not guaranteed to work on your system, but if you’re still curious you can check it out here.
If you enjoyed this, check out Legends of Localization Book 1: The Legend of Zelda, my book dedicated to the very first Zelda translation and how it has affected every Zelda game since! (free preview PDF )
31 Comments
  1. This was fascinating to read, so thank you for sharing.

    Your hacked together enemy ID reminded me a lot of trying to make unique “keys” for stuff in Excel to use with their lookup functions. Since they can only check a single cell, you often end up combining a bunch of values together in the hopes that it will be unique enough to produce only a single match.

    1. Haha, it’s cool to hear how other programmers need to invent solutions to weird stuff like this.

  2. As a bit of a font nerd, a post about game fonts and text encoding sounds super-interesting.

    1. It’s only a little bit, but the upcoming EarthBound book has a chapter on technical stuff, including some talk about game fonts and text display. I’d like to expand on it some more someday for sure.

  3. ReadProcessMemory is a privilege checked function. There are no security issues with it. You can’t tamper with processes of other users.

    Btw, I hope that you used a build of the emulator with debugging symbols. Symbols contain all the juicy details such as where variables are in memory. Sure beats mindlessly scanning memory for values.

    1. Yeah, now that I think about it, ReadProcessMemory should probably be okay as you say, although I do use another function to write to the emulator’s memory sometimes too. Usually just for fun effect.

      Another thing that’s kept me from releasing it is that I haven’t tested it on other systems/Windows versions, and I don’t know much about how OpenProcess and GetProcessByName actually work, so I don’t know what to expect on other setups. I’m starting to feel hopeful that it’s not as bad as I imagined. There’s also the issue of whether or not I can count on the emulator’s memory to behave the same way on other setups.

      1. I could be mistaken, because I don’t know all the intricacies of it, but all those functions sound exactly like things that Cheat Engine does.

        1. Yep! That and OBS are what made me realize that it was possible to see into other process’ memory and inspired all this 😛

      2. ReadProcessMemory and WriteProcessMemory are carry-over from before Windows was an operating system, which is why it was to be used with caution way back when. There was essentially no memory protection back then; just anyone could write into kernel memory space and mess with process and memory tables or other kernel objects if they knew how. On a proper operating system there are protections in place on writing to kernel memory and taking over another process’s memory space which will be prevented either by the CPU itself or the OS kernel when setting up virtual memory, which is why those functions are considered safer now. Nowadays, by using these functions, you only need to worry about corrupting your own user data rather than the entire system.

        The biggest problem with your approach is that all of your memory addresses are specific to one particular build of the emulator. Even cleaning the project and recompiling the same code could lead to a version different enough that your scripts will crash and burn.

        What requirements do you have for a scripting engine and on which (open source) emulator would you want it built? Post some requirements and I’m sure someone will set it up for you, especially if it’s just a Lua hook whenever a certain RAM or ROM address is accessed. If it turns out to be a small project I’ll even volunteer to do it for you.

        1. Thanks for the background info, that alleviates many of my worries. Like I mentioned, I’m a pretty inexperienced programmer so this was all really just a “can it be done” experiment I threw together with pieces I barely understood.

          I’ve been thinking about your requirements question a lot. I don’t have a solid list yet but just wanted to let you know I haven’t forgotten. One thing that’d be nice would be the ability to open a separate output window where all the data and such can be drawn. I’m not 100% sure, but I think most emulators with LUA support currently limit your drawing ability to the emulator’s main window.

          Naturally, being able to call custom code whenever the emulated CPU hits a certain address is ideal for something like this – it’d be far more accurate than my current setup. Several SNES emulators already have this capability in some form, but they were either limited, broken, or too resource-intensive for what I needed. I’m considering trying the open source emulator approach again sometime… although time is hard to come by these days.

          Anyway, if I can think of anything more I’ll be sure to add them to this comment thread later on. Thanks for the insightful comment and offer!

          1. Everyone starts out a novice. It’s through this kind of experimentation that one truly comes to understand how things work.

            I figured there would be several things absolutely essential to make your script work: execution breakpoints, memory poking and peeking, layering and drawing capabilities, and true-type font rendering. It then comes down to figuring out the implementation details like how many concurrent breakpoints do you need, is it necessary to have memory access breakpoints as well (slow, but would probably make scripting simpler), at what point during the execution cycle do breakpoints get triggered, how many layers do you need for rendering, if you need to use textures, and so on.

            Setting up the rendering engine is actually the most complex part of the process. Since it’s an emulator, setting up the CPU debugger is easy because there’s no need to worry about throwing off CPU and PPU synchronization loops. Getting the rendering engine in place will likely require some modification to the emulator, either by changing the window layout to add space on the left or by having the window’s main view be larger than necessary and only rendering to a sub-rectangle, leaving a bar of empty space on the left, or by having the emulator render to a texture and drawing the window in Lua script. Determining how complex this will be basically depends on knowing what type of drawing functionality you intend to use.

  4. If you ARE interested in getting this to work with LUA scripting for stability, BizHawk is a multi-emulator with support for it. It also has a rewind function for when you might miss a piece of text due to program error, accidental mashing, or anything like that. It’s also got a hex editor and ram watch function.
    I’ve never used the LUA scripting myself, but I’ve found the other tools in it extremely valuable.

    1. Yep, Bizhawk was the first one I checked, but I sadly found that: 1. its debugging features didn’t work well (I think some things just flat out didn’t work yet), and 2. the only way to do what I wanted required me to use the accuracy core, which brought my system to a crawl. It probably would’ve been even slower if I had tried to stream stuff 😯

      This wasn’t an issue when I briefly tested its NES core, so I look forward to tinkering around with LUA when playing NES games.

      1. The only LUA bashing I’ve done was when I used to make mods for World of Warcraft way back when. Even if I remembered any of it — and I really, really don’t — it’d be no use for ROM hacking. 🙂

  5. This is really interesting, Mato. Thanks!

    I know the TAS community frequently uses tools that allow them to monitor the values of memory locations. I don’t know if that’s using patched versions of emulators or external debuggers. It’d be nice if more emulators offered debugging APIs. Being able to connect to a socket and get notified of the value of a list of (in-emulator) memory locations each frame would probably substantially reduce the amount of reprogramming you need to do for each game.

    1. Yeah, that’d be nice to have for sure, and I bet it’d allow for a nice, handy foundation for per-game plugins/scripts.

      Although, now that I think about it, some of this BoFII stuff would still encounter the same problems with a per-frame check – the only way to get it super-accurate would be with a per-cycle check. I can kind of do that with Bizhawk currently, but between the resource-intensive accuracy core and the constant checks every CPU cycle, it REALLY slows everything down. Maybe I should just bite the bullet and get a much better PC 😛

  6. You’re pretty good…

  7. If you ask me, THIS would definitely be a handy thing to have in the game localization process. Be able to compare translations in real time and all that.

  8. This is super interesting, Mato. If you ever released some version of it that managed to be more user-friendly and reliable, I’d be all over it. Hell, I’d probably pay money for a program like that.

  9. Yeah, BoF2’s translation is just overall beyond bad in so many regards. Why they never bothered retranslating it for the GBA port is beyond me.

  10. Would it be possible to release some of the source code and see if anyone else wants to take a shot at helping make it better? This looks really cool and helpful.

  11. I work as a programmer, and whenever I see stuff like this I’m always amazed and a little sad about my own skills. There’s still so many things that I don’t but wish I understood how to do…heck, I was pretty excited a couple days ago when I wrote my first web callout script that basically just went out and grabbed Pokemon stats from Serebii by parsing the pages, which isn’t really that hard to do (although, I did get temporarily blocked by Serebii because I kinda forgot to put a delay in between calls the first time I actually had it run on all the Pokemon…oops).

    It’s especially humbling when people make things like these and have stuff like ‘I’m not a great programmer…’ 😛 I really need to work on improving my abilities…if only my gaming backlog wasn’t so huge 🙁

  12. You know, I wonder if it’s possible to do the same thing with arcade games on MAME (if both Japanese and English versions are emulatable at the moment, or even if there’s an English version available at all). Arcade translations up to the late 90s (maybe the early 00s at most) are a whole different beast. Of course, even the way arcade games store text can be different between games or even arcade board systems.

    Dipswitch settings, for example, can store multiple languages/regional variations. And I think Capcom’s CPS2 board system games even store all regional variations of its games (except for Quiz Nanairo Dreams, which is only in Japanese no matter what region you hack it into) on one board even if it’s ultimately region-locked to one variant and you’d have to use cheats (or manually modify your arcade board if you’re not emulating) to get to another region (Interestingly, this is why Battle Circuit contains near-full Spanish and Portuguese translations hidden within its code accessible by cheats even if the game only stayed in Japan and Europe, and didn’t get any known official release in Mexico or Brazil).

    PuLiRuLa is one of those arcade games where the translation is just godawful, so much more so than Zero Wing’s Sega version. The game itself is a straightforward beat-em-up (with a bit of surrealism thrown in), so I bet it might be easy to compare versions (“(Japan)” and “(World)”). Incidentally, DaVince21 (one of the people on the Telefang fan translation team) and I did a comparison one day and DV21 even retranslated the dialogue (turns out, it only slightly makes more sense in Japanese), and I posted it on the Hardcore Gaming 101 forums later on. That was something indeed

  13. It’s really nice, but the text window is too narrow for all of the Japanese text to display.

    You could also have tracked down where the pointer tables for all of the text (including monster names) are between both versions, and then whenever that pointer is checked (a read breakpoint of sorts) in the English version, it would read from the Japanese version.

    1. Yeah, sadly the problem with my approach is that it can’t reliably determine when certain code is being run, meaning breakpoints aren’t an option. As I mentioned, that would’ve been infinitely helpful 🙁

  14. Having done some ROM hacking in the past, my question is how do you account for translations where the dialogue doesn’t match one-to-one? Is there some key bound to look at previous and future dialogue?

    Besides that, awesome work, and I really enjoy reading your posts! Thanks!

    1. Oh, that’s a good topic that I forgot to bring up in the main post. With Breath of Fire 2 I lucked out and all the lines matched perfectly. I wasn’t so lucky on the Zelda side of things – the English version has two fewer lines than the Japanese version (nothing important was lost, just some unneeded technical leftovers). It took me a while but after going through the scripts I found where the breaks in symmetry happened and wrote some tiny code to look at the English line ID and compensate as necessary to calculate the equivalent Japanese line ID.

  15. I think it would be interesting to see something like this but instead of being used to match up English lines with Japanese ones, to outright read from the Japanese game, maybe use this method of comparing what line you’re in (in a Japanese game) to the data of the Japanese game, so it could be used for looking up words you don’t know faster, using a tool like Rikaisama for Firefox.

    Just taking this out of existing tools like ITHVNR, which work mostly for VNs, and things like RPGMaker games where the text is like, already there, but they don’t (or rarely) parse emulated games.

  16. Holy smokes. If this leads to T-Edition being somewhat playable in English, that would be beyond amazing.

  17. This is genius. And shoutout for Breath of Fire II, one of the best and most under-appreciated SNES JRPGs!

  18. It’s “Lua”, not “LUA”. It’s not an acronym, it’s Portuguese for “moon”. To quote the official website:

    > “Lua” (pronounced LOO-ah) means “Moon” in Portuguese. As such, it is neither an acronym nor an abbreviation, but a noun. More specifically, “Lua” is a name, the name of the Earth’s moon and the name of the language. Like most names, it should be written in lower case with an initial capital, that is, “Lua”. Please do not write it as “LUA”, which is both ugly and confusing, because then it becomes an acronym with different meanings for different people. So, please, write “Lua” right!