An Overview of Isometric Engine Development
Before the arrival of 3D graphics accelerators in the PC market, one of the few ways to get a practical 3D-like look and feel on a PC was to fake it with an isometric projection. This article covers some of the issues associated with isometric engine development.
A Quick Historical Background
Isometric projections are, using the term loosely, a false 3D performed using an orthographic projection -- in other words, there are no vanishing points for the whole scene. There are, however, numerous kinds of isometric and faux-isometric styles. One of the simplest, earliest styles of isometric could be seen in the early Zaxxon coin-op game:
Each of the elements on the screen is drawn as if 3D, but if you look closely you can tell that all the images on the screen are actually parallel to each other (but slanted to the viewer, thus giving the illusion of 3D).
For most gamers today, however, the term isometric conjures up images of role playing games, as RPGs were one of the genres that leveraged this technique to its fullest. Ultima Online is a popular example of a very skewed isometric form (it has a side-to-height ratio of 1:1, which practically feels like you're looking top down).
The isometric grid pattern is immediately apparent from the floor tiles in the above image.
Throughout the 90s there was an onslaught of isometric 3D games:
X-Com: UFO Defense
Crusader: No Remorse
All of the above have the more characteristic 2:1 aspect ratio for the isometric tiles. They also share the characteristic of a relatively flat floor and not much in the way of elevation changes.
One of the problems with the traditional isometric style is that backgrounds tended to be very repetitive. Not only that, but really dramatic artwork is difficult to create when forced to decompose larger images into a set of isometric tiles. This led to a new style of isometric engine that no longer used tiles for the background, but instead used a prerendered image along with separate "masks" that defined surface properties (walkability, etc.). The hallmarks of this style were the BioWare/Black Isle Infinity Engine games such as Icewind Dale and Baldur's Gate.
Baldur's Gate 2
While noticeably better looking, the technology actually took a step back in some ways since there were no longer tiles or the concept of elevations. But the pictures sure were pretty (which exacted its own toll, primarily on the artists responsible for constructing such images).
As technology marched on, the isometric engine was replaced by the "true 3D" engine, even if it adopted a kind of isometric style. Games like Warcraft 3, Silent Storm, and the rising tide of real-time strategy games all started using 3D engines.
Except some platforms simply couldn't -- cue the GameBoy Advance.
Isometric Engines and the GBA
The GameBoy Advance is intriguing, both because of its ridiculous popularity and its simultaneously ridiculous lack of any appreciable hardware horsepower. This latter property prevents the adoption of hardware accelerated 3D graphics, so the GBA has become a kind of last bastion for software based isometric engines. Some of the most popular games for the GBA include Tactics: Ogre and Final Fantasy Tactics: Advance, both of which are hardcore isometric games:
Final Fantasy Tactics: Advance
From the screen shots we can see that it's the 2:1 isometric tile configuration again, just as with the earlier PC games. There are, in fact, a lot of similarities between these engines and those of the early games such as Syndicate. The biggest identifying characteristic of GBA isometric games is that they're constructed from "blocks". They support elevation by stacking cubes on top of each other, and the maps are often very small (due to the limited RAM of the GBA, not to mention it's kind of annoying to scroll around large maps without a mouse or real keyboard).
Issues with Isometric Implementations
There is no single universal method for implementing an isometric engine. Different implementations will make different trade-offs between asset requirements, memory usage, performance, and other factors.
From what I can gather (and this is mostly through inference since I didn't work on these titles), the primary architectural issue among isometric engine implementations is how to represent each tile. There are six basic methods I can think of:
- 2D background art with separate properties map (Infinity Engine)
- heightfield (flat, by vertex, or by tile) with objects
- multiheight tiles
- tile layers/stacks
- true 3D (Neverwinter Nights)
- sparse array/volumetric
The Infinity Engine (see previous Baldur's Gate 2 screenshot) uses gorgeous backgrounds, but this has a significant cost in designer and artist time. It also lacks elevation differences (height changes are implicit to the X,Y location on the map and thus are not true heights), cannot handle bridges or overpasses (to my knowledge), and requires a significant amount of manual work "tagging" region properties.
Heightfield with Objects
The term "heightfield" can be misleading because at the limit there may not be any heights at all: the terrain is just a flat 2D rectangular grid with an implicit, even elevation all the way across. One step beyond this is to use a vertex based height system, where the corner of a tile may be raised or lowered. A vertex based system like this shares vertices between tiles, so pulling up one tile's corner will likewise raise the corners of the three other tiles that share that vertex. A tile based heightfield allows the designer to modify the elevation of an entire tile -- vertices are not shared between adjacent tiles.
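To make the difference between the two heightfield flavors concrete, here is a minimal sketch in C. The type names, function names, and the 4x4 map size are all made up for illustration and are not taken from any particular engine.

```c
#include <assert.h>

#define HF_W 4   /* tiles across (hypothetical size) */
#define HF_H 4   /* tiles down   (hypothetical size) */

/* Vertex based: one height per grid corner, shared by neighbouring tiles. */
typedef struct {
    int corner[HF_H + 1][HF_W + 1];
} VertexHeightfield;

/* Tile based: one height per tile, nothing is shared. */
typedef struct {
    int height[HF_H][HF_W];
} TileHeightfield;

/* Raise or lower one shared corner; every tile touching it changes shape. */
void vertex_raise(VertexHeightfield *hf, int vx, int vy, int delta)
{
    assert(vx >= 0 && vx <= HF_W && vy >= 0 && vy <= HF_H);
    hf->corner[vy][vx] += delta;
}

/* Height of one corner of tile (tx, ty); cx and cy are 0 or 1. */
int vertex_tile_corner(const VertexHeightfield *hf, int tx, int ty, int cx, int cy)
{
    assert(tx >= 0 && tx < HF_W && ty >= 0 && ty < HF_H);
    assert((cx == 0 || cx == 1) && (cy == 0 || cy == 1));
    return hf->corner[ty + cy][tx + cx];
}

/* Raise or lower a whole tile; its neighbours are untouched. */
void tile_raise(TileHeightfield *hf, int tx, int ty, int delta)
{
    assert(tx >= 0 && tx < HF_W && ty >= 0 && ty < HF_H);
    hf->height[ty][tx] += delta;
}
```

Raising vertex (2, 2) in the vertex based version changes one corner on each of tiles (1, 1), (2, 1), (1, 2), and (2, 2), whereas `tile_raise` only ever affects the tile it is called on.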
No matter what the heightfield representation may be, surface objects such as walls and buildings are sometimes represented as a separate set of entities. Alternatively they might be part of the tile's type, but that approach is difficult to reconcile with vertex based heightfields (you don't want tilted walls, for example). Most PC based isometric games seem to use this type of system, including RollerCoaster Tycoon.
A multiheight tile system is basically a flatland grid but each tile's definition contains its height. It can be thought of as grabbing prefabricated blocks of varying heights and placing them on a board. This is a very simple conceptual model but it tends to lead towards a lot of repetition (not enough types of prefab blocks) or a lot of artist work (a lot of different types of prefab blocks). I believe that the GBA style tactics games (Final Fantasy Tactics: Advance, Onimusha Tactics, and Tactics: Ogre) use this style, since they are block based and lack overhangs and bridges.
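The "prefabricated blocks on a board" idea might look something like the following in C. This is only a sketch: the block table, its names and heights, and the tiny map size are invented for illustration, and I have no idea how the actual GBA games store their data.

```c
#include <assert.h>

/* One prefab block: its height is part of the tile definition itself. */
typedef struct {
    const char *name;
    int height;     /* arbitrary elevation units */
    int walkable;
} BlockType;

/* Hypothetical block catalogue; a real game would have many more entries. */
static const BlockType BLOCK_TYPES[] = {
    { "water",      0, 0 },
    { "grass",      1, 1 },
    { "stone_step", 2, 1 },
    { "cliff",      4, 0 },
};
enum { NUM_BLOCK_TYPES = sizeof BLOCK_TYPES / sizeof BLOCK_TYPES[0] };

#define BOARD_W 3
#define BOARD_H 2

/* The board is a flat grid of indices into the block catalogue. */
typedef struct {
    unsigned char type[BOARD_H][BOARD_W];
} BlockMap;

/* Elevation at (x, y) comes straight from the block type placed there. */
int block_map_height(const BlockMap *m, int x, int y)
{
    assert(x >= 0 && x < BOARD_W && y >= 0 && y < BOARD_H);
    assert(m->type[y][x] < NUM_BLOCK_TYPES);
    return BLOCK_TYPES[m->type[y][x]].height;
}
```

The trade-off mentioned above shows up directly in the size of `BLOCK_TYPES`: a short table means repetitive maps, a long one means more art.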
Another implementation builds a map out of stacks or layers. A stack based system has an array of blocks for each map tile, whereas a layer based system has an array of layers, each containing a map. They're effectively the same representation; the difference just comes down to the order of the array indices:
/* Stack system */
Tile map[ HEIGHT ][ WIDTH ][ DEPTH ];

/* Layered */
Tile map[ DEPTH ][ HEIGHT ][ WIDTH ];
There is an implicit assumption that each layer is a uniform depth. I'm not sure if any games use a system like this.
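As a rough sketch of how the two layouts address the same cell, here is a small C example. The `Tile` typedef, the map dimensions, the "0 means empty" convention, and the helper names are all assumptions made for the example.

```c
#include <assert.h>

enum { HEIGHT = 2, WIDTH = 3, DEPTH = 4 };   /* hypothetical map size */
typedef int Tile;                            /* 0 means "empty" in this sketch */

typedef struct { Tile cell[HEIGHT][WIDTH][DEPTH]; } StackMap;   /* a stack per tile */
typedef struct { Tile cell[DEPTH][HEIGHT][WIDTH]; } LayerMap;   /* a map per level  */

static int in_bounds(int x, int y, int z)
{
    return x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT && z >= 0 && z < DEPTH;
}

/* Same logical cell (x, y, z), two different index orders. */
Tile *stack_at(StackMap *m, int x, int y, int z)
{
    assert(in_bounds(x, y, z));
    return &m->cell[y][x][z];
}

Tile *layer_at(LayerMap *m, int x, int y, int z)
{
    assert(in_bounds(x, y, z));
    return &m->cell[z][y][x];
}

/* Highest occupied level at (x, y), or -1 if the whole column is empty. */
int stack_top(const StackMap *m, int x, int y)
{
    assert(in_bounds(x, y, 0));
    for (int z = DEPTH - 1; z >= 0; z--)
        if (m->cell[y][x][z] != 0)
            return z;
    return -1;
}
```

The stack layout keeps a whole column contiguous in memory, which suits queries like `stack_top`; the layered layout keeps each level contiguous, which would suit drawing one level at a time.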
Of course, you can always fall back on a true 3D representation and just adjust the camera's position and projection characteristics to give you "true 3D" but in an isometric style. This is more of a viewpoint system than a true isometric representation, but I place it here for completeness.
Sparse Array (Volumetric)
The final implementation method simply keeps a list of occupied tiles, a volumetric representation of the world. I'm not sure if any implementations use this method or not, but it offers a large amount of flexibility while sticking to a fundamentally grid based representation. Instead of a single (potentially very large) hardcoded array, it tracks only those map elements that are "filled".
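A minimal sketch of the "list of occupied tiles" idea in C follows. The `Cell` and `SparseMap` types, the fixed capacity, and the function names are hypothetical, and the linear search is chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 256   /* arbitrary cap for this sketch */

/* One filled cell of the world; anything not listed is empty. */
typedef struct { int x, y, z; int type; } Cell;

typedef struct {
    Cell   cells[MAX_CELLS];
    size_t count;
} SparseMap;

/* Returns the occupied cell at (x, y, z), or NULL if that spot is empty. */
Cell *sparse_find(SparseMap *m, int x, int y, int z)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++) {
        Cell *c = &m->cells[i];
        if (c->x == x && c->y == y && c->z == z)
            return c;
    }
    return NULL;
}

/* Fills (x, y, z) with the given type. Returns 1 on success, 0 if the list is full. */
int sparse_fill(SparseMap *m, int x, int y, int z, int type)
{
    Cell *c = sparse_find(m, x, y, z);
    if (c == NULL) {
        if (m->count == MAX_CELLS)
            return 0;
        c = &m->cells[m->count++];
        c->x = x;
        c->y = y;
        c->z = z;
    }
    c->type = type;
    return 1;
}
```

Because each entry carries its own z, the same (x, y) can hold a filled cell at z = 0 and another at z = 2 with nothing in between. For anything beyond a small map, a hash table keyed on the coordinates would probably be a better fit than a linear scan.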
The Bridge Problem
One problem with all these map representations is that they often assume that there is a single height value for each map coordinate. This makes things like bridges/overpasses difficult to represent and implement. It's not impossible, but it requires either a certain amount of hackery (bridge "objects") or a layered system, along with the associated complexity.
Regardless of the specific underlying representation for the map, it is also possible to incorporate some kind of texture layering or blending to break up the repetition and monotony of the same textures everywhere. This may be an aesthetic choice or it may be a practical requirement. For example, some games allow you to draw roads or train tracks on terrain of varying height, and the combination of different cover types, terrain types, and elevations would make the number of unique tiles overwhelming. It is often much simpler to have a base texture (grass, dirt, etc.) along with various layered textures to represent things like roads, bushes, etc.
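To put rough numbers on the "overwhelming" part, here is a small C sketch of a base texture plus overlay flags. The particular textures, overlay names, and function names are invented for the example.

```c
#include <assert.h>

/* Hypothetical base terrain textures. */
typedef enum { BASE_GRASS, BASE_DIRT, BASE_SAND, BASE_COUNT } BaseTexture;

/* Hypothetical overlays, one bit each so they can be combined freely. */
enum {
    OVERLAY_ROAD = 1u << 0,
    OVERLAY_RAIL = 1u << 1,
    OVERLAY_BUSH = 1u << 2,
    OVERLAY_BITS = 3
};

typedef struct {
    BaseTexture base;
    unsigned    overlays;   /* bitwise OR of OVERLAY_* flags */
} TileLook;

/* Draw passes for one tile: the base texture plus one pass per overlay set. */
int tile_draw_passes(TileLook t)
{
    assert((int)t.base < BASE_COUNT);
    int passes = 1;
    for (int bit = 0; bit < OVERLAY_BITS; bit++)
        if (t.overlays & (1u << bit))
            passes++;
    return passes;
}

/* Images needed if every base/overlay combination were drawn by hand. */
unsigned prebaked_tile_count(void)
{
    return BASE_COUNT * (1u << OVERLAY_BITS);
}

/* Images needed when overlays are layered at draw time. */
unsigned layered_image_count(void)
{
    return BASE_COUNT + OVERLAY_BITS;
}
```

With just three bases and three overlays that is 24 pre-drawn tiles versus 6 layered images, and this is before multiplying by slope shapes for terrain of varying height.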
Two Random Technical Notes
Completely unrelated to this article, I wanted to mention two technical notes about isometric engines. The first is about letterboxing and the second is about the screen to map transformation.
Letterboxing. For the life of me I couldn't figure out why some strategy games used square playfields and others used letterbox playfields. Tom Forsyth came up with the (obvious in hindsight) answer, which is that most isometric games use a 2:1 aspect ratio for their tiles, therefore a letterbox format will tend to show a relatively even number of rows and columns, whereas a square format would show far more rows than columns.
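A quick worked example, assuming 64x32 pixel tiles (a hypothetical but typical 2:1 size) and simply counting how many tile widths fit across the view and how many tile heights fit down it:

```latex
\begin{aligned}
\text{square } 480 \times 480:&\quad \frac{480}{64} = 7.5 \text{ across}, \quad \frac{480}{32} = 15 \text{ down} \\
\text{letterbox } 640 \times 320:&\quad \frac{640}{64} = 10 \text{ across}, \quad \frac{320}{32} = 10 \text{ down}
\end{aligned}
```

The square view shows twice as many tiles vertically as horizontally, while a view with the same 2:1 shape as the tiles comes out even.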
Screen to Map Transformation. The second note I'd like to make is that there is a lot of confusion over how to map a mouse coordinate to a world map coordinate (for unit or tile selection). I've seen a lot of pretty convoluted "solutions" to this problem, but it's a lot simpler than many make it out to be.
The fundamental realization is that you can just invert the map-to-screen transformation (used during rendering). The simplified form of the map to screen transformation is this:
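One common form for 2:1 diamond tiles is sketched here. It assumes a tile that is $w$ pixels wide and $h$ pixels tall (with $h = w/2$), map coordinates $(x_m, y_m)$, and screen coordinates $(x_s, y_s)$ measured from the point where map position $(0, 0)$ is drawn. These symbols are labels chosen for illustration, and a given engine may orient its axes differently.

```latex
\begin{aligned}
x_s &= (x_m - y_m)\,\frac{w}{2} \\
y_s &= (x_m + y_m)\,\frac{h}{2}
\end{aligned}
```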
Which could be expressed in matrix form:
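Assuming the common 2:1 diamond layout, where a tile is $w$ pixels wide and $h$ pixels tall and the screen position is $x_s = (x_m - y_m)\,w/2$, $y_s = (x_m + y_m)\,h/2$ (symbols chosen here for illustration), the matrix form would look like this:

```latex
\begin{bmatrix} x_s \\ y_s \end{bmatrix}
=
\begin{bmatrix} w/2 & -w/2 \\ h/2 & h/2 \end{bmatrix}
\begin{bmatrix} x_m \\ y_m \end{bmatrix}
```

The determinant of this matrix is $wh/2$, which is nonzero for any real tile size, so an inverse always exists.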
Now that it's in matrix form we could think of this in terms of invertibility -- to go from a screen coordinate to a map coordinate would just mean applying the inverse of the map-to-screen transformation.
But this is way more complicated than it needs to be, especially if you don't know or understand matrices and vectors and inversions and all that good stuff. Instead, we can look at it as basic algebra. If we take our earlier equations for our screen coordinates, apply substitution and solve for the map coordinates using good old elimination and back substitution, we get the same answer. I won't bother with all the steps -- this is basic high school math -- but the final result is something reasonably simple:
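For the common 2:1 diamond layout, assuming the forward transformation is $x_s = (x_m - y_m)\,w/2$ and $y_s = (x_m + y_m)\,h/2$ for a tile $w$ pixels wide and $h$ pixels tall (symbols chosen here for illustration), the elimination goes as follows:

```latex
\begin{aligned}
\frac{2 x_s}{w} &= x_m - y_m, \qquad \frac{2 y_s}{h} = x_m + y_m \\[4pt]
\text{adding:}\quad 2 x_m &= \frac{2 x_s}{w} + \frac{2 y_s}{h}
  \;\Longrightarrow\; x_m = \frac{x_s}{w} + \frac{y_s}{h} \\[4pt]
\text{subtracting:}\quad 2 y_m &= \frac{2 y_s}{h} - \frac{2 x_s}{w}
  \;\Longrightarrow\; y_m = \frac{y_s}{h} - \frac{x_s}{w}
\end{aligned}
```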
That's it, problem solved. The basic equation is simplistic in that it doesn't take into account any translation (scrolling) or other offsets for toolbars and user interface, but those are easy enough to substitute back in.
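Here is a minimal C sketch of the round trip with a translation folded back in. It assumes the common 2:1 diamond layout with hypothetical 64x32 pixel tiles; `origin` (the screen position where map (0, 0) is drawn, absorbing scrolling and any UI offset) and all the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size of a 2:1 diamond tile, in pixels. */
#define TILE_W 64.0
#define TILE_H 32.0

typedef struct { double x, y; } Vec2;

/* Map -> screen, as used during rendering. */
Vec2 map_to_screen(Vec2 m, Vec2 origin)
{
    Vec2 s;
    s.x = (m.x - m.y) * (TILE_W / 2.0) + origin.x;
    s.y = (m.x + m.y) * (TILE_H / 2.0) + origin.y;
    return s;
}

/* Screen -> map: remove the translation, then apply the inverse. */
Vec2 screen_to_map(Vec2 s, Vec2 origin)
{
    double sx = s.x - origin.x;
    double sy = s.y - origin.y;
    Vec2 m;
    m.x = sx / TILE_W + sy / TILE_H;
    m.y = sy / TILE_H - sx / TILE_W;
    return m;
}

/* Round toward negative infinity so tiles left of the origin pick correctly. */
static int floor_to_int(double v)
{
    int i = (int)v;               /* truncates toward zero */
    return (v < i) ? i - 1 : i;
}

/* Integer tile under a mouse position. */
void pick_tile(Vec2 mouse, Vec2 origin, int *tile_x, int *tile_y)
{
    assert(tile_x != NULL && tile_y != NULL);
    Vec2 m = screen_to_map(mouse, origin);
    *tile_x = floor_to_int(m.x);
    *tile_y = floor_to_int(m.y);
}
```

With this convention an integer map coordinate lands on the top corner of its diamond, so flooring the fractional result gives the tile the cursor is over. Note that this only picks on a flat ground plane; elevation would need extra handling.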
This isn't meant to be conclusive or comprehensive; it's just a compilation of my notes from when I was looking at isometric rendering strategies. Hopefully it will help some others out there.
The Screenshots
Most of these screenshots were taken from MobyGames. I would have simply linked to them; however, with the propensity of anti-bandwidth stealing scripts out there (some of which have the annoying habit of using grotesque substitution images as an extra "incentive" not to link to their pictures), I've decided that copying them locally would be easier for everyone involved. If one of these screenshots is yours and you want it removed, just e-mail me at contact at bookofhook dot com and I'll pull it.