Useful Notes: Video RAM
aka: Frame Buffer
Although most theoretical models of a computer don't envision several types of specialized memory in one system, the cold reality is always more complicated than these schemes. As soon as computers became able to handle graphics, it turned out that what graphics processing needs from memory is often radically different from what the main system memory has to provide. Main memory is described on the RAM page; what's discussed here is video memory, the memory for graphics. What matters is what gets stored in video memory and how, and that depends on the kind of video hardware hooked up to it. We can divide that hardware into several categories, largely graded by the complexity of their GPUs: barebones graphical systems (which include early consoles), framebuffer 2D, tile-based 2D, the first GPUs, and modern 3D graphics chips.
Barebones
The earliest consoles (and some early Mainframes and Minicomputers that pioneered the use of graphics in the early Sixties) didn't have video RAM at all, requiring the programmer to "ride the beam" and generate the video signal in software, in real time, for every frame. The most their hardware could usually do was convert a digital signal into one an analog CRT could display, usually on a standard household TV set or (rarely) a specialized monitor. This was especially true of the Atari 2600, but most 1970s consoles used a similar technique. It was difficult, time-consuming and error-prone work, made much worse by penny-pinching hardware design, and it frequently produced inadequate results (see The Great Videogame Crash Of 1983, caused in part by a rush of poorly made Atari 2600 games).
Framebuffers and blitting
As mentioned above, some early big machines used the same setup as the early consoles and suffered from the same problems. The solution came quickly, however: set a portion of the system memory apart and add a circuit that constantly scans it at regular intervals and generates the output signal from its contents. Thus the first true video memory was born. It was dubbed the "frame buffer", because it served as a buffer between the CPU (which could put image data there and then turn to other tasks) and the specialized hardware, later to grow into the GPU, that did nothing except put the picture (the frame) onto the screen. Soon, because keeping it within the main memory often clashed with normal CPU work, the frame buffer was moved into a separate, specialized unit, and thus Video RAM was born.

The framebuffer approach was, and still is, the most general approach to display memory: its contents correspond directly to what the user sees. In the simplest case, called a "bit map", each bit of the video memory represents a dot on the screen: if there's a "1" in memory, the dot is lit, and if there's a "0", it is left dark. Often this approach wasn't even used for graphics, but for a text display: each location in the framebuffer held a character code, which drove the display circuitry to generate the video signal from the outline of that character stored in a separate "character memory" (more on that later).

Note, though, that the framebuffer usually doesn't take up the whole video memory (although on early machines with a tiny amount of RAM it often did). Prices soon dropped, and system designers began to add more video memory to do tricks like page switching. The rest of the video memory could also store other, previously prepared images that could be quickly transferred into the framebuffer.
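As a rough illustration of the "bit map" arrangement described above, here is a minimal Python sketch of a 1-bit framebuffer. The screen size, the function names and the choice of putting the leftmost dot in the high bit of each byte are all assumptions made for the example, not a description of any particular machine.

```python
# Hypothetical 1-bit "bit map" framebuffer: one bit of memory per screen dot.
WIDTH, HEIGHT = 16, 4            # assumed tiny screen; width is a multiple of 8
BYTES_PER_ROW = WIDTH // 8


def make_framebuffer():
    """Return zeroed video memory, so every dot starts out dark."""
    return bytearray(BYTES_PER_ROW * HEIGHT)


def set_pixel(fb, x, y, lit):
    """Set (lit=True) or clear the bit for dot (x, y).

    Assumption: the leftmost dot of each group of 8 is the high bit.
    """
    index = y * BYTES_PER_ROW + x // 8
    mask = 0x80 >> (x % 8)
    if lit:
        fb[index] |= mask
    else:
        fb[index] &= ~mask & 0xFF


def get_pixel(fb, x, y):
    """Return 1 if dot (x, y) is lit, else 0."""
    index = y * BYTES_PER_ROW + x // 8
    return (fb[index] >> (7 - x % 8)) & 1


def scan_out(fb):
    """Stand-in for the display circuit: read memory row by row."""
    return [
        "".join("#" if get_pixel(fb, x, y) else "." for x in range(WIDTH))
        for y in range(HEIGHT)
    ]
```

The CPU only ever calls something like `set_pixel`; `scan_out` plays the part of the circuit that repeatedly reads the memory and turns it into a picture.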
Very soon the graphics hardware started to include one more specialized circuit, the blitter, which could do very fast block copies of memory content, both from the main memory and from other video memory locations. The capacity of such hardware was generally measured in the number of pixels that could be copied in video memory per second. Note again, however, that hardware like this first appeared in Mainframes and Minicomputers, which could usually afford it thanks to having much more memory and specialized hardware, as they didn't need to be competitive in the consumer market. In the early Eighties it appeared on UNIX workstations and spread to consumer-grade machines like the Amiga by the end of the decade, but it didn't make it to normal Intel PCs until the early 1990s, when the rise of Microsoft Windows all but required it.
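What a blitter does can be sketched in software as a rectangular block copy. The following Python function is only an illustration under simple assumptions (one byte per pixel, flat row-major buffers, no clipping and no handling of overlapping source and destination); the name `blit` and its parameters are made up for the example.

```python
def blit(src, src_w, dst, dst_w, sx, sy, dx, dy, w, h):
    """Copy a w x h pixel block from src at (sx, sy) to dst at (dx, dy).

    src and dst are flat bytearrays holding one byte per pixel;
    src_w and dst_w are their row widths in pixels.
    """
    for row in range(h):
        s = (sy + row) * src_w + sx      # start of this row in the source
        d = (dy + row) * dst_w + dx      # start of this row in the destination
        dst[d:d + w] = src[s:s + w]      # copy one row of the block


# Usage: copy a 2x2 block out of a prepared 4x4 image into an 8x8 framebuffer.
image = bytearray(range(16))
framebuffer = bytearray(8 * 8)
blit(image, 4, framebuffer, 8, sx=1, sy=1, dx=2, dy=3, w=2, h=2)
```

A hardware blitter performs the same loop without occupying the CPU, which is why its speed was quoted in pixels copied per second.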
The first GPUs
What worked well for big and expensive professional computers, though, didn't fit the needs of video game consoles, which were limited by the price the customer was willing to pay and needed something that delivered the most bang for the buck. The first consumer machines to include anything resembling a modern GPU were the Atari 400/800 home computers in 1979. These used a two-chip video system that had its own instruction set and implemented an early form of "scanline DMA", which was used in later consoles for special effects. The most popular of these early chips, however, was Texas Instruments' TMS9918/9928 family. The 99x8 provided either tile-based or bit-mapped memory arrangements, and could address up to 16 kilobytes of VRAM, a huge amount at the time. Most TMS99x8 display modes, however, were tile-based.
Tiles and layers
Tile-based graphics was an outgrowth of the aforementioned character display. In it the smallest picture unit wasn't a dot or pixel, but a tiny (usually 8×8) image called a "tile". A tile is just like a character in text mode, only it needn't represent a letter or digit and can be of any shape. Tiles are stored in the part of VRAM called "character memory". Another part of the VRAM, called the "tilemap", records which part of the screen holds which tile. The TMS99x8 laid the groundwork for how tile-based rendering worked, and the NES and later revisions of the 99x8 in the MSX and Sega consoles refined it. While the Texas Instruments chip required the CPU to load tiles into memory itself, the Big N went one better and put what it called the "Picture Processing Unit" on its own memory bus, freeing the CPU to do other things and making larger backgrounds and smooth scrolling possible.

NES video memory has a very specific structure. Tile maps are kept in a small piece of memory on the NES mainboard, while tiles (or "characters") are kept in a "CHR ROM" or "CHR RAM" on the cartridge. Some games use CHR RAM for flexibility, copying from the main program ROM; others rapidly change pages in CHR ROM for speed.

Moreover, multiple tilemaps can exist. In most cases, each tilemap represents a specific layer of the picture; the final image is produced by combining the layers in a specific order. Many tile-based GPUs also included special circuitry for smooth scrolling (that is, scrolling where the image shifts one pixel at a time), which is notoriously hard to do entirely in software: John Carmack was reportedly the first to do it efficiently on the PC, in Commander Keen, and his method required a pretty powerful 80286 CPU, which was head and shoulders above the puny 8-bit chips in those early consoles. With hardware support, however, the CPU simply tells the GPU the position offset for each layer, and that's what gets shown.
Later consoles like the SNES added new features, such as more colors, "Mode 7" support (hardware rotate/scale/zoom effects) and scanline DMA (used to generate "virtual layers").
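The relationship between character memory, the tilemap and a per-layer scroll offset can be sketched as follows. This is an illustrative Python model, not the behaviour of any specific chip: the data layout, the wrap-around at the map edge and the name `render_layer` are assumptions made for the example.

```python
TILE = 8  # tile edge in pixels, matching the usual 8x8 tiles


def render_layer(tiles, tilemap, map_w, map_h, scroll_x, scroll_y, out_w, out_h):
    """Return out_h rows of out_w pixel values for one scrolled tile layer.

    tiles   -- "character memory": a list of tiles, each 8 rows of 8 pixels
    tilemap -- flat list of tile indices, map_w * map_h entries
    scroll_x, scroll_y -- the layer's position offset in pixels
    """
    frame = []
    for y in range(out_h):
        wy = (y + scroll_y) % (map_h * TILE)      # wrap at the map edge
        row = []
        for x in range(out_w):
            wx = (x + scroll_x) % (map_w * TILE)
            tile_index = tilemap[(wy // TILE) * map_w + wx // TILE]
            row.append(tiles[tile_index][wy % TILE][wx % TILE])
        frame.append(row)
    return frame


# Usage: a 2x2 checkerboard map built from one blank and one solid tile.
blank = [[0] * TILE for _ in range(TILE)]
solid = [[1] * TILE for _ in range(TILE)]
frame = render_layer([blank, solid], [0, 1, 1, 0], 2, 2,
                     scroll_x=1, scroll_y=0, out_w=16, out_h=16)
```

Changing `scroll_x` by one shifts the whole layer by one pixel without touching the tilemap or the tiles, which is the point of doing the scrolling in the video hardware.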
Sprites
Of course, a game with just a tilemap would be boring, as objects would have to move in units of a whole tile. That's fine for games like Tetris, or games with large characters such as fighting games or the ZX Spectrum adaptation of The Trap Door, but not so much for other genres. So most tile-based systems added a sprite rendering unit. A sprite is essentially a freely movable tile that is handled by special hardware, and just like tiles it has a piece of video memory dedicated to it. This memory contains a set of sprite outlines, but instead of a tilemap it stores where on the screen each of the sprites is right now. The sprite rendering hardware often has an explicit limit on the size of a sprite, and because some GPUs implemented smooth sprite movement not by post-compositing but by manipulating character memory, the sprite often had to be the same size as a tile. The SNES had larger sprites than the NES because its character memory was larger and its GPU could handle larger tiles.

Sprites exist within certain layers, just like tilemaps, so a sprite can appear in front of or behind tilemaps or even other sprites. Some GPUs, like the Yamaha V99x8, allowed developers to program the sprite movement and even issued interrupts when two sprites collided.

On the other hand, sprite rendering, on both the NES and other early consoles, had certain limitations. Only a fixed number of sprite tiles could be shown per horizontal scanline of the screen. If you exceed this limit, you see errors such as flickering objects. Note that what matters is the number of sprite tiles, not the number of sprites. So a 2x2 sprite (a sprite composed of 4 tiles arranged in a square) counts as 2 sprite tiles in the horizontal direction.
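The per-scanline limit is easier to see with a small model. In the Python sketch below, a sprite is a hypothetical `(x, y, width_in_tiles, height_in_tiles)` tuple and the limit is just a parameter, since the real number differs between machines. How real hardware chooses which tiles to drop also varies; here the ones latest in the list simply lose.

```python
TILE = 8  # tile edge in pixels


def tiles_on_scanline(sprites, line):
    """Return the x position of every sprite tile that crosses `line`.

    sprites -- list of (x, y, w_tiles, h_tiles) tuples, in priority order
    """
    hits = []
    for x, y, w_tiles, h_tiles in sprites:
        if y <= line < y + h_tiles * TILE:
            # A sprite w_tiles wide contributes w_tiles tiles to this line.
            hits.extend(x + i * TILE for i in range(w_tiles))
    return hits


def split_by_limit(sprites, line, limit):
    """Return (drawn, dropped) tile positions for one scanline."""
    hits = tiles_on_scanline(sprites, line)
    return hits[:limit], hits[limit:]


# Usage: two 2x2 sprites overlap scanline 5, so four tiles compete there.
sprites = [(0, 0, 2, 2), (20, 4, 2, 2), (40, 100, 1, 1)]
drawn, dropped = split_by_limit(sprites, line=5, limit=3)
```

With an assumed limit of 3, the second tile of the second sprite is dropped on that line, which on screen would show up as a missing or flickering slice of the object.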
PC-style 2D (EGA and VGA)
Meanwhile, what PCs had at the time was not nearly as neat. After the horrible, horrible CGA (which had just 16 kilobytes of VRAM and was limited to 4 colors at best), EGA at least brought full 16-color graphics to the PC in the mid-1980s, but its memory structure was, to put it mildly, weird. It used a planar memory setup, meaning that each of the four EGA color bits (red, green, blue and brightness) was kept in a different part of memory and accessed one at a time. This made writing slow, so the EGA also provided a set of rudimentary blitting functions to help out. Nevertheless, several PC games of the era made use of this mode. Just as on blitter-based systems, there was no support for sprites, so collision detection and layering were up to the programmer.

What really changed things, though, was VGA's introduction of a 256-color, 320x200, packed-pixel graphics mode (meaning you could write color values directly to memory instead of having to separate them into planes first). There was no blitter support at all in this mode, which slowed things down some; however, the new mode was so much easier to use than the 16-color one that developers went for it right away. VGA also bettered EGA by having a full 18-bit palette available in every mode. This was a huge jump: EGA could do up to 6-bit color (64 colors), but only 16 at a time, and only in high-resolution mode, while 200-line modes were stuck with the CGA palette (did we mention how horrible it was?). The huge-for-the-time palette had much richer colors and made possible special effects like SNES-style fades in and out (done by reprogramming the color table in real time) and Doom's gamma-correction feature. The lack of a blitter meant that the path between the CPU and the VGA needed to be fast, especially for later games like Doom, and the move from 16-bit ISA video to 32-bit VLB and PCI video around 1994 made that possible.
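The difference between planar and packed-pixel memory can be illustrated with a short Python sketch. It models only the memory layout, not the actual EGA or VGA registers; the function names, the mapping of color bit p to plane p, and the high-bit-is-leftmost ordering are assumptions made for the example.

```python
def write_planar(planes, i, color):
    """Store a 4-bit color for pixel i across four 1-bit planes.

    planes -- four bytearrays, each holding one color bit for 8 pixels per byte
    Every pixel write touches all four planes, one at a time.
    """
    byte, bit = divmod(i, 8)
    mask = 0x80 >> bit
    for p, plane in enumerate(planes):
        if (color >> p) & 1:
            plane[byte] |= mask
        else:
            plane[byte] &= ~mask & 0xFF


def planar_to_packed(planes, n_pixels):
    """Reassemble one color value per pixel from the four planes."""
    pixels = []
    for i in range(n_pixels):
        byte, bit = divmod(i, 8)
        value = 0
        for p, plane in enumerate(planes):
            value |= ((plane[byte] >> (7 - bit)) & 1) << p
        pixels.append(value)
    return pixels


def write_packed(buf, i, color):
    """Packed-pixel layout: one byte per pixel, written directly."""
    buf[i] = color
```

In the planar case a single pixel costs four read-modify-write operations on four separate areas of memory, whereas the packed write is a single store, which is roughly why the 256-color mode was so much more pleasant to program.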
Modern 3D Graphics
In the world of 3D graphics, things are somewhat different. Polygonal Graphics don't have the kind of rigid layout that tile-based 2D rendering had, and it's not just a matter of managing the graphics, but of how we see them. Thus video memory with 3D graphics is (conceptually, if not explicitly in hardware) split into two parts: texture memory, which manages the actual graphics, and the frame buffer, which manages how we see them.

The closest analogy is the difference between a movie set and the cameras that film the action. The texture memory is the set: all the people, places and objects are handled there. The frame buffer is the camera: it merely sees and records what happens on the set so we can see it. Everything not shot by the camera is still part of the set, but the camera only needs to focus on the relevant parts. It's the same in 3D video games. If you are playing a First-Person Shooter, everything behind you is part of the level, and is still in the texture memory. But since you can't see it, it is not in the frame buffer, unless you turn around, in which case what you were facing before is no longer in the frame buffer.

This means that part of the video memory has to handle the texture memory, while another part handles the frame buffer. There's also usually a third part, the so-called "z-buffer", which stores, for each pixel of the final picture, the depth of the part of the scene currently rendered there. As it is mostly used by the GPU itself and isn't generally available for the programmer to manipulate nowadays, it is of little importance here. Today, VRAM holds the following data:
- 3D Models. These take up surprisingly little space.
- Textures, which make up the bulk of memory consumption. Often these employ Texture Compression to save bandwidth and space. Modern games may also stream textures from the hard drive, though this can cause undesirable texture pop-in.
- Render targets and frame buffers. Render targets are intermediate steps in the rendering process that pixel shaders can modify later. These include shadow maps, lighting effects, and mappings for reflections and the like.
- General data. If the GPU is running general purpose calculations, it'll have to use VRAM while it does so.
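The z-buffer mentioned earlier in this section is normally managed by the GPU itself, but the idea behind it is simple enough to sketch. The Python below is only a conceptual model, assuming that a smaller z means "closer to the camera" and using made-up function names.

```python
def make_buffers(width, height, far=float("inf")):
    """Return (color, depth) buffers; depth starts at 'infinitely far away'."""
    color = [[0] * width for _ in range(height)]
    depth = [[far] * width for _ in range(height)]
    return color, depth


def plot(color, depth, x, y, z, value):
    """Write `value` at (x, y) only if z is nearer than what is stored there.

    Returns True if the pixel was written, False if it was hidden.
    """
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = value
        return True
    return False


# Usage: a far surface is drawn first, then hidden by a nearer one.
color, depth = make_buffers(2, 2)
plot(color, depth, 0, 0, z=5.0, value=1)   # drawn
plot(color, depth, 0, 0, z=9.0, value=2)   # rejected: farther away
plot(color, depth, 0, 0, z=2.0, value=3)   # drawn: nearer
```

Because the depth buffer needs one entry per pixel of the final picture, it takes up its own share of VRAM alongside the frame buffer and render targets.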