A GPU is the common term for a piece of computer, console, or arcade hardware that is dedicated to drawing things, i.e. graphics. The general purpose of a GPU is to relieve the CPU of responsibility for a significant portion of rendering. This also improves performance, as it frees CPU time for core program code, which in turn may be able to tell the GPU what to do more often. Both consoles and regular computers have had various kinds of GPUs. The two platforms started out with divergent kinds of 2D GPUs, but these converged with the advent of 3D rendering.
The term "GPU" was coined by nVidia upon the launch of their GeForce line of hardware. This was generally a marketing stunt (as they claimed to have "invented the GPU" despite prior work by others), though the GeForce did have some fairly advanced processing features in it. However, the term GPU has become the accepted shorthand for any graphics processing chip, even pre-GeForce ones.
Pre- and Non-GPU Graphics
What came before graphics processing units, you may ask? Well, two things:
The most common form of "graphics" (as in, a method of displaying something somewhat interactively) before the late 1970s was effectively just one step above CPU-driven graphics, described below. To wit, there was a machine, commonly called a "terminal", in between you and the computer that printed or displayed text at the computer's command, but was distinct from the actual computer.
This came, effectively, in three variations:
- At its most primitive, there was the "teletype", which predated electronic computers. Teletypes were essentially a printer and keyboard, either wired directly to the computer or hooked up to a telephone line, which was connected to the computer via an early modem. Because handling everything one character at a time for multiple users at once was inefficient, this was frequently optimized from quite early on so that one 80-character line was sent or received at a time. (If you're wondering "what were they made for, if they predated computers?": They were originally a method of automating the telegraph.)
- Two steps beyond the teletype was the "terminal", which was essentially just a screen full of text characters that the programmer could alter on demand. This "only" took about 2000 bytes of RAM for a typical 80 by 25 terminal, and you could very easily go below even that by using fewer than 8 bits for a single character of text, which was very common at the time. If you've ever played the original Rogue, that and what's now called "ASCII Art" were about the maximum level of sophistication of graphics for terminals.
- One step beyond that were terminals that could download fonts, and thus display very primitive non-ASCII graphics. And from here, we head into the first "sprite" machines, below.
- There was also an intermediate step between the teletype and the terminal, known as a "virtual", "glass", or "video" teletype, which replaced the printer with a monitor or television. Given that they were, as far as the computer was concerned, exactly identical to a teletype, they are frequently lumped in with one or the other nowadays.
- One somewhat common early variant was a terminal of some kind that directly displayed the contents of memory. This was possible when the amount of memory was small enough to display this way (say, less than the 2000 bytes mentioned above); as memory sizes grew, this became much rarer.
- The reason this is of note: One of the first video games, OXO, was played in exactly this way: there were three monitors displaying the contents of memory, and the OXO program was small enough that the middle monitor could be given over to the interface of the game.
While only arguably "graphics units", in that they were just barely more sophisticated than doing everything from the CPU, they're an important stepping stone to everything below except CPU Driven Graphics; in particular, tile-based graphics were a fairly direct descendant of the "font downloading" method.
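The memory figure for a text terminal, and the step from a downloadable font to tile graphics, can be sketched in a few lines. This is a minimal illustration rather than a model of any particular terminal: the 6-bit character width, the 8x8 glyph size, and the names `screen_bytes`, `FONT` and `render_row` are assumptions made for the example.

```python
COLS, ROWS = 80, 25  # the typical terminal size mentioned above


def screen_bytes(bits_per_char: int) -> int:
    """Bytes needed to hold one full screen of character codes."""
    total_bits = COLS * ROWS * bits_per_char
    return (total_bits + 7) // 8  # round up to whole bytes


# A hypothetical downloadable font: each glyph is 8 rows of 8 one-bit pixels.
FONT = {
    "#": [0b11111111] * 8,  # solid block
    " ": [0b00000000] * 8,  # blank cell
}


def render_row(text: str, glyph_line: int) -> str:
    """Expand one scanline of a row of text into pixels ('1' lit, '0' dark)."""
    return "".join(format(FONT[ch][glyph_line], "08b") for ch in text)


print(screen_bytes(8))       # 2000
print(screen_bytes(6))       # 1500
print(render_row("# #", 0))  # 111111110000000011111111
```

Replacing the glyph bitmaps with arbitrary pictures, while still storing only one small code per screen cell, is essentially what a tile-based display does.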
CPU Driven Graphics
One fascinating fact: Some very early home computers and consoles lacked anything we would call a Graphics Processing Unit. While they had a graphics chip, it only interfaced the monitor with the CPU and was effectively run entirely by the CPU, with no ability to function on its own. This was only possible because of design quirks in analogue television and monitor standards. In short, the image displayed on analogue televisions is surrounded by a field of "blacker than black" used to align the image properly. Non-display processing could only take place during the period that this overscan field was being "drawn". (Note that many systems as late as the 1990s used these quirks as well for other reasons; most notably, the horizontal blanking period was a good time to do things to achieve a parallax effect. For further information, go look up "hblank" and "vblank".)
Needless to say, this was horribly inefficient, but when the RAM needed for anything resembling a GPU was prohibitively expensive, it was also required to keep the costs of hardware down.
The most notable systems to have CPU-driven graphics were the Atari 2600 and Sinclair ZX81; a notable (and slightly insane) variation was the Vectrex. (A few "post-modern" systems also use this method; for example, the Gigatron TTL, as a GPU would violate the "computer without a microprocessor" gimmick of the system.)
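To get a feel for how little time such a design leaves for the actual program, here is a rough cycle budget. It assumes the commonly cited NTSC Atari 2600 figures of 262 scanlines per frame, 76 CPU cycles per scanline and 192 visible lines; none of these numbers come from the text above, and treating every cycle of a visible line as display work is a simplification.

```python
# Commonly cited NTSC Atari 2600 timing figures (assumptions, not from the text).
SCANLINES_PER_FRAME = 262
CYCLES_PER_SCANLINE = 76
VISIBLE_SCANLINES = 192


def frame_budget() -> dict:
    """Split one frame's CPU cycles into display work and everything else."""
    total = SCANLINES_PER_FRAME * CYCLES_PER_SCANLINE
    display = VISIBLE_SCANLINES * CYCLES_PER_SCANLINE
    return {"total": total, "display": display, "free": total - display}


budget = frame_budget()
print(budget)  # {'total': 19912, 'display': 14592, 'free': 5320}
print(f"{budget['free'] / budget['total']:.0%} of each frame is left for game logic")
```

Under these assumptions, roughly a quarter of the CPU's time per frame is available for anything other than feeding the display.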
Actual Graphics Processing Units
Arcade 2D GPU
The first 2D GPU chipsets appeared in early Arcade Games of the 1970s. The earliest known example was the Fujitsu MB14241, a video shifter chip that was used to handle graphical tasks in Taito & Midway titles such as Gun Fight (1975) and Space Invaders (1978). Taito and Sega began manufacturing dedicated video graphics boards for their arcade games from 1977. In 1979, Namco and Irem introduced tile-based graphics with their custom arcade graphics chipsets. The Namco Galaxian arcade system in 1979 used specialized graphics hardware supporting RGB color, multi-colored sprites and tilemap backgrounds. Nintendo's Radar Scope video graphics board was able to display hundreds of colors on screen for the first time.
By the mid-1980s, Sega were producing Super Scaler GPU chipsets with advanced three-dimensional sprite/texture scaling graphics that would not be rivalled by home computers or consoles until the 1990s.
From the 1970s to the 1990s, arcade GPU chipsets were significantly more powerful than GPU chipsets for home computers and consoles, both in terms of 2D and 3D graphics. It was not until the mid-1990s that home systems rivalled arcades in 2D graphics, and not until the early 2000s that home systems rivalled arcades in 3D graphics.
Console 2D GPU
This kind of GPU, introduced to home systems by the Texas Instruments TMS9918/9928 and popularized by the NES, Sega Master System and Sega Genesis, forces a particular kind of look onto the games that use it. You know this look: everything is composed of a series of images, tiles, that are used in various configurations to build the world.
This enforcement was a necessity of the times. Processing power was limited, and while tile-based graphics were somewhat limited in scope, they were far superior to what could be done without this kind of GPU.
In this GPU, the tilemaps and the sprites are all built up into the final image by the GPU hardware itself. This drastically reduces the amount of processing power needed — all the CPU needs to do is upload new parts of the tilemaps as the user scrolls around, adjust the scroll position of the tilemaps, and say where the sprites go.
Tilemap rendering is essentially a form of bitmap framebuffer compression. An entire screen could be filled with the same tiles re-drawn many times, without affecting performance, which was ideal for 2D games. This drastically reduced processing, memory, fillrate and bandwidth requirements, by a factor of up to 64.
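A minimal sketch of this idea follows. It assumes 8x8 tiles, so that one stored tile index stands in for 64 pixels, which is one way to arrive at the "64 times" figure; storage for the tile images themselves is ignored. The function `render_tilemap` and the two tiles are hypothetical and do not describe any specific console's hardware.

```python
TILE = 8  # tile edge in pixels (assumed; 8x8 is a typical size)


def render_tilemap(tilemap, tiles):
    """Expand a grid of tile indices into a full grid of pixels."""
    height = len(tilemap) * TILE
    width = len(tilemap[0]) * TILE
    frame = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            tile = tiles[tilemap[y // TILE][x // TILE]]
            frame[y][x] = tile[y % TILE][x % TILE]
    return frame


# Two hypothetical tiles: empty sky (0) and solid ground (1).
sky = [[0] * TILE for _ in range(TILE)]
ground = [[1] * TILE for _ in range(TILE)]
tilemap = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

frame = render_tilemap(tilemap, [sky, ground])
map_entries = len(tilemap) * len(tilemap[0])  # 8 indices stored
pixels = len(frame) * len(frame[0])           # 512 pixels produced
print(pixels // map_entries)                  # 64
```

On real hardware the expansion happens as the picture is sent to the display, so the CPU only ever touches the small index grid.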
The Nintendo Entertainment System, for example, renders a 256x240 background and sixty-four 8x16 sprites at 60 frames/second, a tile fillrate equivalent to more than 4 megapixels/second, higher than what PC games rendered to a bitmap framebuffer until the early 1990s. The Sega Genesis renders two 512x512 backgrounds and eighty 32x32 sprites at 60 frames/second, a tile fillrate equivalent to more than 30 megapixels/second, higher than what PC games rendered in a bitmap framebuffer until the mid-1990s.
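The fillrate figures above follow from simple arithmetic, sketched here under the assumption that every background and sprite pixel is counted once per frame. The function name and its parameters are made up for the example.

```python
def tile_fillrate(bg_w, bg_h, backgrounds, sprite_w, sprite_h, sprites, fps=60):
    """Pixels per second if every background and sprite pixel is drawn each frame."""
    per_frame = backgrounds * bg_w * bg_h + sprites * sprite_w * sprite_h
    return per_frame * fps


nes = tile_fillrate(256, 240, 1, 8, 16, 64)
genesis = tile_fillrate(512, 512, 2, 32, 32, 80)
print(f"NES:     {nes / 1e6:.2f} megapixels/second")      # 4.18
print(f"Genesis: {genesis / 1e6:.2f} megapixels/second")  # 36.37
```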
Computer 2D GPU
Computers had different needs. Computer 2D rendering was driven by the needs of applications more so than games. Therefore, rendering needed to be fairly generic. Such hardware had a framebuffer, an image that represents what the user sees. And the hardware had video memory to store extra images that the user could use.
Such hardware had fast routines for drawing colored rectangles and lines. But the most useful operation was the blit or BitBlt: a fast video memory copy. Combined with video memory, the user could store an image in VRAM and copy it to the framebuffer as needed. Some advanced 2D hardware had scaled blits (so the destination location could be larger or smaller than the source image) and other special blit features.
Some 2D GPUs combined these two approaches, having both a framebuffer and a tilemap, and being able to output hardware-accelerated sprites and tiles and perform tile transformation routines over what was stored in the framebuffer. These were the most powerful and advanced of the 2D GPUs, but they were usually quite specialized and tied to specific platforms (such as the Amiga, X68000 and FM Towns). In the end the more general approach won out, being more conducive to various performance-enhancing tricks and adapting better to increasing computing horsepower and the transition to 3D gaming.
The CPU effort is more involved in this case. Every element must be explicitly drawn by a CPU command. The background was generally the most complicated. This is why many early computer games used a static background. They basically had a single background image in video memory which they blitted to the framebuffer each frame, followed by a few sprites on top of it.
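The blit operation and the per-frame redraw described above can be sketched as follows. This is plain Python standing in for what the hardware does with a memory copy; the transparency key and the clipping are additions for the example, and the image contents are hypothetical.

```python
def blit(dst, src, dst_x, dst_y, transparent=None):
    """Copy the rectangle `src` into `dst` with its top-left corner at (dst_x, dst_y).

    Pixels equal to `transparent` are skipped, and anything that would fall
    outside `dst` is clipped.
    """
    for sy, row in enumerate(src):
        for sx, pixel in enumerate(row):
            x, y = dst_x + sx, dst_y + sy
            if pixel == transparent:
                continue
            if 0 <= y < len(dst) and 0 <= x < len(dst[0]):
                dst[y][x] = pixel


# Hypothetical images kept in "video memory": a 6x4 background and a 2x2 sprite.
background = [["."] * 6 for _ in range(4)]
sprite = [["@", None],
          ["@", "@"]]

framebuffer = [["?"] * 6 for _ in range(4)]
blit(framebuffer, background, 0, 0)  # redraw the static background
blit(framebuffer, sprite, 2, 1)      # then draw the sprite on top of it
print("\n".join("".join(row) for row in framebuffer))
```

Every frame repeats the same two steps, each issued explicitly by the CPU, which is why scrolling or animated backgrounds were so much more expensive than on tile-based hardware.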
The NEC µPD7220, released in 1982, was one of the first implementations of a computer GPU as a single Large Scale Integration (LSI) integrated circuit chip, enabling the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became one of the best known of the chips that were called graphics processing units in the 1980s.
PC GPUs of that era were designed for static desktop acceleration, rather than video game acceleration, so PC CPUs had to render games in software. As such, PC games were unable to match the smooth scrolling of consoles, due to consoles using tile-based GPUs, which reduced processing, memory, fillrate and bandwidth requirements by up to 64 times. It was not until 1991, with the release of Keen Dreams, that PC gaming caught up to the smooth 60 frames/second scrolling of the aging Nintendo Entertainment System.
The 80486DX2/66, a high-end gaming CPU of the early 90s, ran at 66 MHz and could run 32-bit code as an "extension" to 16-bit DOS. While faster than the CPU of the Sega Genesis and Super NES, that alone was not enough to surpass them, as both consoles had tile-based GPUs, which PCs were lacking at the time. It was through various programming tricks that PCs were able to exceed the Genesis and Super NES, by taking advantage of quirks in the way early PCs and VGA worked. John Carmack once described the engine underpinning his company's breakout hit Wolfenstein 3D as "a collection of hacks", and he was not too far off. It was also the last of their games that could run in a playable state on an 80286 PC with 1 MB RAM — a machine that was considered low-end even in 1992 — which serves as a testament to the efficiency of some of those hacks.
Before the rise of Windows in the mid-1990s, most PC games couldn't take advantage of newer graphics cards with hardware blitting support (or even of the framebuffer, for that matter; many earlier GPUs drew directly to screen memory and lacked a framebuffer). While the first accelerated graphics card came out in 1991, many games ignored its features, so the CPU had to do all the work, and this made both a fast CPU and a fast path to the video RAM essential. PCs with local-bus video and 80486 processors were a must for games like Doom and Heretic; playing them on an old 386 with ISA video was possible, but wouldn't be very fun. The only program that could remotely take advantage of the features found in these new cards was Windows 3.0, and even then games for the platform were mostly graphically simple, as it was deemed too taxing to handle graphics-heavy games.
However, starting in the mid-90s with the creation of the DirectX (and specifically DirectDraw) API, as well as the introduction of hardware-accelerated QuickTime cards on the Macintosh front, games were finally able to take advantage of blitting, and the future of PCs moved from one that is mostly business oriented to one that is more oriented to gaming and multimedia.
Basic 3D GPU
The basic 3D-based GPU is much more complicated. It isn't as limiting as the NES-style 2D GPU.
This GPU concerns itself with drawing triangles. Specifically, triangles arranged so that they appear to imitate shapes. These GPUs have special hardware that allows the user to map images across the surface of a triangular mesh, so as to give it surface detail. When an image is applied in this fashion, it is called a texture.
The early forms of this GPU were just triangle/texture renderers. The CPU had to position each triangle properly each frame. Later forms, starting with arcade systems like the Sega Model 2 and Namco System 22, then the Nintendo 64 console, and then the first GeForce PC chip, incorporated triangle transform and lighting into the hardware. This allowed the CPU to say, "here's a bunch of triangles; render them," and then go do something else while they were rendered.
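What "transform and lighting" means per vertex can be sketched as below. This is an illustrative CPU-side version, assuming a simple rotation about the origin, a pinhole perspective projection and Lambert (diffuse) lighting; the function names and the sample triangle are hypothetical, and real hardware works with 4x4 matrices and more elaborate lighting models.

```python
import math


def rotate_y(v, angle):
    """Rotate a 3D point around the vertical (y) axis through the origin."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(v, focal=1.0):
    """Perspective-project a camera-space point onto the 2D image plane."""
    x, y, z = v
    return (focal * x / z, focal * y / z)


def diffuse(normal, light_dir):
    """Lambert lighting: brightness is the clamped dot product of two unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


# One hypothetical triangle, 4 units in front of the camera and facing it.
triangle = [(-1.0, 0.0, 4.0), (1.0, 0.0, 4.0), (0.0, 1.0, 4.0)]
normal = (0.0, 0.0, -1.0)
light_dir = (0.0, 0.0, -1.0)  # direction from the surface toward the light

screen = [project(rotate_y(v, 0.0)) for v in triangle]
print(screen)                      # [(-0.25, 0.0), (0.25, 0.0), (0.0, 0.25)]
print(diffuse(normal, light_dir))  # 1.0
```

On the early renderers this math ran on the CPU for every vertex of every frame; moving it into the graphics hardware is what freed the CPU to do something else.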
Modern 3D GPU
In the early 2000s, something happened in GPU design.
Take the application of textures to a polygon. The very first GPU had a very simple function. For each pixel of a triangle:
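A minimal sketch of such a function, assuming the common fixed-function "modulate" mode in which the texture color is multiplied by the lit color of the surface, with both given as RGB values from 0.0 to 1.0. The function name and the color representation are illustrative and not taken from any specific chip.

```python
def shade_pixel(texture_color, light_color):
    """Fixed-function 'modulate': multiply the two colors channel by channel."""
    return tuple(t * l for t, l in zip(texture_color, light_color))


print(shade_pixel((1.0, 0.5, 0.25), (0.5, 0.5, 0.5)))  # (0.5, 0.25, 0.125)
```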
A simple equation. But then, the Dreamcast released with hardware bump mapping capabilities, so developers wanted to apply 2 textures to a triangle. So this function became more complex:
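As a sketch only, the two-texture version might look like the following. It assumes the second texture is simply multiplied in as well, the way a light map would be; actual bump mapping hardware combines its inputs differently, and the function name is hypothetical.

```python
def shade_pixel_two_textures(texture_a, texture_b, light_color):
    """Multiply both texture colors and the light color channel by channel."""
    return tuple(a * b * l for a, b, l in zip(texture_a, texture_b, light_color))


print(shade_pixel_two_textures((1.0, 0.5, 0.25), (0.5, 0.5, 1.0), (1.0, 1.0, 0.5)))
# (0.5, 0.25, 0.125)
```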
Interesting though this may be, developers wanted more say in how the textures were combined. That is, developers wanted to insert more general math into the process. So GPU makers added a few more switches and complications to the process.
What used to be a simple function had now become a user-written program. The program took texture colors and could do fairly arbitrary computations with them.
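The shift from a fixed function to a user-written program can be illustrated like this. The "hardware" below is just a loop that calls whatever function it is handed once per pixel; `run_pixel_shader`, the sample shader and the procedural texture are all hypothetical stand-ins, not any real shading language or API.

```python
def run_pixel_shader(shader, width, height, textures):
    """Call a user-written shader once per pixel, as the hardware would."""
    return [
        [shader(x / width, y / height, textures) for x in range(width)]
        for y in range(height)
    ]


def checker_texture(u, v):
    """A procedural stand-in for a texture lookup: a 2x2 red/blue checkerboard."""
    return (1.0, 0.0, 0.0) if (int(u * 2) + int(v * 2)) % 2 == 0 else (0.0, 0.0, 1.0)


def swap_red_blue_shader(u, v, textures):
    """A hypothetical shader: sample one texture, then swap its red and blue."""
    r, g, b = textures[0](u, v)
    return (b, g, r)


image = run_pixel_shader(swap_red_blue_shader, 2, 2, [checker_texture])
print(image[0])  # [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
```

The loop never changes; only the function passed into it does, which is the sense in which the developer rather than the chip designer now decides how colors are computed.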
In the early days, "fairly arbitrary computations" was quite limited. Nowadays, not so much. These GPU programs, called shaders, commonly do things like video decompression and other sundry activities. At first, the execution units that ran shaders were separated into pipelines strictly for vertices (positioning the 3D models) and pixels (coloring), though with some clever programming, general operations could be done. Shader units in modern GPUs became generalized to take on any work. This led to the General Purpose GPU, or GPGPU, which could do calculations much faster than several traditional CPUs in tandem.
Difference between GPU and CPU
GPUs and CPUs are built around some of the same general components, but they're put together in very different ways. A chip only has a limited amount of space to put circuits on, and GPUs and CPUs use the available space in different ways. The differences can be briefly summarized as follows:
- Execution units: These are the things that do things like add, multiply, and other actual work. A GPU has dozens of times as many of these as a CPU, so it can do a great deal more total work than a CPU in a given amount of time, if there's enough work to do.
- Control units: These are the things that read instructions and tell the execution units what to do. CPUs have many more of these than GPUs, so they can execute individual instruction streams in more complicated ways (out-of-order execution, speculative execution, etc.), leading to much greater performance for each individual instruction stream.
- Storage: GPUs have much smaller cache sizes and less available RAM than CPUs. However, RAM bandwidth for GPUs typically exceeds that of CPUs many times over, since GPUs need a fast, constant stream of data to operate at their fullest. Any hiccups in the data stream will cause stuttering in the operation.
In the end, CPUs can execute a wide variety of programs at acceptable speed. GPUs can execute some special types of programs far faster than a CPU can, but will execute anything else much more slowly, if they can execute it at all.
CPUs and GPUs on the same die
A somewhat important thing started happening in roughly 2010-2011: Chips started being sold that combined a CPU and a GPU on the same die, making the distinction between the two less directly important.
AMD and Intel were the first to offer such chips, although the concept was taken up very quickly by the computing industry in general, particularly its mobile parts (smartphones and laptops). Intel's solution is called "Intel Graphics Technology", while AMD's is currently known as the "AMD Accelerated Processing Unit".
AMD's solution came out later, but was slightly more advanced, in that it also allowed the CPU to interface more efficiently with non-GPU based technology that shared certain interface problems with GPUs (notably, Digital Signal Processors). The PlayStation 4 and Xbox One both used semi-custom versions of AMD's third generation of this technology.
That being said, the compute and programming distinctions between the CPU and the GPU remain sufficiently important that even in such integrated systems, the distinction is maintained, particularly on the programming side.
GPUs today can execute a lot of programs that formerly only CPUs could, but with radically different performance characteristics. A typical home GPU can run hundreds of threads at once, while a typical home CPU can run two to four. On the other hand, each GPU thread progresses far more slowly than a CPU thread. Thus if you have thousands of almost identical tasks you need to run at once, like many pixels in a graphical scene or many objects in a game with physics, a GPU might be able to do work a hundred times faster than a CPU. But if you only have a few things to do and they have to happen in sequence, a CPU-style architecture will give vastly better performance.
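The difference between the two kinds of work can be shown without any GPU at all. The sketch below runs on the CPU and only illustrates the dependency structure: the per-pixel function gives the same results in whatever order the pixels are visited, while the running total, written this way, needs each step to finish before the next can start. Both functions and the sample data are made up for the example.

```python
import random


def brighten(pixel):
    """Independent per-pixel work: needs no other pixel's result."""
    return min(255, pixel + 40)


def running_total(values):
    """Sequential work: every step needs the result of the step before it."""
    totals, total = [], 0
    for value in values:
        total += value
        totals.append(total)
    return totals


pixels = [10, 200, 230, 90]

# Visit the pixels in a random order; the per-pixel results do not change,
# which is what lets hardware hand each pixel to a different thread.
order = list(range(len(pixels)))
random.shuffle(order)
shuffled = [None] * len(pixels)
for i in order:
    shuffled[i] = brighten(pixels[i])

print(shuffled)               # [50, 240, 255, 130]
print(running_total(pixels))  # [10, 210, 440, 530]
```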
Rendering graphics is not the only task that meets the requirements for processing efficiently on a GPU. Many tasks in science and applied mathematics involve massive numbers of vector and matrix operations that GPUs are ideal for. This has given rise to the field of GPGPU (general purpose GPU) computing. Most modern supercomputers use GPUs to perform the brute force parts of scientific computations. To the chagrin of gamers, mid-to-late 2017 saw a major shortage of medium and high-end gaming GPUs due to the advent of cryptocurrencies, as cryptocurrency miners bought them in bulk for their mining rigs, doubling or even tripling retail prices. Retailers did wise up and limit purchases to one card per person, but the price gouging lasted a good full year before prices went back down to normal around August 2018.