First things first. Despite being represented as 0s and 1s, binary is not actually made of them. It can be any two distinct states; it's commonly written as zeroes and ones because "00011101" is much easier to read and comprehend than "off, off, off, on, on, on, off, on". The states can be hole or no hole (ye olde punchcarde), voltage/no voltage (RAM), magnetic field polarities (magnetic disks), reflective/not reflective (optical discs, e.g. Compact Discs), or anything else; the binary 0s and 1s are simply a practical way to represent the state of the hardware. Which state represents which "digit" varies by architecture, but the convention is to treat 0 as "off" and 1 as "on". Nowadays, in some cases it is not even a state that the ones and zeroes represent but a change of state, with 1 being an increase of some value and 0 being a drop, for example.
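To make the "change of state" idea concrete, here is a minimal sketch in Python. It assumes a made-up scheme in which every bit is sent as two signal levels: low-then-high (a rise) means 1 and high-then-low (a drop) means 0. The function names and the scheme itself are illustrative assumptions, loosely in the spirit of Manchester-style line codes, not a description of any particular device.

```python
RISE = (0, 1)  # low then high: stands for 1 in this sketch
DROP = (1, 0)  # high then low: stands for 0 in this sketch


def encode_transitions(bits):
    """Turn a string of '0'/'1' characters into a list of signal levels."""
    levels = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        levels.extend(RISE if bit == "1" else DROP)
    return levels


def decode_transitions(levels):
    """Recover the bit string by looking at each pair of levels."""
    bits = []
    for i in range(0, len(levels), 2):
        pair = tuple(levels[i:i + 2])
        if pair == RISE:
            bits.append("1")
        elif pair == DROP:
            bits.append("0")
        else:
            raise ValueError(f"no transition in pair {pair}")
    return "".join(bits)


signal = encode_transitions("00011101")
print(signal)
print(decode_transitions(signal))
```

Note that the receiver never cares about the absolute level, only about which way it moved, which is the whole point of the scheme.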
A bit is one of these pieces of information. Its value (the state) can be either 1 or 0. A bit is the smallest possible unit of information in an electronic device. Everything you work with on your computer is composed of nothing but a long series of bits, and all numbers are internally represented in binary. It is technically possible to build a computer that counts in regular decimal numbers, and some early ones were in fact built this way before the benefits of a binary system were understood. Bits won out for various reasons: they're easier to handle, the underlying electronics are cheaper, and they allow for logical operations in addition to your standard addition and subtraction. In practice, it's common to write binary numbers in hexadecimal (base 16), because hexadecimal is easier to read than binary and converts easily to and from it: each hexadecimal digit stands for exactly four bits.
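As a quick illustration of how easily binary and hexadecimal convert into each other, here is a small Python sketch. The sample bit pattern is just the one quoted earlier; the variable names are arbitrary.

```python
# Sample value: the bit pattern used as an example earlier in the text.
bits = "00011101"

value = int(bits, 2)                      # read the string as a base-2 number
hex_text = format(value, "02X")           # two upper-case hexadecimal digits
back = format(int(hex_text, 16), "08b")   # and back to eight binary digits

# Each hexadecimal digit corresponds to one group of four bits.
groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_digits = [format(int(group, 2), "X") for group in groups]

print(value, hex_text, back)
print(groups, hex_digits)
```

Because the grouping is always four bits per digit, the conversion can be done digit by digit without any real arithmetic, which is why programmers favor hexadecimal over decimal for this job.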
A byte is, these days, a set of eight bits. Why not ten, since we count in base ten? Because a byte is exactly two digits in hexadecimal and is large enough to be useful. A single bit isn't very useful, which is why from the earliest days of computing bits were grouped into bytes (the name is actually a pun on "bite"). A byte was usually the smallest unit that could be looked up in memory. On the other hand, while information was being processed it was usually handled in different groupings based on the Central Processing Unit design, called 'words', and each computer had its own word length. Some computers, especially military ones, could have decidedly odd word lengths, like 21 or 37 bits, and thus byte length varied from 5 to 11 bits (a word length usually being a multiple of a byte length). Smaller processors and microcontrollers could even have sub-byte word lengths; for example, the Intel 4004, generally credited as the first commercial microprocessor, had a word that was 4 bits long. There is even a term for 4 bits, a "nybble" or "nibble," since it's "half a byte."
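Since a nibble is four bits and a hexadecimal digit is four bits, a byte splits neatly into two of each. A minimal Python sketch, with hypothetical helper names chosen for this example:

```python
def split_nibbles(byte):
    """Split an 8-bit value into its high and low 4-bit halves."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be between 0 and 255")
    high = byte >> 4      # shift the top four bits down
    low = byte & 0x0F     # mask off everything but the bottom four bits
    return high, low


def join_nibbles(high, low):
    """Reassemble a byte from two 4-bit halves."""
    return (high << 4) | low


high, low = split_nibbles(0x1D)
print(high, low)                             # one value per hex digit
print(format(join_nibbles(high, low), "02X"))
```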
When minicomputers and mainframes were being built in the 1950s and 1960s, nobody knew exactly which byte and word sizes would become standard. Control Data made Cyber mainframes with 60-bit words and six-bit bytes; Digital Equipment Corporation made the 36-bit DEC 10 and DEC 20 mainframes with 9-bit bytes; and other companies picked other sizes. Standardization wouldn't occur until someone made a huge number of sales.
It was the IBM System/360 mainframe of 1964, an incredibly influential computer system, that standardized the 8-bit byte, because it was one of the first systems geared for text processing in addition to pure number crunching. It encoded text as 8 bits per symbol, so it was quite logical to have an 8-bit byte; the computer could then address each symbol separately, saving the work of transcoding. For the same reason, it also codified words constructed out of bytes. Before that, the "byte addressing" described above wasn't a given: a lot of machines had pure "word addressing", without breaking words into bytes. This resolved a conflict between processing, where a longer word is more efficient, and addressing, where a shorter word is more efficient. Each address in the 360 corresponded to one byte, but it processed data in words that were four bytes (32 bits) long.
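The split between "one address per byte" and "process four bytes at a time" can be sketched in a few lines of Python. This is only a toy model: the memory contents, the `read_word` helper, and the alignment check are assumptions made for the example, and the text uses ordinary ASCII characters for convenience rather than the 360's own character encoding.

```python
# Toy memory: eight bytes, one text symbol stored at each address.
memory = bytearray(b"TROPES!!")

# Byte addressing: every address names exactly one byte (one symbol).
symbol = chr(memory[2])


def read_word(mem, address):
    """Read the 4-byte word that starts at a byte address.

    Bytes are combined most-significant first (big-endian), and the
    address must be a multiple of four in this sketch.
    """
    if address % 4 != 0:
        raise ValueError("word reads in this sketch must be 4-byte aligned")
    return int.from_bytes(mem[address:address + 4], "big")


word0 = read_word(memory, 0)
word1 = read_word(memory, 4)
print(symbol, hex(word0), hex(word1))
```

The same eight bytes can be seen either as eight separately addressable symbols or as two 32-bit numbers, depending on what the program asks for.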
A byte has 2^8, or 256, unique combinations to work with. For example, a single byte can represent any integer between 0 and 255, or between -128 and +127 if you use one of the bits to indicate whether the number is positive or negative (2^7 = 128). The maximum representable unsigned number is always one less than two to the power of the number of bits, because zero takes up one of the possible bit combinations.
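The ranges above can be checked with a short Python sketch. It assumes the usual two's complement scheme for signed numbers, which is what gives the lopsided -128 to +127 range; the function names are made up for this example.

```python
def unsigned_range(bits):
    """Smallest and largest values when every bit is used for magnitude."""
    return 0, 2 ** bits - 1


def signed_range(bits):
    """Smallest and largest values in two's complement (assumed here)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def as_signed(value, bits=8):
    """Reinterpret an unsigned bit pattern as a two's complement number."""
    if value & (1 << (bits - 1)):      # top bit set means negative
        return value - (1 << bits)
    return value


print(2 ** 8, unsigned_range(8), signed_range(8))
print(as_signed(0x7F), as_signed(0x80), as_signed(0xFF))
```

The same 256 bit patterns are in play either way; the only thing that changes is which numbers they are taken to mean.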
So, what does that 8-bit, 16-bit, 32-bit, or 128-bit on a gaming system mean?
It means word length. Computer processors deal with words, and it's easiest to use words that are a multiple of the eight-bit byte. In most cases, this doesn't matter. You can still program a 16-bit processor so that it handles and displays numbers above 65,535 (assuming that you have enough RAM to do so); it will simply use two words to store the number. It's not going to block you from getting a high score or anything. These limitations can be worked around with clever programming and knowledge of binary arithmetic.
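Here is a rough Python sketch of how a number too big for one 16-bit word can be stored in two. The helper names and the sample high score are invented for the example; a real 16-bit program would do the equivalent in machine code.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFFFF, the largest single-word value


def split_words(value):
    """Store a number of up to 32 bits as a (high word, low word) pair."""
    if not 0 <= value < 1 << (2 * WORD_BITS):
        raise ValueError("value does not fit in two 16-bit words")
    return value >> WORD_BITS, value & WORD_MASK


def join_words(high, low):
    """Rebuild the original number from its two words."""
    return (high << WORD_BITS) | low


high_score = 1_000_000             # far above the 65,535 single-word limit
high, low = split_words(high_score)
print(high, low, join_words(high, low))
```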
However, the processor is limited in two ways. First, word length; specifically, it can only operate on one word per operation. A 16-bit processor has to use two add instructions to get the answer to the question, "What is 70,000 + 1?" using integer arithmetic, while a 32-bit computer can do so in one instruction. This has a knock-on effect on the speed of the processor; while a 16-bit and a 32-bit processor might both have the same absolute speed rating (say, ten million instructions per second), the 16-bit processor will require at least twice as many instructions, and therefore at least twice as much time, to handle numbers which don't fit in a single one of its words.
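The two-instruction addition can be sketched in Python as follows. `add16` stands in for a single 16-bit add instruction with a carry flag, and `add32_on_16bit` is a hypothetical routine built from two of them; neither name comes from any real instruction set.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """Model one 16-bit add: return the 16-bit result and the carry out."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def add32_on_16bit(x, y):
    """Add two 32-bit numbers with two 16-bit adds.

    The low words are added first; any carry is fed into the add of the
    high words. A carry out of the high words is dropped, so the result
    wraps around at 32 bits.
    """
    x_high, x_low = x >> 16, x & MASK16
    y_high, y_low = y >> 16, y & MASK16
    low, carry = add16(x_low, y_low)
    high, _ = add16(x_high, y_high, carry)
    return (high << 16) | low


print(add32_on_16bit(70_000, 1))
print(add32_on_16bit(65_535, 1))   # the carry moves into the high word
```

A 32-bit processor does the same work with one add, since the whole number fits in a single word.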
Second, the size of the address space. This is how the processor finds data to work on, and where to put the new data once it's finished processing. This is also measured in bits, but in this case, the number of distinct values the string of bits can take defines how many memory locations the processor can see. And what's at each location? Traditionally, one byte. So a 16-bit address space means that the processor can see 65,536 one-byte memory locations (known as 64 kilobytes, or 64KB, or 64K). A 32-bit address space can see 4,294,967,296 bytes, or four gigabytes, or 4GB. Note that address zero counts too, so in addressing, you use the power of 2 without any subtraction.
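The arithmetic is simple enough to check directly. A short Python sketch, assuming one byte per address and using a hypothetical `addressable_bytes` helper:

```python
KB = 1024
GB = 1024 ** 3


def addressable_bytes(address_bits):
    """Number of one-byte locations an address of this width can name."""
    return 2 ** address_bits


for bits in (16, 32):
    count = addressable_bytes(bits)
    # Addresses run from 0 up to count - 1, so address zero is included.
    print(f"{bits}-bit: {count:,} locations, highest address {count - 1:,}")

print(addressable_bytes(16) // KB, "KB")
print(addressable_bytes(32) // GB, "GB")
```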
This is actually a significant problem today. Memory is so cheap that 4 or even 8 gigabytes of RAM have moved from being easily within reach of the average customer into the thrift store world, and programs are getting to the point where they can make use of it. But processors, operating systems, and software with 32-bit address spaces are limited to 4GB in total. (PC games like Supreme Commander have issues with this, for example.) To compound the problem, graphics cards need address space for their Video RAM, and many other devices use some of it as well, so the amount of actual RAM that a processor can see is often limited to well under its full address space, sometimes as little as half. This problem is solved by increasing the address space to 64 bits, or 16 exabytes, so large it makes 4 gigabytes look like a speck. (In theory, anyway; most such processors are actually artificially capped at 48 to 52 bits.) Considering that the difference between 2^32 and 2^64 is so enormous, and considering also that modern IC manufacturing technology is getting to the point where transistors just can't be made much smaller, that should last us a while.
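To get a feel for the scale, here is a small Python sketch comparing the address space sizes mentioned above. It uses binary units (multiples of 1,024), and the constant names are chosen just for this example.

```python
GB = 1024 ** 3
TB = 1024 ** 4
PB = 1024 ** 5
EB = 1024 ** 6

full_32 = 2 ** 32      # every address a 32-bit system can name
full_64 = 2 ** 64      # the theoretical 64-bit limit
capped_48 = 2 ** 48    # low end of the capped range mentioned above
capped_52 = 2 ** 52    # high end of that range

print(full_32 // GB, "GB")
print(full_64 // EB, "EB")
print(capped_48 // TB, "TB")
print(capped_52 // PB, "PB")

# How many complete 32-bit address spaces fit inside a 64-bit one?
print(full_64 // full_32)
```

Even the capped sizes are tens of thousands of times larger than 4GB, so the cap is not something most software will notice.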
What does this mean in practice? More bits means the machine can handle much bigger numbers and much more memory directly, without needing goofy programmer-unfriendly workarounds. It means that you can get better graphics and larger environments, as larger textures and bigger levels can be loaded into memory. It's not the only important thing: there's not much benefit if you don't have processor speed, memory, graphics processing capability, and storage space for the relevant stuff to float around in first. In addition, 64-bit addresses take up twice as much space as 32-bit addresses, meaning more memory is required for the 64-bit system (although usually not as bad as double, since not everything needs to be replaced by a 64-bit version).
There is a lot more information than that, but we're just trying to give a starter course here.