- "C++ is a write-only language; one can write programs in C++, but I can't read any of them"—Programmer joke
Computers, for the most part, are dumb. If you were to take computer hardware that was freshly built off the assembly line, put the components together into a fully assembled device, and tried to turn it on, it wouldn't do anything useful (if anything at all). Yes, Windows and macOS don't magically appear in the computer right from the factory. But if you give them something to do, they'll be able to do it really fast! But how do you tell a machine what to do? Here comes the programming language. As the name implies, it's the language you use to program the computer to do what you want.
While there are other kinds of languages in computer science, the defining characteristic of a programming language is that it's used to implement algorithms. How and where it does this provides further distinctions, such as higher-level scripting languages. An example of a computer language that isn't a programming language is a Markup Language, which only defines how something should look, not what it should do.
Concepts
A programming language has four basic elements to it:
- Symbols to hold data.
- Operators that modify the data.
- Conditional statements that control the flow of the program.
- The ability to jump around in the program at will.
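As a rough illustration of the four elements above, here is a minimal Python sketch. The variable names and the numbers are made up for the example; the point is only to show a symbol, an operator, a conditional, and a jump (here, a loop) working together.

```python
# Symbols (variables) hold data.
total = 0
numbers = [3, 8, 5, 12]

# Jumping: the loop sends execution back to its top for each item.
for n in numbers:
    # Conditional statement: decides which path the program takes.
    if n % 2 == 0:
        # Operator: += modifies the data held by `total`.
        total += n

print(total)  # sum of the even numbers: 8 + 12 = 20
```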
Programs are written into source files, which can be compiled or assembled for later execution, or interpreted for execution right away. If there's something immediately wrong with the source file, the compiler, assembler, or interpreter will complain until it's fixed.
It should also be noted that a computer is more or less a Literal Genie. It's very, very rare a computer makes a mistake because it actually made a mistake. It "makes mistakes" because of how the program was written, which is entirely up to the human who wrote the code. However, the way someone writes code can make knowing how the program is supposed to function either easier or harder, so there's also an artistic side to programming.
Low-Level Languages
At its heart, a computer is simply a giant calculator that computes arithmetic billions of times per second. All of its constituent parts, from memory to modem to monitor to mouse, have an alphanumerical address associated with them. The computer uses these addresses to route data throughout itself. Hence, the earliest computer languages evolved to reflect how computers fundamentally worked. An extraordinarily simple instruction might be "Take the number stored at memory address X and subtract it from the number stored at memory address Y, then send it to the printer located at hardware address Z". Low-level languages use hardware-specific instructions to talk directly to the computer this way.
A programmer can write low-level source code in two ways:
- Machine code: The lowest level of code. This is literally writing the 0's and 1's, or, more commonly, their hexadecimal equivalents. There are usually two parts to a machine code instruction: the operation code (opcode), which is the instruction itself, and the operand(s), which are the data the instruction acts on. Computers until The '50s had to be programmed like this.
- Assembly code: The next step in language development, introduced in The '50s. Opcodes and some special operands are given more human-readable mnemonics, though these are often kept to very abbreviated words. This enabled features like labels, which give sections of memory a name and get rid of the tedious job of keeping track of where one is in memory. Assembly code is machine-specific, and one cannot assume a mnemonic means the exact same operation on another machine. Because of this, porting an assembly language program to a different hardware platform usually means rewriting it completely.
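To make the relationship between mnemonics and machine code concrete, here is a toy assembler sketched in Python. The instruction set is entirely hypothetical (the `LOAD`, `SUB`, `OUT`, and `HALT` mnemonics, their opcode values, and the one-byte operand format are all assumptions for illustration) and does not correspond to any real CPU.

```python
# Hypothetical instruction set: each instruction is one opcode byte
# followed by one operand byte. None of this matches a real CPU.
OPCODES = {"LOAD": 0x01, "SUB": 0x02, "OUT": 0x03, "HALT": 0xFF}

def assemble(source: str) -> bytes:
    """Translate mnemonic lines such as 'LOAD 0x10' into machine code bytes."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0]
        # Instructions without an operand get a zero byte as padding.
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        machine_code += bytes([OPCODES[mnemonic], operand])
    return bytes(machine_code)

program = """
LOAD 0x10   ; fetch the number at address 0x10
SUB  0x11   ; subtract the number at address 0x11
OUT  0x20   ; send the result to the device at address 0x20
HALT
"""

code = assemble(program)
print(code.hex(" "))  # 01 10 02 11 03 20 ff 00
```

A real assembler also resolves labels and handles several operand formats, which this sketch leaves out.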
Low-level languages are used for maximum performance and maximum flexibility. The code talks directly to the hardware, and the programmer has full access to it (barring specific security features). The trade-offs are that it's very easy to write code that breaks the system, and that the code isn't portable.
It's unusual these days to write assembly code by hand, because compilers have gotten so good at optimizing slightly higher-level languages like C. So languages are sometimes termed "low-level" because they give you a lot of control over how the assembly code turns out. In addition, many modern compilers actually allow programmers to create assembler "inserts" in the high-level source code. Some languages (such as Forth) are "multi-level" and allow both low-level and high-level coding to be done in the same syntax, and sometimes the same program.
High-Level Languages
High-level languages replace assembly code with something easily human-readable and automate the more mind-numbing parts. For example, you could write "x = 2" instead of "MOV x, 2"; or "for(5: Array)" instead of manually looping through instructions. However, this can sacrifice performance and (arguably) flexibility, because the source code must be parsed and translated into machine code rather than written for the hardware directly. More abstracted high-level languages go further, replacing, say, "x = 2" with "x is 2". Complex languages are written in earlier ones, "standing on the shoulders of giants"; a compiler is essentially a text parser that goes through source code for a given language and translates it into a lower-level language, such as assembly or machine code. Along with readability, high-level languages also allow for "portability", as long as a compiler or interpreter exists for the platform.
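One way to glimpse the translation described above is Python's standard `dis` module, which lists the lower-level instructions that the CPython implementation generates for a piece of source code. Note the assumptions: these are bytecode instructions for Python's virtual machine, not real machine code, and the exact instruction names differ between Python versions, so no output is shown here.

```python
import dis

# Compile one high-level statement and list the lower-level
# instructions CPython generates for it.
code = compile("x = 2", "<example>", "exec")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(opnames)
```

Even this one-line assignment expands into several instructions, including one that stores the value under the name `x`.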
With handwritten assembly code falling by the wayside, some high-level languages have been dubbed "mid-level languages", meaning the language is closer to assembly. A mid-level language can directly manipulate the computer's memory and input/output devices. Higher-level languages are instead sealed off from the hardware and must interface with a program (usually the part of an operating system called the kernel) written in a mid- or low-level language that then interacts with the hardware and memory. Consequently, higher-level languages like Java cannot be used to write operating systems or hardware drivers unassisted and often perform slower than mid-level languages like the near-ubiquitous C and C++ (in which Java's own virtual machine is largely written).
Source files can be executed in three different ways:
- Ahead-of-Time Compiling (AOT): This turns the source code into an executable that can be loaded directly into memory and run without further processing. This has the fastest execution time, but is limited by what software libraries it needs and compiler support for the CPU architecture it's meant to run on.
- Just-in-Time Compiling (JIT): The source files are run through another program or framework that compiles parts of them into executable code for the computer architecture when needed (hence, just-in-time). While it can be as fast as AOT-compiled code and can run anywhere the JIT framework is available, it consumes more resources to do so.
- Interpreting: A program executes the source code essentially line by line. While running source code is slower and more resource intensive, the source code can be run as-is. In a lot of cases, the program can be halted and the code modified on the fly.
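The interpreting approach can be sketched in a few lines of Python. The three-command mini-language below (`set`, `add`, `print`) is invented purely for this example; real interpreters parse far richer grammars, but the read-a-line, act-on-it loop is the same basic idea.

```python
def interpret(source: str) -> list[str]:
    """Execute a hypothetical three-command language one line at a time."""
    variables: dict[str, int] = {}
    output: list[str] = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue                      # skip blank lines
        command, args = words[0], words[1:]
        if command == "set":              # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif command == "add":            # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif command == "print":          # print NAME
            output.append(str(variables[args[0]]))
        else:
            raise SyntaxError(f"line {line_number}: unknown command {command!r}")
    return output

result = interpret("""
set x 2
add x 40
print x
""")
print(result)  # ['42']
```

Because each line is only examined when execution reaches it, a mistake on the last line goes unnoticed until the program gets there, which is one reason interpreted code can be edited and re-run so freely.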
Programming can be thought of as making a recipe for a dish. For instance, making a cake:
- Preheat the oven to 400F
- Put flour, eggs, sugar and milk into a bowl.
- Mix the ingredients for a batter.
- Put the batter into a pan.
- Bake for 30 minutes.
- Take the pan out and poke it with a toothpick. Does it come out clean?
- If not, put the pan back in the oven for five minutes and test again.
- If so, take the pan out and leave to cool for 10 minutes.
- You now have delicious cake to serve!
A program equivalent could look like this:
import kitchen
import toothpick

oven.temperature = 400
ingredients = [flour, eggs, sugar, milk]
batter = mix(ingredients)
cake = pour(batter)
oven.bake(cake, 30)
toothpick.poke(cake)
while not toothpick.clean:
    oven.bake(cake, 5)
    toothpick.poke(cake)
cool(cake, 10)
serve(cake)
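The recipe above relies on imaginary `kitchen` and `toothpick` modules, so it won't run as written. Below is a self-contained Python sketch of the same bake-and-test loop. The `Cake` class, the `bake` function, and the assumption that the cake is done after 40 minutes in total are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Cake:
    minutes_baked: int = 0

    def toothpick_comes_out_clean(self) -> bool:
        # Assumption for this sketch: the cake is done after 40 minutes in total.
        return self.minutes_baked >= 40

def bake(cake: Cake, minutes: int) -> None:
    """Add baking time to the cake."""
    cake.minutes_baked += minutes

cake = Cake()                 # stands in for mixing and pouring the batter
bake(cake, 30)                # "Bake for 30 minutes."
while not cake.toothpick_comes_out_clean():
    bake(cake, 5)             # back in the oven for five minutes, then test again
print(cake.minutes_baked)     # 40
```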
A quirk of programming languages is that, as in natural languages, different "words" have different meanings, or no meaning at all. If you wanted to display something on your monitor, you may have to type out "Print", "Display" or even C++'s exotic-sounding "cout" (for character output, and pronounced "see-out"). There are also different paradigms for structuring code. For example, procedural programming involves breaking up tasks into subroutines to make things legible. Another one, object-oriented programming, groups variables and tasks into "objects". With so many different ways to write a program or routine, a programming language can be thought of like any natural language you may learn. Thus, it's important to practice it if you want to get good at it.
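To show how the two paradigms mentioned above differ in practice, here is the same small task written both ways in Python. The rectangle example and all of its names are made up for illustration.

```python
# Procedural style: data is passed to a standalone subroutine.
def area_of_rectangle(width: float, height: float) -> float:
    return width * height

# Object-oriented style: the data and the task that uses it live in one object.
class Rectangle:
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height

procedural_result = area_of_rectangle(3, 4)
object_result = Rectangle(3, 4).area()
print(procedural_result, object_result)  # 12 12
```

Both versions compute the same thing; the difference is in where the data lives and how the code is organized around it.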
And if you want to see exactly how an interpreter functions, read this tutorial about making a basic one.
- BASIC (Beginner's All-purpose Symbolic Instruction Code): A family of languages designed in 1964 to be easy to learn and use. Has undergone many permutations, and its descendants bear no obvious resemblance to it or to each other. Although it's mostly used in programmable calculators and hobbyist software today, it was historically very common in home and school microcomputers, making its use into a trope for creators who grew up in the 70s and 80s. Many of the programming jokes in Futurama, for instance, are in BASIC. One of its descendants is Visual Basic.
- COBOL: Created in the late 1950s at the instigation of Mary Hawes with help from the US Government, COBOL is a programming language designed to be the one language for business applications. The aim of COBOL was to make it easy for people of the time without a computer science background to work with, and part of this was to make heavy use of an English-like syntax (e.g., you can write "X IS GREATER THAN Y" rather than "X > Y", though the latter can still be used). Arguably it's the lingua franca of business and government application software: it was reported in 1997 that 80% of all business applications in the world ran on COBOL. More recently, in 2017, it was reported that nearly 43% of banks still used COBOL software. Today, knowing COBOL is highly prized because the original developers are retiring while many organizations and businesses still use software written in it.
- FORTRAN: The very first high-level language in existence, though some call its early incarnations little more than a symbolic assembler, as a lot of features that modern programmers now take for granted simply had not yet been invented back then. Developed by IBM's John Backus in 1954 for scientific calculations and is still used to this day for the very same goal. Recent versions are actually closer to C than to the original language.
- Lisp: Originally LISP, as in LISt Processor. Another early language (the second-oldest still in use, in fact), this time at a much higher level than the industry was ready for. Created by John McCarthy in 1958 as a research tool in the abstract algebra field and later found its use in AI development. Another Long Runner, which, although not as popular per se, influenced basically all modern programming languages, especially languages like Python. Is known for several rather hard-to-bend-the-brain-around concepts like first-class functions and closures, as well as for its idiosyncratic (or, as many say, nonexistent) syntax that consists almost entirely of parentheses. Has evolved greatly with time. Popular dialects are Common Lisp and Scheme.
- C: One of the most widely used languages in the world, with a compiler available for nearly every modern hardware platform known to man. Works "down to the metal", meaning very close to hardware and thus very fast. Originally used to write the UNIX OS, it's since been used in Windows, Linux, and macOS, the "big three" operating systems. Allows for a lot of Mind Screwy tricks, but with great power comes great responsibility; you can easily shoot yourself in the foot, which is why it's often jokingly referred to as a "high-level assembler".
- C++: Started off as an extension of C to include object-oriented programming, but grew into its own general-purpose programming language that supports a variety of paradigms. Due to its direct ancestry, C++ is (mostly) compatible with C functions and applications. The complexity of its features has drawn criticism over the years, but a lot of this may be due to the teething issues of figuring out best practices for the feature set that C++ provides.
- C#: Microsoft's alternative to Java, which was allegedly developed after Sun axed its licensing of Java in Microsoft's development tools. C# (pronounced "C Sharp", like the musical note) is mainly used in Windows Universal Apps and as the primary scripting language for the Unity game engine. It runs on a set of libraries called the Common Language Infrastructure, which is the .NET Framework on Windows and Mono elsewhere, and is JIT compiled.
- Java: Developed by Sun in the 90s, the idea of Java was that it could be compiled into so-called "bytecode", which resembles the machine code of a fictional Java computer. The bytecode started off being interpreted, but the Java Virtual Machine (JVM) that runs it was later extended to JIT-compile it for performance. It got its start in web applications, but soon expanded to many platforms that could run the virtual machine. It's the language that the first version of Minecraft was written in and is the primary language that Android apps are written in, though Google stated in 2019 that they prefer Kotlin be used instead.
- Objective-C: Apple's (originally NeXT's) cross between C and Smalltalk. Originated in NeXTstep and was briefly offered as a programming tool for Windows in the early Nineties, but didn't really take off, mainly due to performance problems: PCs of the time were a lot weaker than now, and the translators weren't up to the task. It was, however, revitalized by the introduction of Mac OS X and iOS. Objective-C has since been largely replaced by Apple's Swift.
- Perl: Practical Extraction and Report Language, a.k.a. Pathologically Eclectic Rubbish Lister thanks to how unreadable code written in it can be. It was originally a popular language for Common Gateway Interface (CGI) scripts, which connected web pages to other services. Perl is the glue language of choice for UNIX and Linux systems.
- PHP: "PHP: Hypertext Preprocessor", which for years was considered a broken, unfixable mess. This Very Wiki runs PHP (look at the .php extension in this page, assuming it hasn't yet been removed), and around 80% of websites also run on it. PHP is essentially a wrapper for C and has over eight thousand functions built in!
- Python: An interpreted language, though usually compiled in some way for performance, designed to be readable. It's used significantly in prototyping and science-related fields, notably in AI. A major change happened between versions 2.7 and 3.0, causing 2.7 to be supported for many years after 3.0 was released. As of 2020, however, only the 3.x series is supported.
- Ruby: Another interpreted, though now JIT compiled, language that gained popularity among web developers thanks to its efficiency when used for rapid prototyping. It's most commonly used with the Rails framework to build dynamic webpages.
- Swift: Developed in the early 2010s by Apple, Swift was designed to replace Objective-C by combining the best traits from it and incorporating ways to ensure safe code before it gets compiled. By the late 2010s, Swift had all but replaced Objective-C as the programming language for apps made for Apple's products.
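The closures and functions-as-values that the Lisp entry above calls hard to bend the brain around have since spread to many of the languages in this list. Here is a minimal sketch in Python rather than Lisp; `make_counter` and the other names are invented for the example.

```python
def make_counter(start: int = 0):
    """Return a function that remembers `count` between calls (a closure)."""
    count = start

    def increment() -> int:
        nonlocal count        # refers to the variable captured from make_counter
        count += 1
        return count

    return increment          # functions are values, so one can be returned

counter = make_counter()
values = [counter(), counter(), counter()]
print(values)                 # [1, 2, 3]

# Functions can also be passed as arguments to other functions.
doubled = list(map(lambda n: n * 2, values))
print(doubled)                # [2, 4, 6]
```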
Esoteric Languages
Languages made mostly for fun. Some of them are for testing the limits of a programmer.
- The Other Wiki has a list.
- Brainfuck. Exactly what it sounds like for any programmer. A pointer-based language that only accepts 8 characters as valid commands.
- LOLCODE. Imagine BASIC meets LOLCats.
- Whitespace. A programming language made entirely of spaces, tabs, and newlines.
- Befunge. A bit like Brainfuck, except more complicated: pointers are two-dimensional and a program can edit its own source code.
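Because Brainfuck has only eight commands, a working interpreter for it fits in a few dozen lines. The following Python sketch is one possible implementation; the 30,000-cell tape, the 8-bit wrapping cells, and the choice to read a zero when input runs out are common conventions assumed here, not requirements taken from this page.

```python
def run_brainfuck(program: str, input_text: str = "") -> str:
    """Run a Brainfuck program and return whatever it prints."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jumps, stack = {}, []
    for position, char in enumerate(program):
        if char == "[":
            stack.append(position)
        elif char == "]":
            start = stack.pop()
            jumps[start], jumps[position] = position, start

    tape = [0] * 30000          # conventional tape size; an assumption here
    pointer = pc = 0
    inputs = iter(input_text)
    output = []
    while pc < len(program):
        char = program[pc]
        if char == ">":
            pointer += 1
        elif char == "<":
            pointer -= 1
        elif char == "+":
            tape[pointer] = (tape[pointer] + 1) % 256
        elif char == "-":
            tape[pointer] = (tape[pointer] - 1) % 256
        elif char == ".":
            output.append(chr(tape[pointer]))
        elif char == ",":
            tape[pointer] = ord(next(inputs, "\0")) % 256
        elif char == "[" and tape[pointer] == 0:
            pc = jumps[pc]      # skip the loop body
        elif char == "]" and tape[pointer] != 0:
            pc = jumps[pc]      # repeat the loop body
        pc += 1                 # any other character is ignored as a comment
    return "".join(output)

# 8 * 8 = 64, plus 1 gives 65, the character code for "A".
print(run_brainfuck("++++++++[>++++++++<-]>+."))  # A
```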