Computer Translation Hierarchy: Turning Language into 1s and 0s
What happens when a computer program runs? What happens when a developer compiles a program? These questions are fundamental to the process of computer translation and essential to understanding how computers ingest human-friendly languages and turn them into machine code.
Modern translation pipelines often make use of cloud-based services, dynamically linked libraries, and virtualization. Advances in translation strategies are nothing short of inspiring. At their heart, however, these pipelines still follow a handful of fundamental steps that computer scientists have relied on for decades.
Machine Translation Steps
Each step in translating a high-level programming language to machine code hides plenty of complexity. However, each can be regarded as one discrete stage of the process. Below, you’ll find a brief outline of each step and its role in translating languages like C and Java into 1s and 0s.
The compiler translates a program written in one language into an equivalent program in another. Generally, this means converting a high-level language such as C into a lower-level language like assembly code. The compiler is tasked with several jobs, including preprocessing, lexical analysis, and code optimization.
Input: Source code in a high-level language like C, Java, or C++
Output: Assembly language (or, for a language like Java, bytecode for a virtual machine)
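To make one of those compiler jobs concrete, here is a minimal sketch of lexical analysis in Python. The token categories and the tokenize name are assumptions made up for this example; a real C compiler recognizes far more (keywords, string literals, comments, multi-character operators).

```python
import re

# Hypothetical token categories for a tiny C-like subset (an assumption for this sketch).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("PUNCT", r"[();{}]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("x = y + 42;") yields a flat list of tokens such as ("IDENT", "x") and ("NUMBER", "42"), which later compiler phases would parse and eventually turn into assembly.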
The assembler generates object code by translating the assembly code produced by the compiler. Assemblers are tasked with creating a binary version of the instructions generated by the compiler. This involves keeping track of the labels that name locations in the code, such as procedures and static variables. The assembler does this with a symbol table, which consists of symbol:memory_address pairs.
Input: Assembly language
Output: Machine-language object files
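The symbol table is easiest to see in a toy two-pass assembler. The sketch below assumes a made-up assembly syntax (a label is a line ending in a colon) and a fixed instruction width of 4 bytes; it resolves labels to addresses but does not encode real binary instructions.

```python
WORD_SIZE = 4  # assumed fixed instruction width in bytes


def build_symbol_table(lines, base_address=0):
    """First pass: record the address of every label."""
    symbols = {}
    address = base_address
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A label names the address of the next instruction.
            symbols[line[:-1]] = address
        else:
            address += WORD_SIZE
    return symbols


def resolve_labels(lines, symbols):
    """Second pass: replace label operands with their addresses."""
    resolved = []
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        opcode, *operands = line.split()
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(" ".join([opcode, *operands]))
    return resolved
```

Two passes are used because an instruction may jump to a label that is only defined further down the file, so every address has to be known before any operand is rewritten.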
Sometimes referred to as the link editor, the linker lets developers re-compile only the components that have changed rather than the entire codebase. The linker gathers the separate object files generated by the assembler and “links” them together. There are three distinct steps involved in the linker’s role in computer translation:
- Places the code and data from each module symbolically in memory
- Determines the addresses of the data and instruction labels from the assembled files
- Patches references to procedures and labels across many files so that each points to a single place in memory
In simple terms, the linker stitches many separately translated parts into a single entity and ensures that every reference to a shared procedure or variable points to one definition rather than a duplicate. One main benefit of the linker is that only the parts of the program that have changed since the last compilation need to be re-compiled.
Input: Assembled files and modules
Output: Executable file with completely resolved references
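The three linker steps can be sketched with toy object modules. The dictionary layout used here ("code", "symbols", "references") is purely an assumption for illustration and is not how real object formats such as ELF are organized; addresses count one unit per instruction to keep the arithmetic simple.

```python
def link(modules, base_address=0):
    """Combine toy object modules into one flat list of resolved instructions."""
    # Step 1: place each module in memory by assigning it a base address.
    bases = []
    next_free = base_address
    for module in modules:
        bases.append(next_free)
        next_free += len(module["code"])

    # Step 2: compute the final address of every defined symbol.
    global_symbols = {}
    for module, base in zip(modules, bases):
        for name, offset in module["symbols"].items():
            if name in global_symbols:
                raise ValueError(f"duplicate symbol: {name}")
            global_symbols[name] = base + offset

    # Step 3: patch each symbolic reference with the address it resolves to.
    executable = []
    for module in modules:
        for opcode, operand in module["code"]:
            if operand in module["references"]:
                if operand not in global_symbols:
                    raise ValueError(f"undefined symbol: {operand}")
                operand = global_symbols[operand]
            executable.append((opcode, operand))
    return executable, global_symbols
```

Linking a module that calls "square" together with a module that defines "square" replaces the symbolic operand with a concrete address. Leaving the defining module out raises an "undefined symbol" error, much like the message a real linker prints when a library is missing.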
The loader, typically part of the operating system, is responsible for loading the executable file into memory, completing the following sequence of actions:
- Allocates required memory for text and data segments
- Copies instructions and data from the linker’s executable file into memory
- Copies the parameters for the main program (if any) onto the stack
- Initializes CPU registers and sets the stack pointer to the first free location
- Jumps to a start-up routine that copies the parameters into the appropriate CPU registers and calls the main program; when the main program returns, the start-up routine terminates the program
Input: Executable File
Output: Depends on the program’s function
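As a rough illustration of the loader's sequence, the sketch below simulates memory as a Python list. The executable layout ("text", "data", "entry"), the register names, and the memory size are all assumptions for this example; a real loader works with virtual address spaces and an executable file format defined by the operating system.

```python
def load(executable, args, memory_size=64):
    """Simulate loading: returns (memory, registers) ready for execution."""
    text, data = executable["text"], executable["data"]
    needed = len(text) + len(data) + len(args)
    if needed > memory_size:
        raise MemoryError("program does not fit in memory")

    # Allocate memory and copy the text and data segments to the low addresses.
    memory = [None] * memory_size
    memory[0:len(text)] = text
    memory[len(text):len(text) + len(data)] = data

    # Copy the parameters onto the stack, which grows down from the top of memory.
    sp = memory_size
    for arg in reversed(args):
        sp -= 1
        memory[sp] = arg

    # Initialize registers: the stack pointer ends at the first free location
    # below the arguments, and the program counter starts at the entry point.
    registers = {"pc": executable["entry"], "sp": sp - 1, "argc": len(args)}
    return memory, registers
```

Actually running the program (the final step in the list) is left out; the point is only to show where the segments, the parameters, and the initial register values end up before execution begins.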
Each step in the translation of human-readable code to executable machine code serves an essential purpose. The encapsulation of processes within each step allows programmers from different disciplines and backgrounds to focus on optimizing one stage at a time, in ways that every other programmer can build on.
To get an idea of how this discretization of translation works, check out the GNU Compiler Collection (GCC), an open-source project that has been under active development for decades. If you’re developing in C, you’re probably already familiar with it!