While debates continue about whether a given language is high- or low-level, the undisputed king of low-level languages is assembly, the human-readable equivalent of machine code. For most everyday programming tasks assembly is impractical, but a good understanding of it helps a software engineer make sense of computer hardware and architecture. The following is a brief, simple guide to creating your first assembly program. To begin, create a file with the extension .s or .asm, which signifies that it is an assembly file. Then organize your file the following way:
global _start
section .text
_start:
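This template is essential in assembly, as it organizes the program in a way the assembler and linker can understand: the global line makes the _start label visible to the linker, section .text marks the part of the file that holds code, and _start: marks where execution begins.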
Assembly works directly with registers, which are small storage locations inside the CPU. Here is an analogy. Let’s say you are solving a math problem. If it is complex enough, you need a piece of paper to write your work down. Writing something down is fairly quick, and you don’t need to keep that paper; it is temporary. This is your RAM: it can hold a decent amount of data, so it works well for complicated apps like games, where you need to store many variables, but after you are done playing, those variables can be deleted. RAM often comes as 8 GB, 16 GB, or 32 GB.
If you need a formula you forgot, or need to revisit a problem you have solved previously, you can look back at your notes. This process takes time, but those notes are stored permanently, or at least long-term. This is your computer’s storage. It has the most capacity, e.g. 256 GB, 512 GB, or 1 TB.
But what about all those small arithmetic or algebra calculations? Do you need to write down how you moved the variables to one side and the numbers to the other? Probably not. You have mental math for that. It is by far the fastest method, but you can’t hold a lot of information in your head at one time. This is the metaphor for registers: they are very small but very fast storage, typically 32 or 64 bits each on x86, located in the CPU. The CPU uses them to perform basic calculations and to pass data in and out.
Normally, registers are managed for you by the programming language’s compiler; when writing in assembly, however, they are critical. If you have experience with C or C++, you can think of them as a small, fixed set of built-in variables, some of which hold pointers (memory addresses) to your data.
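To make this concrete, here is a minimal sketch of a calculation done entirely in registers. It assumes 32-bit x86 Linux and NASM syntax, the same setup as the program later in this guide. The values 2 and 3 are arbitrary, and the last two instructions simply end the program, handing the result to the system as the exit code.

```nasm
; Sketch only: assumes 32-bit x86 Linux and NASM syntax
section .text
global _start
_start:
    mov ebx, 2      ; put the value 2 into the ebx register
    add ebx, 3      ; add 3 to it; ebx now holds 5
    mov eax, 1      ; syscall number for sys_exit
    int 0x80        ; exit, using the value in ebx (5) as the return code
```

If this is assembled and run on a Linux shell, typing echo $? afterwards should print 5, the value left in ebx. No memory was read or written for the calculation itself; everything happened in registers.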
Depending on your CPU architecture (x86, x86-64, ARM, or RISC-V), you will have different registers. For this demonstration we will use 32-bit x86 registers and Linux system calls, since x86 CPUs are very common. Let’s talk about a few key registers and how to use them:
EAX: System call register. It holds the number of the system call we want the kernel to perform. We will use it to request a write, which outputs the words “Hello World!”. (On 64-bit Linux this role is played by RAX.)
EBX: File descriptor. It tells the system where the data will be written; for example, setting EBX to 1 selects standard output, or stdout, which is what we need to print hello world to the console. (The 64-bit equivalent is RDI.)
ECX: This register holds a pointer to our string, that is, the address of the hello label defined in our code. (The 64-bit equivalent is RSI.)
EDX: This register holds the string length in bytes. (The 64-bit equivalent is RDX.)
Now, let’s combine everything into the actual program:
; Assumes 32-bit Linux system calls (int 0x80) and NASM syntax
section .text
global _start
hello db 'Hello World!', 10 ; the string to print, followed by a newline (ASCII 10)
_start:
; Write "Hello World!" to the console
mov eax, 4 ; syscall number for sys_write
mov ebx, 1 ; file descriptor 1 is standard output (the console)
mov ecx, hello ; pointer to the string to print
mov edx, 13 ; number of bytes to write (12 characters plus the newline)
int 0x80 ; make the system call
; Exit the program
mov eax, 1 ; syscall number for sys_exit
xor ebx, ebx ; return code 0
int 0x80 ; make the system call
The first instructions after the _start label tell the system what the program should do. This is what the eax and ebx registers are for: eax selects the operation (here, write) and ebx selects where the output goes (here, standard output).
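Within section .text we also define our data. That is what the hello db line does: db ("define byte") places the bytes of the string in the program and gives their address the label hello. Larger programs usually keep such data in a separate section .data.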
Lastly, the first int 0x80 instruction hands control to the kernel, which performs the write and prints our string to the console, while the last three lines terminate the program. This is effectively the same as the return 0; line at the end of the main function in C++.
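For comparison, here is a hedged sketch of the same program for 64-bit Linux, where the syscall instruction replaces int 0x80 and the arguments travel in RAX, RDI, RSI, and RDX. It assumes NASM syntax and the x86-64 Linux system call numbers (1 for write, 60 for exit). The name hello_len and the file name hello64.asm in the comments are illustrative choices, not part of the program above.

```nasm
; Sketch only: assumes 64-bit x86 Linux and NASM syntax
; Possible build steps (hello64.asm is a hypothetical file name):
;   nasm -f elf64 hello64.asm -o hello64.o
;   ld hello64.o -o hello64
section .data
hello db 'Hello World!', 10     ; the string, followed by a newline (ASCII 10)
hello_len equ $ - hello         ; length in bytes, computed by the assembler (13)

section .text
global _start
_start:
    ; Write the string to the console
    mov rax, 1                  ; syscall number for write on x86-64 Linux
    mov rdi, 1                  ; file descriptor 1 is standard output
    mov rsi, hello              ; pointer to the string
    mov rdx, hello_len          ; number of bytes to write
    syscall                     ; make the system call

    ; Exit the program
    mov rax, 60                 ; syscall number for exit on x86-64 Linux
    xor rdi, rdi                ; return code 0
    syscall                     ; make the system call
```

Letting the assembler compute the length with equ $ - hello means the byte count stays correct if the string is later changed, which avoids the easy mistake of a hard-coded length that no longer matches.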