How Does a File Run on Your PC?
Suppose you wrote the simple program below.
#include <stdio.h>
#define RET 0
int main()
{
    printf("Hello Linux\n");
    return RET;
}
Then you compiled the file by using the command:
gcc hello.c -o myhello
In embedded systems, we build a project with an IDE, GCC, or another toolchain, and get a hex or binary file that we load into the microcontroller's memory.
What happens on Linux, and how?
myhello is just a static file on the hard disk. How does it become a program in RAM that executes and prints to the screen?
Standard Formats
A file is saved on the hard disk in a standard format to avoid compiler dependency issues. Some of the standard formats are:
a.out: the old Unix format (initial version). Today a.out survives only as a historical default output file name; a file with that name is actually in ELF format, not a.out format.
ELF: Executable and Linkable Format
Our compiled file myhello is generated in ELF format. You can check the file format with the command:
file myhello
Is ELF for the final executable file only?
If you run gcc -c hello.c, you get the object file hello.o.
If you run file hello.o, you will find that its format is also ELF, but its type is relocatable. So both object files and final executables are in ELF format; the object file is relocatable, while the final file is executable.
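The difference between a relocatable object file and an executable is recorded in the type field of the ELF header. As a minimal sketch, assuming a Linux system where elf.h is available, the hypothetical helper elf_type_name below maps that field to a readable label using the ET_* constants from elf.h. The label strings are chosen here for illustration only.

```c
#include <elf.h>   /* ET_REL, ET_EXEC, ET_DYN, ET_CORE (Linux/glibc header) */

/* Hypothetical helper: map the type field of an ELF header to a label. */
const char *elf_type_name(unsigned int e_type)
{
    switch (e_type) {
    case ET_REL:  return "relocatable";    /* object file such as hello.o */
    case ET_EXEC: return "executable";     /* executable linked at fixed addresses */
    case ET_DYN:  return "shared object";  /* shared library or position-independent executable */
    case ET_CORE: return "core";           /* core dump */
    default:      return "unknown";
    }
}
```

An object file produced by gcc -c would be expected to report ET_REL, while a final executable reports ET_EXEC or ET_DYN depending on how it was linked.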
Why is the object file in ELF format?
The linker reads multiple object files, so they must all be in the same standard format for it to read them, link them, and generate the final executable.
ELF Format File Contents
ELF header: A sequence of bytes at the start of the file that identifies it as ELF and holds information about it, such as the number of sections, the machine type, etc.
.text section: Contains the machine code.
.rodata section: Contains read-only data such as constants and string literals (for example, in printf("Hello\n"); the string "Hello\n" is saved in this section).
.data section: Contains initialized global and static variables.
.bss section: Contains uninitialized global and static variables. The name historically stands for Block Started by Symbol; a common mnemonic is Better Save Space, because the section takes no space in the file. Only its address and size are recorded.
.symtab: Contains the symbol table.
.debug and .line: Contain debugging information in the standard DWARF format. This information links the executing machine code back to the source code, so the debugger can step through it.
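To make the section list more concrete, here is a small sketch of where typical C objects usually end up. All names are hypothetical, and the placement in the comments is what GCC on Linux typically does; the compiler and linker are free to choose differently.

```c
int initialized_counter = 42;   /* initialized global: typically .data */
int uninitialized_counter;      /* uninitialized global: typically .bss, starts at 0 */
const int limit = 100;          /* constant global: typically .rodata */

/* The machine code of each function goes to .text. */
int count_calls(void)
{
    static int calls;           /* uninitialized static: typically .bss */
    calls++;
    return calls;
}

const char *greeting(void)
{
    return "Hello Linux";       /* string literal: typically .rodata */
}
```

Compiling this with gcc -c and inspecting the object file with readelf -S (section headers) or objdump -t (symbols with their sections) should show where each symbol was actually placed on your system.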
How to Read the Information in an ELF File?
There is a header file named elf.h that defines the ELF structures. We can open it in Vim with the command:
vim /usr/include/elf.h
We can then search for the ELF header and find the structure that describes its content, as below (this is the 32-bit variant; elf.h also defines a matching Elf64_Ehdr for 64-bit files):
/* The ELF file header. This appears at the start of every ELF file. */
#define EI_NIDENT (16)
typedef struct
{
    unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
    Elf32_Half    e_type;             /* Object file type */
    Elf32_Half    e_machine;          /* Architecture */
    Elf32_Word    e_version;          /* Object file version */
    Elf32_Addr    e_entry;            /* Entry point virtual address */
    Elf32_Off     e_phoff;            /* Program header table file offset */
    Elf32_Off     e_shoff;            /* Section header table file offset */
    Elf32_Word    e_flags;            /* Processor-specific flags */
    Elf32_Half    e_ehsize;           /* ELF header size in bytes */
    Elf32_Half    e_phentsize;        /* Program header table entry size */
    Elf32_Half    e_phnum;            /* Program header table entry count */
    Elf32_Half    e_shentsize;        /* Section header table entry size */
    Elf32_Half    e_shnum;            /* Section header table entry count */
    Elf32_Half    e_shstrndx;         /* Section header string table index */
} Elf32_Ehdr;
Example of reading the header of myhello with the command:
readelf -h myhello
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1060
Start of program headers: 64 (bytes into file)
Start of section headers: 13968 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
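The Magic, Class, and Data lines in this output come from the first 16 bytes of the file (the e_ident array). The sketch below, assuming a Linux system with elf.h, shows one way to decode them by hand. The function names are hypothetical; the constants (ELFMAG, EI_CLASS, ELFCLASS64, and so on) come from elf.h.

```c
#include <elf.h>     /* EI_NIDENT, EI_CLASS, EI_DATA, ELFMAG, SELFMAG, ... */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first EI_NIDENT bytes of a file.
   Returns 0 on success, -1 on failure. */
int read_ident(const char *path, unsigned char ident[EI_NIDENT])
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(ident, 1, EI_NIDENT, f);
    fclose(f);
    return n == EI_NIDENT ? 0 : -1;
}

/* Returns 1 if the buffer starts with the magic bytes 7f 45 4c 46. */
int has_elf_magic(const unsigned char ident[EI_NIDENT])
{
    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}

/* Decodes the class byte, which tells whether the 32-bit or the
   64-bit header layout applies to the rest of the file. */
const char *elf_class_name(const unsigned char ident[EI_NIDENT])
{
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return "ELF32";
    case ELFCLASS64: return "ELF64";
    default:         return "invalid";
    }
}

/* Decodes the data-encoding byte (byte order). */
const char *elf_data_name(const unsigned char ident[EI_NIDENT])
{
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "little endian";
    case ELFDATA2MSB: return "big endian";
    default:          return "invalid";
    }
}
```

Calling read_ident("myhello", ident) and passing the buffer to the three decoders should agree with the first lines of the readelf -h output for the same file.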
For easier reading, you can use a pipe to send the output of one command to another, for example to the pager less:
readelf -a myhello | less
readelf -s myhello | less
The last command prints only the symbol tables.
Disassemble ELF File
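The objdump command disassembles the machine code inside the ELF file back into assembly. The -d option disassembles only the sections expected to contain code (such as .text), while -D disassembles all sections.
Example:
objdump -D myhello | less
To get Intel syntax:
objdump -D -M intel myhello | less
The disassembly can be useful in debugging.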
How Is an ELF File Loaded and Run?
Physical memory is a limited resource, and programs on a personal computer are not allowed to use it directly, because a program might overwrite data that is important to the OS and crash the system.
While many applications are running, how do the hardware and the OS execute multiple programs so that no program affects the memory of the others, even though every program thinks it has a large memory, or even the whole memory space, to itself?
Virtual Address Concept
Virtual addressing is a hardware-supported mechanism. Machines have a unit called the MMU (Memory Management Unit), which translates virtual addresses to physical ones and makes a virtual address space possible (memory virtualization).
With a virtual address space, an executing program, or even the kernel itself, can behave as if it has the whole address space. Suppose we have a 32-bit machine, so its address space is 4 GB. With a virtual address space, each executing program thinks that it has all 4 GB.
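The 4 GB figure follows from the address width: 32 address bits give 2^32 distinct byte addresses. A minimal sketch of that arithmetic, using a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: number of distinct byte addresses for a given
   address width. Valid for widths from 1 to 63 bits. */
uint64_t address_space_bytes(unsigned int address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

For 32 bits this gives 4294967296 bytes, which is 4 x 1024 x 1024 x 1024.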
The advantages of a virtual address space are:
Every program can write anywhere in its own virtual address space without affecting the memory of another program.
If a program corrupts something and crashes, it affects only its own memory.
The kernel manages the memory virtualization.
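The isolation between address spaces can be observed from C. In the sketch below (hypothetical names, assuming a POSIX system), a child process created with fork writes to a global variable. Both processes use the same virtual address for that variable, yet the parent's value is unaffected because each process has its own address space.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_looking_value = 1;   /* same virtual address in parent and child */

/* Hypothetical demo: returns 1 if a write made by the child process
   is invisible to the parent, 0 if it is visible, -1 on error. */
int child_write_is_isolated(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        shared_looking_value = 99;   /* modifies the child's copy only */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* The parent still sees its own, unchanged value. */
    return shared_looking_value == 1;
}
```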
How Does a Program (elf file) Run in Memory?
When the program runs, the kernel and the hardware make it think that it has the whole address space.
The loader, with the help of the kernel, loads the ELF file into memory to begin execution.
This is done through the execve system call: the shell first creates a new process, and execve then replaces that process's image with the program loaded from the ELF file.
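A shell typically starts a program with a fork followed by execve. The sketch below shows that pattern under the assumption of a POSIX system; run_program is a hypothetical helper, not how any particular shell is implemented.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* environment of the calling process */

/* Hypothetical helper: run the program at `path` in a new process and
   return its exit status, or -1 on failure. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();                /* create the new process */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ);   /* replace the child's image with the program */
        _exit(127);                    /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)  /* wait for the program to finish */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argument vector such as { "myhello", NULL }, a call like run_program("./myhello", args) would load and run the compiled example, assuming the file exists in the current directory.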
Do we need all sections of the ELF file to be loaded into memory to run the program?
No, only some sections are loaded. For example, we don't need the .debug section in the process memory to execute the program.
Program vs Process
A program is the file itself (the ELF file) on the hard disk.
A process is a running instance of the program in memory.
Note that a program is represented by a single file on disk; however, multiple processes can be created from this file.
The ps command gives you information about the processes running on the current terminal.
Every process has a PID (Process ID). You can open multiple terminals and run whatever processes you want, regardless of the processes running on other terminals. To list the processes on a specific terminal, pass the terminal ID to ps:
ps -t pts/1
The command ps -a lists the processes running on all terminals (except session leaders, such as the shells themselves).
So one program can run multiple times by creating different process instances, and every instance has a unique process ID.
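The uniqueness of process IDs can be checked from C as well. In this sketch (hypothetical names, assuming a POSIX system), two child processes are created from the same program and kept alive at the same time, and their PIDs are compared.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child process that just waits for a signal.
   Returns the child's PID in the parent, or -1 on failure. */
static pid_t start_waiting_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();                 /* sleep until a signal arrives */
    }
    return pid;
}

/* Returns 1 if two live instances have different PIDs, 0 if not, -1 on error. */
int instances_have_distinct_pids(void)
{
    pid_t first = start_waiting_child();
    pid_t second = start_waiting_child();
    int distinct = (first > 0 && second > 0) ? (first != second) : -1;

    /* Clean up both children so they do not linger. */
    if (first > 0)  { kill(first, SIGKILL);  waitpid(first, NULL, 0); }
    if (second > 0) { kill(second, SIGKILL); waitpid(second, NULL, 0); }
    return distinct;
}
```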
Foreground Process vs Background Process
If a process takes over the prompt so that you can't use the shell until it finishes, it is called a foreground process. If a process runs while you keep using the shell, it is called a background process.
Multiple Processes of a Program on The Same Terminal
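A running foreground process can be terminated by pressing "CTRL + C" on the same terminal, which sends it the interrupt signal (SIGINT). You can also kill a process from another terminal with the command:
kill -9 PID
where 9 is the number of the kill signal (SIGKILL), which cannot be caught or ignored. Without a signal number, kill sends the terminate signal (SIGTERM, 15).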
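The kill command is a thin wrapper around the kill system call, which can also be used directly from C. The following sketch (hypothetical helper, assuming a POSIX system) starts a child process, sends it a signal, and reports which signal ended it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child, send it `sig`, and return the number
   of the signal that terminated it (0 if it was not killed by a signal,
   -1 on error). */
int terminate_child_with(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (;;)
            pause();                 /* child waits until a signal arrives */
    }

    if (kill(pid, sig) < 0)          /* the system call behind the kill command */
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Passing SIGKILL corresponds to kill -9 PID, and passing SIGTERM corresponds to kill PID without a signal number.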
You can find more information on ps and kill in the manual pages: man ps and man kill.
While a process is running in the foreground, pressing "CTRL + Z" sends it the terminal stop signal (SIGTSTP) and suspends it. The jobs command lists the jobs on the terminal and their statuses.
After stopping a program, you can start a new process of the same program.
To continue a stopped program, use the fg command with the job number. If you write fg only, it continues the most recently stopped job.
Example:
Let's run the program below twice on the same terminal.
#include <stdio.h>
#define RET 0
int main()
{
    printf("Hello Linux\n");
    getchar();
    return RET;
}
How to Run a Process in The Background?
You can run a program, stop it with "CTRL + Z", check its job number with the jobs command, and then use the bg command with the job number to resume it in the background.
A simpler way is to append & to the command. For example, to open gedit in the background:
gedit &