ELF is the standard file format for executables, shared libraries, and object files on UNIX-based systems. This post does not cover the ELF file format itself or the details of how a file is loaded into memory; it covers the steps the kernel performs when it loads an ELF file and starts the process that corresponds to it.
Assuming the kernel has already made the process-related entries in its data structures and allocated memory for the process, these are the steps by which the kernel loads, initializes, and starts executing an ELF executable file.
1. The kernel reads the ELF header and loads the loadable parts of the file that the headers describe (the segments listed in the program header table). In an operating system with virtual memory, this loading is typically lazy: a page is brought into memory only when it is needed, that is, when a page fault occurs.
2. Initialize the stack pointer to the top of the user segment (the exact address varies by architecture and OS). This is nothing more than loading the address of the top of the user segment into the SP register.
3. Initialize the instruction pointer to the start address taken from the ELF header.
4. Start the user process by simulating a return from an interrupt (typically some architecture-specific assembly code).
That’s it. When the process returns to user mode, it starts executing code at the start address from the ELF header.
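The steps above can be sketched as a toy model in Python. This is only an illustration, not kernel code: the names `Segment`, `Process`, `page_fault`, and `start_process`, as well as the 4096-byte page size, are assumptions made for the example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class Segment:
    """A loadable region of the ELF file (toy model)."""
    vaddr: int    # virtual address where the segment starts
    data: bytes   # file contents backing the segment


@dataclass
class Process:
    """Hypothetical stand-in for the kernel's per-process state."""
    segments: list
    sp: int = 0                                 # stack pointer register
    ip: int = 0                                 # instruction pointer register
    pages: dict = field(default_factory=dict)   # resident pages, keyed by page base

    def page_fault(self, addr: int) -> bytes:
        """Step 1, done lazily: bring in the page holding addr on first touch."""
        base = addr - addr % PAGE_SIZE
        if base not in self.pages:
            for seg in self.segments:
                seg_end = seg.vaddr + len(seg.data)
                if seg.vaddr <= addr < seg_end:
                    start = max(base, seg.vaddr) - seg.vaddr
                    end = min(base + PAGE_SIZE, seg_end) - seg.vaddr
                    self.pages[base] = seg.data[start:end]
                    break
            else:
                raise MemoryError(f"segmentation fault at {addr:#x}")
        return self.pages[base]


def start_process(segments, entry: int, user_top: int) -> Process:
    proc = Process(segments=segments)  # step 1: nothing is resident yet
    proc.sp = user_top                 # step 2: SP = top of the user segment
    proc.ip = entry                    # step 3: IP = start address from the ELF header
    return proc                        # step 4 (return to user mode) needs real hardware
```

In this model, the first instruction fetch at `proc.ip` would call `page_fault(proc.ip)`, which is the point where the first page of code actually gets loaded.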
How to Parse an ELF file?
ELF (Executable and Linkable Format) is the standard file format for executables, object code, shared libraries, and core dumps on UNIX-like systems. It is specified as part of the ABI (Application Binary Interface). By design it is flexible, extensible, cross-platform, and independent of CPU architecture and ISA. Let us look at the internals of an ELF file.
Dissecting an ELF file
- ELF header: metadata about the ELF file, such as whether it is 32-bit or 64-bit.
- Program header table: describes zero or more memory segments. It appears in executables and shared libraries, and is normally absent from object files that have not been linked yet. It carries the information needed to create the process image, especially how to put together the process's virtual memory.
- Section header table: describes zero or more sections, including where each section lives in the file and what kind of content it holds. It is mainly used at link time.
Section vs Segment:
An ELF file before linking (an object file) is organized into sections; after linking, the resulting executable is described in terms of segments. During linking, the linker places one or more sections into a single segment. Sections and segments have no specified order within the file.
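To make the relationship concrete, here is a minimal sketch that works out which sections fall inside which segment by comparing address ranges. The addresses, sizes, and the `Section`/`Segment` types are made up for illustration and are not read from a real file.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    addr: int   # virtual address assigned by the linker
    size: int


class Segment(NamedTuple):
    kind: str
    vaddr: int
    memsz: int


def sections_in_segment(segment, sections):
    """Return the names of sections whose address range lies inside the segment."""
    seg_end = segment.vaddr + segment.memsz
    return [
        s.name
        for s in sections
        if s.size > 0 and segment.vaddr <= s.addr and s.addr + s.size <= seg_end
    ]


# Made-up layout, for illustration only.
sections = [
    Section(".text", 0x1000, 0x200),
    Section(".rodata", 0x1200, 0x80),
    Section(".data", 0x3000, 0x40),
    Section(".bss", 0x3040, 0x20),
]
segments = [
    Segment("LOAD (read/execute)", 0x1000, 0x280),
    Segment("LOAD (read/write)", 0x3000, 0x60),
]

for seg in segments:
    print(seg.kind, "->", sections_in_segment(seg, sections))
```

On a real binary, `readelf -l` prints a comparable "Section to Segment mapping" table.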
The ELF header contains 20 fields. The first two are a 4-byte magic number that identifies the file as ELF and a 1-byte class value that marks it as 32-bit or 64-bit (1 for 32-bit, 2 for 64-bit). Every field has the same size in both classes except three: ENTRY (the address where execution starts), PHOFF (the file offset of the program header table), and SHOFF (the file offset of the section header table).
In the 32-bit format these three fields are 4 bytes each; in the 64-bit format they are 8 bytes each.
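As a quick check of that arithmetic, the sketch below adds up the width of every header field for each class. The widths follow the standard ELF header layout, listed in header order; the function name `elf_header_size` is just a label chosen for this example.

```python
def elf_header_size(bits: int) -> int:
    """Sum the byte widths of the 20 ELF header fields for a 32- or 64-bit file."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    word = bits // 8  # width of ENTRY, PHOFF and SHOFF
    widths = [
        4, 1, 1, 1, 1, 1, 7,   # MAGIC, ARCH_32_64, ENDIAN, VERSION, OS, ABI, PADDING
        2, 2, 4,               # TYPE, MACHINE, VERSION_1
        word, word, word,      # ENTRY, PHOFF, SHOFF
        4,                     # FLAGS
        2, 2, 2, 2, 2, 2,      # EHSIZE, PHENTSIZE, PHNUM, SHENTSIZE, SHNUM, SHSTRNDX
    ]
    return sum(widths)


print(elf_header_size(32))  # 52
print(elf_header_size(64))  # 64
```

The 12-byte difference is exactly the three fields that grow from 4 to 8 bytes.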
The order of the fields in the ELF header is given below:
MAGIC = 0;
ARCH_32_64 = 1;
ENDIAN = 2;
VERSION = 3;
OS = 4;
ABI = 5;
PADDING = 6;
TYPE = 7;
MACHINE = 8;
VERSION_1 = 9;
ENTRY = 10;
PHOFF = 11;
SHOFF = 12;
FLAGS = 13;
EHSIZE = 14;
PHENTSIZE = 15;
PHNUM = 16;
SHENTSIZE = 17;
SHNUM = 18;
SHSTRNDX = 19;
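With the field order in hand, a header can be decoded in a few lines. The following is a minimal sketch using Python's `struct` module; the function name `parse_elf_header` and the dictionary it returns are choices made for this example, and the field names are the ones from the list above.

```python
import struct

# Field names in the order listed above.
FIELDS = [
    "MAGIC", "ARCH_32_64", "ENDIAN", "VERSION", "OS", "ABI", "PADDING",
    "TYPE", "MACHINE", "VERSION_1", "ENTRY", "PHOFF", "SHOFF", "FLAGS",
    "EHSIZE", "PHENTSIZE", "PHNUM", "SHENTSIZE", "SHNUM", "SHSTRNDX",
]


def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF header at the start of data into a name -> value dict."""
    if len(data) < 6 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: bad magic number")
    arch, endian = data[4], data[5]
    if arch not in (1, 2) or endian not in (1, 2):
        raise ValueError("unsupported class or byte-order value")
    order = "<" if endian == 1 else ">"   # 1 = little-endian, 2 = big-endian
    word = "I" if arch == 1 else "Q"      # 1 = 32-bit (4 bytes), 2 = 64-bit (8 bytes)
    # 7 identification fields, then TYPE/MACHINE/VERSION_1, the three
    # class-dependent fields, FLAGS, and six 2-byte size/count fields.
    layout = f"{order}4sBBBBB7sHHI{word * 3}IHHHHHH"
    values = struct.unpack_from(layout, data)
    return dict(zip(FIELDS, values))
```

Reading the first 64 bytes of a file in binary mode and passing them to `parse_elf_header` is enough for either class, since the 64-bit header is the larger of the two.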