...: Linux Kernel Development by Robert Love

Process Address Space

A process's address space consists of the virtual memory addressable by that process. The intervals of legal addresses are called memory areas. Each memory area has associated permissions - read, write and execute. An invalid access to a memory area, e.g. a write to a read-only area, leads to a Segmentation Fault. A process can dynamically add memory areas to, or remove them from, its address space.

Memory areas contain -
- text section - memory map of the executable code.
- data section - memory map of initialized global variables.
- bss section - memory map of uninitialized global variables which are zeroed out.
- user space stack - memory map of zeroed space used for the process's user-space stack.
- additional text, data and bss section for each shared library.
- memory mapped files
- shared memory segments

The memory descriptor struct (struct mm_struct) is used by the kernel to represent a process's address space. Threads spawned by a process refer to the same address space. When all threads referring to an address space exit, the kernel frees the memory descriptor.

Note - kernel threads do not have a process address space and thus no memory descriptor.

The memory descriptor struct contains a list of VMA structs (virtual memory area). Each VMA struct represents a contiguous memory interval in a given address space with certain flags and permissions -
- VM_READ - read permissions on pages
- VM_WRITE - write permissions on pages
- VM_EXEC - execute permission on pages
- VM_SHARED - shared with other processes
- etc.

This list of VMA structs is sorted by ascending address. There is also a Red-Black Tree whose nodes point to the same VMA structs. The list is used when iterating over all memory areas, and the tree is used when a specific VMA needs to be located.
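
The two access patterns can be illustrated with a small user-space sketch. This is not kernel code: the VMA and AddressSpace classes are hypothetical names made up for this example, and a binary search over a sorted Python list stands in for the Red-Black Tree lookup. Unlike the kernel's lookup, this toy version simply returns None when no area contains the address.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class VMA:
    """Toy memory area (hypothetical, not the kernel struct)."""
    start: int   # first address in the area (inclusive)
    end: int     # first address past the area (exclusive)
    flags: str   # e.g. "r-x"; stands in for VM_READ / VM_WRITE / VM_EXEC


class AddressSpace:
    """Toy stand-in for a memory descriptor: VMAs kept sorted by address."""

    def __init__(self, vmas):
        self.vmas = sorted(vmas, key=lambda v: v.start)
        # Areas do not overlap, so the end addresses are sorted as well.
        self._ends = [v.end for v in self.vmas]

    def walk(self):
        """Iterate over every area in ascending address order (the list's job)."""
        return iter(self.vmas)

    def find_vma(self, addr):
        """Locate the area containing addr (the tree's job), or None."""
        # Index of the first area whose end lies beyond addr.
        i = bisect.bisect_right(self._ends, addr)
        if i < len(self.vmas) and self.vmas[i].start <= addr:
            return self.vmas[i]
        return None
```

For example, with areas [0x1000, 0x3000) and [0x8000, 0x9000), find_vma(0x2fff) returns the first area, while find_vma(0x3000) returns None because that address falls in the gap between the two.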

Note - the memory space for a process can be viewed by cat /proc/<pid>/maps or pmap <pid> command.
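
Each line of the maps file describes one memory area: address range, permissions, offset, device, inode and an optional path. A minimal parsing sketch follows. MapEntry, parse_maps_line and read_own_maps are hypothetical helper names, and the sample line in the usage note is made up for illustration.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    start: int    # first address of the area
    end: int      # first address past the area
    perms: str    # e.g. "r-xp": read, write, execute, private/shared
    offset: int   # offset into the mapped file
    dev: str      # device as major:minor
    inode: int    # inode of the mapped file, 0 for anonymous areas
    path: str     # backing file or label such as "[stack]"; may be empty


def parse_maps_line(line):
    """Parse one line in the /proc/<pid>/maps format into a MapEntry."""
    fields = line.split(None, 5)           # the path may itself contain spaces
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return MapEntry(start, end, perms, int(offset, 16), dev, int(inode), path)


def read_own_maps():
    """Return the memory areas of the current process (Linux only)."""
    with open("/proc/self/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

Calling parse_maps_line("00400000-0040b000 r-xp 00000000 fd:00 1234 /bin/cat") yields an entry whose perms field is "r-xp", i.e. a readable, executable, private mapping of the text section.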

When a process accesses memory in a VMA, the virtual address must be converted to the actual physical address. This mapping is done through page tables, which usually involve 3 levels of lookup. These lookups are done very often, so there is hardware support for them: the Translation Lookaside Buffer (TLB) is a cache of past lookups.
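
The translation path can be sketched as a toy simulation. The bit widths below (2 + 9 + 9 index bits and a 12-bit page offset) are assumptions chosen for illustration, the nested dictionaries stand in for the three table levels, and the MMU class is a hypothetical name. Real hardware performs this walk itself.

```python
PAGE_SHIFT = 12           # assumed 4 KiB pages
LEVEL_BITS = (2, 9, 9)    # assumed index widths: top, middle, bottom level


def split_address(vaddr):
    """Split a virtual address into three table indexes and a page offset."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    indexes = []
    for bits in reversed(LEVEL_BITS):      # peel off the bottom level first
        indexes.append(vpn & ((1 << bits) - 1))
        vpn >>= bits
    return tuple(reversed(indexes)), offset


class MMU:
    """Toy translator: a TLB dictionary in front of a three-level table walk."""

    def __init__(self, top_table):
        self.top = top_table   # nested dicts: top[i1][i2][i3] -> frame number
        self.tlb = {}          # virtual page number -> frame number
        self.walks = 0         # number of full table walks performed

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        frame = self.tlb.get(vpn)
        if frame is None:                  # TLB miss: walk the page tables
            (i1, i2, i3), _ = split_address(vaddr)
            self.walks += 1
            try:
                frame = self.top[i1][i2][i3]
            except KeyError:
                raise LookupError(f"page fault at {vaddr:#x}") from None
            self.tlb[vpn] = frame          # remember the result for next time
        return (frame << PAGE_SHIFT) | offset
```

A second access to the same page is served from the tlb dictionary and leaves the walks counter unchanged, which is the effect the TLB has on real lookups.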

Page Cache and Page Writeback

Disk Access time > RAM access > L2 Cache > L1 Cache

The Page Cache attempts to reduce disk access by caching pages in memory. The storage device being cached is known as the backing store. When the kernel needs to access the disk, it first checks the cache. If the page is found, it is a Cache Hit; otherwise it is a Cache Miss. In case of a miss, the kernel schedules a Block I/O operation. The data read from the file is also cached, so that subsequent accesses are likely to be served from the cache.

Write caching is implemented as one of these strategies -
- No Write - all write operations go directly to disk, bypassing the cache. The cached copy is invalidated and has to be read again from disk on a subsequent read.
- Write Through - write operations immediately go through cache to the disk. The cache is kept synchronized.
- Write Back - write operations are done only in the cache. The backing store is not immediately updated. The written-to pages are marked dirty, and the dirty pages are periodically written to disk.
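
The difference between the last two strategies can be illustrated with a small simulation. CachedStore is a hypothetical class for this example only: a dictionary plays the role of the disk, and disk_writes counts how often the backing store is touched.

```python
class CachedStore:
    """Toy page cache in front of a dictionary that plays the backing store."""

    def __init__(self, policy="write-back"):
        if policy not in ("write-through", "write-back"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.disk = {}         # backing store: page number -> data
        self.cache = {}        # cached pages: page number -> data
        self.dirty = set()     # pages changed in cache but not yet on disk
        self.disk_writes = 0   # number of writes that reached the disk

    def read(self, page):
        if page not in self.cache:                     # cache miss
            self.cache[page] = self.disk.get(page, b"")
        return self.cache[page]                        # cache hit from now on

    def write(self, page, data):
        self.cache[page] = data
        if self.policy == "write-through":
            self._write_to_disk(page)                  # disk updated at once
        else:
            self.dirty.add(page)                       # disk updated on flush

    def flush(self):
        """Write every dirty page to the backing store (the writeback step)."""
        for page in sorted(self.dirty):
            self._write_to_disk(page)
        self.dirty.clear()

    def _write_to_disk(self, page):
        self.disk[page] = self.cache[page]
        self.disk_writes += 1
```

Writing the same page three times costs three disk writes under "write-through" but only one under "write-back", paid when flush() runs. The price is that the disk copy is stale until then.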

Cache Eviction - very often the kernel has to remove pages from the cache to make more memory available for other processes or to load other pages. A few strategies used for deciding which pages to evict are -
- Least Recently Used (LRU) - all pages in the cache are time stamped with when they were last accessed, and the page with the oldest time stamp is evicted first. The idea is that a page accessed recently is likely to be accessed again (temporal locality). However, a lot of files are accessed only once and still end up on the top of the LRU list, making plain LRU an inappropriate strategy.
- Two List Strategy - a modified LRU where two lists, active and inactive, are maintained. A page is placed on the active list if it is accessed while it already exists on the inactive list. Pages are evicted first from the inactive list and then from the active list. A better name for the two lists would be Priority 1 and Priority 2.
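
A simplified sketch of the two list strategy, assuming a fixed capacity counted in pages. TwoListCache is a hypothetical class for this example, and the balancing that moves pages from the active list back to the inactive list is left out.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy two-list cache; each list is ordered oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()   # pages seen once
        self.active = OrderedDict()     # pages seen again while cached

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)       # refresh its position
        elif page in self.inactive:
            del self.inactive[page]             # second access: promote
            self.active[page] = True
        else:
            if len(self.inactive) + len(self.active) >= self.capacity:
                self._evict()
            self.inactive[page] = True          # first access: inactive list

    def _evict(self):
        # Evict from the inactive list first, from the active list only if
        # the inactive list is empty. The oldest entry goes first.
        victims = self.inactive if self.inactive else self.active
        page, _ = victims.popitem(last=False)
        return page

    def pages(self):
        return set(self.inactive) | set(self.active)
```

With a capacity of 3, accessing "a" twice and then streaming through "b", "c", "d", "e" once each leaves "a" in the cache. The one-time pages only push each other out of the inactive list, which is the weakness of plain LRU that this strategy addresses.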

Dirty Page Writeback occurs in three situations -
- free memory shrinks below a threshold.
- dirty data grows older than a threshold.
- a user process requests an on-demand writeback, e.g. through the sync() or fsync() system calls.
A gang of flusher threads performs these jobs.
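
The third case, an on-demand writeback requested by a user process, can be sketched with Python's wrapper around fsync(). write_durably is a hypothetical helper name. The calls themselves (file.flush and os.fsync) are standard library functions.

```python
import os


def write_durably(path, data):
    """Write data to path and ask the kernel to flush it to the disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's own buffer into the page cache
        os.fsync(f.fileno())   # request writeback of this file's dirty pages
```

Without the os.fsync() call the data would still be written eventually, once one of the other two conditions triggers the flusher threads.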

Laptop Mode - in this mode writeback behavior is modified to reduce disk spinning by piggybacking any disk I/O with flushing of all dirty pages. This reduces overall disk I/O by reducing the spinning and thus reduces power requirements.

Note - laptop mode is enabled by setting the file /proc/sys/vm/laptop_mode to 1.

Devices

Linux defines the following types -
- Block Devices - blkdevs - accessible via VFS, allow seek and non-contiguous access.
- Character Devices - cdevs - access a stream of data/bytes.
- Network Devices - access a network via physical adapter (network card) and a specific protocol (e.g. IP).
- Miscellaneous Devices - miscdevs - simplified form of cdevs
- Pseudo Devices - virtual devices that provide access to a kernel functionality through their device drivers -
-- Kernel Random Number Generator - /dev/random and /dev/urandom
-- Null Device - /dev/null
-- Zero Device - /dev/zero
-- Full Device - /dev/full
-- Memory Device - /dev/mem
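
The behavior of several of these pseudo devices can be observed from user space. The sketch below assumes a Linux system where the device nodes exist. The function names are hypothetical helpers for this example.

```python
import errno
import os


def read_zero(n):
    """/dev/zero returns as many zero bytes as are requested."""
    with open("/dev/zero", "rb") as f:
        return f.read(n)


def read_random(n):
    """/dev/urandom returns n bytes from the kernel random number generator."""
    with open("/dev/urandom", "rb") as f:
        return f.read(n)


def read_null():
    """Reading /dev/null hits end of file immediately."""
    with open("/dev/null", "rb") as f:
        return f.read()


def write_null(data):
    """/dev/null accepts and discards everything; returns bytes 'written'."""
    with open("/dev/null", "wb") as f:
        return f.write(data)


def write_full(data):
    """Writing to /dev/full fails; return the errno of the failure, if any."""
    try:
        with open("/dev/full", "wb", buffering=0) as f:
            f.write(data)
    except OSError as e:
        return e.errno
    return None
```

On a typical Linux system write_full(b"x") returns errno.ENOSPC ("No space left on device"), which makes /dev/full handy for testing how a program reacts to a full disk.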

Linux has a unified device model allowing us to view all the devices, types, drivers, status, etc. The model is represented using -
- KObjects - kernel objects
- KTypes - kernel types
- KSets - kernel sets
sysfs is a virtual (in-memory) file system that represents all the KObjects in the system. It is mounted at /sys, and /sys/devices holds the device model.

The klogd daemon retrieves kernel log messages from the log buffer (/proc/kmsg file) and appends them to the system log file /var/log/messages.
