At any point in time, a processor is doing one of these three things -
- In User-Space, executing user code in a process.
- In Kernel-Space, in process context executing on behalf of a process.
- In Kernel-Space, in interrupt context and not associated with a process i.e. handling an interrupt.
Note -
- kernel.org - kernel source code
- lxr.linux.no - online source code browsing
- lkml.org - Linux kernel mailing list archive
Linux source code directories
/arch - architecture specific code
/block - block I/O layer
/crypto - crypto api
/Documentation - kernel documentation
/drivers - device drivers
/firmware - device firmware needed for certain drivers
/fs - virtual file system and other file systems
/include - kernel headers
/init - kernel boot and initialization
/ipc - inter-process communication
/kernel - core subsystems like scheduler
/lib - helper routines
/mm - memory management
/net - networking
/virt - virtualization infrastructure
/scripts - scripts to build kernel
/tools - tools helpful for developing linux
/security - security module
Building the Kernel
Configure it using one of these options -
- make config
- make menuconfig
- make gconfig
- make defconfig
The running kernel's config is available at /proc/config.gz, provided the kernel was built with CONFIG_IKCONFIG_PROC.
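A minimal sketch of inspecting that config from Python. The helper name read_kernel_config is made up for illustration, and the default path only exists on kernels that expose their config through /proc.

```python
import gzip


def read_kernel_config(path="/proc/config.gz"):
    """Parse a gzip-compressed kernel config into {option: value}.

    Hypothetical helper for illustration. Comment lines such as
    '# CONFIG_FOO is not set' and blank lines are skipped.
    """
    options = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, value = line.partition("=")
            options[name] = value.strip('"')  # drop quotes around string values
    return options
```

For example, read_kernel_config().get("CONFIG_PREEMPT") returns the option's value if it is set, or None if it is not.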
Installing the Kernel
- Copy the bzImage from arch/x86/boot (arch/i386/boot on older kernels) to /boot.
- See /boot for existing installed kernel.
- Edit /boot/grub/grub.conf or /etc/lilo.conf to modify the kernel to load.
- To install modules, run make modules_install
Threads & Processes
Linux treats threads and processes as the same. Processes are one or more threads sharing the same resources.
A new process is typically created by calling fork() and then immediately exec(). fork() returns twice, once in the calling (parent) thread and again in the newly created (child) thread. The call to exec() loads a new program into the child's address space, giving it its own set of resources. Linux uses copy-on-write for fork(): parent and child initially share the same memory pages, and a page is copied only when one of them writes to it. Linux has traditionally tried to run the child first, i.e. right after the call to fork(), although newer kernels do not guarantee this ordering. Running the child first lets it invoke exec() before fork() returns in the parent, so it loads a new program and gets its own resources instead of sharing them with the parent thread.
If the parent thread were allowed to run first and wrote to its memory, the written pages would have to be copied for the child, and that copy is wasted if the child immediately calls exec().
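The fork-then-exec pattern can be sketched from user space as follows. This is only an illustration using Python's os module wrappers around the same system calls; the function name spawn_and_wait and the exit code 127 for a failed exec are assumptions made for the example.

```python
import os
import sys


def spawn_and_wait(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()  # returns 0 in the child, the child's PID in the parent
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # replaces the child's program image
        except OSError:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(pid, 0)  # parent blocks until the child exits
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    # Run a second Python interpreter as the child program.
    code = spawn_and_wait([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with", code)
```

The child never returns from a successful execvp(), so everything after the if block runs only in the parent.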
The maximum number of processes in a system is limited by the value configured at /proc/sys/kernel/pid_max.
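A small sketch for reading that limit. The helper name read_pid_max is hypothetical, and the path argument exists only so the function can be pointed at another file.

```python
def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the PID limit stored at `path` as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On a Linux system, print(read_pid_max()) shows the currently configured limit.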
A process is in one of these five states -
- TASK_RUNNING - currently running or ready to run in the run queue waiting to be scheduled.
- TASK_INTERRUPTIBLE - currently sleeping, waiting for a condition to exist. When the condition exists, or when the task receives a signal, the state is changed to TASK_RUNNING.
- TASK_UNINTERRUPTIBLE - identical to TASK_INTERRUPTIBLE except that the task does not wake up if it receives a signal. This is used when the process must wait without interruption or when the event is expected to occur quickly.
- __TASK_TRACED - being traced by another process, e.g. a debugger using ptrace.
- __TASK_STOPPED - execution stopped, not running and not eligible to run, e.g. when the task receives SIGSTOP, or any signal while it is being debugged.
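These states can be observed from user space through the one-letter code in /proc/<pid>/stat. The sketch below is illustrative only: the helper names are made up, and the letter-to-state mapping is my reading of the proc documentation, so treat it as an assumption.

```python
# Assumed mapping from /proc state letters to the kernel state names above.
STATE_NAMES = {
    "R": "TASK_RUNNING",
    "S": "TASK_INTERRUPTIBLE",
    "D": "TASK_UNINTERRUPTIBLE",
    "t": "__TASK_TRACED",
    "T": "__TASK_STOPPED",
}


def parse_state(stat_line):
    """Extract the one-letter state code from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_state(pid="self"):
    """Return the state name of `pid`, or the raw letter if unmapped."""
    with open(f"/proc/{pid}/stat") as handle:
        code = parse_state(handle.read())
    return STATE_NAMES.get(code, f"other ({code})")
```

Calling describe_state() from a running script would normally report TASK_RUNNING, since the process is executing while it reads its own entry.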
A normal process runs in User Space. When the process executes a system call or triggers an exception, it enters Kernel Space. At this point the kernel is executing on behalf of the process. These are the only two ways for a process to enter Kernel Space.
Kernel Threads are standard processes which exist solely in kernel space and are used to perform operations in the background. A kernel thread can be created only by another kernel thread; all of them are forked from the kthreadd process. Kernel threads are visible in the output of the ps -ef command, shown in square brackets; many have names starting with k, e.g. [ksoftirqd/1].
A process exits by calling the exit() system call, either explicitly or implicitly: the C runtime calls exit() after main() returns. If a parent process exits before its children, the children are re-parented to another thread in the parent's thread group or, if there is none, to the init process.
Scheduler
Linux uses a preemptive scheduler: it allocates a time slice to each process and does not wait for the process to yield the processor.
Processes are either I/O bound i.e. they block often for I/O or they are Processor bound i.e. they spend most of their time executing code.
Nice Values have a range of -20 to +19, the default is 0. Large nice values imply lower priority. Use ps -el to view nice values under the NI column.
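A process can also read and raise its own nice value. The sketch below uses Python's os wrappers for getpriority() and nice(); the function names are illustrative only.

```python
import os


def current_nice():
    """Return the nice value of the calling process."""
    return os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "this process"


def lower_priority(step=1):
    """Raise the nice value by `step` and return the new nice value."""
    return os.nice(step)
```

Raising the nice value needs no privileges, whereas lowering it (a negative step) normally requires root.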
Real Time Priority has a range of 0 to 99. High values imply higher priority. A value of '-' implies the process isn't real time. Use ps -eo uid,pid,ppid,rtprio to see real time priorities under RTPRIO column.
Completely Fair Scheduler (CFS) - assigns a proportion of the processor to each process instead of an absolute time slice value. Two processes of the same nice value are each assigned 50% of the processor. If one of them is I/O bound, it will block often, i.e. sleep until an event occurs. If the other is processor bound, it will run more often and possibly exceed 50% of the processor time while the I/O bound process has no event to process. Every time a process state changes to TASK_RUNNING, CFS checks how much of its proportionate share that process has used; if that is small compared to the currently running process, CFS preempts the currently running process in favor of it.
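The proportional split can be illustrated with a rough model. This is not the kernel's code: the weight formula below (1024 at nice 0, shrinking by a factor of about 1.25 per nice level) is an assumed approximation of the kernel's nice-to-weight table, and cpu_shares is a made-up name.

```python
def cpu_shares(nice_values):
    """Approximate each runnable task's CPU share from its nice value."""
    # Assumption: weight = 1024 / 1.25**nice, an approximation of the
    # kernel's nice-to-weight table (nice 0 maps to 1024).
    weights = [1024 / (1.25 ** nice) for nice in nice_values]
    total = sum(weights)
    return [weight / total for weight in weights]
```

With two tasks at the same nice value, cpu_shares([0, 0]) gives each task half of the processor, matching the 50% split described above. A task with a larger nice value receives a smaller share.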
The kernel can preempt a task running in kernel space as long as the task does not hold a lock. Locks are markers for regions of non-preemptibility.
Note - these are kernel locks and not C++ library locks.
Kernel preemption occurs -
- when interrupt handlers exit, before returning to kernel space
- when kernel code becomes preemptible
- when a task explicitly calls schedule()
- when a task blocks, which results in a call to schedule()
A normal process is of type SCHED_NORMAL. Processes of this type are scheduled by the CFS. Additionally, Linux provides two other types, SCHED_FIFO and SCHED_RR, for real time processes. These processes are scheduled by the real time scheduler.
Todo - read more on real time scheduler
SCHED_FIFO tasks can run indefinitely without any time slices. They can be preempted only by a SCHED_FIFO or SCHED_RR task with a higher priority.
SCHED_RR tasks can run till they exhaust their predetermined time slice. Preemption is similar to SCHED_FIFO tasks.
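A process's scheduling policy can be queried from user space. The sketch below assumes a Linux system where Python exposes the sched_* wrappers; the helper names are illustrative, and note that user space calls the kernel's SCHED_NORMAL policy SCHED_OTHER.

```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_NORMAL",  # user-space name for SCHED_NORMAL
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def policy_of(pid=0):
    """Return the scheduling policy name of `pid` (0 means this process)."""
    policy = os.sched_getscheduler(pid)
    return POLICY_NAMES.get(policy, f"other ({policy})")


def rt_priority_range(policy=os.SCHED_FIFO):
    """Return the (min, max) static priority user space may request."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)
```

An ordinary shell-launched script would typically report SCHED_NORMAL from policy_of(); switching to SCHED_FIFO or SCHED_RR requires elevated privileges.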