2024 Fall NCU Linux OS Project 2
- Implement custom wait-queue-like functionality in kernel space, allowing user applications to operate on it through a system call.
- A thread can add itself to the wait queue to wait; removing threads from the wait queue lets them exit in FIFO order.
Links: Project2.pdf, github, hackmd, DEMO
Environment
OS: Ubuntu 22.04 on VMware workstation 17
ARCH: x86_64
Kernel Version: 5.15.137
The following is the kernel space code.
wait queue node
typedef struct wait_queue_node {
    struct list_head list;     // links this node into my_wait_queue
    struct task_struct *task;  // process descriptor of the waiting thread
} wait_queue_node_t;
- Each node represents one entry in `my_wait_queue`; each node corresponds to one waiting process and stores:
  - `list_head list`: a `list_head` (circular doubly linked list) that chains all waiting processes together.
  - `task_struct *task`: a pointer to the process descriptor of the waiting process.
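Given only the embedded `struct list_head`, the enclosing `wait_queue_node_t` can be recovered with `list_first_entry` (a wrapper around `container_of`). A minimal sketch, assuming the `my_wait_queue` head and `queue_lock` defined just below; `peek_first_waiter` is a hypothetical helper, not part of the project code:

```c
/* Return the task at the front of the FIFO, or NULL if the queue is
 * empty. Caller must hold queue_lock. (Hypothetical helper.) */
static struct task_struct *peek_first_waiter(void)
{
    wait_queue_node_t *node;

    if (list_empty(&my_wait_queue))
        return NULL;

    /* list_first_entry(head, type, member) computes the address of the
     * containing struct from the first embedded list_head. */
    node = list_first_entry(&my_wait_queue, wait_queue_node_t, list);
    return node->task;
}
```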
global var.
static LIST_HEAD(my_wait_queue);     // self-defined FIFO queue
static DEFINE_SPINLOCK(queue_lock);  // protects list_add / list_del
- `my_wait_queue`: `LIST_HEAD` defines an empty circular doubly linked list that serves as the head node of the wait queue.
- `queue_lock`: a spin lock (statically initialized here with `DEFINE_SPINLOCK`) that protects concurrent operations on `my_wait_queue` from race conditions. Only one thread at a time can take the lock, so additions and deletions on `my_wait_queue` never run concurrently.
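If the lock cannot be initialized statically with `DEFINE_SPINLOCK`, it must be initialized at runtime before first use; an uninitialized `spinlock_t` is not valid, especially with lock debugging enabled. A minimal sketch of the runtime alternative, assuming an init function exists:

```c
static spinlock_t queue_lock;

static int __init my_wait_queue_init(void)
{
    spin_lock_init(&queue_lock); /* runtime equivalent of DEFINE_SPINLOCK */
    return 0;
}
```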
enter_wait_queue
static int enter_wait_queue(void)
{
    wait_queue_node_t *node;

    // allocate a new node
    node = kmalloc(sizeof(wait_queue_node_t), GFP_KERNEL);
    if (!node)
        return 0;
    node->task = current;

    spin_lock(&queue_lock);
    ////////// Critical Region ///////////////////
    list_add_tail(&node->list, &my_wait_queue); // append to the FIFO queue
    printk(KERN_INFO "Thread ID %d added to wait queue\n", current->pid);
    set_current_state(TASK_INTERRUPTIBLE); // mark the sleep state before dropping the lock, so a wake-up cannot slip in between and get lost
    ////////// Critical Region End ///////////////
    spin_unlock(&queue_lock);
    schedule(); // sleep until wake_up_process()
    return 1;   // joined successfully
}
- `spin_lock` protects `list_add_tail`: only one thread at a time can add a node to `my_wait_queue`, so no race condition can occur.
- `list_add_tail` appends the node to the tail of the queue.
- The process state is set to `TASK_INTERRUPTIBLE` and `schedule()` puts the current process to sleep; it can leave the wait queue only after `wake_up_process` is called. Note that the state is set before the spinlock is dropped, so a wake-up issued right after the unlock cannot be lost.
:::warning
Remember to put `schedule();` outside the spinlock.
Calling `schedule();` inside the critical region would make the process sleep while still holding the lock, leading to deadlock.
:::
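For reference, a minimal sketch of the canonical sleep pattern behind this warning: set the task state before dropping the lock, then sleep, re-checking the wake condition in a loop to tolerate spurious wake-ups. `done` is a hypothetical flag owned by the sleeper, not part of the project code:

```c
spin_lock(&queue_lock);
list_add_tail(&node->list, &my_wait_queue);
set_current_state(TASK_INTERRUPTIBLE); /* 1. declare intent to sleep */
spin_unlock(&queue_lock);              /* 2. drop the lock ...       */

while (!READ_ONCE(done)) {             /* 3. tolerate spurious wake-ups */
    schedule();                        /* 4. ... then actually sleep    */
    set_current_state(TASK_INTERRUPTIBLE);
}
__set_current_state(TASK_RUNNING);
```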
clean_wait_queue
static int clean_wait_queue(void)
{
    wait_queue_node_t *node, *temp;
    LIST_HEAD(tmp_list);

    spin_lock(&queue_lock);
    ////////// Critical Region ///////////////////
    list_splice_init(&my_wait_queue, &tmp_list); // detach every node while holding the lock
    ////////// Critical Region End ///////////////
    spin_unlock(&queue_lock);

    // Wake the threads in FIFO order outside the lock:
    // msleep() sleeps, and sleeping while holding a spinlock would deadlock
    list_for_each_entry_safe(node, temp, &tmp_list, list) {
        list_del(&node->list); // remove from the (detached) queue
        printk(KERN_INFO "Thread ID %d removed from wait queue\n", node->task->pid);
        wake_up_process(node->task); // wake the process
        kfree(node);                 // free the node's memory
        msleep(100);                 // sleep 100 ms
    }
    return 1; // cleaned successfully
}
- `list_for_each_entry_safe` traverses the wait queue safely even while entries are being removed.
- `list_del` removes each node from the queue.
- `wake_up_process` wakes the corresponding process, and the node's memory is then freed with `kfree`.
- `list_splice_init` first detaches the whole queue while holding the lock, so the wake-up loop (which sleeps in `msleep`) can run outside the spinlock.
:::success
`msleep(100)` is added here because the print order in user space can be scrambled by multithreading.
Even though the kernel-space wait queue is guaranteed to remove entries in FIFO order, user-space I/O cannot guarantee the messages are printed in that order,
so sleeping 100 ms after each `wake_up_process` gives user space time to print its message before the next thread is woken.
:::
list_head
/*
* Circular doubly linked list implementation.
*
* Some of the internal functions ("__xxx") are useful when
* manipulating whole lists rather than single entries, as
* sometimes we already know the next/prev entries and we can
* generate better code by using them directly rather than
* using the generic single-entry routines.
*/
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) \
struct list_head name = LIST_HEAD_INIT(name)
As the comment says, this is a circular doubly linked list, the data structure we use to store the wait queue.
Both `next` and `prev` of the `list_head` point to the variable `name` itself, forming an empty circular doubly linked list.
In other words, `LIST_HEAD` defines an empty doubly linked list that serves as the head node of the wait queue.
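A minimal sketch of what the macro expands to, and the empty-list invariant it creates (`list_empty()` simply checks `head->next == head`):

```c
struct list_head my_wait_queue = { &my_wait_queue, &my_wait_queue };
/* what LIST_HEAD(my_wait_queue) expands to:
 * next == prev == &my_wait_queue, so list_empty(&my_wait_queue) is true */
```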
list_add_tail
/*
* Insert a new entry between two known consecutive entries.
*
* This is only for internal list manipulation where we know
* the prev/next entries already!
*/
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
if (!__list_add_valid(new, prev, next))
return;
next->prev = new;
new->next = next;
new->prev = prev;
WRITE_ONCE(prev->next, new);
}
// ...
/**
* list_add_tail - add a new entry
* @new: new entry to be added
* @head: list head to add it before
*
* Insert a new entry before the specified head.
* This is useful for implementing queues.
*/
static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
__list_add(new, head->prev, head);
// = add (new, tail-node , head)
}
- `head` is the head node of the circular linked list.
- `head->prev` yields the tail node, so the new node is inserted between the tail node and the head node, i.e., appended to the very end of the linked list.
- `__list_add` simply inserts `new` between `prev` and `next` through four pointer updates:
  - `next->prev = new` and `new->next = next`
  - `new->prev = prev` and `prev->next = new`
Illustration:
# circular doubly linked list
before add: head <-> A <-> B <-> C <-> (back to head)
# append the new node to the end:
after add : head <-> A <-> B <-> C <-> new <-> (back to head)
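To make the four pointer updates concrete, here is a small self-contained user-space sketch (a simplified re-implementation for illustration, not the kernel headers):

```c
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* the same four updates as the kernel's __list_add() */
static void my_list_add(struct list_head *new,
                        struct list_head *prev,
                        struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

int main(void)
{
    struct list_head head = { &head, &head }; /* empty circular list */
    struct list_head a, b;

    my_list_add(&a, head.prev, &head); /* append a: head <-> a        */
    my_list_add(&b, head.prev, &head); /* append b: head <-> a <-> b  */

    printf("%d\n", head.next == &a && a.next == &b && b.next == &head); /* prints 1 */
    return 0;
}
```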
list_del
/*
 * Delete a list entry by making the prev/next entries point to each other.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
WRITE_ONCE(prev->next, next);
}
/*
* Check validity of deletion
* */
static inline void __list_del_entry(struct list_head *entry)
{
if (!__list_del_entry_valid(entry))
return;
__list_del(entry->prev, entry->next);
}
/**
* list_del - deletes entry from list.
* @entry: the element to delete from the list.
* Note: list_empty() on entry does not return true after this, the entry is
* in an undefined state.
*/
static inline void list_del(struct list_head *entry)
{
__list_del_entry(entry);
entry->next = LIST_POISON1; // (void *) 0x00100100, invalid addr.
entry->prev = LIST_POISON2; // (void *) 0x00200200, invalid addr.
}
- Deletes the given node from the doubly linked list (`list_head`).
- The node is unlinked by setting `next->prev = prev` and `prev->next = next`.
- `LIST_POISON1` and `LIST_POISON2` explicitly mark the deleted node, making accidental reuse easy to detect.
Illustration:
# circular doubly linked list
before delete: head <-> A <-> prev <-> dnode <-> next
# delete dnode:
after delete : head <-> A <-> prev <-> next
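Continuing the user-space sketch from the `list_add_tail` section, deletion is the mirror image: two pointer updates plus poisoning:

```c
/* the same two updates as the kernel's __list_del(), plus poisoning */
static void my_list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = (void *)0x00100100; /* LIST_POISON1 */
    entry->prev = (void *)0x00200200; /* LIST_POISON2 */
}
```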
list_for_each_entry_safe
/**
* list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
* @pos: the type * to use as a loop cursor.
* @n: another type * to use as temporary storage
* @head: the head for your list.
* @member: the name of the list_head within the struct.
*/
#define list_for_each_entry_safe(pos, n, head, member) \
for (pos = list_first_entry(head, typeof(*pos), member), \
n = list_next_entry(pos, member); \
!list_entry_is_head(pos, head, member); \
pos = n, n = list_next_entry(n, member))
- `pos` holds the current node, and `n` holds the node after `pos` (it is usually a temporary, here named `temp`).
- `n` guarantees that the address of the next node is still available after the current node (`pos`) has been deleted.
- After `pos` is deleted, `pos = n` and `n = list_next_entry(n, member)` move the cursor to the next node, so deletion during iteration stays correct.
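A minimal sketch of why the `_safe` variant is required when nodes are freed during traversal (kernel style, reusing the project's `wait_queue_node_t`):

```c
wait_queue_node_t *node, *temp;

/* With plain list_for_each_entry(), the loop would advance via
 * node->list.next *after* kfree(node), i.e. read freed memory.
 * The _safe variant saves the successor in temp first. */
list_for_each_entry_safe(node, temp, &my_wait_queue, list) {
    list_del(&node->list); /* safe: temp already points past node */
    kfree(node);
}
```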
:::spoiler wait_queue.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>
#define NUM_THREADS 10
#define SYS_CALL_MY_WAIT_QUEUE 452
void *enter_wait_queue(void *thread_id)
{
    pid_t tid = syscall(SYS_gettid); // get the kernel thread id
    fprintf(stderr, "enter wait queue thread_id: %d, current thread id = %d\n", *(int *)thread_id, tid);
    syscall(SYS_CALL_MY_WAIT_QUEUE, 1); // enter the wait queue and sleep
    fprintf(stderr, "exit wait queue thread_id: %d, current thread id = %d\n", *(int *)thread_id, tid);
    return NULL;
}
void *clean_wait_queue()
{
    syscall(SYS_CALL_MY_WAIT_QUEUE, 2); // wake every waiting thread in FIFO order
    return NULL;
}
int main()
{
    void *ret;
    pthread_t id[NUM_THREADS];
    int thread_args[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
    {
        thread_args[i] = i;
        pthread_create(&id[i], NULL, enter_wait_queue, (void *)&thread_args[i]);
    }
    sleep(1); // give every thread time to enter the wait queue
    fprintf(stderr, "start clean queue ...\n");
    clean_wait_queue();
    for (int i = 0; i < NUM_THREADS; i++)
    {
        pthread_join(id[i], &ret);
    }
    return 0;
}
:::
User space result:
Checking the loop index `i` from the original user-space code against the real thread IDs shows that
the order of entering and leaving the wait queue stays FIFO.
Kernel space result:
:::spoiler my_wait_queue.c
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/delay.h>
static int condition = 0;
DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static DEFINE_MUTEX(my_wait_queue_lock);
static int enter_wait_queue(void)
{
    condition = 0;
    DECLARE_WAITQUEUE(wait_entry, current);
    mutex_lock(&my_wait_queue_lock); // lock
    add_wait_queue_exclusive(&my_wait_queue, &wait_entry);
    printk("before join wait queue, %d", current->pid);
    wait_event_exclusive_cmd(my_wait_queue, condition, mutex_unlock(&my_wait_queue_lock), mutex_lock(&my_wait_queue_lock));
    printk("after waiting event, %d", current->pid);
    remove_wait_queue(&my_wait_queue, &wait_entry);
    if (waitqueue_active(&my_wait_queue) != 0) {
        msleep(100);
        wake_up(&my_wait_queue);
    }
    mutex_unlock(&my_wait_queue_lock); // unlock
    return 0;
}
static int clean_wait_queue(void)
{
    printk("start waking up process.");
    mutex_lock(&my_wait_queue_lock); // lock
    condition = 1;
    if (waitqueue_active(&my_wait_queue) != 0)
        wake_up(&my_wait_queue);
    mutex_unlock(&my_wait_queue_lock); // unlock
    return 0;
}
SYSCALL_DEFINE1(call_my_wait_queue, int, id)
{
    switch (id) {
    case 1:
        enter_wait_queue();
        break;
    case 2:
        clean_wait_queue();
        break;
    }
    return 0;
}
:::
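The write-up does not show how syscall number 452 gets registered. On x86_64 this is done by adding a row to the syscall table and declaring the entry point, roughly as follows (a sketch; the free slot number and exact rows depend on the kernel tree):

```
# arch/x86/entry/syscalls/syscall_64.tbl  (sketch)
452    common    call_my_wait_queue    sys_call_my_wait_queue

/* include/linux/syscalls.h  (sketch) */
asmlinkage long sys_call_my_wait_queue(int id);
```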
static int condition = 0;
DECLARE_WAIT_QUEUE_HEAD(my_wait_queue);
static DEFINE_MUTEX(my_wait_queue_lock);
This block initializes the variables.
6:
The `condition` variable, which `wait_event()` uses to decide whether the process has the resource it needs, is first set to 0 (telling `wait_event` that the process cannot obtain the resource yet and has to wait).
7:
`DECLARE_WAIT_QUEUE_HEAD()` creates a wait queue head instance as a global variable: `my_wait_queue`.
8:
`DEFINE_MUTEX` defines a static mutex lock instance: `my_wait_queue_lock`.
static int enter_wait_queue(void)
{
    condition = 0;
    DECLARE_WAITQUEUE(wait_entry, current);
    mutex_lock(&my_wait_queue_lock); // lock
    add_wait_queue_exclusive(&my_wait_queue, &wait_entry);
    printk("before join wait queue, %d", current->pid);
    wait_event_exclusive_cmd(my_wait_queue, condition, mutex_unlock(&my_wait_queue_lock), mutex_lock(&my_wait_queue_lock));
    printk("after waiting event, %d", current->pid);
    remove_wait_queue(&my_wait_queue, &wait_entry);
    if (waitqueue_active(&my_wait_queue) != 0) {
        msleep(100);
        wake_up(&my_wait_queue);
    }
    mutex_unlock(&my_wait_queue_lock); // unlock
    return 0;
}
11:
`condition` is set back to 0 so that a process wanting to join the wait queue will correctly enter the wait state.
12:
The `DECLARE_WAITQUEUE()` macro from `wait.h` creates a new wait queue entry.
13:
The mutex is taken so that no race condition can occur while `add_wait_queue()` runs.
14:
The `wait.h` function `add_wait_queue_exclusive()` adds the process to the wait queue. Exclusive insertion was chosen for two reasons: first, a later plain `wake_up()` wakes exactly one process at a time (reducing the chance that the printf order in the fixed, unmodifiable user-space code comes out wrong); second, `add_wait_queue_exclusive()` appends the entering process to the tail of the wait queue, which exactly matches the FIFO requirement of this assignment.
17:
The `wait.h` function `wait_event_exclusive_cmd()` puts the process into the wait state (it lacks the resource it needs, `condition == 0`, so it waits). The last two cmd arguments run right before the process starts waiting and right after it is woken, respectively: here the mutex is unlocked before waiting begins and re-acquired after wake-up. While waiting, the process is effectively parked on this line.
18:
Once woken (and after re-acquiring the mutex), the process resumes execution from this line.
19:
The `wait.h` function `remove_wait_queue()` removes the process from the wait queue.
20~23:
The process then checks whether the wait queue still contains other waiting processes, and if so wakes the next one (again to deal with the printf ordering issue in the unmodifiable user-space code).
24:
Before the woken process returns to user space, it unlocks the mutex so that the process it just woke with `wake_up()` can start running.
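For reference, `wait_event_exclusive_cmd()` in v5.15 has roughly the following shape (paraphrased from `include/linux/wait.h`; `cmd1` runs before each `schedule()` and `cmd2` after it):

```c
#define wait_event_exclusive_cmd(wq_head, condition, cmd1, cmd2)     \
do {                                                                  \
    if (condition)                                                    \
        break;                                                        \
    /* queues an exclusive entry, then loops:                         \
     *   set TASK_INTERRUPTIBLE; if (condition) break;                \
     *   cmd1; schedule(); cmd2;                                      \
     */                                                               \
    __wait_event_exclusive_cmd(wq_head, condition, cmd1, cmd2);       \
} while (0)
```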
static int clean_wait_queue(void)
{
    printk("start waking up process.");
    mutex_lock(&my_wait_queue_lock); // lock
    condition = 1;
    if (waitqueue_active(&my_wait_queue) != 0)
        wake_up(&my_wait_queue);
    mutex_unlock(&my_wait_queue_lock); // unlock
    return 0;
}
The implementation inside this function is identical to lines 19~24 of `enter_wait_queue()` above, which are responsible for waking other processes; its main job is to wake the first process.
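Putting it together, the wake-ups form a FIFO chain. A sketch of the sequence, assuming three threads entered in the order T1, T2, T3:

```
clean_wait_queue(): condition = 1, wake_up()      -> wakes T1 (exclusive)
T1: wakes, removes itself, msleep(100), wake_up() -> wakes T2
T2: wakes, removes itself, msleep(100), wake_up() -> wakes T3
T3: wakes, removes itself, queue empty            -> no further wake_up()
```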