Unikraft Function Calls

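1. Notes from the paper

  • Every library that implements system call handlers registers them with the syscall shim library through macros. The shim layer then generates the library-level system call interface. When application sources are compiled natively with Unikraft, the calls link directly against these handlers, which turns system calls into cheap function calls.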
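The registration idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: DEMO_SYSCALL_R_DEFINE1, demo_syscall_r_double_it and demo_app are hypothetical names, not Unikraft's macros, and the real macros handle arbitrary argument counts and types.

```c
#include <assert.h>

/* Hypothetical stand-in for a registration macro: it emits the signature
 * of a plain C function named demo_syscall_r_<name> with one long argument. */
#define DEMO_SYSCALL_R_DEFINE1(name, argname) \
    long demo_syscall_r_##name(long argname)

/* A "library" implements its handler through the macro. */
DEMO_SYSCALL_R_DEFINE1(double_it, value)
{
    return value * 2;
}

/* "Application" code: the handler is reached by an ordinary, directly
 * linked function call; no trap instruction is involved. */
void demo_app(void)
{
    assert(demo_syscall_r_double_it(21) == 42);
}
```

Because the handler is an ordinary symbol, the linker resolves the call like any other function call, and the compiler is free to inline it.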

2. Notes from the source code

1. lib/syscall_shim

1. syscall_shim/syscall.h.in
  • Like Linux, Unikraft has system call numbers; they are defined with the __NR_ prefix:
/* SPDX-License-Identifier: BSD-2-Clause */
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
#define __NR_lseek 19
#define __NR_getpid 20
#define __NR_mount 21
#define __NR_setuid 23
#define __NR_getuid 24
#define __NR_ptrace 26
#define __NR_pause 29
#define __NR_access 33
#define __NR_nice 34
#define __NR_sync 36
#define __NR_kill 37
#define __NR_rename 38
#define __NR_mkdir 39
#define __NR_rmdir 40
#define __NR_dup 41
#define __NR_pipe 42
#define __NR_times 43
#define __NR_brk 45
#define __NR_setgid 46
#define __NR_getgid 47
#define __NR_geteuid 49
#define __NR_getegid 50
...
2. Config.uk under syscall_shim:
if LIBSYSCALL_SHIM
    # Hidden configuration option that can be set by libc's in order to
    # switch off the generation of libc-style wrapper symbols
    config LIBSYSCALL_SHIM_NOWRAPPER
        bool
        default n

    config LIBSYSCALL_SHIM_LIBCSTUBS
        depends on !LIBSYSCALL_SHIM_NOWRAPPER
        bool "Provide libc-style stubs"
        default n
        help
            Automatically generate libc-style stubs for unavailable
            system calls. The aim is to provide all libc-style system
            call symbols although just a subset of the full API may be
            implemtented. The symbols are defined as `weak`.
            Please note that depending on the used compiler and optimization
            options, this functionality may sometimes cause linking failures
            because of double definitions of symbols. This is the case when
            another library is providing some libc-style system calls
            without registering them to libsyscall_shim.
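  • The help text says that libc-style stubs are generated automatically for unavailable system calls, and that this can sometimes cause link failures because of doubly defined symbols. That happens when another library provides some libc-style system calls without registering them with libsyscall_shim. In other words, some of the system calls used by the libraries are registered with syscall_shim through macros, while others are implemented directly in a library without going through those macros.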
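A weak stub of the kind the help text describes might look like the sketch below. The name demo_frobnicate is hypothetical, the ENOSYS behaviour is an assumption about what a stub for an unavailable call would reasonably do, and __attribute__((weak)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical libc-style stub for an unavailable system call.
 * Being weak, it yields to a strong definition of the same symbol
 * in another object file at link time. */
__attribute__((weak)) long demo_frobnicate(long arg)
{
    (void)arg;
    errno = ENOSYS;   /* assumption: report "not implemented" */
    return -1;
}

/* Usage: with no strong definition linked in, the stub is what runs. */
void demo_weak_stub(void)
{
    errno = 0;
    assert(demo_frobnicate(1) == -1);
    assert(errno == ENOSYS);
}
```

If another library defines a strong demo_frobnicate, the linker normally picks that one silently; the failures mentioned in the help text concern cases where the toolchain ends up treating both definitions as strong.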

  • Next comes the option for the binary system call handler:

config LIBSYSCALL_SHIM_HANDLER
    bool "Binary system call handler (Linux ABI)"
    default n
    depends on ARCH_X86_64
    select HAVE_SYSCALL
    help
        Enables a system call handler for binary system call
        requests (e.g., sysenter/sysexit). The handler maps
        register values accordingly to the Linux ABI standard
        (see: man syscalls[2]).

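  • This enables a system call handler for binary system call requests (e.g., sysenter/sysexit). The handler maps the register values to system call arguments according to the Linux ABI.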
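What "mapping register values" means can be sketched as follows, using the Linux x86-64 convention (number in rax; arguments in rdi, rsi, rdx, r10, r8, r9; result back in rax). The struct, the handler type and the function names are hypothetical and only illustrate the mapping; they are not the shim's actual code.

```c
#include <assert.h>

/* Hypothetical snapshot of the registers at the point of the trap. */
struct demo_regs {
    long rax, rdi, rsi, rdx, r10, r8, r9;
};

typedef long (*demo_handler6)(long nr, long a1, long a2, long a3,
                              long a4, long a5, long a6);

/* Linux x86-64 convention: rax carries the system call number, the six
 * argument registers follow in a fixed order, the result goes to rax. */
void demo_binary_syscall(struct demo_regs *r, demo_handler6 handler)
{
    r->rax = handler(r->rax, r->rdi, r->rsi, r->rdx, r->r10, r->r8, r->r9);
}

/* Toy handler that adds its first two arguments. */
long demo_add_handler(long nr, long a1, long a2, long a3,
                      long a4, long a5, long a6)
{
    (void)nr; (void)a3; (void)a4; (void)a5; (void)a6;
    return a1 + a2;
}
```

A handler with this shape is what lets unmodified binaries, which still execute a trap instruction, reach the same functions that natively compiled code calls directly.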
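3. The awk scripts in syscall_shim
  • syscall_shim contains a number of awk scripts that generate headers and source files for the system call names and the call functions.

  • For example, gen_syscall_nrs.awk generates the header that defines the system call names (SYS_*):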

BEGIN {
    print "/* Automatically generated file; DO NOT EDIT */"
    print "#ifndef __LIBSYSCALL_SHIM_SYSCALL_NRS_H__"
    print "#define __LIBSYSCALL_SHIM_SYSCALL_NRS_H__"
}

/#define __NR_/{
    # define the system call name
    printf "\n#define SYS_%s\t\t%s", substr($2,6),$3
}

END {
    print "\n\n#endif /* __LIBSYSCALL_SHIM_SYSCALL_NRS_H__ */"
}
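For readers less familiar with awk, the per-line transformation of the script above can be restated in C. This is a sketch of the same rewrite, not part of Unikraft; demo_nr_to_sys and demo_nrs are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites "#define __NR_<name> <nr>" as "#define SYS_<name>\t\t<nr>".
 * Returns 1 on success, 0 if the line is not an __NR_ definition. */
int demo_nr_to_sys(const char *line, char *out, size_t outlen)
{
    char name[64];
    char nr[32];

    /* "__NR_" is matched literally, so name receives only the suffix,
     * which is what substr($2, 6) does in the awk script. */
    if (sscanf(line, "#define __NR_%63s %31s", name, nr) != 2)
        return 0;
    snprintf(out, outlen, "#define SYS_%s\t\t%s", name, nr);
    return 1;
}

/* Usage: one matching line and one line that is skipped. */
void demo_nrs(void)
{
    char buf[128];

    assert(demo_nr_to_sys("#define __NR_read 3", buf, sizeof(buf)) == 1);
    assert(strcmp(buf, "#define SYS_read\t\t3") == 0);
    assert(demo_nr_to_sys("/* comment */", buf, sizeof(buf)) == 0);
}
```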
  • gen_provided.awk generates the header for the system calls that are already provided:
BEGIN {
    print "/* Automatically generated file; DO NOT EDIT */"
    print "#ifndef __LIBSYSCALL_SHIM_PROVIDED_SYSCALLS_H__"
    print "#define __LIBSYSCALL_SHIM_PROVIDED_SYSCALLS_H__"
    print "\n#include <uk/bits/syscall_nrs.h>"
}

/[a-zA-Z0-9]+-[0-9]+/{
    # check if the syscall is not defined
    printf "\n#ifndef SYS_%s\n", $1;
    # if a LEGACY_<syscall_name> symbol is defined, the syscall is not required
    # and we can continue the build process
    printf "#ifdef LEGACY_SYS_%s\n", $1;
    printf "#ifdef CONFIG_LIBSYSCALL_SHIM_LEGACY_VERBOSE\n"
    printf "#warning Ignoring legacy system call '%s': No system call number available\n", $1
    printf "#endif /* LIBSYSCALL_SHIM_LEGACY_VERBOSE */\n"
    # if the LEGACY symbol is not defined for this syscall, it means that the
    # syscall is required and we can't continue without a definition
    printf "#else\n";
    printf "#error Failed to map system call '%s': No system call number available\n", $1
    printf "#endif /* LEGACY_SYS_%s */\n", $1
    printf "#else\n";
    printf "#define HAVE_uk_syscall_%s t\n", $1;
    printf "UK_SYSCALL_E_PROTO(%s, %s);\n", $2, $1;
    printf "UK_SYSCALL_R_PROTO(%s, %s);\n", $2, $1;
    printf "#endif /* !SYS_%s */\n", $1;
}

END {
    print "\n#endif /* __LIBSYSCALL_SHIM_PROVIDED_SYSCALLS_H__ */"
}

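  • The awk scripts for stubs, map, syscall_r, syscall_r_fn and so on take the names of the provided system calls and define (or, put differently, register) the families of macros and functions uk_syscall_e_* and uk_syscall_r_*, where * stands for the system call name. These symbols also show up in the exportsyms.uk files of other libraries.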
gen_syscall_map.awk:

BEGIN {print "/* Automatically generated file; DO NOT EDIT */\n"}
/#define __NR_/{
    printf "#define uk_syscall_fn_%s(...) uk_syscall_e_%s(__VA_ARGS__)\n", $3,substr($2,6)
    printf "#define uk_syscall_r_fn_%s(...) uk_syscall_r_%s(__VA_ARGS__)\n", $3,substr($2,6)
}

gen_uk_syscall_r.awk:

/[a-zA-Z0-9]+-[0-9]+/{
    name = $1
    sys_name = "SYS_" name
    # handler name, built from the name of the provided system call
    uk_syscall_r = "uk_syscall_r_" name
    args_nr = $2 + 0
    printf "#ifdef HAVE_uk_syscall_%s\n", name;
    printf "\tcase %s:\n", sys_name;
    for (i = 1; i <= args_nr; i++)
        printf "\t\ta%s = va_arg(arg, long);\n", i;
    printf "\t\treturn %s(", uk_syscall_r;
    for (i = 1; i < args_nr; i++)
        printf "a%d, ", i;
    if (args_nr > 0)
        printf "a%d", args_nr;
    printf(");\n")
    printf "#endif /* HAVE_uk_syscall_%s */\n\n", name;
}
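The case blocks that gen_uk_syscall_r.awk prints end up inside a switch over the system call number. A self-contained sketch of that runtime dispatch is shown below; the numbers, the two toy handlers and the -ENOSYS default are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>

#define DEMO_SYS_add 1   /* hypothetical system call numbers */
#define DEMO_SYS_neg 2

static long demo_syscall_r_add(long a, long b) { return a + b; }
static long demo_syscall_r_neg(long a) { return -a; }

/* Same shape as the generated code: one case per provided call, the
 * arguments pulled from the va_list as longs, then a direct call. */
long demo_vsyscall_r(long nr, va_list arg)
{
    long a1, a2;

    switch (nr) {
    case DEMO_SYS_add:
        a1 = va_arg(arg, long);
        a2 = va_arg(arg, long);
        return demo_syscall_r_add(a1, a2);
    case DEMO_SYS_neg:
        a1 = va_arg(arg, long);
        return demo_syscall_r_neg(a1);
    default:
        return -ENOSYS;   /* assumption: unknown numbers are rejected */
    }
}

long demo_syscall_r(long nr, ...)
{
    va_list arg;
    long ret;

    va_start(arg, nr);
    ret = demo_vsyscall_r(nr, arg);
    va_end(arg);
    return ret;
}
```

Callers have to pass the variadic arguments as long (for example 2L rather than 2), because the dispatcher reads every argument with va_arg(arg, long).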
  • The generated headers and source files are listed in Makefile.uk under syscall_shim.

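4. legacy_syscall.h
  • This header lists the legacy system calls, i.e. calls whose functionality is covered by newer system calls. gen_provided.awk above checks the LEGACY_SYS_* symbols and skips such a call instead of failing the build.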
/**
 * This file defines the syscalls that are marked as legacy.
 *
 * A legacy syscall is a syscall that is missing on some architectures, but
 * it's functionality can be implemented using a newer syscall. Therefore,
 * this syscalls will not generate build failures on the architectures that
 * don't implement them.
 *
 * The format for marking a system call as legacy is the following one:
 * `#define LEGACY_SYS_<syscall_name>` (see the list below).
 */

#ifndef __UK_LEGACY_SYSCALL_H__
#define __UK_LEGACY_SYSCALL_H__

#define LEGACY_SYS_fork /* modern: clone */
#define LEGACY_SYS_vfork /* modern: clone */
#define LEGACY_SYS_getpgrp /* modern: getpgid */
#define LEGACY_SYS_readlink /* modern: readlinkat */
#define LEGACY_SYS_symlink /* modern: symlinkat */
#define LEGACY_SYS_unlink /* modern: unlinkat */
#define LEGACY_SYS_link /* modern: linkat */
#define LEGACY_SYS_access /* modern: faccessat */
#define LEGACY_SYS_open /* modern: openat */
#define LEGACY_SYS_creat /* modern: openat */
#define LEGACY_SYS_mkdir /* modern: mkdirat */
#define LEGACY_SYS_rmdir /* modern: unlinkat */
#define LEGACY_SYS_mknod /* modern: mknodat */
#define LEGACY_SYS_chmod /* modern: fchmodat */
#define LEGACY_SYS_lchown /* modern: fchownat */
#define LEGACY_SYS_chown /* modern: fchownat */
#define LEGACY_SYS_rename /* modern: renameat */
#define LEGACY_SYS_lstat /* modern: fstatat */
#define LEGACY_SYS_stat /* modern: fstatat */
#define LEGACY_SYS_getdents /* modern: getdents64 */
#define LEGACY_SYS_time /* modern: gettimeofday */
#define LEGACY_SYS_futimesat /* modern: utimensat */
#define LEGACY_SYS_utime /* modern: utimensat */
#define LEGACY_SYS_utimes /* modern: utimensat */
#define LEGACY_SYS_dup2 /* modern: dup3 */
#define LEGACY_SYS_pipe /* modern: pipe2 */
#define LEGACY_SYS_pause /* modern: sigsuspend */
#define LEGACY_SYS_alarm /* modern: timer_settime */

#endif
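The "modern:" comments describe how a legacy call can be expressed through its newer counterpart. On a POSIX host this relationship can be written down directly; the sketch below uses only standard POSIX.1-2008 functions, while the demo_* wrapper names are hypothetical and are not how Unikraft implements these calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD, AT_REMOVEDIR, openat() */
#include <sys/types.h>  /* mode_t */
#include <unistd.h>     /* faccessat(), unlinkat(), close() */

/* access(path, mode) expressed through faccessat() */
int demo_access(const char *path, int mode)
{
    return faccessat(AT_FDCWD, path, mode, 0);
}

/* open(path, flags, mode) expressed through openat() */
int demo_open(const char *path, int flags, mode_t mode)
{
    return openat(AT_FDCWD, path, flags, mode);
}

/* rmdir(path) expressed through unlinkat() with AT_REMOVEDIR */
int demo_rmdir(const char *path)
{
    return unlinkat(AT_FDCWD, path, AT_REMOVEDIR);
}
```

AT_FDCWD makes the *at() variants resolve relative paths against the current working directory, which is exactly the behaviour of the legacy calls.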
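5. syscall.h
  • This header consists largely of macro definitions: the raw system call style (mapping a function call directly to the target handler), the two macro variants with and without a libc-style wrapper, and some error-related definitions.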
/* System call, returns -1 and sets errno on errors */
long uk_syscall(long nr, ...);
long uk_vsyscall(long nr, va_list arg);
long uk_syscall6(long nr, long arg1, long arg2, long arg3,
                 long arg4, long arg5, long arg6);

/*
 * Use this variant instead of `uk_syscall()` whenever the system call number
 * is a constant. This macro maps the function call directly to the target
 * handler instead of doing a look-up at runtime
 */
#define uk_syscall_static(...) \
    UK_CONCAT(__uk_syscall, __UK_SYSCALL_NARGS(__VA_ARGS__))(__VA_ARGS__)

/* Raw system call, returns negative codes on errors */
long uk_syscall_r(long nr, ...);
long uk_vsyscall_r(long nr, va_list arg);
long uk_syscall6_r(long nr, long arg1, long arg2, long arg3,
                   long arg4, long arg5, long arg6);

/*
 * Use this variant instead of `uk_syscall_r()` whenever the system call number
 * is a constant. This macro maps the function call directly to the target
 * handler instead of doing a look-up at runtime
 */
#define uk_syscall_r_static(...) \
    UK_CONCAT(__uk_syscall, \
              UK_CONCAT(__UK_SYSCALL_NARGS(__VA_ARGS__)), _r)(__VA_ARGS__)
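How a constant system call number can be turned into a direct call at preprocessing time is sketched below. It combines argument counting with token pasting, in the spirit of uk_syscall_static() and the uk_syscall_fn_<nr> macros from gen_syscall_map.awk. All DEMO_/demo_ names and the number 7 are hypothetical, and only the three-token case (number plus two arguments) is spelled out.

```c
#include <assert.h>

/* Two-level paste so that macro arguments are expanded before ## applies. */
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b)  DEMO_CONCAT_(a, b)

/* Counts 1 to 4 arguments by shifting a descending list of numbers. */
#define DEMO_NARGS_(a1, a2, a3, a4, n, ...) n
#define DEMO_NARGS(...) DEMO_NARGS_(__VA_ARGS__, 4, 3, 2, 1, 0)

#define DEMO_SYS_add 7   /* hypothetical system call number */

static long demo_syscall_e_add(long a, long b) { return a + b; }

/* What a generated map header might contain for number 7. */
#define demo_syscall_fn_7(...) demo_syscall_e_add(__VA_ARGS__)

/* Helper for three tokens in total: the number plus two arguments. */
#define DEMO_FN_(nr) demo_syscall_fn_##nr
#define DEMO_FN(nr)  DEMO_FN_(nr)
#define demo_syscall3(nr, a, b) DEMO_FN(nr)(a, b)

/* Selects demo_syscall<N> from the argument count, then the handler from
 * the number. Everything is resolved by the preprocessor. */
#define demo_syscall_static(...) \
    DEMO_CONCAT(demo_syscall, DEMO_NARGS(__VA_ARGS__))(__VA_ARGS__)

void demo_static_dispatch(void)
{
    /* Expands to demo_syscall_e_add(2, 3); no lookup happens at runtime. */
    assert(demo_syscall_static(DEMO_SYS_add, 2, 3) == 5);
}
```

The number has to be a macro or literal that the preprocessor can see; a value held in a variable would need the runtime variant instead.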

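6. uk_syscall_e_* and uk_syscall_r_*
  • Judging from the definitions below, both UK_LLSYSCALL_R_DEFINE and UK_SYSCALL_R_DEFINE produce a pair of functions: uk_syscall_r_* is the raw variant that returns a negative error code, and uk_syscall_e_* wraps it, sets errno and returns -1 on failure. The two macros differ in whether a libc-style wrapper is generated in addition.

  • Their definitions in syscall.h:

  • UK_LLSYSCALL_R_DEFINE() does not provide a libc-style wrapper: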

    /*
     * UK_LLSYSCALL_R_DEFINE()
     * Low-level variant, does not provide a libc-style wrapper
     */
    #define __UK_LLSYSCALL_R_DEFINE(x, rtype, name, ename, rname, ...) \
        long rname(UK_ARG_MAPx(x, UK_S_ARG_LONG, __VA_ARGS__)); \
        long ename(UK_ARG_MAPx(x, UK_S_ARG_LONG, __VA_ARGS__)) \
        { \
            long ret = rname( \
                UK_ARG_MAPx(x, UK_S_ARG_CAST_LONG, __VA_ARGS__)); \
            if (ret < 0 && PTRISERR(ret)) { \
                errno = -(int) PTR2ERR(ret); \
                return -1; \
            } \
            return ret; \
        }
    ...

    UK_SYSCALL_R_DEFINE, by contrast, does provide one when UK_LIBC_SYSCALLS is enabled:

    /*
     * UK_SYSCALL_R_DEFINE()
     * Based on UK_LLSYSCALL_R_DEFINE and provides a libc-style wrapper
     * in case UK_LIBC_SYSCALLS is enabled
     */
    #if UK_LIBC_SYSCALLS
    #define __UK_SYSCALL_R_DEFINE(x, rtype, name, ename, rname, ...) \
        long ename(UK_ARG_MAPx(x, UK_S_ARG_LONG, __VA_ARGS__)); \
        rtype name(UK_ARG_MAPx(x, UK_S_ARG_ACTUAL, __VA_ARGS__)) \
        { \
            return (rtype) ename( \
                UK_ARG_MAPx(x, UK_S_ARG_CAST_LONG, __VA_ARGS__)); \
        } \
        __UK_LLSYSCALL_R_DEFINE(x, rtype, name, ename, rname, __VA_ARGS__)
    #define _UK_SYSCALL_R_DEFINE(...) __UK_SYSCALL_R_DEFINE(__VA_ARGS__)
    ...
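  • In vfscore/main.c, for instance, libc-style interfaces such as open() and ioctl() are written by hand and call uk_syscall_e_*. Those system calls are defined with UK_LLSYSCALL_R_DEFINE, so no libc-style wrapper is generated and one more layer has to be added around uk_syscall_e_* before applications can use it. For system calls defined with UK_SYSCALL_R_DEFINE the macro generates that wrapper, so the resulting function can be called by applications directly.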
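The relationship between the raw variant and the errno variant can be sketched without any macros. The handler below is a toy, the demo_* names are hypothetical, and DEMO_MAX_ERRNO stands in for the PTRISERR() range check; the value 4095 follows the Linux convention and is an assumption here.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_ERRNO 4095   /* assumption: error codes live in [-4095, -1] */

/* Raw variant: returns the result, or a negative error code. */
long demo_syscall_r_getval(long slot)
{
    if (slot < 0 || slot > 2)
        return -EBADF;
    return slot * 10;
}

/* errno variant: same call, but failures are reported libc-style. */
long demo_syscall_e_getval(long slot)
{
    long ret = demo_syscall_r_getval(slot);

    if (ret < 0 && ret >= -DEMO_MAX_ERRNO) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}

void demo_e_vs_r(void)
{
    assert(demo_syscall_r_getval(1) == 10);
    assert(demo_syscall_r_getval(7) == -EBADF);

    errno = 0;
    assert(demo_syscall_e_getval(7) == -1);
    assert(errno == EBADF);
}
```

The raw variant avoids touching errno altogether, which is one plausible reason the coding-style check prefers the _R_ definitions.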

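2. System call implementations in the libraries

  • Once the syscall_shim layer has registered the names of the provided system calls and the related macros, the libraries that actually deal with those system calls implement them. vfscore serves as the example here.
1. vfscore/vfs.h
  • vfscore/vfs.h declares the prototypes of the system call functions: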
/*
 * per task data
 */
struct task {
    char t_cwd[PATH_MAX];           /* current working directory */
    struct vfscore_file *t_cwdfp;   /* directory for cwd */
};

int sys_open(char *path, int flags, mode_t mode, struct vfscore_file **fp);
int sys_read(struct vfscore_file *fp, const struct iovec *iov, size_t niov,
             off_t offset, size_t *count);
int sys_write(struct vfscore_file *fp, const struct iovec *iov, size_t niov,
              off_t offset, size_t *count);
int sys_lseek(struct vfscore_file *fp, off_t off, int type, off_t * cur_off);
int sys_ioctl(struct vfscore_file *fp, unsigned long request, void *buf);
int sys_fstat(struct vfscore_file *fp, struct stat *st);
int sys_fstatfs(struct vfscore_file *fp, struct statfs *buf);
int sys_fsync(struct vfscore_file *fp);
int sys_ftruncate(struct vfscore_file *fp, off_t length);
2. vfscore/syscall.c
  • The implementations live in vfscore/syscall.c, for example sys_open():
/*
 * vfs_syscalls.c - everything in this file is a routine implementing
 * a VFS system call.
 */
...
int
sys_open(char *path, int flags, mode_t mode, struct vfscore_file **fpp)
{
    struct vfscore_file *fp;
    struct dentry *dp, *ddp;
    struct vnode *vp;
    char *filename;
    int error;

    DPRINTF(VFSDB_SYSCALL, ("sys_open: path=%s flags=%x mode=%x\n",
            path, flags, mode));

    flags = vfscore_fflags(flags);
    if (flags & O_CREAT) {
        error = namei(path, &dp);
        if (error == ENOENT) {
            /* Create new struct vfscore_file. */
            if ((error = lookup(path, &ddp, &filename)) != 0)
                return error;

            vn_lock(ddp->d_vnode);
            if ((error = vn_access(ddp->d_vnode, VWRITE)) != 0) {
                vn_unlock(ddp->d_vnode);
                drele(ddp);
                return error;
            }
            mode &= ~S_IFMT;
            mode |= S_IFREG;
            error = VOP_CREATE(ddp->d_vnode, filename, mode);
            vn_unlock(ddp->d_vnode);
            drele(ddp);

            if (error)
                return error;
            if ((error = namei(path, &dp)) != 0)
                return error;

            vp = dp->d_vnode;
            flags &= ~O_TRUNC;
        } else if (error) {
            return error;
        } else {
            /* File already exits */
            if (flags & O_EXCL) {
                error = EEXIST;
                goto out_drele;
            }
        }

        vp = dp->d_vnode;
        flags &= ~O_CREAT;
    } else {
        /* Open */
        if (flags & O_NOFOLLOW) {
            error = open_no_follow_chk(path);
            if (error != 0)
                return error;
        }
        error = namei(path, &dp);
        if (error)
            return error;

        vp = dp->d_vnode;

        if (flags & UK_FWRITE || flags & O_TRUNC) {
            error = vn_access(vp, VWRITE);
            if (error)
                goto out_drele;

            error = EISDIR;
            if (vp->v_type == VDIR)
                goto out_drele;
        }
        if (flags & O_DIRECTORY) {
            if (vp->v_type != VDIR) {
                error = ENOTDIR;
                goto out_drele;
            }
        }
    }

    fp = calloc(sizeof(struct vfscore_file), 1);
    if (!fp) {
        error = ENOMEM;
        goto out_drele;
    }

    fhold(fp);
    fp->f_flags = flags;

    /*
     * Don't need to increase refcount here, we already hold a reference
     * to dp from namei().
     */
    fp->f_dentry = dp;

    uk_mutex_init(&fp->f_lock);
    UK_INIT_LIST_HEAD(&fp->f_ep);

    vn_lock(vp);

    if (flags & O_TRUNC) {
        error = EINVAL;
        if (!(flags & UK_FWRITE) || vp->v_type == VDIR)
            goto out_fp_free_unlock;

        error = VOP_TRUNCATE(vp, 0);
        if (error)
            goto out_fp_free_unlock;
    }

    error = VOP_OPEN(vp, fp);
    if (error)
        goto out_fp_free_unlock;

    vn_unlock(vp);

    *fpp = fp;
    return 0;

out_fp_free_unlock:
    free(fp);
    vn_unlock(vp);
out_drele:
    if (dp)
        drele(dp);
    return error;
}

3. vfscore/main.c
  • vfscore/main.c defines libc-style interfaces, i.e. entry points that look like C standard library functions.

  • For example, it provides the open() interface:

#if UK_LIBC_SYSCALLS
int open(const char *pathname, int flags, ...)
{
    mode_t mode = 0;

    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = apply_umask(va_arg(ap, mode_t));
        va_end(ap);
    }

    /* uk_syscall_e_open() is generated by the registered macro */
    return uk_syscall_e_open((long int)pathname, flags, mode);
}

#ifdef open64
#undef open64
#endif
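  • The uk_syscall_e_open() in the return statement comes from a macro expansion; open is one of the system calls registered with the syscall_shim layer. Judging from the macro definition in syscall.h shown earlier, the generated uk_syscall_e_open() casts its arguments to long, calls uk_syscall_r_open() and turns a negative return value into errno and -1. The body of uk_syscall_r_open() is the block below, which in turn calls sys_open(), the actual implementation in syscall.c.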
UK_LLSYSCALL_R_DEFINE(int, open, const char*, pathname, int, flags,
                      mode_t, mode)
{
    trace_vfs_open(pathname, flags, mode);

    struct task *t = main_task;
    char path[PATH_MAX];
    struct vfscore_file *fp;
    int fd, error;
    int acc;

    acc = 0;
    /* map the access mode (read, write, read/write) to vnode access bits */
    switch (flags & O_ACCMODE) {
    case O_RDONLY:
        acc = VREAD;
        break;
    case O_WRONLY:
        acc = VWRITE;
        break;
    case O_RDWR:
        acc = VREAD | VWRITE;
        break;
    }

    error = task_conv(t, pathname, acc, path);
    if (error)
        goto out_error;

    /* call the sys_open() implementation from syscall.c */
    error = sys_open(path, flags, mode, &fp);
    if (error)
        goto out_error;

    error = fdalloc(fp, &fd);
    if (error)
        goto out_fput;
    fdrop(fp);
    trace_vfs_open_ret(fd);
    return fd;

out_fput:
    fdrop(fp);
out_error:
    trace_vfs_open_err(error);
    return -error;
}

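  • To summarize the call path of open(): the application-facing open() -> the macro-generated uk_syscall_e_open() -> uk_syscall_r_open(), whose body is written under UK_LLSYSCALL_R_DEFINE -> sys_open(), the lowest-level implementation in syscall.c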
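The whole chain can be imitated in one file. The sketch below mirrors the layering and the error conventions visible in the excerpts (sys_open() returns 0 or a positive errno and hands back its result through an out-parameter, the raw layer negates the error, the errno layer sets errno), but every function is a hypothetical stand-in and the "file system" knows a single path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* O_CREAT, O_RDONLY */
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>  /* mode_t */

/* Lowest layer: 0 on success or a positive errno; result via fd_out. */
static int demo_sys_open(const char *path, int flags, mode_t mode, int *fd_out)
{
    (void)flags; (void)mode;
    if (strcmp(path, "/demo") != 0)
        return ENOENT;
    *fd_out = 3;
    return 0;
}

/* Raw layer: long-typed arguments, negative errno on failure. */
static long demo_syscall_r_open(long path, long flags, long mode)
{
    int fd = -1;
    int error = demo_sys_open((const char *)path, (int)flags,
                              (mode_t)mode, &fd);

    return error ? -error : fd;
}

/* errno layer: converts the negative code into errno and -1. */
static long demo_syscall_e_open(long path, long flags, long mode)
{
    long ret = demo_syscall_r_open(path, flags, mode);

    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}

/* libc-style layer: variadic, reads mode only when O_CREAT is set. */
int demo_open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);   /* callers pass mode as int */
        va_end(ap);
    }
    return (int)demo_syscall_e_open((long)path, flags, (long)mode);
}
```

Every step is a plain function call, so with optimization enabled the compiler can collapse the layers into one.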
4. Notes on support/scripts
  • A script there contains the following check:
# check for UK_(LL)SYSCALL_DEFINE(), raw implementation should be preferred
if ($line =~ /\bUK_(LL)?SYSCALL_DEFINE\s*\(/) {
    WARN("NON_RAW_SYSCALL",
         "Prefer using raw system call definitions: 'UK_SYSCALL_R_DEFINE', 'UK_LLSYSCALL_R_DEFINE'\n" . $herecurr);
}
  • In other words, the raw definitions UK_SYSCALL_R_DEFINE and UK_LLSYSCALL_R_DEFINE are preferred.
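To see which lines the check flags, the same pattern can be tried out in C with POSIX regular expressions. This is a sketch: demo_is_non_raw_define is a hypothetical helper, and because POSIX extended regular expressions have no \b or \s, the word boundary and the optional whitespace are spelled out explicitly.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the line uses UK_SYSCALL_DEFINE( or UK_LLSYSCALL_DEFINE(,
 * i.e. a non-raw definition, and 0 otherwise. */
int demo_is_non_raw_define(const char *line)
{
    static const char pattern[] =
        "(^|[^A-Za-z0-9_])UK_(LL)?SYSCALL_DEFINE[[:space:]]*\\(";
    regex_t re;
    int found;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    found = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

The raw macros are not matched because an additional R_ sits between SYSCALL_ and DEFINE.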

  • In vfscore, some system calls are defined only through UK_SYSCALL_R_DEFINE and have no function with a sys_ prefix. pipe and pipe2 in lib/vfscore/pipe.c are examples: there is no sys_pipe().

UK_SYSCALL_R_DEFINE(int, pipe, int*, pipefd)
{
    int ret = 0;
    int r_fd, w_fd;
    struct pipe_file *pipe_file;

    /* Allocate pipe internal structure. */
    pipe_file = pipe_file_alloc(PIPE_MAX_SIZE, 0);
    if (!pipe_file) {
        ret = -ENOMEM;
        goto ERR_EXIT;
    }

    r_fd = pipe_fd_alloc(pipe_file, UK_FREAD);
    if (r_fd < 0)
        goto ERR_VFS_INSTALL;

    w_fd = pipe_fd_alloc(pipe_file, UK_FWRITE);
    if (w_fd < 0)
        goto ERR_W_FD;

    /* Fill pipefd fields. */
    pipefd[0] = r_fd;
    pipefd[1] = w_fd;

    return ret;

ERR_W_FD:
    vfscore_put_fd(r_fd);
ERR_VFS_INSTALL:
    pipe_file_free(pipe_file);
ERR_EXIT:
    UK_ASSERT(ret < 0);
    return ret;
}

/* TODO find a more efficient way to implement pipe2() */
UK_SYSCALL_R_DEFINE(int, pipe2, int*, pipefd, int, flags)
{
    int rc;

    rc = pipe(pipefd);
    if (rc)
        return rc;

    if (flags & O_CLOEXEC) {
        fcntl(pipefd[0], F_SETFD, FD_CLOEXEC);
        fcntl(pipefd[1], F_SETFD, FD_CLOEXEC);
    }
    if (flags & O_NONBLOCK) {
        fcntl(pipefd[0], F_SETFL, O_NONBLOCK);
        fcntl(pipefd[1], F_SETFL, O_NONBLOCK);
    }
    return 0;
}
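The pipe2() above layers the flags on top of pipe() with fcntl(). The same idea can be tried on a POSIX host with the sketch below, which additionally reads the current flags before setting them and checks each fcntl() result. The demo_* names are hypothetical; unlike a real pipe2(), this emulation is not atomic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK while keeping the other file status flags. */
static int demo_set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Adds FD_CLOEXEC while keeping the other descriptor flags. */
static int demo_set_cloexec(int fd)
{
    int fl = fcntl(fd, F_GETFD);

    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFD, fl | FD_CLOEXEC);
}

/* pipe() followed by fcntl() on both ends; closes both ends on failure. */
int demo_pipe2(int pipefd[2], int flags)
{
    int i;

    if (pipe(pipefd) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        if ((flags & O_CLOEXEC) && demo_set_cloexec(pipefd[i]) < 0)
            goto fail;
        if ((flags & O_NONBLOCK) && demo_set_nonblock(pipefd[i]) < 0)
            goto fail;
    }
    return 0;

fail:
    close(pipefd[0]);
    close(pipefd[1]);
    return -1;
}
```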

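3. Summary

  • The syscall_shim layer defines the system call numbers and names. Its .awk scripts and syscall.h generate the corresponding system call names and define several families of macros. The libraries, vfscore for example, then provide the bodies behind those macros as well as the concrete system call functions (some system calls may be defined directly, without registration), and main.c wraps some of them in interfaces that applications call directly.

  • Following the check in support/scripts described above, when a system call is registered through macros, the definitions made with UK_SYSCALL_R_DEFINE and UK_LLSYSCALL_R_DEFINE should be preferred.

  • As noted earlier, some system calls are defined and implemented directly in the libraries without registration, which is why some code calls sys_xxx() directly rather than uk_syscall_e_xxx() and the like.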

4. Comparison with Linux system calls

Linux:

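  • The application calls syscall() or a libc wrapper, which places the system call number in a register and traps into the kernel, for example with int 0x80 on 32-bit x86. In kernel mode the number is used as an index into the system call table to find the entry address of the corresponding handler, which is then executed.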

Unikraft:

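  • The system calls involved are registered directly with syscall_shim. After the large set of macros is expanded, each system call is an ordinary function, and a natively compiled call with a constant system call number does not go through a system call table.
  • The other libraries implement the corresponding system call functions directly, and programs call those functions directly, without a table lookup and without trapping into the kernel through an interrupt.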
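The two entry styles can be put side by side on a Linux host. The sketch uses only real Linux/glibc interfaces (syscall(), SYS_getpid, getpid()); the demo_* function names are hypothetical. On Linux both variants still end up in the kernel; the point is merely to show the number-based and the name-based form that the comparison above refers to.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* syscall(), getpid() */

/* Number-based entry: the system call is selected by its number. */
long demo_getpid_by_number(void)
{
    return syscall(SYS_getpid);
}

/* Name-based entry: an ordinary function call from the caller's view.
 * In a native Unikraft build, per the notes above, a call of this kind
 * links straight to the implementing function. */
long demo_getpid_by_name(void)
{
    return (long)getpid();
}
```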