lwIP Source Code Analysis


Introduction to lwIP

LwIP stands for "Lightweight IP", i.e. a lightweight TCP/IP protocol stack. It is a small open-source TCP/IP stack developed by Adam Dunkels at the Swedish Institute of Computer Science (SICS). lwIP was designed to implement a relatively complete TCP/IP stack with minimal resource consumption, where "complete" mainly refers to the completeness of the TCP protocol: the focus of the implementation is to reduce RAM usage while preserving the main features of TCP. In addition, lwIP can either be ported to run on top of an operating system or run standalone without one.

The three programming interfaces of lwIP

lwIP provides three programming interfaces: the RAW/Callback API, the NETCONN API, and the SOCKET API. In that order, ease of use increases while execution efficiency decreases. The three APIs are introduced below:

  1. RAW/Callback API
    The RAW/Callback API is a distinctive feature of lwIP. In a bare-metal environment without an operating system it is the only API available for development, but it can also be used in an OS environment.
  2. NETCONN API
    In an OS environment, either the NETCONN API or the Socket API can be used to develop network applications. The NETCONN API is built on the operating system's IPC mechanisms (semaphores and mailboxes), and its design separates the lwIP kernel code and the network application into independent threads. As a result, the lwIP kernel thread is only responsible for the TCP/IP encapsulation and decapsulation of packets, not for application-layer processing of the data, which greatly improves the system's packet-processing efficiency.
  3. SOCKET API
    A socket provides a high-level abstraction over network connections, allowing users to operate on a connection much like a file. It is very easy to use; for many network developers socket programming is their first exposure, and sockets have become the de facto standard for network programming. Different systems run different TCP/IP implementations, but as long as a system implements the socket interface, network applications written against sockets can run on it, so socket-based applications are highly portable.

lwIP under Unikraft

From the description above, we can see that Unikraft uses two of these APIs to hook into lwIP: the uknetdev part corresponds to the NETCONN API, and the posix_socket part corresponds to the Socket API.

In lwIP, user code and the stack internals exchange data through mailboxes. A mailbox message is essentially a pointer to the data: the API hands the pointer to the kernel, and the kernel accesses and processes the data through that pointer; conversely, when the kernel passes data back to user code, it likewise passes a pointer through a mailbox.

In an OS environment, lwIP runs as a thread named tcpip_thread. The kernel creates this thread automatically when lwIP is initialized. While running, the thread blocks on a mailbox waiting for data to process; that data may come from packets received by the underlying NIC or from the upper-layer application. Whenever tcpip_thread obtains data from the mailbox, it leaves the blocked state and processes the data; once done, it blocks again waiting for more, repeating this cycle.

The semaphore and mutex implementations provide the kernel with synchronization and mutual exclusion. For example, when the user wants to send data, it calls an upper-layer API function, which first hands the data to the kernel for processing and then tries to take a semaphore; since the semaphore is not yet available, the user thread blocks. Once the kernel knows the user wants to send data, it drives the corresponding NIC to transmit it, and when the transmission completes it releases the semaphore to notify the user thread that the send has finished, allowing it to continue executing.

For lwIP to use these facilities, we need to bind the system's implementations of the relevant functions through sys_arch.c/.h. In liblwip you can find files such as mutex.c, semaphore.c, mailbox.c, thread.c, and sockets.c; these are the implementations of those interfaces in the Unikraft environment.

liblwip initialization

The initialization flow is shown in the figure below.

Initialization flow diagram


The whole initialization flow starts from liblwip_init. This function first calls tcpip_init to initialize the protocol stack, then iterates over the netdev list and calls uknetdev_addif on each device. That function allocates memory for a netif and then calls netif_add, storing the original uk_netdev structure in the *state field and passing uknetdev_init as the designated init function. uknetdev_init in turn calls the various functions registered in dev->ops to configure and start the device. Let us walk through this initialization flow function by function.

The first function called is liblwip_init.

At first it was unclear how this gets invoked: uk_lib_initcall(liblwip_init); — presumably via a macro.
It is indeed a macro: it registers the function so that it runs during the bootstrap stage.

Register a Unikraft init function that is called during bootstrap (uk_inittab)

The function first calls lwip_init (or, in the threaded configuration, tcpip_init) for initialization, then iterates over all netdevs, checking each one's configuration state, and finally adds the corresponding netif on the lwIP side via uknetdev_addif.

static int liblwip_init(void)
{
#if CONFIG_LWIP_UKNETDEV && CONFIG_LWIP_AUTOIFACE
unsigned int devid;
struct uk_netdev *dev;
struct netif *nf;
const char __maybe_unused *strcfg;
uint16_t __maybe_unused int16cfg;
int is_first_nf;
int ret;
#if LWIP_IPV4
ip4_addr_t ip4;
ip4_addr_t *ip4_arg;
ip4_addr_t mask4;
ip4_addr_t *mask4_arg;
ip4_addr_t gw4;
ip4_addr_t *gw4_arg;
#endif /* LWIP_IPV4 */
#endif /* CONFIG_LWIP_UKNETDEV && CONFIG_LWIP_AUTOIFACE */

uk_pr_info("Initializing lwip\n");
#if !CONFIG_LWIP_NOTHREADS
uk_semaphore_init(&_lwip_init_sem, 0);
#endif /* !CONFIG_LWIP_NOTHREADS */

#if CONFIG_LWIP_NOTHREADS
lwip_init();
#else /* CONFIG_LWIP_NOTHREADS */
tcpip_init(_lwip_init_done, NULL);

/* Wait until stack is booted */
uk_semaphore_down(&_lwip_init_sem);
#endif /* CONFIG_LWIP_NOTHREADS */

#if LWIP_NETIF_EXT_STATUS_CALLBACK && CONFIG_LWIP_NETIF_STATUS_PRINT
/* Add print callback for netif state changes */
netif_add_ext_callback(&netif_status_print, _netif_status_print);
#endif /* LWIP_NETIF_EXT_STATUS_CALLBACK && CONFIG_LWIP_NETIF_STATUS_PRINT */

#if CONFIG_LWIP_UKNETDEV && CONFIG_LWIP_AUTOIFACE
is_first_nf = 1;
// iterate over the netdev list and handle each device
for (devid = 0; devid < uk_netdev_count(); ++devid) {
dev = uk_netdev_get(devid);
if (!dev)
continue;
if (uk_netdev_state_get(dev) != UK_NETDEV_UNCONFIGURED
&& uk_netdev_state_get(dev) != UK_NETDEV_UNPROBED) {
uk_pr_info("Skipping to add network device %u to lwIP: Not in unconfigured state\n",
devid);
continue;
}
if (uk_netdev_state_get(dev) == UK_NETDEV_UNPROBED) {
ret = uk_netdev_probe(dev);
if (ret < 0) {
uk_pr_err("Failed to probe features of network device %u: %d; skipping device...\n",
devid, ret);
continue;
}
}

/* Here, the device has to be in unconfigured state */
UK_ASSERT(uk_netdev_state_get(dev) == UK_NETDEV_UNCONFIGURED);

uk_pr_info("Attach network device %u to lwIP...\n",
devid);

#if LWIP_IPV4
ip4_arg = NULL;
mask4_arg = NULL;
gw4_arg = NULL;

/* IP */
strcfg = uk_netdev_einfo_get(dev, UK_NETDEV_IPV4_ADDR_STR);
if (strcfg) {
if (ip4addr_aton(strcfg, &ip4) != 1) {
uk_pr_err("Error converting IP address: %s\n",
strcfg);
goto no_conf;
}
} else
goto no_conf;
ip4_arg = &ip4;

/* mask */
strcfg = uk_netdev_einfo_get(dev, UK_NETDEV_IPV4_MASK_STR);
if (strcfg) {
if (ip4addr_aton(strcfg, &mask4) != 1) {
uk_pr_err("Error converting net mask: %s\n",
strcfg);
goto no_conf;
}
} else
/* default mask */
ip4_addr_set_u32(&mask4, lwip_htonl(IP_CLASSC_NET));
mask4_arg = &mask4;

/* gateway */
strcfg = uk_netdev_einfo_get(dev, UK_NETDEV_IPV4_GW_STR);
if (strcfg) {
if (ip4addr_aton(strcfg, &gw4) != 1) {
uk_pr_err("Error converting gateway: %s\n",
strcfg);
goto no_conf;
}
gw4_arg = &gw4;
}
no_conf:
nf = uknetdev_addif(dev, ip4_arg, mask4_arg, gw4_arg);
#else /* LWIP_IPV4 */
/*
* TODO: Add support for IPv6 device configuration from
* netdev's econf interface
*/

nf = uknetdev_addif(dev);
#endif /* LWIP_IPV4 */
if (!nf) {
uk_pr_err("Failed to attach network device %u to lwIP\n",
devid);
continue;
}

/* Print hardware address */
if (nf->hwaddr_len == 6) {
uk_pr_info("%c%c%u: Hardware address: %02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8"\n",
nf->name[0], nf->name[1], nf->num,
nf->hwaddr[0], nf->hwaddr[1], nf->hwaddr[2],
nf->hwaddr[3], nf->hwaddr[4], nf->hwaddr[5]);
}

#if LWIP_CHECKSUM_CTRL_PER_NETIF
uk_pr_info("%c%c%u: Check checksums:",
nf->name[0], nf->name[1], nf->num);
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_CHECK_IP) {
uk_pr_info(" IP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_CHECK_UDP) {
uk_pr_info(" UDP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_CHECK_TCP) {
uk_pr_info(" TCP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_CHECK_ICMP) {
uk_pr_info(" ICMP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_CHECK_ICMP6) {
uk_pr_info(" ICMP6");
}
uk_pr_info("\n");

uk_pr_info("%c%c%u: Generate checksums:",
nf->name[0], nf->name[1], nf->num);
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_GEN_IP) {
uk_pr_info(" IP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_GEN_UDP) {
uk_pr_info(" UDP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_GEN_TCP) {
uk_pr_info(" TCP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_GEN_ICMP) {
uk_pr_info(" ICMP");
}
IF__NETIF_CHECKSUM_ENABLED(nf, NETIF_CHECKSUM_GEN_ICMP6) {
uk_pr_info(" ICMP6");
}
uk_pr_info("\n");
#endif /* LWIP_CHECKSUM_CTRL_PER_NETIF */

/* Declare the first network device as default interface */
if (is_first_nf) {
uk_pr_info("%c%c%u: Set as default interface\n",
nf->name[0], nf->name[1], nf->num);
netif_set_default(nf);
is_first_nf = 0;
}
netif_set_up(nf);

#if LWIP_IPV4 && LWIP_DHCP
if (!ip4_arg) {
uk_pr_info("%c%c%u: DHCP configuration (background)...\n",
nf->name[0], nf->name[1], nf->num);
dhcp_start(nf);
}
#endif /* LWIP_IPV4 && LWIP_DHCP */
}
#endif /* CONFIG_LWIP_UKNETDEV && CONFIG_LWIP_AUTOIFACE */
return 0;
}

uknetdev_addif is in fact a thin wrapper around lwIP's netif_add function; it allocates the memory for the netif.

struct netif *uknetdev_addif(struct uk_netdev *n
#if LWIP_IPV4
,
const ip4_addr_t *ipaddr,
const ip4_addr_t *netmask,
const ip4_addr_t *gw
#endif /* LWIP_IPV4 */
)
{
static const void *pethernet_input = NETIF_INPUT;
struct netif *nf;
struct netif *ret;

nf = mem_calloc(1, sizeof(*nf));
if (!nf)
return NULL;

ret = netif_add(nf,
#if LWIP_IPV4
ipaddr, netmask, gw,
#endif /* LWIP_IPV4 */
n, uknetdev_init, UK_READ_ONCE(pethernet_input));
UK_ASSERT(nf->input);

if (!ret) {
mem_free(nf);
return NULL;
}

return ret;
}

netif_add initializes the freshly allocated netif and inserts it at the head of the netif_list linked list, points the state field at the corresponding uk_netdev, and calls the user-supplied init function to set up the netif.

struct netif *
netif_add(struct netif *netif,
#if LWIP_IPV4
const ip4_addr_t *ipaddr, const ip4_addr_t *netmask, const ip4_addr_t *gw,
#endif /* LWIP_IPV4 */
void *state, netif_init_fn init, netif_input_fn input)
{
#if LWIP_IPV6
s8_t i;
#endif

LWIP_ASSERT_CORE_LOCKED();

#if LWIP_SINGLE_NETIF
if (netif_default != NULL) {
LWIP_ASSERT("single netif already set", 0);
return NULL;
}
#endif

LWIP_ERROR("netif_add: invalid netif", netif != NULL, return NULL);
LWIP_ERROR("netif_add: No init function given", init != NULL, return NULL);

#if LWIP_IPV4
if (ipaddr == NULL) {
ipaddr = ip_2_ip4(IP4_ADDR_ANY);
}
if (netmask == NULL) {
netmask = ip_2_ip4(IP4_ADDR_ANY);
}
if (gw == NULL) {
gw = ip_2_ip4(IP4_ADDR_ANY);
}

/* reset new interface configuration state */
ip_addr_set_zero_ip4(&netif->ip_addr);
ip_addr_set_zero_ip4(&netif->netmask);
ip_addr_set_zero_ip4(&netif->gw);
netif->output = netif_null_output_ip4;
#endif /* LWIP_IPV4 */
#if LWIP_IPV6
for (i = 0; i < LWIP_IPV6_NUM_ADDRESSES; i++) {
ip_addr_set_zero_ip6(&netif->ip6_addr[i]);
netif->ip6_addr_state[i] = IP6_ADDR_INVALID;
#if LWIP_IPV6_ADDRESS_LIFETIMES
netif->ip6_addr_valid_life[i] = IP6_ADDR_LIFE_STATIC;
netif->ip6_addr_pref_life[i] = IP6_ADDR_LIFE_STATIC;
#endif /* LWIP_IPV6_ADDRESS_LIFETIMES */
}
netif->output_ip6 = netif_null_output_ip6;
#endif /* LWIP_IPV6 */
NETIF_SET_CHECKSUM_CTRL(netif, NETIF_CHECKSUM_ENABLE_ALL);
netif->mtu = 0;
netif->flags = 0;
#ifdef netif_get_client_data
memset(netif->client_data, 0, sizeof(netif->client_data));
#endif /* LWIP_NUM_NETIF_CLIENT_DATA */
#if LWIP_IPV6
#if LWIP_IPV6_AUTOCONFIG
/* IPv6 address autoconfiguration not enabled by default */
netif->ip6_autoconfig_enabled = 0;
#endif /* LWIP_IPV6_AUTOCONFIG */
nd6_restart_netif(netif);
#endif /* LWIP_IPV6 */
#if LWIP_NETIF_STATUS_CALLBACK
netif->status_callback = NULL;
#endif /* LWIP_NETIF_STATUS_CALLBACK */
#if LWIP_NETIF_LINK_CALLBACK
netif->link_callback = NULL;
#endif /* LWIP_NETIF_LINK_CALLBACK */
#if LWIP_IGMP
netif->igmp_mac_filter = NULL;
#endif /* LWIP_IGMP */
#if LWIP_IPV6 && LWIP_IPV6_MLD
netif->mld_mac_filter = NULL;
#endif /* LWIP_IPV6 && LWIP_IPV6_MLD */

/* remember netif specific state information data */
netif->state = state;
netif->num = netif_num;
netif->input = input;

NETIF_RESET_HINTS(netif);
#if ENABLE_LOOPBACK
netif->loop_first = NULL;
netif->loop_last = NULL;
#if LWIP_LOOPBACK_MAX_PBUFS
netif->loop_cnt_current = 0;
#endif /* LWIP_LOOPBACK_MAX_PBUFS */
#if LWIP_NETIF_LOOPBACK_MULTITHREADING
netif->reschedule_poll = 0;
#endif /* LWIP_NETIF_LOOPBACK_MULTITHREADING */
#endif /* ENABLE_LOOPBACK */

#if LWIP_IPV4
netif_set_addr(netif, ipaddr, netmask, gw);
#endif /* LWIP_IPV4 */

/* call user specified initialization function for netif */
if (init(netif) != ERR_OK) {
return NULL;
}
#if LWIP_IPV6 && LWIP_ND6_ALLOW_RA_UPDATES
/* Initialize the MTU for IPv6 to the one set by the netif driver.
This can be updated later by RA. */
netif->mtu6 = netif->mtu;
#endif /* LWIP_IPV6 && LWIP_ND6_ALLOW_RA_UPDATES */

#if !LWIP_SINGLE_NETIF
/* Assign a unique netif number in the range [0..254], so that (num+1) can
serve as an interface index that fits in a u8_t.
We assume that the new netif has not yet been added to the list here.
This algorithm is O(n^2), but that should be OK for lwIP.
*/
{
struct netif *netif2;
int num_netifs;
do {
if (netif->num == 255) {
netif->num = 0;
}
num_netifs = 0;
for (netif2 = netif_list; netif2 != NULL; netif2 = netif2->next) {
LWIP_ASSERT("netif already added", netif2 != netif);
num_netifs++;
LWIP_ASSERT("too many netifs, max. supported number is 255", num_netifs <= 255);
if (netif2->num == netif->num) {
netif->num++;
break;
}
}
} while (netif2 != NULL);
}
if (netif->num == 254) {
netif_num = 0;
} else {
netif_num = (u8_t)(netif->num + 1);
}

/* add this netif to the list */
// the list-insertion part
netif->next = netif_list;
netif_list = netif;
#endif /* "LWIP_SINGLE_NETIF */
mib2_netif_added(netif);

#if LWIP_IGMP
/* start IGMP processing */
if (netif->flags & NETIF_FLAG_IGMP) {
igmp_start(netif);
}
#endif /* LWIP_IGMP */

LWIP_DEBUGF(NETIF_DEBUG, ("netif: added interface %c%c IP",
netif->name[0], netif->name[1]));
#if LWIP_IPV4
LWIP_DEBUGF(NETIF_DEBUG, (" addr "));
ip4_addr_debug_print(NETIF_DEBUG, ipaddr);
LWIP_DEBUGF(NETIF_DEBUG, (" netmask "));
ip4_addr_debug_print(NETIF_DEBUG, netmask);
LWIP_DEBUGF(NETIF_DEBUG, (" gw "));
ip4_addr_debug_print(NETIF_DEBUG, gw);
#endif /* LWIP_IPV4 */
LWIP_DEBUGF(NETIF_DEBUG, ("\n"));

netif_invoke_ext_callback(netif, LWIP_NSC_NETIF_ADDED, NULL);

return netif;
}

uknetdev_init registers the uknetdev_input/output functions on the netif and completes the mapping between the netif and uk_netdev structures.

err_t uknetdev_init(struct netif *nf)
{
struct uk_alloc *a = NULL;
struct uk_netdev *dev;
struct uk_netdev_conf dev_conf;
struct uk_netdev_rxqueue_conf rxq_conf;
struct uk_netdev_txqueue_conf txq_conf = {0};
struct lwip_netdev_data *lwip_data;
const struct uk_hwaddr *hwaddr;
unsigned int i;
int ret;
// the state field of the netif stores the corresponding uk_netdev
UK_ASSERT(nf);
dev = netif_to_uknetdev(nf);
UK_ASSERT(dev);

lwip_data = (struct lwip_netdev_data *)dev->scratch_pad;

LWIP_ASSERT("uknetdev needs an input callback (netif_input or tcpip_input)",
nf->input != NULL);

/* Netdev has to be in unconfigured state */
if (uk_netdev_state_get(dev) != UK_NETDEV_UNCONFIGURED) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: Netdev %u not in uncofigured state\n",
__func__, uk_netdev_id_get(dev)));
return ERR_ISCONN;
}

/* Interface name, the interface number (nf->num) is assigned by lwip */
nf->name[0] = UKNETDEV_NETIF_NAME0;
nf->name[1] = UKNETDEV_NETIF_NAME1;

/*
* Bring up uknetdev
* Note: We use the default allocator for setting up the rx/tx queues
*/
/* TODO: In case the device initialization should happen manually before
* attaching to lwip, we require another init function that skips
* this initialization steps.
*/
a = uk_alloc_get_default();
if (!a)
return ERR_MEM;

/* Get device information */
uk_netdev_info_get(dev, &lwip_data->dev_info);
if (!lwip_data->dev_info.max_rx_queues
|| !lwip_data->dev_info.max_tx_queues)
return ERR_IF;
#if CONFIG_LWIP_UKNETDEV_POLLONLY
/* Unset receive interrupt support: We force polling mode */
lwip_data->dev_info.features &= ~UK_FEATURE_RXQ_INTR_AVAILABLE;
#endif /* CONFIG_LWIP_UKNETDEV_POLLONLY */
lwip_data->pkt_a = a;

LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Headroom rx:%"PRIu16", tx:%"PRIu16"; I/O align: 0x%"PRIx16"\n",
__func__, nf->name[0], nf->name[1], nf->num,
lwip_data->dev_info.nb_encap_rx,
lwip_data->dev_info.nb_encap_tx,
lwip_data->dev_info.ioalign));

/*
* Device configuration,
* we want to use just one queue for each direction
*/
dev_conf.nb_rx_queues = 1;
dev_conf.nb_tx_queues = 1;
ret = uk_netdev_configure(dev, &dev_conf);
if (ret < 0) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to configure netdev %u\n",
__func__, nf->name[0], nf->name[1], nf->num,
uk_netdev_id_get(dev)));
return ERR_IF;
}

/*
* Receive queue,
* use driver default descriptors
*/
rxq_conf.a = a;
rxq_conf.alloc_rxpkts = netif_alloc_rxpkts;
rxq_conf.alloc_rxpkts_argp = lwip_data;
#ifdef CONFIG_LWIP_NOTHREADS
/*
* In mainloop mode, we will not use interrupts.
*/
rxq_conf.callback = NULL;
rxq_conf.callback_cookie = NULL;
#else /* CONFIG_LWIP_NOTHREADS */
rxq_conf.callback = uknetdev_input;
rxq_conf.callback_cookie = nf;
#ifdef CONFIG_LIBUKNETDEV_DISPATCHERTHREADS
rxq_conf.s = uk_sched_get_default();
if (!rxq_conf.s)
return ERR_IF;

#endif /* CONFIG_LIBUKNETDEV_DISPATCHERTHREADS */
#endif /* CONFIG_LWIP_NOTHREADS */
ret = uk_netdev_rxq_configure(dev, 0, 0, &rxq_conf);
if (ret < 0) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to configure rx queue of netdev %u\n",
__func__, nf->name[0], nf->name[1], nf->num,
uk_netdev_id_get(dev)));
return ERR_IF;
}

/*
* Transmit queue,
* use driver default descriptors
*/
txq_conf.a = a;
ret = uk_netdev_txq_configure(dev, 0, 0, &txq_conf);
if (ret < 0) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to configure tx queue of netdev %u\n",
__func__, nf->name[0], nf->name[1], nf->num,
uk_netdev_id_get(dev)));
return ERR_IF;
}

/* Start interface */
ret = uk_netdev_start(dev);
if (ret < 0) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to start netdev %u\n",
__func__, nf->name[0], nf->name[1], nf->num,
uk_netdev_id_get(dev)));
return ERR_IF;
}

/* Driver callbacks */
#if LWIP_IPV4
nf->output = etharp_output;
#endif /* LWIP_IPV4 */
#if LWIP_IPV6
nf->output_ip6 = ethip6_output;
#endif /* LWIP_IPV6 */
nf->linkoutput = uknetdev_output;

/* TODO: Set remove callback */

/* Device capabilities */
netif_set_flags(nf, (NETIF_FLAG_BROADCAST
| NETIF_FLAG_ETHARP
| NETIF_FLAG_LINK_UP));
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: flags: %"PRIx8"\n",
__func__, nf->name[0], nf->name[1], nf->num, nf->flags));

#if LWIP_CHECKSUM_CTRL_PER_NETIF
NETIF_SET_CHECKSUM_CTRL(nf, (NETIF_CHECKSUM_GEN_IP
| NETIF_CHECKSUM_GEN_UDP
| NETIF_CHECKSUM_GEN_TCP
| NETIF_CHECKSUM_GEN_ICMP
| NETIF_CHECKSUM_GEN_ICMP6));
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: chksum_flags: %"PRIx16"\n",
__func__, nf->name[0], nf->name[1], nf->num,
nf->chksum_flags));
#endif /* LWIP_CHECKSUM_CTRL_PER_NETIF */

/* MAC address */
UK_ASSERT(NETIF_MAX_HWADDR_LEN >= UK_NETDEV_HWADDR_LEN);
hwaddr = uk_netdev_hwaddr_get(dev);
UK_ASSERT(hwaddr);
nf->hwaddr_len = UK_NETDEV_HWADDR_LEN;
for (i = 0; i < UK_NETDEV_HWADDR_LEN; ++i)
nf->hwaddr[i] = hwaddr->addr_bytes[i];
#if UK_NETDEV_HWADDR_LEN == 6
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Hardware address: %02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8":%02"PRIx8"\n",
__func__, nf->name[0], nf->name[1], nf->num,
nf->hwaddr[0], nf->hwaddr[1], nf->hwaddr[2],
nf->hwaddr[3], nf->hwaddr[4], nf->hwaddr[5]));
#else /* UK_NETDEV_HWADDR_LEN */
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Hardware address set\n",
__func__, nf->name[0], nf->name[1], nf->num));
#endif /* UK_NETDEV_HWADDR_LEN */

/* Maximum transfer unit */
nf->mtu = uk_netdev_mtu_get(dev);
UK_ASSERT(nf->mtu);
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: MTU: %u\n",
__func__, nf->name[0], nf->name[1], nf->num,
nf->mtu));

#ifndef CONFIG_LWIP_NOTHREADS
/*
* We will use the status update callback to enable and disabled
* receive queue interrupts
*/
netif_set_status_callback(nf, uknetdev_updown);
#endif /* !CONFIG_LWIP_NOTHREADS */

/*
* Initialize the snmp variables and counters inside the struct netif.
* The last argument is the link speed, in units of bits per second.
*/
NETIF_INIT_SNMP(nf, snmp_ifType_ethernet_csmacd, UKNETDEV_BPS);
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Link speed: %"PRIu32" bps\n",
__func__, nf->name[0], nf->name[1], nf->num,
UKNETDEV_BPS));

return ERR_OK;
}

Structure mapping between lwIP and Unikraft

Structure mapping diagram

The corresponding structures shown in the figure above are the key pieces connecting Unikraft and lwIP. The two lwIP-side structures are introduced below.

First, the netif structure. netif describes a network interface/NIC, and these structures are likewise kept in a linked list.

// lwIP's network interface
struct netif {
#if !LWIP_SINGLE_NETIF
struct netif *next;
#endif
#if LWIP_IPV4
// IPv4 address, netmask, and gateway
ip_addr_t ip_addr;
ip_addr_t netmask;
ip_addr_t gw;
#endif /* LWIP_IPV4 */
// upstream data path: NIC -> protocol stack
netif_input_fn input;
#if LWIP_IPV4
// downstream data path: protocol stack -> NIC
netif_output_fn output;
#endif /* LWIP_IPV4 */
netif_linkoutput_fn linkoutput;
#if LWIP_NETIF_STATUS_CALLBACK
// called when the interface status changes
netif_status_callback_fn status_callback;
#endif /* LWIP_NETIF_STATUS_CALLBACK */
#if LWIP_NETIF_LINK_CALLBACK
// called when the link status changes
netif_status_callback_fn link_callback;
#endif /* LWIP_NETIF_LINK_CALLBACK */
#if LWIP_NETIF_REMOVE_CALLBACK
/** This function is called when the netif has been removed */
// called when the interface is removed
netif_status_callback_fn remove_callback;
#endif /* LWIP_NETIF_REMOVE_CALLBACK */
// set by the NIC driver to store device state; under Unikraft this holds the uk_netdev
void *state;
#ifdef netif_get_client_data
void* client_data[LWIP_NETIF_CLIENT_DATA_INDEX_MAX + LWIP_NUM_NETIF_CLIENT_DATA];
#endif
#if LWIP_NETIF_HOSTNAME
// hostname for this interface
const char* hostname;
#endif /* LWIP_NETIF_HOSTNAME */
// **compared with stock lwIP, the dhcp and autoip fields are missing here**
#if LWIP_CHECKSUM_CTRL_PER_NETIF
u16_t chksum_flags;
#endif /* LWIP_CHECKSUM_CTRL_PER_NETIF*/
/** maximum transfer unit (in bytes) */
// maximum transfer unit
u16_t mtu;
/** link level hardware address of this interface */
// MAC address
u8_t hwaddr[NETIF_MAX_HWADDR_LEN];
/** number of bytes used in hwaddr */
u8_t hwaddr_len;
/** flags (@see @ref netif_flags) */
// flag bits
u8_t flags;
/** descriptive abbreviation */
// abbreviated interface name
char name[2];
// interface number
u8_t num;
// the SNMP-related part is missing here
#if MIB2_STATS
/** link type (from "snmp_ifType" enum from snmp_mib2.h) */
u8_t link_type;
/** (estimate) link speed */
u32_t link_speed;
/** timestamp at last change made (up/down) */
u32_t ts;
/** counters */
struct stats_mib2_netif_ctrs mib2_counters;
#endif /* MIB2_STATS */
#if LWIP_IPV4 && LWIP_IGMP
/** This function could be called to add or delete an entry in the multicast
filter table of the ethernet MAC.*/
netif_igmp_mac_filter_fn igmp_mac_filter;
#endif /* LWIP_IPV4 && LWIP_IGMP */
#if LWIP_NETIF_USE_HINTS
struct netif_hint *hints;
#endif /* LWIP_NETIF_USE_HINTS */
#if ENABLE_LOOPBACK
/* List of packets to be queued for ourselves. */
// fields needed to support local loopback
struct pbuf *loop_first;
struct pbuf *loop_last;
#if LWIP_LOOPBACK_MAX_PBUFS
u16_t loop_cnt_current;
#endif /* LWIP_LOOPBACK_MAX_PBUFS */
#if LWIP_NETIF_LOOPBACK_MULTITHREADING
/* Used if the original scheduling failed. */
u8_t reschedule_poll;
#endif /* LWIP_NETIF_LOOPBACK_MULTITHREADING */
#endif /* ENABLE_LOOPBACK */
};

Below are the flags in netif, used to mark the type and state of the netif:

/** Whether the network interface is 'up': a software flag marking the
 * interface as enabled and processing traffic. */
#define NETIF_FLAG_UP           0x01U
/** If set, the netif has broadcast capability.
* Set by the netif driver in its init function. */
#define NETIF_FLAG_BROADCAST 0x02U
/** If set, the interface has an active link
* (set by the network interface driver).
* Either set by the netif driver in its init function (if the link
* is up at that time) or at a later point once the link comes up
* (if link detection is supported by the hardware). */
#define NETIF_FLAG_LINK_UP 0x04U
/** If set, the netif is an ethernet device using ARP.
* Set by the netif driver in its init function.
* Used to check input packet types and use of DHCP. */
#define NETIF_FLAG_ETHARP 0x08U
/** If set, the netif is an ethernet device. It might not use
* ARP or TCP/IP if it is used for PPPoE only.
*/
#define NETIF_FLAG_ETHERNET 0x10U
/** If set, the netif has IGMP capability.
* Set by the netif driver in its init function. */
#define NETIF_FLAG_IGMP 0x20U
/** If set, the netif has MLD6 capability.
* Set by the netif driver in its init function. */
#define NETIF_FLAG_MLD6 0x40U

In the netbuf.h file we can see the pbuf wrapper used with the lwIP library; its purpose is to embed stack-specific data on top of the original netbuf.

struct _netbuf_pbuf {
struct pbuf_custom pbuf_custom;
struct uk_netbuf *netbuf;
};

The pbuf_custom structure in pbuf.h is as follows:

struct pbuf_custom {
struct pbuf pbuf;
pbuf_free_custom_fn custom_free_function;
};

The pbuf structure is as follows:

struct pbuf {
// a pbuf chain is a singly linked list
struct pbuf *next;
// points to the actual payload data
void *payload;
// total length of this pbuf and all following pbufs in the chain
u16_t tot_len;
u16_t len;
// internal pbuf type
u8_t type_internal;
// miscellaneous flags
u8_t flags;
// reference count: references from the application, the stack itself, or pbuf->next pointers in a chain
LWIP_PBUF_REF_T ref;
// index of the netif
u8_t if_idx;
LWIP_PBUF_CUSTOM_DATA
};

type_internal can take the following values:

// indicates that the pbuf struct and its payload are allocated in one contiguous piece of memory
#define PBUF_TYPE_FLAG_STRUCT_DATA_CONTIGUOUS 0x80
// indicates that the data in this pbuf may change
#define PBUF_TYPE_FLAG_DATA_VOLATILE 0x40
// mask for the allocation-source bits
#define PBUF_TYPE_ALLOC_SRC_MASK 0x0F
// indicates that this pbuf is used for RX (the placement of these two looks a bit odd)
#define PBUF_ALLOC_FLAG_RX 0x0100
#define PBUF_ALLOC_FLAG_DATA_CONTIGUOUS 0x0200
// concrete allocation sources
#define PBUF_TYPE_ALLOC_SRC_MASK_STD_HEAP 0x00
#define PBUF_TYPE_ALLOC_SRC_MASK_STD_MEMP_PBUF 0x01
#define PBUF_TYPE_ALLOC_SRC_MASK_STD_MEMP_PBUF_POOL 0x02
// allocation-source codes 3 through 15 are free for the application's own use
#define PBUF_TYPE_ALLOC_SRC_MASK_APP_MIN 0x03
#define PBUF_TYPE_ALLOC_SRC_MASK_APP_MAX PBUF_TYPE_ALLOC_SRC_MASK

What is the difference between copy and duplicate here?

An enum is also defined; combinations of the flag bits above represent the different pbuf types:

typedef enum {
/** pbuf data is stored in RAM, used for TX mostly, struct pbuf and its payload
are allocated in one piece of contiguous memory (so the first payload byte
can be calculated from struct pbuf).
pbuf_alloc() allocates PBUF_RAM pbufs as unchained pbufs (although that might
change in future versions).
This should be used for all OUTGOING packets (TX).*/
PBUF_RAM = (PBUF_ALLOC_FLAG_DATA_CONTIGUOUS | PBUF_TYPE_FLAG_STRUCT_DATA_CONTIGUOUS | PBUF_TYPE_ALLOC_SRC_MASK_STD_HEAP),
/** pbuf data is stored in ROM, i.e. struct pbuf and its payload are located in
totally different memory areas. Since it points to ROM, payload does not
have to be copied when queued for transmission. */
PBUF_ROM = PBUF_TYPE_ALLOC_SRC_MASK_STD_MEMP_PBUF,
/** pbuf comes from the pbuf pool. Much like PBUF_ROM but payload might change
so it has to be duplicated when queued before transmitting, depending on
who has a 'ref' to it. */
PBUF_REF = (PBUF_TYPE_FLAG_DATA_VOLATILE | PBUF_TYPE_ALLOC_SRC_MASK_STD_MEMP_PBUF),
/** pbuf payload refers to RAM. This one comes from a pool and should be used
for RX. Payload can be chained (scatter-gather RX) but like PBUF_RAM, struct
pbuf and its payload are allocated in one piece of contiguous memory (so
the first payload byte can be calculated from struct pbuf).
Don't use this for TX, if the pool becomes empty e.g. because of TCP queuing,
you are unable to receive TCP acks! */
PBUF_POOL = (PBUF_ALLOC_FLAG_RX | PBUF_TYPE_FLAG_STRUCT_DATA_CONTIGUOUS | PBUF_TYPE_ALLOC_SRC_MASK_STD_MEMP_PBUF_POOL)
} pbuf_type;

The individual bits of the flags field in pbuf are defined as follows:

// indicates that this packet should be passed to the application immediately
#define PBUF_FLAG_PUSH 0x01U
// indicates a custom pbuf: pbuf_custom->custom_free_function() must be called when freeing it
#define PBUF_FLAG_IS_CUSTOM 0x02U
// indicates that this pbuf is UDP multicast
#define PBUF_FLAG_MCASTLOOP 0x04U
// indicates that this pbuf is a link-level broadcast packet
#define PBUF_FLAG_LLBCAST 0x08U
// indicates that this pbuf is a link-level multicast packet
#define PBUF_FLAG_LLMCAST 0x10U
// indicates that this pbuf carries the TCP FIN flag
#define PBUF_FLAG_TCP_FIN 0x20U

Finally, pbuf and netbuf form the following structure, which makes it easy to convert quickly between the two.

_netbuf_pbuf

Structure diagram

The _netbuf_pbuf here is the "private meta data" mentioned in netbuf.h.

This structure is constructed in lwip_alloc_netbuf:

struct uk_netbuf *lwip_alloc_netbuf(struct uk_alloc *a, size_t alloc_size,
size_t alloc_align, uint16_t headroom)
{
struct uk_netbuf *b;
struct _netbuf_pbuf *np;

b = uk_netbuf_alloc_buf(a, alloc_size, alloc_align,
headroom, sizeof(struct _netbuf_pbuf), NULL);
if (unlikely(!b)) {
LWIP_DEBUGF(PBUF_DEBUG,
("Failed to allocate netbuf with encapsulated pbuf: requested headroom: %"__PRIu16", size: %"__PRIsz", alignement: %"__PRIsz"\n",
headroom, alloc_size, alloc_align));
goto err_out;
}

/* Fill-out meta data */
np = (struct _netbuf_pbuf *) uk_netbuf_get_priv(b);
memset(np, 0, sizeof(struct _netbuf_pbuf));
np->pbuf_custom.pbuf.type_internal = PBUF_ROM;
np->pbuf_custom.pbuf.flags = PBUF_FLAG_IS_CUSTOM;
np->pbuf_custom.pbuf.payload = b->data;
np->pbuf_custom.pbuf.ref = 1;
np->pbuf_custom.custom_free_function = _netbuf_free;
np->netbuf = b;

/*
* Set length of netbuf to available space so that it
* can be used as receive buffer
*/
b->len = b->buflen - headroom;

LWIP_DEBUGF(PBUF_DEBUG,
("Allocated netbuf with encapsulated pbuf %p (buflen: %"__PRIsz", headroom: %"__PRIsz")\n",
b, b->buflen, uk_netbuf_headroom(b)));
return b;

err_out:
return NULL;
}

input execution flow

input execution flow diagram

In Unikraft there are two ways the input-related functions get invoked: interrupts and polling.

As the figure shows, uknetdev_input sits at the core: this function is the bridge between lwIP and Unikraft. In response to a call from Unikraft, it obtains a netbuf from the uk_netdev by calling uk_netdev_rx_one, converts it into the pbuf used by lwIP, and then calls the input function provided by the corresponding netif to hand the packet to the upper-layer protocols.

static void uknetdev_input(struct uk_netdev *dev,
uint16_t queue_id __unused, void *argp)
{
struct netif *nf = (struct netif *) argp;
struct uk_netbuf *nb;
struct pbuf *p;
err_t err;
int ret;

UK_ASSERT(dev);
UK_ASSERT(nf);
UK_ASSERT(nf->input);

LWIP_DEBUGF(NETIF_DEBUG, ("%s: %c%c%u: Poll receive queue...\n",
__func__, nf->name[0], nf->name[1], nf->num));
do {
// receive one packet and check the device status
ret = uk_netdev_rx_one(dev, 0, &nb);
if (unlikely(ret < 0)) {
uk_pr_crit("%c%c%u: Receive error %d. Stopping interface...\n",
nf->name[0], nf->name[1], nf->num, ret);
// on a receive error, bring the interface down and leave the loop
netif_set_down(nf);
break;
}
// stop if there are no (more) packets
if (uk_netdev_status_notready(ret)) {
/* No (more) packets received */
break;
}
// convert the received netbuf into a pbuf for lwIP
p = lwip_netbuf_to_pbuf(nb);
p->payload = nb->data;
p->tot_len = p->len = nb->len;
err = nf->input(p, nf);
if (unlikely(err != ERR_OK)) {
// error handling
#if CONFIG_LWIP_THREADS && CONFIG_LIBUKNETDEV_DISPATCHERTHREADS
/* At this point it is possible that lwIP's input queue
* is full or we run out of memory. In this case, we
* return to the scheduler and hope that lwIP's main
* thread is able to process some packets.
* Afterwards, we try it once again.
*/
if (err == ERR_MEM) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: lwIP's input queue full: yielding and trying once again...\n",
__func__, nf->name[0], nf->name[1],
nf->num));
// out of memory: yield the current thread
uk_sched_yield();
err = nf->input(p, nf);
if (likely(err == ERR_OK))
continue;
}
#endif

/*
* Drop the packet that we could not send to the stack
*/
uk_pr_err("%c%c%u: Failed to forward packet to lwIP: %d\n",
nf->name[0], nf->name[1], nf->num, err);
uk_netbuf_free_single(nb);
}
} while (uk_netdev_status_more(ret));
}

uknetdev_updown adjusts the interrupt state according to the state of the netif, and invokes the polling-related functions.

static void uknetdev_updown(struct netif *nf)
{
struct uk_netdev *dev;
int ret;
struct lwip_netdev_data *lwip_data;

UK_ASSERT(nf);
dev = netif_to_uknetdev(nf);
UK_ASSERT(dev);
lwip_data = (struct lwip_netdev_data *)dev->scratch_pad;
// adjust the interrupt state according to the netif state
if (nf->flags & NETIF_FLAG_UP) {
if (uk_netdev_rxintr_supported(lwip_data->dev_info.features)) {
ret = uk_netdev_rxq_intr_enable(dev, 0);
if (ret < 0) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to enable rx interrupt mode on netdev %u\n",
__func__, nf->name[0],
nf->name[1],
nf->num,
uk_netdev_id_get(dev)));
} else {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Enabled rx interrupt mode on netdev %u\n",
__func__, nf->name[0],
nf->name[1],
nf->num,
uk_netdev_id_get(dev)));
}

if (ret == 1) {
// After enabling interrupts, poll once to pick up any packets already queued
uknetdev_poll(nf);
}
} else {
// Without interrupt support, spawn a thread to poll instead
#ifdef CONFIG_HAVE_SCHED
LWIP_DEBUGF(NETIF_DEBUG,
("%s: Poll receive enabled\n",
__func__));
/* Create a thread */
lwip_data->sched = uk_sched_get_default();
UK_ASSERT(lwip_data->sched);
lwip_data->poll_thread =
uk_sched_thread_create(lwip_data->sched, NULL,
NULL, _poll_netif, nf);
#else /* CONFIG_HAVE_SCHED */
uk_pr_warn("The netdevice does not support interrupt. Ensure the netdevice is polled to receive packets");
#endif /* CONFIG_HAVE_SCHED */
}
} else {
// Interface going down: disable rx interrupts
if (uk_netdev_rxintr_supported(lwip_data->dev_info.features)) {
uk_netdev_rxq_intr_disable(dev, 0);
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Disabled rx interrupts on netdev %u\n",
__func__, nf->name[0], nf->name[1],
nf->num, uk_netdev_id_get(dev)));
}

}
}

When CONFIG_LWIP_NOTHREADS is defined, polling is driven by uknetdev_poll_all:

void uknetdev_poll_all(void)
{
struct netif *nf;
NETIF_FOREACH(nf) {
if (nf->name[0] == UKNETDEV_NETIF_NAME0
&& nf->name[1] == UKNETDEV_NETIF_NAME1)
uknetdev_poll(nf);
}
}

Otherwise it is driven by the _poll_netif thread function:

static void _poll_netif(void *arg)
{
struct netif *nf = (struct netif *) arg;
while (1) {
uknetdev_poll(nf);
uk_sched_yield();
}
}
void uknetdev_poll(struct netif *nf)
{
struct uk_netdev *dev;

UK_ASSERT(nf);
UK_ASSERT(nf->name[0] == UKNETDEV_NETIF_NAME0);
UK_ASSERT(nf->name[1] == UKNETDEV_NETIF_NAME1);
// Resolve the netif back to its uk_netdev
dev = netif_to_uknetdev(nf);
UK_ASSERT(dev);
// Receive pending packets
uknetdev_input(dev, 0, nf);
}

Output execution flow

Output execution flow

The output path is comparatively simpler, since it involves neither polling nor interrupts.

static err_t uknetdev_output(struct netif *nf, struct pbuf *p)
{
struct uk_netdev *dev;
struct lwip_netdev_data *lwip_data;
struct pbuf *q;
struct uk_netbuf *nb;
char *wpos;
int ret;

UK_ASSERT(nf);
// Resolve the netif back to the underlying uk_netdev
dev = netif_to_uknetdev(nf);
UK_ASSERT(dev);
lwip_data = (struct lwip_netdev_data *) dev->scratch_pad;
UK_ASSERT(lwip_data);
// Allocate a netbuf for the outgoing frame
nb = uk_netbuf_alloc_buf(lwip_data->pkt_a,
UKNETDEV_BUFLEN,
lwip_data->dev_info.ioalign,
lwip_data->dev_info.nb_encap_tx,
0, NULL);
if (!nb)
return ERR_MEM;

if (unlikely(p->tot_len > uk_netbuf_tailroom(nb))) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Cannot send %"PRIu16" bytes, too big (> %"__PRIsz")\n",
__func__, nf->name[0], nf->name[1], nf->num,
p->tot_len, uk_netbuf_tailroom(nb)));
uk_netbuf_free_single(nb);
return ERR_MEM;
}

/*
* Copy pbuf to netbuf
* NOTE: Unfortunately, lwIP seems not to support zero-copy transmit,
* yet. As long as we do not have this, we have to copy.
*/
// lwIP does not support zero-copy transmit yet, so copy with memcpy
wpos = nb->data;
for (q = p; q != NULL; q = q->next) {
memcpy(wpos, q->payload, q->len);
wpos += q->len;
}
nb->len = p->tot_len;
// Transmit the packet, retrying while the queue is not ready
/* Transmit packet */
do {
ret = uk_netdev_tx_one(dev, 0, nb);
} while (uk_netdev_status_notready(ret));
if (unlikely(ret < 0)) {
LWIP_DEBUGF(NETIF_DEBUG,
("%s: %c%c%u: Failed to send %"PRIu16" bytes\n",
__func__, nf->name[0], nf->name[1], nf->num,
p->tot_len));
/*
* Decrease refcount again because in
* the error case the netdev did not consume the pbuf
*/
uk_netbuf_free_single(nb);
return ERR_IF;
}
LWIP_DEBUGF(NETIF_DEBUG, ("%s: %c%c%u: Sent %"PRIu16" bytes\n",
__func__, nf->name[0], nf->name[1], nf->num,
p->tot_len));

return ERR_OK;
}

Interacting with user code

As mentioned earlier, unikraft hooks into two of lwIP's API layers: the NETCONN API and the socket API. The socket side needs little explanation, since user code simply calls the functions provided in sys/socket.h. Below is a brief look at how user code calls the NETCONN API and what happens during those calls.

First, the user calls netconn_new to create a netconn. netconn_new is in essence a macro; the function it actually expands to is netconn_new_with_proto_and_callback:

#define netconn_new(t) netconn_new_with_proto_and_callback(t, 0, NULL)

This function first calls netconn_alloc to allocate and initialize the connection structure, then calls netconn_apimsg to build a message and post it through the system mailbox to the kernel thread, asking the LwIP core to execute lwip_netconn_do_newconn. While that runs, the two threads synchronize on the semaphore in the op_completed field: once the kernel finishes processing, it releases the semaphore to signal completion, and only then can netconn_new continue.

struct netconn *
netconn_new_with_proto_and_callback(enum netconn_type t, u8_t proto, netconn_callback callback)
{
struct netconn *conn;
API_MSG_VAR_DECLARE(msg);
API_MSG_VAR_ALLOC_RETURN_NULL(msg);
// Allocate and initialize the connection
conn = netconn_alloc(t, callback);
if (conn != NULL) {
err_t err;

API_MSG_VAR_REF(msg).msg.n.proto = proto;
API_MSG_VAR_REF(msg).conn = conn;
err = netconn_apimsg(lwip_netconn_do_newconn, &API_MSG_VAR_REF(msg));
if (err != ERR_OK) {
LWIP_ASSERT("freeing conn without freeing pcb", conn->pcb.tcp == NULL);
LWIP_ASSERT("conn has no recvmbox", sys_mbox_valid(&conn->recvmbox));
#if LWIP_TCP
LWIP_ASSERT("conn->acceptmbox shouldn't exist", !sys_mbox_valid(&conn->acceptmbox));
#endif /* LWIP_TCP */
#if !LWIP_NETCONN_SEM_PER_THREAD
LWIP_ASSERT("conn has no op_completed", sys_sem_valid(&conn->op_completed));
sys_sem_free(&conn->op_completed);
#endif /* !LWIP_NETCONN_SEM_PER_THREAD */
sys_mbox_free(&conn->recvmbox);
memp_free(MEMP_NETCONN, conn);
API_MSG_VAR_FREE(msg);
return NULL;
}
}
API_MSG_VAR_FREE(msg);
return conn;
}
struct netconn *
netconn_alloc(enum netconn_type t, netconn_callback callback)
{
struct netconn *conn;
int size;
u8_t init_flags = 0;
// Allocate the netconn structure
conn = (struct netconn *)memp_malloc(MEMP_NETCONN);
if (conn == NULL) {
return NULL;
}

conn->pending_err = ERR_OK;
conn->type = t;
conn->pcb.tcp = NULL;
#if LWIP_NETCONN_FULLDUPLEX
conn->mbox_threads_waiting = 0;
#endif

/* If all sizes are the same, every compiler should optimize this switch to nothing */
switch (NETCONNTYPE_GROUP(t)) {
#if LWIP_RAW
case NETCONN_RAW:
size = DEFAULT_RAW_RECVMBOX_SIZE;
break;
#endif /* LWIP_RAW */
#if LWIP_UDP
case NETCONN_UDP:
size = DEFAULT_UDP_RECVMBOX_SIZE;
#if LWIP_NETBUF_RECVINFO
init_flags |= NETCONN_FLAG_PKTINFO;
#endif /* LWIP_NETBUF_RECVINFO */
break;
#endif /* LWIP_UDP */
#if LWIP_TCP
case NETCONN_TCP:
size = DEFAULT_TCP_RECVMBOX_SIZE;
break;
#endif /* LWIP_TCP */
default:
LWIP_ASSERT("netconn_alloc: undefined netconn_type", 0);
goto free_and_return;
}
// Create the receive mailbox and the op_completed semaphore
if (sys_mbox_new(&conn->recvmbox, size) != ERR_OK) {
goto free_and_return;
}
#if !LWIP_NETCONN_SEM_PER_THREAD
if (sys_sem_new(&conn->op_completed, 0) != ERR_OK) {
sys_mbox_free(&conn->recvmbox);
goto free_and_return;
}
#endif

#if LWIP_TCP
sys_mbox_set_invalid(&conn->acceptmbox);
#endif
conn->state = NETCONN_NONE;
#if LWIP_SOCKET
/* initialize socket to -1 since 0 is a valid socket */
conn->socket = -1;
#endif /* LWIP_SOCKET */
conn->callback = callback;
#if LWIP_TCP
conn->current_msg = NULL;
#endif /* LWIP_TCP */
#if LWIP_SO_SNDTIMEO
conn->send_timeout = 0;
#endif /* LWIP_SO_SNDTIMEO */
#if LWIP_SO_RCVTIMEO
conn->recv_timeout = 0;
#endif /* LWIP_SO_RCVTIMEO */
#if LWIP_SO_RCVBUF
conn->recv_bufsize = RECV_BUFSIZE_DEFAULT;
conn->recv_avail = 0;
#endif /* LWIP_SO_RCVBUF */
#if LWIP_SO_LINGER
conn->linger = -1;
#endif /* LWIP_SO_LINGER */
conn->flags = init_flags;
return conn;
free_and_return:
memp_free(MEMP_NETCONN, conn);
return NULL;
}

The counterpart of netconn_new is netconn_delete. For a TCP connection that is still established, calling it asks the kernel to terminate the connection; the application thread does not need to care how that happens, because the lwIP core carries out the entire close handshake by itself.

err_t
netconn_delete(struct netconn *conn)
{
err_t err;

/* No ASSERT here because possible to get a (conn == NULL) if we got an accept error */
if (conn == NULL) {
return ERR_OK;
}

#if LWIP_NETCONN_FULLDUPLEX
if (conn->flags & NETCONN_FLAG_MBOXINVALID) {
/* Already called netconn_prepare_delete() before */
err = ERR_OK;
} else
#endif /* LWIP_NETCONN_FULLDUPLEX */
{
err = netconn_prepare_delete(conn);
}
if (err == ERR_OK) {
netconn_free(conn);
}
return err;
}

Mirroring connection creation, this path requests the kernel to execute lwip_netconn_do_delconn:

err_t
netconn_prepare_delete(struct netconn *conn)
{
err_t err;
API_MSG_VAR_DECLARE(msg);

/* No ASSERT here because possible to get a (conn == NULL) if we got an accept error */
if (conn == NULL) {
return ERR_OK;
}

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
#if LWIP_TCP
#if LWIP_SO_SNDTIMEO || LWIP_SO_LINGER
/* get the time we started, which is later compared to
sys_now() + conn->send_timeout */
API_MSG_VAR_REF(msg).msg.sd.time_started = sys_now();
#else /* LWIP_SO_SNDTIMEO || LWIP_SO_LINGER */
API_MSG_VAR_REF(msg).msg.sd.polls_left =
((LWIP_TCP_CLOSE_TIMEOUT_MS_DEFAULT + TCP_SLOW_INTERVAL - 1) / TCP_SLOW_INTERVAL) + 1;
#endif /* LWIP_SO_SNDTIMEO || LWIP_SO_LINGER */
#endif /* LWIP_TCP */
err = netconn_apimsg(lwip_netconn_do_delconn, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

if (err != ERR_OK) {
return err;
}
return ERR_OK;
}

netconn_getaddr simply retrieves the addressing information of a netconn: the IP address is stored into addr and the port number into port, while local selects whether to return the local (source) address and port (local = 1) or the remote (destination) ones (local = 0). As before, the function uses netconn_apimsg to build an API message asking the kernel to execute lwip_netconn_do_getaddr, then synchronizes on the netconn's semaphore.

err_t
netconn_getaddr(struct netconn *conn, ip_addr_t *addr, u16_t *port, u8_t local)
{
API_MSG_VAR_DECLARE(msg);
err_t err;

LWIP_ERROR("netconn_getaddr: invalid conn", (conn != NULL), return ERR_ARG;);
LWIP_ERROR("netconn_getaddr: invalid addr", (addr != NULL), return ERR_ARG;);
LWIP_ERROR("netconn_getaddr: invalid port", (port != NULL), return ERR_ARG;);

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
API_MSG_VAR_REF(msg).msg.ad.local = local;
#if LWIP_MPU_COMPATIBLE
err = netconn_apimsg(lwip_netconn_do_getaddr, &API_MSG_VAR_REF(msg));
*addr = msg->msg.ad.ipaddr;
*port = msg->msg.ad.port;
#else /* LWIP_MPU_COMPATIBLE */
msg.msg.ad.ipaddr = addr;
msg.msg.ad.port = port;
err = netconn_apimsg(lwip_netconn_do_getaddr, &msg);
#endif /* LWIP_MPU_COMPATIBLE */
API_MSG_VAR_FREE(msg);

return err;
}

netconn_bind binds an IP address and port number to a netconn; on the server side this step is mandatory. Again, it builds an API message via netconn_apimsg, asks the kernel to execute lwip_netconn_do_bind, and synchronizes on the netconn's semaphore. Internally, the kernel thread completes the binding of the corresponding control block by calling xxx_bind, where xxx_bind is udp_bind, tcp_bind, or raw_bind, chosen according to the netconn's type.

err_t
netconn_bind(struct netconn *conn, const ip_addr_t *addr, u16_t port)
{
API_MSG_VAR_DECLARE(msg);
err_t err;

LWIP_ERROR("netconn_bind: invalid conn", (conn != NULL), return ERR_ARG;);

#if LWIP_IPV4
/* Don't propagate NULL pointer (IP_ADDR_ANY alias) to subsequent functions */
if (addr == NULL) {
addr = IP4_ADDR_ANY;
}
#endif /* LWIP_IPV4 */

#if LWIP_IPV4 && LWIP_IPV6
/* "Socket API like" dual-stack support: If IP to bind to is IP6_ADDR_ANY,
* and NETCONN_FLAG_IPV6_V6ONLY is 0, use IP_ANY_TYPE to bind
*/
if ((netconn_get_ipv6only(conn) == 0) &&
ip_addr_cmp(addr, IP6_ADDR_ANY)) {
addr = IP_ANY_TYPE;
}
#endif /* LWIP_IPV4 && LWIP_IPV6 */

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
API_MSG_VAR_REF(msg).msg.bc.ipaddr = API_MSG_VAR_REF(addr);
API_MSG_VAR_REF(msg).msg.bc.port = port;
err = netconn_apimsg(lwip_netconn_do_bind, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

return err;
}

netconn_connect actively establishes a connection and is normally called on the client side, associating the server's IP address and port with the local netconn. For TCP this performs the handshake, and the calling application thread blocks until the handshake completes; for UDP it merely sets the destination IP address and port in the UDP control block. This function, too, builds an API message via netconn_apimsg, asks the kernel to execute lwip_netconn_do_connect, and synchronizes on the netconn's semaphore. Depending on the netconn's type, lwip_netconn_do_connect dispatches to the matching xxx_connect: tcp_connect for TCP, udp_connect for UDP, and raw_connect for RAW.

err_t
netconn_connect(struct netconn *conn, const ip_addr_t *addr, u16_t port)
{
API_MSG_VAR_DECLARE(msg);
err_t err;

LWIP_ERROR("netconn_connect: invalid conn", (conn != NULL), return ERR_ARG;);

#if LWIP_IPV4
/* Don't propagate NULL pointer (IP_ADDR_ANY alias) to subsequent functions */
if (addr == NULL) {
addr = IP4_ADDR_ANY;
}
#endif /* LWIP_IPV4 */

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
API_MSG_VAR_REF(msg).msg.bc.ipaddr = API_MSG_VAR_REF(addr);
API_MSG_VAR_REF(msg).msg.bc.port = port;
err = netconn_apimsg(lwip_netconn_do_connect, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

return err;
}

netconn_disconnect terminates a UDP communication; in short, it clears the destination IP address and port from the UDP control block. Likewise, it builds an API message requesting the kernel to execute lwip_netconn_do_disconnect:

err_t
netconn_disconnect(struct netconn *conn)
{
API_MSG_VAR_DECLARE(msg);
err_t err;

LWIP_ERROR("netconn_disconnect: invalid conn", (conn != NULL), return ERR_ARG;);

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
err = netconn_apimsg(lwip_netconn_do_disconnect, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

return err;
}

netconn_listen is in essence a parameterized macro whose real target is netconn_listen_with_backlog; it applies only to TCP servers. It puts the netconn into listening mode and moves the TCP control block into the LISTEN state so that clients can connect. As usual, it goes through netconn_apimsg to have the kernel execute lwip_netconn_do_listen, which does the actual work of listening on the TCP connection: it creates the accept mailbox (acceptmbox) inside the netconn structure and registers the accept callback, accept_function, in the TCP control block. Whenever a client connects, that callback runs and posts a message to acceptmbox, notifying the application of the new client connection so it can be handled. When lwip_netconn_do_listen finishes, it releases a semaphore to synchronize the two threads.

err_t
netconn_listen_with_backlog(struct netconn *conn, u8_t backlog)
{
#if LWIP_TCP
API_MSG_VAR_DECLARE(msg);
err_t err;

/* This does no harm. If TCP_LISTEN_BACKLOG is off, backlog is unused. */
LWIP_UNUSED_ARG(backlog);

LWIP_ERROR("netconn_listen: invalid conn", (conn != NULL), return ERR_ARG;);

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
#if TCP_LISTEN_BACKLOG
API_MSG_VAR_REF(msg).msg.lb.backlog = backlog;
#endif /* TCP_LISTEN_BACKLOG */
err = netconn_apimsg(lwip_netconn_do_listen, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

return err;
#else /* LWIP_TCP */
LWIP_UNUSED_ARG(conn);
LWIP_UNUSED_ARG(backlog);
return ERR_ARG;
#endif /* LWIP_TCP */
}

netconn_recv receives a UDP or TCP packet by fetching it from the recvmbox mailbox; if the mailbox is empty, the calling thread blocks until a message arrives. If the remote host terminates the connection while we are waiting for TCP data, a connection-closed error code (ERR_CLSD) is returned, and the application can handle the different error types accordingly.

For a TCP connection, netconn_recv calls netconn_recv_data_tcp to fetch the data on the connection; that in turn calls netconn_recv_data to take a pbuf out of recvmbox, then goes through netconn_tcp_recvd_msg → netconn_apimsg to post an API message to the system mailbox, asking the kernel to execute lwip_netconn_do_recv, which calls tcp_recved to update the TCP receive window. netconn_recv finally wraps the pbuf in a netbuf and returns it to the application. For UDP and RAW connections, netconn_recv_data is called directly, the pbuf is wrapped in a netbuf, and it is returned to the application.

err_t
netconn_recv(struct netconn *conn, struct netbuf **new_buf)
{
#if LWIP_TCP
struct netbuf *buf = NULL;
err_t err;
#endif /* LWIP_TCP */

LWIP_ERROR("netconn_recv: invalid pointer", (new_buf != NULL), return ERR_ARG;);
*new_buf = NULL;
LWIP_ERROR("netconn_recv: invalid conn", (conn != NULL), return ERR_ARG;);

#if LWIP_TCP
#if (LWIP_UDP || LWIP_RAW)
if (NETCONNTYPE_GROUP(conn->type) == NETCONN_TCP)
#endif /* (LWIP_UDP || LWIP_RAW) */
{
struct pbuf *p = NULL;
/* This is not a listening netconn, since recvmbox is set */

buf = (struct netbuf *)memp_malloc(MEMP_NETBUF);
if (buf == NULL) {
return ERR_MEM;
}

err = netconn_recv_data_tcp(conn, &p, 0);
if (err != ERR_OK) {
memp_free(MEMP_NETBUF, buf);
return err;
}
LWIP_ASSERT("p != NULL", p != NULL);

buf->p = p;
buf->ptr = p;
buf->port = 0;
ip_addr_set_zero(&buf->addr);
*new_buf = buf;
/* don't set conn->last_err: it's only ERR_OK, anyway */
return ERR_OK;
}
#endif /* LWIP_TCP */
#if LWIP_TCP && (LWIP_UDP || LWIP_RAW)
else
#endif /* LWIP_TCP && (LWIP_UDP || LWIP_RAW) */
{
#if (LWIP_UDP || LWIP_RAW)
return netconn_recv_data(conn, (void **)new_buf, 0);
#endif /* (LWIP_UDP || LWIP_RAW) */
}
}

As you can see, this function ties back into the receive flow we analyzed earlier:

void
tcp_recved(struct tcp_pcb *pcb, u16_t len)
{
u32_t wnd_inflation;
tcpwnd_size_t rcv_wnd;

LWIP_ASSERT_CORE_LOCKED();

LWIP_ERROR("tcp_recved: invalid pcb", pcb != NULL, return);

/* pcb->state LISTEN not allowed here */
LWIP_ASSERT("don't call tcp_recved for listen-pcbs",
pcb->state != LISTEN);

rcv_wnd = (tcpwnd_size_t)(pcb->rcv_wnd + len);
if ((rcv_wnd > TCP_WND_MAX(pcb)) || (rcv_wnd < pcb->rcv_wnd)) {
/* window got too big or tcpwnd_size_t overflow */
LWIP_DEBUGF(TCP_DEBUG, ("tcp_recved: window got too big or tcpwnd_size_t overflow\n"));
pcb->rcv_wnd = TCP_WND_MAX(pcb);
} else {
pcb->rcv_wnd = rcv_wnd;
}

wnd_inflation = tcp_update_rcv_ann_wnd(pcb);

/* If the change in the right edge of window is significant (default
* watermark is TCP_WND/4), then send an explicit update now.
* Otherwise wait for a packet to be sent in the normal course of
* events (or more window to be available later) */
if (wnd_inflation >= TCP_WND_UPDATE_THRESHOLD) {
tcp_ack_now(pcb);
tcp_output(pcb);
}

LWIP_DEBUGF(TCP_DEBUG, ("tcp_recved: received %"U16_F" bytes, wnd %"TCPWNDSIZE_F" (%"TCPWNDSIZE_F").\n",
len, pcb->rcv_wnd, (u16_t)(TCP_WND_MAX(pcb) - pcb->rcv_wnd)));
}

netconn_write() is in essence a macro, used to send data over an established TCP connection. TCP transfers data as a byte stream, so the caller only specifies the start address and length of the data; the LwIP core handles the rest, numbering the bytes of the stream and transmitting them according to the TCP protocol, so the caller need not worry about how the transfer happens. Nor is there any limit on the data length: the core slices the data into suitably sized segments before sending them out.

err_t
netconn_write_vectors_partly(struct netconn *conn, struct netvector *vectors, u16_t vectorcnt,
u8_t apiflags, size_t *bytes_written)
{
API_MSG_VAR_DECLARE(msg);
err_t err;
u8_t dontblock;
size_t size;
int i;

LWIP_ERROR("netconn_write: invalid conn", (conn != NULL), return ERR_ARG;);
LWIP_ERROR("netconn_write: invalid conn->type", (NETCONNTYPE_GROUP(conn->type) == NETCONN_TCP), return ERR_VAL;);
dontblock = netconn_is_nonblocking(conn) || (apiflags & NETCONN_DONTBLOCK);
#if LWIP_SO_SNDTIMEO
if (conn->send_timeout != 0) {
dontblock = 1;
}
#endif /* LWIP_SO_SNDTIMEO */
if (dontblock && !bytes_written) {
/* This implies netconn_write() cannot be used for non-blocking send, since
it has no way to return the number of bytes written. */
return ERR_VAL;
}

/* sum up the total size */
size = 0;
for (i = 0; i < vectorcnt; i++) {
size += vectors[i].len;
if (size < vectors[i].len) {
/* overflow */
return ERR_VAL;
}
}
if (size == 0) {
return ERR_OK;
} else if (size > SSIZE_MAX) {
ssize_t limited;
/* this is required by the socket layer (cannot send full size_t range) */
if (!bytes_written) {
return ERR_VAL;
}
/* limit the amount of data to send */
limited = SSIZE_MAX;
size = (size_t)limited;
}

API_MSG_VAR_ALLOC(msg);
/* non-blocking write sends as much */
API_MSG_VAR_REF(msg).conn = conn;
API_MSG_VAR_REF(msg).msg.w.vector = vectors;
API_MSG_VAR_REF(msg).msg.w.vector_cnt = vectorcnt;
API_MSG_VAR_REF(msg).msg.w.vector_off = 0;
API_MSG_VAR_REF(msg).msg.w.apiflags = apiflags;
API_MSG_VAR_REF(msg).msg.w.len = size;
API_MSG_VAR_REF(msg).msg.w.offset = 0;
#if LWIP_SO_SNDTIMEO
if (conn->send_timeout != 0) {
/* get the time we started, which is later compared to
sys_now() + conn->send_timeout */
API_MSG_VAR_REF(msg).msg.w.time_started = sys_now();
} else {
API_MSG_VAR_REF(msg).msg.w.time_started = 0;
}
#endif /* LWIP_SO_SNDTIMEO */

/* For locking the core: this _can_ be delayed on low memory/low send buffer,
but if it is, this is done inside api_msg.c:do_write(), so we can use the
non-blocking version here. */
err = netconn_apimsg(lwip_netconn_do_write, &API_MSG_VAR_REF(msg));
if (err == ERR_OK) {
if (bytes_written != NULL) {
*bytes_written = API_MSG_VAR_REF(msg).msg.w.offset;
}
/* for blocking, check all requested bytes were written, NOTE: send_timeout is
treated as dontblock (see dontblock assignment above) */
if (!dontblock) {
LWIP_ASSERT("do_write failed to write all bytes", API_MSG_VAR_REF(msg).msg.w.offset == size);
}
}
API_MSG_VAR_FREE(msg);

return err;
}
void
lwip_netconn_do_write(void *m)
{
struct api_msg *msg = (struct api_msg *)m;

err_t err = netconn_err(msg->conn);
if (err == ERR_OK) {
if (NETCONNTYPE_GROUP(msg->conn->type) == NETCONN_TCP) {
#if LWIP_TCP
if (msg->conn->state != NETCONN_NONE) {
/* netconn is connecting, closing or in blocking write */
err = ERR_INPROGRESS;
} else if (msg->conn->pcb.tcp != NULL) {
msg->conn->state = NETCONN_WRITE;
/* set all the variables used by lwip_netconn_do_writemore */
LWIP_ASSERT("already writing or closing", msg->conn->current_msg == NULL);
LWIP_ASSERT("msg->msg.w.len != 0", msg->msg.w.len != 0);
msg->conn->current_msg = msg;
#if LWIP_TCPIP_CORE_LOCKING
if (lwip_netconn_do_writemore(msg->conn, 0) != ERR_OK) {
LWIP_ASSERT("state!", msg->conn->state == NETCONN_WRITE);
UNLOCK_TCPIP_CORE();
sys_arch_sem_wait(LWIP_API_MSG_SEM(msg), 0);
LOCK_TCPIP_CORE();
LWIP_ASSERT("state!", msg->conn->state != NETCONN_WRITE);
}
#else /* LWIP_TCPIP_CORE_LOCKING */
lwip_netconn_do_writemore(msg->conn);
#endif /* LWIP_TCPIP_CORE_LOCKING */
/* for both cases: if lwip_netconn_do_writemore was called, don't ACK the APIMSG
since lwip_netconn_do_writemore ACKs it! */
return;
} else {
err = ERR_CONN;
}
#else /* LWIP_TCP */
err = ERR_VAL;
#endif /* LWIP_TCP */
#if (LWIP_UDP || LWIP_RAW)
} else {
err = ERR_VAL;
#endif /* (LWIP_UDP || LWIP_RAW) */
}
}
msg->err = err;
TCPIP_APIMSG_ACK(msg);
}
static err_t
lwip_netconn_do_writemore(struct netconn *conn WRITE_DELAYED_PARAM)
{
err_t err;
const void *dataptr;
u16_t len, available;
u8_t write_finished = 0;
size_t diff;
u8_t dontblock;
u8_t apiflags;
u8_t write_more;

LWIP_ASSERT("conn != NULL", conn != NULL);
LWIP_ASSERT("conn->state == NETCONN_WRITE", (conn->state == NETCONN_WRITE));
LWIP_ASSERT("conn->current_msg != NULL", conn->current_msg != NULL);
LWIP_ASSERT("conn->pcb.tcp != NULL", conn->pcb.tcp != NULL);
LWIP_ASSERT("conn->current_msg->msg.w.offset < conn->current_msg->msg.w.len",
conn->current_msg->msg.w.offset < conn->current_msg->msg.w.len);
LWIP_ASSERT("conn->current_msg->msg.w.vector_cnt > 0", conn->current_msg->msg.w.vector_cnt > 0);

apiflags = conn->current_msg->msg.w.apiflags;
dontblock = netconn_is_nonblocking(conn) || (apiflags & NETCONN_DONTBLOCK);

#if LWIP_SO_SNDTIMEO
if ((conn->send_timeout != 0) &&
((s32_t)(sys_now() - conn->current_msg->msg.w.time_started) >= conn->send_timeout)) {
write_finished = 1;
if (conn->current_msg->msg.w.offset == 0) {
/* nothing has been written */
err = ERR_WOULDBLOCK;
} else {
/* partial write */
err = ERR_OK;
}
} else
#endif /* LWIP_SO_SNDTIMEO */
{
do {
dataptr = (const u8_t *)conn->current_msg->msg.w.vector->ptr + conn->current_msg->msg.w.vector_off;
diff = conn->current_msg->msg.w.vector->len - conn->current_msg->msg.w.vector_off;
if (diff > 0xffffUL) { /* max_u16_t */
len = 0xffff;
apiflags |= TCP_WRITE_FLAG_MORE;
} else {
len = (u16_t)diff;
}
available = tcp_sndbuf(conn->pcb.tcp);
if (available < len) {
/* don't try to write more than sendbuf */
len = available;
if (dontblock) {
if (!len) {
/* set error according to partial write or not */
err = (conn->current_msg->msg.w.offset == 0) ? ERR_WOULDBLOCK : ERR_OK;
goto err_mem;
}
} else {
apiflags |= TCP_WRITE_FLAG_MORE;
}
}
LWIP_ASSERT("lwip_netconn_do_writemore: invalid length!",
((conn->current_msg->msg.w.vector_off + len) <= conn->current_msg->msg.w.vector->len));
/* we should loop around for more sending in the following cases:
1) We couldn't finish the current vector because of 16-bit size limitations.
tcp_write() and tcp_sndbuf() both are limited to 16-bit sizes
2) We are sending the remainder of the current vector and have more */
if ((len == 0xffff && diff > 0xffffUL) ||
(len == (u16_t)diff && conn->current_msg->msg.w.vector_cnt > 1)) {
write_more = 1;
apiflags |= TCP_WRITE_FLAG_MORE;
} else {
write_more = 0;
}
err = tcp_write(conn->pcb.tcp, dataptr, len, apiflags);
if (err == ERR_OK) {
conn->current_msg->msg.w.offset += len;
conn->current_msg->msg.w.vector_off += len;
/* check if current vector is finished */
if (conn->current_msg->msg.w.vector_off == conn->current_msg->msg.w.vector->len) {
conn->current_msg->msg.w.vector_cnt--;
/* if we have additional vectors, move on to them */
if (conn->current_msg->msg.w.vector_cnt > 0) {
conn->current_msg->msg.w.vector++;
conn->current_msg->msg.w.vector_off = 0;
}
}
}
} while (write_more && err == ERR_OK);
/* if OK or memory error, check available space */
if ((err == ERR_OK) || (err == ERR_MEM)) {
err_mem:
if (dontblock && (conn->current_msg->msg.w.offset < conn->current_msg->msg.w.len)) {
/* non-blocking write did not write everything: mark the pcb non-writable
and let poll_tcp check writable space to mark the pcb writable again */
API_EVENT(conn, NETCONN_EVT_SENDMINUS, 0);
conn->flags |= NETCONN_FLAG_CHECK_WRITESPACE;
} else if ((tcp_sndbuf(conn->pcb.tcp) <= TCP_SNDLOWAT) ||
(tcp_sndqueuelen(conn->pcb.tcp) >= TCP_SNDQUEUELOWAT)) {
/* The queued byte- or pbuf-count exceeds the configured low-water limit,
let select mark this pcb as non-writable. */
API_EVENT(conn, NETCONN_EVT_SENDMINUS, 0);
}
}

if (err == ERR_OK) {
err_t out_err;
if ((conn->current_msg->msg.w.offset == conn->current_msg->msg.w.len) || dontblock) {
/* return sent length (caller reads length from msg.w.offset) */
write_finished = 1;
}
out_err = tcp_output(conn->pcb.tcp);
if (out_err == ERR_RTE) {
/* If tcp_output fails because no route is found,
don't try writing any more but return the error
to the application thread. */
err = out_err;
write_finished = 1;
}
} else if (err == ERR_MEM) {
/* If ERR_MEM, we wait for sent_tcp or poll_tcp to be called.
For blocking sockets, we do NOT return to the application
thread, since ERR_MEM is only a temporary error! Non-blocking
will remain non-writable until sent_tcp/poll_tcp is called */

/* tcp_write returned ERR_MEM, try tcp_output anyway */
err_t out_err = tcp_output(conn->pcb.tcp);
if (out_err == ERR_RTE) {
/* If tcp_output fails because no route is found,
don't try writing any more but return the error
to the application thread. */
err = out_err;
write_finished = 1;
} else if (dontblock) {
/* non-blocking write is done on ERR_MEM, set error according
to partial write or not */
err = (conn->current_msg->msg.w.offset == 0) ? ERR_WOULDBLOCK : ERR_OK;
write_finished = 1;
}
} else {
/* On errors != ERR_MEM, we don't try writing any more but return
the error to the application thread. */
write_finished = 1;
}
}
if (write_finished) {
/* everything was written: set back connection state
and back to application task */
sys_sem_t *op_completed_sem = LWIP_API_MSG_SEM(conn->current_msg);
conn->current_msg->err = err;
conn->current_msg = NULL;
conn->state = NETCONN_NONE;
#if LWIP_TCPIP_CORE_LOCKING
if (delayed)
#endif
{
sys_sem_signal(op_completed_sem);
}
}
#if LWIP_TCPIP_CORE_LOCKING
else {
return ERR_MEM;
}
#endif
return ERR_OK;
}

netconn_close actively terminates a TCP connection. It calls netconn_apimsg to build an API message asking the kernel to execute lwip_netconn_do_close, then synchronizes on the netconn's semaphore; the kernel completes the entire connection teardown by itself, without any involvement from us.

static err_t
netconn_close_shutdown(struct netconn *conn, u8_t how)
{
API_MSG_VAR_DECLARE(msg);
err_t err;
LWIP_UNUSED_ARG(how);

LWIP_ERROR("netconn_close: invalid conn", (conn != NULL), return ERR_ARG;);

API_MSG_VAR_ALLOC(msg);
API_MSG_VAR_REF(msg).conn = conn;
#if LWIP_TCP
/* shutting down both ends is the same as closing */
API_MSG_VAR_REF(msg).msg.sd.shut = how;
#if LWIP_SO_SNDTIMEO || LWIP_SO_LINGER
/* get the time we started, which is later compared to
sys_now() + conn->send_timeout */
API_MSG_VAR_REF(msg).msg.sd.time_started = sys_now();
#else /* LWIP_SO_SNDTIMEO || LWIP_SO_LINGER */
API_MSG_VAR_REF(msg).msg.sd.polls_left =
((LWIP_TCP_CLOSE_TIMEOUT_MS_DEFAULT + TCP_SLOW_INTERVAL - 1) / TCP_SLOW_INTERVAL) + 1;
#endif /* LWIP_SO_SNDTIMEO || LWIP_SO_LINGER */
#endif /* LWIP_TCP */
err = netconn_apimsg(lwip_netconn_do_close, &API_MSG_VAR_REF(msg));
API_MSG_VAR_FREE(msg);

return err;
}

Finally, let us look at a concrete calling sequence. The following example program was found online:

static void
tcpecho_thread(void *arg)
{
struct netconn *conn, *newconn;
err_t err;
LWIP_UNUSED_ARG(arg);

/* Create a new connection identifier. */
/* Bind connection to well known port number 7. */
#if LWIP_IPV6
conn = netconn_new(NETCONN_TCP_IPV6);
netconn_bind(conn, IP6_ADDR_ANY, LOCAL_PORT);
#else /* LWIP_IPV6 */
conn = netconn_new(NETCONN_TCP); (1)
netconn_bind(conn, IP_ADDR_ANY, LOCAL_PORT); (2)
#endif /* LWIP_IPV6 */
LWIP_ERROR("tcpecho: invalid conn", (conn != NULL), return;);

PRINTF("Local port is %d\n\n",LOCAL_PORT);

/* Tell connection to go into listening mode. */
netconn_listen(conn); (3)

while (1) {

/* Grab new connection. */
err = netconn_accept(conn, &newconn); (4)
/*printf("accepted new connection %p\n", newconn);*/
/* Process the new connection. */
if (err == ERR_OK) {
struct netbuf *buf;
void *data;
u16_t len;

while ((err = netconn_recv(newconn, &buf)) == ERR_OK) {(5)
/*printf("Recved\n");*/
do {
netbuf_data(buf, &data, &len);(6)
err = netconn_write(newconn, data, len, NETCONN_COPY);(7)
#if 0
if (err != ERR_OK) {
PRINTF("tcpecho: netconn_write: error \"%s\"\n",lwip_strerr(err));
}
#endif
} while (netbuf_next(buf) >= 0);(8)
netbuf_delete(buf); (9)
}
/*printf("Got EOF, looping\n");*/
/* Close connection and discard connection identifier. */
netconn_close(newconn); (10)
netconn_delete(newconn); (11)
}
}
}

static void client(void *thread_param)
{
struct netconn *conn;
int ret;
ip4_addr_t ipaddr;
uint8_t send_buf[]= "This is a TCP Client test...\n";
while (1) {
conn = netconn_new(NETCONN_TCP); (1)
if (conn == NULL) {
PRINT_DEBUG("create conn failed!\n");
vTaskDelay(10);
continue;
}

IP4_ADDR(&ipaddr,DEST_IP_ADDR0,DEST_IP_ADDR1,DEST_IP_ADDR2,DEST_IP_ADDR3); (2)
ret = netconn_connect(conn,&ipaddr,DEST_PORT); (3)
if (ret == -1) {
PRINT_DEBUG("Connect failed!\n");
netconn_close(conn); (4)
vTaskDelay(10);
continue;
}

PRINT_DEBUG("Connect to server successful!\n");

while (1) {
ret = netconn_write(conn,send_buf,sizeof(send_buf),0); (5)

vTaskDelay(1000);
}
}
}

As the code above shows, establishing a connection with netconn closely resembles the socket workflow. In fact, lwIP's socket implementation is simply a thin wrapper around the methods provided by the NETCONN API.