Go net包剖析

声明

下面的分析均基于Golang1.14版本。
以下只分析Tcp相关的源码，其它的类似。
网络底层只分析Linux下epoll的实现。

epoll基础

epoll的相关接口

int epoll_create(int size); // 初始化epoll句柄 // 套接字的事件监听注册。含新增，修改，删除操作。可以监听读写事件（一般只监听读事件）int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);// 收集注册在epoll句柄中的已触发的事件int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

在Golang中通过汇编代码调用对应的系统调用来调用epoll的接口。以下是对应关系：

epollcreate ==> epoll_createepollctl ==> epoll_ctlepollwait ==> epoll_wait

net包到底层类图

net包到底层的类图

1.在net包的最上层要提供接口建立网络连接和数据读取。提供net.Listen获取监听socket，提供Connect，Accept和获取连接socket。
2.连接的建立最终需要调用Linux的系统的socket, bind, listen, accept, connect, send, recv等函数。这些系统调用在netFD这一层被调用。
3.连接的管理最终需要调用Linux系统的epoll_create，epoll_ctl，epoll_wait等函数。这些系统调用在runtime.pollDesc这一层被调用。

连接的建立

epoll实现TCP Client + Server

1.在Linux下用epoll实现网络连接比较简单，流程图如上。

Go调用Linux系统调用实现TCP Client + Server

2.在上面我们已经知道Go在Linux下实现网络连接最终是调用Linux下的系统调用。分析Go源码时，可以尝试去对应C实现网络连接的流程。
3.其中scoket函数里调用了Linux相关的 socket，bind，listen，connect等系统调用。
4.poll_runtime_pollOpen调用了Linux相关的epoll_ctl系统调用。
5.以上的流程图侧重于类比Go和C实现网络连接的流程，所以有部分分支未深入，如socket中调用系统调用初始化socket的部分。
PS：accept也是连接建立的一部分，但accept本质也是对监听套接字的数据读写，因此accept部分在一部数据读写部分进行分析。

epoll_create的调用

6.epoll句柄在一个进程中最多只需要一个。当调用pollDesc.init时，说明该进程有网络相关的操作，于是通过sync.Once调用runtime_pollServerinit进行初始化，该函数最终会调用epoll_create创建epoll句柄。
7.使用epoll_ctl注册监听事件。

func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {    pd := pollcache.alloc() // 分配pollDesc    lock(&pd.lock) // 初始化 pollDesc    if pd.wg != 0 && pd.wg != pdReady {        throw("runtime: blocked write on free polldesc")    }    if pd.rg != 0 && pd.rg != pdReady {        throw("runtime: blocked read on free polldesc")    }    pd.fd = fd    pd.closing = false    pd.everr = false    pd.rseq++    pd.rg = 0    pd.rd = 0    pd.wseq++    pd.wg = 0    pd.wd = 0    unlock(&pd.lock)    var errno int32    errno = netpollopen(fd, pd) // epoll_ctl 注册监听事件    return pd, int(errno)}func netpollopen(fd uintptr, pd *pollDesc) int32 {    var ev epollevent    ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET // 4种类型的事件都需要注册    *(**pollDesc)(unsafe.Pointer(&ev.data)) = pd    return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev) // 调用epoll_ctl}

异步的数据读写

异步读写数据

1.简单分析accept，read，write的调用场景。accept是监听套接字等待客户端连接，因此同步必然会导致阻塞。read是读取对方socket发过来的数据，因此也必须是异步读取，否则阻塞。write是往缓冲区写数据，如果缓冲区满则返回写入的数量，因此最好是异步写，否则缓冲区满的情况下也会阻塞。
2.调用accept，read，write这些函数最终都会调用runtime的poll_runtime_pollWait函数，源码如下。

func poll_runtime_pollWait(pd *pollDesc, mode int) int {    err := netpollcheckerr(pd, int32(mode))    if err != 0 {        return err    }    // As for now only Solaris, illumos, and AIX use level-triggered IO.    if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {        netpollarm(pd, mode)    }    for !netpollblock(pd, int32(mode), false) { // 最终调用netpollblock阻塞        err = netpollcheckerr(pd, int32(mode))        if err != 0 {            return err        }        // Can happen if timeout has fired and unblocked us,        // but before we had a chance to run, timeout has been reset.        // Pretend it has not happened and retry.    }    return 0}func netpollcheckerr(pd *pollDesc, mode int32) int {    if pd.closing {        return 1 // ErrFileClosing or ErrNetClosing 如果套接字关闭或即将关闭    }    if (mode == 'r' && pd.rd < 0) || (mode == 'w' && pd.wd < 0) {        return 2 // ErrTimeout 如果套接字已经超时（定了事件的超时时间且过了超时时间）    }    // Report an event scanning error only on a read event.    // An error on a write event will be captured in a subsequent    // write call that is able to report a more specific error.    if mode == 'r' && pd.everr {        return 3 // ErrNotPollable epoll_wati发生_EPOLLERR错误    }    return 0}func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {    gpp := &pd.rg    if mode == 'w' {        gpp = &pd.wg    }    // set the gpp semaphore to WAIT    for {        old := *gpp        if old == pdReady {            *gpp = 0            return true        }        if old != 0 {            throw("runtime: double wait")        }        if atomic.Casuintptr(gpp, 0, pdWait) {            break        }    }    // need to recheck error states after setting gpp to WAIT    // this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl    // do the opposite: store to closing/rd/wd, membarrier, load of rg/wg    if waitio || netpollcheckerr(pd, mode) == 0 { // 如果可以进入休眠 调用gopark G进入waitting状态        gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)    }    // be careful to not lose concurrent READY notification    old := atomic.Xchguintptr(gpp, 0)    if old > pdWait {        throw("runtime: corrupted polldesc")    }    return old == pdReady}

3.由上面的源码可知，套接字的读写最终会调用gopark使得协程进入waitting状态。因此需要在网络事件完成时，调用ready唤醒协程，让网络协程继续运行读写socket。

SetDeadline使用场景。如果不设置超时时间，那么G会一直阻塞在读写中，直到注册的事件发生。如果业务层要处理长时间没有读写请求的G，不设置超时时间是无法实现的。

调度有网络消息的G

1.对比C用epoll写的TCP客户端或服务器，需要轮询调用epoll_wait，处理触发了读写事件的socket。同理Go这边也需要有对应的代码轮询调用epoll_wait，处理socket的读写事件。
2.轮询调用epoll_wait的函数是netpoll函数，在sysmon中和findrunnable中都有调用该函数（忽略了GC中的调用）。

        lastpoll := int64(atomic.Load64(&sched.lastpoll))        if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {            atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))            list := netpoll(0) // non-blocking - returns list of goroutines            if !list.empty() { // 触发读写事件的G的List                incidlelocked(-1)                injectglist(&list)                incidlelocked(1)            }        }    if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {        if list := netpoll(0); !list.empty() { // non-blocking            gp := list.pop() // 取触发读写事件的G的List中的首个元素继续运行            injectglist(&list)            casgstatus(gp, _Gwaiting, _Grunnable)            if trace.enabled {                traceGoUnpark(gp, 0)            }            return gp, false        }    }

3.netpoll主要是调用epoll_wait，将已经触发了读写事件的G取出并串联在gList中。

func netpoll(delay int64) gList {    if epfd == -1 {        return gList{}    }    var waitms int32    if delay < 0 {        waitms = -1    } else if delay == 0 {        waitms = 0    } else if delay < 1e6 {        waitms = 1    } else if delay < 1e15 {        waitms = int32(delay / 1e6)    } else {        // An arbitrary cap on how long to wait for a timer.        // 1e9 ms == ~11.5 days.        waitms = 1e9    }    var events [128]epolleventretry:    n := epollwait(epfd, &events[0], int32(len(events)), waitms)    if n < 0 { // 出现错误        if n != -_EINTR {            println("runtime: epollwait on fd", epfd, "failed with", -n)            throw("runtime: netpoll failed")        }        // If a timed sleep was interrupted, just return to        // recalculate how long we should sleep now.        if waitms > 0 { // 如果设定了大于0的等待时间 不阻塞 直接退出            return gList{}        }        goto retry // 如果超时时间小于等于0 则再次尝试    }    var toRun gList    for i := int32(0); i < n; i++ { // 取出触发读写事件socket绑定的G        ev := &events[i]        if ev.events == 0 {            continue        }        if *(**uintptr)(unsafe.Pointer(&ev.data)) == &netpollBreakRd {            if ev.events != _EPOLLIN {                println("runtime: netpoll: break fd ready for", ev.events)                throw("runtime: netpoll: break fd ready for something unexpected")            }            if delay != 0 {                // netpollBreak could be picked up by a                // nonblocking poll. Only read the byte                // if blocking.                var tmp [16]byte                read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))            }            continue        }        var mode int32        if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {            mode += 'r'        }        if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {            mode += 'w'        }        if mode != 0 {            pd := *(**pollDesc)(unsafe.Pointer(&ev.data))            pd.everr = false            if ev.events == _EPOLLERR {                pd.everr = true            }            netpollready(&toRun, pd, mode) // 将触发读写事件的socket绑定的G放入toRun中        }    }    return toRun}func netpollready(toRun *gList, pd *pollDesc, mode int32) {    var rg, wg *g    if mode == 'r' || mode == 'r'+'w' {        rg = netpollunblock(pd, 'r', true)    }    if mode == 'w' || mode == 'r'+'w' {        wg = netpollunblock(pd, 'w', true)    }    if rg != nil {        toRun.push(rg)    }    if wg != nil {        toRun.push(wg)    }}func netpollunblock(pd *pollDesc, mode int32, ioready bool) *g {    gpp := &pd.rg    if mode == 'w' {        gpp = &pd.wg    }    for {        old := *gpp        if old == pdReady {            return nil        }        if old == 0 && !ioready {            // Only set READY for ioready. runtime_pollWait            // will check for timeout/cancel before waiting.            return nil        }        var new uintptr        if ioready {            new = pdReady        }        if atomic.Casuintptr(gpp, old, new) {            if old == pdReady || old == pdWait {                old = 0            }            return (*g)(unsafe.Pointer(old))        }    }}

上面的源码都比较简单，调用epoll_wait找到触发读写事件的socket和绑定的pollDesc，并根据pollDesc找到对应的G。将G放入gList中，再调度gList中的G，调用ready使G从waitting进入runnable状态。

带超时时间的读写

设置超时时间，主要是增加一个timer，设置timer的到期时间和回调函数。

func poll_runtime_pollSetDeadline(pd *pollDesc, d int64, mode int) {    lock(&pd.lock)    if pd.closing {        unlock(&pd.lock)        return    }    rd0, wd0 := pd.rd, pd.wd    combo0 := rd0 > 0 && rd0 == wd0    if d > 0 {        d += nanotime() // d为到期时间        if d <= 0 {            // If the user has a deadline in the future, but the delay calculation            // overflows, then set the deadline to the maximum possible value.            d = 1<<63 - 1        }    }    if mode == 'r' || mode == 'r'+'w' {        pd.rd = d    }    if mode == 'w' || mode == 'r'+'w' {        pd.wd = d    }    combo := pd.rd > 0 && pd.rd == pd.wd    rtf := netpollReadDeadline // 绑定到期后的处理函数     if combo {        rtf = netpollDeadline    }    if pd.rt.f == nil {        if pd.rd > 0 {            pd.rt.f = rtf            // Copy current seq into the timer arg.            // Timer func will check the seq against current descriptor seq,            // if they differ the descriptor was reused or timers were reset.            pd.rt.arg = pd            pd.rt.seq = pd.rseq            resettimer(&pd.rt, pd.rd) // 新增read timer        }    } else if pd.rd != rd0 || combo != combo0 {         pd.rseq++ // invalidate current timers        if pd.rd > 0 { // 修改原有的read timer            modtimer(&pd.rt, pd.rd, 0, rtf, pd, pd.rseq)        } else { // 删除原有的timer            deltimer(&pd.rt)            pd.rt.f = nil        }    }    if pd.wt.f == nil {        if pd.wd > 0 && !combo {            pd.wt.f = netpollWriteDeadline // 绑定到期后的处理函数             pd.wt.arg = pd            pd.wt.seq = pd.wseq            resettimer(&pd.wt, pd.wd) // 新增timer        }    } else if pd.wd != wd0 || combo != combo0 {        pd.wseq++ // invalidate current timers        if pd.wd > 0 && !combo { // 修改原有的timer            modtimer(&pd.wt, pd.wd, 0, netpollWriteDeadline, pd, pd.wseq)        } else { // 删除原有的timer             deltimer(&pd.wt)            pd.wt.f = nil        }    }    // If we set the new deadline in the past, unblock currently pending IO if any.    var rg, wg *g    if pd.rd < 0 || pd.wd < 0 { // 如果超时时间 < 0  则尝试将对应的G取出并设置为runnable        atomic.StorepNoWB(noescape(unsafe.Pointer(&wg)), nil) // full memory barrier between stores to rd/wd and load of rg/wg in netpollunblock        if pd.rd < 0 {            rg = netpollunblock(pd, 'r', false)        }        if pd.wd < 0 {            wg = netpollunblock(pd, 'w', false)        }    }    unlock(&pd.lock)    if rg != nil {        netpollgoready(rg, 3)    }    if wg != nil {        netpollgoready(wg, 3)    }}

2.读写timer到期后的回调处理函数，设置读写的超时时间并且唤醒协程，源码如下。

func netpolldeadlineimpl(pd *pollDesc, seq uintptr, read, write bool) {    lock(&pd.lock)    // Seq arg is seq when the timer was set.    // If it's stale, ignore the timer event.    currentSeq := pd.rseq    if !read {        currentSeq = pd.wseq    }    if seq != currentSeq { // 如果是先触发了读写事件 再触发超时 则序列号不相等. 此时需要唤醒G        // The descriptor was reused or timers were reset.        unlock(&pd.lock)        return    }    var rg *g    if read {        if pd.rd <= 0 || pd.rt.f == nil {            throw("runtime: inconsistent read deadline")        }        pd.rd = -1 // 设置超时时间 和 回调处理函数        atomic.StorepNoWB(unsafe.Pointer(&pd.rt.f), nil) // full memory barrier between store to rd and load of rg in netpollunblock        rg = netpollunblock(pd, 'r', false)    }    var wg *g    if write {        if pd.wd <= 0 || pd.wt.f == nil && !read {            throw("runtime: inconsistent write deadline")        }        pd.wd = -1 // 设置超时时间 和 回调处理函数        atomic.StorepNoWB(unsafe.Pointer(&pd.wt.f), nil) // full memory barrier between store to wd and load of wg in netpollunblock        wg = netpollunblock(pd, 'w', false)    }    unlock(&pd.lock)    if rg != nil {        netpollgoready(rg, 0) // 唤醒read协程    }    if wg != nil {        netpollgoready(wg, 0) // 唤醒write协程    }}

rg/wg值的互相转换。rg/wg初始值为0，调用gopark前，rg/wg的值为pdWait，调用gopark时，将rg/wg值设置为G。唤醒前，如果是通过epoll_wait中注册的事件唤醒，在netpollunblock中被设置为pdReady。如果是在timer超时中被唤醒，则在netpollunblock中设置为0。唤醒后，pdWait设置为0。
rg/wg值的状态转换
SetDeadline, SetReadDeadline, SetWriteDeadline这3个函数会修改pollDesc中的rd，wd (read deadline, write deadline)，rd/wd函数的值的设置和含义如下。

rd/wd的值	何时设置	含义
0	初始化时	gopark使协程陷入waitting后，只会通过epoll_wait唤醒协程
>0	调用SetDealLine(read/write)后	设置timer唤醒协程，gopark使协程陷入waitting后，既可以由epoll_wait唤醒，也可以由timer唤醒
<0	调用SetDeadLine后阻塞在读写中，后由timer唤醒	该连接的读写超时，无法再异步读写

总结

1.Go中Linux下epoll实现的TCP连接和C实现的epoll的模型是一样的，相当于用Go的语法和特性为epoll模型做了一层封装。
2.学习的本质是将陌生的东西转化为熟悉的东西，把陌生的模型和熟悉的模型进行类比，能更快的学会陌生的知识。
3.Go提供的TCP连接读写接口是同步阻塞的，同步表示调用结束后，要么读写成功，要么超时，阻塞表示该接口返回的时间是不确定的，上层业务需要封装读写。

智云一二三科技

Go net包剖析

目录

声明

epoll基础

net包到底层类图

连接的建立

异步的数据读写

调度有网络消息的G

带超时时间的读写

总结

关于作者: 智云科技

目录

声明

epoll基础

net包到底层类图

连接的建立

异步的数据读写

调度有网络消息的G

带超时时间的读写

总结

给这篇文章的作者打赏

关于作者: 智云科技

相关文章

Linux性能及调优指南之Linux进程管理

详解SpringIOC容器的基本配置

Goland配置GOROOT无法识别已安装的go

热门文章

1分享新浪图床上传接口源码

2PHP简单实现路由Route功能

3Tideways、xhprof 和 xhgui 打造 PHP 非侵入式监控平台

4centos系统如何查看是否安装了mysql

5curl 工具简述