Contributed by Peter N. M. Hansteen from the all the packets all at once dept.
The work to improve the capabilities of the network stack is about to take a noticeable step forward. In a message to tech@ titled "parallel raw IP input", Alexander Bluhm (bluhm@) posted a patch that he describes as follows:
List: openbsd-tech
Subject: parallel raw IP input
From: Alexander Bluhm <bluhm () openbsd ! org>
Date: 2024-04-11 20:24:39

Hi,

As mvs@ mentioned, running raw IP in parallel is easier as it is less complex than UDP. Especially there is no socket splicing.

So I fixed one race in rip_input() and reused my shared net lock ip_deliver() loop.

The idea is that ip_deliver() may run with shared or exclusive net lock. The last parameter indicates the mode. If is is running with shared netlock and encounters a protocol that needs exclusive lock, the packet is queued. Before ip_ours() always queued the packet. Now it tries to deliver with shared net lock, and if that is not possible, it queues the packet.

In case we have an IPv6 header chain that must switch from shared to exclusive processing, the next protocol and mbuf offset are stored in a mbuf tag.

The only drawback is that we have very limited test coverage for raw IP.

The ip_deliver() shared locking change works also with UDP, Hvroje has tested it in 2022.

ok?

bluhm
The message was followed by the patch itself, which should apply to a then-recent -current checkout.
A little later, the patch was committed:
CVSROOT: /cvs
Module name: src
Changes by: bluhm@cvs.openbsd.org 2024/04/14 14:46:27

Modified files:
sys/net : if_bridge.c
sys/netinet : in_proto.c ip_input.c ip_var.h
sys/netinet6 : ip6_input.c
sys/sys : mbuf.h protosw.h

Log message:
Run raw IP input in parallel.

Running raw IPv4 input with shared net lock in parallel is less complex than UDP. Especially there is no socket splicing.

New ip_deliver() may run with shared or exclusive net lock. The last parameter indicates the mode. If is is running with shared netlock and encounters a protocol that needs exclusive lock, the packet is queued. Old ip_ours() always queued the packet. Now it calls ip_deliver() with shared net lock, and if that cannot handle the packet completely, the packet is queued and later processed with exclusive net lock.

In case of an IPv6 header chain, that switches from shared to exclusive processing, the next protocol and mbuf offset are stored in a mbuf tag.

OK mvs@
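The shared-then-exclusive delivery described in the commit message can be illustrated with a small stand-alone C model. This is only a sketch under stated assumptions, not the kernel code: the names deliver(), struct packet, parallel_ok[] and the fixed 8-byte header length are all hypothetical, and the three tag_* fields merely stand in for the mbuf tag that records the next protocol and offset.

```c
#include <assert.h>

/* Hypothetical stand-alone model; none of this is OpenBSD kernel code. */

enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };
enum verdict { DELIVERED, QUEUED };
enum proto_id { PR_EXTHDR, PR_RAW, PR_TCP, PR_COUNT };

#define MAXCHAIN 4
#define HDRLEN   8	/* assumption: every header is 8 bytes long */

/* Assumption: which protocol inputs are safe under the shared lock. */
static const int parallel_ok[PR_COUNT] = {
	[PR_EXTHDR] = 1,
	[PR_RAW] = 1,
	[PR_TCP] = 0,
};

struct packet {
	enum proto_id chain[MAXCHAIN];	/* header chain, in order */
	int nchain;			/* number of entries in chain[] */
	int handled;			/* headers processed so far */
	int tag_valid;			/* models the mbuf tag being present */
	int tag_next;			/* chain index to resume at */
	int tag_off;			/* byte offset to resume at */
};

/*
 * Walk the header chain.  In shared mode, stop at the first protocol
 * that needs the exclusive lock, remember where to resume, and report
 * that the packet has to be queued.  In exclusive mode, run to the end.
 */
enum verdict
deliver(struct packet *p, enum lock_mode mode)
{
	int i = 0, off = 0;

	assert(p->nchain >= 0 && p->nchain <= MAXCHAIN);

	/* Resume a packet that was queued earlier. */
	if (p->tag_valid) {
		i = p->tag_next;
		off = p->tag_off;
		p->tag_valid = 0;
	}
	for (; i < p->nchain; i++) {
		if (mode == LOCK_SHARED && !parallel_ok[p->chain[i]]) {
			p->tag_valid = 1;
			p->tag_next = i;
			p->tag_off = off;
			return QUEUED;
		}
		p->handled++;	/* stands in for the protocol input call */
		off += HDRLEN;
	}
	return DELIVERED;
}
```

In this model a caller would first try deliver(&pkt, LOCK_SHARED) and, only if that returns QUEUED, hand the packet to a queue whose consumer later calls deliver(&pkt, LOCK_EXCLUSIVE). Headers that were already handled under the shared lock are not processed a second time.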
Via email, bluhm@ added some further explanation:
The commit from January is sending UDP in parallel. Socket send, when called from userland, uses shared net lock. You need multiple UDP sockets and threads writing to them to see an effect.

Now we are working on parallel input. When traffic is directed to different IP or ports, the network hardware can distribute flows to different receive queues. These queues are processed by one CPU each. Goal is to keep procssing parallel until data reaches userland.

IP input and forward runs parallel for a while. Until last week all protocol input was single threaded. Now raw IP can run in parallel.

Next step is UDP input in parallel. It kind of works, but locking in socket splicing is wrong. In my experiments I see increase in UDP througput of factor 4 to 7. But the locking problems are quite nasty. I think we need more tests that agressively splice and unsplice sockets. Advantage of UDP over raw IP would be that testing is much easier.

Final protocol will be TCP, but that is hardest of all. Single stream TCP performance already got a performance boost by hardware offloading.

hardware receive => IP input -> protocol input => userland => protocol output => IP output => hardware transmit

bluhm@ is working on -> protocol input to make it parallel for more protocols. It is the final bottle neck.

mvs@ is looking at => to and from userland. They behave differently for UNIX domain, raw IP, UDP, and TCP, the latter is still single threaded.
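To make the remark about flows and receive queues concrete, here is a minimal C sketch of how a flow's addresses and ports can be mapped to a queue index, so that all packets of one flow land on the same queue and therefore the same CPU. Everything in it is an assumption for illustration: struct flow and flow_to_queue() are hypothetical names, and the FNV-1a hash is only a stand-in, since real network hardware uses its own hash function (Toeplitz-based receive side scaling is a common choice).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow description; not a kernel or driver structure. */
struct flow {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Fold the low nbytes bytes of v into an FNV-1a hash state. */
static uint32_t
mix(uint32_t h, uint32_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;		/* 32-bit FNV prime */
	}
	return h;
}

/*
 * Map a flow to one of nqueues receive queues.  The same flow always
 * yields the same queue; different flows are spread by the hash.
 */
unsigned int
flow_to_queue(const struct flow *f, unsigned int nqueues)
{
	uint32_t h = 2166136261u;	/* 32-bit FNV offset basis */

	assert(nqueues > 0);

	h = mix(h, f->src_ip, 4);
	h = mix(h, f->dst_ip, 4);
	h = mix(h, f->src_port, 2);
	h = mix(h, f->dst_port, 2);
	return h % nqueues;
}
```

Because the queue is a pure function of the flow, traffic to a single address and port pair stays on one queue, which is why the quoted mail says that traffic must be directed to different IPs or ports before the hardware can spread it across CPUs.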
This all boils down to faster packet processing, thanks to the system's steadily growing ability to use multiple cores for network traffic.
Testing is of course still appreciated, but this code is in any case destined for the next release.