Contributed by tj on from the gotta-go-fast dept.
My name is Hrvoje Popovski. I'm a husband, a father of three little kids, and a network engineer at the University of Zagreb University Computing Centre (SRCE). Somewhere around the beginning of 2015, I got one server to play with that luckily had four em(4) (Intel I350) and two ix(4) (Intel 82599) onboard NICs. Around that time, developers started to throw out some interesting MP diffs, and I couldn't resist trying them. So I started to beg my boss and the people around me to buy or lend me some PCs or servers to generate traffic for testing the MP diffs. I don't know how, but two Dell servers came to my lab...
I only had em(4) and ix(4), but since the myx(4) driver was where the fun was, I bought a card that uses it on eBay. At one point I suspected that bge(4) was faster than em(4), and for just a few dollars I bought a dual-port BCM5720. Currently I have em, bge, ix and myx cards in that box. The variety of network cards lets me monitor the progress and performance of the MP networking work. I love that box.

The generator and receiver in the examples can be one or two servers or any desktop PC that you have handy. In this case, the generator is running Ubuntu 14.04 LTS. For generating UDP traffic up to 14Mpps, I'm using pktgen. For generating TCP traffic up to 12Mpps, I'm using trafgen from netsniff-ng. For counting packets, I'm using ifpps on the sending and receiving Linux interfaces, together with netstat -i 1 on the OpenBSD box.
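For anyone who wants to reproduce the UDP side of this, here is a minimal pktgen sketch. It is not Hrvoje's actual script: the sending interface (eth1), the destination address and the destination MAC are placeholders you must replace with your own, and the tuning is the bare minimum rather than anything authoritative.

    # Load pktgen and bind the sending NIC to the first pktgen kernel thread.
    modprobe pktgen
    echo "rem_device_all"  > /proc/net/pktgen/kpktgend_0
    echo "add_device eth1" > /proc/net/pktgen/kpktgend_0

    # 64-byte frames on the wire (pkt_size excludes the 4-byte CRC),
    # sent forever (count 0) with no inter-packet delay.
    echo "pkt_size 60"               > /proc/net/pktgen/eth1
    echo "count 0"                   > /proc/net/pktgen/eth1
    echo "delay 0"                   > /proc/net/pktgen/eth1
    echo "clone_skb 1000"            > /proc/net/pktgen/eth1
    echo "dst 10.0.0.2"              > /proc/net/pktgen/eth1
    echo "dst_mac 00:00:00:00:00:01" > /proc/net/pktgen/eth1

    # Start the flood; this blocks until interrupted.  Watch the actual
    # send rate with "ifpps eth1" in another terminal on the generator.
    echo "start" > /proc/net/pktgen/pgctrl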
I'm also starting to use the Cisco TRex generator, and once the basic setup is done, I think this tool could be very good for testing MP PF performance.
This simple lab can get quite aggressive when playing with pktgen or the TRex generator, and it's easy to set up and to send bugs to developers! Throughout the 5.8 and 5.9 development cycles, I had a setup like this:
[Image: OpenBSD net lab, fully loaded.]
IBM X3550 M4
2 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, 2400.00 MHz

/etc/rc.conf.local:

    pf=NO

sysctl settings:

    ddb.console=1
    kern.pool_debug=0
    net.inet.ip.forwarding=1
    net.inet.ip.ifq.maxlen=8192
    kern.maxclusters=32768
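To flip the same knobs on a live system before making them permanent, the usual sysctl(8) and pfctl(8) invocations are enough; this is a generic OpenBSD sketch rather than something taken from Hrvoje's notes.

    # Disable pf for the duration of the test (pf=NO only takes effect at boot).
    pfctl -d

    # Apply the tunables at runtime; the values match the listing above.
    sysctl ddb.console=1
    sysctl kern.pool_debug=0
    sysctl net.inet.ip.forwarding=1
    sysctl net.inet.ip.ifq.maxlen=8192
    sysctl kern.maxclusters=32768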
And here are the net performance results. The generator is pktgen with 64-byte UDP packets.
ix - Intel 82599
   OpenBSD 5.8                   |   OpenBSD 5.9
   send      receive   receive   |   send      receive   receive
             routed    bridged   |             routed    bridged
   200kpps   200kpps   200kpps   |   200kpps   200kpps   200kpps
   400kpps   400kpps   400kpps   |   400kpps   400kpps   400kpps
   600kpps   600kpps   520kpps   |   600kpps   600kpps   600kpps
   700kpps   700kpps   520kpps   |   700kpps   700kpps   620kpps
   800kpps   550kpps   520kpps   |   800kpps   690kpps   570kpps
   1Mpps     640kpps   520kpps   |   1Mpps     680kpps   490kpps
   1.4Mpps   640kpps   520kpps   |   1.4Mpps   680kpps   280kpps
   4Mpps     640kpps   520kpps   |   4Mpps     680kpps   280kpps
   8Mpps     640kpps   520kpps   |   8Mpps     680kpps   280kpps
   14Mpps    640kpps   520kpps   |   14Mpps    680kpps   280kpps
myx - 10G-PCIE2-8BL2-2S
   OpenBSD 5.8                   |   OpenBSD 5.9
   send      receive   receive   |   send      receive   receive
             routed    bridged   |             routed    bridged
   200kpps   200kpps   200kpps   |   200kpps   200kpps   200kpps
   400kpps   400kpps   400kpps   |   400kpps   400kpps   400kpps
   600kpps   600kpps   510kpps   |   600kpps   600kpps   580kpps
   700kpps   630kpps   510kpps   |   700kpps   640kpps   580kpps
   800kpps   630kpps   510kpps   |   800kpps   640kpps   580kpps
   1Mpps     630kpps   510kpps   |   1Mpps     640kpps   580kpps
   1.4Mpps   630kpps   510kpps   |   1.4Mpps   640kpps   580kpps
   4Mpps     630kpps   510kpps   |   4Mpps     640kpps   360kpps
   8Mpps     630kpps   520kpps   |   8Mpps     640kpps   360kpps
   14Mpps    630kpps   530kpps   |   14Mpps    640kpps   360kpps
em - Intel I350
   OpenBSD 5.8                   |   OpenBSD 5.9
   send      receive   receive   |   send      receive   receive
             routed    bridged   |             routed    bridged
   200kpps   200kpps   200kpps   |   200kpps   200kpps   200kpps
   400kpps   400kpps   400kpps   |   400kpps   400kpps   400kpps
   600kpps   600kpps   500kpps   |   600kpps   600kpps   600kpps
   700kpps   620kpps   500kpps   |   700kpps   700kpps   640kpps
   800kpps   620kpps   500kpps   |   800kpps   700kpps   600kpps
   1Mpps     620kpps   500kpps   |   1Mpps     700kpps   510kpps
   1.4Mpps   620kpps   500kpps   |   1.4Mpps   700kpps   410kpps
bge - Broadcom BCM5720
   OpenBSD 5.8                   |   OpenBSD 5.9
   send      receive   receive   |   send      receive   receive
             routed    bridged   |             routed    bridged
   200kpps   200kpps   200kpps   |   200kpps   200kpps   200kpps
   400kpps   400kpps   400kpps   |   400kpps   400kpps   400kpps
   600kpps   600kpps   545kpps   |   600kpps   600kpps   600kpps
   700kpps   660kpps   545kpps   |   700kpps   700kpps   650kpps
   800kpps   660kpps   545kpps   |   800kpps   720kpps   600kpps
   1Mpps     660kpps   545kpps   |   1Mpps     720kpps   520kpps
   1.4Mpps   660kpps   545kpps   |   1.4Mpps   710kpps   440kpps
Thanks for the detailed stats, Hrvoje.
Much work has been done, but there's still plenty left to do. Even in 5.9, the network stack is still under KERNEL_LOCK. The improvements shown here mainly come from the fact that the driver interrupt handler can run at the same time now. We're excited to see what possibilities will open when the work is complete and the lock can be removed. Still, it's good to see that there's steady progress being made in one of the areas everyone seems to be interested in.
(Comments are closed)
By Anonymous Coward (133.237.7.86) on
Also strange is that the numbers for routed are higher than for bridged...
By phessler (92.206.52.143) phessler@openbsd.org on
> Also strange is that the numbers for routed are higher than for bridged...
Not strange at all. Routed is the true way for packets to flow through the system.
By karchnu (2001:660:4701:1001:de4a:3eff:fe01:3b44) karchnu@karchnu.fr on https://karchnu.fr
That's a very important step backward, iiuc. Can someone give us more details about it?
By Martin (93.129.212.227) mpi@openbsd.org on
> That's a very important step backward, iiuc. Can someone give us more details about it?
How is it backward? Aren't you seeing an improvement in bridged mode too?
But maybe by "more details" you're asking why, in bridged mode, the number of transmitted packets per second drops when the sending side saturates the machine. If that's your question, then I'd say it's certainly due to the extra queue introduced for bridge(4). This queue is only temporary. It is here to divide the network stack into independent pieces that can be taken out of the KERNEL_LOCK one at a time.
By rjc (rjc) on
> > That's a very important step backward, iiuc. Can someone give us more details about it?
>
> How is it backward? Aren't you seeing an improvement in bridged mode too?
>
> But maybe by "more details" you're asking why, in bridged mode, the number of transmitted packets per second drops when the sending side saturates the machine. If that's your question, then I'd say it's certainly due to the extra queue introduced for bridge(4). This queue is only temporary. It is here to divide the network stack into independent pieces that can be taken out of the KERNEL_LOCK one at a time.
Hi Martin,
Some (including myself) see the lower number and automatically think "bad", so thank you for clarifying.
Cheers,
Raf
By Blake (2001:1b48:4:1337:cabc:c8ff:fe96:6d2f) on 2112.net
I'd be very interested to know if the fixed-length lookup for MPLS packets increases the performance significantly vs. variable-length IPv6/IPv4...
By mxb (104.233.108.86) on
Good to know.