[CFT] Major pfsync(4) Rewrite on the Horizon

Contributed by Peter N. M. Hansteen on 2023-06-21 from the mekmisyncdistates dept.

A major rewrite of pfsync(4), the state table synchronization tool for redundant pf(4) setups is in the works.

In a recent message to tech@, David Gwynne (dlg@) describes the multi-year process behind the diff contained in the message,

moving pf forward has been a real struggle, and pfsync has been a
constant source of pain. we have been papering over the problems
for a while now, but it reached the point that it needed a fundamental
restructure, which is what this diff is.

i started rewriting pfsync (again) during h2k22 last year, and it's
only been in the last couple of months that i got all the existing
functionality working again, and it's only been the last three weeks in
particular that it's been solid. this is the first time since about
openbsd 6.9 that i've been able to upgrade my production firewalls
without them falling over.

which means there may still be rough edges, but testing by brave souls is encouraged. There are huge potential performance gains to be found if this works out right.

You can read the entire message (with the diff) here, or just take in the rest of the text after the fold.

unfortunately the diff is very big. we have tried to evolve and slowing
improve the locking in pfsync and pf, but because of how twisted
together they are we have often caused more problems then we've solved.

the big headliner changes in this diff are:

- pfsync specific locks

this is the whole reason for this diff.

rather than rely on NET_LOCK or KERNEL_LOCK or whatever, pfsync now has
it's own locks to protect it's internal data structures. this is
important because pfsync runs a bunch of timeouts and tasks to push
pfsync packets out on the wire, or when it's handling requests
generated by incoming pfsync packets, both of which happen outside
pf itself running.  having pfsync specific locks around pfsync data
structures makes the mutations of these data structures a lot more
explicit and auditable.

- partitioning

to enable future parallelisation of the network stack, this rewrite
includes support for pfsync to partition states into different "slices".
these slices run independently, ie, the states collected by one slice
are serialised into a separate packet to the states collected and
serialised by another slice.

states are mapped to pfsync slices based on the pf state hash, which
is the same hash that the rest of the network stack and multiq
hardware uses.

- no more pfsync called from netisr

pfsync used to be called from netisr to try and bundle packets, but now
that there's multiple pfsync slices this doesnt make sense. instead it
uses tasks in softnet tqs.

- improved bulk transfer handling

there's shiny new state machines around both the bulk transmit and
receive handling. pfsync used to do horrible things to carp demotion
counters, but now it is very predictable and returns the counters back
where they started.

- better tdb handling

the tdb handling was pretty hairy, but hrvoje has kicked this around
a lot with ipsec and sasyncd and we've found and fixed a bunch of
issues as a result of that testing.

- mpsafe pf state purges

this was committed previously, but because the locks pfsync relied on
were'nt clear this just caused a ton of bugs. as part of this diff it's
now reliable, and moves a big chunk of work out from under KERNEL_LOCK,
which in turn improves the responsiveness and throughput of a firewall.

there's a bunch of other little changes along the way, but the above are
the big ones.

hrvoje has done performance testing with this diff and notes a big
improvement when pfsync is not in use. performance when pfsync is
enabled is about the same, but im hoping the slices means we can scale
along with pf as it improves.

if you want pf and network stack performance to improve, can you please
test this diff and let me know how it goes?

The rest of the message contains the diff (which requires a recent -current) for testing.

We expect this to have interesting followups, and we at Undeadly very much look forward to test results of all kinds, which may or may not come from you, dear reader.

(Comments are closed)

Latest Articles

Sat, Jun 28
- 05:57 Game of Trees 0.115 released (0)
Tue, Jun 24
- 07:48 Game of Trees 0.114 released (0)
- 07:23 Call for testing: bge/bnx/iavf/igc/ix/ixl/ngbe/pcn: ifq_restart() fix (0)
Mon, Jun 16
- 08:22 j2k25 hackathon report from kn@: installer, low battery, and more (0)
Fri, Jun 13
- 11:18 dhcpd(8): use UDP sockets instead of BPF (1)
Thu, Jun 12
- 12:32 clang(1)/llvm/lld(1) updated to version 19 (0)
Wed, Jun 11
- 12:22 Source code sandboxing (0)
Tue, Jun 10
- 06:50 TearFree option backported to modesetting(4) driver (0)
Mon, Jun 09
- 07:32 FFS optimizations with dirhash, as blogged by rsadowski@ (1)

Credits

Copyright © 2004-2008 Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to April 2nd 2004 as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]