Contributed by dhartmei on Wed Sep 27 08:05:17 2006 (GMT)
from the eleven-releases-with-pf dept.
This is the first installment in a series of three articles about PF.
I originally wrote them as chapters for a book, but then publication
was cancelled.
Luckily, the rights could be salvaged, and now you get to enjoy
them as undeadly.org exclusives.
In celebration of the upcoming
OpenBSD 4.0 release. ;)
- Firewall Ruleset Optimization
- Goals
- The significance of packet rate
- When pf is the bottleneck
- Filter statefully
- The downside of stateful filtering
- Ruleset evaluation
- Ordering rulesets to maximize skip steps
- Use tables for address lists
- Use quick to abort ruleset evaluation when rules match
- Anchors with conditional evaluation
- Let pfctl do the work for you
- Testing Your Firewall (read)
- Firewall Management (read)
Firewall Ruleset Optimization
Goals
Ideally, the operation of a packet filter should not affect legitimate
network traffic. Packets violating the filtering policy should
be blocked, and compliant packets should pass the device as if the
device wasn't there at all.
In reality, several factors limit how well a
packet filter can achieve that goal. Packets have to pass through the
device, adding some amount of latency between the time a packet is
received and the time it is forwarded. Any device can only process a
finite amount of packets per second. When packets arrive at a higher
rate than the device can forward them, packets are lost.
Most protocols, like TCP, deal well with added latency. You can achieve
high TCP transfer rates even over links that have several hundred
milliseconds of latency. On the other hand, in interactive network
gaming even a few tens of milliseconds are usually perceived as too
much. Packet loss is generally a worse problem, TCP performance will
seriously deteriorate when a significant number of packets are lost.
This article explains how to identify when pf is becoming the limiting
factor in network throughput and what can be done to improve performance
in this case.
The significance of packet rate
One commonly used unit to compare network performance is throughput in
bytes per second. But this unit is completely inadequate to measure pf
performance. The real limiting factor isn't throughput but packet rate,
that is the number of packets per second the host can process. The same
host that handles 100Mbps of 1500 byte packets without breaking a sweat
can be brought to its knees by a mere 10Mbps of 40 byte packets.
The former amounts to only 8,000 packets/second, but the latter
traffic stream amounts to
32,000 packets/second, which causes roughly four times the amount of
work for the host.
To understand this, let's look at how packets actually pass through the
host. Packets are received from the wire by the network interface card
(NIC) and read into a small memory buffer on the NIC. When that buffer
is full, the NIC triggers a hardware interrupt, causing the NIC driver
to copy the packets into network memory buffers (mbufs) in kernel
memory. The packets are then passed through the TCP/IP stack in form of
these mbufs. Once a packet is transferred into an mbuf, most operations
the TCP/IP stack performs on the packet are not dependant on the packet size,
as these operations only inspect the packet headers and not the payload.
This is also true for pf, which gets passed one packet at a time and
makes the decision of whether to block it or pass it on.
If the packet needs forwarding, the TCP/IP stack will pass it to a
NIC driver, which will extract the packet from the mbuf and put it back
onto the wire.
Most of these operations have a comparatively high cost per packet, but a very
low cost per size of the packet. Hence, processing a large packet is
only slightly more expensive than processing a small packet.
Some limits are based on hardware and software outside of pf. For instance,
i386-class machines are not able to handle much more than 10,000 interrupts
per second, no matter how fast the CPU is, due to architectural constraints.
Some network interface cards will generate one interrupt for each packet
received. Hence, the host will start to lose packets when the packet
rate exceeds around 10,000 packets per second. Other NICs, like more
expensive gigabit cards, have larger built-in memory buffers that allow
them to bundle several packets into one interrupt. Hence, the choice of
hardware can impose some limits that no optimization of pf can surpass.
When pf is the bottleneck
The kernel passes packets to pf sequentially, one after the other.
While pf is being called to decide the fate of one packet, the flow of
packets through the kernel is briefly suspended. During that short period
of time, further packets read off the wire by NICs have to fit into
memory buffers. If pf evaluations take too long, packets will quickly
fill up the buffers, and further packets will be lost. The goal of
optimizing pf rulesets is to reduce the amount of time pf spends for
each packet.
An interesting exercise is to intentionally push the host into this
overloaded state by loading a very large ruleset like this:
$ i=0; while [ $i -lt 100 ]; do \
printf "block from any to %d.%d.%d.%d\n" \
`jot -r -s " " 4 1 255`; \
let i=i+1; \
done | pfctl -vf -
block drop inet from any to 151.153.227.25
block drop inet from any to 54.186.19.95
block drop inet from any to 165.143.57.178
...
This represents a worst-case ruleset that defies all automatic
optimizations. Because each rule contains a different random
non-matching address, pf is forced to traverse the entire ruleset
and evaluate each rule for every packet. Loading a ruleset that
solely consists of thousands of such rules, and then generating
a steady flow of packets that must be filtered, inflicts
noticeable load on even the fastest machine. While the host
is under load, check the interrupt rate with:
$ vmstat -i
And watch CPU states with:
$ top
This will give you an idea of how the host reacts to overloading,
and will help you spot similar symptoms when using your own ruleset.
You can use the same tools to verify effects of optimizations later on.
Then try the other extreme. Completely disable pf like:
$ pfctl -d
Then compare the vmstat and top values.
This is a simple way to get a rough estimate and upper limit on what
to realistically expect from optimization. If your host
handles your traffic with pf disabled, you can aim to achieve similar
performance with pf enabled. However, if the host already shows problems
handling the traffic with pf disabled, optimizing pf rulesets is
probably pointless, and other components should be changed first.
If you already have a working ruleset and are wondering whether you
should spend time on optimizing it for speed, repeat this test
with your ruleset and compare the results with both extreme cases.
If running your ruleset shows effects of overloading, you can use the
guidelines below to reduce those effects.
In some cases, the ruleset shows no significant amount of load on the
host, yet connections through the host show unexpected problems, like
delays during connection establishment, stalling connections or
disappointingly low throughput. In most of these cases, the problem is
not filtering performance at all, but a misconfiguration of the ruleset
which causes packets to get dropped. See
Testing Your Firewall
about how to identify and deal with such problems.
And finally, if your ruleset is evaluated without causing significant
load and everything works as expected, the most reasonable conclusion is
to leave the ruleset as is is. Often, rulesets written in a straight-forward
approach without respect for performance are evaluated efficiently
enough to cause no packet loss. Manual optimizations will only make
the ruleset harder to read for the human maintainer, while having only
insignificant effect on performance.
Filter statefully
The amount of work done by pf mainly consists of two kinds of
operations: ruleset evaluations and state table lookups.
For every
packet, pf first does a state table lookup. If a matching state entry is
found in the state table, the packet is immediately passed. Otherwise pf
evaluates the filter ruleset to find the last matching rule for the
packet which decides whether to block or pass it. If the rule passes the
packet, it can optionally create a state entry using the 'keep state'
option.
When filtering statelessly, without using 'keep state' to create state
entries for connections, every packet causes an evaluation of the
ruleset, and ruleset evaluation is the single most costly operation pf
performs in this scenario. Each packet still causes a state table lookup,
but since the table is empty, the cost of the lookup is basically zero.
Filtering statefully means using 'keep state' in filter rules, so
packets matching those rules will create a state table entry. Further
packets related to the same connections will match the state table
entries and get passed automatically, without evaluations of the ruleset.
In this scenario, only the first packet of each connection causes a
ruleset evaluation, and subsequent packets only cause a state lookup.
Now, a state lookup is much cheaper than a ruleset evaluation. A ruleset
is basically a list of rules which must be evaluated from top to bottom.
The cost increases with every rule in the list, twice as many rules mean
twice the amount of work. And evaluating a single rule can cause
comparison of numerous values in the packet. The state table, on the
other hand, is a tree. The cost of lookup increases only logarithmically with
the number of entries, twice as many states mean only one additional
comparison, a fraction of additional work. And comparison is needed only
for a limited number of values in the packet.
There is some cost to creating and removing
state entries. But assuming the state will match several subsequent
packets and saves ruleset evaluation for them, the sum is much cheaper.
For specific connections like DNS lookups, where each connection only
consists of two packets (one request and one reply), the overhead of
state creation might be worse than two ruleset evaluations. Connections
that consist of more than a handful of packets, like most TCP
connections, will benefit from the created state entry.
In short, you can make ruleset evaluation a per-connection cost instead
of a per-packet cost. This can easily make a factor of 100 or more. For
example, I see the following counters when I run:
$ pfctl -si
State Table Total Rate
searches 172507978 887.4/s
inserts 1099936 5.7/s
removals 1099897 5.7/s
Counters
match 6786911 34.9/s
This means pf gets called about 900 times per second. I'm filtering on
multiple interfaces, so that would mean I'm forwarding about 450 packets
per second, each of which gets filtered twice, once on each interface it
passes through. But ruleset evaluation occurs only about 35 times per
second, and state insertions and deletions only 6 times per second. With
anything but a tiny ruleset, this is very well worth it.
To make sure that you're really creating state for each connection,
search for 'pass' rules which don't use 'keep state', like in:
$ pfctl -sr | grep pass | grep -v 'keep state'
Make sure you have a tight 'block by default' policy, as otherwise
packets might pass not only due to explicit 'pass' rules, but mismatch
all rules and pass by default.
The downside of stateful filtering
The only downside to stateful filtering is that state table entries need
memory, around 256 bytes for each entry. When pf fails to allocate
memory for a new state entry, it blocks the packet that should have
created the state entry instead, and increases an out-of-memory counter
shown by:
$ pfctl -si
Counters
memory 0 0.0/s
Memory for state entries is allocated from the kernel memory pool called
'pfstatepl'. You can use vmstat(8) to show various aspects of pool
memory usage:
$ vmstat -m
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
pfstatepl 256 1105099 0 1105062 183 114 69 127 0 625 62
The difference between 'Requests' and 'Releases' equals the number of
currently allocated state table entries, which should match the counter
shown by:
$ pfctl -si
State Table Total Rate
current entries 36
Other counters shown by pfctl can get reset by pfctl -Fi.
Not all memory of the host is available to the kernel, and the way the
amount of physical RAM affects the amount available to the kernel
depends on architecture and kernel options and version. As of OpenBSD
3.6, an i386 kernel can use up to 256MB of memory. Prior to 3.6, that
limit was much lower for i386. You could have 8GB of RAM in your host,
and still pf would fail to allocate memory beyond a small fraction of
that amount.
To make matters worse, when pf really hits the limit where pool_get(9)
fails, the failure is not as graceful as one might wish. Instead, the
entire system becomes unstable after that point, and eventually crashes.
This really isn't pf's fault, but a general problem with kernel
pool memory management.
To address this, pf itself limits the number of state entries it will
allocate at the same time, using pool_sethardlimit(9), also shown by
vmstat -m output. The default for this limit is 10,000 entries, which is
safe for any host. The limit can be printed with:
$ pfctl -sm
states hard limit 10000
src-nodes hard limit 10000
frags hard limit 500
If you need more concurrent state entries, you can increase the limit in
pf.conf with:
set limit states 10000
The problem is determining a large value that is still safe enough
not to trigger a pool allocation failure. This is still a sore topic, as
there is no simple formula to calculate the value. Basically, you have
to increase the limit and verify the host remains stable after reaching
that limit, by artificially creating many entries.
On the bright side, if you have 512MB or more of RAM, you can now use
256MB for the kernel, which should be safe for at least 500,000 state
entries. And most people consider that a lot of concurrent connections.
Just imagine each of those connections generating just one packet every
ten seconds, and you end up with a packet rate of 50,000 packets/s.
More likely, you don't expect that many states at all. But whatever your
state limit is, there are cases where it will be reached, like during a
denial-of-service attack. Remember, pf will fail closed not open when
state creation fails. An attacker could create state entries until the
limit is reached, just for the purpose of denying service to
legitimate users.
There are several ways to deal with this problem.
You can limit the number of states created from specific rules, for
instance like:
pass in from any to $ext_if port www keep state (max 256)
This would limit the number of concurrent connections to the web server to
256, while other rules could still create state entries. Similarly, the
maximum number of connections per source address can be restricted with:
pass keep state (source-track rule, max-src-states 16)
Once a state entry is created, various timeouts define when it is
removed. For instance:
$ pfctl -st
tcp.opening 30s
The timeout for TCP states that are not yet fully established is
set to 30 seconds.
These timeouts can be lowered to remove state entries more
aggressively. Individual timeout values can be set globally
in pf.conf:
set timeout tcp.opening 20
They can also be set in individual rules, applying only to states
created by these rules:
pass keep state (tcp.opening 10)
There are several pre-defined sets of global timeouts which can
be selected in pf.conf:
set optimization aggressive
Also, there's adaptive timeouts, which means these timeouts are not
constants, but variables which adjust to the number of state entries
allocated. For instance:
set timeout { adaptive.start 6000, adaptive.end 12000 }
pf will use constant timeout values as long as there are less than 6,000
state entries. When there are between 6,000 and 12,000 entries, all
timeout values are linearly scaled from 100% at 6,000 to 0% at 12,000
entries, i.e. with 9,000 entries all timeout values are reduced to 50%.
In summary, you probably can specify a number of maximum states you
expect to support. Set this as the limit for pf. Expect the limit to get
reached during certain attacks, and define a timeout strategy for this
case. In the worst case, pf will drop packets when state insertion
fails, and the out-of-memory counter will increase.
Ruleset evaluation
A ruleset is a linear list of individual rules,
which are evaluated from top to bottom for a given packet. Each rule
either does or does not match the packet, depending on the criteria in
the rule and the corresponding values in the packet.
Therefore, to a first approximation, the cost of ruleset evaluation grows with
the number of rules in the ruleset. This is not
precisely true for reasons we'll get into soon, but the general concept
is correct. A ruleset with 10,000 rules will almost certainly cause much
more load on your host than one with just 100 rules. The most obvious
optimization is to reduce the number of rules.
Ordering rulesets to maximize skip steps
The first reason why ruleset evaluation can be cheaper than evaluating
each individual rule in the ruleset is called skip steps. This is a
transparent and automatic optimization done by pf when the ruleset is
loaded. It's best explained with an example. Imagine you have the
following simple ruleset:
1. block in all
2. pass in on fxp0 proto tcp from any to 10.1.2.3 port 22 keep state
3. pass in on fxp0 proto tcp from any to 10.1.2.3 port 25 keep state
4. pass in on fxp0 proto tcp from any to 10.1.2.3 port 80 keep state
5. pass in on fxp0 proto tcp from any to 10.2.3.4 port 80 keep state
A TCP packet arrives in on fxp0 to destination address 10.2.3.4 on
some port.
pf will start the ruleset evaluation for this packet with the first rule,
which fully matches. Evaluation continues with the second rule, which
matches the criteria 'in', 'on fxp0', 'proto tcp', 'from any' but
doesn't match 'to 10.1.2.3'. So the rule does not match, and evaluation
should continue with the third rule.
But pf is aware that the third and fourth rule also specify the same
criterion 'to 10.1.2.3' which caused the second rule to mismatch. Hence,
it is absolutely certain that the third and fourth rule cannot possibly
match this packet, either, and immediately jumps to the fifth rule,
saving several comparisons.
Imagine the packet under inspection was UDP instead of TCP. The first
rule would have matched, evaluation would have continued with the second
rule. There, the criterion 'proto tcp' would have made the rule mismatch
the packet. Since the subsequent rules also specify the same criterion
'proto tcp' which was found to mismatch the packet, all of them could be
safely skipped, without affecting the outcome of the evaluation.
Here's how pf analyzes your ruleset when you load it. Each rule can
contain a list of criteria like 'to 10.1.2.3', restricting the rule to
match packets with that destination address. For each criteria in each
rule, pf counts the number of rules immediately below that rule which
specify the exact same criterion. This can be zero, when the next rule
does not use the exact same criterion. Or it can be any number up to the
number of remaining rules, when they all specify it. The counted numbers
are stored in memory for later use. They're called skip steps because
they tell pf how many subsequent steps (rules) can be skipped when any
criteria in any rule is found to not match the packet being inspected.
Rule evaluation compares the criteria in the rule against the values in
the packet in a fixed order:
1. interface ('on fxp0')
2. direction ('in', 'out')
3. address family ('inet' or 'inet6')
4. protocol ('proto tcp')
5. source address ('from 10.1.2.3')
6. source port ('from port < 1024')
7. destination address ('to 10.2.3.4')
8. destination port ('to port 80')
If the rule completely matches, evaluation continues on the very next
rule. If the rule does not match, the first criterion from the list above
which mismatches decides which skip step is used. There might be more
than one criterion which mismatches, but only the first one, in the
order of the list above, matters.
Obviously, the order of rules in your ruleset affects the skip step
values calculated for each rule. For instance:
1. pass on fxp0
2. pass on fxp1
3. pass on fxp0
4. pass on fxp1
This ruleset will produce skip steps with value zero for the interface
criterion in each rule, because no adjacent rules contain the same interface
criterion.
Those rules could instead be ordered like:
1. pass on fxp0
2. pass on fxp0
3. pass on fxp1
4. pass on fxp1
The skip step value for the interface criterion would then equal one in
the first and third rule.
This makes a small difference when the ruleset is evaluated for a packet
on fxp2. Before the reordering, all four rules are evaluated because
none of them can be skipped. After the reordering, only rules one and
three need to be evaluated, and rules two and four can be skipped.
The difference may be insignificant in this little example, but imagine
a ruleset containing 1,000 rules which all apply only to two different
interfaces. If you order these rules so all rules applying to one
interface are adjacent, followed by the rules applying to the other
interface, pf can reliably skip 500 rules in each and every evaluation
of the ruleset, reducing the cost of ruleset evaluation to 50%, no
matter what packets your traffic consists of.
Hence, you can help pf maximize its skip steps by ordering your rules
by the criteria in the order they are listed above, i.e. order your
rules by interface first. Within the block of rules for the same
interface, order rules by direction. Within the block for the same
interface and direction, order by address family, etc.
To verify the effects, run
$ pfctl -gsr
pfctl prints the calculated skip step values for each criterion in each
rule, for instance
@18 block return-rst in quick on kue0 proto tcp from any to any port = 1433
[ Skip steps: i=38 d=38 f=41 p=27 sa=48 sp=end da=43 ]
In this output, 'i' stands for interface, 'd' for direction, 'f' for address
family, etc. The 'i=38' part means that packets which don't match 'on
kue0' will skip to rule number 38.
This also affects the number of evaluations counted for each rule, try:
$ pfctl -vsr
pfctl counts how many times each rule has been evaluated, how many
packets and bytes it matched and how many states it created. When a rule
is skipped by skip steps during evaluation, its evaluation counter is
not increased.
Use tables for address lists
The use of lists in curly braces allows to write very compact rules in
pf.conf, like:
pass proto tcp to { 10.1.2.3, 10.2.3.4 } port { ssh, www }
But these lists are not actually loaded into a single rule in the
kernel. Instead, pfctl expands the single input rule to multiple rules
for the kernel, in this case
$ echo "pass proto tcp to { 10.1.2.3, 10.2.3.4 } port { ssh, www }" |
pfctl -nvf -
pass inet proto tcp from any to 10.1.2.3 port = ssh keep state
pass inet proto tcp from any to 10.1.2.3 port = www keep state
pass inet proto tcp from any to 10.2.3.4 port = ssh keep state
pass inet proto tcp from any to 10.2.3.4 port = www keep state
The short syntax in pf.conf betrays the real cost of evaluating it.
Your pf.conf might be only a dozen rules long, but if those expand to
hundreds of rules in kernel, evaluation cost is the same as if you
put those hundreds of rules in pf.conf in the first place. To see what
rules are really being evaluated, check:
$ pfctl -sr
For one specific type of list, addresses, there is a container in
kernel, called 'table'. For example:
pass in from { 10.1.2.3, 10.2.3.4, 10.3.4.5 }
The list of addresses can be expressed as a table:
table <clients> const { 10.1.2.3, 10.2.3.4, 10.3.4.5 }
pass in from <clients>
This construct can be loaded as a single rule (and a table) into the
kernel, whereas the non-table version would expand to three rules.
During evaluation of the rule referencing the table, pf will do a lookup
of the packet's source address in the table to determine whether the
rule matches the packet. This lookup is very cheap, and the cost does
not increase with the number of entries in the table.
If the list of addresses is large, the performance gain of one rule
evaluation with one table lookup vs. one rule evaluation for each address
is significant. As a rule of thumb, tables are cheaper when the list
contains six or more addresses. For a list of 1,000 addresses, the
difference will be factor of 1,000.
Use quick to abort ruleset evaluation when rules match
When a rule does match, pf (unlike other packet filtering products) does
not by default abort ruleset evaluation, but continues until all rules
have been evaluated. When the end is reached, the last rule that matched
(the last-matching rule) makes the decision.
The option 'quick' can be used in rules to make them abort ruleset
evaluation when they match. When 'quick' is used on every single rule,
pf's behaviour effectively becomes first-matching, but that's not the
default.
For instance, pf filters packets passing through any interface,
including virtual interfaces such as loopback. If, like most people, you
don't intend to filter loopback traffic, a rule like the following at the
top can save a lot of rule evaluations:
set skip on { lo0 }
The ruleset might
contain hundreds of rules all mismatching the loopback interface, and
loopback traffic might just pass by the implicit default pass. The
difference is between evaluating these hundreds of rules for every
loopback packet.
Usually, you'd place a rule with 'quick' at the top of the ruleset,
reasoning that it has the potential of matching and saving the evaluation of
the rules further down. But in those cases where the rule does not match
a packet, placement of the rule at the top has caused one more
evaluation. In short, the frequency with which a rule is expected to
match on average is also relevant when deciding placement within the
ruleset for performance reasons. And the frequency with which it does
match depends on your actual traffic.
Instead of guessing how likely a rule should match on average, you can
use the rule evaluation and matching counters that are printed by:
$ pfctl -vsr
When you see a rule near the top that is evaluated a lot but
rarely matches, you can move it further down in the ruleset.
Anchors with conditional evaluation
An anchor is basically a ruleset separate from the main ruleset, or a
sub-ruleset. You can load entire rulesets into anchors, and cause them
to get evaluated from the main ruleset.
Another way to look at them is to compare filtering rules with a
programming language. Without anchors, all your code is in a single main
function, the main ruleset. Anchors, then, are just subroutines, code in
separate functions that you can call from the main function.
As of OpenBSD 3.6, you can also nest anchors within anchors, building a
hierarchy of subroutines, and call one subroutine from another. In
OpenBSD 3.5 and before, the hierarchy could only be one level deep, that
is, you could have multiple subroutines, but could call subroutines only
from the main ruleset.
For instance:
pass in proto tcp from 10.1.2.3 to 10.2.3.4 port www
pass in proto udp from 10.1.2.3 to 10.2.3.4
pass in proto tcp from 10.1.2.4 to 10.2.3.5 port www
pass in proto tcp from 10.1.2.4 to 10.2.3.5 port ssh
pass in proto udp from 10.1.2.4 to 10.2.3.5
pass in proto tcp from 10.1.2.5 to 10.2.3.6 port www
pass in proto udp from 10.1.2.5 to 10.2.3.6
pass in proto tcp from 10.1.2.6 to 10.2.3.7 port www
You could split the ruleset into two sub-rulesets, one for UDP called
"udp-only":
pass in proto udp from 10.1.2.3 to 10.2.3.4
pass in proto udp from 10.1.2.4 to 10.2.3.5
pass in proto udp from 10.1.2.5 to 10.2.3.6
And a second one for TCP called "tcp-only":
pass in proto tcp from 10.1.2.3 to 10.2.3.4 port www
pass in proto tcp from 10.1.2.4 to 10.2.3.5 port www
pass in proto tcp from 10.1.2.4 to 10.2.3.5 port ssh
pass in proto tcp from 10.1.2.5 to 10.2.3.6 port www
pass in proto tcp from 10.1.2.6 to 10.2.3.7 port www
Both of them can be called from the main ruleset with:
anchor udp-only
anchor tcp-only
That would not improve performance much, though. Actually, there is some
overhead involved when the kernel has to step into and out of these
sub-rulesets.
But anchor calls can also contain filter criteria, much like pass/block
rules:
anchor udp-only in on fxp0 inet proto udp
anchor tcp-only in on fxp0 inet proto tcp
The sub-ruleset is only evaluated for packets that match
the criteria. In other words, the subroutine is only conditionally
evaluated. When the criteria do not match, the call is skipped, and the
evaluation cost is limited to the comparison of the criteria in the
call.
For performance, this is mainly relevant when the sub-ruleset contains
many rules, and the call criteria are not those primarly optimized by
skip steps.
Let pfctl do the work for you
As of OpenBSD 3.6, several of the optimizations discussed can be
automated by pfctl -o. The optimizer analyzes a ruleset and makes
modifications that do not change the effect of the ruleset.
First, pfctl splits the ruleset into blocks of adjacent rules in such a
way that reordering rules within one block cannot possibly affect the
outcome of evaluation for any packet.
For example, the rules in the following block can be arbitrarily
reordered:
pass proto tcp to 10.1.2.3 port www keep state
pass proto udp to 10.1.2.3 port domain keep state
pass proto tcp to 10.1.0.0/16 keep state
But in most cases rule order is relevant. For instance:
block log all
block from 10.1.2.3
pass from any to 10.2.3.4
Changing the position of either of those rules produces completely
different effects. After swapping the first two rules, packets from
10.1.2.3 still get blocked, but they're now also logged. Exchange the
last two rules, and packets from 10.1.2.3 to 10.2.3.4 are suddenly
blocked. And switching the first and last rule blocks every packet.
In every case of possible dependancy, pfctl splits the rules into
separate blocks. In the worst case, when no two adjacent rules can
be freely reordered, each rule becomes a separate block containing
only that rule, and pfctl can't make any modifications.
Otherwise, pfctl sorts the rules in each block so that skip step
values are maximized:
$ cat example
pass proto tcp from 10.0.0.3 to 10.0.0.8
pass proto udp from 10.0.0.1
pass proto tcp from 10.0.0.2
pass proto tcp from 10.0.0.4
pass proto udp from 10.0.0.6
pass proto tcp from 10.0.0.3 to 10.0.0.7
$ pfctl -onvf example
pass inet proto tcp from 10.0.0.3 to 10.0.0.8
pass inet proto tcp from 10.0.0.3 to 10.0.0.7
pass inet proto tcp from 10.0.0.2 to any
pass inet proto tcp from 10.0.0.4 to any
pass inet proto udp from 10.0.0.1 to any
pass inet proto udp from 10.0.0.6 to any
When duplicate rules are found, they are removed:
$ cat example
pass proto tcp from 10.0.0.1
pass proto udp from 10.0.0.2
pass proto tcp from 10.0.0.1
$ pfctl -onvf example
pass inet proto tcp from 10.0.0.1 to any
pass inet proto udp from 10.0.0.2 to any
Redundant rules are removed as well:
$ cat example
pass proto tcp from 10.1/16
pass proto tcp from 10.1.2.3
pass proto tcp from 10/8
$ pfctl -onvf example
pass inet proto tcp from 10.0.0.0/8 to any
Multiple rules are combined into a single rule using a table where
possible and advantageous:
$ cat example
pass from 10.1.2.3
pass from 10.2.3.4
pass from 10.3.4.5
pass from 10.4.5.6
pass from 10.5.6.7
pass from 10.8.9.1
$ pfctl -onvf example
table <__automatic_0> const { 10.1.2.3 10.2.3.4 10.3.4.5 10.4.5.6
10.5.6.7 10.8.9.1 }
pass inet from <__automatic_0> to any
When called with -oo, pfctl also consults the evaluation counters shown
by pfctl -vsr to reorder 'quick' rules according to matching frequency.
It's very conservative in doing any changes, only performing changes
that are certain to not affect the outcome of the ruleset evaluation
under any circumstances for any packet. This has the advantage that the
optimizer can be used safely with any ruleset. The drawback is that
pfctl might not dare change something which you could, if you thought
about the effect of the change. Like the skip step optimization, the
performance improvement depends on how large the blocks of reorderable
rules are. By manually reordering rules first, you can potentially
improve the gain the optimizer can produce.
The easiest way to see what -o or -oo does with your ruleset is to
compare its suggestion with the original ruleset, like this:
$ pfctl -nvf /etc/pf.conf >before
$ pfctl -oonvf /etc/pf.conf >after
$ diff -u before after
When run against a manually optimized ruleset, the differences are
usually unspectacular. Significant improvements can be expected for
rulesets that were automatically generated by rule editing frontends.
Copyright (c) 2004-2006 Daniel Hartmeier <daniel@benzedrine.cx>.
Permission to use, copy, modify, and distribute this documentation for
any purpose with or without fee is hereby granted, provided that the
above copyright notice and this permission notice appear in all copies.
|
|