OpenBSD Journal

PF: Testing Your Firewall

Contributed by dhartmei from the ozone-friendly-can-of-shoo-bug dept.

The second chapter, in a series of three, is about troubleshooting PF. Instead of just providing the common symptoms-to-solutions table, it tries to present a systematic procedure for problem analysis. The way is the goal. ;)
  • Firewall Ruleset Optimization (read)
  • Testing Your Firewall
    • Introduction
    • A well-defined filtering policy
    • A ruleset implementing the policy
    • Parser errors
    • Testing
    • Debugging
    • Debugging protocols
    • Debugging rulesets
    • Following connections through the firewall
    • Debugging states
    • Create TCP states on the initial SYN packet
  • Firewall Management (read)

Testing Your Firewall

Introduction

A packet filter enforces a filtering policy by evaluating the rules of a ruleset and passing or blocking packets accordingly. This chapter explains how to test whether a pre-defined policy is being enforced correctly, and how to find and correct mistakes when it isn't.

During the course of this chapter, we'll be comparing the task of writing a firewall ruleset to computer programming in general. If you don't have any experience with computer programming, this approach might sound complicated and unsettling. The configuration of a firewall shouldn't require a degree in computer science or experience in programming, right?

The answer is no, it shouldn't and mostly doesn't. The language used in rulesets to configure pf is made to resemble human language. For instance:

  block all
  pass out all keep state
  pass in proto tcp to any port www keep state
Indeed, it doesn't take a computer programmer to understand what this ruleset does or to intuitively write a ruleset to implement a similarly simple policy. Chances are good that a ruleset created like this will do precisely what the author wanted.

Unfortunately, computers do what you tell them to do instead of what you want them to do. Worse, they can't tell the difference between the two, if there is any. If the computer doesn't do precisely what you want, even though you assumed you made your instructions clear, it's up to you to identify the difference and reformulate the instructions. Since this is a common problem in programming, we can look at how programmers deal with it. It turns out that the skills and methods used to test and debug programs and rulesets are very similar. You won't need to know any programming languages to understand the implications for firewall testing and debugging.

A well-defined filtering policy

The filtering policy is an informal specification of what the firewall is supposed to do. A ruleset, like a program, is the implementation of a specification, a set of formal instructions executed by a machine. In order to write a program, you need to define what it should do.

Hence, the first step in configuring a firewall is specifying informally what it is supposed to do. What connections should it block or pass? An example would be:

  • There are three distinct sections of the network that are isolated from each other by the firewall. Any connection that crosses the border of one section must pass through the firewall. The firewall has three interfaces, each connected to one of the sections:
    • $ext_if to the external internet
    • $dmz_if to a DMZ with servers
    • $lan_if to a LAN with workstations
  • Hosts in the LAN may freely connect to any hosts in the DMZ or the external internet.
  • Servers in the DMZ may freely connect to hosts on the external internet. Hosts on the external internet may connect only to the following servers in the DMZ:
    • web server 10.1.1.1 port 80
    • mail server 10.2.2.2 port 25
  • Anything else is prohibited (for instance, external hosts may not connect to hosts on the LAN)
The policy is expressed informally, in any way a human reader can understand. It should be specific enough that the reader can clearly deduce decisions of the form 'should a connection from host X to host Y coming in (or going out) on interface Z be blocked or passed?'. If you can think of cases where the policy doesn't give a clear answer to such a question, the policy is not specific enough.

Vague policies like "allow only what is strictly necessary" or "block attacks" need to be made more precise, or you won't be able to implement or test them. As in software development, inadequate specifications rarely lead to meaningful and correct implementations ("why don't you start writing code, while I go find out what the customer wants").

You might receive a complete policy and your task is to implement it, or defining the policy might be part of your task. In either case, you'll need to have the policy in hand before you can complete implementing and testing it.

A ruleset implementing the policy

The ruleset is written as a text file containing statements in a formal language. Just as the source code of a program is parsed and translated into machine-level instructions by a compiler, the ruleset source text is parsed and translated by pfctl, and the result is then interpreted by pf in the kernel.

When the source code violates the formal language, the parser reports a syntax error and refuses to translate the text file. This is called a compile-time error, and such errors are reliably detected and usually resolved quickly. When pfctl can't parse your ruleset file, it reports the number of the line in the file where the error occurred, together with a more or less accurate description of what it didn't understand. Unless the entire ruleset file could be parsed without any syntax errors, pfctl does not change the previous ruleset in the kernel. As long as the ruleset file contains one or more syntactic errors, there is no program pf can execute.

The second type of error is called a run-time error, as it occurs only while a syntactically correct program that has been successfully translated is running. With a general-purpose programming language, this might happen when the program divides a number by zero, tries to access an invalid memory location or runs out of memory. Since pf rulesets are very limited compared to the functionality of a general-purpose programming language, most of these errors cannot occur when a ruleset is executed, i.e. a ruleset can't 'crash' like a generic program. But, of course, a ruleset can produce the wrong output at run-time, by blocking or passing packets that it shouldn't according to the policy. This is sometimes called a 'logic error', an error that doesn't cause the execution of a program to be aborted, but 'merely' produces incorrect output.

So, before we can start to test whether the firewall correctly implements our policy, we first need to have a ruleset loaded successfully.

Parser errors

Parser errors are reported when you try to load a ruleset file with pfctl, for instance:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error
The message means that there is a syntactic error on line 3 of the file /etc/pf.conf and pfctl couldn't load the ruleset. The in-kernel ruleset has not been changed; it remains the same as it was before the failed attempt to load the new ruleset.

There are many different errors that pfctl can produce. The first step is to take a close look at the error message and read it carefully. Not all parts might make sense immediately, but the best chance of understanding what is going wrong is to read all of them. If the message has the format "filename:number: text", it refers to the line with that number inside the file with that name.
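As a rough illustration of that message format, the following Python sketch splits a "filename:number: text" message into its parts and picks the named line out of a ruleset held in a string. The function names are made up for this example, and the ruleset text is the sample file used in this section; it is a sketch of the idea, not anything pfctl itself provides.

```python
import re

# Matches messages of the form "filename:number: text".
ERROR_RE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+): (?P<text>.*)$")

def parse_error(message):
    """Split a 'filename:number: text' message into its three parts."""
    m = ERROR_RE.match(message.strip())
    if m is None:
        return None
    return m.group("file"), int(m.group("line")), m.group("text")

def offending_line(ruleset_text, number):
    """Return line `number` (counting from 1) of the ruleset text."""
    return ruleset_text.splitlines()[number - 1]

# Sample ruleset text (hypothetical in-memory copy of /etc/pf.conf).
ruleset = 'int_if = "fxp 0"\nblock all\npass out inet on $int_if all kep state\n'
parsed = parse_error("/etc/pf.conf:3: syntax error")
print(parsed)
print(offending_line(ruleset, parsed[1]))
```

The point is only that the number between the colons is a line number in the named file, counted from 1.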

The next step is to look at the specific line, either using a text editor (in vi, you can jump to line 3 by entering 3G in beep mode), or like this:

  # cat -n /etc/pf.conf
       1  int_if = "fxp 0"
       2  block all
       3  pass out inet on $int_if all kep state

  # head -n 3 /etc/pf.conf | tail -n 1
  pass out inet on $int_if all kep state
The problem might be a simple typo, like in this case ("kep" instead of "keep"). After fixing that, we try to reload the file:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out inet on $int_if all keep state
Now the keywords are all valid, but on closer inspection, we notice that the placement of the "inet" keyword before "on $int_if" is invalid. This also illustrates that the same line can contain more than a single mistake. pfctl only reports the first problem it finds, and then aborts. If, on retry, it reports the same line again, there might be more mistakes, or the first problem wasn't corrected properly.

Misplacement of keywords is another common mistake. It can be identified by comparing the rule with the BNF syntax at the bottom of the pf.conf(5) man page, which contains:

     pf-rule        = action [ ( "in" | "out" ) ]
                      [ "log" | "log-all" ] [ "quick" ]
                      [ "on" ifspec ] [ route ] [ af ] [ protospec ]
                      hosts [ filteropt-list ]
     ifspec         = ( [ "!" ] interface-name ) | "{" interface-list "}"
     af             = "inet" | "inet6"
This implies that "inet" should come after "on $int_if".

We correct that and retry again:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: syntax error
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out on $int_if inet all keep state
There is nothing obviously wrong left now. But we're not seeing all the relevant details. The line depends on the definition of the macro $int_if. Could that be wrongly defined? Let's see:

  # pfctl -vf /etc/pf.conf
  int_if = "fxp 0"
  block drop all
  ...
  /etc/pf.conf:3: syntax error
After fixing the mistyped "fxp 0" into "fxp0", we retry again:

  # pfctl -f /etc/pf.conf
No output means the file was loaded successfully.

In some cases, pfctl can provide a more specific error message instead of the generic "syntax error", like:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:3: port only applies to tcp/udp
  /etc/pf.conf:3: skipping rule due to errors
  /etc/pf.conf:3: rule expands to no valid combination
  # head -n 3 /etc/pf.conf | tail -n 1
  pass out on $int_if to port ssh keep state
The error reported first is usually the most helpful one, and subsequent errors might be misleading. In this case, the problem is that the rule specifies a port criterion without specifying either proto udp or proto tcp.

In rare cases, pfctl is confused by unprintable characters or whitespace in the file, and the mistake is hard to spot without making those characters visible:

  # pfctl -f /etc/pf.conf
  /etc/pf.conf:2: whitespace after \
  /etc/pf.conf:2: syntax error
  # cat -ent /etc/pf.conf
       1  block all$
       2  pass out on gem0 from any to any \ $
       3  ^Ikeep state$
The problem is the blank after the backslash on the second line, before the end of the line, which cat -e marks with a dollar sign.
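The same check can be done mechanically. The Python sketch below scans ruleset text for lines where blanks or tabs follow a final backslash; the function name and the sample text are assumptions made for this illustration.

```python
import re

# A backslash followed by blanks or tabs at the very end of a line.
TRAILING_BLANK_RE = re.compile(r"\\[ \t]+$")

def lines_with_blank_after_backslash(text):
    """Return the numbers (from 1) of lines where whitespace follows a final backslash."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if TRAILING_BLANK_RE.search(line)
    ]

# Same content as the example file above, with the stray blank on line 2.
sample = "block all\npass out on gem0 from any to any \\ \n\tkeep state\n"
print(lines_with_blank_after_backslash(sample))
```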

Once the ruleset loads successfully, it can be a good idea to look at the result:

  $ cat /etc/pf.conf
  block all
  # pass from any to any \
  pass from 10.1.2.3 to any
  $ pfctl -f /etc/pf.conf
  $ pfctl -sr
  block drop all
The backslash at the end of the comment line actually continues the comment line.

Expansion of {} lists can have surprising results, which also shows up in the parsed ruleset:

  $ cat /etc/pf.conf
  pass from { !10.1.2.3, !10.2.3.4 } to any
  $ pfctl -nvf /etc/pf.conf
  pass inet from ! 10.1.2.3 to any
  pass inet from ! 10.2.3.4 to any
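Here, the problem is that "{ !10.1.2.3, !10.2.3.4 }" doesn't mean "any address except 10.1.2.3 and 10.2.3.4". The list expands into two separate rules, and every possible address matches at least one of them: 10.1.2.3 matches the second rule, 10.2.3.4 matches the first, and any other address matches both.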
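A small Python model makes the effect of the expansion easy to check. It is deliberately simplified: each list element becomes one "pass from ! address" rule, as in the pfctl output above, and a packet is considered passed if any of those rules matches. The function names are invented for this sketch.

```python
def expand_negated_list(addresses):
    """Model '{ !a, !b }' as one 'pass from ! x' rule per list element."""
    return [("pass", addr) for addr in addresses]

def passes(rules, source):
    """A packet passes if any expanded rule 'from ! addr' matches its source."""
    return any(source != excluded for _action, excluded in rules)

rules = expand_negated_list(["10.1.2.3", "10.2.3.4"])
for source in ["10.1.2.3", "10.2.3.4", "192.0.2.1"]:
    print(source, passes(rules, source))
```

All three sources pass, including the two addresses the list was meant to exclude.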

You should reload your ruleset after making permanent changes, to make sure pfctl can load it during the next reboot. On OpenBSD, the rc(8) startup script /etc/rc first loads a small default ruleset, which blocks everything by default except for traffic required during startup (like dhcp or ntp). When it subsequently fails to load the real ruleset /etc/pf.conf, due to syntax errors introduced before the reboot without testing, the small default ruleset will remain active. Luckily, the small default ruleset allows incoming ssh, so the problem can still be fixed remotely.

Testing

Once you have a well-defined policy and a ruleset implementing it, testing means verifying that the implementation matches the policy.

There are two ways for the ruleset to fail: it might block connections which should be passed or it might pass connections which should be blocked.

Testing generally refers to empirically trying various cases of connections in a systematic way. There's an almost infinite number of different connections that a firewall might be facing, and it would be infeasible to try every combination of source and destination addresses and ports on all interfaces. Proving the correctness of a ruleset might be possible for very simple rulesets. In practice, the better approach is to create a list of test cases based on the policy, such that every aspect of the policy is covered. For instance, for our example policy the list of cases to try would be:

  • a connection from LAN to DMZ (should pass)
  • from LAN to external (should pass)
  • from DMZ to LAN (should block)
  • from DMZ to external (should pass)
  • from external to DMZ to 10.1.1.1 port 80 (should pass)
  • from external to DMZ to 10.1.1.1 port 25 (should block)
  • from external to DMZ to 10.2.2.2 port 80 (should block)
  • from external to DMZ to 10.2.2.2 port 25 (should pass)
  • from external to LAN (should block)
The expected result should be defined in this list before the actual testing starts. When, during testing, the observed result is different from the expectation, the test has succeeded in finding an error in the implementation.
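Writing the list down in a machine-readable form keeps the expectations fixed before testing starts. The following Python sketch encodes the nine cases from the example policy and reports every case where an observation differs from the expectation. The tuple layout, the function name and the sample observations are assumptions for illustration; how each observation is obtained is left open.

```python
# (source zone, destination zone, destination address, port, expected result)
# None means "any address" or "any port" for that case.
TEST_CASES = [
    ("LAN", "DMZ", None, None, "pass"),
    ("LAN", "external", None, None, "pass"),
    ("DMZ", "LAN", None, None, "block"),
    ("DMZ", "external", None, None, "pass"),
    ("external", "DMZ", "10.1.1.1", 80, "pass"),
    ("external", "DMZ", "10.1.1.1", 25, "block"),
    ("external", "DMZ", "10.2.2.2", 80, "block"),
    ("external", "DMZ", "10.2.2.2", 25, "pass"),
    ("external", "LAN", None, None, "block"),
]

def find_errors(observed):
    """Return the cases whose observed result differs from the expectation.

    `observed` is a list of "pass"/"block" strings, one per test case,
    in the same order as TEST_CASES.
    """
    return [
        (case, seen)
        for case, seen in zip(TEST_CASES, observed)
        if case[4] != seen
    ]

# Hypothetical observations: everything as expected except DMZ -> LAN.
observed = ["pass", "pass", "pass", "pass", "pass", "block", "block", "pass", "block"]
for case, seen in find_errors(observed):
    print("expected", case[4], "but observed", seen, "for", case[:4])
```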

This might sound odd, but the goal of each test should be to find an error in the firewall implementation, and not to NOT find an error. The overall goal of the process is to build a firewall ruleset without any errors, but if you assume that there are errors, you want to find them rather than miss them. When you assume the role of the tester, you have to adopt a destructive mindset and try to break the ruleset. Only then does a failure to break it become a constructive argument that the ruleset is free of errors.

TCP and UDP connections can generally be tested with nc(1). nc can be used both as client and as server (using the -l option). For ICMP queries and replies, ping(8) is a simple testing client.

To test connections that should be blocked, you can use any kind of tool that attempts to communicate with the target.

Using tools from ports, like nmap(1), you can scan multiple ports, even across multiple target hosts. Make sure to read the man page when results look odd. For instance, a TCP port is reported as 'unfiltered' when nmap receives a RST from pf. Also, if the host that runs nmap is itself running pf, that might interfere with nmap's ability to do its job properly.

There are some online penetration testing services which allow you to scan yourself from the Internet. Some draw invalid conclusions depending on how you block ports with pf (block drop vs. return-rst or return-icmp). Be sure you understand what they do and how they reach their conclusions before you get alarmed.

More advanced intrusion tools might include things like IP fragmentation or sending of invalid IP packets.

To test connections that should pass according to the policy, it's best to simply use the protocols with the common applications that legitimate users will be using. For instance, if you should pass HTTP connections to a web server, using different web browsers to fetch various content from different client hosts is a better test than just confirming that the TCP handshake to the server works with nc(1). Some errors are affected by factors like the hosts' operating systems. For instance, you might see problems with TCP window scaling or TCP SACK only between hosts running specific operating systems.

When the expected result is 'pass', there are several ways in which an observed result can differ. The TCP handshake might fail because the peer (or the firewall) is returning an RST. The handshake might simply time out. The handshake might complete and the connection might work for a while but then stall or reset. The connection might work permanently, but throughput or latency might be different from expectations, either lower than expected or higher (in case you expect AltQ to rate-limit the connection).
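The first two outcomes, a reset and a timeout, can be told apart by a simple client. The Python sketch below attempts one TCP handshake over IPv4 and reports how it ended; the function name, the result strings and the default timeout are choices made for this example. It says nothing about connections that stall later or about throughput.

```python
import socket

def classify_tcp_attempt(host, port, timeout=3.0):
    """Try one TCP handshake and report how it ended."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        return "reset"      # an RST came back, from the peer or the firewall
    except socket.timeout:
        return "timeout"    # no answer at all within the timeout
    except OSError as err:
        return "error: %s" % err
    finally:
        sock.close()

# Example call against the web server from the policy:
#   classify_tcp_attempt("10.1.1.1", 80)
```

A "reset" result cannot by itself tell whether the RST came from the target host or from a return-rst rule on the firewall.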

Expected results can include other aspects than the block/pass decision, for instance whether packets are logged, how they are translated, how they are routed or whether they are increasing counters as expected. If you care about these aspects, they're worth including in the list of things to test together with their expected results.

Your policy might include requirements regarding performance, reaction to overloading or redundancy. These could require dedicated tests. If you set up fail-over using CARP, you probably want to test what happens in case of various kinds of failures.

When you do observe a result that differs from the expectation, systematically note what you did during the test, what result you expected, why you expected that result, what result you observed and how the observation differs from the expectation. Repeat the test to see if the observed result is consistently reproducible or if results vary. Try varying but similar parameters (like different source or destination addresses or ports).

Once you have a reproducible problem, the next step is to debug it to understand why things don't work as expected and how to fix them. When, during the course of this, you modify the ruleset, you'll have to repeat the entire list of tests again, including the tests that didn't show a problem previously. The change you made might have inadvertently broken something that worked before.

The same applies to other changes made to the ruleset. A formal procedure for testing can make the process less error-prone. You're probably not going to repeat the entire test procedure for every little change you add to the ruleset. Some changes are trivial and shouldn't be able to break anything. But sometimes they do, or the sum of several changes introduces an error. You can use a revision control system like cvs(1) on your ruleset file, which helps you investigate past changes to the file when errors are discovered. If you know that the error was not present a week ago, but now is, looking at all changes made to the file over the last week can help spot the problem, or at least allows you to revert the changes until there's time to investigate the problem.

Non-trivial rulesets are like programs: they are rarely perfect in their first version, and it takes a while until they can be trusted to be free of bugs. Unlike real programs, which are never considered completely bug-free by most programmers, rulesets are simple enough to usually mature to that point.

Debugging

Debugging refers to finding and removing programming mistakes in computer programs. In the context of firewall rulesets, the term refers to the process of identifying why a ruleset's evaluation does not produce the expected result. The types of mistakes that can be made in rulesets are very limited compared to real computer programs, yet the methods used to find them are the same.

Before you start searching for the cause of a problem you should define what exactly is considered the problem. If you've spotted an error during testing yourself, that can be very simple. If another person is reporting the problem, it can be difficult to extract the essence from a vague problem report. The best starting point is a problem that you can reliably reproduce yourself.

Some network problems are not actually caused by pf. Before you focus on debugging your pf configuration, it is worth establishing that pf is indeed responsible for the problem. This is simple to do and can save a lot of time searching in the wrong place. Just disable pf with pfctl -d and verify that the problem goes away. If it does, re-enable pf with pfctl -e and verify that the problem occurs again. This does not apply to certain kinds of problems, like when pf is not NAT'ing connections as desired, because when pf is disabled, obviously the problem can't go away. But when possible, try to first prove that pf must be responsible.

Similarly, if the problem is that pf is not doing something you expect it to do, the first step should be to ensure that pf is actually running and the intended ruleset is successfully loaded, with:

  # pfctl -si | grep Status
  Status: Enabled for 4 days 13:47:32           Debug: Urgent
  # pfctl -sr
  pass quick on lo0 all
  pass quick on enc0 all
  ...

Debugging protocols

The second prerequisite to debugging is expressing the problem in terms of specific network connections. For instance, if the report is 'instant messaging using application X is not working', you need to find out what kind of connections are involved. The conclusion might be, for instance, that 'host A cannot establish a TCP connection to host B on port C'. Sometimes, this represents the entire difficulty, and once you understand what connections are involved, you realize that the ruleset doesn't allow them yet, and a simple change to the ruleset resolves the issue.

There are several ways to find out what connections are used by an application or protocol. tcpdump(8) can show packets arriving at or leaving from interfaces, both real interfaces like network interface cards and virtual interfaces like pflog(4) and pfsync(4). You can supply an expression that filters what packets are being shown, thereby excluding existing noise on the network. Attempt to communicate using the application or protocol in question, and see what packets are being sent. For example:

  # tcpdump -nvvvpi fxp0 tcp and not port ssh and not port smtp
  23:55:59.072513 10.1.2.3.65123 > 10.2.3.4.6667: S
    4093655771:4093655771(0) win 5840 <mss 1380,sackOK,timestamp
    1039287798 0,nop,wscale 0> (DF)
This is a TCP SYN packet, the first packet of the TCP handshake. The sender is 10.1.2.3 port 65123 (which looks like a random high port) and the recipient is 10.2.3.4 port 6667. A detailed description of the output format can be found in the tcpdump(8) man page. tcpdump is the most important tool used in debugging pf related problems, and it's well worth getting familiar with.

Another approach is to use pf's log feature. Assuming you use the 'log' option in all 'block' rules, almost any packet blocked by pf will be logged. You can remove the 'log' option from rules that deal with known protocols, so only packets blocked on unknown ports are logged. Try to use the blocked application and check pflog, like:

  # ifconfig pflog0 up
  # tcpdump -nettti pflog0
  Nov 26 00:02:26.723219 rule 41/0(match): block in on kue0:
    195.234.187.87.34482 > 62.65.145.30.6667: S 3537828346:3537828346(0) win
    16384 <mss 1380,nop,nop,sackOK,[|tcp]> (DF)
If you're using pflogd(8), the daemon will constantly listen on pflog0 and store the log in /var/log/pflog, which you can view with:

  # tcpdump -netttr /var/log/pflog
When dumping pf logged packets, you can use extended filtering expressions to tcpdump, for instance it can show only logged packets that were blocked incoming on interface wi0 with:

  # tcpdump -netttr /var/log/pflog inbound and action block and on wi0
Some protocols, like FTP, are not that easy to match, because they don't use fixed port numbers or use multiple related connections. It might not be possible to pass them through the firewall without opening up a wide range of ports. For specific protocols there are solutions, like ftp-proxy(8).

Debugging rulesets

When your ruleset is blocking a certain protocol because you didn't allow a necessary port, the problem is more of a design flaw than a bug in the ruleset. But what if you see a connection blocked that you have an explicit pass rule for, or a connection passed that you have an explicit block rule for?

For example, your ruleset might contain the rule

  block in return-rst on $ext_if proto tcp from any to $ext_if port ssh
But when you try to connect to TCP port 22, the connection is accepted! It appears like the firewall is ignoring your rule. As puzzling as these cases may be when experienced the first couple of times, there's always a logical and often trivial explanation.

First, you should verify everything we just assumed so far. For instance, we assumed that pf is running and the ruleset contains the rule above. It might be unlikely that these assumptions are wrong, but they're quickly verified:

  # pfctl -si | grep Status
  Status: Enabled for 4 days 14:03:13           Debug: Urgent
  # pfctl -gsr | grep 'port = ssh'
  @14 block return-rst in on kue0 inet proto tcp from any to 62.65.145.30 port = ssh
Next, we assume that a TCP connection to port 22 is passing in on kue0. You might think that's obviously true, but it's worth verifying. Start tcpdump:

  # tcpdump -nvvvi kue0 tcp and port 22 and dst 62.65.145.30
Then repeat the SSH connection. You should see the packets of your connection in tcpdump's output. If you don't, that might be because the connection isn't actually passing through kue0, but through another interface, which would explain why the rule isn't matching. Or you might be connecting to a different address. In short, if you don't see the SSH packets arrive, pf won't see them either, and can't possibly block them using the rule in question.

But if you do see the packets with tcpdump, pf should see and filter them as well. The next assumption is that the block rule is not just present somewhere in the ruleset (which we verified already), but is the last matching rule for these connections. If it isn't the last matching rule, obviously it doesn't make the block decision.

How can the rule not be the last matching rule? Three reasons are possible:

  • a) The rule does not match because rule evaluation doesn't reach the rule. An earlier rule could also match and abort evaluation with the 'quick' option.
  • b) Rule evaluation reaches the rule, but the rule doesn't match the packet because one of the criteria in the rule does not match.
  • c) Rule evaluation reaches the rule and the rule does match, but evaluation continues and a subsequent rule also matches.
To disprove these three cases, you can view the loaded ruleset, and mentally emulate a ruleset evaluation for a hypothetical TCP packet incoming on kue0 to port 22. Mark the block rule we're debugging. Start evaluation with the first rule. Does it match? If it does, mark the rule. Does it also have 'quick'? If so, abort evaluation. If not, continue with the next rule. Repeat until a rule matches and uses 'quick' or you reach the end of the ruleset. Which rule was the last matching one? If it isn't rule number 14, you have found the explanation for the problem.
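The mental emulation described above can be sketched in a few lines of Python. This is a simplified model, not pf's actual implementation: rules are plain dictionaries, a rule matches when every criterion it lists equals the corresponding packet field, and the field names ('dir', 'iface', 'proto', 'port') are invented for this example.

```python
def last_matching_rule(rules, packet):
    """Emulate last-match evaluation with 'quick' for one packet.

    Each rule is a dict with 'action', optional 'quick', and a 'match'
    dict of criteria. Returns the index of the deciding rule, or None
    if no rule matched.
    """
    last = None
    for number, rule in enumerate(rules):
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            last = number
            if rule.get("quick"):
                break  # 'quick' aborts evaluation at this rule
    return last

rules = [
    {"action": "block", "match": {}},  # like 'block all'
    {"action": "block", "match": {"dir": "in", "iface": "kue0", "proto": "tcp", "port": 22}},
    {"action": "pass", "match": {"dir": "in", "proto": "tcp"}},  # a later, broader pass rule
]
packet = {"dir": "in", "iface": "kue0", "proto": "tcp", "port": 22}
decider = last_matching_rule(rules, packet)
print(decider, rules[decider]["action"])
```

In this toy ruleset the block rule matches, but the later pass rule matches as well and therefore decides, which is case c). Adding 'quick' to the block rule would make it the deciding rule.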

Manually evaluating the ruleset like this can be tedious, even though it can be done pretty quickly and reliably with more experience. If the ruleset is large, you can temporarily reduce it. Save a copy of the real ruleset and remove all rules that you think can't affect this case. Load that ruleset and repeat the test. If the connection is now blocked, the conclusion is that one of the seemingly unrelated rules you removed is responsible for either a) or c). Re-add the rules one by one and repeat the test, until you reach the responsible rule. If the connection is still passed after removal of all unrelated rules, repeat the mental evaluation of the now reduced ruleset.

Another approach is to use pf's logging to identify the cases a) and c). Add 'log' to all 'pass quick' rules before rule 14. Add 'log' to all 'pass' rules after rule 14. Start tcpdump on pflog0 and establish a new SSH connection. You'll see what rule other than rule 14 is matching the packet last. If nothing is logged, the explanation must be b).

Following connections through the firewall

When a connection passes through the firewall, packets pass in on one interface and out on another. Replies pass in on the second interface and out on the first. Connections can therefore fail because pf is blocking packets in any of these four cases.

First you should find out which of the four cases is the problem. When you try to establish a new connection, you should see the TCP SYN on the first interface using tcpdump. You should see the same TCP SYN leaving out on the second interface. If you don't, the conclusion is that pf is blocking the packet in on the first interface or out on the second.

If the SYN is not blocked, you should see a SYN+ACK arrive in on the second interface and out on the first. If not, pf is blocking the SYN+ACK on either interface.

Add 'log' to the rules which should pass the SYN and SYN+ACK on both interfaces, as well as to all block rules. Repeat the connection attempt and check pflog. It should tell you precisely which case was blocked and by what rule.
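The narrowing-down procedure can be written as a small decision function. The Python sketch below takes four yes/no observations made with tcpdump and returns a conclusion. It assumes that tcpdump on an interface shows arriving packets even when pf then blocks them, so a packet that is never seen arriving has not reached the firewall at all. The function and argument names are made up for this example.

```python
def locate_block(syn_in_first, syn_out_second, synack_in_second, synack_out_first):
    """Narrow down where a handshake is lost from four tcpdump observations.

    Each argument is True if the packet was seen at that point.
    """
    if not syn_in_first:
        return "SYN never reaches the firewall"
    if not syn_out_second:
        return "SYN blocked in on the first interface or out on the second"
    if not synack_in_second:
        return "no SYN+ACK arrives at the firewall"
    if not synack_out_first:
        return "SYN+ACK blocked in on the second interface or out on the first"
    return "handshake packets pass in both directions"

print(locate_block(True, False, False, False))
print(locate_block(True, True, True, False))
```

The two "blocked" results still cover two cases each; the pflog output identifies which one applies and by what rule.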

Debugging states

The most common reason for pf to block a packet is because of an explicit block rule in the ruleset. The relevant last-matching block rule can be identified by adding the 'log' option to all potential rules and watching the pflog interface.

There are very few cases where pf silently drops packets not based on rules, where adding 'log' to all rules does not cause the dropped packets to get logged through pflog. The most common case is when a packet almost, but not entirely, matches a state entry.

Remember that for each packet seen, pf first does a state lookup. If a matching state entry is found, the packet is passed immediately, without evaluation of the ruleset.

A state entry contains information related to the state of one connection. Each state entry contains a unique key. This key consists of several values that are constant throughout the lifetime of a connection, these are:

  • address family (IPv4 or IPv6)
  • source address
  • destination address
  • protocol (like TCP or UDP)
  • source port
  • destination port
This key is shared among all packets related to the same connection, and packets related to different connections always have different keys.

When a new state entry is created by a 'keep state' rule, the entry is stored in the state tree using the state's key. An important limitation of the state tree is that all keys must be unique. That is, no two state entries can have the same key.
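The uniqueness requirement is the same one a dictionary imposes on its keys. The following Python sketch models the key as a tuple of the six values listed above and refuses to insert a second entry under an existing key. The class and field names are invented for this illustration and do not reflect pf's internal data structures.

```python
from collections import namedtuple

# The six values that stay constant for the lifetime of a connection.
StateKey = namedtuple(
    "StateKey", "af src_addr dst_addr proto src_port dst_port"
)

class StateTable:
    """Toy state table keyed by StateKey; keys must be unique."""

    def __init__(self):
        self.entries = {}

    def insert(self, key, info):
        """Store a new state; refuse a second entry with the same key."""
        if key in self.entries:
            return False
        self.entries[key] = info
        return True

table = StateTable()
key = StateKey("IPv4", "10.2.2.2", "10.1.1.1", "TCP", 28054, 80)
print(table.insert(key, "first connection"))
print(table.insert(key, "second connection"))
print(table.insert(key._replace(src_port=43204), "other source port"))
```

The second insert is refused because the key already exists; changing only the source port yields a different key and is accepted.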

It might not be immediately obvious that the same two peers could not establish multiple concurrent connections involving the same addresses, protocol and ports, but this is actually a fundamental property of both TCP and UDP. In fact, the peers' TCP/IP stacks are only able to associate individual packets with their appropriate sockets by doing a similar lookup based on addresses and ports.

Even when a connection is closed, the same pair of addresses and ports cannot be reused immediately. The network might deliver a retransmitted packet of the old connection late, and if the recipient's TCP/IP stack then falsely associated this packet with a new connection, this would disturb or even reset the new connection. For this reason, both peers are required to wait a specific period of time, called 2MSL for 'twice the maximum segment lifetime', before reusing an old pair of addresses and ports for a new connection.

You can observe this by manually establishing multiple connections between the same peers. For instance, you have a web server running on 10.1.1.1 port 80, and connect to it from client 10.2.2.2 using nc(1) twice, like this:

  $ nc -v 10.1.1.1 80 & nc -v 10.1.1.1 80
  Connection to 10.1.1.1 80 port [tcp/www] succeeded!
  Connection to 10.1.1.1 80 port [tcp/www] succeeded!
While the connections are still open, you can use netstat(1) on the client or server to list the connections:

  $ netstat -n | grep 10.1.1.1.80
  tcp        0      0  10.2.2.2.28054         10.1.1.1.80 ESTABLISHED
  tcp        0      0  10.2.2.2.43204         10.1.1.1.80 ESTABLISHED
As you can see, the client has chosen two different (random) source ports, so it doesn't violate the requirement of key uniqueness.

You can tell nc(8) to use a specific source port using -p, like:

  $ nc -v -p 31234 10.1.1.1 80 & nc -v -p 31234 10.1.1.1 80
  Connection to 10.1.1.1 80 port [tcp/www] succeeded!
  nc: bind failed: Address already in use
The TCP/IP stack of the client prevents the violation of the key uniqueness requirement. Some rare and faulty TCP/IP stacks do not respect this rule, and pf will block their connections when they violate the key uniqueness, as we'll see soon.
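
The same behaviour can be reproduced without nc(8). The following Python sketch is only an illustration and not part of pf or nc: it assumes a listener on the loopback address as a stand-in for the web server, and the function name second_bind_fails is made up for this example. It lets the stack pick a source port for the first connection and then asks for that very port again, as -p does.

```python
import errno
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if a second socket is refused a source port already in use."""
    # Stand-in for the web server: a listener on an ephemeral loopback port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, 0))
        server.listen(2)
        # First connection: the stack chooses the source port, like plain nc.
        first.bind((host, 0))
        first.connect(server.getsockname())
        source_port = first.getsockname()[1]
        # Second connection: insist on the same source port, like nc -p.
        try:
            second.bind((host, source_port))
        except OSError as err:
            return err.errno == errno.EADDRINUSE
        return False
    finally:
        for sock in (first, second, server):
            sock.close()


if __name__ == "__main__":
    print("second bind refused:", second_bind_fails())
```

On a typical stack the second bind should be refused with EADDRINUSE, which is the error nc reports above as 'Address already in use'.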

Let's get back to how pf does a state lookup when a packet is being filtered. The lookup consists of two steps. First, the state tree is searched for a state entry with a key matching the protocol, addresses and ports of the packet. This search accounts for packets flowing in either direction. For instance, assume the following packet has created a state entry:

  incoming TCP from 10.2.2.2:28054 to 10.1.1.1:80
A lookup for the following packets would find this state entry:

  incoming TCP from 10.2.2.2:28054 to 10.1.1.1:80
  outgoing TCP from 10.1.1.1:80 to 10.2.2.2:28054
The state includes information about the direction (incoming or outgoing) of the initial packet that created the state. For instance, the following packets would NOT match the state entry:

  outgoing TCP from 10.2.2.2:28054 to 10.1.1.1:80
  incoming TCP from 10.1.1.1:80 to 10.2.2.2:28054
The reason for this restriction is not obvious, but quite simple. Imagine you only have a single interface with address 10.1.1.1 where a web server is listening on port 80. When client 10.2.2.2 connects to you (using random source port 28054), the initial packet of the connection comes in on your interface and all your outgoing replies should be from 10.1.1.1:80 to 10.2.2.2:28054. You do not want to pass out packets from 10.2.2.2:28054 to 10.1.1.1:80; such packets would make no sense.

If you have a firewall with two interfaces and look at connections passing through the firewall, you'll see that every packet passing in on one interface passes out through the second. If you create state when the initial packet of the connection arrives in on the first interface, that state entry will not allow the same packet to pass out on the second interface, because the direction is wrong in the same way.

Instead, the packet is found to not match the state you already have, and the ruleset is evaluated. You'll have to explicitly allow the packet to pass out on the second interface with a rule. Usually, you'll want to use 'keep state' on that rule as well, so a second state entry is created that covers the entire connection on the second interface.

If you're wondering how it's possible to create a second state for the same connection when we've just explained how states must have unique keys, the explanation is that the state key also contains the direction of the connection, and the entire combination must be unique.
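
The first lookup step can be modelled in a few lines of Python. This is a simplified sketch and not pf's actual data structure: the names Packet and StateTable are hypothetical, addresses and ports are plain strings, and a dictionary stands in for the state tree. It only illustrates that the key includes the direction and that a lookup also tries the reversed packet in the opposite direction.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    direction: str  # "in" or "out", as seen on the filtering interface
    proto: str
    src: str        # "address:port"
    dst: str        # "address:port"


class StateTable:
    """Toy model of the state tree: unique keys that include the direction."""

    def __init__(self) -> None:
        self._states = {}

    def insert(self, pkt: Packet) -> tuple:
        key = (pkt.proto, pkt.direction, pkt.src, pkt.dst)
        if key in self._states:
            raise KeyError(f"state key conflict: {key}")
        self._states[key] = pkt
        return key

    def lookup(self, pkt: Packet) -> Optional[tuple]:
        # Same direction and same endpoints as the state-creating packet ...
        same_way = (pkt.proto, pkt.direction, pkt.src, pkt.dst)
        # ... or a reply: opposite direction with endpoints swapped.
        flipped = "out" if pkt.direction == "in" else "in"
        reply = (pkt.proto, flipped, pkt.dst, pkt.src)
        for key in (same_way, reply):
            if key in self._states:
                return key
        return None


table = StateTable()
table.insert(Packet("in", "tcp", "10.2.2.2:28054", "10.1.1.1:80"))

for pkt in (
    Packet("in", "tcp", "10.2.2.2:28054", "10.1.1.1:80"),
    Packet("out", "tcp", "10.1.1.1:80", "10.2.2.2:28054"),
    Packet("out", "tcp", "10.2.2.2:28054", "10.1.1.1:80"),
    Packet("in", "tcp", "10.1.1.1:80", "10.2.2.2:28054"),
):
    print(pkt, "matches" if table.lookup(pkt) else "does NOT match")
```

In this model the first two packets match and the last two do not, as in the lists above. Inserting a state for the outgoing packet from 10.2.2.2:28054 to 10.1.1.1:80 succeeds as well, because its direction makes the key different.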

Now we can also explain the difference between floating and interface-bound states. By default, pf creates states that are not bound to any interface. That is, once you allow a connection in on one interface, packets related to the connection that match the state (including the direction restriction!) are passed on any interface. In simple setups with static routing this is only a theoretical issue. There is no reason why you should see packets of the same connection arrive in through several interfaces or why your replies should leave out through several interfaces. With dynamic routing, however, this can happen. You can choose to restrict states to specific interfaces. By using the global setting 'set state-policy if-bound' or the per-rule option 'keep state (if-bound)' you ensure that packets can match state only on the interface that created the state.

When virtual tunneling interfaces are involved, there are cases where the same connection passes through the firewall multiple times. For instance, the initial packet of a connection might first pass in through interface A, then pass in through interface B, then out through interface C and finally pass out through interface D. Usually the packet will be encapsulated on interfaces A and D and decapsulated on interfaces B and C, so pf sees packets of different protocols, and you can create four different states. Without encapsulation, when the packet is the same on all four interfaces, you may not be able to use some features like translations or sequence number modulation, because that would lead to state entries with conflicting keys. Unless you have a complex setup involving tunneling interfaces without encapsulation and see error messages like 'pf: src_tree insert failed', this should be of no concern to you.

Let's return to the state lookup done for each packet before ruleset evaluation. The search for a state entry with matching key will either find a single state entry or not find any state entry at all. If no state entry is found, the ruleset is evaluated.

When a state entry is found, a second step is performed for TCP packets before they are considered to be part of the known connection and passed without ruleset evaluation: sequence number checking.

There are many forms of TCP attacks, where an attacker is trying to manipulate a connection between two hosts. In most cases, the attacker is not located on the routing path between the hosts. That is, he can't listen in on the legitimate packets being sent between the hosts. He can, however, send packets to either host imitating packets of its peer, by spoofing (faking) his source address. The goal of the attacker might be to prevent establishment of connections or to tear down already established connections (to cause a denial of service) or to inject malicious payload into ongoing connections.

To succeed, the attacker has to correctly guess several parameters of the connection, like source and destination addresses and ports. Especially for well-known protocols, this isn't as impossible as it may appear. If the attacker knows both hosts' addresses and one port (because he's attacking a connection to a known service), he only has to guess one port. Even if the client is using a truly random source port (which isn't typical anyway), the attacker could try all 65536 possibilities in a short period of time.

The only thing that's truly hard to guess for an attacker is the right sequence number (and acknowledgement). If both peers chose their initial sequence numbers randomly (or you're modulating sequence numbers for hosts that have weak ISN generators), an attacker will not be able to guess an appropriate value at any given point during the connection.

Throughout the lifetime of a valid TCP connection, the sequence numbers (and acknowledgements) of individual packets advance according to certain rules. For instance, once a host has sent a particular segment of data and the peer has acknowledged reception, there is no legitimate reason for the sender to resend data for the same segment. In fact, an attempt to overwrite parts of already received data is not just invalid according to the TCP protocol, but a common attack.

pf uses these rules to deduce small windows for valid sequence numbers. Typically, pf can be sure that only about 30,000 out of 4,294,967,296 possible sequence numbers are valid at any point during a connection. Only when both a packet's sequence and acknowledgement numbers fall within these windows does pf assume that the packet is legitimate and pass it.
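
To put these numbers in perspective, here is a small back-of-the-envelope calculation in Python. It uses only the two figures quoted above; the variable names are made up for this example, and it ignores that the acknowledgement number has to fit as well, which makes a blind guess even less likely to succeed.

```python
SEQUENCE_SPACE = 2 ** 32   # 4,294,967,296 possible 32-bit sequence numbers
VALID_WINDOW = 30_000      # approximate number of acceptable values, from the text

# Probability that one blindly chosen sequence number lands in the window.
hit_probability = VALID_WINDOW / SEQUENCE_SPACE
# The same figure expressed as "one in N guesses".
one_in = SEQUENCE_SPACE / VALID_WINDOW

print(f"chance per blind guess: {hit_probability:.2e}")
print(f"roughly 1 in {one_in:,.0f}")
```

Compared with the 65536 possible source ports, the sequence number check is what makes blind spoofing expensive.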

When, during the state lookup, a state is found that matches the packet, the second step is to compare the packet's sequence numbers against the windows of allowed values stored in the state entry. When this second step fails, pf will produce a 'BAD state' message and drop the packet without evaluating the ruleset. There are two reasons for not evaluating the ruleset in this case: it would almost certainly be a mistake to pass the packet, and if ruleset evaluation resulted in a last-matching 'pass keep state' rule, pf couldn't honour the decision and create a new state, as that would create a state key conflict.

In order to actually see and log 'BAD state' messages, you'll need to enable debug logging, using:

  $ pfctl -xm
Debug messages are sent to the console, and syslogd by default archives them in /var/log/messages. Look for messages starting with 'pf:', like:

  pf: BAD state: TCP 192.168.1.10:20 192.168.1.10:20 192.168.1.200:64828
    [lo=1185380879 high=1185380879 win=33304 modulator=0 wscale=1]
    [lo=1046638749 high=1046705357 win=33304 modulator=0 wscale=1]
    4:4 A seq=1185380879 ack=1046638749 len=1448 ackskew=0 pkts=940:631
    dir=out,fwd
  pf: State failure on: 1 |
These messages always come in pairs. The first message shows the state entry at the time the packet was blocked and the sequence numbers of the packet that failed the tests. The second message lists the conditions that were violated.

At the end of the first message, you'll see whether the state was created on an incoming (dir=in) or outgoing (dir=out) packet, and whether the blocked packet was flowing in the same (dir=,fwd) or reverse (dir=,rev) direction relative to the initial state-creating packet.

A state contains three address:port pairs, two of which are always equal unless the connection is being translated by nat, rdr or binat. For outgoing connections, the source is printed on the left and the destination on the right. If the outgoing connection involves source translation, the pair in the middle shows the source after translation. For incoming connections, the connection's source is found on the right and the destination in the middle. If the incoming connection involves destination translation, the left-most pair shows the destination after translation. This format corresponds to the output of pfctl -ss; the only difference is that pfctl indicates the direction of the state using arrows instead.

Next, you see the two peers' current sequence number windows in square brackets. The '4:4' means the state is fully established (smaller values are possible during handshake, larger ones during connection closing). The 'A' indicates that the blocked packet had the ACK flag set, similar to the formatting of TCP flags in tcpdump(8) output, followed by the sequence (seq=) and acknowledgement (ack=) numbers in the blocked packet and the length (len=) of the packet's data payload. ackskew is an internal value of the state entry, only relevant when not equal to zero.

The 'pkts=940:631' part means that the state entry has matched 940 packets in the direction of the initial packet and 631 packets in the opposite direction since it was created. These counters can be especially helpful in identifying the cause of problems occurring during the handshake, when either one is zero, contradicting your expectation that the state has matched packets in both directions.
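
If you have many of these messages to go through, the fields described above can be pulled out with a regular expression. The following Python sketch is an assumption-laden helper, not a tool that ships with pf: the function name parse_bad_state and the field names are made up, and the pattern is based only on the sample message shown above, so other pf versions may print the line differently.

```python
import re

# Pattern for the tail of the first message, as printed in the sample above.
BAD_STATE_RE = re.compile(
    r"(?P<src_state>\d+):(?P<dst_state>\d+)\s+(?P<flags>[A-Z]+)\s+"
    r"seq=(?P<seq>\d+)\s+ack=(?P<ack>\d+)\s+len=(?P<len>\d+)\s+"
    r"ackskew=(?P<ackskew>-?\d+)\s+"
    r"pkts=(?P<pkts_fwd>\d+):(?P<pkts_rev>\d+)\s+"
    r"dir=(?P<created>in|out),(?P<packet>fwd|rev)"
)

NUMERIC = ("src_state", "dst_state", "seq", "ack", "len",
           "ackskew", "pkts_fwd", "pkts_rev")


def parse_bad_state(message: str) -> dict:
    """Extract the packet and state fields from a 'BAD state' message."""
    # Join wrapped lines so the pattern sees one whitespace-normalized string.
    match = BAD_STATE_RE.search(" ".join(message.split()))
    if match is None:
        raise ValueError("not a recognizable 'BAD state' message")
    fields = match.groupdict()
    for name in NUMERIC:
        fields[name] = int(fields[name])
    return fields


SAMPLE = """pf: BAD state: TCP 192.168.1.10:20 192.168.1.10:20 192.168.1.200:64828
  [lo=1185380879 high=1185380879 win=33304 modulator=0 wscale=1]
  [lo=1046638749 high=1046705357 win=33304 modulator=0 wscale=1]
  4:4 A seq=1185380879 ack=1046638749 len=1448 ackskew=0 pkts=940:631
  dir=out,fwd"""

info = parse_bad_state(SAMPLE)
print(info["flags"], info["len"], info["pkts_fwd"], info["pkts_rev"],
      info["created"], info["packet"])
```

A state whose pkts_fwd or pkts_rev is zero is a good candidate for the handshake problems mentioned above.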

The second message contains a list of one or more digits. Each digit printed represents one check that failed:

  • 1: the packet violates the recipient's window (seq + len > high)
  • 2: the packet contains data already acknowledged (seq < lo - win)
  • 3: ackskew is smaller than the minimum
  • 4: ackskew is larger than the maximum
  • 5: similar to 1, but worse (seq + len > high + win)
  • 6: similar to 2, but worse (seq < lo - maximum win)
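
The checks that the list spells out completely can be sketched in Python to see why the sample message above ends in 'State failure on: 1'. This is a rough model under several assumptions and not pf's implementation: the function names are hypothetical, a single win parameter stands in for whichever peer's window pf actually uses in each comparison, and checks 3, 4 and 6 are left out because their limits are not given in the list. Comparisons wrap around at 2^32, since TCP sequence numbers do.

```python
MOD = 2 ** 32


def seq_gt(a: int, b: int) -> bool:
    """True if a comes after b in wrapping 32-bit sequence space."""
    return 0 < (a - b) % MOD < MOD // 2


def failed_checks(lo: int, high: int, win: int, seq: int, length: int) -> list:
    """Return the digits of checks 1, 2 and 5 that the packet fails."""
    end = (seq + length) % MOD
    failures = []
    if seq_gt(end, high):                # 1: seq + len > high
        failures.append(1)
    if seq_gt((lo - win) % MOD, seq):    # 2: seq < lo - win
        failures.append(2)
    if seq_gt(end, (high + win) % MOD):  # 5: seq + len > high + win
        failures.append(5)
    return failures


# Values from the first bracket and the packet in the sample message above.
result = failed_checks(lo=1185380879, high=1185380879, win=33304,
                       seq=1185380879, length=1448)
print("State failure on:", " ".join(str(n) for n in result))
```

With the sample values, only check 1 fails: the packet carries 1448 bytes starting exactly at high.
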
Luckily, 'BAD state' messages are not common for regular real-life traffic; pf's sequence number verification accounts for many benign anomalies. If you see the messages only sporadically and notice no stalling connections, you can safely ignore them. There are many different TCP/IP stacks out there on the Internet, and some of them produce weird packets occasionally.

However, there is one class of problems in pf configuration that can be diagnosed based on the 'BAD state' messages produced steadily in those cases.

Create TCP states on the initial SYN packet

Ideally, TCP state entries are created when the first packet of the connection, the initial SYN, is seen. You can enforce this by following a simple principle:

  Use 'flags S/SA' on all 'pass proto tcp keep state' rules!
All initial SYN packets (and only those packets) have flag SYN set but flag ACK not set. When all your 'keep state' rules that can apply to TCP packets are restricted to these packets, only initial SYN packets can create states. Therefore, any TCP state created is created based on an initial SYN packet.
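
What 'flags S/SA' tests can be written down as a bit mask comparison. The Python sketch below is a model of the rule option, not pf code; the function names are made up for this example, while the flag bit values are the standard ones from the TCP header.

```python
# TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def matches_flags(packet_flags: int, must_be_set: int, mask: int) -> bool:
    """'flags <set>/<mask>': of the flags in mask, exactly those in set are set."""
    return (packet_flags & mask) == must_be_set


def may_create_state(packet_flags: int) -> bool:
    """Model of 'flags S/SA': SYN set, ACK not set, other flags ignored."""
    return matches_flags(packet_flags, SYN, SYN | ACK)


for name, flags in (("SYN", SYN), ("SYN+ACK", SYN | ACK),
                    ("ACK", ACK), ("PSH+ACK", PSH | ACK)):
    print(f"{name:8} may create state: {may_create_state(flags)}")
```

Only the bare SYN passes the test; the SYN+ACK reply and all later packets of the connection have ACK set and therefore cannot create a state.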

The reason for creating state only on initial SYN packets is a TCP extension called 'window scaling' defined in RFC 1323. The field of the TCP header used to advertise accepted windows became too small for today's fast links. Modern TCP/IP stacks would like to use larger window values than can be stored in the existing header field. Window scaling means that all window values advertised by one peer are to be multiplied by a certain factor by the recipient, instead of being taken literally. In order for this scheme to work, both peers must understand the extension and advertise their ability to support it during the handshake using TCP options. The TCP options are only present in the initial SYN and SYN+ACK packets of the handshake. If and only if both of those packets contain the TCP option, the negotiation is successful, and all further packets' window values are meant to be multiplied.

If pf didn't know about window scaling being used, it would take all advertised window values seen literally, and calculate its windows of acceptable sequence number ranges incorrectly. Typically, peers start to advertise smaller windows and gradually advertise larger windows during the course of a connection. Unaware of the window scaling factors, pf would at some point start to block packets because it would think one peer is overflowing the other's advertised window. The effects would be more or less subtle. Sometimes, the peers will react to the loss of the packets by going into a loss recovery mode and advertise smaller windows. When pf then passes subsequent retransmissions again, advertised windows grow again, up to the point where pf blocks packets. The effect is that connections temporarily stall and throughput is poor. It's also possible that connections stall completely and time out.
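
The arithmetic behind this failure is simple. RFC 1323 expresses the scaling factor as a shift count, so the real window is the advertised value shifted left by that count. The following Python sketch uses made-up numbers and hypothetical function names; it only illustrates how an observer that missed the handshake, and therefore assumes a shift of zero, misjudges a perfectly valid sender.

```python
def effective_window(advertised: int, shift: int) -> int:
    """Window in bytes: the 16-bit header value scaled by 2**shift."""
    return advertised << shift


def seems_to_overflow(in_flight: int, advertised: int, assumed_shift: int) -> bool:
    """True if the unacknowledged data exceeds the window as the observer sees it."""
    return in_flight > effective_window(advertised, assumed_shift)


# Illustrative values only, not taken from a real connection.
ADVERTISED = 16384       # value in the TCP header
NEGOTIATED_SHIFT = 2     # agreed on in the SYN and SYN+ACK options
IN_FLIGHT = 40_000       # bytes sent but not yet acknowledged

print("real window:", effective_window(ADVERTISED, NEGOTIATED_SHIFT))
print("overflow, shift known:  ",
      seems_to_overflow(IN_FLIGHT, ADVERTISED, NEGOTIATED_SHIFT))
print("overflow, shift unknown:",
      seems_to_overflow(IN_FLIGHT, ADVERTISED, 0))
```

With the negotiated shift, 40,000 bytes fit comfortably into a 65,536 byte window. Taken literally, the same 16384 looks far too small, and the packets appear to overflow it.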

pf does know about window scaling and supports it. However, the prerequisite is that you create state on the initial SYN, so pf can associate the first two packets of the handshake with the state entry. Since the entire negotiation of the window scaling factors takes place only in these two packets, there is no reliable way to deduce the factors after the handshake.

Window scaling wasn't widely used in the past, but this is changing rapidly. Just recently, Linux started using window scaling by default. If you experience stalling connections, especially when problems are limited to certain combinations of hosts, and you see 'BAD state' messages related to these connections logged, verify that you're really creating states on the initial packet of a connection.

You can tell whether pf has detected window scaling for a connection from the output of pfctl like:

  $ pfctl -vss
  kue0 tcp 10.1.2.3:10604 -> 10.2.3.4:80 ESTABLISHED:ESTABLISHED
   [3046252937 + 58296] wscale 0  [1605347005 + 16384] wscale 1
If you see 'wscale x' printed in the second line (even if x is zero), pf is aware that the connection uses window scaling.

Another simple method to identify problems related to window scaling is to temporarily disable window scaling support on either peer and reproduce the problem. On OpenBSD, the use of window scaling can be controlled with sysctl(8), like:

  $ sysctl net.inet.tcp.rfc1323
  net.inet.tcp.rfc1323=1
  $ sysctl -w net.inet.tcp.rfc1323=0
  net.inet.tcp.rfc1323: 1 -> 0
Similar problems occur when you create a state entry based on packets other than the initial SYN and use 'modulate state' or use translations. In both cases, the translation should occur at the beginning of the connection. If the first packet is not already translated, translation of subsequent packets will usually confuse the recipient and cause it to send replies that pf blocks with 'BAD state' messages.

Copyright (c) 2004-2006 Daniel Hartmeier <daniel@benzedrine.cx>. Permission to use, copy, modify, and distribute this documentation for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.


Comments
  1. By Daniel Hartmeier (62.65.145.30) daniel@benzedrine.cx on http://www.benzedrine.cx/dhartmei.html

    If you spot any mistakes, please post corrections under this thread. Thank you!

    Comments
    1. By Henrik Kramshøj (195.212.29.83) hlk@kramse.dk on www.kramse.dk


      Haven't got time to read, but "accoding" should be according, I guess

      Great stuff!

      Regarding rulesets the problem is not so much writing a ruleset, but adding to rulesets - after a few years they become a mess!

    2. By Nathan Rickerby (59.167.54.65) on

      minor correction

      > When you do observe a result differ from the expectation,

      s/differ/that differs/

  2. By Martin Hein (193.138.173.158) on http://zensystems.dk

    Daniel, This is good work.

    Actually I think it should be added to the OpenBSD FAQ section.

  3. By Greg Hennessy (62.3.210.251) me@privacy.org on

    Nicely Done Mr H,

    Please add my voice to the chorus for making it part of the official PF documentation.

    Yesterday's chapter was most informative.


    I do hope Jacek Artymiak's lack of response isn't onerous.



    greg

  4. By Richard Kelsall (81.2.66.12) r.kelsall@millstream.com on http://www.millstream.com/

    Great work. Very useful. Thank you.

    Trivial corrections :
    'can safe a lot of time' -> 'can save a lot of time'.
    'so only packets blocked for on unknown ports are logged' ->
    'so only packets blocked on unknown ports are logged'.
    'If pf wouldn't know about window scaling being used, it would
    take all advertised window values ...' ->
    'If pf didn't know about window scaling, it would
    take all advertised window values ...'.

    An obscure question that I have wondered about :
    RFC 793 says 'Two processes which issue active OPENs to each other at
    the same time will be correctly connected.' This suggests to me a more
    secure connection than one which uses a passive open is possible where
    both ends know the address, port and time a connection is required.
    This might be possible where for example two clients are independently
    talking to a server on the internet - for example an IRC server - but
    then want an efficient direct connection to each other for a file
    transfer and therefore want to open a direct TCP connection to each
    other to handle this. For this I think most client software currently
    written will open a passive connection at one end for the other client
    to connect to. Two simultaneous active opens would seem to be a
    much more secure way of starting these connections since the clients
    can tell each other via the central server to simultaneously start a
    direct connection. Would a stateful pf firewall in front of both
    clients prevent this active to active TCP connection establishing?

    Comments
    1. By Daniel Hartmeier (62.65.145.30) daniel@benzedrine.cx on http://www.benzedrine.cx/dhartmei.html

      > Trivial corrections :

      Fixed, thank you!

      About TCP simultaneous open, pf should support it. It will see either one peer's SYN first and create state based on it. The next packet pf sees in this case is the other peer's SYN in the reverse direction. It will match the state entry, and, since it's the first packet of the state for this direction, initialize that side of the state. The two subsequent (non-SYN) ACKs and further packets match the state entry normally.

      I say 'should', because it seems rather tricky to test it. I don't know how you'd trigger the simultaneous open from two normal userland processes. Would both call socket(2), bind(2), and then connect(2)? Neither one calls listen(2)? I've never seen such code, maybe it IS just an artifact :)

      As for the security improvement, it doesn't seem more secure than have one side do a passive listen(2) and accept(2), then verify the source of the incoming connection, and drop the connection when it doesn't match. I think IRC clients do this for DCC, so do FTP clients/servers.

      Comments
      1. By Richard Kelsall (81.2.66.12) r.kelsall@millstream.com on http://www.millstream.com/

        Interesting, thank you. I have to admit I haven't actually programmed
        these things - I've only read the RFC and used pf. My thinking on the
        active-active connection being more secure than the passive open then
        dropping connections method was that an OpenBSD firewall, say on the
        edge of a corporate network, could be set to block all incoming
        connections from the internet to the insecure Windows machines behind
        the firewall, but pass out and keep state on any connections the
        Windows machines want to make to the internet. So all the suspect
        traffic is stopped at the secure OpenBSD firewall rather than reaching
        the Windows machines inside the network. By stopping the traffic at
        the firewall we can protect misconfigured computers and buggy network
        stacks etc inside the corporate network.

  5. By Dan Hassler (64.81.55.199) on

    Thank You!
    I have this comment in my pf.conf

    # NORMALIZATION: reduce/resolve ambiguities.
    #
    scrub on $admif all random-id reassemble tcp
    #scrub on $lanif all random-id reassemble tcp
    #scrub on $wanif all random-id reassemble tcp
    #
    # Problem using "reassemble tcp" on $lanif and/or $wanif
    # Mac OS X "software update" fails.
    # bad-timestamp counter increments,
    # RFC1323 errors in syslog with debug loud
    # All else works fine including other http on OS X.
    # TBD: investigate further.
    #
    scrub on $lanif all random-id fragment reassemble
    scrub on $wanif all random-id fragment reassemble

    Now I have the information to investigate further.

    -Dan

  6. By SleighBoy (24.113.145.178) on http://www.code.cx/

    Both thus far have been great and would be right at home in official docs.

    One thing I would like to see more examples of is the use of "reply-to" / "route-to". I used a snip of the load balancing example to set up a Soekris box to route traffic out two different internet connections depending on source IP. Allowing inbound port-forwarding on the non-default gateway to go back out its incoming interface was quite troublesome for me; I just used the default GW interface to fix it.

  7. By scotte (207.5.234.39) on

    Trivial typo, under the 'Parser errors' section

    pass from { !10.1.2.3, !10.2.3.4 } to an

    should be

    pass from { !10.1.2.3, !10.2.3.4 } to any


    Great stuff!

    Thanks.

  8. By frantisek holop (165.72.200.11) on

    thanks for the great article.
    
    i have just set -xm and see the following bad states:
    xxx.xxx.xxx.xxx: my server
    yyy.yyy.yyy.yyy: client
    
    pf: BAD state: TCP xxx.xxx.xxx.xxx:6881 xxx.xxx.xxx.xxx:6881 yyy.yyy.yyy.yyy:2947
    [lo=521071061 high=521136597 win=65535 modulator=0]
    [lo=4050979001 high=4051044535 win=65535 modulator=0]
    4:7 R seq=521071061 ack=4050979001 len=0 ackskew=0 pkts=2:3 dir=in,fwd
    pf: State failure on:         |
    
    there are a couple of things in this log your article doesn't mention:
    
    -what is state 4:7
    
    -if 'R' is reset (len=0 because R packets do not contain data, right?),
    and dir=in, if i understand correctly, pf got an incoming reset packet,
    why create or maintain a state for it?
    
    -there is no number in the second line
    
    > Luckily, 'BAD state' messages are not common for
    > regular real-life traffic
    
    now that i have turned on -xm, i get almost 1 msg every 2-3s...
    that's quite common for me :)  and it's not just bittorrent traffic,
    there are some for spamd also.
    
    i have also noticed that all these messages are 'R' packets...
    could you elaborate on this a little bit more please?
    
    

    Comments
    1. By Daniel Hartmeier (195.234.187.87) daniel@benzedrine.cx on http://www.benzedrine.cx/dhartmei.html

      > pf: BAD state: TCP xxx.xxx.xxx.xxx:6881 xxx.xxx.xxx.xxx:6881 yyy.yyy.yyy.yyy:2947
      > [lo=521071061 high=521136597 win=65535 modulator=0]
      > [lo=4050979001 high=4051044535 win=65535 modulator=0]
      > 4:7 R seq=521071061 ack=4050979001 len=0 ackskew=0 pkts=2:3 dir=in,fwd
      > pf: State failure on: |
      >
      > there are a couple of things in this log your article doesn't mention:
      >
      > -what is state 4:7

      A pf state entry represents a TCP connection between two peers. Each of those peers has a TCP socket for this connection, and these sockets are in particular states (the state of the TCP finite state machine, like SYN_SENT or ESTABLISHED).

      This sentence just used the word 'state' in two different meanings. But it's the appropriate word in either context. I can't help it :)

      So, a pf state contains two numbers representing the states the two peers' sockets are in. The values come from src/sys/netinet/tcp_fsm.h

      #define TCPS_CLOSED 0 /* closed */
      #define TCPS_LISTEN 1 /* listening for connection */
      #define TCPS_SYN_SENT 2 /* active, have sent syn */
      #define TCPS_SYN_RECEIVED 3 /* have sent and received syn */
      #define TCPS_ESTABLISHED 4 /* established */
      #define TCPS_CLOSE_WAIT 5 /* rcvd fin, waiting for close */
      #define TCPS_FIN_WAIT_1 6 /* have closed, sent fin */
      #define TCPS_CLOSING 7 /* closed xchd FIN; await ACK */
      #define TCPS_LAST_ACK 8 /* had fin and close; await FIN ACK */
      #define TCPS_FIN_WAIT_2 9 /* have closed, fin is acked */
      #define TCPS_TIME_WAIT 10 /* in 2*msl quiet wait after close */

      The left-hand number printed (4 in your 4:7) is the state of the source peer, the right-hand number (7) that of the destination.

      That means one side has started closing the connection with a FIN handshake, while the other side is simply returning a RST. This happened after only 2+3=5 packets (3 for the opening handshake), so what you have there is a connection being closed immediately after being established, with one side simply dropping its TCP control block, i.e. returning RST just as if the connection never existed.

      > -if 'R' is reset (len=0 because R packets do not contain data, right?),

      Yes

      > and dir=in, if i understand correctly, pf got an incoming reset packet,
      > why create or maintain a state for it?

      It didn't create state on the RST packet you see now. What you see logged is the packet that is now dropped because it doesn't match the state. The state was probably created on a SYN packet (the first of the preceding 5 packets matching the state).

      > -there is no number in the second line

      This is only possible in this specific case, where a RST packet is dropped right after the handshake when no data has ever been exchanged.

      This was fixed just recently in

      http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf.c.diff?r1=1.514&r2=1.515&f=h

      I'd say it's safe to ignore those particular 'BAD state' messages. If they bother you, you can apply the patch ;)

      Daniel
