TCP SYN floods can wreak havoc on a network, and at the node level they look quite strange. Since the attack consists of nothing but SYN packets, from the normal monitoring point of view it looks like a decrease in traffic: the kernel holds on to these half-open connections while it waits for a final ACK that never arrives. So rather than looking at graphs and saying "wow, we're getting hammered," it sounds more like "wow, where'd our Apache traffic go?" or "why does this server have less traffic than the rest?" At the load balancer level you'd still see all the connections; they just don't make it into most OS-level monitoring. Wikipedia has some decent documentation on SYN floods, and you can increase resiliency by changing a few kernel settings.
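As a rough sketch of those kernel settings (the values here are illustrative starting points, not tuned recommendations for your environment):

```shell
# Enable SYN cookies so the kernel can answer SYNs without holding a
# backlog slot for every half-open connection.
sysctl -w net.ipv4.tcp_syncookies=1

# Allow more half-open connections before the backlog fills up.
sysctl -w net.ipv4.tcp_max_syn_backlog=4096

# Retransmit the SYN-ACK fewer times so bogus entries expire sooner.
sysctl -w net.ipv4.tcp_synack_retries=3

# Add the same keys to /etc/sysctl.conf to persist across reboots.
```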

Normally you don't even see these attacks on regular Linux servers; they're caught at the load balancer or firewall layer instead. But if you are using DSR (Direct Server Return), the SYN requests must be passed straight through to the servers, since the SYN-ACK comes from the servers rather than from the load balancer. The load balancer can still limit connections from any single IP address, but it's nice to monitor for this on the servers as well so you cover all your bases. The following check is designed to be used with anything that speaks NRPE, like Icinga or Nagios.

To actually see the traffic in question, it shows up in netstat as SYN_RECV: netstat -n | grep SYN_RECV. I added back the two header lines that grep would normally strip off, for reference. Make sure you add the '-n' to "show numerical addresses" (from the man page); if you are actually getting a lot of these, netstat can be painfully slow as it tries to resolve all the IP addresses with DNS.

$ netstat -n | grep SYN_RECV
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address             Foreign Address      State
tcp        0      0         SYN_RECV
tcp        0      0         SYN_RECV

You can also see SYN flood traffic with ss, although by default ss hides this traffic category. To see it, type ss -a state SYN-RECV.

$ ss -a state SYN-RECV
Recv-Q Send-Q                Local Address:Port           Peer Address:Port
0      0              
0      0              
0      0              

For our check we'll avoid both netstat and ss because they are a bit too resource-intensive. We want something we can run frequently and cheaply, and although ss is better than netstat, it's still not perfect. Instead we'll go straight to '/proc/net/tcp'. Here's some basic documentation on the format of the file. The script below parses the file, counts all the connections in SYN_RECV, and builds a hash of counters for each IP address with open connections. If the warning or critical threshold is reached, the script exits with the correct status code and reports the top offenders (although the source IPs are probably spoofed anyway).
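To make the hex parsing less magic, here's a standalone sketch of how a /proc/net/tcp address field decodes. It uses String#scan instead of scanf purely for illustration, and the helper name is my own:

```ruby
# Decode a /proc/net/tcp address field such as "0100007F:0016".
# The IP is four hex octets in reversed (little-endian) byte order,
# and the port is plain hex.
def decode_proc_tcp_addr(field)
  hex_ip, hex_port = field.split(':')
  octets = hex_ip.scan(/../).map { |h| h.to_i(16) }  # "0100007F" => [1, 0, 0, 127]
  "#{octets.reverse.join('.')}:#{hex_port.to_i(16)}"
end

puts decode_proc_tcp_addr('0100007F:0016')  # => 127.0.0.1:22
```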

The output looks like this:

$ ruby /usr/lib64/nagios/plugins/check_syn_flood.rb -w 500 -c 1000
SYN Count: 239
$ ruby /usr/lib64/nagios/plugins/check_syn_flood.rb -w 500 -c 100

And here's the full script:

# Nagios check for TCP SYN Flooding Attack
# check_syn_flood.rb -w WarningLevel -c CriticalLevel
# Written by Robert Birnie
# Source:
# /proc/net/tcp format:
# sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
#  0: 0100007F:46E0 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 36206 1 ffff810224e52140 3000 0 0 2 -1
#    %nethash = (
#        '01'  =>  TCP_ESTABLISHED,
#        '02'  =>  TCP_SYN_SENT,
#        '03'  =>  TCP_SYN_RECV,
#        '04'  =>  TCP_FIN_WAIT1,
#        '05'  =>  TCP_FIN_WAIT2,
#        '06'  =>  TCP_TIME_WAIT,
#        '07'  =>  TCP_CLOSE,
#        '08'  =>  TCP_CLOSE_WAIT,
#        '09'  =>  TCP_LAST_ACK,
#        '0A'  =>  TCP_LISTEN,
#        '0B'  =>  TCP_CLOSING,
#    );

require 'optparse'
require 'scanf'

options = {}

optparse = OptionParser.new do |opts|
  opts.on('-w', '--warn warning') do |f|
    options[:warn] = f
  end
  opts.on('-c', '--critical critical') do |f|
    options[:crit] = f
  end
end
optparse.parse!

raise OptionParser::MissingArgument if options[:warn].nil?
raise OptionParser::MissingArgument if options[:crit].nil?

@src_ips = Hash.new(0)
@dst_ips = Hash.new(0)
@count = 0
exit_code = 0

File.readlines('/proc/net/tcp').each do |line|
  i = line.split(' ')
  if i[3] == '03'
    @count += 1
    @dst_ips[i[1].split(':')[0].scanf('%2x' * 4) * '.'] += 1
    @src_ips[i[2].split(':')[0].scanf('%2x' * 4) * '.'] += 1
  end
end

msg = "SYN Count: #{@count}"

if @count > options[:crit].to_i or @count > options[:warn].to_i
  top_dst_ip = @dst_ips.max_by { |k, v| v }
  top_src_ip = @src_ips.max_by { |k, v| v }
  crit = "| DST: #{top_dst_ip[0].split('.').reverse.join('.')}: #{top_dst_ip[1]} SRC: #{top_src_ip[0].split('.').reverse.join('.')}: #{top_src_ip[1]}"
  if @count > options[:crit].to_i
    exit_code = 2
    msg = "SYN FLOOD CRITICAL #{msg} #{crit}"
  elsif @count > options[:warn].to_i
    exit_code = 1
    msg = "SYN FLOOD WARN #{msg} #{crit}"
  end
end

puts msg
exit exit_code

From here you can implement the check like any other NRPE check. I would not recommend alerting on the check directly unless you have a very small number of servers; you'll probably end up with too much noise. Instead, set the check up with some sane defaults (maybe 1000 warn and 2000 critical) and no notifications. Then alert with a Service Cluster check for when a set fraction of your cluster is critical all together, say 50% or more. That'll keep you from having to wake up every time someone pokes you.
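For reference, wiring it up looks something like this. The paths, command names, and thresholds here are just examples; adjust them for your distribution and monitoring setup:

```shell
# /etc/nagios/nrpe.cfg on each server (example thresholds):
command[check_syn_flood]=/usr/bin/ruby /usr/lib64/nagios/plugins/check_syn_flood.rb -w 1000 -c 2000

# On the Nagios/Icinga side, a service with notifications disabled,
# so only the cluster-level check actually pages anyone:
#
# define service {
#     use                     generic-service
#     hostgroup_name          web-servers
#     service_description     SYN Flood
#     check_command           check_nrpe!check_syn_flood
#     notifications_enabled   0
# }
```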

Let me know in the comments if you found this useful or have any recommendations to improve it. Thanks!