[slide 0]

talk materials are here: http://monkey.org/~jose/presentations/pysniff04.d/

hello and welcome to my presentation. i'll be explaining how to use
the python language to sniff network traffic. we'll be using some
python libraries and building a few small applications that give us
great views into network traffic. i'm also using this as a chance to
more widely introduce a new tool i wrote, "flowgrep". so, let's get
going!

on slide 25 i have a list of URLs and resources.

[slide 1]

you might be asking yourself "why use python?" when you think about 
what network traffic sniffing means, you're often facing situations where
you have to use very fast programs. and since python is an interpreted
language, you expect it to be slower than C. that's ok for our
purposes.

python gives us many other advantages over C. first, we have an
easy to use language so we can rapidly develop prototypes and new code.
secondly, we have easy access to basic data structures, like lists and 
arrays, which we can easily build upon. a list of lists, for example,
would look like this in python:

>>> mylist1 = ['item 1', 'item 2']
>>> mylist2 = ['cat', 'dog']
>>> superlist = [mylist1, mylist2]
>>> print superlist
[['item 1', 'item 2'], ['cat', 'dog']]

you can do that in C, but it's harder and requires more lines of code
("LoC") to do so. my biggest reason for using a scripting language over
C is the easy string manipulation, either making tokens or using regular
expressions. 
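
as a quick illustration of that string handling, here's a minimal sketch
of both approaches. the header line is made up for the example, and the
code is written for a current Python 3 interpreter rather than the one
used for the session above.

```python
import re

# a made-up line of HTTP header text, used only for illustration
line = "Server: Apache/1.3.29 (Unix) mod_ssl/2.8.16"

# making tokens: split the line on whitespace
tokens = line.split()

# regular expression: pull out the product name and its version
match = re.search(r"Server: (\w+)/([\d.]+)", line)
product, version = match.groups()

print(tokens)
print(product, version)
```

both results are plain Python lists and strings, so they drop straight
into the data structures mentioned earlier.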

you can learn more about python at http://www.python.org/, including 
tutorials, distributions (UN*X, Windows, and Mac OS X included), and
modules.

[slide 2]

we are going to marry python and sniffing by using C libraries which
have been exported to python using the "SWIG" tool. "SWIG" is the
simplified wrapper and interface generator, which lets us make new modules
for scripting languages from C and C++ libraries. it can export to
Tcl, Python, Perl, C#, and so much more. libraries exist for "pcap",
which is the canonical sniffing library, "dnet", dugsong's packet crafting
library, and "libnids", which is what we're using today.

these are exported as Python modules like pcapy or pypcap (for libpcap)
and "dpkt" (for packet inspection), and libdnet even includes Python
bindings. we're going to use "pynids", a Python interface to libnids.
this lets us reassemble connections and easily see their contents.

i didn't write "pynids", but i built a few tools on top of it. i've been
sniffing packets for years and like doing it in Python when i get
a chance.

[slide 3]

libnids is a simple library designed to handle IP stream reassembly.
simply put, this means that we can see connections as connections
and not just packets, and the library also reassembles fragments into
their original packets. it's based on the Linux 2.0.36 TCP/IP stack.
internally it uses libpcap to sniff and libnet to handle the packets.
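
to make "connections, not packets" concrete, here's a toy sketch of
reassembly: segments that arrive out of order are put back in sequence
number order. this is not libnids code, just a model of the idea. the
function name, sequence numbers, and payloads are invented, and it
ignores retransmits, overlaps, and sequence number wraparound.

```python
def reassemble(segments):
    """Join (sequence_number, payload) pairs into one byte stream.

    toy model only: assumes no gaps, overlaps, or wraparound.
    """
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(payload for _seq, payload in ordered)

# three segments of one request, arriving out of order
segments = [
    (1016, b"HTTP/1.0\r\n"),
    (1000, b"GET "),
    (1004, b"/index.html "),
]
data = reassemble(segments)
print(data)
```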

[slide 4] 

the kernel is what normally houses the TCP/IP stack. functions for networking
are handled within the kernel, and you call into it using sockets and 
the like. 

[slide 5]

libnids exports this IP stack to the userland, where you can manipulate it
more easily to examine packets. you don't replace your kernel's stack,
but you do make a copy of it. very handy!

[slide 6]

using libnids is much easier than most people realize. first you initialize
the library with "nids_init()". then you register callback functions, one
for each protocol you care about. TCP packets are handled by the function
you register with "nids_register_tcp()". here packets will be placed into
their connection, or a new connection state will be made if needed. you do
the same for UDP and IP packets, as well.

you then launch the whole thing by calling "nids_run()", which starts
to listen for packets and sends them to the appropriate function that you
told it about. if you don't register an IP packet handler, it won't
process them. you can even react and kill TCP connections by using
"nids_killtcp()", which sends TCP reset packets to each party in the
connection. these are the C functions, which look a lot like our Python
functions.

[slide 7]

this shows you what "nids_run()" does. for every type of packet you
get the library will process it using the callback functions you
specified earlier.
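
the register-then-run pattern can be modeled in a few lines. this is a
toy dispatcher, not libnids itself: the names register() and run() and
the packet tuples are invented for the sketch. it shows that packets for
a protocol with no registered callback are simply skipped.

```python
handlers = {}   # protocol name -> callback function

def register(protocol, callback):
    """Remember which function handles which protocol."""
    handlers[protocol] = callback

def run(packets):
    """Hand each (protocol, payload) pair to its registered callback."""
    for protocol, payload in packets:
        callback = handlers.get(protocol)
        if callback is not None:   # no handler registered: skip the packet
            callback(payload)

seen = []
register("tcp", lambda payload: seen.append(("tcp", payload)))
run([("tcp", b"hello"), ("udp", b"ignored"), ("tcp", b"world")])
print(seen)
```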

TCP connections are tracked exactly as you would track them in the kernel,
using port numbers, IP addresses, sequence numbers and acknowledgement
numbers. 
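
here's a minimal sketch of the address and port part of that tracking.
both directions of a connection have to land in the same table entry, so
the key is put in a fixed order. the function names and addresses are
made up, and the sequence and acknowledgement number checks that a real
stack performs are left out.

```python
def connection_key(src, sport, dst, dport):
    """Return the same key for both directions of one connection."""
    a, b = (src, sport), (dst, dport)
    return (a, b) if a <= b else (b, a)

connections = {}   # connection key -> list of payloads seen so far

def track(src, sport, dst, dport, payload):
    """File a payload under its connection and return the key used."""
    key = connection_key(src, sport, dst, dport)
    connections.setdefault(key, []).append(payload)
    return key

# one packet in each direction of the same connection
k1 = track("10.0.0.5", 40000, "192.0.2.1", 80, b"GET / HTTP/1.0\r\n\r\n")
k2 = track("192.0.2.1", 80, "10.0.0.5", 40000, b"HTTP/1.0 200 OK\r\n")
print(k1 == k2, len(connections))
```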

[slide 8]

libnids exposes the basic TCP states in three groups. the first is for newly
created TCP sessions, where it sets the state "NIDS_JUST_EST" (just
established). this occurs once the TCP 3 way handshake (3WHS) has taken place.

NIDS_DATA is set for new packets in an existing stream. you can append them
to the existing data from the connection or discard the data if you want.

for TCP connection closure 3 states are available. NIDS_RESET is when
the connection was reset by one of the peers. NIDS_CLOSE is for a 
graceful TCP close, and NIDS_TIMED_OUT is for a connection that appears
to have timed out (ie no closure sent or seen).

libnids doesn't expose the intermediate stages of the TCP connection.

[slide 9]

pynids wraps this in much the same way that you would do in C. you have
nids_run() which drives it (or you can loop over nids_next() to go
one packet at a time). it does the reassembly for you without any work
on your part. 

[slide 10]

here's a basic mapping of the functions from slide 6 to how they are done
in pynids. most of the functions have the same name and make sense. to
change internal libnids settings you can just use the "nids.param()"
function. and then you launch the sniffing process like you did in
C.

[slide 11]

the order of operations in pynids is exactly like it is in C: packets
arrive, libnids reassembles the fragments as needed, and the right callback
function is evaluated. within that callback function you can parse the
data stream, look at the packet attributes (ie sequence numbers) and you
can do whatever you want. you get all of these features exposed to you
without any trouble.

[slide 12]

here's a very simple python example. we will show the TCP handler on
the next slide. we import the nids library, then in main() we set
a parameter to disable portscan detection, the library is initialized,
then we register a TCP handler (handleTcp()), and then we launch the
setup via nids.run(). very simple!
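
the steps just described can be sketched like this. it is a
reconstruction from the description, not a copy of the slide. the
"scan_num_hosts" parameter name is my assumption for the portscan
setting, the handler is an empty placeholder, and the nids_module
argument is an addition of the sketch so the setup order can be
exercised without pynids installed.

```python
def handleTcp(tcp):
    """Placeholder TCP callback; a real one inspects the tcp object."""
    pass

def main(nids_module=None):
    """Set up libnids through pynids and start sniffing."""
    if nids_module is None:
        # pynids: needs libnids installed, and usually root to sniff
        import nids as nids_module

    nids_module.param("scan_num_hosts", 0)   # disable portscan detection
    nids_module.init()                       # initialize the library
    nids_module.register_tcp(handleTcp)      # register the TCP callback
    nids_module.run()                        # sniff and dispatch packets

# call main() with no arguments to start sniffing for real
```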

[slide 13]

here's what the TCP handler looks like. it gets passed a tcp object, which
holds the data, the addresses, and the attributes of the session. we 
can spit out the data when we're done. notice that we have to set the
tcp.client.collect variable to 1 to collect the client's data (data
sent to the client), and the same for the server data. without this
the library won't keep track of the session data, but it will keep track
of the session itself. 
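
a handler along those lines might look like the sketch below. it is not
the code from the slide. the state names are stand-in strings for the
states from slide 8 (with pynids you would compare against the constants
the nids module exports), and the attribute names nids_state, addr,
client, server, data and the discard() call are my assumptions about the
tcp object.

```python
# stand-in labels for the libnids states; not the real constants
JUST_ESTABLISHED = "just established"
DATA = "data"
CLOSE, RESET, TIMED_OUT = "close", "reset", "timed out"
END_STATES = (CLOSE, RESET, TIMED_OUT)

finished = []   # one (addr, to_server, to_client) entry per ended session

def handleTcp(tcp):
    """Collect both directions of a session and save them when it ends."""
    if tcp.nids_state == JUST_ESTABLISHED:
        # without these the session is tracked but its data is not kept
        tcp.client.collect = 1
        tcp.server.collect = 1
    elif tcp.nids_state == DATA:
        # discard nothing, so the data keeps accumulating
        tcp.discard(0)
    elif tcp.nids_state in END_STATES:
        finished.append((tcp.addr, tcp.server.data, tcp.client.data))
```

saving the data only in the end states means each session is reported
once, with everything both sides sent.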

[slide 14]

here's the same example in C. here's the main loop showing the library
setup and parameter setting, and then we launch libnids using nids_run().

[slide 15]

and here is the TCP callback function. the TCP object here is a pointer
to the TCP state, and we can peek inside the structure at the destination
ports and the like. this simple example just collects data from webservers
and web clients and spits it out when the connection closes.

now, this has about the same number of lines (minus some of the C
brackets and such). but once you start doing string or data structure
manipulation, the operations are much easier in Python than in C. that's
why we chose a scripting language.

[slide 16]

so, what can you do with this? well, here are three programs i wrote to
show you how to play with the library.

the first is VersionDetect, a tool i wrote to use the headers sent
by various servers and clients to interrogate their OS versions and 
software. it's very simple, and supports SSH client and server strings,
mail client strings, and web server and client strings. it tells you the 
string that was reported by the computer, which often reveals the OS
or the client architecture.
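
the core of that idea fits in a few lines. this is a sketch of the
approach, not VersionDetect's actual code. the patterns and the
find_versions() name are invented for the example, mail clients are
left out, and the stream data is treated as text.

```python
import re

# patterns for a few banner types; the real tool's patterns may differ
BANNERS = [
    ("ssh", re.compile(r"^(SSH-\d+\.\d+-\S+)", re.M)),
    ("web server", re.compile(r"^Server:\s*(.+?)\r?$", re.M | re.I)),
    ("web client", re.compile(r"^User-Agent:\s*(.+?)\r?$", re.M | re.I)),
]

def find_versions(data):
    """Return (kind, version string) pairs found in reassembled text."""
    found = []
    for kind, pattern in BANNERS:
        for match in pattern.finditer(data):
            found.append((kind, match.group(1)))
    return found

# a made-up web server reply
reply = "HTTP/1.1 200 OK\r\nServer: Apache/1.3.29 (Unix)\r\n\r\n"
versions = find_versions(reply)
print(versions)
```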

i wrote this because i had previously written a passive TCP stack fingerprinting
tool which sometimes gives bad results. i wanted to know for sure what
kind of system i was talking to, or if it was behind a security device that
proxied the connection.

some of the things i've found using this include the fact that some of the
MSN servers i talk to use OpenBSD firewalls and proxies but are
actually IIS servers. 

what i should do with the tool is have it write to a database so i can
log it all and query it when i need the information. :)

[slide 17]

here's some sample output of the tool when i run it on my work
laptop. you can see, for example, that i use OpenBSD on i386 to
browse the web with Firefox 0.6.1 (i need to update!). you can also
see i talk to an OpenSSH 3.5 server and a bunch of web servers. some
of them reveal that they're on a particular type of platform, and
others don't reveal too much. after a while you get to see some interesting
stuff, like nice Apache modules and the like.

[slide 18]

http-graph is another tool i wrote to investigate something. i needed a quick
proof of concept tool to show people what i meant when i was describing
how i wanted to think about new representations of web browsing histories.
http-graph sniffs the requests and replies from web servers to build a
directed graph of how you get from one website to the next. all of the
information we need is in the header of the request, including the
URL we want, the server it's on, and the referring website.
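
pulling those three pieces out of a request takes only a little string
handling. the sketch below is my own illustration, not http-graph's
code. request_edge() is an invented name, the request is made up, and it
assumes a plain "http://" URL built from the Host header.

```python
def request_edge(request):
    """Return (referrer, requested URL) from one HTTP request's text.

    the referrer is None when no Referer header was sent.
    """
    lines = request.split("\r\n")
    _method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break   # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    url = "http://" + headers.get("host", "") + path
    return headers.get("referer"), url

request = (
    "GET /papers/ HTTP/1.1\r\n"
    "Host: www.example.org\r\n"
    "Referer: http://www.example.org/index.html\r\n"
    "\r\n"
)
edge = request_edge(request)
print(edge)
```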

[slide 19]

what http-graph does is reconstruct this information into a pair of 
strings, one for the referrer and one for the request itself. this
gives us a view of natural "hubs" of information in our browsing.
other people have looked at this sort of thing, too. see 
http://www.uiweb.com.nyud.net:8090/issues/issue37.htm

[slide 20]

http-graph captures this information and creates a "dot" output file.
"dot" is part of the "graphviz" toolkit, which lets us graph directed
graphs very easily. i use the tool "neato" to make the graphs and 
display in various formats, like SVG (scalable vector graphics), 
postscript or PDF formats.
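
writing "dot" from a list of referrer and request pairs is
straightforward. this is a sketch with invented names and URLs, not
http-graph's own output code, and it assumes the URLs contain no double
quotes (those would need escaping).

```python
def to_dot(edges):
    """Render (referrer, url) pairs as a graphviz directed graph."""
    lines = ["digraph browsing {"]
    for referrer, url in edges:
        lines.append('    "%s" -> "%s";' % (referrer, url))
    lines.append("}")
    return "\n".join(lines)

edges = [
    ("http://www.example.org/", "http://www.example.org/papers/"),
    ("http://www.example.org/papers/", "http://www.example.com/article.html"),
]
dot_text = to_dot(edges)
print(dot_text)
```

saved to a file (say graph.dot, a name picked for this example), the
output can be rendered with something like
"neato -Tsvg graph.dot -o graph.svg".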

[slide 21]

here is an example graph and a detail of that graph. this particular 
spiderweb came from me launching from my homepage to other pages i have.
from there i downloaded more pages, so you can see a natural progression
from my homepage to, for example, a new york times article. 

sometimes these graphs get very large, so i need to come up with a 
way to make them smaller and easier to view. 
 
[slide 22]

you can, of course, play with the data from the TCP session. it's just
a string object. what you can do includes searching, applying regular
expressions, or even rewriting the strings. this is how i use the data
to reconstruct the version strings from servers and clients or look
at the web browsing graph. just simple string operations. a scripting
language makes this worlds easier than it would be in C.

[slide 23]

so, knowing that we can sniff traffic reliably and look at the data,
of course you can have a lot of fun! you can invade peoples' privacy
and sniff their mail, log their conversations over IM tools or IRC,
steal files that someone else is downloading, or just disrupt their
sessions. dugsong's tool "dsniff" includes a lot of functionality,
and we can make it up on the fly, too, using Python and pynids.

[slide 24]

flowgrep is a new tool i wrote recently to investigate worm activity
for new worms. i needed a tool that married regular expressions and
network sniffing. "ngrep" (or network grep) is a lot like this, but
it only logged a single packet. i needed the whole connection.
tcpkill couldn't look inside the data, and dsniff wasn't flexible
enough for my needs.
 
what flowgrep does is evaluate the data in the TCP connection using
regular expressions you supply, and then log it or even kill the connection.
by marrying network pcap expressions and regular expressions, i have
a flexible tool to look at network malware like worms or even spam. 
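
here's a small sketch of why the whole connection matters. it is not
flowgrep's code. check_stream(), the signature list, and the packet
contents are invented for the example. a pattern that straddles two
packets is missed when each packet is searched alone, but matches once
the stream is reassembled.

```python
import re

def check_stream(data, patterns):
    """Return the first pattern that matches the data, or None."""
    for pattern in patterns:
        if re.search(pattern, data):
            return pattern
    return None

# a made-up signature list, not flowgrep's own
patterns = [rb"cmd\.exe", rb"(?i)x-mailer: .*bulk"]

# one request whose interesting part is split across two packets
packets = [
    b"GET /scripts/..%255c../winnt/system32/cm",
    b"d.exe?/c+dir HTTP/1.0\r\n",
]
per_packet = [check_stream(packet, patterns) for packet in packets]
whole_stream = check_stream(b"".join(packets), patterns)
print(per_packet, whole_stream)
```

a match on the whole stream is the point where a tool like this would
log the connection or reset it.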
 
flowgrep makes a very cheap and simple IDS or IPS (intrusion prevention
system) and is written in under 400 lines of code, using Python.

[slide 25]

you can download the libraries and tools from this talk at these links.
tcpdump.org hosts the libpcap library (which is included in most UN*X
distributions). my code is on my website at these links, and you'll
need "pynids" (the last link) for running these tools. a bunch of
fun network tools are being written in Python lately.
 
[slide 26]
 
finally, you'll want to read TCP/IP Illustrated if you plan to do any
serious amounts of sniffing or network evaluation. Mike Schiffman wrote
a great book introducing many of these libraries in C, like libpcap and
libnids. and finally you will often have to refer to IETF RFC documents
to look at how a protocol operates.