[slide 0] talk materials are here: http://monkey.org/~jose/presentations/pysniff04.d/

hello and welcome to my presentation. i'll be explaining how to use the python language to sniff network traffic. we'll be using some python libraries and building a few small applications that give us great views into network traffic. i'm also using this as a chance to more widely introduce a new tool i wrote, "flowgrep". so, let's get going! on slide 25 i have a list of URLs and resources.

[slide 1] you might be asking yourself "why use python?" when you think about what network traffic sniffing means, you're often facing situations where you have to use very fast programs. and since python is an interpreted language, you expect it to be slower than C. that's ok for our purposes. python gives us many other advantages over C. first, we have an easy to use language so we can rapidly develop prototypes and new code. secondly, we have easy access to basic data structures, like lists and arrays, which we can easily build upon. a list of lists, for example, would look like this in python:

>>> mylist1 = ['item 1', 'item 2']
>>> mylist2 = ['cat', 'dog']
>>> superlist = [mylist1, mylist2]
>>> print(superlist)
[['item 1', 'item 2'], ['cat', 'dog']]

you can do that in C, but it's harder and requires more lines of code ("LoC") to do so. my biggest reason for using a scripting language over C is the easy string manipulation, either making tokens or using regular expressions. you can learn more about python at http://www.python.org/, including tutorials, distributions (UN*X, Windows, and Mac OS X included), and modules.

[slide 2] we are going to marry python and sniffing by using C libraries which have been exported to python using the "SWIG" tool. "SWIG" is the simplified wrapper and interface generator, which lets us make new modules for scripting languages from C and C++ libraries. it can export to Tcl, Python, Perl, C#, and so much more.
libraries exist for "pcap", which is the canonical sniffing library, "dnet", dugsong's packet crafting library, and "libnids", which is what we're using today. these are available as Python modules: pcapy or pypcap for libpcap, "dpkt" for packet inspection, and libdnet even includes its own Python bindings. we're going to use "pynids", a Python interface to libnids. this lets us reassemble connections and easily see their contents. i didn't write "pynids", but i built a few tools on top of it. i've been sniffing packets for years and like doing it in Python when i get a chance.

[slide 3] libnids is a simple library designed to handle the IP stream reassembly functions. simply put, this means that we can see connections as connections and not just packets, and the library also reassembles fragments into their original packets. it's based on the Linux 2.0.36 TCP/IP stack. internally it uses libpcap to sniff and libnet to handle the packets.

[slide 4] the kernel is what normally houses the TCP/IP stack. functions for networking are handled within the kernel, and you call into it using sockets and the like.

[slide 5] libnids exports this IP stack to the userland, where you can manipulate it more easily to examine packets. you don't replace your kernel's stack, but you do make a copy of it. very handy!

[slide 6] using libnids is much easier than most people realize. first you initialize the library (nids_init()). then you register callback functions which handle packets for their respective protocols. TCP packets are handled by TCP functions which you tell the system about by using "nids_register_tcp()". here packets will be placed into their connection, or a new connection state will be made if needed. you do the same for UDP and IP packets, as well.
you then launch the whole thing by calling "nids_run()", which starts to listen for packets and sends them to the appropriate function that you told it about. if you don't register an IP packet handler, it won't process them. you can even react and kill TCP connections by using "nids_killtcp()", which sends TCP reset packets to each party in the connection. these are the C functions, which look a lot like our Python functions.

[slide 7] this shows you what "nids_run()" does. for every type of packet you get, the library will process it using the callback functions you specified earlier. TCP connections are tracked exactly as you would track them in the kernel, using port numbers, IP addresses, sequence numbers and acknowledgement numbers.

[slide 8] libnids exposes the basic TCP states in three groups. the first is for newly created TCP sessions: it sets the state "NIDS_JUST_EST". this occurs once the TCP 3 way handshake (3WHS) has taken place. NIDS_DATA is set for new packets in an existing stream. you can append them to the existing data from the connection or discard the data if you want. for TCP connection closure 3 states are available. NIDS_RESET is when the connection was reset by one of the peers. NIDS_CLOSE is for a graceful TCP close, and NIDS_TIMED_OUT is for a connection that appears to have timed out (ie no closure sent or seen). libnids doesn't expose the intermediate stages of the TCP connection.

[slide 9] pynids wraps this in much the same way that you would do in C. you have nids.run() which drives it (or you can loop over nids.next() to go one packet at a time). it does the reassembly for you without any work on your part.

[slide 10] here's a basic mapping of the functions from slide 6 to how they are done in pynids. most of the functions have the same name and make sense. to change internal libnids settings you can just use the "nids.param()" function. and then you launch the sniffing process like you did in C.
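to make the states and the function mapping a bit more concrete, here's a rough sketch of a callback that branches on the connection state. it's my own illustration, not code from the slides. the state constants are stand-ins written out by hand so the sketch runs without pynids installed (libnids and pynids spell the first one NIDS_JUST_EST), "events" and "fake_tcp()" are hypothetical helpers for trying it offline, and main() assumes pynids is installed and that you have the privileges to sniff.

```python
from types import SimpleNamespace

# stand-ins for the libnids state names; with pynids installed you would use
# nids.NIDS_JUST_EST, nids.NIDS_DATA and so on rather than these numbers.
NIDS_JUST_EST, NIDS_DATA, NIDS_CLOSE, NIDS_RESET, NIDS_TIMED_OUT = range(1, 6)
END_STATES = (NIDS_CLOSE, NIDS_RESET, NIDS_TIMED_OUT)

events = []  # hypothetical log of what the callback saw


def handle_tcp(tcp):
    """TCP callback: called once for every state change of a connection."""
    if tcp.nids_state == NIDS_JUST_EST:
        # handshake finished: ask for the payload in both directions
        tcp.client.collect = 1
        tcp.server.collect = 1
        events.append(("new", tcp.addr))
    elif tcp.nids_state == NIDS_DATA:
        # discard nothing, so the whole stream is still there at close time
        tcp.discard(0)
    elif tcp.nids_state in END_STATES:
        # closed, reset or timed out: hand over what the server side received
        events.append(("end", tcp.addr, tcp.server.data[:tcp.server.count]))


def main():
    """Live capture; needs pynids, so the import is kept local to this function."""
    import nids
    nids.param("scan_num_hosts", 0)  # disable portscan detection
    nids.init()
    nids.register_tcp(handle_tcp)
    nids.run()


def fake_tcp(state, payload=b""):
    """Hypothetical stand-in for the tcp object, for trying the callback offline."""
    def half(data):
        return SimpleNamespace(collect=0, data=data, count=len(data))
    return SimpleNamespace(nids_state=state,
                           addr=(("192.0.2.1", 1025), ("192.0.2.2", 80)),
                           client=half(b""), server=half(payload),
                           discard=lambda nbytes: None)
```

calling handle_tcp(fake_tcp(NIDS_JUST_EST)) switches on collection for both halves of the fake connection, and a later call with NIDS_CLOSE appends an "end" entry carrying the payload to "events".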
[slide 11] the order of operations in pynids is exactly like it is in C: packets arrive, libnids reassembles the fragments as needed, and the right callback function is called. within that callback function you can parse the data stream, look at the packet attributes (ie sequence numbers), and do whatever you want. you get all of these features exposed to you without any trouble.

[slide 12] here's a very simple python example. we will show the TCP handler on the next slide. we import the nids library, then in main() we set a parameter to disable portscan detection, initialize the library, register a TCP handler (handleTcp()), and then launch the setup via nids.run(). very simple!

[slide 13] here's what the TCP handler looks like. it gets passed a tcp object, which holds the data, the addresses, and the attributes of the session. we can spit out the data when we're done. notice that we have to set the tcp.client.collect variable to 1 to collect the client's data (data sent to the client), and the same for the server data. without this the library won't keep track of the session data, but it will keep track of the session itself.

[slide 14] here's the same example in C. here's the main loop showing the library setup and parameter setting, and then we launch libnids using nids_run().

[slide 15] and here is the TCP callback function. the TCP object here is a pointer to the TCP state, and we can peek inside the structure at the destination ports and the like. this simple example just collects data from webservers and web clients and spits it out when the connection closes. now, this has about the same number of lines (minus some of the C brackets and such). but once you start doing string or data structure manipulation, the operations are much easier in Python than in C. that's why we chose a scripting language.

[slide 16] so, what can you do with this? well, here are three programs i wrote to show you how to play with the library.
the first is VersionDetect, a tool i wrote that uses the headers sent by various servers and clients to interrogate their OS versions and software. it's very simple, and supports SSH client and server strings, mail client strings, and web server and client strings. it tells you the string that was reported by the computer, which often reveals the OS or the client architecture. i wrote this because i had previously written a passive TCP stack fingerprinting tool which sometimes gives bad results. i wanted to know for sure what kind of system i was talking to, or if it was behind a security device that proxied the connection. one of the things i've found using this is that some of the MSN servers i talk to use OpenBSD firewalls and proxies but are actually IIS servers. what i should do with the tool is have it write to a database so i can log it all and query it when i need the information. :)

[slide 17] here's some sample output of the tool when i run it on my work laptop. you can see, for example, that i use OpenBSD on i386 to browse the web with Firefox 0.6.1 (i need to update!). you can also see i talk to an OpenSSH 3.5 server and a bunch of web servers. some of them reveal that they're on a particular type of platform, and others don't reveal too much. after a while you get to see some interesting stuff, like nice Apache modules and the like.

[slide 18] http-graph is another tool i wrote to investigate something. i needed a quick proof of concept tool to show people what i meant when i was describing how i wanted to think about new representations of web browsing histories. http-graph sniffs the requests and replies from web servers to build a directed graph of how you get from one website to the next. all of the information we need is in the header of the request, including the URL we want, the server it's on, and the referring website.
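to show how little work it takes to pull those three pieces out of a request, here's a minimal sketch. it is not http-graph's actual code: "request_edge()" is a name i made up for this illustration, the "(direct)" placeholder for a missing Referer header is my own choice, and it assumes the client side of the stream has already been decoded into a text string.

```python
def request_edge(request):
    """Return (referrer, requested URL) for a raw HTTP request, or None."""
    head = request.split("\r\n\r\n", 1)[0]      # drop any request body
    lines = head.split("\r\n")
    parts = lines[0].split()                    # e.g. GET /path HTTP/1.1
    if len(parts) < 2:
        return None
    path = parts[1]

    # collect the headers into a dict keyed by lowercased header name
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    host = headers.get("host")
    if host is None:
        return None
    # proxied requests may already carry a full URL in the request line
    url = path if path.startswith("http://") else "http://" + host + path
    return headers.get("referer", "(direct)"), url


sample = ("GET /papers/ HTTP/1.1\r\n"
          "Host: www.example.com\r\n"
          "Referer: http://www.example.com/index.html\r\n"
          "\r\n")
edge = request_edge(sample)
```

for the sample request, "edge" is the pair ("http://www.example.com/index.html", "http://www.example.com/papers/"), which is one arrow in the browsing graph.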
[slide 19] what http-graph does is reconstruct this information into a pair of strings, one for the referrer and one for the request itself. this gives us a view of natural "hubs" of information in our browsing. other people have looked at this sort of thing, too. see http://www.uiweb.com.nyud.net:8090/issues/issue37.htm

[slide 20] http-graph captures this information and creates a "dot" output file. "dot" is part of the "graphviz" toolkit, which lets us draw directed graphs very easily. i use the tool "neato" to lay out the graphs and render them in various formats, like SVG (scalable vector graphics), postscript or PDF.

[slide 21] here is an example graph and a detail of that graph. this particular spiderweb came from me launching from my homepage to other pages i have. from there i downloaded more pages, so you can see a natural progression from my homepage to, for example, a new york times article. sometimes these graphs get very large, so i need to come up with a way to make them smaller and easier to view.

[slide 22] you can, of course, play with the data from the TCP session. it's just a string object. what you can do includes searching, applying regular expressions, or even rewriting the strings. this is how i use the data to reconstruct the version strings from servers and clients or look at the web browsing graph. just simple string operations. a scripting language makes this worlds easier than it would be in C.

[slide 23] so, knowing that we can sniff traffic reliably and look at the data, of course you can have a lot of fun! you can invade people's privacy and sniff their mail, log their conversations over IM tools or IRC, steal files that someone else is downloading, or just disrupt their sessions. dugsong's tool "dsniff" includes a lot of this functionality, and we can make it up on the fly, too, using Python and pynids.

[slide 24] flowgrep is a new tool i wrote recently to investigate worm activity for new worms.
i needed a tool that married regular expressions and network sniffing. "ngrep" (or network grep) is a lot like this, but it only logged a single packet. i needed the whole connection. tcpkill couldn't look inside the data, and dsniff wasn't flexible enough for my needs. what flowgrep does is evaluate the data in the TCP connection using these regular expressions and log it or even kill the connection. by marrying network pcap expressions and regular expressions, i have a flexible tool to look at network malware like worms or even spam. flowgrep makes a very cheap and simple IDS (intrusion detection system) or IPS (intrusion prevention system) and is written in under 400 lines of code, using Python.

[slide 25] you can download the libraries and tools in this talk at these links. tcpdump.org hosts the libpcap library (which is included in most UN*X distributions). my code is on my website at these links, and you'll need "pynids" (the last link) for running these tools. a bunch of fun network tools are being written in Python lately.

[slide 26] finally, you'll want to read TCP/IP Illustrated if you plan to do any serious amounts of sniffing or network evaluation. Mike Schiffman wrote a great book introducing many of these libraries in C, like libpcap and libnids. and you will often have to refer to IETF RFC documents to look at how a protocol operates.
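as a postscript to slide 24, here's a small sketch of why matching against the reassembled connection matters more than matching single packets. it is not flowgrep's real code: "match_packets()", "match_stream()" and "verdict()" are names made up for this illustration, the packet payloads are invented, and the log-or-kill decision is reduced to returning a word instead of touching the network.

```python
import re


def match_packets(chunks, pattern):
    """Per-packet matching: True only if a single payload matches."""
    return any(pattern.search(chunk) for chunk in chunks)


def match_stream(chunks, pattern):
    """Per-connection matching: join the payloads, then search once."""
    return pattern.search(b"".join(chunks)) is not None


def verdict(chunks, patterns, kill=False):
    """Return ('pass' | 'log' | 'kill', list of patterns that matched)."""
    hits = [p.pattern for p in patterns if match_stream(chunks, p)]
    if not hits:
        return "pass", hits
    return ("kill" if kill else "log"), hits


# a request for cmd.exe that happens to be split across two packets
packets = [b"GET /scripts/cm", b"d.exe?/c+dir HTTP/1.0\r\n\r\n"]
patterns = [re.compile(rb"cmd\.exe", re.I), re.compile(rb"root\.exe", re.I)]

per_packet = match_packets(packets, patterns[0])
per_stream = match_stream(packets, patterns[0])
action, hits = verdict(packets, patterns, kill=True)
```

here the per-packet search misses the split string while the per-stream search finds it, so "verdict()" comes back with "kill" and the one pattern that matched.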