Re: Parsing tcpdump files
- To: tech_(_at_)_openbsd_(_dot_)_org
- Subject: Re: Parsing tcpdump files
- From: Chuck Yerkes <chuck+obsd_(_at_)_2003_(_dot_)_snew_(_dot_)_com>
- Date: Sun, 2 Mar 2003 00:36:04 -0500
- Mail-followup-to: Chuck Yerkes <chuck+obsd_(_at_)_2003_(_dot_)_snew_(_dot_)_com>, tech_(_at_)_openbsd_(_dot_)_org
- Reply-to: misc_(_at_)_openbsd_(_dot_)_org
It's not appropriate for tech, so I encourage it to move to misc@
if it must continue.
Having watched coworkers write a deep statistical analysis of log
files (mail), I saw a fair amount of RDBMS work and a bunch
of Sleepycat db3 work. One bump they ran into was that Perl didn't appear
to free memory when using DB. A 4GB machine starting to swap was
a bad sign.
You mention "small files" of 300MB, but your files "will be" 1GB.
And you complain of not having enough disk, which is odd to me.
Performance disk (15kRPM SCSI, or better, that in RAID with lots
of NVRAM cache) costs, but 2-3 striped 7200 RPM IDE drives are
around $400 before rebates. How many hours of people screwing around
without disk does it take to cost you $400? You get bonus speed if you
put the indices on something like a Platypus NVRAM disk.
Get a mathematician involved. Once you break it down, it's all stats.
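Since it's all stats, one way to avoid the memory blowup described above is to compute summaries in a single pass instead of loading everything at once. A minimal Python sketch of Welford's online algorithm is below (a technique swapped in here, not one anyone in this thread named). It returns the count, mean and sample standard deviation of any per-packet number, such as packet lengths, without storing the values. `running_stats` is a hypothetical name.

```python
import math
from typing import Iterable, Tuple


def running_stats(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass count, mean and sample standard deviation (Welford's method).

    Memory use is constant no matter how many values are fed in, so it can
    consume a generator over a multi-gigabyte capture.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean           # deviation from the old mean
        mean += delta / n          # update the mean incrementally
        m2 += delta * (x - mean)   # old deviation times new deviation
    # Sample (n - 1) standard deviation; undefined for fewer than two values.
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return n, mean, std
```

For example, `running_stats([2, 4, 4, 4, 5, 5, 7, 9])` gives a count of 8, a mean of 5 (up to floating-point rounding) and a sample standard deviation of sqrt(32/7), about 2.14.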
Quoting Steve Bernard (sbernard_(_at_)_gmu_(_dot_)_edu):
> Joe,
>
> Thanks for the suggestions. Up until now I've been doing basically what
> you have suggested, but with small files (300MB), and using
> BASH/grep/etc. for the parsing. Importing everything into an RDBMS was my
> first strategy but that isn't feasible right now. I intend to store the
> reporting data in MySQL but I don't have the disk space to duplicate
> everything in the RDBMS. I was hoping that there were some tools for
> parsing the data in binary format because converting to ASCII adds about
> 35% to the file sizes, and my customer prefers the data to be in
> tcpdump's binary format. I still need to compare the file sizes using
> hex format instead of ASCII. Unless I hear other suggestions I plan on
> going forward using Perl or Python.
>
> Thanks,
>
> Steve
>
>
> Joseph C. Bender wrote:
> >On Saturday 01 March 2003 08:30 pm, Steve Bernard wrote:
> >
> >>Jack,
> >>
> >>I'm supporting several grant-funded network engineering and security analysis
> >>research projects. Each project has specific data requirements and
> >>capabilities. To facilitate this I need to perform string parsing, data
> >>aggregation/sub-setting, statistical analysis, and reporting. They will
> >>each do much more on their own, but this is what is required at my end. I
> >>anticipate the capture files being around 1GB each.
> >>
> >
> >Well, I just had to look at a 12-hour capture of some
> >database server traffic.
> >
> >I did the capture with tcpdump, then did my initial sorting by having
> >tcpdump read the file back with a filter for each major grouping of info I needed.
> >
> >Because I was looking for something, I then pulled each file into Ethereal
> >to better see the captures, and display-filter the data in question.
> >
> >However, it would seem that you'd want to write the ASCII output of
> >tcpdump to files, then use $PARSING_LANGUAGE_OF_CHOICE to generate your
> >numbers, or pull the info into an RDBMS to use some major tool on it.
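Since Steve asked about parsing tcpdump's binary format directly and planned to go forward with Perl or Python, here is a minimal Python sketch that reads the capture without converting it to ASCII. It assumes the classic libpcap file layout (a 24-byte global header, then a 16-byte header per packet, with the microsecond magic number 0xa1b2c3d4 in either byte order), Ethernet link type 1 and untagged IPv4 frames; nanosecond-resolution and pcapng files, VLAN tags and IPv6 are not handled. It streams one record at a time, totals packets and on-the-wire bytes per source address, and keeps only that summary in a database, with the standard library's sqlite3 standing in for MySQL. `read_pcap`, `bytes_by_source`, `store_summary` and the `src_totals` table are hypothetical names.

```python
import socket
import sqlite3
import struct
from collections import defaultdict
from typing import BinaryIO, Dict, Iterator, Tuple

LINKTYPE_ETHERNET = 1   # "network" field value for Ethernet captures
ETHERTYPE_IPV4 = 0x0800

Record = Tuple[int, int, int, bytes]  # (ts_sec, ts_usec, orig_len, data)


def read_pcap(f: BinaryIO) -> Tuple[int, Iterator[Record]]:
    """Validate the global header now; return (linktype, lazy record iterator)."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file was written little-endian
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    # The link type is the last 4-byte field of the global header.
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    rec_hdr = struct.Struct(endian + "IIII")

    def records() -> Iterator[Record]:
        while True:
            raw = f.read(rec_hdr.size)
            if len(raw) < rec_hdr.size:
                return  # clean end of file, or a truncated trailing header
            ts_sec, ts_usec, incl_len, orig_len = rec_hdr.unpack(raw)
            data = f.read(incl_len)  # incl_len = bytes actually captured
            if len(data) < incl_len:
                return  # truncated final record
            yield ts_sec, ts_usec, orig_len, data

    return linktype, records()


def bytes_by_source(f: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Total (packets, on-the-wire bytes) per IPv4 source address."""
    linktype, records = read_pcap(f)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    totals = defaultdict(lambda: [0, 0])
    for _sec, _usec, orig_len, frame in records:
        # Need a 14-byte Ethernet header plus a minimal 20-byte IPv4 header.
        if len(frame) < 34:
            continue
        (ethertype,) = struct.unpack("!H", frame[12:14])
        if ethertype != ETHERTYPE_IPV4:
            continue  # skips ARP, IPv6, VLAN-tagged frames, ...
        # IPv4 source address sits at offset 12 of the IP header (14 + 12).
        src = socket.inet_ntoa(frame[26:30])
        totals[src][0] += 1
        totals[src][1] += orig_len  # original length, not the snapped length
    return {ip: (n, b) for ip, (n, b) in totals.items()}


def store_summary(conn: sqlite3.Connection,
                  totals: Dict[str, Tuple[int, int]]) -> None:
    """Store only the summary rows, not the raw packets."""
    conn.execute("CREATE TABLE IF NOT EXISTS src_totals "
                 "(src TEXT PRIMARY KEY, packets INTEGER, bytes INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO src_totals VALUES (?, ?, ?)",
                     [(ip, n, b) for ip, (n, b) in totals.items()])
    conn.commit()
```

For a real capture, something like `bytes_by_source(open("capture.pcap", "rb"))` would do (the file name is a placeholder). For MySQL, Python drivers commonly use `%s` placeholders rather than `?`, and `INSERT OR REPLACE` is SQLite syntax; MySQL spells it `REPLACE INTO`.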