Re: script to harvest spammer's address
>>> Jay Moore 15-Jun-03 04:09 >>>
>
> On Sat, 14 Jun 2003 15:34:41 -0700, you wrote:
>
> > How is the text file formatted? What separates messages? The
> > traditional mbox separator is a dot on a line by itself.
No.
The traditional mbox separator is the From_ line:
e.g. From dummy Mon Apr 02 22:28:25 2001
Some parsers treat any line beginning with From-and-a-space as the
separator, others (e.g. Pine's C-Client) require a more complicated
match. Some parsers require a blank line before the From_ line,
others don't.
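Here is a minimal sketch of the two simple splitting rules, written in Python rather than Perl. The function name `split_mbox` and its `require_blank_line` switch are hypothetical. In lenient mode any line starting with "From " begins a new message. In strict mode the line must also be the first in the file or follow an empty line. Neither mode reproduces C-Client's more complicated match.

```python
def split_mbox(text, require_blank_line=False):
    """Split mbox text into messages, each including its From_ line.

    Lenient (default): any line starting with "From " starts a message.
    Strict (require_blank_line=True): such a line only starts a message
    if it is the first line or the previous line is empty.
    """
    messages = []
    current = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines(keepends=True):
        is_sep = line.startswith("From ") and (prev_blank or not require_blank_line)
        if is_sep and current:
            # Close off the previous message before starting a new one.
            messages.append("".join(current))
            current = []
        current.append(line)
        prev_blank = line.rstrip("\r\n") == ""
    if current:
        messages.append("".join(current))
    return messages
```

In lenient mode an unescaped body line such as "From here on..." would start a bogus message. That is why body lines need escaping.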
My own preference is to treat any line beginning with "From " as the
separator; body lines that start with "From " should be escaped by
prefixing a >. (Header lines cannot be mistaken for the separator, since a
header field name must not have a space before the :.)
Implementations disagree on whether lines beginning with >From should
get an additional >. To be able to always recover exactly what was
originally sent, you should add a > to the beginning of any line that
matches /^>*From\s/. However, not all implementations do this.
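This sketch shows the fully reversible quoting rule, again in Python. The helper names are hypothetical. This scheme is usually called "mboxrd". The older "mboxo" scheme quotes only lines starting with exactly "From ", so it cannot tell an escaped line from one that originally began with ">From".

```python
import re

# Lines that must be quoted when writing: "From ", ">From ", ">>From ", ...
NEEDS_QUOTE = re.compile(r"^>*From\s")
# Lines that were quoted and must lose one ">" when reading back.
WAS_QUOTED = re.compile(r"^>+From\s")

def escape_body(body):
    """Prefix ">" to every line matching /^>*From\\s/."""
    out = []
    for line in body.splitlines(keepends=True):
        if NEEDS_QUOTE.match(line):
            line = ">" + line
        out.append(line)
    return "".join(out)

def unescape_body(body):
    """Strip one leading ">" from every line matching /^>+From\\s/."""
    out = []
    for line in body.splitlines(keepends=True):
        if WAS_QUOTED.match(line):
            line = line[1:]
        out.append(line)
    return "".join(out)
```

For example, escaping "From me" and ">From you" produces ">From me" and ">>From you". Unescaping returns the original lines exactly.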
> > If you have some kind of reliable format then a simple perl script
> > will extract all From addresses... assuming that is what you want.
> >
> > Tell us the format you have. A few lines of perl can handle it.
>
> Cool - here's the format:
> 1) each msg begins with a "Return-Path: "
> 2) next line begins with "Received: " & contains an address in brkts []
> 3) next two line begin w/ white space (tab, it appears)
> 4) if there is more than one received line, the fourth line begins
> with "Received: ", but rec'd lines 2 -> n are on a single line
> 5) following the final received line, the next line begins "Message-ID: "
> 6) followed by From:, To:, Subject: , etc,etc
> 7) last line of each message is "-- End --" ; this line is preceded by
> two blank lines (CR, or CR-LF)
And this is why formail is not working for you. It expects each message
to begin with a From_ line.
I would recommend writing a Perl script to replace your -- End -- line
with a valid From_ line for the following message. (And the first message
will also need a From_ line.)
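Here is a minimal sketch of that conversion in Python. It makes several assumptions:

- Messages are delimited only by a "-- End --" line.
- The real envelope sender and date are unknown, so every message gets a fixed placeholder From_ line modelled on the "From dummy ..." example.
- The function name is hypothetical.

The sketch does not literally replace each "-- End --" line. Instead it writes a From_ line when the next message actually starts, so no empty message is left after the final marker. Body lines are quoted with the /^>*From\s/ rule.

```python
import re

# Placeholder envelope line (assumption: real sender/date are not available).
FROM_LINE = "From dummy Mon Apr 02 22:28:25 2001\n"
END_MARKER = "-- End --"
NEEDS_QUOTE = re.compile(r"^>*From\s")

def end_marker_to_mbox(text):
    """Convert '-- End --'-terminated messages into mbox format."""
    out = []
    pending = True   # the first message also needs a From_ line
    in_body = False  # the header ends at the first blank line
    for line in text.splitlines(keepends=True):
        if line.strip() == END_MARKER:
            pending = True  # the next non-blank line starts a new message
            continue
        if pending:
            if line.strip() == "":
                continue  # skip blank lines between messages
            out.append(FROM_LINE)
            pending = False
            in_body = False
        elif in_body and NEEDS_QUOTE.match(line):
            line = ">" + line  # quote body lines that look like From_ lines
        if line.strip() == "":
            in_body = True
        out.append(line)
    return "".join(out)
```

The output could then be handed to formail, or to any other mbox reader.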
I would also suggest finding out how to create such files in the first
place. With Pine I just save messages, and the From_ line is included.
Much easier! I expect it is possible to configure your mail client to
do the same thing somehow.
Hope this helps
Tom Cosgrove