
Re: script to harvest spammer's address



>>> Jay Moore 15-Jun-03 04:09 >>>
>
> On Sat, 14 Jun 2003 15:34:41 -0700, you wrote:
>
> > How is the text file formatted?  What separates messages?  The
> > traditional mbox separator is a dot on a line by itself.

No.

The traditional mbox separator is the From_ line:

e.g.  From dummy Mon Apr 02 22:28:25 2001

Some parsers treat any line beginning with From-and-a-space as the
separator, others (e.g. Pine's C-Client) require a more complicated
match.  Some parsers require a blank line before the From_ line,
others don't.
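
To illustrate the difference between the loose and the stricter match,
here is a minimal sketch in Python (not Perl, purely for illustration).
The names split_mbox, LOOSE_FROM and STRICT_FROM are made up for this
example, and the strict pattern is only my assumption of what a "more
complicated match" might look like; it is not C-Client's actual rule.

```python
import re

# Loose rule: any line starting with "From " begins a new message.
LOOSE_FROM = re.compile(r"^From ")

# Stricter rule (an illustrative assumption, NOT C-Client's real pattern):
# "From <sender> <weekday> <month> <day> <hh:mm:ss> <year>"
STRICT_FROM = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)


def split_mbox(text, separator=LOOSE_FROM):
    """Split mbox text into messages, each starting with its From_ line."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if separator.match(line):
            # A separator closes the previous message and opens a new one.
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines before the first From_ line are ignored.
    if current is not None:
        messages.append("".join(current))
    return messages
```

With an unescaped body line such as "From here on it is body text", the
loose rule splits the message in two while the stricter rule does not,
which is exactly why the escaping described below matters.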

My own preference is to treat any line beginning with "From " as the
separator; lines in the body that start with this should be escaped
with a >.  (Header lines such as "From:" never match, because a header
name should not have a space before the :.)
There is ambiguity in implementations as to whether lines beginning
with >From should have additional >s added.  To gain the ability to
always recover exactly what was originally sent, you should add a >
to the beginning of any line that matches /^>*From\s/.  However, not
all implementations have done/do this.
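
A minimal sketch of that reversible escaping, again in Python rather than
Perl.  The function names escape_body and unescape_body are invented for
this example; the pattern is the /^>*From\s/ given above.

```python
import re

# Any line matching /^>*From\s/ gets one extra ">" when stored ...
QUOTE_RE = re.compile(r"^(>*From\s)", re.MULTILINE)
# ... and loses exactly one ">" when read back.
UNQUOTE_RE = re.compile(r"^>(>*From\s)", re.MULTILINE)


def escape_body(body):
    """Add one '>' to every body line that matches /^>*From\\s/."""
    return QUOTE_RE.sub(r">\1", body)


def unescape_body(body):
    """Undo escape_body by removing one leading '>' from escaped lines."""
    return UNQUOTE_RE.sub(r"\1", body)
```

Because lines that already start with >From also gain a >, unescaping
always gives back the text that was originally sent.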

> > If you have some kind of reliable format then a simple perl script
> > will extract all From addresses...Assuming that is what you want.
> >
> > Tell us the format you have.  A few lines of perl can handle it.
>
> Cool - here's the format:
> 1) each msg begins with a "Return-Path: "
> 2) next line begins with "Received: " & contains an address in brkts []
> 3) next two lines begin w/ white space (tab, it appears)
> 4) if there is more than one received line, the fourth line begins
>    with "Received: ", but rec'd lines 2 -> n are on a single line
> 5) following the final received line, the next line begins "Message-ID: "
> 6) followed by From:, To:, Subject: , etc,etc
> 7) last line of each message is "-- End --" ; this line is preceded by
>    two blank lines (CR, or CR-LF)

And this is why formail is not working for you.  It expects each message
to begin with a From_ line.

I would recommend writing a Perl script to replace your -- End -- line
with a valid From_ line for the following message.  (And the first message
will also need a From_ line.)
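
The conversion could be sketched roughly as follows (in Python here, but
the same logic is a few lines of Perl).  This assumes the format exactly
as you describe it: every message ends with a "-- End --" line.  The
function name to_mbox and the dummy sender and date in the From_ line are
placeholders of mine; a real script might build the From_ line from the
Return-Path: header instead.

```python
import re

DUMMY_FROM = "From dummy Mon Apr 02 22:28:25 2001\n"  # placeholder From_ line
END_MARKER = "-- End --"
FROM_RE = re.compile(r"^>*From\s")


def to_mbox(lines):
    """Yield mbox lines from an iterable of lines in the '-- End --' format."""
    at_start = True  # True when the next non-blank line starts a message
    for line in lines:
        if line.strip() == END_MARKER:
            # Drop the marker; the next message gets a From_ line instead.
            at_start = True
            continue
        if at_start:
            if not line.strip():
                continue  # skip blank lines between messages
            yield DUMMY_FROM
            at_start = False
        # Escape lines that would look like a separator.  Header lines
        # such as "From:" do not match because of the colon.
        if FROM_RE.match(line):
            line = ">" + line
        yield line
```

To use it, open the saved file for reading, pass the file object to
to_mbox() and write the resulting lines to a new file; the result should
then be something formail will accept.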

I would also suggest finding out how to create such files in the first
place.  With Pine I just save messages, and the From_ line is included.
Much easier!  I expect it is possible to configure your mail client to
do the same thing somehow.

Hope this helps

Tom Cosgrove