
string cleanup results



Some of you might have noticed that we have been doing a 'string
cleaning'.  (I should probably apologize for this northern-hemisphere
specific pun).

That means that we have been going through the tree cleaning out all
calls to sprintf(), strcpy(), and strcat().  Instead, these things are
being rewritten to use asprintf(), snprintf(), strlcpy(), and
strlcat().
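
To illustrate the kind of rewrite involved, here is a minimal sketch.
It is not taken from any actual diff in the tree; build_path() and its
arguments are made up for this example.  It uses snprintf() since that
is available everywhere, and reports truncation to the caller instead
of overflowing the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.  The unbounded form
 * this replaces would have looked like:
 *
 *	strcpy(dst, dir); strcat(dst, "/"); strcat(dst, file);
 *
 * Returns 0 if the whole path fit, -1 on truncation or error.
 */
int
build_path(char *dst, size_t dstsize, const char *dir, const char *file)
{
	int n;

	assert(dst != NULL && dstsize > 0);

	/* snprintf() writes at most dstsize bytes, terminating NUL included. */
	n = snprintf(dst, dstsize, "%s/%s", dir, file);

	/* n is the length the full string needed; n >= dstsize means it was cut. */
	if (n < 0 || (size_t)n >= dstsize)
		return -1;
	return 0;
}
```

The caller passes sizeof(buf) for a real array, never a guessed
constant, so the bound is always calculated from the destination.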

As a result, I can tell you that we have removed approximately 2000
occurrences of these functions.  I cannot provide exact figures since
that would require hand-counting the actual diffs from 8 weeks ago ;)
Source code grep'ing is not always accurate.

The goal is to remove potential overflows, by always calculating what
the bounds of an operation are.  In doing so, some code is now
converted from an overflow to a truncation.  In followup work, we wish
to continue investigating all cases of truncation in the tree, and see
what we can do better in those cases where truncation is undesirable.
Sometimes you want to handle it, sometimes it is ok, sometimes it is
really nasty.  Obviously, that's an even bigger job than what we have
done here, since the tree was already full of truncations.  Current
theory held by most is that most of these are innocuous, but who
knows..  I want us to clean them (but I do not want us to go insane
either).
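
Where truncation is undesirable, one option is to size the buffer from
the data so there is nothing to truncate.  The sketch below is an
assumption about how that could look, not code from the tree;
join_path_alloc() is a made-up name.  It does with two snprintf() calls
roughly what asprintf() does in one on systems that provide it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only.  Returns a freshly
 * allocated "dir/file" string, or NULL on failure.  Caller frees.
 */
char *
join_path_alloc(const char *dir, const char *file)
{
	char *p;
	int need;

	assert(dir != NULL && file != NULL);

	/* With a NULL buffer and size 0, snprintf() only reports the length. */
	need = snprintf(NULL, 0, "%s/%s", dir, file);
	if (need < 0)
		return NULL;

	p = malloc((size_t)need + 1);
	if (p == NULL)
		return NULL;

	/* Cannot truncate: the buffer was sized from the first pass. */
	snprintf(p, (size_t)need + 1, "%s/%s", dir, file);
	return p;
}
```

The cost is an allocation and a failure path the caller has to check,
which is why this is not automatically the right answer everywhere.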

Obviously there is the possibility that this huge effort has
introduced a minor bug here or there, but it is much more likely that we
have squished many much more major bugs.

I would estimate that more than 20 developers have been involved in
this process which we started in earnest about 6 weeks ago.

We are obviously not doing this to some parts of the tree which we
borrow from other projects.  In particular, the gnu part of the tree
might remain largely dirty.  However, we have done this in httpd and
openssl, and will maintain those tree components ourselves if need be.
We have also convinced the bind people to move in this direction, and
they have incorporated these functions into bind -- that work is not
finished however.  Since sendmail already has renamed versions of
strlcpy and strlcat, we have tried convincing them to move completely
in this direction as well, and I think that they are doing that work
now.

Of the programs that are in our base set, and under consideration, we
have a few that still call these functions:

	routed, dump, restore, awk, indent, lam,
	make, xlint, bind, openssl, ppp, rtadvd

Pretty small list, eh?

There might be a few more, but I am sure you can comprehend the
direction.  Of course, we have completely purified our libraries of
these functions as well, except that libc still provides them (in
accordance with standards and... well, expected behaviour).

The remaining ones are very difficult.  afs, gcc, binutils, etc.

I must also say that in the last week we've fixed some that are
incredibly difficult.  Some of them were definite buffer overflows.
Sometimes we hit a piece of code in a program that just about made
us cry.  A few changes involved 10 or more people before we settled
on a correct change.  Crazy stuff.

In a week or so there will also be changes made to ensure that the
kernel is completely clear of these functions too, but we are waiting
for the i386 ELF W^X work to settle into the tree.


