Re: [OT] Resources to understand integer overflow bugs ?



> Hi,
> 
> Could someone point me to some easy to understand resources explaining
> how integer overflow, like the OpenSSH one, occur ?

Haven't looked at the source for OpenSSH at all, but I've spun my wheels
around integer extension/truncation problems, so I thought I'd comment.
I'm a complete newb, so if my comments miss the point, don't anybody
spank me too hard, okay?

My first serious introduction to integer overflow was from parsing
shift-JIS characters directly from the character buffer. 

(Shift-JIS is a multi-byte -- one or two byte -- encoding technique that
illustrates quite well how multi-byte characters should not be
implemented, but that's a topic for another day.)

I was doing this on a Mac with CodeWarrior, and CodeWarrior on the Mac
fortunately assumes that unsigned character strings are Pascal
strings, which clouded the issues enough for me to get rather familiar
with the whole thing.

(I like education opportunities. _Un_fortunately, they aren't often
immediately profitable.)

With euc-JIS encoding, you can use the sign bit as the fulcrum of your
tests for assembling octets, so your problems are fairly focused. With
shift-JIS, however, you have to look at ranges of valid values in both
the leading and trailing bytes. (IIRC, the latest JIS standard actually
specifies octets.)

The lead byte is fairly tame; no byte with a clear sign bit will
ever be the lead byte of a multibyte character. But the valid lead bytes
fall into two separate ranges within 128 - 255. And the trailing byte has
one valid range within 0 - 127 and one within 128 - 255.
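
A rough sketch of those range tests in C, for illustration only. The
byte values are the ones commonly quoted for shift-JIS (lead bytes
0x81-0x9f and 0xe0-0xfc, trailing bytes 0x40-0x7e and 0x80-0xfc), not
checked against the standard here, and the function names are made up.
Bytes are passed as int in 0 - 255, the way getchar() returns them, so
no signed char ever enters the comparisons.

```c
#include <assert.h>

/* Hypothetical helper: is b a shift-JIS lead byte?
   The caller must already have ruled out EOF. */
int sjis_is_lead(int b)
{
    assert(b >= 0 && b <= 255);
    /* two separate ranges, both with the high bit set */
    return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
}

/* Hypothetical helper: is b a valid shift-JIS trailing byte? */
int sjis_is_trail(int b)
{
    assert(b >= 0 && b <= 255);
    /* one range below 128, one at 128 and above; 0x7f is excluded */
    return (b >= 0x40 && b <= 0x7e) || (b >= 0x80 && b <= 0xfc);
}
```

Note that the constants are written as integers (0x81), not as character
literals ('\x81'), so the result does not depend on whether plain char
is signed.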

So I want to use unsigned char all around, right? It sounds nice in
theory. But you run into these little things where what you expect, what
the compiler expects, and what some library function expects just don't
match. One case in point:

    int ch = getchar(); // Got a byte, right? 0 - 255, right? Good.
    if ( ch >= '\xa1' )  // Does this ever fail? Check compiler flags.

And I really wanted to use the literal character there (not '\xa1', of
course), because I didn't want to have to remember that 0xa1 is the
Japanese equivalent of a period in the one-byte character set that
happens to duplicate parts of the full JIS character set and is perhaps
the whole reason for the existence of shift-JIS.

Which led me to explore the difference between '\xa1' and 0xa1.

Very subtle stuff.
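
To make the difference concrete, here is a small sketch, assuming 8-bit
chars. A character literal like '\xa1' has type int in C, but its value
is that of a plain char converted to int, so where plain char is signed
it comes out negative (-95 on the usual two's-complement machines),
while 0xa1 is always 161. The function names are invented for this
example.

```c
#include <assert.h>
#include <limits.h>

/* The comparison from the snippet above, with a character literal.
   Where plain char is signed, '\xa1' is negative, so this is true
   for every byte value, and for the usual EOF of -1 as well. */
int ge_char_literal(int ch)
{
    return ch >= '\xa1';
}

/* The same comparison with an integer constant: 0xa1 is always 161. */
int ge_int_constant(int ch)
{
    return ch >= 0xa1;
}

/* Keeping the literal but forcing it through unsigned char. */
int ge_cast_literal(int ch)
{
    return ch >= (unsigned char)'\xa1';
}

/* Documents the assumption: the literal equals the constant only
   where plain char is unsigned (8-bit char assumed). */
void check_literal_vs_constant(void)
{
    if (CHAR_MIN < 0)
        assert('\xa1' < 0);
    else
        assert('\xa1' == 0xa1);
}
```

With a value from getchar(), ge_int_constant() and ge_cast_literal()
behave the same regardless of compiler flags; ge_char_literal() only
agrees with them where plain char is unsigned.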

(I could post a link to a ctype sort of library I eventually wrote, but
have not myself had time to use. But I think I need to take that back
off the web until I have time to fix it so it can really be tested.)

UTF-8 should be a little less problematic, because there is no reason 
(temptation?) to use character literals.

-- 
Joel Rees <joel@alpsgiken.gr.jp>