Here's another hairy topic having to do with string handling. Usually, applications don't just want to copy strings around; they also want to do something with them. In a very large number of cases, this means you have to be acutely aware of whether there are any special characters or character combinations that have special, potentially dangerous effects in the context in which you intend to use a string.
Here's an example. The Network Information System (NIS, formerly known as Yellow Pages or YP) comes with a network server named yppasswdd that lets users change their passwords remotely.
Just to refresh your memory, the passwd database is made up of one entry per line, containing a number of fields separated by single colon characters, with the first field being the login name, followed by the encrypted password, user and primary group ID, full name, home directory and login shell. Put differently, entries are delimited by the newline character, and fields within an entry are delimited by the colon character. When using a shadow password file, things are slightly different, but not enough to bother with the fine print here.
Here's a sample passwd entry for a user account named joe:
joe:PYcaX9vAPwWds:512:100:Joe Doe:/home/joe:/bin/bash
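To make the layout concrete, here is a minimal parsing sketch in Python. The names PasswdEntry and parse_passwd_line are made up for this illustration and are not part of any NIS code; the sketch also ignores shadow password files, just like the description above.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One line of the passwd database (hypothetical helper type)."""
    name: str
    passwd: str
    uid: int
    gid: int
    gecos: str   # the "full name" field
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    # Entries end at the newline; fields are separated by single colons.
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, passwd, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


entry = parse_passwd_line("joe:PYcaX9vAPwWds:512:100:Joe Doe:/home/joe:/bin/bash\n")
print(entry.name, entry.uid, entry.shell)
```

The point to take away is that the parser has no way of telling a delimiter from a colon or newline that was smuggled into a field; whatever is in the file is taken at face value.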
When joe wants to change his password using the yppasswd utility program, the program will ask him for his old password and the new one. It will hash the new one (producing, say, xyZZYabcde123), and transmit it along with the current one (sent unencrypted) to the yppasswdd server. The server will check the provided "current" password against the entry in the passwd file, and if it is correct, it will replace PYcaX9vAPwWds with the new hashed password xyZZYabcde123.
Early versions of yppasswdd (both the original version from Sun,
and the Linux one independently developed by yours truly) had an interesting
flaw: they didn't check the new password for colons and newline characters.
Thus, a user could send a password change request with the new password
looking like this (\n denotes a newline character):
:0:0:Super Joe:/:\noldjoe:
Of course, the normal yppasswd client will never produce a request like this, because it hashes the new user password, and the resulting string will never contain newlines or colons. However, users are free to write a small program that exploits this bug.
When yppasswdd replaces Joe's old encrypted password with this string, the passwd file will now look like this:
joe::0:0:Super Joe:/:
oldjoe::512:100:Joe Doe:/home/joe:/bin/bash
This turns the old entry for joe into two entries: one for a user named joe, who now has no password and a user ID of 0, and one for an oldjoe account that has all the attributes of the old account. The upshot is that you can now log in as joe and get a shell running with root privileges.
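The effect is easy to reproduce with a simplified simulation. The function replace_password_naively below is a hypothetical stand-in for what the flawed servers did; it is not the actual yppasswdd code, only a sketch of the same mistake (pasting the client-supplied string into the password field unchecked).

```python
def replace_password_naively(passwd_text: str, login: str, new_hash: str) -> str:
    """Swap in the new password field without looking at its contents."""
    out = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if fields[0] == login:
            fields[1] = new_hash      # the bug: no check for ':' or '\n'
        out.append(":".join(fields))
    return "\n".join(out) + "\n"


before = "joe:PYcaX9vAPwWds:512:100:Joe Doe:/home/joe:/bin/bash\n"
malicious = ":0:0:Super Joe:/:\noldjoe:"

after = replace_password_naively(before, "joe", malicious)
print(after, end="")

# Re-reading the file the way any passwd consumer would:
entries = [line.split(":") for line in after.splitlines()]
for fields in entries:
    print(fields[0], "uid =", fields[2], "password empty:", fields[1] == "")
```

One line goes in, two entries come out, and the first of them carries user ID 0 with an empty password field.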
The fix commonly implemented for this bug is to reject any password change request where the new password contains a control character (including newline) or a colon. This is sometimes referred to as blacklist matching because the string is checked against a list of bad characters, and rejected if a match is found.
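A blacklist check of this kind takes only a few lines. The sketch below uses the hypothetical name is_safe_passwd_field; treating DEL (ASCII 127) as a control character alongside the range 0-31 is an assumption of this example rather than something the actual fixes are known to do.

```python
def is_safe_passwd_field(value: str) -> bool:
    """Blacklist check: reject colons and control characters."""
    for ch in value:
        code = ord(ch)
        if ch == ":":
            return False              # field delimiter
        if code < 32 or code == 127:
            return False              # control characters, newline included
    return True


print(is_safe_passwd_field("xyZZYabcde123"))                  # True
print(is_safe_passwd_field(":0:0:Super Joe:/:\noldjoe:"))     # False
```

A server would run this on the new password field before touching the passwd file, and refuse the whole request if it returns False.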
The opposite approach is whitelist matching; the string is checked against a list of acceptable characters, and rejected if there's a character that's not included in the whitelist.
Each approach has its relative merits. The whitelist approach is usually preferred when you know exactly which set of characters a string should consist of. For instance, login names in Unix should always start with an alphabetic character, followed by zero or more alphanumerics (Is there any standard that says this?). Blacklist matching, on the other hand, gives you some leeway when you're operating in a quite heterogeneous environment where you cannot rule out that some bizarre client will come along and use characters not included in your whitelist.
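For the login name rule just mentioned, whitelist matching could look like the following sketch. The pattern encodes only the rule as stated here (one ASCII letter, then zero or more ASCII letters or digits); it is not taken from any standard, and the name is_valid_login_name is made up for the example.

```python
import re

# One ASCII letter followed by zero or more ASCII letters or digits.
LOGIN_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_login_name(name: str) -> bool:
    """Whitelist check: accept only names made of known-good characters."""
    # fullmatch() must consume the whole string, so a trailing newline
    # or any other stray character makes the name invalid.
    return LOGIN_NAME_RE.fullmatch(name) is not None


for candidate in ["joe", "joe2", "2joe", "joe:0", "joe\n", ""]:
    print(repr(candidate), is_valid_login_name(candidate))
```

Note that nothing in this function mentions colons or newlines. They are rejected simply because they are not on the list, and so is every other dangerous character nobody has thought of yet.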
In the case of yppasswdd, the common fix is to use blacklist matching as explained above, most likely because almost every Unix derivative, as well as Linux, has added its own flavor of extensions. If we limited the set of acceptable characters for the encrypted password to those commonly used (alphanumerics, plus dot and slash), our implementation probably wouldn't interoperate with some System V-derived platforms that attach password aging information to the password, delimited by a comma, nor with recent BSDs and Linux, which use the prefix $1$ to flag passwords that use the MD5 hash algorithm rather than old-fashioned crypt to hash the password.
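The interoperability problem shows up as soon as the two strategies are run side by side. In the sketch below, both function names are hypothetical, and the aging suffix and the $1$ string are invented values that only mimic the general shape of such fields.

```python
import re

# Strict whitelist: only the characters of a traditional crypt hash.
STRICT_HASH_RE = re.compile(r"[A-Za-z0-9./]+")


def strict_whitelist_accepts(field: str) -> bool:
    return STRICT_HASH_RE.fullmatch(field) is not None


def blacklist_accepts(field: str) -> bool:
    # Reject only the characters known to be dangerous in a passwd file.
    return not any(ch == ":" or ord(ch) < 32 for ch in field)


samples = {
    "traditional crypt": "xyZZYabcde123",
    "with aging info": "xyZZYabcde123,M.z8",                 # invented suffix
    "MD5 prefix": "$1$abcdefgh$0123456789abcdefghijkl",      # invented value
    "attack string": ":0:0:Super Joe:/:\noldjoe:",
}

for label, field in samples.items():
    print(f"{label:18} whitelist={strict_whitelist_accepts(field)!s:5} "
          f"blacklist={blacklist_accepts(field)}")
```

Both checks stop the attack string, but only the blacklist lets the comma and dollar sign variants through.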
However, yppasswdd is a somewhat peculiar case; for most other applications there's a common standard across all Unix platforms, so there's a fairly good chance your application will work if you start with a whitelist of what you think is acceptable. Testing and/or user feedback may lead to some additions to the whitelist, but that's usually much better than shipping software which will potentially blow up right in your face.
There are a number of other cases where it is crucial to filter user input for dangerous characters:
Virtually all terminals and terminal emulators (such as xterm) let applications control their behavior via so-called control sequences: erasing the screen, positioning the cursor, etc. are all done via special character sequences. Allowing any user to output arbitrary characters to somebody else's terminal at least allows for a number of silly pranks, but there's more to it. Some terminals will actually execute external shell commands when given the right character sequence. Therefore, you should never allow any control characters (i.e. from the ASCII range 0-31) except for newline and carriage return (ASCII 10 and 13, respectively). If you want to be on the safe side, you should also disallow characters in the range 128-159, because on terminals that support 7-bit characters only, these will be mapped to ASCII 0-31.
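A filter along these lines might look like the sketch below. The function name sanitize_for_terminal and the choice to substitute a question mark rather than drop or reject the offending character are assumptions of this example. It works on Python code points, so it assumes the input has been decoded in such a way that byte values 128-159 show up as code points 128-159 (as they do with Latin-1).

```python
def sanitize_for_terminal(text: str, replacement: str = "?") -> str:
    """Replace control characters before text reaches someone's terminal."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\n" or ch == "\r":
            out.append(ch)                 # the two permitted control characters
        elif code < 32 or 128 <= code <= 159:
            out.append(replacement)        # ASCII controls and their 8-bit twins
        else:
            out.append(ch)
    return "".join(out)


# ESC [ 2 J is the usual "erase screen" sequence; without the ESC
# character the rest is just harmless printable text.
print(sanitize_for_terminal("hello\x1b[2Jworld\n"), end="")
```

Taken literally, the rule also replaces tab characters (ASCII 9); whether that is acceptable depends on the application.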