
Dangerous Characters in Strings

Here's another hairy topic having to do with string handling. Usually, applications don't just want to copy strings around; they also want to do something with them. In a very large number of cases, this means you have to be acutely aware of any special characters or character combinations that have special, potentially dangerous effects in the context in which you intend to use a string.

Here's an example. The Network Information System (NIS, formerly known as Yellow Pages or YP) comes with a network server named yppasswdd that lets users change their passwords remotely.

Just to refresh your memory, the passwd database is made up of one entry per line, containing a number of fields separated by single colon characters, with the first field being the login name, followed by the encrypted password, user and primary group ID, full name, home directory and login shell. Put differently, entries are delimited by the newline character, and fields within an entry are delimited by the colon character. When using a shadow password file, things are slightly different, but not enough to bother with the fine print here.

Here's a sample passwd entry for a user account named joe:

joe:PYcaX9vAPwWds:512:100:Joe Doe:/home/joe:/bin/bash

When joe wants to change his password using the yppasswd utility program, the program will ask him for his old password and the new one. It will hash the new one (producing, say, xyZZYabcde123), and transmit it along with the current one (sent unencrypted) to the yppasswdd server. The server will check the provided "current" password against the entry in the passwd file, and if it is correct, it will replace PYcaX9vAPwWds with the new hashed password xyZZYabcde123.

Early versions of yppasswdd (both the original version from Sun, and the Linux one independently developed by yours truly) had an interesting flaw: they didn't check the new password for colons and newline characters. Thus, a user could send a password change request with the new password looking like this (\n denotes a newline character):

:0:0:Super Joe:/:\noldjoe:

Of course, the normal yppasswd client will never produce a request like this, because it hashes the new user password, and the resulting string will never contain newlines or colons. However, users are free to write a small program that sends such a request directly and thereby exploits this bug.

When yppasswdd replaces Joe's old encrypted password with this string, the passwd file will now look like this:

joe::0:0:Super Joe:/:
oldjoe::512:100:Joe Doe:/home/joe:/bin/bash

This turns the old entry for joe into two entries: one for a user named joe, which now has no password and a user ID of 0, and one for an oldjoe account that has all the attributes of the old account. The upshot is that you can now log in as joe and get a shell running with root privileges.
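To make the mechanics concrete, here is a minimal Python sketch of such an unchecked update. It is not the actual yppasswdd code; the function name replace_password_field is hypothetical and only mimics the behavior described above, pasting the new "hash" into the second field without looking at it.

```python
def replace_password_field(passwd_text, login, new_hash):
    """Replace the password field of `login` WITHOUT validating new_hash.

    Hypothetical illustration of the flaw, not the real server code.
    """
    out = []
    for line in passwd_text.split("\n"):
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == login:
            fields[1] = new_hash          # untrusted data pasted in verbatim
            line = ":".join(fields)
        out.append(line)
    return "\n".join(out)


passwd = "joe:PYcaX9vAPwWds:512:100:Joe Doe:/home/joe:/bin/bash"
evil = ":0:0:Super Joe:/:\noldjoe:"       # contains colons and a newline

result = replace_password_field(passwd, "joe", evil)
entries = result.split("\n")
print(result)
```

Running this prints the two-line passwd file shown above: the single record for joe has been split in two, and the first one carries a user ID of 0.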

The fix commonly implemented for this bug is to reject any password change request where the new password contains control characters (including newline) or a colon. This is sometimes referred to as blacklist matching because the string is checked against a list of bad characters, and rejected if a match is found.
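A blacklist check of this kind might look roughly like the following Python sketch. The function name is hypothetical, and treating DEL (ASCII 127) as a control character is an assumption on my part; the text above only requires rejecting control characters and the colon.

```python
def is_safe_for_passwd_field(value):
    """Blacklist check: reject a colon or any ASCII control character."""
    for ch in value:
        code = ord(ch)
        if ch == ":":
            return False                  # field separator
        if code < 32 or code == 127:      # control chars, incl. newline; DEL is an assumption
            return False
    return True


print(is_safe_for_passwd_field("xyZZYabcde123"))
print(is_safe_for_passwd_field(":0:0:Super Joe:/:\noldjoe:"))
```

The server would call such a check before touching the passwd file and refuse the request outright if it fails, rather than trying to "clean up" the offending string.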

The opposite approach is whitelist matching; the string is checked against a list of acceptable characters, and rejected if there's a character that's not included in the whitelist.

Each approach has its relative merits. The whitelist approach is usually preferred when you know exactly which set of characters a string should consist of. For instance, login names in Unix should always start with an alphabetic character, followed by zero or more alphanumerics (Is there any standard that says this?). Blacklist matching does give you some leeway when you're operating in an environment that's quite heterogeneous, and where you cannot rule out that some bizarre client will come along and use characters not included in your whitelist.
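For the login name rule just mentioned, a whitelist check can be sketched in a few lines of Python. The pattern below encodes exactly the rule stated above (one ASCII letter followed by zero or more ASCII letters or digits) and nothing more; real systems may allow other characters, so take it as an illustration rather than a reference. The function name is hypothetical.

```python
import re

# One ASCII letter, then zero or more ASCII letters or digits.
LOGIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_login(name):
    """Whitelist check: accept only strings made entirely of allowed characters."""
    # fullmatch() must consume the whole string, so a trailing newline is rejected too.
    return LOGIN_NAME.fullmatch(name) is not None


for candidate in ["joe", "j0e", "0joe", "joe:x", "joe\n", ""]:
    print(repr(candidate), is_valid_login(candidate))
```

Note that the check never has to know what the dangerous characters are: anything not explicitly allowed is rejected.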

In the case of yppasswdd, the common fix is to use blacklist matching as explained above, most likely because almost every UNIX derivative, as well as Linux, has added its own flavor of extensions. If we limited the set of acceptable characters for the encrypted password to the characters commonly used (alphanumerics, plus dot and slash), our implementation probably wouldn't interoperate with some SystemV derived platforms that attach password aging information to the password, delimited by a comma. Nor would it work with recent BSDs and Linux, which use the prefix $1$ to flag passwords hashed with the MD5 algorithm rather than old-fashioned crypt.
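The interoperability problem is easy to see in a small Python sketch that runs both kinds of check over three password fields. The aging suffix and the $1$ string are made-up placeholders that merely have the right shape; they are not real hashes, and the function names are hypothetical.

```python
import re

# Strict whitelist: only the traditional crypt alphabet.
TRADITIONAL = re.compile(r"[A-Za-z0-9./]+")


def whitelist_ok(value):
    return TRADITIONAL.fullmatch(value) is not None


def blacklist_ok(value):
    # Reject a colon and ASCII control characters, accept everything else.
    return not any(ch == ":" or ord(ch) < 32 for ch in value)


samples = {
    "traditional crypt": "PYcaX9vAPwWds",
    "with aging info":   "PYcaX9vAPwWds,M.z8",                   # placeholder suffix
    "MD5 style":         "$1$abcdefgh$0123456789ABCDEFGHIJKL",   # placeholder value
}

for label, value in samples.items():
    print(label, whitelist_ok(value), blacklist_ok(value))
```

The strict whitelist accepts only the first sample, while the blacklist accepts all three and still rejects anything containing a colon or a newline.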

However, yppasswdd is a somewhat peculiar case; for most other applications there's a common standard across all Unix platforms so there's a fairly good chance your application will work if you start with a whitelist of what you think is acceptable. Testing and/or user feedback can give rise to some additions to the whitelist, but that's usually much better than shipping software which will potentially blow up right in your face.

There are a number of other cases where it is crucial to filter user input for dangerous characters:

Embedding in structured files
This is basically what we just discussed. When embedding untrusted data in a file, make sure it doesn't contain any "magic" characters that act as record separators etc.

Embedding in HTML/XML
From a theoretical point of view, this is just a corollary of the above, but as it has recently been getting quite a bit of attention, we will discuss it in some more detail in section 8.4.

Passing strings to the shell
This is a very common case, with some special pitfalls, which we will discuss in section 8.5 below.

Using strings in path names
There's a common mistake when using strings specified by an untrusted peer in a path name. We'll describe this in section 8.3.

Embedding in database queries
XXX: To be done

Console output
There is a class of applications that let one user display text on the terminal of another user, including venerable Unix services such as talk and write. Services like these are slowly fading into oblivion, but there's a peculiar gotcha that bears mentioning nevertheless.

Virtually all terminals and terminal emulators (such as xterm) let applications control their behavior via so-called control sequences: erasing the screen, positioning the cursor, etc. is all done via special character sequences. Allowing any user to output arbitrary characters to somebody else's terminal at the very least allows for a number of silly pranks, but there's more to it. Some terminals will actually execute external shell commands when given the right character sequence (see footnote 8.1). Therefore, you should never allow any control characters (i.e. from the ASCII range 0-31) except for newline and carriage return (ASCII 10 and 13, respectively). If you want to be on the safe side, you should also disallow characters in the range 128-159, because on terminals that support 7-bit characters only, these will be mapped to ASCII 0-31.

Feeding data to external programs
XXX: Mention the mailx tilde problem
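Coming back to the console output case in the list above, the filtering rule given there can be sketched in Python as follows. The function name is hypothetical, and replacing offending characters with a question mark (rather than rejecting the whole message) is just one possible policy, chosen here for illustration. The sketch also assumes the text has already been decoded into characters, so that 128-159 refers to character codes and not to raw bytes of a multi-byte encoding.

```python
def sanitize_for_terminal(text, replacement="?"):
    """Replace characters that could act as terminal control codes.

    Newline (10) and carriage return (13) pass through; every other
    character in 0-31 and everything in 128-159 is replaced.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code in (10, 13):
            out.append(ch)
        elif code < 32 or 128 <= code <= 159:
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


message = "hi\x1b[2Jthere\r\n"            # ESC [ 2 J would clear the screen
print(repr(sanitize_for_terminal(message)))
```

With the escape character gone, the rest of the sequence is printed as harmless literal text. Note that this rule, taken literally, also replaces tabs.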


Olaf Kirch 2002-01-16