Which brings us to another programming language - the shell itself. If you have some experience as a system administrator, you will know how convenient shell scripts can be, and how cryptic. (And if you've ever taken a look at the shell scripts that come with the INN news system, you know that Perl regular expressions are for sissies.)
However, unless you're extremely careful, shell code is just about the worst thing that can happen to you from a security perspective. Especially with complex code, making sure that untrusted data does not get exposed to the shell parser is very hard.
For instance, back in the 90s when the MIME standard for multimedia mail attachments such as audio or images was still fairly young, none of the common mail readers on Unix (such as elm) supported it.[8.4] Instead, you had to rely on an external program called metamail when composing or viewing MIME messages. Unfortunately, it wasn't very secure. Not only did it have a number of buffer overflows, it also came with a collection of shell scripts it used to display certain message types. There was a script to play a Sun audio file, there was a script to download external message components via FTP and display them, and so on. The metamail program itself was written in C, while the scripts were written for the C shell.
When metamail invoked one of the scripts, it would pass various arguments on the command line, including some extracted from the mail message. The script would do lots of clever things with these arguments, but first it would usually assign them to local variables.
One of these scripts, called showexternal, was supposed to handle the MIME type message/external-body. This is one of the slightly bizarre features specified in the MIME standard which no one usually implements (except metamail had to because it was supposed to be the reference implementation). A message part of this type does not contain the message body itself, but just a reference to its real location (e.g. on a public FTP server). The idea was that you can save tremendous amounts of bandwidth if only those people who really want to read the message body download it. It's pretty much like a link in an HTML document, except HTML hadn't been invented then. It's clumsy, too, and the only messages I've ever seen using this feature are announcements sent out by the IETF (the Internet standards body) when a new standards document is published.
To make a long story short, the script contained code like this:
#!/bin/csh
...
set FTP=ftp
# Assign arguments (extracted from mail message)
set bodyfile=$1
set atype=`echo $2 | tr A-Z a-z`
set name=$3
set site=$4
set dir=$5
set mode=$6
... lots of smart processing ...
# Now download the remote file via FTP to $localfile
$FTP <<EOF
open $site
user $username $password
cd $dir
mode $mode
get $name $localfile
quit
EOF
To a programmer used to shell programming, this looks straightforward
enough. First, the command line arguments are assigned to a bunch of
shell variables because variable names are easier on the human brain
than $1, $2 etc. After a lot of clever diddling around,
it finally invokes the FTP client, and passes a number of commands to
its standard input using what's called a here document. If you're
not familiar with here documents, don't worry. Basically it's just a
funny notation for telling the shell to pass the next few lines to the
command's standard input.
Now there's a nasty little feature of the C Shell that people
used to the Bourne Shell (or bash) may not be aware of. A statement
such as set name=$3 does unexpected things when the argument
contains white space.[8.5] For instance, assume the message specifies the name of the file to
download as lalla FTP=/bin/sh. Rather than setting name
to this string, the statement above will set name to lalla,
and FTP to /bin/sh. Oops!
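
To see the mechanism in slow motion, here is a toy model of that assignment, written in Python rather than csh. It is not taken from metamail, and it assumes a deliberately simplified view of the C shell: an unquoted $3 is expanded first, the result is split on white space, and set treats every resulting word as its own name=value assignment. The helper csh_set and the env dictionary are made up for this illustration.

```python
def csh_set(env, name, value, quoted=False):
    """Toy model of csh's `set name=$value` (hypothetical helper, not real csh).

    Unquoted: the expanded text is split on white space and every word
    is handled as a separate assignment.
    Quoted: the value stays in one piece.
    """
    if quoted:
        env[name] = value
        return env
    for word in f"{name}={value}".split():
        var, _, val = word.partition("=")
        env[var] = val
    return env


# File name as it might arrive from the mail message
attacker_name = "lalla FTP=/bin/sh"

unquoted = csh_set({"FTP": "ftp"}, "name", attacker_name)
quoted = csh_set({"FTP": "ftp"}, "name", attacker_name, quoted=True)

print(unquoted)  # {'FTP': '/bin/sh', 'name': 'lalla'}
print(quoted)    # {'FTP': 'ftp', 'name': 'lalla FTP=/bin/sh'}
```

In the unquoted case the second word quietly overwrites FTP, which is exactly the surprise described above. Real csh word splitting has more corner cases than this model covers.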
Subsequently, when the script invokes $FTP, it runs
/bin/sh rather than the FTP client, and feeds it the here
document on standard input. Now the commands shown above don't look
too dangerous. But there are a bunch of other values used in that here
document that the attacker can manipulate, such as the site name. This
lets him easily embed arbitrary shell commands in what gets fed to the
shell above... which will be executed with the privileges of whoever
views this message with metamail.
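
To make this concrete, the following sketch renders the input block that the script passes to $FTP, using an attacker-supplied site name. It's Python, with string.Template standing in for the shell's variable expansion, and every field value is invented for the example. Nothing gets executed here; the point is only to look at the text that whatever program $FTP names will receive on standard input.

```python
from string import Template

# Same lines as in the script above; Template is only a stand-in
# for the shell's own $variable expansion.
FTP_INPUT = Template(
    "open $site\n"
    "user $username $password\n"
    "cd $dir\n"
    "mode $mode\n"
    "get $name $localfile\n"
    "quit\n"
)


def render(fields):
    """Return the text that would be fed to $FTP (hypothetical helper)."""
    return FTP_INPUT.substitute(fields)


# All values below are made up; only `site` is hostile.
fields = {
    "site": "ftp.example.org; echo injected-command",
    "username": "anonymous",
    "password": "guest",
    "dir": "/pub",
    "mode": "stream",
    "name": "lalla",
    "localfile": "/tmp/body",
}

first_line = render(fields).splitlines()[0]
print(first_line)  # open ftp.example.org; echo injected-command
```

Fed to the FTP client, that first line is merely an odd host name. Fed to /bin/sh, it is two commands, and the second one is whatever the attacker wants it to be.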
The immediate fix for the problem at hand is to quote values when
assigning them, as in set name="$3". But that's not the full story.
Shell code is very hard to write safely. Red Hat, who were still shipping
metamail last time I checked, use a heavily patched version. If
you take a look at the source package, you will find security-related
patches named mm-2.7-ohnonotagain.patch and
mm-2.7-arghhh.patch, which says a lot about the sentiments
of the poor guy charged with the maintenance of that package :-)
In addition, the more complex a task your script is supposed to handle, the more convoluted the code becomes. I've referred to INN, the InterNet News system, above. This shows shell code at its most complex, and despite Rich Salz being such a good programmer, there were still one or two security glitches in these scripts.
So in my opinion, the real fix for security problems in shell scripts is not to use shell scripts at all.
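
For contrast, here is a minimal sketch of what handing untrusted data to another program can look like when no shell parser sits in between. It uses Python's subprocess module with an argument list instead of a command string. The helper name show_argv and the child program (a Python one-liner that merely reports the arguments it received) are assumptions chosen for illustration; this is not how metamail or any of its replacements actually work.

```python
import subprocess
import sys

# The child just reports how many arguments it got and prints the first one.
CHILD_CODE = "import sys; print(len(sys.argv) - 1); print(sys.argv[1])"


def show_argv(untrusted):
    """Pass `untrusted` to a child process as ONE argument (hypothetical helper).

    The argument list goes straight to the operating system, so white
    space, semicolons and the like are never interpreted by a shell.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(show_argv("lalla FTP=/bin/sh"))  # ['1', 'lalla FTP=/bin/sh']
print(show_argv("x; echo pwned"))      # ['1', 'x; echo pwned']
```

The hostile strings arrive in the child exactly as they were sent, as a single argument each. This only removes the shell from the picture; the program on the receiving end still has to treat the data with suspicion.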
XXX: explain why shell code is different from say perl