10-12-2011, 11:11 PM
Why Python?
http://www.linuxjournal.com/article/3882?page=0,1
(Article as posted on Usenet may be incomplete.)
By Eric Raymond (2000)
Eric Steven Raymond (born December 4, 1957), often referred to as ESR, is
an American computer programmer, author and open source software
advocate. His name became known within hacker culture when he became the
maintainer of the "Jargon File" in 1990. After the 1997 publication of
"The Cathedral and the Bazaar", Raymond became, for a number of years, an
unofficial spokesman for the open source movement.[2]
"My first look at Python was an accident, and I didn't much like what I
saw at the time. It was early 1997, and Mark Lutz's book Programming
Python from O'Reilly & Associates had recently come out. O'Reilly books
occasionally land on my doorstep, selected from among the new releases by
some mysterious benefactor inside the organization using a random process
I've given up trying to understand.
One of them was Programming Python. I found this somewhat interesting, as
I collect computer languages. I know over two dozen general-purpose
languages, write compilers and interpreters for fun, and have designed
any number of special-purpose languages and markup formalisms myself. My
most recently completed project, as I write this, is a special-purpose
language called SNG for manipulating PNG (Portable Network Graphics)
images. Interested readers can surf to the SNG home page at
http://www.catb.org/~esr/sng/. I have also written implementations of
several odd general-purpose languages on my Retrocomputing Museum page,
http://www.catb.org/retro/.
I had already heard just enough about Python to know that it is what is
nowadays called a “scripting language”, an interpretive language with its
own built-in memory management and good facilities for calling and
cooperating with other programs. So I dived into Programming Python with
one question uppermost in my mind: what has this got that Perl does not?
Perl, of course, is the 800-pound gorilla of modern scripting languages.
It has largely replaced shell as the scripting language of choice for
system administrators, thanks partly to its comprehensive set of UNIX
library and system calls, and partly to the huge collection of Perl
modules built by a very active Perl community. The language is commonly
estimated to be the CGI language behind about 85% of the “live” content
on the Net. Larry Wall, its creator, is rightly considered one of the
most important leaders in the Open Source community, and often ranks
third behind Linus Torvalds and Richard Stallman in the current pantheon
of hacker demigods.
At that time, I had used Perl for a number of small projects. I'd found
it quite powerful, even if the syntax and some other aspects of the
language seemed rather ad hoc and prone to bite one if not used with
care. It seemed to me that Python would have quite a hill to climb as yet
another scripting language, so as I read, I looked first for what seemed
to set it apart from Perl.
I immediately tripped over the first odd feature of Python that everyone
notices: the fact that whitespace (indentation) is actually significant
in the language syntax. The language has no analog of the C and Perl
brace syntax; instead, changes in indentation delimit statement groups.
And, like most hackers on first realizing this fact, I recoiled in
reflexive disgust.
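For readers who have not seen Python, a small illustrative function (not from the article) shows the feature in question. The bodies of the loop and of the if/else are marked only by how far they are indented, where C or Perl would wrap them in braces.

```python
def classify(numbers):
    """Split numbers into evens and odds; indentation delimits every block."""
    evens, odds = [], []
    for n in numbers:
        # Everything indented under the for line is the loop body.
        if n % 2 == 0:
            evens.append(n)
        else:
            odds.append(n)
    # Dedenting back to this level ends the loop; no closing brace needed.
    return evens, odds


print(classify([1, 2, 3, 4]))  # ([2, 4], [1, 3])
```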
I am just barely old enough to have programmed in batch FORTRAN for a few
months back in the 1970s. Most hackers aren't these days, but somehow our
culture seems to have retained a pretty accurate folk memory of how nasty
those old-style fixed-field languages were. Indeed, the term “free
format”, used back then to describe the newer style of token-oriented
syntax in Pascal and C, has almost been forgotten; all languages have
been designed that way for decades now. Or almost all, anyway. It's hard
to blame anyone, on seeing this Python feature, for initially reacting as
though they had unexpectedly stepped in a steaming pile of dinosaur dung.
That's certainly how I felt. I skimmed through the rest of the language
description without much interest. I didn't see much else to recommend
Python, except maybe that the syntax seemed rather cleaner than Perl's
and the facilities for doing basic GUI elements like buttons and menus
looked fairly good.
I put the book back on the shelf, making a mental note that I should code
some kind of small GUI-centered project in Python sometime, just to make
sure I really understood the language. But I didn't believe what I'd seen
would ever compete effectively with Perl.
A lot of other things conspired to keep that note way down on my priority
list for many months. The rest of 1997 was eventful for me; it was, among
other things, the year I wrote and published the original version of “The
Cathedral and the Bazaar”. But I did find time to write several Perl
programs, including two of significant size and complexity. One of them,
keeper, is the assistant still used to file incoming submissions at the
Metalab software archive. It generates the web pages you see at
metalab.unc.edu/pub/Linux/!INDEX.html. The other, anthologize, was used
to automatically generate the PostScript for the sixth edition of Linux
from the Linux Documentation Project's archive of HOWTOs. Both programs
are available at Metalab.
Writing these programs left me progressively less satisfied with Perl.
Larger project size seemed to magnify some of Perl's annoyances into
serious, continuing problems. The syntax that had seemed merely eccentric
at a hundred lines began to seem like a nigh-impenetrable hedge of thorns
at a thousand. “More than one way to do it” lent flavor and
expressiveness at a small scale, but made it significantly harder to
maintain consistent style across a wider code base. And many of the
features that were later patched into Perl to address the complexity-
control needs of bigger programs (objects, lexical scoping, “use strict”,
etc.) had a fragile, jerry-rigged feel about them.
These problems combined to make large volumes of Perl code seem
unreasonably difficult to read and grasp as a whole after only a few
days' absence. Also, I found I was spending more and more time wrestling
with artifacts of the language rather than my application problems. And,
most damning of all, the resulting code was ugly—this matters. Ugly
programs are like ugly suspension bridges: they're much more liable to
collapse than pretty ones, because the way humans (especially engineer-
humans) perceive beauty is intimately related to our ability to process
and understand complexity. A language that makes it hard to write elegant
code makes it hard to write good code.
With a baseline of two dozen languages under my belt, I could detect all
the telltale signs of a language design that had been pushed to the edge
of its functional envelope. By mid-1997, I was thinking “there has to be
a better way” and began casting about for a more elegant scripting
language.
One course I did not consider was going back to C as a default language.
The days when it made sense to do your own memory management in a new
program are long over, outside of a few specialty areas like kernel
hacking, scientific computing and 3-D graphics—places where you
absolutely must get maximum speed and tight control of memory usage,
because you need to push the hardware as hard as possible.
For most other situations, accepting the debugging overhead of buffer
overruns, pointer-aliasing problems, malloc/free memory leaks and all the
other associated ills is just crazy on today's machines. Far better to
trade a few cycles and a few kilobytes of memory for the overhead of a
scripting language's memory manager and economize on far more valuable
human time. Indeed, the advantages of this strategy are precisely what
has driven the explosive growth of Perl since the mid-1990s.
I flirted with Tcl, only to discover quickly that it scales up even more
poorly than Perl. Old LISPer that I am, I also looked at various current
dialects of Lisp and Scheme—but, as is historically usual for Lisp, lots
of clever design was rendered almost useless by scanty or nonexistent
documentation, incomplete access to POSIX/UNIX facilities, and a small
but nevertheless deeply fragmented user community. Perl's popularity is
not an accident; most of its competitors are either worse than Perl for
large projects or somehow nowhere near as useful as their theoretically
superior designs ought to make them.
My second look at Python was almost as accidental as my first. In October
1997, a series of questions on the fetchmail-friends mailing list made it
clear that end users were having increasing trouble generating
configuration files for my fetchmail utility. The file uses a simple,
classically UNIX free-format syntax, but can become forbiddingly
complicated when a user has POP3 and IMAP accounts at multiple sites. As
an example, see Listing 1 for a somewhat simplified version of mine.
Listing 1
I decided to attack the problem by writing an end-user-friendly
configuration editor, fetchmailconf. The design objective of fetchmailconf
was clear: to completely hide the control file syntax behind a
fashionable, ergonomically correct GUI interface replete with selection
buttons, slider bars and fill-out forms.
The thought of implementing this in Perl did not thrill me. I had seen
GUI code in Perl, and it was a spiky mixture of Perl and Tcl that looked
even uglier than my own pure-Perl code. It was at this point I remembered
the bit I had set more than six months earlier. This could be an
opportunity to get some hands-on experience with Python.
Of course, this brought me face to face once again with Python's pons
asinorum, the significance of whitespace. This time, however, I charged
ahead and roughed out some code for a handful of sample GUI elements.
Oddly enough, Python's use of whitespace stopped feeling unnatural after
about twenty minutes. I just indented code, pretty much as I would have
done in a C program anyway, and it worked.
That was my first surprise. My second came a couple of hours into the
project, when I noticed (allowing for pauses needed to look up new
features in Programming Python) I was generating working code nearly as
fast as I could type. When I realized this, I was quite startled. An
important measure of effort in coding is the frequency with which you
write something that doesn't actually match your mental representation of
the problem, and have to backtrack on realizing that what you just typed
won't actually tell the language to do what you're thinking. An important
measure of good language design is how rapidly the percentage of missteps
of this kind falls as you gain experience with the language.
When you're writing working code nearly as fast as you can type and your
misstep rate is near zero, it generally means you've achieved mastery of
the language. But that didn't make sense, because it was still day one
and I was regularly pausing to look up new language and library features!
This was my first clue that, in Python, I was actually dealing with an
exceptionally good design. Most languages have so much friction and
awkwardness built into their design that you learn most of their feature
set long before your misstep rate drops anywhere near zero. Python was
the first general-purpose language I'd ever used that reversed this
process.
Not that it took me very long to learn the feature set. I wrote a
working, usable fetchmailconf, with GUI, in six working days, of which
perhaps the equivalent of two days were spent learning Python itself.
This reflects another useful property of the language: it is compact; you
can hold its entire feature set (and at least a concept index of its
libraries) in your head. C is a famously compact language. Perl is
notoriously not; one of the things the notion “There's more than one way
to do it!” costs Perl is the possibility of compactness.
But my most dramatic moment of discovery lay ahead. My design had a
problem: I could easily generate configuration files from the user's GUI
actions, but editing them was a much harder problem. Or, rather, reading
them into an editable form was a problem.
The parser for fetchmail's configuration file syntax is rather elaborate.
It's actually written in YACC and Lex, two classic UNIX tools for
generating language-parsing code in C. In order for fetchmailconf to be
able to edit existing configuration files, I thought it would have to
replicate that elaborate parser in Python. I was very reluctant to do
this, partly because of the amount of work involved and partly because I
wasn't sure how to ascertain that two parsers in two different languages
accept exactly the same input. The last thing I needed was the extra labor of keeping
the two parsers in synchronization as the configuration language evolved!
This problem stumped me for a while. Then I had an inspiration: I'd let
fetchmailconf use fetchmail's own parser! I added a --configdump option
to fetchmail that would parse .fetchmailrc and dump the result to
standard output in the format of a Python initializer. For the file
above, the result would look roughly like Listing 2 (to save space, some
data not relevant to the example is omitted).
Listing 2
Python could then evaluate the fetchmail --configdump output and have the
configuration available as the value of the variable “fetchmail”.
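Listing 2 is not reproduced in this copy, so the dump below is an invented stand-in whose keys are hypothetical. The text says fetchmailconf simply evaluated the output; this sketch instead swaps in the standard library's ast.literal_eval, which accepts only literal values, and assumes (as the text describes) that the dump assigns the configuration to a variable named fetchmail. The helpers load_dump and run_configdump are illustrative names, not fetchmailconf functions.

```python
import ast
import subprocess

# Invented stand-in for --configdump output; the real format is Listing 2,
# which is not reproduced here, so these keys are hypothetical.
DUMP = """
fetchmail = {
    'poll_interval': 300,
    'servers': [
        {'pollname': 'pop.example.com', 'protocol': 'POP3',
         'users': [{'remote': 'alice', 'localnames': ['alice']}]},
        {'pollname': 'imap.example.org', 'protocol': 'IMAP',
         'users': [{'remote': 'al', 'localnames': ['alice']}]},
    ],
}
"""


def load_dump(text, name="fetchmail"):
    """Return the literal value assigned to `name` in Python-syntax text."""
    for node in ast.parse(text).body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == name):
            # literal_eval accepts an AST node and raises ValueError on
            # function calls, variable names and other non-literal
            # expressions, so nothing in the dump is executed.
            return ast.literal_eval(node.value)
    raise KeyError(name)


def run_configdump():
    """Capture `fetchmail --configdump` output (needs fetchmail installed)."""
    result = subprocess.run(["fetchmail", "--configdump"],
                            capture_output=True, text=True, check=True)
    return result.stdout


config = load_dump(DUMP)
print([s["pollname"] for s in config["servers"]])
# ['pop.example.com', 'imap.example.org']
```

With fetchmail installed, load_dump(run_configdump()) would send the live output through the same path. Any other statements in a dump are skipped rather than run.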
This wasn't quite the last step in the dance. What I really wanted wasn't
just for fetchmailconf to have the existing configuration, but to turn it
into a linked tree of live objects. There would be three kinds of objects
in this tree: Configuration (the top-level object representing the entire
configuration), Site (representing one of the sites to be polled) and
User (representing user data attached to a site). The example file
describes five site objects, each with one user object attached to it.
I had already designed and written the three object classes (that's what
took four days, most of it spent getting the layout of the widgets just
right). Each had a method that caused it to pop up a GUI edit panel to
modify its instance data. My last remaining problem was somehow to
transform the dead data in this Python initializer into live objects.
I considered writing code that would explicitly know about the structure
of all three classes and use that knowledge to grovel through the
initializer creating matching objects, but rejected that idea because new
class members were likely to be added over time as the configuration
language grew new features. If I wrote the object-creation code in the
obvious way, it would be fragile and tend to fall out of sync when either
the class definitions or the initializer structure changed.
What I really wanted was code that would analyze the shape and members of
the initializer, query the class definitions themselves about their
members, and then adjust itself to impedance-match the two sets.
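Listings 3 and 4 are not reproduced in this copy, so what follows is only a hedged sketch of the idea in modern Python 3, not the author's code. It uses plain class introspection (vars() on the class object), which is what the text calls metaclass hacking; it does not use Python's metaclass= machinery. The three class names follow the text, but every attribute name and the helpers data_fields, make_instance and build_tree are hypothetical.

```python
# Configuration, Site and User mirror the roles named in the text; the
# attribute names below are hypothetical stand-ins.

class User:
    remote = None          # class-level defaults double as the field list
    localnames = ()


class Site:
    pollname = None
    protocol = "auto"
    users = ()

    def edit(self):
        # Placeholder for the GUI edit panel; methods are not data fields.
        raise NotImplementedError


class Configuration:
    poll_interval = 0
    servers = ()


def data_fields(cls):
    """Return the public, non-callable class attributes of cls."""
    return {name for name, value in vars(cls).items()
            if not name.startswith("_") and not callable(value)}


def make_instance(cls, data):
    """Create a cls instance from a dict by matching keys to class fields.

    The class is queried at run time, so adding a field to a class needs
    no change here. Unknown keys raise instead of being silently dropped.
    """
    fields = data_fields(cls)
    unknown = set(data) - fields
    if unknown:
        raise ValueError(f"{cls.__name__}: unexpected keys {sorted(unknown)}")
    obj = cls()
    for name in fields & set(data):
        setattr(obj, name, data[name])
    return obj


def build_tree(config):
    """Turn nested dicts into a Configuration -> Site -> User object tree."""
    top = make_instance(Configuration, config)
    top.servers = [make_instance(Site, s) for s in top.servers]
    for site in top.servers:
        site.users = [make_instance(User, u) for u in site.users]
    return top


sample = {
    "poll_interval": 300,
    "servers": [
        {"pollname": "pop.example.com", "protocol": "POP3",
         "users": [{"remote": "alice", "localnames": ["alice"]}]},
    ],
}
tree = build_tree(sample)
print(tree.servers[0].users[0].remote)  # alice
```

Because the field list is read from each class when the loader runs, adding a new attribute to Site is enough for a matching key to be accepted; that is the impedance matching the paragraph asks for. Raising on unknown keys is a design choice in this sketch; a loader could just as well ignore or log them.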
This kind of thing is called metaclass hacking and is generally
considered fearsomely esoteric—deep black magic. Most object-oriented
languages don't support it at all; in those that do (Perl being one), it
tends to be a complicated and fragile undertaking. I had been impressed
by Python's low coefficient of friction so far, but here was a real test.
How hard would I have to wrestle with the language to get it to do this?
I knew from previous experience that the bout was likely to be painful,
even assuming I won, but I dived into the book and read up on Python's
metaclass facilities. The resulting function is shown in Listing 3, and
the code that calls it is in Listing 4.
Listing 3
Listing 4
That doesn't look too bad for deep black magic, does it? Thirty-two
lines, counting comments. Just from knowing what I've said about the
class structure, the calling code is even readable. But the size of this
code isn't the real shocker. Brace yourself: this code only took me about
ninety minutes to write—and it worked correctly the first time I ran it.
To say I was astonished would have been positively wallowing in
understatement. It's remarkable enough when implementations of simple
techniques work exactly as expected the first time; but my first metaclass
hack in a new language, six days from a cold standing start? Even if we
stipulate that I am a fairly talented hacker, this is an amazing
testament to Python's clarity and elegance of design.
There was simply no way I could have pulled off a coup like this in Perl,
even with my vastly greater experience level in that language. It was at
this point I realized I was probably leaving Perl behind.
This was my most dramatic Python moment. But, when all is said and done,
it was just a clever hack. The long-term usefulness of a language comes
not from its ability to support clever hacks, but from how well and how
unobtrusively it supports the day-to-day work of programming. The day-to-
day work of programming consists not of writing new programs, but mostly
reading and modifying existing ones.
So the real punchline of the story is this: weeks and months after
writing fetchmailconf, I could still read the fetchmailconf code and grok
what it was doing without serious mental effort. And the true reason I no
longer write Perl for anything but tiny projects is that this was never true
when I was writing large masses of Perl code. I fear the prospect of ever
having to modify keeper or anthologize again—but fetchmailconf gives me
no qualms at all.
Perl still has its uses. For tiny projects (100 lines or fewer) that
involve a lot of text pattern matching, I am still more likely to tinker
up a Perl-regexp-based solution than to reach for Python. For good recent
examples of such things, see the timeseries and growthplot scripts in the
fetchmail distribution. Actually, these are much like the things Perl did
in its original role as a sort of combination awk/sed/grep/sh, before it
had functions and direct access to the operating system API. For anything
larger or more complex, I have come to prefer the subtle virtues of Python,
and I think you will, too."