Many people dismiss Python because it uses indentation to denote blocks rather than a familiar
C-like syntax. I think that's a big mistake.
Indentation is not a burden, because the most natural form of a program is an indented one. In languages
like PL/1 and C a pretty printer is a must. But if you consistently use a pretty printer, why waste
two useful symbols ("{" and "}") on what can be done with comments? Indentation can then always be
generated automatically from special comments. In Python such comments are an optional feature: you
don't have to use them, but they are highly recommended. You can use pseudo-comments
like "#{" and "#}" as substitutes for braces.
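The claim that indentation can be regenerated from block comments can be sketched in a few lines. The convention used here is an assumption made for illustration only: a block-opening line ends with "#{" and the block is closed by a line that starts with "#}". The helper name reindent is hypothetical, not an existing tool, and Python 3 is assumed.

```python
def reindent(source, width=4):
    """Rebuild indentation from '#{' / '#}' pseudo-comments (assumed convention)."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        if line.startswith("#}"):
            # A closing pseudo-comment ends the current block.
            depth = max(depth - 1, 0)
        out.append(" " * (width * depth) + line)
        if line.endswith("#{"):
            # An opening pseudo-comment starts a new, deeper block.
            depth += 1
    return "\n".join(out)


# Code whose indentation was lost, but whose block comments survived.
mangled = """
def sign(x): #{
if x < 0: #{
return -1
#}
return 1
#}
"""

restored = reindent(mangled)
namespace = {}
exec(restored, namespace)  # the restored text is valid Python again
```

Because comment-only lines do not take part in Python's indentation rules, the "#}" lines can sit at any column without affecting the program.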
In Python, adding block-delimiting comments is the optional feature and the use of whitespace is the
mandatory one. Python reversed the priority because whitespace is a better visual aid in understanding
the flow of a program. But if you use comments to mark the beginning and end of each block together with
automatic indentation, then it's the indentation that really matters. And that's great: it's easy to miss
delimiters like "{" and "}" when looking over a screenful of code.
The problem with indentation is that a long block gets visually lost, because you cannot see on
the screen where the block starts. Folding might help in such cases. So working with large
Python scripts presupposes a fairly advanced editor. Which might be a good thing, as I hate
programmers who use nano ;-)
Python's choice was in favor of the pretty printer, and I tend to support it on usability
grounds. Languages such as Perl and Tcl made their design decisions based on familiarity with C rather
than usability. They simply borrowed the successful C notation of using curly braces for blocks
(which is actually "one step forward, two steps back" from the PL/1 notation that served as a prototype for
C, as PL/1 allows closing multiple blocks using labels and C does not).
C lacks some earlier PL/1 features, such as closing several blocks with one delimiter. In Python
one gets this nice feature for free: a single dedent can close any number of nested blocks.
So forced conversion of tabs into spaces is a "must" in Python. This "gotcha" is often
missed in introductory Python books. The explanation on python.org is primitive and misses this gotcha too.
See the interesting Slashdot discussion below, which sheds some additional light on the situation.
Python: Myths about Indentation
Note: Lines beginning with " >>> " and " ... " indicate input
to Python (these are the default prompts of the interactive interpreter). Everything else is
output from Python.
There are quite some prejudices and myths about Python's indentation rules among people who don't
really know Python. I'll try to address a few of these concerns on this page.
"Whitespace is significant in Python source code."
No, not in general. Only the indentation level of your statements is significant (i.e. the whitespace
at the very left of your statements). Everywhere else, whitespace is not significant and can be used
as you like, just like in any other language. You can also insert empty lines that contain nothing
(or only arbitrary whitespace) anywhere.
Also, the exact amount of indentation doesn't matter at all, but only the relative indentation
of nested blocks (relative to each other).
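A small check of this point, written for Python 3 (an assumption; the listings on this page use Python 2 syntax). The same function is compiled twice, once with a two-space indent and once with an odd seven/eight-space indent, and both behave identically. The function name classify and the helper load are made up for the illustration.

```python
# Same logic, indented by 2 spaces per level.
two_spaces = (
    "def classify(n):\n"
    "  if n < 0:\n"
    "    return 'negative'\n"
    "  return 'non-negative'\n"
)

# Same logic, indented by 7 spaces, with the nested block at 8.
uneven = (
    "def classify(n):\n"
    "       if n < 0:\n"
    "        return 'negative'\n"
    "       return 'non-negative'\n"
)


def load(source):
    """Execute the source text and return the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["classify"]


# Only the relative nesting matters, so both versions give the same answers.
results = [[load(src)(n) for n in (-1, 0, 1)] for src in (two_spaces, uneven)]
```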
Furthermore, the indentation level is ignored when you use explicit or implicit continuation lines.
For example, you can split a list across multiple lines, and the indentation is completely insignificant.
So, if you want, you can do things like this:
>>> foo = [
...            'some string',
...         'another string',
...           'short string'
... ]
>>> print foo
['some string', 'another string', 'short string']
>>> bar = 'this is ' \
...       'one long string ' \
...           'that is split ' \
...     'across multiple lines'
>>> print bar
this is one long string that is split across multiple lines
"Python forces me to use a certain indentation style."
Yes and no. First of all, you can write the inner block all on one line if you like, therefore
not having to care about indentation at all. The following three versions of an "if" statement are
all valid and do exactly the same thing (output omitted for brevity):
>>> if 1 + 1 == 2:
...     print "foo"
...     print "bar"
...     x = 42
>>> if 1 + 1 == 2:
...     print "foo"; print "bar"; x = 42
>>> if 1 + 1 == 2: print "foo"; print "bar"; x = 42
Of course, most of the time you will want to write the blocks in separate lines (like the first version
above), but sometimes you have a bunch of similar "if" statements which can be conveniently written
on one line each.
If you decide to write the block on separate lines, then yes, Python forces you to obey its indentation
rules, which simply means: The enclosed block (that's two "print" statements and one assignment in
the above example) has to be indented more than the "if" statement itself. That's it. And frankly,
would you really want to indent it in any other way? I don't think so.
So the conclusion is: Python forces you to use indentation that you would have used anyway, unless
you wanted to obfuscate the structure of the program. In other words: Python does not allow you to obfuscate
the structure of a program by using bogus indentation. In my opinion, that's a very good thing.
Have you ever seen code like this in C or C++?
/* Warning: bogus C code! */
if (some condition)
        if (another condition)
                do_something(fancy);
else
        this_sucks(badluck);
Either the indentation is wrong, or the program is buggy, because an "else" always applies to the
nearest "if", unless you use braces. This is an essential problem in C and C++. Of course, you could
resort to always using braces, no matter what, but that's tiresome and bloats the source code, and
it doesn't prevent you from accidentally obfuscating the code by still having the wrong indentation.
(And that's just a very simple example. In practice, C code can be much more complex.)
In Python, the above problems can never occur, because indentation levels and logical block structure
are always consistent. The program always does what you expect when you look at the indentation.
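For comparison, here is a sketch of the same nested "if"/"else" shape in Python (Python 3 assumed). The two functions differ only in the indentation of the "else", and that indentation alone decides which "if" it belongs to. The function names and return strings are invented for the example.

```python
def outer_else(a, b):
    if a:
        if b:
            return "do_something"
    else:
        # Aligned with "if a": runs only when a is false.
        return "else branch"
    return "nothing"


def inner_else(a, b):
    if a:
        if b:
            return "do_something"
        else:
            # Aligned with "if b": runs when a is true and b is false.
            return "else branch"
    return "nothing"
```

What you see on the screen is what the interpreter executes; there is no second, hidden notation that could disagree with the layout.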
Quoting the famous book writer Bruce Eckel:
Because blocks are denoted by indentation in Python, indentation is uniform in Python programs.
And indentation is meaningful to us as readers. So because we have consistent code formatting,
I can read somebody else's code and I'm not constantly tripping over, "Oh, I see. They're putting
their curly braces here or there." I don't have to think about that.
"You cannot safely mix tabs and spaces in Python."
That's right, and you don't want that. To be exact, you cannot safely mix tabs and spaces in C
either: While it doesn't make a difference to the compiler, it can make a big difference to humans
looking at the code. If you move a piece of C source to an editor with different tabstops, it will
all look wrong (and possibly behave differently than it looks at first sight). You can easily introduce
well-hidden bugs in code that has been mangled that way. That's why mixing tabs and spaces in C isn't
really "safe" either. Also see the "bogus C code" example above.
Therefore, it is generally a good idea not to mix tabs and spaces for indentation. If you use
tabs only or spaces only, you're fine.
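As a hedged illustration, assuming a Python 3 interpreter (which is stricter on this point than the Python 2 of the listings on this page): a block whose lines are indented once with a tab and once with eight spaces is rejected with a TabError, because its meaning would depend on the tab width. The helper name mixes_tabs_and_spaces is made up for the example.

```python
# Second line is indented with one tab, third line with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

# The same block indented with spaces only.
spaces_only = "if True:\n    x = 1\n    y = 2\n"


def mixes_tabs_and_spaces(source):
    """Return True if Python 3 rejects the source for inconsistent tabs/spaces."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False
```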
Furthermore, it can be a good idea to avoid tabs altogether, because the semantics of tabs are
not very well-defined in the computer world, and they can be displayed completely differently on
different types of systems and editors. Also, tabs often get destroyed or wrongly converted during
copy&paste operations, or when a piece of source code is inserted into a web page or other kind of
markup code.
Most good editors support transparent translation of tabs, automatic indent and dedent. That is,
when you press the tab key, the editor will insert enough spaces (not actual tab characters!) to
get you to the next position which is a multiple of eight (or four, or whatever you prefer), and
some other key (usually Backspace) will get you back to the previous indentation level.
In other words, it's behaving like you would expect a tab key to do, but still maintaining portability
by using spaces in the file only. This is convenient and safe.
Having said that -- If you know what you're doing, you can of course use tabs and spaces to your
liking, and then use tools like "expand" (on UNIX machines, for example) before giving the source
to others. If you use tab characters, Python assumes that tab stops are eight positions apart.
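The effect of "expand" on indentation can be approximated in Python with the standard str.expandtabs method. The sketch below (Python 3 assumed; expand_indentation is a hypothetical helper name) converts only the leading whitespace of a line, using tab stops eight positions apart, and leaves tabs inside the rest of the line untouched.

```python
def expand_indentation(line, tabsize=8):
    """Replace tabs in the leading whitespace with spaces, as 'expand' would."""
    body = line.lstrip(" \t")
    leading = line[: len(line) - len(body)]
    # expandtabs is column-aware: a tab advances to the next multiple of tabsize.
    return leading.expandtabs(tabsize) + body
```

Note that two spaces followed by a tab still end up at column eight, not ten, which is exactly why a mixture of tabs and spaces looks different in editors with other tab stops.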
"I just don't like it."
That's perfectly OK; you're free to dislike it (and you're probably not alone). Granted, the fact
that indentation is used to indicate the block structure might be regarded as uncommon and as something
that takes getting used to, but it does have a lot of advantages, and you get used to it very quickly when
you seriously start programming in Python.
Having said that, you can use keywords to indicate the end of a block (instead of indentation),
such as "endif". These are not really Python keywords, but there is a tool that comes
with Python which converts code using "end" keywords to correct indentation and removes those keywords.
It can be used as a pre-processor to the Python compiler. However, no real Python programmer uses
it, of course.
[Update] It seems this tool has been removed from recent versions of Python. Probably because
nobody really used it.
"How does the compiler parse the indentation?"
The parsing is well-defined and quite simple. Basically, changes to the indentation level are
inserted as tokens into the token stream.
The lexical analyzer (tokenizer) uses a stack to store indentation levels. At the beginning, the
stack contains just the value 0, which is the leftmost position. Whenever a nested block begins,
the new indentation level is pushed on the stack, and an "INDENT" token is inserted into the token
stream which is passed to the parser. There can never be more than one "INDENT" token in a row.
When a line is encountered with a smaller indentation level, values are popped from the stack
until a value is on top which is equal to the new indentation level (if none is found, a syntax error
occurs). For each value popped, a "DEDENT" token is generated. Obviously, there can be multiple "DEDENT"
tokens in a row.
At the end of the source code, "DEDENT" tokens are generated for each indentation level left on
the stack, until just the 0 is left.
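The stack discipline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real tokenizer: it ignores continuation lines and multi-line strings, keeps the rest of each line as one untokenized string, and the function name indent_tokens is invented. Python 3 is assumed.

```python
def indent_tokens(source):
    """Return a flat list of 'INDENT', 'DEDENT' and stripped line texts."""
    stack = [0]  # the leftmost position is always on the stack
    tokens = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment-only lines do not affect indentation
        expanded = line.expandtabs(8)
        width = len(expanded) - len(expanded.lstrip(" "))
        if width > stack[-1]:
            # A nested block begins: push the new level, emit one INDENT.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Pop levels until the new width is on top, one DEDENT per pop.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(
                    "unindent does not match any outer indentation level"
                )
        tokens.append(stripped)
    # At the end of the source, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Since a line can raise the level only once but lower it several times, the sketch can emit several "DEDENT" tokens in a row but never two "INDENT" tokens in a row.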
Look at the following piece of sample code:
>>> if foo:
...     if bar:
...         x = 42
... else:
...   print foo
...
In the following table, you can see the tokens produced on the left, and the indentation stack on
the right.
<if> <foo> <:>                     [0]
<INDENT> <if> <bar> <:>            [0, 4]
<INDENT> <x> <=> <42>              [0, 4, 8]
<DEDENT> <DEDENT> <else> <:>       [0]
<INDENT> <print> <foo>             [0, 2]
<DEDENT>                           [0]
Note that after the lexical analysis (before parsing starts), there is no whitespace left
in the list of tokens (except possibly within string literals, of course). In other words, the indentation
is handled by the lexer, not by the parser.
The parser then simply handles the "INDENT" and "DEDENT" tokens as block delimiters -- exactly
like curly braces are handled by a C compiler.
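The block-delimiter tokens can also be observed with the standard library's tokenize module. The sketch below assumes Python 3, so the sample uses print(foo) rather than the Python 2 print statement; the helper name block_tokens is made up. It keeps only the INDENT and DEDENT tokens produced for the sample code.

```python
import io
import tokenize

sample = (
    "if foo:\n"
    "    if bar:\n"
    "        x = 42\n"
    "else:\n"
    "  print(foo)\n"
)


def block_tokens(source):
    """Return the names of the INDENT/DEDENT tokens, in order of appearance."""
    names = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names


delimiters = block_tokens(sample)
```

Only tokenizing is needed here; the sample is never executed, so foo and bar do not have to exist.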
The above example is intentionally simple. There is more to it, such as continuation lines.
These are well-defined, too, and if you're interested you can read about them in the Python Language
Reference, which includes a complete formal grammar of the language.
Kragen Sitaker
[email protected]
Wed, 26 Jun 2002 04:19:08 -0400 (EDT)
This kragen-tol article was published (with some of the mistakes removed) in ;login: this month; I had the following interesting conversation with a reader, which I repost here with his permission.
Date: Thu, 13 Jun 2002 14:32:50 -0700 (PDT)
From: "Simon J. Gerraty" <[email protected]>
Subject: Re: python indent sensitive
Hi,
I was reading your article in ;Login: regarding python cf. perl.
In your comments about python being indentation sensitive you
assert that anyone who finds this silly is an unthinking moron.
I recall looking at the python web pages some years ago - since python
sounded cool, and saw a claim that "there is a very good reason
python doesn't use braces", but that reason was never given.
Since you clearly hold that python's indentation sensitivity is a
bonus, perhaps you could explain why - with actual reasons that
is.
It's possible I'm wrong to think that python would have been a very
good language if only it used braces for grouping rather than
indentation, but absent something a bit more convincing than
"its better", I can only imagine it being a source of hard to track bugs.
Thanks
--sjg
Date: Thu, 13 Jun 2002 18:52:15 -0400
From: Kragen Sitaker <[email protected]>
Subject: Re: python indent sensitive
Simon J. Gerraty writes:
> I was reading your article in ;Login: regarding python cf. perl.
> In your comments about python being indentation sensitive you
> assert that anyone who finds this silly is an unthinking moron.
I haven't seen the printed version yet, but I think I said that certain
kinds of "unthinking morons" (your words, not mine) didn't like it,
but I specifically said there might be reasons other than being an
"unthinking moron" that one might not like it.
Many people who "unthinkingly adhere to stupid traditions" (my words
from the article) are quite intelligent.
> Since you clearly hold that python's indentation sensitivity is a
> bonus, perhaps you could explain why - with actual reasons that
> is.
With braces, I have to say what I mean twice --- once with the braces,
so the compiler can understand me, and once with indentation, so humans
can understand me. If I get it wrong, I have an unseen bug. As a C,
C++, and Perl programmer, I eventually developed very sharp eyes for
things like this:
    if (foo)
        debug_printf("foo happened\n");
        barf();
    do_something_else();
In Python, I don't have to waste my attention on that any more.
Whitespace is a preattentive feature; it is processed by my retina
and visual cortex without any conscious attention on my part and very
little effort, and I can see *all* of the whitespace in a large visual
display in less than a quarter of a second, about the time it takes my
eye to move from one spot to another. It is a major benefit to be able
to understand the nesting structure of a whole screen or page of source
code automatically and with no effort. That is why indentation to show
nesting in source code is universal today.
Python simply removes the possibility that your subconscious, preattentive
understanding of your program's structure is incorrect. Knowing that
a program is written in Python also eliminates the possibility that
it is unindented; an unindented C program can be run through 'indent',
but an unindented Perl program is a nightmare.
People occasionally suggest that this lack of redundancy will lead to
bugs, as a misindented statement won't be noticed later. (Specifically,
a statement that should be inside a block being after the block, or
vice versa.) I've introduced lots of bugs in programs, and I have
kept exhaustive records of the bugs I found in many small programs.
I cannot recall any bugs of this nature in my Python or Perl programs,
but I have introduced this kind of bug a few times in C and C++ programs,
usually in the way shown above.
It is rarely subtle in its effects, but it is often hard to find.
I don't argue that redundancy is always bad, but that in this particular
case, the redundancy proves not to be worth its price.
For my thoughts on when redundancy in syntax (and
programming languages in general) is a good idea, see
http://lists.canonical.org/pipermail/kragen-tol/2002-April/000705.html
"Lisp syntax considered insufficiently redundant" and
http://www.paulgraham.com/redund.html "Redundancy and Power".
Subject: Re: python indent sensitive
Date: Thu, 13 Jun 2002 17:44:08 -0700
From: "Simon J. Gerraty" <[email protected]>
Hi, I dropped the login address [the original mail was Cc:ed to
[email protected]] - since it was wrong anyway :-)
> > Since you clearly hold that python's indentation sensitivity is a
> > bonus, perhaps you could explain why - with actual reasons that
> > is.
>
> With braces, I have to say what I mean twice --- once with the braces,
> so the compiler can understand me, and once with indentation, so humans
> can understand me. If I get it wrong, I have an unseen bug. As a C,
Hmmm, interesting. I'm thinking more of the case where I put the
braces in and the editor does the indenting for me - emacs, no pain at
all.
Further, while indentation alone may suffice for a screen's worth of
code, I find it inadequate for grossly large blocks of code. Note
that I often have to debug programs written by others so the fact that
a huge function should be re-organized is orthogonal to understanding
what it is doing. Being able to ask a tool like an editor "go show me
where this block started" is very handy. Matching braces etc makes
this easy.
Now of course it could be the case that there is a decent python mode
for emacs that handles all this.
> code automatically and with no effort. That is why indentation to show
> nesting in source code is universal today.
Sure, but using braces makes it simpler to write tools to
munge/analyze code. Does python treat a line of code indented with
a TAB differently to code indented with spaces?
> it is unindented; an unindented C program can be run through 'indent',
> but an unindented Perl program is a nightmare.
Yep, any unindented code is a nightmare.
> People occasionally suggest that this lack of redundancy will lead to
> bugs, as a misindented statement won't be noticed later. (Specifically,
> a statement that should be inside a block being after the block, or
> vice versa.) I've introduced lots of bugs in programs, and I have
> kept exhaustive records of the bugs I found in many small programs.
> I cannot recall any bugs of this nature in my Python or Perl programs,
> but I have introduced this kind of bug a few times in C and C++ programs,
> usually in the way shown above.
Again, I'm more concerned about code I didn't write. Anything that
assists me in analyzing the code for bugs is good.
Anyway, thanks for elaborating on "its better".
--sjg
From: [email protected] (Kragen Sitaker)
Date: Wed, 19 Jun 2002 15:43:00 -0400 (I think)
Subject: Re: python indent sensitive
You write:
> Hmmm, interesting. I'm thinking more of the case where I put the
> braces in and the editor does the indenting for me - emacs, no pain at
> all.
Well, when you're *writing* code --- typing it out one line after
another --- it's usually not a problem, with or without braces. It's
when you're *editing* code that the problem comes up. In Emacs, at
least, I experience some pain when I'm creating or deleting blocks
around existing code; sometimes I revert to C-n TAB C-n TAB, but there
is of course M-C-\, which requires me to select the affected code
first, after having already added or deleted braces. In Python, I
typically select the affected code and hit C-c > or C-c <.
I should record an editing session sometime to identify the particular
transformations I'm really performing instead of the ones that come to
mind.
When you're *reading* code, the braces just take up space without
conveying any information not already conveyed by the whitespace.
> Further, while indentation alone may suffice for a screen's worth of
> code, I find it inadequate for grossly large blocks of code.
Meaning that you need both block structure and editor support for it?
> Note that I often have to debug programs written by others so the
> fact that a huge function should be re-organized is orthogonal to
> understanding what it is doing. Being able to ask a tool like an
> editor "go show me where this block started" is very handy.
> Matching braces etc makes this easy.
Well, unfortunately forward-sexp and backward-sexp don't think Python
blocks are sexps, but when I want to know where a block began, I
usually move to the end, hit Enter, then hit Backspace a few times.
Each Backspace moves me back one indent level (like } in C-mode with
auto-mode turned on) and prints a line in the minibuffer telling me
where the block I'm closing began. This isn't ideal, but it's usually
good enough.
Someone has hacked up an outline minor-mode for Python that lets you
collapse functions and classes; I wish I had something like that for
arbitrary blocks.
> Sure, but using braces makes it simpler to write tools to
> munge/analyze code. Does python treat a line of code indented with
> a TAB differently to code indented with spaces?
It treats it as if it had been run through 'expand -8'.
I often find indentation more useful for writing automated tools, too;
questions like the following are easier to answer by indentation than
by parsing. (They all can get fouled up by Python's multiline
strings, though, because those can be arbitrarily indented or
outdented.)
- what are the top-level constructs in this module?
- what are the global functions defined in this module?
- what lines of code are conditional on the variable DEBUG?
You can do shallow processing like this correctly without parsing the
code (just tokenizing it) if you have indentation to help you.
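A rough sketch of such shallow processing, using indentation only. Assumptions made for the illustration: Python 3, space-only indentation, no multiline strings (the caveat mentioned above), and invented names (top_level_functions, lines_under, and the sample module text).

```python
def top_level_functions(source):
    """Names of functions defined at column 0, found by indentation alone."""
    names = []
    for line in source.splitlines():
        if line.startswith("def "):
            names.append(line[4:].split("(", 1)[0].strip())
    return names


def lines_under(source, header):
    """Stripped lines of every block opened by a line equal to `header`."""
    result = []
    block_indent = None  # indentation of the header while inside its block
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is not None:
            if indent > block_indent:
                result.append(line.strip())
                continue
            block_indent = None  # dedent: the block has ended
        if line.strip() == header:
            block_indent = indent
    return result


module = """\
DEBUG = True

def load(path):
    if DEBUG:
        print("loading", path)
        check(path)
    return open(path)

class Cache:
    def get(self, key):
        return None

def check(path):
    pass
"""

functions = top_level_functions(module)
debug_lines = lines_under(module, "if DEBUG:")
```

The method Cache.get is skipped simply because its "def" is not at column 0; no parsing of the class body is needed.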
I originally posted the comparison article to a mailing list of my
own; would you mind if I forwarded this conversation about it to the
list too?
Subject: Re: python indent sensitive
Date: Wed, 19 Jun 2002 15:54:08 -0700
From: "Simon J. Gerraty" <[email protected]>
> > Further, while indentation alone may suffice for a screen's worth of
> > code, I find it inadequate for grossly large blocks of code.
>
> Meaning that you need both block structure and editor support for it?
Not sure I follow.
> > Note that I often have to debug programs written by others so the
> > fact that a huge function should be re-organized is orthogonal to
> > understanding what it is doing. Being able to ask a tool like an
> > editor "go show me where this block started" is very handy.
> > Matching braces etc makes this easy.
>
> Well, unfortunately forward-sexp and backward-sexp don't think Python
> blocks are sexps, but when I want to know where a block began, I
I guess that's my point. A little ginger bread like braces would
make python look a bit more like other langs that emacs groks well.
> I originally posted the comparison article to a mailing list of my
> own; would you mind if I forwarded this conversation about it to the
> list too?
Sure, no problem.
--sjg
--
<[email protected]> Kragen Sitaker <http://www.pobox.com/~kragen/>
A good conversation and even lengthy and heated conversations are probably
some of the most important pointful things I can think of. They are the
antithesis of pointlessness!
-- Matt O'Connor <[email protected]>