I’m a Unix guy, but the participants in my Python classes overwhelmingly use Windows. Inevitably, when we get to talking about working with files in Python, someone will want to open a file using the complete path to the file. And they’ll end up writing something like this:

filename = 'c:\abc\def\ghi.txt'

But when my students try to open the file, they discover that Python gives them an error, indicating that the file doesn’t exist!  In other words, they write:

for one_line in open(filename):
    print(one_line)

What’s the problem?  This seems like pretty standard Python, no?

Remember that strings in Python normally contain characters. Those characters are normally printable, but there are times when you want to include a character that isn’t really printable, such as a newline. In those cases, Python (like many programming languages) provides special codes, each starting with a backslash, that insert the special character.

The best-known example is newline, aka ‘\n’, or ASCII 10. If you want to insert a newline into your Python string, then you can do so with ‘\n’ in the middle. For example:

s = 'abc\ndef\nghi'

When we print the string, we’ll see:

>>> print(s)
abc
def
ghi

What if you want to print a literal ‘\n’ in your code? That is, you want a backslash, followed by an “n”? Then you’ll need to double the backslash: The “\\” in a string will result in a single backslash character. The following “n” will then be normal. For example:

s = 'abc\\ndef\\nghi'

When we print this string, we see the backslashes and the letters, not newlines:

>>> print(s)
abc\ndef\nghi

It’s pretty well known that you have to guard against this translation when you’re working with \n. But what other characters require it? It turns out, more than many people might expect:

  • \a: alarm bell (ASCII 7)
  • \b: backspace (ASCII 8)
  • \f: form feed
  • \n: newline
  • \r: carriage return
  • \t: tab
  • \v: vertical tab
  • \ooo: character with octal value ooo
  • \xhh: character with hex value hh
  • \N{name}: Unicode character {name}
  • \uxxxx: Unicode character with 16-bit hex value xxxx
  • \Uxxxxxxxx: Unicode character with 32-bit hex value xxxxxxxx
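To convince yourself that each of these sequences really collapses into a single character, here is a small sketch that checks lengths and code points. The names "escapes" and "same_letter" are mine, chosen only for illustration.

```python
# Each escape sequence below takes two or more characters to type,
# but produces only one character in the resulting string.
escapes = {
    'alarm bell': '\a',
    'backspace': '\b',
    'form feed': '\f',
    'newline': '\n',
    'carriage return': '\r',
    'tab': '\t',
    'vertical tab': '\v',
}

for name, char in escapes.items():
    # ord() returns the character's code point
    print(f'{name:16} length={len(char)} code={ord(char)}')

# Octal, hex, and Unicode escapes are just other spellings of a character.
# All five of these are the capital letter A.
same_letter = ['\101', '\x41', '\u0041', '\U00000041',
               '\N{LATIN CAPITAL LETTER A}']
print(same_letter)
```

Every length comes out as 1, which is exactly why a path containing one of these sequences ends up shorter, and different, from what you typed.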

In my experience, you’re extremely unlikely to use some of these on purpose. I mean, when was the last time you needed to use a form feed character? Or a vertical tab?  I know — it was roughly the same day that you drove your dinosaur to work, after digging a well in your backyard for drinking water.

But nearly every time I teach Python — which is, every day — someone in my class bumps up against one of these characters by mistake. That’s because the combination of the backslashes used by these characters and the backslashes used in Windows paths makes for inevitable, and frustrating, bugs.

Remember that path I mentioned at the top of the blog post, which seems so innocent?

filename = 'c:\abc\def\ghi.txt'

It contains a “\a” character. Which means that when we print it:

>>> print(filename)
c:bc\def\ghi.txt

See? The “\a” is gone, replaced by an alarm bell character. If you’re lucky, your terminal will beep, and you’ll at least get a hint that something is wrong.
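When I suspect that a path has been mangled this way, I reach for repr(), which spells out non-printable characters instead of sending them to the terminal. Here is a quick sketch. Note that I have doubled the other two backslashes, so that only the one escape is translated and the example runs without warnings about unrecognized escapes on recent Python versions.

```python
# '\a' is translated into the alarm bell character. The other two
# backslashes are doubled, so they stay literal backslashes.
filename = 'c:\abc\\def\\ghi.txt'

print(repr(filename))      # the bell shows up as \x07
print(len(filename))       # one character shorter than the path you meant
print('\x07' in filename)  # True: the bell character is in the string
```

The repr makes the problem visible at a glance: where you expected a backslash and an “a”, there is a single \x07.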

So, what can we do about this? Double the backslashes, of course. Strictly speaking, you only need to double those that would be turned into special characters, from the table I’ve reproduced above. But come on, are you really likely to remember that “\f” is special, but “\g” is not? Probably not.

So my general rule, and what I tell my students, is that they should always double the backslashes in their Windows paths. In other words:

>>> filename = 'c:\\abc\\def\\ghi.txt'
>>> print(filename)
c:\abc\def\ghi.txt

It works!

But wait: No one wants to really wade through their pathnames, doubling every backslash, do they?  Of course not.

That’s where Python’s raw strings can help. I think of raw strings in two different ways:

  • what-you-see-is-what-you-get strings
  • automatically doubled backslashes in strings

Either way, the effect is the same: All of the backslashes are doubled, so all of these pesky and weird special characters go away.  Which is great when you’re working with Windows paths.

All you need to do is put an “r” before the opening quotes (single or double):

>>> filename = r'c:\abc\def\ghi.txt'
>>> print(filename)
c:\abc\def\ghi.txt

Note that a “raw string” isn’t really a different type of string at all. It’s just another way of entering a string into Python. If you check, type(filename) will still be “str”, and when you display the string’s repr, you’ll see all of its backslashes doubled.
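A short sketch makes the point. The variable names "raw" and "doubled" are just mine, for illustration.

```python
# The same path, entered two different ways
raw = r'c:\abc\def\ghi.txt'
doubled = 'c:\\abc\\def\\ghi.txt'

print(type(raw))       # <class 'str'>
print(raw == doubled)  # True: both spellings produce the same string
print(raw)             # what the string contains
print(repr(raw))       # how Python displays it, with backslashes doubled
```

Once the string has been created, Python has no idea which of the two spellings you used.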

Bottom line: If you’re using Windows, then you should just write all of your hard-coded pathname strings as raw strings. Even if you’re a Python expert, I can tell you from experience that you’ll bump up against this problem sometimes. And even for the best of us, finding that stray “\f” in a string can be time consuming and frustrating.

PS: Yes, it’s true that Windows users can get around this by using forward slashes, like we Unix folks do. But my students find this to be particularly strange looking, and so I don’t see it as a general-purpose solution.
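For the curious, here is a rough sketch of the forward-slash approach. It uses PureWindowsPath from the standard library’s pathlib module, which this post doesn’t otherwise rely on. I picked it only because it applies Windows path rules on any operating system, so you can try this even if you’re not on Windows.

```python
from pathlib import PureWindowsPath

# The same path, written with backslashes (as a raw string)
# and with forward slashes
with_backslashes = PureWindowsPath(r'c:\abc\def\ghi.txt')
with_forward_slashes = PureWindowsPath('c:/abc/def/ghi.txt')

# Under Windows path rules, both spellings name the same file
print(with_backslashes == with_forward_slashes)

# Converting back to a string gives the backslash form
print(with_forward_slashes)

# The individual components of the path
print(with_forward_slashes.parts)
```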
