Ian K Rolfe (ianrolfe) wrote,

Python: Removing blank lines from a string.

I've been using the Django template system to generate XML and CSV files for a project I'm working on, and all is fine. One "cosmetic" issue is all the blank lines that get produced, so I thought I'd just strip them out.
My first thought was actually to use a list comprehension: split the string into a list of lines and reassemble it without the blank ones. That, in my opinion, is the most "Pythonic" way of doing it. If I were using a language like C or BASIC I'd just search and replace double '\n's until none remain, but in Python split() and join() seem to be the way most people would do it, if my (admittedly hurried) googling is anything to go by.
From that googling, I found that there were basically four methods recommended by the peanut gallery:
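import re

def method1(txt):
    # Plain loop: append every non-blank line plus a newline.
    # Note: the result always ends with "\n".
    ret = ""
    for l in txt.split("\n"):
        if l.strip() != '':
            ret += l + "\n"
    return ret

def method2(txt):
    # List comprehension: keep lines that are not empty or whitespace-only,
    # then join them back together (no trailing newline).
    return '\n'.join([x for x in txt.split("\n") if x.strip() != ''])

def method3(txt):
    # Search-and-replace: collapse "\n\n" to "\n" until none are left.
    # Note: lines containing only spaces or tabs are NOT removed.
    while '\n\n' in txt:
        txt = txt.replace('\n\n', '\n')
    return txt

def method4(txt):
    # Regular expression: a newline followed by any whitespace (including
    # further newlines) becomes a single newline. Because \s* also matches
    # spaces, this strips the leading indentation of every line as well.
    return re.sub(r"\n\s*\n*", "\n", txt)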
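The four methods are not quite interchangeable, though. Here is a minimal sketch that runs compact copies of them on a made-up sample string containing an empty line, an indented line and a spaces-only line. It also includes `method4_fixed`, a hypothetical variant of method 4 (the name is mine) whose pattern requires a second newline, so it only removes blank lines and leaves indentation alone:

```python
import re

# Compact copies of the four methods, so this sketch runs on its own.
def method1(txt):
    ret = ""
    for l in txt.split("\n"):
        if l.strip() != '':
            ret += l + "\n"
    return ret

def method2(txt):
    return '\n'.join([x for x in txt.split("\n") if x.strip() != ''])

def method3(txt):
    while '\n\n' in txt:
        txt = txt.replace('\n\n', '\n')
    return txt

def method4(txt):
    return re.sub(r"\n\s*\n*", "\n", txt)

# Hypothetical variant: match a newline, optional whitespace, then a second
# newline, so blank and whitespace-only lines go but the indentation at the
# start of the following line is kept.
def method4_fixed(txt):
    return re.sub(r"\n\s*\n", "\n", txt)

# Made-up sample: an empty line, an indented line, a spaces-only line.
sample = "first\n\n  indented\n   \nlast"

for f in (method1, method2, method3, method4, method4_fixed):
    print(f.__name__.ljust(14), repr(f(sample)))
```

On this sample, method 1 leaves a trailing newline and method 2 does not. Method 3 keeps the spaces-only line, and method 4 removes the indentation from "  indented". `method4_fixed` drops both blank lines and keeps the indentation.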

Of these methods, 2 and 4 are the easiest to include inline in your code. Method 1 may well just be a matter of putting "if l.strip() == '': continue" into your existing loop logic, but as a self-contained function, method 2 looks best to me. I then considered performance: surely all that search-and-replace would be faster than the list comprehension? Then again, Pythonistas often promote list comprehensions as more efficient than explicit loops, so maybe not. I therefore decided to use the timeit module to check this out:
text = """`Twas brillig, and the slithy toves
  Did gyre and gimble in the wabe:
All mimsy were the borogoves,
  And the mome raths outgrabe.


"Beware the Jabberwock, my son!
  The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
  The frumious Bandersnatch!"

He took his vorpal sword in hand:
  Long time the manxome foe he sought --
So rested he by the Tumtum tree,
  And stood awhile in thought.

And, as in uffish thought he stood,
  The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
  And burbled as it came!

One, two! One, two! And through and through
  The vorpal blade went snicker-snack!
He left it dead, and with its head
  He went galumphing back.

"And hast thou slain the Jabberwock?
  Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!"
  He chortled in his joy.


`Twas brillig, and the slithy toves
  Did gyre and gimble in the wabe;
All mimsy were the borogoves,
  And the mome raths outgrabe."""

if __name__ == '__main__':
    from timeit import Timer

    n = 10000

    print("For", n, "iterations,")
    for name in ("method1", "method2", "method3", "method4"):
        # Import the function and the test text into timeit's namespace.
        t = Timer(name + "(text)", "from __main__ import " + name + ", text")
        # Average time per call, converted from seconds to microseconds.
        print(name.capitalize(), "=", 1e6 * t.timeit(number=n) / n, "uSec/pass")

    """
    >>> 
    For 10000 iterations,
    Method1 = 19.5718990692 uSec/pass
    Method2 = 15.6529871731 uSec/pass
    Method3 = 8.10906783357 uSec/pass
    Method4 = 18.5611668997 uSec/pass
    >>>
    """ 
Wow! The old-fashioned replace-in-a-loop is roughly twice as fast as the other methods, including the new-fangled list comprehension and regular expression!
I guess that's not entirely unexpected, especially in such a restricted test, but it's interesting nonetheless. That said, the actual times (8-20 µs) illustrate just how fast modern PCs and languages are. As recently as the '80s, 10 µs was about the time it took a microprocessor to add two 16-bit numbers!
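Part of the reason the loop is quick is that each str.replace() pass collapses every non-overlapping "\n\n" pair at once. A run of k newlines therefore shrinks to ceil(k/2) per pass, and only about ceil(log2 k) passes are needed. Here is a small sketch to check that, using a hypothetical `method3_passes` helper that is simply method 3 with a pass counter added:

```python
import math

def method3_passes(txt):
    """Method 3, plus a count of the replace() passes it makes (hypothetical helper)."""
    passes = 0
    while '\n\n' in txt:
        # Non-overlapping replacement: every "\n\n" pair becomes one "\n",
        # so a run of k newlines shrinks to ceil(k / 2) newlines.
        txt = txt.replace('\n\n', '\n')
        passes += 1
    return txt, passes

for k in (2, 3, 8, 100):
    result, passes = method3_passes("a" + "\n" * k + "b")
    print(k, "newlines:", passes, "passes, predicted", math.ceil(math.log2(k)))
```

In the Jabberwocky test text the longest run is three newlines (two blank lines), so method 3 needs only two passes over the string.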
Tags: python