Python tips – Handle text file

There are two text files, each with 10 million lines and about 100 MB in size. We need to know how many lines the two files have in common, in other words, the number of lines that appear in both files. Each line is unique within its file, so neither file contains duplicate rows. A Python set does this very easily, and more efficiently than shell tools or awk.

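#!/usr/bin/python
# Each line of a file becomes one set element (trailing newline included).
a = set(open("data.uniq.1"))
b = set(open("data.uniq.2"))
# The intersection holds the lines present in both files; print its size.
print(len(a & b))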
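The two-set approach above keeps both files in memory at once. As a rough sketch of a lighter variant, the function below loads only the first file into a set and streams the second one. The name `count_common_lines` and its parameters are made up for illustration and are not from the original post. It relies on the stated assumption that no file contains duplicate lines, and it strips the line terminator so that a last line without a trailing newline still matches.

```python
def count_common_lines(path_a, path_b):
    """Count lines that appear in both files.

    Assumes each file has no duplicate lines, as in the post;
    with duplicates in path_b the result would overcount.
    """
    # Load only the first file into memory, without line terminators.
    with open(path_a) as fa:
        seen = {line.rstrip("\n") for line in fa}
    # Stream the second file and count the lines already seen.
    with open(path_b) as fb:
        return sum(1 for line in fb if line.rstrip("\n") in seen)


if __name__ == "__main__":
    import os
    # File names taken from the post; skip quietly if they are absent.
    if os.path.exists("data.uniq.1") and os.path.exists("data.uniq.2"):
        print(count_common_lines("data.uniq.1", "data.uniq.2"))
```

Whether this is faster than intersecting two sets depends on the data and the machine; the main expected benefit is holding one set in memory instead of two.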

I also found a blog post in Chinese that describes this tip.

One thought on “Python tips – Handle text file”

  1. I don’t believe that Python is faster than AWK, and even if it were, which I *know* it isn’t, I can always compile AWK code into a straight binary executable, so as long as Python doesn’t get a state-of-the-art compiler, it will *NEVER* be faster than AWK!
