Batteries included: Download, unzip and parse in 13 lines
The other day I needed to download some zip files, unpack them, parse the CSV files inside, and return the data as dicts. I did the very same thing a couple of years ago, and although the source is lost, I recall a Python (2.4?) script of about two screens just for the download, so roughly a hundred lines. Re-implementing the solution now that I know Python and the standard library better, I ended up with 12 lines written in just a few minutes. Edited for blogging clarity, it clocks in at 13 lines:
import csv, os, urllib.request, zipfile

def get_items(url):
    zip_path, headers = urllib.request.urlretrieve(url)  # saved to a temporary file
    with zipfile.ZipFile(zip_path) as zf:
        csvfiles = [name for name in zf.namelist()
                    if name.endswith('.csv')]
        for filename in csvfiles:
            with zf.open(filename) as source:  # yields bytes, so decode each line
                reader = csv.DictReader([line.decode('iso-8859-1')
                                         for line in source])
                for item in reader:
                    yield item
    os.unlink(zip_path)  # only reached once the generator is exhausted
As trivial as it is, I think it is a nice example of just how much you can do with very little (coding) effort.
Edit: I created a gist with a cleaned-up version using codecs.getreader. I’ll be leaving this version as it is, though.
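For the curious, here is a minimal sketch of what the codecs.getreader approach might look like in Python 3. It is not the gist itself: the name iter_csv_rows, the encoding default, and the in-memory sample archive are all made up for illustration. The idea is that codecs.getreader returns a stream reader class that decodes the zip member lazily, so there is no need to build a list of decoded lines first.

```python
import codecs
import csv
import io
import zipfile


def iter_csv_rows(zip_source, encoding="iso-8859-1"):
    """Yield one dict per row from every .csv member of a zip archive.

    zip_source may be a path or a binary file-like object (hypothetical helper).
    """
    decoder = codecs.getreader(encoding)  # StreamReader class for this encoding
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue
            with zf.open(name) as source:
                # The stream reader decodes bytes to text line by line.
                for row in csv.DictReader(decoder(source)):
                    yield row


# Build a small archive in memory so the sketch runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("people.csv", "name,city\nJosé,Málaga\n".encode("iso-8859-1"))
    zf.writestr("readme.txt", "not a csv")
buffer.seek(0)

rows = list(iter_csv_rows(buffer))
print(rows)
```

Because the function accepts a path as well as a file object, the temporary file returned by a download could be passed in directly; the cleanup of that file would then be the caller's job.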