Sunday, June 22, 2014

Python + MongoDB

The opportunity to brush up on Python and MongoDB presented itself, so I'm spending some time doing just that.  I played with Python a few years back, in my GE days, but haven't made my way back to it until now.  I love REPL-based languages because you can just fire one up and go!

So I have Mongo running on my machine, and I need to pump some data into it.  Since it wants JSON, I figured I'd just hit some web API out there, grab a bunch of JSON data, and then pump it into Mongo.  I already had an API key for imgur, because I was messing around with uploading camera images to it from an Android app.

So I poked around a little bit in the web API documentation, and pretty quickly I had a legit response.  It looks like the 'requests' library (http://docs.python-requests.org) is pretty popular.  So the first thing I had to learn to do was import a third-party library.  Looks like this is easier in the Unix world because they have command line retrieval built in.  Windows appears to have it too, but of course you have to download it.

Anyway, I learned the "manual" way.  Basically, fetch the zip file, extract it, cd into the folder and run the install command (typically python setup.py install), and it pulls the library into your site-packages folder and you're good to go.

I do have to say, 5 lines to make an authenticated API call is pretty terse.  And 2 of those are import statements.  I love terse.

 import json  
 import requests  
 url = 'https://api.imgur.com/3/gallery/g/memes/'  
 header = {'Authorization': 'Client-ID 31d84e7e126befd'}  #not my real api key!
 r = requests.get(url, headers=header)  

Here's an example of one item returned from the above call:

{u'account_id': 11545168,
 u'account_url': u'PCard',
 u'animated': False,
 u'bandwidth': 76530096,
 u'datetime': 1403397798,
 u'description': None,
 u'downs': 0,
 u'favorite': False,
 u'height': 820,
 u'id': u'1DZ31Oy',
 u'is_album': False,
 u'link': u'http://i.imgur.com/1DZ31Oy.png',
 u'nsfw': False,
 u'score': 2,
 u'section': None,
 u'size': 708612,
 u'subtype': u'Good Fireman Greg',
 u'title': u'Good Fireman Greg',
 u'type': u'image/png',
 u'ups': 2,
 u'views': 108,
 u'vote': None,
 u'width': 610}
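
Before handing the response to Mongo, it can help to check that the call actually worked and that the payload has the shape we expect.  Here's a minimal sketch of that idea.  The function name extract_items is made up for illustration, and the 'success' and 'status' keys are an assumption about the response envelope (only 'data' appears in the calls in this post), so adjust to whatever your response really contains.

```python
def extract_items(payload):
    """Return the list of gallery items from a decoded API response.

    Assumes an envelope like {"data": [...], "success": True, "status": 200}.
    The 'success' and 'status' keys are an assumption, not taken from the post.
    """
    if not payload.get("success", False):
        raise ValueError("API call failed with status %s" % payload.get("status"))
    data = payload.get("data")
    if not isinstance(data, list):
        raise ValueError("expected 'data' to be a list of items")
    return data
```

With the request from earlier, this would be used as items = extract_items(r.json()), since r itself is a Response object and r.json() gives back the decoded dictionary.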


I am working with memes here, because I think they are hilarious.  Good Guy Greg, Bad Luck Brian, etc.

Here's how we add all of the items to the Mongo database.  Note that we can pass the list, and it will be treated as a bulk insert.

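 from pymongo import MongoClient
 client = MongoClient()  # defaults to localhost:27017
 db = client.imgur
 # r is a Response object, so decode the JSON body before grabbing the list
 # (on PyMongo 3+, use insert_many() in place of insert())
 db.memes.insert(r.json()['data'])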
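
On newer versions of PyMongo (3.0 and up) the bulk insert is spelled insert_many.  Here's a rough sketch of a small wrapper around it, assuming a collection object like db.memes and a list of dicts like the one pulled from the response.  The name store_items is hypothetical, just something to hang the example on.

```python
def store_items(collection, items):
    """Bulk-insert a list of dicts and return how many were written.

    'collection' is assumed to be a PyMongo 3+ collection (e.g. db.memes).
    """
    # Copy each dict: PyMongo adds an '_id' key to the documents it inserts,
    # and we may not want the caller's data modified.
    docs = [dict(item) for item in items]
    if not docs:
        return 0  # insert_many raises an error when given an empty list
    result = collection.insert_many(docs)
    return len(result.inserted_ids)
```

Usage would look like store_items(db.memes, r.json()['data']).  Taking the collection as a parameter also makes it easy to point the same code at a different database or collection.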

Now that we've got some data in our DB, we can query.  Let's say that I wanted to get all of the "Scumbag Steve" memes:


 from pprint import pprint
 c = db.memes.find({"subtype": "Scumbag Steve"})  # c is a db cursor
 # print them to screen
 for i in c:
   pprint(i)
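
Queries can go a bit further than an exact match.  As a sketch, here's one way to ask for the most upvoted memes of a given subtype, using the $gte operator, a projection to trim the returned fields, and a sort on the cursor.  The helper names build_meme_filter and top_memes are made up for this example; the field names come from the sample item above.

```python
def build_meme_filter(subtype, min_ups=0):
    """Build a MongoDB filter for one meme subtype, optionally with a vote floor."""
    query = {"subtype": subtype}
    if min_ups > 0:
        query["ups"] = {"$gte": min_ups}  # only items with at least min_ups upvotes
    return query


def top_memes(collection, subtype, min_ups=0, limit=5):
    """Return up to 'limit' matching items, most upvoted first."""
    # Keep just a few fields and drop Mongo's own _id from the results.
    projection = {"_id": 0, "title": 1, "link": 1, "ups": 1}
    cursor = collection.find(build_meme_filter(subtype, min_ups), projection)
    return list(cursor.sort("ups", -1).limit(limit))  # -1 means descending
```

For example, top_memes(db.memes, "Scumbag Steve", min_ups=10) would return a plain list of dicts that can be pretty-printed the same way as the cursor results.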

There you have it!  Using Python to hit the Imgur web API, creating and populating a MongoDB database with that data, and querying it back out of MongoDB.