Last week Kellan from Flickr published my interview on code.flickr. I'm still somewhat amazed that they chose me to ask, but then I'm also pleased at how much people are liking snaptrip, and I'm happy to see my words in print, as it were.
I actually compiled my answers a couple of weeks before it was posted, hence the reference to groupr as a "lost project". Now, of course, it's back, but I've already posted a couple of times about that. What I would like to do is - finally, and belatedly - document (and update the released version of) my EXIF machine tagger.
Why bother with such a thing? Flickr will extract EXIF metadata, but it won't allow you to do any aggregate queries on it. (Well, that's not quite true; at dConstruct 2007 Tom Coates leaked some URLs which I picked over, but they don't cover all the useful things I'd like. Plus, it's not documented.) By extracting all the data from my photos into machine tags (and a local SQLite database), it becomes possible to point people at all the photos taken at the wide end of my widest lens, or those taken with a particular make of camera (and to do more complex queries locally).
With that out of the way, how do you go about such a thing? Well, as usual, it's actually a fairly simple joining operation. Get a list of photos, and for each of them, get the EXIF data (using flickr.photos.getExif), then store the data locally, and add tags back to Flickr. There's not much munging invovled - I convert spaces in the EXIF field names to underscores, and some things get put in the "file:" or "camera:" namespace, rather than "exif:" - so it's all pretty straightforward. (I do preserve spaces in the EXIF values, though, by quoting my arguments to the addTags method.)
I also add an meta:exif field with either "none" or the epoch seconds of the time of tagging, so that it's easy to exclude previously-tagged images from being examined again. Another minor niggle is that, to add tags, a script has to be authorised. I copied the code chunk from the flickr_upload script in a Perl module, and it seems to work for me.
However, the fact that users need to get an API key, secret, and then a token, is naturally going to limit the audience for such a script. A few other users have metadata in the "exif:" namespace, but it's not exactly common. It's hard to turn the script into a web app, too, since it needs about a second per image to run, and the first run has to examine your entire library, which these days is typically thousands of images. I may still do it, but I haven't bothered for months, so I wouldn't count on it.
Another drawback is that machine tags are normalised at Flickr. This means that when I query on exposure bias, both -1/3EV and +1/3EV show as just "exif:exposure_bias=13ev". I've been thinking about ways around this - by querying raw tags - but it's not straightforward. (Ways around this normalising, and ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for "things you'd like to see in Flickr" if I'd felt able to get away with being so demanding.)
One final observation is that the script's in Perl, and uses XML (which is, apparently, sometimes compressed at Flickr's end; at least, I had to add Compress::Zlib at one point for some reason). If I was to redo it, either in Python or Ruby, the data would all be fetched as JSON, and it'd probably get a few more users. Ah well. Installing the prereqs shouldn't be too hard.
That said, of course the script, as is, proved useful. I run it manually after an upload, while Tom, who is (as ever) a bit more sensible, has his fork running as a cron job. Either way, please download it, play, and feel free to let me know what you think.