Last week Kellan from Flickr published my interview on code.flickr. I'm still somewhat amazed that they chose to ask me, but then I'm also pleased at how much people are liking snaptrip, and I'm happy to see my words in print, as it were.
I actually compiled my answers a couple of weeks before it was posted, hence the reference to groupr as a "lost project". Now, of course, it's back, but I've already posted a couple of times about that. What I would like to do is - finally, and belatedly - document (and update the released version of) my EXIF machine tagger.
Why bother with such a thing? Flickr will extract EXIF metadata, but it won't allow you to do any aggregate queries on it. (Well, that's not quite true; at dConstruct 2007 Tom Coates leaked some URLs which I picked over, but they don't cover all the useful things I'd like. Plus, it's not documented.) By extracting all the data from my photos into machine tags (and a local SQLite database), it becomes possible to point people at all the photos taken at the wide end of my widest lens, or those taken with a particular make of camera (and to do more complex queries locally).
With that out of the way, how do you go about such a thing? Well, as usual, it's actually a fairly simple joining operation. Get a list of photos, and for each of them, get the EXIF data (using flickr.photos.getExif), then store the data locally, and add tags back to Flickr. There's not much munging involved - I convert spaces in the EXIF field names to underscores, and some things get put in the "file:" or "camera:" namespace, rather than "exif:" - so it's all pretty straightforward. (I do preserve spaces in the EXIF values, though, by quoting my arguments to the addTags method.)
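As a rough illustration of that munging, here is a minimal Python sketch (not the Perl script's actual code). The NAMESPACE_OVERRIDES table and both function names are hypothetical; which EXIF fields really end up under "file:" or "camera:" isn't listed above, so the entries are placeholders. It only builds the string that would be passed as the tags argument to flickr.photos.addTags, and makes no network calls.

```python
# Hypothetical mapping: the post does not say which EXIF labels go to
# "file:" or "camera:", so these three entries are illustrative guesses.
NAMESPACE_OVERRIDES = {
    "Make": "camera",
    "Model": "camera",
    "Compression": "file",
}


def machine_tag(label, value):
    """Build one quoted machine tag, e.g. "exif:Focal_Length=10 mm"."""
    namespace = NAMESPACE_OVERRIDES.get(label, "exif")
    # Spaces in the field name become underscores.
    predicate = label.strip().replace(" ", "_")
    # Precaution of this sketch: drop embedded double quotes so they
    # cannot break the quoting below.
    clean_value = str(value).replace('"', "")
    # Quoting the whole tag preserves spaces in the value.
    return f'"{namespace}:{predicate}={clean_value}"'


def tags_argument(exif_fields):
    """Join (label, value) pairs into one space-separated tag string."""
    return " ".join(machine_tag(label, value) for label, value in exif_fields)


if __name__ == "__main__":
    fields = [("Make", "Canon"), ("Focal Length", "10 mm")]
    print(tags_argument(fields))
```

The quoting is the only subtle part: without it, a value such as "10 mm" would be split into two separate tags.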
I also add a meta:exif field with either "none" or the epoch seconds of the time of tagging, so that it's easy to exclude previously-tagged images from being examined again. Another minor niggle is that, to add tags, a script has to be authorised. I copied the code chunk from the flickr_upload script in a Perl module, and it seems to work for me.
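The marker tag idea can be sketched like this in Python. Both function names are hypothetical, and the sketch assumes each photo's existing tags are already available as a list of strings; how the real script looks them up isn't described here.

```python
import time


def marker_tag(exif_count, now=None):
    """Return meta:exif=none if no EXIF was found, else meta:exif=<epoch seconds>."""
    if exif_count == 0:
        return "meta:exif=none"
    stamp = int(time.time() if now is None else now)
    return f"meta:exif={stamp}"


def needs_processing(tags):
    """A photo still needs examining if it carries no meta:exif marker at all."""
    return not any(tag.lower().startswith("meta:exif=") for tag in tags)


if __name__ == "__main__":
    print(marker_tag(0))
    print(needs_processing(["holiday", "meta:exif=none"]))
```

Tagging photos that have no EXIF data with "none" matters as much as the timestamp: otherwise they would be fetched and re-examined on every run.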
However, the fact that users need to get an API key, secret, and then a token, is naturally going to limit the audience for such a script. A few other users have metadata in the "exif:" namespace, but it's not exactly common. It's hard to turn the script into a web app, too, since it needs about a second per image to run, and the first run has to examine your entire library, which these days is typically thousands of images. I may still do it, but I haven't bothered for months, so I wouldn't count on it.
Another drawback is that machine tags are normalised at Flickr. This means that when I query on exposure bias, both -1/3EV and +1/3EV show as just "exif:exposure_bias=13ev". I've been thinking about ways around this - by querying raw tags - but it's not straightforward. (Ways around this normalising, and ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for "things you'd like to see in Flickr" if I'd felt able to get away with being so demanding.)
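To make the collision concrete, here is a small Python sketch that approximates the normalisation. The rule used (lowercase, then drop everything except letters and digits from the value) is an assumption inferred from the example above, not Flickr's documented algorithm, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalise_machine_tag(raw_tag):
    """Approximate Flickr's cleaning of a raw machine tag (assumed rule)."""
    name, _, value = raw_tag.partition("=")
    # Assumption: the value is lowercased and stripped of punctuation.
    clean_value = re.sub(r"[^a-z0-9]", "", value.lower())
    return f"{name.lower()}={clean_value}"


def group_by_clean_tag(raw_tags):
    """Map each normalised tag to the distinct raw tags that collapse into it."""
    groups = defaultdict(set)
    for raw in raw_tags:
        groups[normalise_machine_tag(raw)].add(raw)
    return dict(groups)


if __name__ == "__main__":
    raw = ["exif:exposure_bias=-1/3EV", "exif:exposure_bias=+1/3EV"]
    for clean, originals in group_by_clean_tag(raw).items():
        print(clean, sorted(originals))
```

Grouping raw tags by their normalised form like this, against a local copy of the data, is one way to tell the two exposure biases apart even though a query on the cleaned tag cannot.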
One final observation is that the script's in Perl, and uses XML (which is, apparently, sometimes compressed at Flickr's end; at least, I had to add Compress::Zlib at one point for some reason). If I was to redo it, either in Python or Ruby, the data would all be fetched as JSON, and it'd probably get a few more users. Ah well. Installing the prereqs shouldn't be too hard.
That said, of course the script, as is, has proved useful. I run it manually after an upload, while Tom, who is (as ever) a bit more sensible, has his fork running as a cron job. Either way, please download it, play, and feel free to let me know what you think.
Good job Paul, that's quite similar to what I had in mind for my Flickr EXIF Explorer - shame that I never had enough time to get a proper public version out. Oh and thanks for the mention in the Flickr Dev blog :)
Posted by: dartar | 11/12/2008 at 10:42 PM
Hey blech,
I found your script and got started using it.
Posted by: dmourati | 02/17/2009 at 12:09 AM
$ ./flickr_exif_machinetag.pl
Initialised
Got page 1 (of 7) of photos
Got page 2 (of 7) of photos
Got page 3 (of 7) of photos
Got page 4 (of 7) of photos
Got page 5 (of 7) of photos
Got page 6 (of 7) of photos
Got page 7 (of 7) of photos
I have 3106 to process
Can't locate object method "size" via package "f/4.5" (perhaps you forgot to load "f/4.5"?) at ./flickr_exif_machinetag.pl line 115.
I'm guessing f/4.5 is the aperture?
Any tips?
Thanks.
-D
Posted by: dmourati | 02/17/2009 at 12:10 AM
Hi. I think some patches I received work on certain cameras but not others. If you're comfortable editing the file, then comment out the four lines from 113 to 116:
# if ($data->size() > 1) {
#     $data = $data->shift();
# }
# $data = $data->to_literal();
Probably this really needs to be made more robust, by checking to see if size() is callable on $data in the first place, then doing so if necessary, but for now, that should fix it.
Posted by: Paul Mison | 02/17/2009 at 08:56 AM
[this is good]
Posted by: blackmanos | 04/07/2009 at 04:54 AM
Nice.. I was searching for flickr api examples using perl so I could write a script to ... do exactly this. Bonus.
I've run into a slight snag, though. My script is dying midway through with this error:
I have 2579 to process
:5: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xB0 0x40 0x3C 0x2F
I tried adding this after it pulls the content:
Posted by: Chris Wage | 07/23/2009 at 08:27 PM
Er, your script, that is..
Posted by: Chris Wage | 07/23/2009 at 08:27 PM
Uh, okay, I guess there's a length-limit on comments:
I tried adding this after it pulls the content:
$content = encode('utf-8', $content);
but that yields:
Got 1 exif tags for 1210014874.
Error calling flickr.photos.addTags: Invalid signature
I assume changing the encoding mucks up the signature somehow.. ideas?
Posted by: Chris Wage | 07/23/2009 at 08:29 PM
Hi. Sorry the reply's been a bit slow in coming.
I've never had XML issues with my photos, so I'm a bit at a loss as to what your exact issue is. I am aware that sometimes Flickr ends up storing invalid Unicode sequences, and that you can't arbitrarily mess with the content that you're going to send back, because (as you've found) the signature needs to be correct.
If I was to redo this now I'd use JSON rather than XML (I have a patch against Flickr::API that enables this), but in the meantime I'd suggest catching the error and skipping that photo.
Good luck.
Posted by: Paul Mison | 07/26/2009 at 01:03 PM
[this is good]
Posted by: Chris Devers | 11/11/2009 at 06:05 AM