Dataset collection with Flickr
So I've basically done the following over the past few days (since Wednesday):
- Enumerated all the Flickr API functions in a table. The webpages Flickr provides were not useful for comparing functions against each other
- Trimmed the API down to just the functions I need and can use (most require user authentication and so were not usable)
- Analyzed the responses from those functions
- Designed a schema for storing the data
- Designed the program flow that fetches the data
The next step is to actually write the program. I've decided to script it in Ruby and store the data as flat files, i.e. one file per table. The files will probably be CSV or TSV, with the first line listing the attribute names. Since field values can be data structures (Hashes, Arrays, etc.) rather than plain strings, I will encode every field value as JSON. The photo files will be stored in one folder per photoset: the folder name will be the photoset_id and each filename will be the photo_id.
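To make the storage plan concrete, here is a rough sketch of what the flat-file part might look like. Nothing here is final: the helper names (write_table, read_table, photo_path), the choice of TSV over CSV, and the ".jpg" extension are all placeholders I made up for illustration, and it assumes a Ruby recent enough (2.4+) that the json library accepts plain strings and nil as top-level values.

```ruby
require "csv"
require "json"

# Write one table to a TSV file. The first line lists the attribute
# names; every field value after that is JSON-encoded, so Hashes,
# Arrays, strings and nil all survive the round trip.
# (write_table is a hypothetical helper name.)
def write_table(path, attributes, records)
  CSV.open(path, "w", col_sep: "\t") do |tsv|
    tsv << attributes
    records.each do |record|
      tsv << attributes.map { |name| JSON.generate(record[name]) }
    end
  end
end

# Read a table back into an array of Hashes keyed by attribute name,
# decoding each JSON-encoded field value.
def read_table(path)
  rows = CSV.read(path, col_sep: "\t")
  attributes = rows.shift
  rows.map do |row|
    attributes.zip(row.map { |cell| JSON.parse(cell) }).to_h
  end
end

# Build the location of a photo file: one folder per photoset_id,
# with the photo_id as the filename. The extension is an assumption.
def photo_path(root, photoset_id, photo_id, ext = ".jpg")
  File.join(root, photoset_id.to_s, "#{photo_id}#{ext}")
end
```

Calling write_table("photos.tsv", %w[id tags], rows) and then read_table("photos.tsv") should give back the same rows. JSON escapes tabs and newlines inside strings, so a field value can never break the one-record-per-line layout, and the CSV library takes care of quoting the double quotes that JSON introduces.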