By photoset vs. collection
Flixport can scan photos in two ways: by photoset or by collection. By photoset is the default; to switch, pass
--by_collection. By collection means Flixport traverses all collections first, then iterates over the photosets in each leaf collection. By photoset simply means it iterates over all photosets. Why does this matter to the user?
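The two traversal orders can be sketched as follows. This is only an illustration in Python, not Flixport's code: the Photoset and Collection classes and both walk functions are hypothetical names, and the sketch assumes that only leaf collections hold photosets.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model, invented for this sketch.
@dataclass
class Photoset:
    title: str
    photos: list

@dataclass
class Collection:
    title: str
    children: list = field(default_factory=list)   # nested Collection objects
    photosets: list = field(default_factory=list)  # assumed to be set on leaf collections only

def walk_by_photoset(photosets):
    """Yield (photoset title, photo) for every photoset, ignoring collections."""
    for photoset in photosets:
        for photo in photoset.photos:
            yield photoset.title, photo

def walk_by_collection(collection, path=()):
    """Yield (collection path, photoset title, photo), descending to leaf collections."""
    path = path + (collection.title,)
    if collection.children:
        for child in collection.children:
            yield from walk_by_collection(child, path)
    else:
        for photoset in collection.photosets:
            for photo in photoset.photos:
                yield path, photoset.title, photo

album = Photoset("my_album", ["myphoto1.jpg"])
root = Collection("my_collection", photosets=[album])

by_set = list(walk_by_photoset([album]))
by_col = list(walk_by_collection(root))
```

The by-collection walk carries the collection path along with each photo, which is the extra piece of information that can end up in the destination.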
The way Flixport figures out the exact destination of a particular photo is based on 3 options:
The destination of each photo is
dest_file_name supports $ syntax. In most cases the default value of
dest_file_name works well.
- The default value of
- When photos are exported by photoset, the default value of
- When photos are exported by collection, the default value of dest_dir is
With the default settings above, a command line like the one below
java -jar flixport-0.0.1.jar -d s3:mybucket/flickr
would copy the photo myphoto1.jpg in album my_album to the S3 bucket mybucket with key flickr/my_album/myphoto1.jpg when exporting by photoset.
If my_album is in collection my_collection and command line runs by collection, the same photo would be
More about the expression
In these expressions, $f is a file, $s is a photoset and $c is a collection.
$s is available for dest_dir when photos are exported by photoset.
$c is available for dest_dir when photos are exported by collection.
$f and whatever is available for dest_dir are available for
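As a rough illustration of how such expressions could turn into an object key, the sketch below uses Python's string.Template, which also uses $ for substitution. It is not Flixport's implementation. The resolve_key helper is hypothetical, the templates passed to it are made up for the example and are not Flixport's defaults, and the variables are treated as plain names, whereas the real expression syntax may be richer. It also assumes that $s stays in scope when exporting by collection.

```python
from string import Template

def resolve_key(dest_prefix, dest_dir, dest_file_name, variables):
    """Expand both templates and join them under the destination prefix."""
    # substitute() raises KeyError when a template names a variable that is
    # not in scope for the current mode (for example $c when exporting by photoset).
    directory = Template(dest_dir).substitute(variables)
    file_name = Template(dest_file_name).substitute(variables)
    return "/".join(part for part in (dest_prefix, directory, file_name) if part)

# Variables in scope for each mode (assumption: $s is also known by collection).
by_photoset_vars = {"s": "my_album", "f": "myphoto1.jpg"}
by_collection_vars = {"c": "my_collection", "s": "my_album", "f": "myphoto1.jpg"}

# Illustrative templates only, not Flixport's defaults.
key_by_photoset = resolve_key("flickr", "$s", "$f", by_photoset_vars)
key_by_collection = resolve_key("flickr", "$c/$s", "$f", by_collection_vars)

print(key_by_photoset)
print(key_by_collection)
```

With these made-up templates, the by-photoset key matches the flickr/my_album/myphoto1.jpg example above, while a template that mentions $c fails when only the by-photoset variables are supplied.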
Max number of files
The number of files copied can be capped with
--max_files, so that users can get an idea of what will happen in the full run. For example:
java -jar flixport-0.0.1.jar -d s3:mybucket/flickr -m 20
Dry run mode
Another way to preview the execution is to dry run the command line tool without actually copying any files. With the
--dry_run option, instead of copying each file, the tool simply logs a message saying it would copy the file from one location to another. In dry run mode, it becomes particularly important to keep the log files. For example:
java -jar flixport-0.0.1.jar -d s3:mybucket/flickr -r 2>&1 | tee /tmp/flixport.log
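The combined effect of a file cap and a dry run can be pictured with a small sketch. It is an assumption-laden illustration rather than Flixport's code: the export function, its parameters and the log message format are all invented here, and it assumes that files logged in a dry run count toward the cap.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("export_sketch")

def export(photos, copy, dry_run=False, max_files=None):
    """Process (source, destination) pairs and return how many were handled."""
    handled = 0
    for source, destination in photos:
        if max_files is not None and handled >= max_files:
            break  # stop early, as a --max_files style cap would
        if dry_run:
            # Made-up message format; the tool's real log line may differ.
            log.info("would copy %s to %s", source, destination)
        else:
            copy(source, destination)
        handled += 1
    return handled

copied_pairs = []
pairs = [(f"photo{i}.jpg", f"flickr/my_album/photo{i}.jpg") for i in range(5)]

# Dry run capped at 3 files: three log lines, nothing copied.
dry_count = export(pairs, lambda s, d: copied_pairs.append((s, d)),
                   dry_run=True, max_files=3)
```

Because nothing is written in a dry run, the log is the only record of what would have happened, which is why the command above pipes it through tee.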
By default Flixport runs in a single thread, copying one file after another. This is very inefficient if you have a large number of files to copy: the tool makes many network calls, so the thread is mostly idle while waiting for them to return. For the final run you will almost always want to set the number of threads with the -t or --threads option to keep the run time reasonable. For example:
java -jar flixport-0.0.1.jar -d s3:mybucket/flickr -t 20
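Why extra threads help with work that mostly waits on the network can be shown with a generic sketch. It does not reflect how Flixport schedules its copies: copy_file is a hypothetical stand-in whose sleep imitates network latency, and the pool comes from Python's standard concurrent.futures module.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def copy_file(name):
    """Stand-in for one copy: the sleep imitates waiting on the network."""
    time.sleep(0.05)
    return name

def copy_all(names, threads):
    """Copy every file using a pool of worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(copy_file, names))

files = [f"photo{i}.jpg" for i in range(40)]

start = time.perf_counter()
copied = copy_all(files, threads=20)
elapsed = time.perf_counter() - start
print(f"copied {len(copied)} files in {elapsed:.2f}s with 20 threads")
```

In this sketch a single thread would spend about 2 seconds just waiting (40 files at 0.05 s each), whereas 20 workers overlap most of that waiting.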