s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Based on Alex Kudlick's JavaScript implementation aws-s3-sync-by-hash.

The normal aws s3 sync ... command uses only the timestamp of each file to decide which files need to be copied. This utility instead compares the MD5 hash of the file contents.
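
Purely as an illustration, and not how the tool itself works internally, the same check can be made by hand for a single object with the AWS CLI: for objects uploaded in a single part, the S3 ETag is the MD5 of the object body, so it can be compared with a locally computed digest. The bucket name, key and file path below are placeholders.

  # placeholders: path/to/file, my-bucket, prefix/file
  # md5sum is from GNU coreutils (use `md5 -q` on macOS)
  local_md5=$(md5sum path/to/file | cut -d' ' -f1)
  remote_etag=$(aws s3api head-object --bucket my-bucket --key prefix/file \
      --query ETag --output text | tr -d '"')
  if [ "$local_md5" = "$remote_etag" ]; then
    echo "unchanged - skip"
  else
    echo "changed or missing - copy"
  fi

Note that objects uploaded in multiple parts have a different ETag format, so this simple comparison does not apply to them.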

Usage

s3thorp
Usage: S3Thorp [options]

  -s, --source <value>  Source directory to sync to S3
  -b, --bucket <value>  S3 bucket name
  -p, --prefix <value>  Prefix within the S3 Bucket
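
For example, assuming an s3thorp launcher is on the PATH (such as the native image built in the next section), and with placeholder values for the directory, bucket and prefix:

  s3thorp --source /path/to/local/dir --bucket my-bucket --prefix backups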

Creating Native Images

  • Download and install GraalVM

  • Install native-image using the GraalVM Updater (gu)

      gu install native-image
    
  • Create native image

      native-image -cp `sbt 'export runtime:fullClasspath'|tail -n 1` \
                   -H:Name=s3thorp \
                   -H:Class=net.kemitix.s3thorp.Main \
                   --allow-incomplete-classpath \
                   --force-fallback
    
  • Because --force-fallback is used, the resulting binary is a fallback image and still requires a JDK to run

TO DO

  • Improve test coverage
  • Create OS-native binaries
  • Replace println with real logging
  • Add support for logging options
  • Add support for exclusion filters
  • Bulk fetching of Hash values from S3
  • Possibly: when lastModified matches the local file, skip calculating the local MD5
  • Add support for multi-part uploads for large files
  • Add support for upload progress - may only be available with multi-part uploads