Use multi-part upload for large files (i.e. files > 5MB) (#22)
* [ThorpS3Client] Extract QuoteStripper and S3ClientObjectLister

* [ThorpS3Client] Extract S3ClientUploader

* [ThorpS3Client] Extract S3ClientCopier

* [ThorpS3Client] Extract S3ClientDeleter

* [ThorpS3Client] Can select upload strategy based on file size

Currently this switches to an alternative that is a clone of the
original method.

* [MD5HashGenerator] Add md5FilePart

Reimplement md5File using md5FilePart (sketched after this change list)

* [MyS3CatsIOClient] extracted

* [S3ClientMultiPartUploader] add tests for accept def

* [S3ClientMultiPartUploader] initiate multi-part upload

* [Md5HashGenerator] add tests reading part of a file = failing test

* [Md5HashGenerator] fix when reading part of a file

* [S3ClientMultiPartUploader] create UploadPartRequests

* [S3ClientMultiPartUploader] uploadPart delegates to an S3Client

* [S3ClientMultiPartUploader] uploadParts uploads each part

* [S3ClientMultiPartUploader] complete upload should completeUpload

* [S3ClientMultiPartUploader] upload file tests when all okay

* [S3ClientMultiPartUploader] Use Recording client in component tests

* [S3ClientMultiPartUploader] remove unused variable

* [S3ClientMultiPartUploader] failing test for init upload error

* [S3ClientMultiPartUploader] Handle errors during multi-part upload

* [S3ClientMultiPartUploader] Retry uploads

* [S3Action] ErroredS3Action now holds the error

* [S3ClientMultiPartUploader] Add logging

* [S3ClientMultiPartUploader] Display warning messages

* [S3ClientMultiPartUploader] test creation of CreateMultipartUploadRequest

* [S3ClientMultiPartUploader] specify bucket in UploadPartRequest

* [S3ClientMultiPartUploader] verify complete request has upload id

* [S3ClientMultiPartUploader] verify abort request contains upload id

* [S3ClientMultiPartUploader] add logging around retry errors

* [S3ClientMultiPartUploader] verify upload part request had remote key

* [S3ClientMultiPartUploaderLogging] refactoring/rewriting strings

* [S3ClientMultiPartUploader] add bucket to abort request

* [S3ClientMultiPartUploader] part numbers must start at 1

* [S3ClientMultiPartUploader] fix capitalisation in comment

* [Config] define maxRetries

* [S3ClientMultiPartUploader] abort request should have the remote key

* [S3ClientMultiPartUploader] display remote key properly

* [S3ClientMultiPartUploader] rename method for plural parts

* [S3ClientMultiPartUploader] log hash and part number

* [MD5HashGenerator] support creating hash from a byte array

* [sbt] add aws-java-sdk-s3 (v1) for multi-part uploads

The reactive-aws-s3-* library is based on v2 of the AWS Java SDK,
which doesn't support multi-part uploads; hence the v1 client
sketched after this change list.

* [S3ClientMultiPartUploader] use Amazon S3 Client (from v1 sdk)

* [S3ClientMultiPartUploader] include file and offset in upload part request

* [S3ClientMultiPartUploader] Add part etags to complete request

* [S3ClientMultiPartUploader] Use withers to create requests

* [S3ClientMultiPartUploader] don't bounce responses to tags when the client accepts them as is

* [MD5HashGenerator] use MD5Hash

* [S3ClientMultiPartUploader] include hash in sending log message

* [S3ClientMultiPartUploader] tests throw correct exception

* [S3ClientMultiPartUploader] Include returned hash in error and log when send is finished

* [S3ClientUploader] Extract as trait, renaming implementations

* [S3Client] upload def now requires tryCount

* [S3ClientUploader] add accepts to trait

* [S3ClientMultiPartUploaderSuite] remove ambiguity over class import

* [S3ClientMultiPartTransferManager] implement and use
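
The part-hashing changes above can be sketched with java.security.MessageDigest. The names md5FilePart and md5File come from the commit messages; the signatures, buffer size and hex encoding here are assumptions, not the actual s3thorp code:

  import java.io.{File, RandomAccessFile}
  import java.security.MessageDigest

  object Md5Sketch {

    // Assumed sketch: MD5 the `size` bytes of `file` starting at `offset`.
    def md5FilePart(file: File, offset: Long, size: Long): String = {
      val md  = MessageDigest.getInstance("MD5")
      val raf = new RandomAccessFile(file, "r")
      try {
        raf.seek(offset)
        val buffer    = new Array[Byte](8192)
        var remaining = size
        while (remaining > 0) {
          val read = raf.read(buffer, 0, math.min(buffer.length.toLong, remaining).toInt)
          if (read < 0) remaining = 0 // hit end of file before `size` bytes
          else {
            md.update(buffer, 0, read)
            remaining -= read
          }
        }
        md.digest().map("%02x".format(_)).mkString // hex-encode the digest
      } finally raf.close()
    }

    // md5File is then just a single "part" spanning the whole file.
    def md5File(file: File): String = md5FilePart(file, 0, file.length)
  }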
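
The overall multi-part flow these commits describe (initiate, upload parts numbered from 1, complete with the collected part etags, abort on error) might look roughly like this with the v1 SDK classes named above; the structure is a sketch, not the actual uploader:

  import java.io.File
  import scala.jdk.CollectionConverters._
  import com.amazonaws.services.s3.AmazonS3
  import com.amazonaws.services.s3.model.{
    AbortMultipartUploadRequest,
    CompleteMultipartUploadRequest,
    InitiateMultipartUploadRequest,
    UploadPartRequest
  }

  object MultiPartSketch {

    val partSize: Long = 5 * 1024 * 1024 // S3's minimum part size is 5MB

    def upload(s3: AmazonS3, bucket: String, key: String, file: File): Unit = {
      // 1. initiate the upload and keep hold of the upload id
      val uploadId = s3
        .initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key))
        .getUploadId

      try {
        // 2. upload each part; part numbers must start at 1
        val etags = (0L until file.length by partSize).zipWithIndex.map {
          case (offset, index) =>
            val request = new UploadPartRequest()
              .withBucketName(bucket)
              .withKey(key)
              .withUploadId(uploadId)
              .withPartNumber(index + 1)
              .withFile(file)
              .withFileOffset(offset)
              .withPartSize(math.min(partSize, file.length - offset))
            s3.uploadPart(request).getPartETag
        }

        // 3. complete the upload with the collected part etags
        s3.completeMultipartUpload(
          new CompleteMultipartUploadRequest(bucket, key, uploadId, etags.asJava))
        ()
      } catch {
        case e: Throwable =>
          // abort so S3 doesn't keep the orphaned parts around
          s3.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId))
          throw e
      }
    }
  }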

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The normal aws s3 sync ... command only uses the timestamp of files to decide which files need to be copied. This utility looks at the MD5 hash of the file contents.
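
For objects uploaded in a single PUT, the S3 ETag is the hex MD5 digest of the body (multi-part uploads use a different ETag scheme), so the check can be sketched as below; needsUpload, localMd5Hex and remoteETag are illustrative names, not the actual s3thorp API:

  // ETags come back wrapped in quotes, hence the stripping (compare the
  // QuoteStripper mentioned in the change list above).
  def needsUpload(localMd5Hex: String, remoteETag: Option[String]): Boolean =
    !remoteETag.map(_.stripPrefix("\"").stripSuffix("\"")).contains(localMd5Hex)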

Usage

  s3thorp
  Usage: s3thorp [options]

    -s, --source <value>             Source directory to sync to S3
    -b, --bucket <value>             S3 bucket name
    -p, --prefix <value>             Prefix within the S3 Bucket
    -f, --filters <value>[,<values>] Exclude matching paths
    -v, --verbose <value>            Verbosity level (1-5)
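
For example, an invocation with illustrative values (the path, bucket name and filter here are made up):

  s3thorp -s /home/user/photos -b my-backup-bucket -p photos -f .tmp -v 3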

Behaviour

When considering a local file, the following table governs what should happen:

 # | local file | remote key | hash of same key | hash of other keys | action
---+------------+------------+------------------+--------------------+---------------------
 1 | exists     | exists     | matches          | -                  | do nothing
 2 | exists     | is missing | -                | matches            | copy from other key
 3 | exists     | is missing | -                | no matches         | upload
 4 | exists     | exists     | no match         | matches            | copy from other key
 5 | exists     | exists     | no match         | no matches         | upload
 6 | is missing | exists     | -                | -                  | delete
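
As a rough Scala sketch of that table (the types and names here are illustrative, not the actual s3thorp model):

  import java.io.File

  object SyncDecision {
    sealed trait Action
    case object DoNothing                          extends Action
    final case class CopyFromOtherKey(key: String) extends Action
    final case class Upload(file: File)            extends Action
    case object DeleteRemoteKey                    extends Action

    // Parameters mirror the table columns; rows are noted in the comments.
    def chooseAction(
        localFile: Option[File],             // "local file"
        remoteKeyExists: Boolean,            // "remote key"
        sameKeyHashMatches: Boolean,         // "hash of same key"
        otherKeyWithSameHash: Option[String] // "hash of other keys"
    ): Action =
      (localFile, remoteKeyExists) match {
        case (Some(_), true) if sameKeyHashMatches => DoNothing // row 1
        case (Some(file), _) =>
          otherKeyWithSameHash match {
            case Some(key) => CopyFromOtherKey(key)             // rows 2 and 4
            case None      => Upload(file)                      // rows 3 and 5
          }
        case (None, true)  => DeleteRemoteKey                   // row 6
        case (None, false) => DoNothing // nothing on either side
      }
  }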

Creating Native Images

  • Download and install GraalVM

  • Install native-image using the graal updater

      gu install native-image
    
  • Create native image

      native-image -cp `sbt 'export runtime:fullClasspath'|tail -n 1` \
                   -H:Name=s3thorp \
                   -H:Class=net.kemitix.s3thorp.Main \
                   --allow-incomplete-classpath \
                   --force-fallback
    
  • The resulting file requires a JDK for execution (--force-fallback builds a fallback image, which still needs a JVM at run time)