Use multi-part upload for large files (i.e. files > 5MB) (#22)
* [ThorpS3Client] Extract QuoteStripper and S3ClientObjectLister

* [ThorpS3Client] Extract S3ClientUploader

* [ThorpS3Client] Extract S3ClientCopier

* [ThorpS3Client] Extract S3ClientDeleter

* [ThorpS3Client] Can select upload strategy based on file size

Currently this switches to an alternative that is a clone of the
original method.

* [MD5HashGenerator] Add md5FilePart

Reimplement md5File using md5FilePart (sketched after this change list)

* [MyS3CatsIOClient] extracted

* [S3ClientMultiPartUploader] add tests for accept def

* [S3ClientMultiPartUploader] initiate multi-part upload

* [Md5HashGenerator] add tests reading part of a file = failing test

* [Md5HashGenerator] fix when reading part of a file

* [S3ClientMultiPartUploader] create UploadPartRequests

* [S3ClientMultiPartUploader] uploadPart delegates to an S3Client

* [S3ClientMultiPartUploader] uploadParts uploads each part

* [S3ClientMultiPartUploader] complete upload should completeUpload

* [S3ClientMultiPartUploader] upload file tests when all okay

* [S3ClientMultiPartUploader] Use Recording client in component tests

* [S3ClientMultiPartUploader] remove unused variable

* [S3ClientMultiPartUploader] failing test for init upload error

* [S3ClientMultiPartUploader] Handle errors during multi-part upload

* [S3ClientMultiPartUploader] Retry uploads

* [S3Action] ErroredS3Action now holds the error

* [S3ClientMultiPartUploader] Add logging

* [S3ClientMultiPartUploader] Display warning messages

* [S3ClientMultiPartUploader] test creation of CreateMultipartUploadRequest

* [S3ClientMultiPartUploader] specify bucket in UploadPartRequest

* [S3ClientMultiPartUploader] verify complete request has upload id

* [S3ClientMultiPartUploader] verify abort request contains upload id

* [S3ClientMultiPartUploader] add logging around retry errors

* [S3ClientMultiPartUploader] verify upload part request had remote key

* [S3ClientMultiPartUploaderLogging] refactoring/rewriting strings

* [S3ClientMultiPartUploader] add bucket to abort request

* [S3ClientMultiPartUploader] part numbers must start at 1

* [S3ClientMultiPartUploader] fix capitalisation in comment

* [Config] define maxRetries

* [S3ClientMultiPartUploader] abort request should have the remote key

* [S3ClientMultiPartUploader] display remote key properly

* [S3ClientMultiPartUploader] rename method for plural parts

* [S3ClientMultiPartUploader] log hash and part number

* [MD5HashGenerator] support creating hash from a byte array

* [sbt] add aws-java-sdk-s3 (v1) for multi-part uploads

The reactive-aws-s3-* library is based on v2 of the AWS Java SDK,
which doesn't support multi-part uploads; hence the v1 client
sketched after this change list.

* [S3ClientMultiPartUploader] use Amazon S3 Client (from v1 sdk)

* [S3ClientMultiPartUploader] include file and offset in upload part request

* [S3ClientMultiPartUploader] Add part etags to complete request

* [S3ClientMultiPartUploader] Use withers to create requests

* [S3ClientMultiPartUploader] don't bounce responses to tags when the client accepts them as is

* [MD5HashGenerator] use MD5Hash

* [S3ClientMultiPartUploader] include hash in sending log message

* [S3ClientMultiPartUploader] tests throw correct exception

* [S3ClientMultiPartUploader] Include returned hash in error and log when send is finished

* [S3ClientUploader] Extract as trait, renaming implementations

* [S3Client] upload def now requires tryCount

* [S3ClientUploader] add accepts to trait

* [S3ClientMultiPartUploaderSuite] remove ambiguity over class import

* [S3ClientMultiPartTransferManager] implement and use
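
The part-hashing changes above can be sketched with java.security.MessageDigest. The names md5FilePart and md5File come from the commit messages; the signatures, buffer size and hex encoding here are assumptions, not the actual s3thorp code:

  import java.io.{File, RandomAccessFile}
  import java.security.MessageDigest

  object Md5Sketch {

    // Assumed sketch: MD5 the `size` bytes of `file` starting at `offset`.
    def md5FilePart(file: File, offset: Long, size: Long): String = {
      val md  = MessageDigest.getInstance("MD5")
      val raf = new RandomAccessFile(file, "r")
      try {
        raf.seek(offset)
        val buffer    = new Array[Byte](8192)
        var remaining = size
        while (remaining > 0) {
          val read = raf.read(buffer, 0, math.min(buffer.length.toLong, remaining).toInt)
          if (read < 0) remaining = 0 // hit end of file before `size` bytes
          else {
            md.update(buffer, 0, read)
            remaining -= read
          }
        }
        md.digest().map("%02x".format(_)).mkString // hex-encode the digest
      } finally raf.close()
    }

    // md5File is then just a single "part" spanning the whole file.
    def md5File(file: File): String = md5FilePart(file, 0, file.length)
  }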
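
The overall multi-part flow these commits describe (initiate, upload parts numbered from 1, complete with the collected part etags, abort on error) might look roughly like this with the v1 SDK classes named above; the structure is a sketch, not the actual uploader:

  import java.io.File
  import scala.jdk.CollectionConverters._
  import com.amazonaws.services.s3.AmazonS3
  import com.amazonaws.services.s3.model.{
    AbortMultipartUploadRequest,
    CompleteMultipartUploadRequest,
    InitiateMultipartUploadRequest,
    UploadPartRequest
  }

  object MultiPartSketch {

    val partSize: Long = 5 * 1024 * 1024 // S3's minimum part size is 5MB

    def upload(s3: AmazonS3, bucket: String, key: String, file: File): Unit = {
      // 1. initiate the upload and keep hold of the upload id
      val uploadId = s3
        .initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key))
        .getUploadId

      try {
        // 2. upload each part; part numbers must start at 1
        val etags = (0L until file.length by partSize).zipWithIndex.map {
          case (offset, index) =>
            val request = new UploadPartRequest()
              .withBucketName(bucket)
              .withKey(key)
              .withUploadId(uploadId)
              .withPartNumber(index + 1)
              .withFile(file)
              .withFileOffset(offset)
              .withPartSize(math.min(partSize, file.length - offset))
            s3.uploadPart(request).getPartETag
        }

        // 3. complete the upload with the collected part etags
        s3.completeMultipartUpload(
          new CompleteMultipartUploadRequest(bucket, key, uploadId, etags.asJava))
        ()
      } catch {
        case e: Throwable =>
          // abort so S3 doesn't keep the orphaned parts around
          s3.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId))
          throw e
      }
    }
  }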

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The normal aws s3 sync ... command only uses the timestamp of files to decide which files need to be copied. This utility looks at the MD5 hash of the file contents.
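
For objects uploaded in a single PUT, the S3 ETag is the hex MD5 digest of the body (multi-part uploads use a different ETag scheme), so the check can be sketched as below; needsUpload, localMd5Hex and remoteETag are illustrative names, not the actual s3thorp API:

  // ETags come back wrapped in quotes, hence the stripping (compare the
  // QuoteStripper mentioned in the change list above).
  def needsUpload(localMd5Hex: String, remoteETag: Option[String]): Boolean =
    !remoteETag.map(_.stripPrefix("\"").stripSuffix("\"")).contains(localMd5Hex)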

Usage

  s3thorp
  Usage: s3thorp [options]

    -s, --source <value>             Source directory to sync to S3
    -b, --bucket <value>             S3 bucket name
    -p, --prefix <value>             Prefix within the S3 Bucket
    -f, --filters <value>[,<values>] Exclude matching paths
    -v, --verbose <value>            Verbosity level (1-5)
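
For example, an invocation with illustrative values (the path, bucket name and filter here are made up):

  s3thorp -s /home/user/photos -b my-backup-bucket -p photos -f .tmp -v 3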

Behaviour

When considering a local file, the following table governs what should happen:

 # | local file | remote key | hash of same key | hash of other keys | action
---+------------+------------+------------------+--------------------+---------------------
 1 | exists     | exists     | matches          | -                  | do nothing
 2 | exists     | is missing | -                | matches            | copy from other key
 3 | exists     | is missing | -                | no matches         | upload
 4 | exists     | exists     | no match         | matches            | copy from other key
 5 | exists     | exists     | no match         | no matches         | upload
 6 | is missing | exists     | -                | -                  | delete
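
As a rough Scala sketch of that table (the types and names here are illustrative, not the actual s3thorp model):

  import java.io.File

  object SyncDecision {
    sealed trait Action
    case object DoNothing                          extends Action
    final case class CopyFromOtherKey(key: String) extends Action
    final case class Upload(file: File)            extends Action
    case object DeleteRemoteKey                    extends Action

    // Parameters mirror the table columns; rows are noted in the comments.
    def chooseAction(
        localFile: Option[File],             // "local file"
        remoteKeyExists: Boolean,            // "remote key"
        sameKeyHashMatches: Boolean,         // "hash of same key"
        otherKeyWithSameHash: Option[String] // "hash of other keys"
    ): Action =
      (localFile, remoteKeyExists) match {
        case (Some(_), true) if sameKeyHashMatches => DoNothing // row 1
        case (Some(file), _) =>
          otherKeyWithSameHash match {
            case Some(key) => CopyFromOtherKey(key)             // rows 2 and 4
            case None      => Upload(file)                      // rows 3 and 5
          }
        case (None, true)  => DeleteRemoteKey                   // row 6
        case (None, false) => DoNothing // nothing on either side
      }
  }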

Creating Native Images

  • Download and install GraalVM

  • Install native-image using the graal updater

      gu install native-image
    
  • Create native image

      native-image -cp `sbt 'export runtime:fullClasspath'|tail -n 1` \
                   -H:Name=s3thorp \
                   -H:Class=net.kemitix.s3thorp.Main \
                   --allow-incomplete-classpath \
                   --force-fallback
    
  • The resulting file requires a JDK for execution (--force-fallback builds a fallback image, which still needs a JVM at run time)