S3 Sync
Paul Campbell f54c50aaf3
Split into subprojects (#36)
* [sbt] define existing single module project as legacyRoot

* [sbt] add empty cli module depending on legacyRoot

* [cli] move Main to cli module

* [cli] move ParseArgs to cli module

* [sbt] limit scope of scopt dependency to cli module

* [cli] moved logging config to cli module

* [cli] rename module directory

* [aws-api] added empty module

* [sbt] aggregate builds from cli

* [aws-lib] add empty module

* [core] add empty module

* [sbt] add comment graphing module dependencies

* [sbt] adjust module dependencies to reflect plan

Include legacyRoot at the base until it can be redistributed

* [legacy] make some awssdk classes non-private

during this transition, these classes being private would cause problems

* [aws-lib] create S3ClientBuilder

This is copied from the legacy S3Client companion object

* [domain] add empty module

* [domain] move Bucket into module

* [legacy] RemoteKey no longer has dependency on Config

* [domain] move RemoteKey into module

* [domain] move MD5Hash into module

* [legacy] LocalFile no longer has dependency on MD5HashGenerator

* [domain] move LocalFile into module

* [domain] move LastModified into module

* [domain] move RemoteMetaData into module

* [domain] move S3MetaData into module

* [domain] move Exclude into module

* [domain] move Filter into module

* [domain] move KeyModified into module

* [domain] move HashModified into module

* [domain] RemoteKey.resolve added

* [domain] add dependency on scalatest

* [domain] LocalFile.resolve added

* [legacy] Remove UnitTest

* [legacy] optimise imports

* [domain] move S3ObjectsData into module

* [legacy] wrapper for using GeneralProgressListener

* [domain] move Config into module

* [sbt] move aws-api below legacyRoot in dependencies

This will allow us to move S3Client into the aws-api module

* [legacy] rename S3Client companion as S3ClientBuilder

Preparation to move this into its own file.

* Inject Logger via CLI (#34)

* [S3Client] refactor defaultClient()

* [S3Client] transfermanager explicitly uses the same s3client

* [S3ClientPutObjectUploader] refactor putObjectRequest creation

* [cli] copy in Logging trait as Logger class

* [cli] Main uses Logger

* [cli] simplify Logger and pass to Sync.run

* [legacy] SyncLogging converted to companion

* [cli] Logger info can more easily use levels again

* [legacy] LocalFileStream uses injected info

* [legacy] S3MetaDataEnricher remove unused Logging

* [legacy] ActionGenerator remove unused Logging

* [legacy] convert ActionGenerator to an object

* [legacy] import log methods from SyncLogging

* [legacy] move getS3Status from S3Client to S3MetaDataEnricher

* [legacy] convert ActionsSubmitter to an object

* [legacy] convert LocalFileStream to an object

* [legacy] move Action case classes inside companion

* [legacy] move UploadEvent case classes inside companion and rename

* [legacy] move S3Action case classes into companion

* [legacy] convert Sync to an object

* [cli] Logger takes verbosity level at construction

No longer needs to be passed the whole Config implicitly for each info
call.

* [legacy] stop passing implicit Config for logging purposes

Pass a more specific implicit info: Int => String => Unit instead

* [legacy] remove DummyS3Client

* [legacy] remove Logging

* [legacy] convert MD5HashGenerator to an object

* [aws-api] move S3Client into module

* [legacy] convert KeyGenerator to an object

* [legacy] don't use IO.unsafeRunSync directly

* [legacy] refactor/rewrite Sync.run

* [legacy] Rewrite sort using a for-comprehension

* [legacy] Sync inline sorting

* [legacy] SyncLogging rename method

* [legacy] repair tests

* [sbt] move core module to a dependency of legacyRoot

* [sbt] add test dependencies to core module

* [core] move classes into module

* [aws-lib] move classes into module

* [sbt] remove legacy root
2019-06-06 19:24:15 +01:00
.github [github] Add stale configuration 2019-05-14 07:05:48 +01:00
aws-api/src/main/scala/net/kemitix/s3thorp/aws/api Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
aws-lib/src Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
cli/src/main Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
core/src Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
domain/src Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
project [gitignore] update to allow some project files 2019-05-11 08:54:35 +01:00
.gitignore [gitignore] ignore zip files 2019-05-14 07:27:14 +01:00
.travis.yml [travis] define AWS_REGION environment variable 2019-05-16 19:28:50 +01:00
build.sbt Split into subprojects (#36) 2019-06-06 19:24:15 +01:00
CHANGELOG.org Support multiple filters (#18) 2019-05-23 19:35:48 +01:00
README.org [readme] add note about broken native images 2019-05-30 18:38:23 +01:00

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The standard aws s3 sync ... command decides which files to copy from file metadata (size and modification time), not from their contents. This utility compares the MD5 hash of the file contents instead.
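
As a rough illustration of comparing by content hash, the sketch below computes an MD5 hex digest with the JDK's MessageDigest and decides whether an upload is needed. The names Md5Sketch, md5Hex and needsUpload are hypothetical and are not taken from the s3thorp source.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object Md5Sketch {

  // Hex-encode the MD5 digest of a byte array.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest
      .getInstance("MD5")
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Hash a whole file. Reads the file into memory, which is fine for a
  // sketch but not for very large files.
  def md5Hex(path: Path): String =
    md5Hex(Files.readAllBytes(path))

  // Assumption: an upload is needed unless the remote hash is known and
  // equals the local hash.
  def needsUpload(localHash: String, remoteHash: Option[String]): Boolean =
    !remoteHash.contains(localHash)
}
```

With this approach a file whose modification time changed but whose bytes did not produces the same digest, so it would not be uploaded again.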

Usage

  s3thorp
  Usage: s3thorp [options]

    -s, --source <value>             Source directory to sync to S3
    -b, --bucket <value>             S3 bucket name
    -p, --prefix <value>             Prefix within the S3 Bucket
    -x, --exclude <value>[,<values>] Exclude matching paths
    -v, --verbose <value>            Verbosity level (1-5)
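
The options above do not spell out how a local path becomes an S3 key. The following sketch shows one plausible mapping, assuming the key is the prefix followed by the file's path relative to the source directory, joined with "/". Both KeySketch and remoteKey are hypothetical names, and the real behaviour of s3thorp may differ.

```scala
import java.nio.file.{Path, Paths}

object KeySketch {

  // Assumption: remote key = prefix + "/" + (file path relative to source).
  // Empty prefix segments are dropped, so an empty prefix yields just the
  // relative path.
  def remoteKey(source: Path, prefix: String, file: Path): String = {
    val relative = source.relativize(file)
    val fileParts =
      (0 until relative.getNameCount).map(i => relative.getName(i).toString)
    val prefixParts = prefix.split('/').filter(_.nonEmpty).toSeq
    (prefixParts ++ fileParts).mkString("/")
  }
}
```

Under that assumption, syncing the source directory /data/site with the prefix backup would place /data/site/img/logo.png at the key backup/img/logo.png.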

Behaviour

When considering a local file, the following table governs what should happen:

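| # | local file | remote key | hash of same key | hash of other keys | action              |
|---+------------+------------+------------------+--------------------+---------------------|
| 1 | exists     | exists     | matches          | -                  | do nothing          |
| 2 | exists     | is missing | -                | matches            | copy from other key |
| 3 | exists     | is missing | -                | no matches         | upload              |
| 4 | exists     | exists     | no match         | matches            | copy from other key |
| 5 | exists     | exists     | no match         | no matches         | upload              |
| 6 | is missing | exists     | -                | -                  | delete              |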
Creating Native Images

Note: the created image currently can't be run from outside the project's base directory. See Issue #15.

  • Download and install GraalVM

  • Install native-image using the graal updater

      gu install native-image
    
  • Create native image

      native-image -cp `sbt 'export runtime:fullClasspath'|tail -n 1` \
                   -H:Name=s3thorp \
                   -H:Class=net.kemitix.s3thorp.Main \
                   --allow-incomplete-classpath \
                   --force-fallback
    
  • The resulting file requires a JDK to run