S3 Sync
Paul Campbell 0f8708e19f
Restructure sync to use a State with foldLeft around actions (#74)
* [changelog] updated

* [cli] Program rename parameter

* [core] Add AppState

* [core] Synchronise rough draft replacement for Sync

Uses the AppState

* [core] Synchronise as sequential for-comprehensions

* [core] Synchronise as nested for-comprehensions

* [sbt] thorp(root) depends on cli module

* [core] Synchronise extract methods

* [core] Synchronise rewritten

* [core] Synchronise generates actions

* [core] Remove AppState

* [core] ActionSubmitter remove unused implicit config parameter

* [cli] Program rewritten to use Synchronise

* [core] Synchronise useValidConfig accepts Logger implicitly

* [core] Synchronise reorder methods

* [core] Synchronise refactor errorMessages

* [core] SyncLogging logRunStart accepts explicit parameters

* [core] remove old Sync

* [core] Synchronise restore logRunStart

* [domain] Terminal add types to public methods and values

* [domain] UploadEventLogger force flush to terminal

Also make part of the progress message green.

Not flushing, by using println, causes odd behaviour. Works on a normal
terminal, but not great in an Emacs terminal. Oh well.

* [core] SyncLogging.logRunFinished remove unused parameters

* [cli] Program restore final summary

* [storage-aws] remove logging from module

* [core] ThorpArchive replaces ActionSubmitter

ActionSubmitter implementation becomes UnversionedMirrorArchive

* [domain] cleaner upload progress messages

* [cli] Program remove unused Logger

* [cli] Program rename parameter

* [core] SyncSuite use Synchronise

* [sbt] Allow storage-aws to share core test classes

* [domain] LocalFile stop storing a lambda

The lambda breaks the equality test between LocalFile instances.

* [core] MD5HashData add missing base64 digest for leafFile

* [core] Synchronise drop DoNothing actions

* [core] SyncSuite update tests

* [sbt] aggregate modules from root module
2019-06-25 08:27:38 +01:00
.github [github] Add stale configuration 2019-05-14 07:05:48 +01:00
bin Rename project to Thorp (#75) 2019-06-17 15:33:49 +01:00
cli/src Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
core/src Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
domain/src Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
project Enable running outside of sbt (#55) 2019-06-11 23:36:08 +01:00
storage-api/src/main/scala/net/kemitix/thorp/storage/api Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
storage-aws/src Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
.gitignore Rename project to Thorp (#75) 2019-06-17 15:33:49 +01:00
.travis.yml [travis] define AWS_REGION environment variable 2019-05-16 19:28:50 +01:00
build.sbt Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
CHANGELOG.org Restructure sync to use a State with foldLeft around actions (#74) 2019-06-25 08:27:38 +01:00
LICENSE Create LICENSE 2019-06-07 21:25:23 +01:00
README.org Is AWS SDK calculating MD5Hash again for a local file? (#50) 2019-06-21 19:20:35 +01:00

thorp

Synchronisation of files with S3 using the hash of the file contents.


Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The normal aws s3 sync ... command uses only the timestamp of each file to decide which files need to be copied. This utility compares the MD5 hash of the file contents instead.
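To illustrate the idea of comparing by content rather than by timestamp, here is a minimal Scala sketch. It is not taken from the thorp sources: the object `HashSketch` and its methods are hypothetical names, and only the standard JDK `MessageDigest` and `Files` APIs are used.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper, for illustration only; not part of thorp.
object HashSketch {

  // Hex-encoded MD5 digest of a byte array.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString

  // Hex-encoded MD5 digest of a file's contents.
  // Reads the whole file into memory, which is acceptable only for a sketch.
  def md5HexOfFile(path: Path): String =
    md5Hex(Files.readAllBytes(path))

  // True when the remote copy is missing or its hash differs from the local hash.
  // File timestamps play no part in the decision.
  def contentDiffers(localHash: String, remoteHash: Option[String]): Boolean =
    !remoteHash.contains(localHash)
}
```

A file whose timestamp changed but whose contents did not would produce the same hash, so `contentDiffers` would return false and nothing would need to be transferred.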

Usage

  thorp
  Usage: thorp [options]

    -s, --source <value>  Source directory to sync to S3
    -b, --bucket <value>  S3 bucket name
    -p, --prefix <value>  Prefix within the S3 Bucket
    -i, --include <value> Include matching paths
    -x, --exclude <value> Exclude matching paths
    -d, --debug           Enable debug logging
    --no-global           Ignore global configuration
    --no-user             Ignore user configuration

If you don't provide a source, the current directory will be used.

The --include and --exclude parameters can be used more than once.

Configuration

Configuration will be read from these files:

  • Global: /etc/thorp.conf
  • User: ~/.config/thorp.conf
  • Source: ${source}/.thorp.conf

Command line arguments override those in Source, which override those in User, which override those in Global, which override any built-in config.

Built-in config consists of using the current working directory as the source.

Note that include and exclude are cumulative across all configuration files.
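The precedence rules above can be sketched as a fold over configuration layers. This is only an illustration under stated assumptions: the `Layer` case class, its field names, and the `overlay` and `resolve` functions are hypothetical and are not thorp's actual configuration model.

```scala
// Hypothetical model of configuration layering, for illustration only.
object ConfigSketch {

  // One layer of configuration (built-in, Global, User, Source or command line).
  final case class Layer(
      source: Option[String] = None,
      bucket: Option[String] = None,
      prefix: Option[String] = None,
      includes: List[String] = Nil,
      excludes: List[String] = Nil
  )

  // Combine two layers, where `high` takes precedence over `low`.
  // Single-valued settings are overridden; include and exclude accumulate.
  def overlay(low: Layer, high: Layer): Layer =
    Layer(
      source = high.source.orElse(low.source),
      bucket = high.bucket.orElse(low.bucket),
      prefix = high.prefix.orElse(low.prefix),
      includes = low.includes ++ high.includes,
      excludes = low.excludes ++ high.excludes
    )

  // Layers are given in ascending precedence:
  // built-in, Global, User, Source, command line.
  def resolve(layers: List[Layer]): Layer =
    layers.foldLeft(Layer())(overlay)
}
```

For example, if Global and User both set a bucket, the User value is the one that survives, while an exclude from Global and an include from User are both kept.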

Behaviour

When considering a local file, the following table governs what should happen:

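| # | local file | remote key | hash of same key | hash of other keys | action              |
|---+------------+------------+------------------+--------------------+---------------------|
| 1 | exists     | exists     | matches          | -                  | do nothing          |
| 2 | exists     | is missing | -                | matches            | copy from other key |
| 3 | exists     | is missing | -                | no matches         | upload              |
| 4 | exists     | exists     | no match         | matches            | copy from other key |
| 5 | exists     | exists     | no match         | no matches         | upload              |
| 6 | is missing | exists     | -                | -                  | delete              |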
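The table above can be read as a single decision function. The following Scala sketch encodes the six rows under stated assumptions: the `Action` type, the `decide` function and its parameters are hypothetical names chosen for illustration and are not thorp's actual API.

```scala
// Hypothetical encoding of the behaviour table, for illustration only.
object BehaviourSketch {

  sealed trait Action
  case object DoNothing extends Action
  case object CopyFromOtherKey extends Action
  case object Upload extends Action
  case object Delete extends Action

  // localHash:   hash of the local file, None if the local file is missing
  // remoteHash:  hash stored at the same remote key, None if the key is missing
  // otherHashes: hashes of objects stored under other remote keys
  // Returns None when neither side exists, a case the table does not cover.
  def decide(
      localHash: Option[String],
      remoteHash: Option[String],
      otherHashes: Set[String]
  ): Option[Action] =
    (localHash, remoteHash) match {
      case (Some(l), Some(r)) if l == r            => Some(DoNothing)        // row 1
      case (Some(l), _) if otherHashes.contains(l) => Some(CopyFromOtherKey) // rows 2 and 4
      case (Some(_), _)                            => Some(Upload)           // rows 3 and 5
      case (None, Some(_))                         => Some(Delete)           // row 6
      case (None, None)                            => None
    }
}
```

Checking for a matching hash under another key before falling back to an upload mirrors rows 2 and 4, where an existing remote object with the same contents can be copied rather than sent again.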

Executable JAR

To build an executable JAR, run `sbt assembly`.

This will create the file `cli/target/scala-2.12/thorp-assembly-$VERSION.jar` (where `$VERSION` is replaced by the project version).

Copy this file, renamed to `thorp.jar`, into the same directory as the `bin/thorp` shell script.