S3 Sync
Paul Campbell 319c46f403
Convert to Java (domain, config, storage-aws and filesystem) (#446)
* Java rewrite domain (#438)

* domain.Bucket: convert to Java

* domain.LastModified: convert to Java

* domain.QuoteStripper: convert to Java

* domain.HexEncoder: convert to Java

* domain.MD5Hash: convert to Java

* remove unused import

* domain.RemoteKey: convert to Java

* domain.Action: convert to Java

* domain.Counters: convert to Java

* domain.HashType: convert to Java

* domain.Hashes: convert to Java

* domain.MD5HashData: convert to Java

* domain.Filter: convert to Java

* domain.LocalFile: convert to Java

* domain: make immutable field public

* domain.SizeTranslation: convert to Java

* domain.HashType: restrict access to constructor

* domain.RemoteObjects: convert to Java

Introduce MapView and Tuple.

* domain.Sources: convert to Java

* domain.StorageEvent: convert to Java

* domain.Terminal: convert to Java

* domain => config: move SimpleLens to only module that uses it

* domain => filesystem: move TemporaryFolder

* domain.Implicits: removed

* parent: make junit, et al available

* domain: add testing dependencies

* domain.HexEncoder: convert test to Java and fix bugs

* domain.HexEncoderTest: replace with Java version

* domain.MD5HashTest: convert to Java

* domain.RemoteKeyTest: convert to Java

* domain.SizeTranslationTest: convert to Java

* domain.TerminalTest: convert to Java

* domain: remove unused dependencies

* parent: rollback zio-streams to match zio and pin them together

* storage-aws: resolve transitive dependency conflicts

* Java rewrite storage aws (#445)

* storage-aws.AmazonS3: convert to Java as AmazonS3Client

* storage-aws.S3Copier: convert to Java

* storage-aws.S3Uploader: convert to Java

* storage-aws.S3Deleter: convert to Java

* storage-aws.S3Lister: convert to Java

* filesystem: write cache data correctly (as supplied)

* domain,filesystem: fix MD5Hash generation

* filesystem: convert to Java (#450)

* remove legacy

* Rewrite config module in Java (#461)

* config.ParseConfigFile: convert to Java

* config.SourceConfigLoader: convert to Java

* WIP config.Configuration: convert to Java

* config.ConfigOption: convert to Java

* config.ConfigOptions: convert to Java

* config.ConfigValidation: convert to Java

* config.ConfigQuery: convert to Java

* config: move classes to correct location

* config.ConfigValidationException: convert to Java

* config.ConfigValidator: convert to Java

* config.ConfigurationBuilder: convert to Java

* config.SimpleLens: removed

* config.Config: remove environment

* config.ConfigOptionTest: convert to Java

* config.ConfigQueryTest: convert to Java

* config.ConfigurationBuilderTest: convert to Java

* config.ParseConfigFileTest: convert to Java

* config.ParseConfigLinesTest: convert to Java

* config: remove scala dependencies and plugin
2020-06-21 07:21:21 +01:00
.github Add release drafter configuration (#462) 2020-06-21 07:09:12 +01:00
app Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
bin Rename project to Thorp (#75) 2019-06-17 15:33:49 +01:00
cli Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
config Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
console Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
docs/images Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
domain Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
filesystem Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
lib Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
parent Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
project Update sbt-scoverage to 1.6.1 (#272) 2020-06-08 06:48:42 +01:00
storage Java rewrite - step 1 - build with Maven (#431) 2020-06-11 21:33:28 +01:00
storage-aws Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
uishell Convert to Java (domain, config, storage-aws and filesystem) (#446) 2020-06-21 07:21:21 +01:00
.gitignore Rename project to Thorp (#75) 2019-06-17 15:33:49 +01:00
.scalafmt.conf Apply scalafmt (#108) 2019-07-16 07:56:54 +01:00
CHANGELOG.org Create and use a cache of hashes for local files (#249) 2019-10-27 19:53:00 +00:00
LICENSE Create LICENSE 2019-06-07 21:25:23 +01:00
pom.xml Java rewrite - step 1 - build with Maven (#431) 2020-06-11 21:33:28 +01:00
README.md Java rewrite - step 1 - build with Maven (#431) 2020-06-11 21:33:28 +01:00

thorp

Synchronisation of files with S3 using the hash of the file contents.

Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The normal aws s3 sync ... command uses only the timestamp of files to decide which files need to be copied. This utility instead compares the MD5 hash of the file contents.
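
As a rough illustration of hashing file contents, the sketch below computes an MD5 hex digest with the JDK's `MessageDigest`. The class `Md5Sketch` and its method names are hypothetical and are not thorp's own `MD5Hash` or `HexEncoder` classes; it only shows the general technique.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not thorp's own implementation.
class Md5Sketch {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Encode bytes as lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hash content that is already in memory.
    static String md5Hex(byte[] content) {
        return toHex(newMd5().digest(content));
    }

    // Hash a file in chunks so large files are not loaded into memory at once.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md5 = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }
        return toHex(md5.digest());
    }

    public static void main(String[] args) {
        // prints 5d41402abc4b2a76b9719d911017c592
        System.out.println(md5Hex("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Two files with identical contents produce the same digest whatever their timestamps, which is what makes a content hash a more reliable change signal than the modification time.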

Usage

$ thorp
Usage: thorp [options]

  -V, --version         Display the version and quit
  -B, --batch           Enabled batch-mode
  -s, --source <value>  Source directory to sync to S3
  -b, --bucket <value>  S3 bucket name
  -p, --prefix <value>  Prefix within the S3 Bucket
  -P, --parallel <value> Maximum parallel upload/copy operations
  -i, --include <value> Include matching paths
  -x, --exclude <value> Exclude matching paths
  -d, --debug           Enable debug logging
  --no-global           Ignore global configuration
  --no-user             Ignore user configuration

If you don't provide a source the current directory will be used.

The --include and --exclude parameters can be used more than once.

The --source parameter can be used more than once, in which case, all files in all sources will be consolidated into the same bucket/prefix.

Batch mode

Batch mode disables the ANSI console display and logs simple messages that can be written to a file.

Configuration

Configuration will be read from these files:

  • Global: /etc/thorp.conf
  • User: ~/.config/thorp.conf
  • Source: ${source}/.thorp.conf

Command line arguments override those in Source, which override those in User, which override those in Global, which override any built-in config.

When there is more than one source, only the first .thorp.conf file found will be used.

Built-in config consists of using the current working directory as the source.

Note that include and exclude are cumulative across all configuration files.
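
The precedence rules above can be sketched as a simple fold over configuration layers. Everything here is an assumption made for illustration: the `Layer` type, the use of `null` for "not set", and the order in which include and exclude patterns are accumulated are hypothetical and do not reflect thorp's actual config classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration; thorp's real config classes differ.
class ConfigLayerSketch {

    static final class Layer {
        final String bucket;           // null when this layer does not set it
        final List<String> includes;
        final List<String> excludes;

        Layer(String bucket, List<String> includes, List<String> excludes) {
            this.bucket = bucket;
            this.includes = includes;
            this.excludes = excludes;
        }
    }

    // Layers are ordered highest precedence first:
    // command line, source, user, global.
    static Layer resolve(List<Layer> layers) {
        String bucket = null;
        List<String> includes = new ArrayList<>();
        List<String> excludes = new ArrayList<>();
        for (Layer layer : layers) {
            // Single-valued options: the first layer that sets a value wins.
            if (bucket == null) {
                bucket = layer.bucket;
            }
            // include/exclude are cumulative, so every layer contributes.
            includes.addAll(layer.includes);
            excludes.addAll(layer.excludes);
        }
        return new Layer(bucket, includes, excludes);
    }

    public static void main(String[] args) {
        Layer commandLine = new Layer(null, Arrays.asList("*.html"), Collections.<String>emptyList());
        Layer user = new Layer("user-bucket", Collections.<String>emptyList(), Arrays.asList("*.tmp"));
        Layer global = new Layer("global-bucket", Arrays.asList("*.css"), Collections.<String>emptyList());
        Layer result = resolve(Arrays.asList(commandLine, user, global));
        // prints: user-bucket [*.html, *.css] [*.tmp]
        System.out.println(result.bucket + " " + result.includes + " " + result.excludes);
    }
}
```

In this sketch the user-level bucket overrides the global one because the command line leaves it unset, while patterns from every layer are kept.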

Caching

The last modified time of each file is used to decide whether to calculate its hash values. If a file has not been updated, the hash values stored in the .thorp.cache file located in the root of the source are used. Otherwise the file is read to calculate the new hashes.
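
A minimal sketch of that lookup follows, assuming an in-memory map in place of the .thorp.cache file (whose on-disk format is not described here) and assuming "not updated" means the last modified time equals the cached one. The class `HashCacheSketch`, its fields and the `hasher` parameter are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory stand-in for the .thorp.cache file.
class HashCacheSketch {

    static final class Entry {
        final long lastModifiedMillis;
        final String md5;

        Entry(long lastModifiedMillis, String md5) {
            this.lastModifiedMillis = lastModifiedMillis;
            this.md5 = md5;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Counts how often the (expensive) hasher had to run.
    int recalculations = 0;

    // Return the cached hash when the file is unchanged, otherwise
    // recalculate it with the supplied hasher and refresh the cache.
    String hashFor(String path, long lastModifiedMillis, Function<String, String> hasher) {
        Entry cached = entries.get(path);
        if (cached != null && cached.lastModifiedMillis == lastModifiedMillis) {
            return cached.md5;
        }
        String fresh = hasher.apply(path);
        recalculations++;
        entries.put(path, new Entry(lastModifiedMillis, fresh));
        return fresh;
    }

    public static void main(String[] args) {
        HashCacheSketch cache = new HashCacheSketch();
        cache.hashFor("index.html", 1000L, p -> "hash-v1");  // miss: hashed
        cache.hashFor("index.html", 1000L, p -> "hash-v1");  // hit: reused
        cache.hashFor("index.html", 2000L, p -> "hash-v2");  // modified: hashed again
        System.out.println(cache.recalculations);            // prints 2
    }
}
```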

Behaviour

When considering a local file, the following table governs what should happen:

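| # | local file | remote key | hash of same key | hash of other keys | action              |
|---|------------|------------|------------------|--------------------|---------------------|
| 1 | exists     | exists     | matches          | -                  | do nothing          |
| 2 | exists     | is missing | -                | matches            | copy from other key |
| 3 | exists     | is missing | -                | no matches         | upload              |
| 4 | exists     | exists     | no match         | matches            | copy from other key |
| 5 | exists     | exists     | no match         | no matches         | upload              |
| 6 | is missing | exists     | -                | -                  | delete              |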
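
The table can be read as a small decision function. The sketch below is one way to encode it; the `ActionSketch` class, the `Action` enum and the boolean parameters are hypothetical names. The case where neither the local file nor the remote key exists is not covered by the table, and treating it as "do nothing" is an assumption.

```java
// Hypothetical encoding of the behaviour table; not thorp's own Action type.
class ActionSketch {

    enum Action { DO_NOTHING, COPY_FROM_OTHER_KEY, UPLOAD, DELETE }

    static Action decide(boolean localExists,
                         boolean remoteExists,
                         boolean sameKeyHashMatches,
                         boolean otherKeyHashMatches) {
        if (!localExists) {
            // Row 6. With no remote key either there is nothing to act on
            // (assumption: the table does not list this case).
            return remoteExists ? Action.DELETE : Action.DO_NOTHING;
        }
        if (remoteExists && sameKeyHashMatches) {
            return Action.DO_NOTHING;              // row 1
        }
        if (otherKeyHashMatches) {
            return Action.COPY_FROM_OTHER_KEY;     // rows 2 and 4
        }
        return Action.UPLOAD;                      // rows 3 and 5
    }

    public static void main(String[] args) {
        // Row 4: the remote key exists with different content, but another
        // key already holds matching content, so a copy avoids an upload.
        System.out.println(decide(true, true, false, true)); // prints COPY_FROM_OTHER_KEY
    }
}
```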

Executable JAR

To build an executable JAR, run mvn package.

This will create the file app/target/thorp-${version}-jar-with-dependencies.jar

Copy this file to a directory on your PATH, renaming it if you wish.

Structure/Dependencies

Dependency Graph