thorp

S3 Sync

Synchronisation of files with S3 using the hash of the file contents.

Originally based on Alex Kudlick's aws-s3-sync-by-hash.

The normal aws s3 sync ... command decides which files to copy from their size and timestamp, not from their contents. This utility instead compares the MD5 hash of the file contents.
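As a rough illustration of comparing by content hash, the following is a minimal sketch of computing the MD5 hash of a local file in Scala. The object and method names (`Md5Sketch`, `md5Hex`) are hypothetical and are not taken from the thorp code base; only the JDK `MessageDigest` and `Files` APIs are used.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper, for illustration only; not thorp's own implementation.
object Md5Sketch {

  // Hex-encode a digest as two lowercase hex characters per byte.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // MD5 of an in-memory byte array.
  def md5Hex(bytes: Array[Byte]): String =
    hex(MessageDigest.getInstance("MD5").digest(bytes))

  // MD5 of a file, streamed in chunks so that large files are not loaded whole.
  def md5Hex(path: Path): String = {
    val digest = MessageDigest.getInstance("MD5")
    val in: InputStream = Files.newInputStream(path)
    try {
      val buffer = new Array[Byte](8192)
      var read = in.read(buffer)
      while (read != -1) {
        digest.update(buffer, 0, read)
        read = in.read(buffer)
      }
    } finally in.close()
    hex(digest.digest())
  }
}
```

Two files with identical contents give the same hex string regardless of their timestamps. Note that S3 reports a different kind of ETag for objects uploaded in multiple parts, so a plain MD5 like this one is not always directly comparable with the remote value.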

Usage

  thorp
  Usage: thorp [options]

    -s, --source <value>  Source directory to sync to S3
    -b, --bucket <value>  S3 bucket name
    -p, --prefix <value>  Prefix within the S3 Bucket
    -i, --include <value> Include matching paths
    -x, --exclude <value> Exclude matching paths
    -d, --debug           Enable debug logging
    --no-global           Ignore global configuration
    --no-user             Ignore user configuration

If you don't provide a source, the current directory will be used.

The --include and --exclude parameters can be used more than once.

Configuration

Configuration will be read from these files:

Global: /etc/thorp.conf
User: ~/.config/thorp.conf
Source: ${source}/.thorp.conf

Command line arguments override those in Source, which override those in User, which override those in Global, which override any built-in config.

Built-in config consists of using the current working directory as the source.

Note that include and exclude are cumulative across all configuration files: each level adds to the lists rather than replacing them.
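The precedence and accumulation rules above can be sketched as a fold over configuration levels, lowest precedence first. This is only an illustration: the `Config` case class, its fields and the `merge`/`resolve` functions are hypothetical names, not thorp's actual types.

```scala
// Hypothetical model, for illustration only; not thorp's own configuration types.
object ConfigSketch {

  // None means "not set at this level".
  final case class Config(
      source: Option[String] = None,
      bucket: Option[String] = None,
      prefix: Option[String] = None,
      includes: List[String] = Nil,
      excludes: List[String] = Nil
  )

  // Combine two levels: `higher` wins for single values, lists accumulate.
  def merge(lower: Config, higher: Config): Config =
    Config(
      source = higher.source.orElse(lower.source),
      bucket = higher.bucket.orElse(lower.bucket),
      prefix = higher.prefix.orElse(lower.prefix),
      includes = lower.includes ++ higher.includes,
      excludes = lower.excludes ++ higher.excludes
    )

  // Levels are given lowest precedence first:
  // built-in, Global, User, Source, command line.
  def resolve(levels: List[Config]): Config =
    levels.foldLeft(Config())(merge)
}
```

For example, if the Global level sets a bucket and an exclude, and the User level sets a different bucket and another exclude, the resolved config takes the User bucket but keeps both excludes.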

Behaviour

When considering a local file, the following table governs what should happen:

#	local file	remote key	hash of same key	hash of other keys	action
1	exists	exists	matches	-	do nothing
2	exists	is missing	-	matches	copy from other key
3	exists	is missing	-	no matches	upload
4	exists	exists	no match	matches	copy from other key
5	exists	exists	no match	no matches	upload
6	is missing	exists	-	-	delete
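The table can be read as a single decision function. The sketch below is one way to express it in Scala, assuming hashes are plain strings and that the other remote keys are available as a map from hash to key. The `Action` type and the `decide` function are hypothetical names chosen for this example, not thorp's own API.

```scala
// Hypothetical model of the table above, for illustration only.
object ActionSketch {

  sealed trait Action
  case object DoNothing extends Action
  final case class CopyFrom(otherKey: String) extends Action
  case object Upload extends Action
  case object Delete extends Action

  // localHash:       None when the local file is missing.
  // remoteHash:      hash stored under the same remote key, None when the key is missing.
  // otherKeysByHash: other remote keys, looked up by their content hash.
  // Returns None for the one combination the table does not cover
  // (no local file and no remote key).
  def decide(
      localHash: Option[String],
      remoteHash: Option[String],
      otherKeysByHash: Map[String, String]
  ): Option[Action] =
    (localHash, remoteHash) match {
      case (Some(l), Some(r)) if l == r => Some(DoNothing) // row 1
      case (Some(l), _) =>
        otherKeysByHash.get(l) match {
          case Some(key) => Some(CopyFrom(key)) // rows 2 and 4
          case None      => Some(Upload)        // rows 3 and 5
        }
      case (None, Some(_)) => Some(Delete) // row 6
      case (None, None)    => None
    }
}
```

Rows 2 and 4 share one branch, as do rows 3 and 5, because once the hash under the same key is known not to match (or the key is missing), the outcome depends only on whether another key already holds the same content.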

Executable JAR

To build an executable JAR, run `sbt assembly`.

This will create the file `cli/target/scala-2.12/thorp-assembly-$VERSION.jar` (where $VERSION is replaced by the project version).

Copy this file, renamed to `thorp.jar`, into the same directory as the `bin/thorp` shell script.