Paul Campbell
f35ea9795d
* [domain] Define Hashes in domain package * [filesystem] Load and parse any .thorp.cache files found * [filesystem] Use cached file data when available and up-to-date * [lib] FileScanner refactoring * [filesystem] scan sub-dirs first to minimise time cache is on heap * [filesystem] Write new cache data to temp file * [lib] replace cache file when finished updating * [filesystem] AppendLines to correct file with new lines * [domain] decode HashType from String * [filesystem] Store last modified time as epoch milliseconds * [filesystem] parse lastmodified as a long * [filesystem] use all hash values in cache * [lib] FileScanner rearrange code * [lib] Create and use a single cache file per source * [storage-aws] Use ETag hash from cache when available * [filesystem] Merge file data together correctly * [filesystem] Handle exceptions thrown by Files.mode correctly * [readme] Add section on caching * [changelog] updated * [changelog] add pending dependencies notes * [lib] Filters should not name methods after their defining object * [lib] Fix up test
92 lines
3.7 KiB
Org Mode
* thorp

Synchronisation of files with S3 using the hash of the file contents.

[[https://www.codacy.com/app/kemitix/thorp][file:https://img.shields.io/codacy/grade/c1719d44f1f045a8b71e1665a6d3ce6c.svg?style=for-the-badge]]
[[https://search.maven.org/search?q=net.kemitix.thorp][file:https://img.shields.io/maven-central/v/net.kemitix.thorp/thorp_2.12.svg?style=for-the-badge]]

Originally based on Alex Kudlick's [[https://github.com/akud/aws-s3-sync-by-hash][aws-s3-sync-by-hash]].

The normal ~aws s3 sync ...~ command only uses the time stamp of files
to decide what files need to be copied. This utility looks at the md5
hash of the file contents.

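As a rough illustration of comparing by content hash rather than by time stamp, the sketch below computes the md5 of a local file and compares it with a hash already known for the remote object. It is not thorp's own code; the names ~md5_hex~ and ~needs_copy~ are hypothetical and exist only for this example.

```python
import hashlib
from pathlib import Path
from typing import Optional


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(local: Path, remote_hash: Optional[str]) -> bool:
    """True when the remote object is missing or its hash differs.

    remote_hash is assumed to be a hex md5 string, or None if no
    remote object exists (an assumption made for this sketch).
    """
    return remote_hash is None or md5_hex(local) != remote_hash
```

A file whose time stamp changed but whose contents did not yields the same digest, so ~needs_copy~ returns false for it.
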
* Usage

#+begin_example
  thorp
  Usage: thorp [options]

    -V, --version            Display the version and quit
    -B, --batch              Enabled batch-mode
    -s, --source <value>     Source directory to sync to S3
    -b, --bucket <value>     S3 bucket name
    -p, --prefix <value>     Prefix within the S3 Bucket
    -P, --parallel <value>   Maximum parallel upload/copy operations
    -i, --include <value>    Include matching paths
    -x, --exclude <value>    Exclude matching paths
    -d, --debug              Enable debug logging
    --no-global              Ignore global configuration
    --no-user                Ignore user configuration
#+end_example

If you don't provide a ~source~, the current directory will be used.

The ~--include~ and ~--exclude~ parameters can be used more than once.

The ~--source~ parameter can be used more than once, in which case
all files in all sources will be consolidated into the same
bucket/prefix.

** Batch mode

Batch mode disables the ANSI console display and logs simple messages
that can be written to a file.

* Configuration

Configuration will be read from these files:

- Global: ~/etc/thorp.conf~
- User: =~/.config/thorp.conf=
- Source: ~${source}/.thorp.conf~

Command line arguments override those in Source, which override
those in User, which override those in Global, which override any
built-in config.

When there is more than one source, only the first ~.thorp.conf~
file found will be used.

Built-in config consists of using the current working directory as
the ~source~.

Note that ~include~ and ~exclude~ are cumulative across all
configuration files.

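The override order described above can be sketched as a merge over layers of increasing priority. This is an illustration only, not thorp's implementation: the ~Layer~ type and ~merge~ function are hypothetical, and treating command line ~include~ and ~exclude~ values as cumulative along with the configuration files is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layer:
    """One configuration layer; None means 'not set in this layer'."""
    bucket: Optional[str] = None
    prefix: Optional[str] = None
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)


def merge(layers: List[Layer]) -> Layer:
    """Merge layers given in increasing priority order.

    Single-valued settings are overridden by later layers, while
    include and exclude accumulate across every layer.
    """
    merged = Layer()
    for layer in layers:
        if layer.bucket is not None:
            merged.bucket = layer.bucket
        if layer.prefix is not None:
            merged.prefix = layer.prefix
        merged.include.extend(layer.include)
        merged.exclude.extend(layer.exclude)
    return merged
```

Calling ~merge([global_layer, user_layer, source_layer, cli_layer])~ gives the command line the final say on ~bucket~ and ~prefix~ while keeping every ~include~ and ~exclude~ pattern.
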
* Caching

The last modified time of each file is used to decide whether its
hash values need to be recalculated. If a file has not been updated,
the hash values stored in the ~.thorp.cache~ file in the root of the
source are used. Otherwise, the file is read to calculate the new
hashes.

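A minimal sketch of that lookup is shown here, assuming an in-memory cache keyed by path. The on-disk layout of ~.thorp.cache~ is not described in this document, so the ~Cache~ type and ~cached_hash~ function are hypothetical; comparing modification times as epoch milliseconds is likewise an assumption of the sketch.

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache: path -> (last modified in epoch ms, hash).
Cache = Dict[str, Tuple[int, str]]


def cached_hash(path: str, cache: Cache, compute: Callable[[str], str]) -> str:
    """Return the cached hash if the file is unchanged, else recompute it."""
    modified_ms = os.stat(path).st_mtime_ns // 1_000_000
    entry = cache.get(path)
    if entry is not None and entry[0] == modified_ms:
        # Last modified time matches the cached one: skip reading the file.
        return entry[1]
    # New or modified file: read it, then remember the result.
    fresh = compute(path)
    cache[path] = (modified_ms, fresh)
    return fresh
```

The ~compute~ argument is whatever function hashes the file contents, so the expensive read happens only for new or modified files.
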
* Behaviour

When considering a local file, the following table governs what should happen:

|---+------------+------------+------------------+--------------------+---------------------|
| # | local file | remote key | hash of same key | hash of other keys | action              |
|---+------------+------------+------------------+--------------------+---------------------|
| 1 | exists     | exists     | matches          | -                  | do nothing          |
| 2 | exists     | is missing | -                | matches            | copy from other key |
| 3 | exists     | is missing | -                | no matches         | upload              |
| 4 | exists     | exists     | no match         | matches            | copy from other key |
| 5 | exists     | exists     | no match         | no matches         | upload              |
| 6 | is missing | exists     | -                | -                  | delete              |
|---+------------+------------+------------------+--------------------+---------------------|

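The table can be read as a single decision function. The sketch below is one way to encode it, not thorp's actual code: ~decide~ and its parameters are hypothetical names, and a missing file or key is represented by ~None~.

```python
from typing import Optional, Set


def decide(local_hash: Optional[str],
           remote_hash: Optional[str],
           other_remote_hashes: Set[str]) -> str:
    """Return the action for one key, following the table row by row.

    local_hash: hash of the local file, or None if the file is missing.
    remote_hash: hash stored under the same remote key, or None if missing.
    other_remote_hashes: hashes of objects stored under other remote keys.
    """
    if local_hash is None:
        if remote_hash is None:
            # Not covered by the table: nothing exists on either side.
            raise ValueError("neither a local file nor a remote key exists")
        return "delete"  # row 6
    if remote_hash == local_hash:
        return "do nothing"  # row 1
    if local_hash in other_remote_hashes:
        return "copy from other key"  # rows 2 and 4
    return "upload"  # rows 3 and 5
```

Rows 2 and 4 collapse into one branch, as do rows 3 and 5, because once the same-key hash fails to match it no longer matters whether the remote key exists.
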
* Executable JAR

To build an executable jar, run ~sbt assembly~.

This will create the file ~cli/target/scala-2.13/thorp~.

Copy this file to a directory on your ~PATH~.