thorp/README.org
Paul Campbell f35ea9795d
Create and use a cache of hashes for local files (#249)
* [domain] Define Hashes in domain package

* [filesystem] Load and parse any .thorp.cache files found

* [filesystem] Use cached file data when available and up-to-date

* [lib] FileScanner refactoring

* [filesystem] scan sub-dirs first to minimise time cache is on heap

* [filesystem] Write new cache data to temp file

* [lib] replace cache file when finished updating

* [filesystem] AppendLines to correct file with new lines

* [domain] decode HashType from String

* [filesystem] Store last modified time as epoch milliseconds

* [filesystem] parse lastmodified as a long

* [filesystem] use all hash values in cache

* [lib] FileScanner rearrange code

* [lib] Create and use a single cache file per source

* [storage-aws] Use ETag hash from cache when available

* [filesystem] Merge file data together correctly

* [filesystem] Handle exceptions thrown by Files.mode correctly

* [readme] Add section on caching

* [changelog] updated

* [changelog] add pending dependencies notes

* [lib] Filters should not name methods after their defining object

* [lib] Fix up test
2019-10-27 19:53:00 +00:00

92 lines
3.7 KiB
Org Mode

* thorp
Synchronisation of files with S3 using the hash of the file contents.
[[https://www.codacy.com/app/kemitix/thorp][file:https://img.shields.io/codacy/grade/c1719d44f1f045a8b71e1665a6d3ce6c.svg?style=for-the-badge]]
[[https://search.maven.org/search?q=net.kemitix.thorp][file:https://img.shields.io/maven-central/v/net.kemitix.thorp/thorp_2.12.svg?style=for-the-badge]]
Originally based on Alex Kudlick's [[https://github.com/akud/aws-s3-sync-by-hash][aws-s3-sync-by-hash]].
The normal ~aws s3 sync ...~ command only uses the time stamp of files
to decide what files need to be copied. This utility looks at the md5
hash of the file contents.
* Usage
#+begin_example
thorp
Usage: thorp [options]
-V, --version Display the version and quit
-B, --batch Enabled batch-mode
-s, --source <value> Source directory to sync to S3
-b, --bucket <value> S3 bucket name
-p, --prefix <value> Prefix within the S3 Bucket
-P, --parallel <value> Maximum parallel upload/copy operations
-i, --include <value> Include matching paths
-x, --exclude <value> Exclude matching paths
-d, --debug Enable debug logging
--no-global Ignore global configuration
--no-user Ignore user configuration
#+end_example
If you don't provide a ~source~ the current diretory will be used.
The ~--include~ and ~--exclude~ parameters can be used more than once.
The ~--source~ parameter can be used more than once, in which case,
all files in all sources will be consolidated into the same
bucket/prefix.
** Batch mode
Batch mode disable the ANSI console display and logs simple messages
that can be written to a file.
* Configuration
Configuration will be read from these files:
- Global: ~/etc/thorp.conf~
- User: ~ ~/.config/thorp.conf~
- Source: ~${source}/.thorp.conf~
Command line arguments override those in Source, which override
those in User, which override those Global, which override any
built-in config.
When there is more than one source, only the first ".thorp.conf"
file found will be used.
Built-in config consists of using the current working directory as
the ~source~.
Note, that ~include~ and ~exclude~ are cumulative across all
configuration files.
* Caching
The last modified time for files is used to decide whether to calculate the hash values for the file. If a file has not been updated, then the hash values stored in the `.thorp.cache` file located in the root of the source is used. Otherwise the file will be read to caculate the the new hashes.
* Behaviour
When considering a local file, the following table governs what should happen:
|---+------------+------------+------------------+--------------------+---------------------|
| # | local file | remote key | hash of same key | hash of other keys | action |
|---+------------+------------+------------------+--------------------+---------------------|
| 1 | exists | exists | matches | - | do nothing |
| 2 | exists | is missing | - | matches | copy from other key |
| 3 | exists | is missing | - | no matches | upload |
| 4 | exists | exists | no match | matches | copy from other key |
| 5 | exists | exists | no match | no matches | upload |
| 6 | is missing | exists | - | - | delete |
|---+------------+------------+------------------+--------------------+---------------------|
* Executable JAR
To build as an executable jar, perform `sbt assembly`
This will create the file `cli/target/scala-2.13/thorp`
Copy this file to your `PATH`.