Commit graph

35 commits

Author SHA1 Message Date
f35ea9795d
Create and use a cache of hashes for local files (#249)
* [domain] Define Hashes in domain package

* [filesystem] Load and parse any .thorp.cache files found

* [filesystem] Use cached file data when available and up-to-date

* [lib] FileScanner refactoring

* [filesystem] scan sub-dirs first to minimise time cache is on heap

* [filesystem] Write new cache data to temp file

* [lib] replace cache file when finished updating

* [filesystem] AppendLines to correct file with new lines

* [domain] decode HashType from String

* [filesystem] Store last modified time as epoch milliseconds

* [filesystem] parse lastmodified as a long

* [filesystem] use all hash values in cache

* [lib] FileScanner rearrange code

* [lib] Create and use a single cache file per source

* [storage-aws] Use ETag hash from cache when available

* [filesystem] Merge file data together correctly

* [filesystem] Handle exceptions thrown by Files.mode correctly

* [readme] Add section on caching

* [changelog] updated

* [changelog] add pending dependencies notes

* [lib] Filters should not name methods after their defining object

* [lib] Fix up test
2019-10-27 19:53:00 +00:00
1bca90c815 [readme] minor update 2019-10-08 19:35:24 +01:00
844d44596b [readme] updated 2019-09-27 16:12:28 +01:00
b15350d959
Sync more than one source directory into a single bucket/prefix (#25)
* [changelog] updated

* [readme] updated

* [core] ConfigQuery added sources()

* [cli] ParseArgs allow specifying multiple sources

* [domain,core,cli] Source datatype changed to Path

* [domain] Sources added to hold multiple paths in order

* [domain] Config sources change datatype to Sources

* [core] Scan sources for .thorp.config and include any sources listed

This allows the inclusion of a `.thorp.config` file in a source with a
single line `source = ....` that causes that other source to also be
synched into the same remote prefix as the current source.

* [core] ConfigurationBuilderTest add more pending tests

* [core] ConfigurationBuilderTest rewrite using loan-pattern for fixtures

* [core] ConfigOptionTest use TemporaryFolder

* [core] ConfigOptionTest remove unused fields

* [cli] ParseArgsTest don't use get on an Option

* [core] ConfigurationBuilderTest don't use get on Either

* [core] TemporaryFolder Move import to top of file

* [core] TemporaryFolder use Try over try-finally

* [core] ConfigurationBuilderTest don't use get on Either

* [core] TemporaryFolders.withDirectory propagate errors

* [core] TemporaryFolders add writeFile and createFile

* [core] PlanBuilderTest create a plan with two sources with unique files in both

* [core] ActionGenerator only upload file by name in first source

create a plan
  two sources
    same filename in both
    - only upload file in first source

* [domain] LastModified with no params is now()

* [core] PlanBuilderTest 2 sources w/remote only in 2nd src do nothing

* [core] PlanBuilderTest 2 sources w/remote only in 2nd src do nothing

* [domain] RemoteKey map to a file when prefix is empty

* [domain] S3ObjectData defaults to empty

* [core] KeyGenerator Avoid delimiter when empty prefix key

* [core] PlanBuilderTest when remote not in sources delete from remote

* [core] PlanBuilderTest extract helper md5Hash()

* [core] PlanBuilderTest one source a file no matching remote key

* [core] PlanBuildingTest file with matching key and hash do nothing

* [core] PlanBuilderTest file w/matching remote key and different hash

* [core] PlanBuilderTest a remote key with and without local file

* [core] DummyStorageService Use wildcards when selecting more than 6 elements
2019-07-12 07:42:42 +01:00
a6c767f16f
[readme] add maven version badge 2019-07-06 17:20:33 +01:00
1267b6e313
Add a batch mode that provides a simple log output (#85)
* [changelog] Updated

* [readme] Updated

* [domain] Config Add batch-mode flag

* [core] ConfigOption Add BatchMode option

* [core] ConfigQuery Add batchMode query

Also replaced verbose exists case clauses with a simple contains.

* [core] ConfigOptions added to replace Seq[ConfigOption]

* [core] Synchronise rename method to createPlan

* [cli] Program rename apply as run

* [storage-aws] S3StorageServiceBuilder stop using IO to create object

* [storage-aws] S3StorageServiceBuilder make default service lazy

* [storage-aws] Rename S3ClientCopier => Copier

* [storage-aws] Rename S3ClientDeleter => Deleter

* [storage-aws] Rename S3ClientObjectLister => Lister

* [storage-aws] Only attach upload listener when in batch mode

Only detects batch mode when selected as a command line option

* [core] Synchronise use leftMap rather than swap.map.swap

* [cli] ParseArgs add `-B` and `--batch` options to enable batch mode

* [core] ThorpArchive logs file uploaded when in batch mode
2019-07-02 08:43:52 +01:00
561966218c
Remove repo badges
Until releases are made, these are just distractions.
2019-06-30 19:49:22 +01:00
cff67401be [readme] updated 2019-06-30 15:31:00 +01:00
761c1c9784
Is AWS SDK calculating MD5Hash again for a local file? (#50)
* [aws-lib] Uploader provide request with the already calculated md5 hash

* [aws-lib] remove unused accepts method

* [aws-lib] Uploader refactoring

* [domain] Config remove unused threshold and max retries items

* [core] Show upload errors in summary

* [domain] LocalFile add helper to explicitly compare by hash value

Looking to add an optional field to MD5Hash but we want to do our
checks here only on the hash value, not whether a digest is available
or not.

* [core] Sync refactoring

* [core] SyncSuite invoke subject inside it method and after declaring expectations

* [core] SyncSuite use the localfile hash rather than something arbitrary

* [cli] Add `--no-global` and `--no-user` options

* [core] LocalFileStream refactoring

* [core] SyncSuite: ignore user and global configuration files

* [domain] MD5Hash now can optionally store the base64 encoded hash

* [core] MD5HashGenerator pass the digest to MD5Hash

* [aws-lib] Uploader use the base64 encoded hash

* [changelog] updated
2019-06-21 19:20:35 +01:00
7e9db432d7
[readme] update codacy badge 2019-06-21 13:59:43 +01:00
0a92667d3c
Add support for global and user configuration files (#73)
* [core] ConfigurationBuilder reads user and global config files

* [changelog] updated

* [readme] updated
2019-06-20 17:41:08 +01:00
9196dd623f
Rename project to Thorp (#75)
* [sbt] change application name

* [cli] rename package

* [cli] Change displayed application name and description

* [domain] rename package

* [core] fix bad package directory structure

* [core] rename package

* [aws-lib] rename package

* [aws-api] rename package

* [cli] rename program for usage message

* [bin] rename and update script

* [gitignore] update

* [readme] update
2019-06-17 15:33:49 +01:00
90770eaafb
[cli] Remove verbosity flag (#63) 2019-06-14 20:21:58 +01:00
e3675b5394
Add a debug flag and make debug message hidden by default (#60)
* [cli] add a debug flag to control logging

* [core] show entering a directory as a debug message
2019-06-14 20:00:22 +01:00
5996632c1e
Enable running outside of sbt (#55)
2019-06-11 23:36:08 +01:00
1a223a110b
[readme] Fix codacy badge properly
Use shields.io to generate the badge.
2019-06-11 20:46:00 +01:00
724a902510
[readme] trying to fix codacy badge 2019-06-11 20:42:40 +01:00
f362a332f6 [readme] Add codacy badge 2019-06-11 08:04:31 +01:00
f7295025bc [readme] updated 2019-06-11 08:04:31 +01:00
86bb802468 [readme] add note about broken native images 2019-05-30 18:38:23 +01:00
35e85702aa
Rename filter as exclude (#23)
* [Filter => Exclude] rename class

* [Config] rename filters field as excludes

* [readme,ParseArgs] change commandline arg f to x and filters to excludes

* [SyncSuite] rename val

* [ExcludeSuite] rename vars

* [SyncLogging] Update message
2019-05-27 23:23:13 +01:00
bffc6c032c
Support multiple filters (#18)
* Support multiple filters

* Clean up imports

* [S3ClientLogging] log the remote key value

* Update changelog, readme and long arg name

* [SyncSuite] update test
2019-05-23 19:35:48 +01:00
0fe9b86471
Simple Exclusion Filter (#16)
* [filter] Parse filter from command line and add to config

* [filter] exclude files that match the filter
2019-05-23 09:21:09 +01:00
eacfc37095
Handle renames (#14)
* [sync] move thunks to s3client to bottom of class

Also, use the thunk methods from within run rather than accessing the
s3client object directly.

* Layout tweaks to put each parameter on own line

* [syncsuite] value renames and move sync.run outside it() call

Future tests will be evaluating the result of that call, so this
avoids repeatedly calling it.

* Add first pass at copy methods and some delete stubs

* [Bucket] Convert from type alias for String to a case class

* [SyncSuite] mark new tests as pending

* [RemoteKey] Convert from type alias for String to a case class

* [MD5Hash] Convert from type alias for String to a case class

* [LastModified] Convert from type alias for String to a case class

* [LocalFile] Revert to using a normal File

* [Sync] Use a for-comprehension and restructure S3MetaData

The for-comprehension will make it easier to generate multiple actions
out of the stream of enriched metadata. The restructured S3MetaData
avoids the need to wrap it in an Either in some cases.

* [ToUpload] Add a wrapper to indicate action required on File

* [S3Action] Stub actions for IO events

* [S3Action] Use UploadS3Action

* [Sync] Fix formatting when echoing parameters

* [logging] Change log level down to 4 for listing every file considered

* [Sync] Use a case class to hold counters

* [HashModified] Add case class to replace MD5Hash, LastModified tuples

* [logging] Move file considered logging to source of files

Rather than logging this where adding meta data, move to where the
files are being initially identified.

* [logging] Log all final counters

* Pass Config and HashLookup as implicit parameters

* [LocalFileStream] rename method as findFiles

* [S3MetaDataEnricher] rename method as getMetadata

* Rename selection filter and uploader trait and methods

* [MD5HashGenerator] Extract as trait

* [Action] Convert ToUpload into an Action sealed trait

* [ActionGenerator] refactored and removed logging

* fix up tests

* [LocalFileStream] adjust logging

* [RemoteMetaData] Added

* [ActionGenerator] remove redundant braces

* [LocalFile] Added as wrapper for File

* [Sync] run: remove redundant braces

* [Sync] run: rename HashLookup as S3ObjectsData

* WIP - toward copy action

* Extract S3ObjectsByHash for grouping

* extract internal wrapper for S3CatsIOClient

Remove some boilerplate from the middle of a test

* Explicitly name the Map parameters in expected result

* All lastModified are the same to avoid confusion

We aren't testing this field, just that the keys and hash values are correct.

* Rename variable

* space out object creation

* Fix test - error in expected result

Code has been working for ages!

* [readme] condense and simplify behaviour table, adding option delete

Reduce the complexity by only noting the distinct attributes leading
to each action.

Add the action of delete when a local file is missing.

* [S3MetaDataEnricherSuite] rename tests and note missing tests

* [ActionGeneratorSuite] rename tests and note missing tests

* Note unwritten tests as such

* [ActionGenerator] #2 local exists, remote is missing, other matches

* [S3ClientSuite] fix tests

* [S3MetaDataEnricherSuite] #2a local exists, remote is missing, remote matches, other matches - copy

* [S3MetaDataEnricherSuite] drop 'remote is missing, remote matches'

Impossible to represent this combination

* [S3MetaDataEnricherSuite] #3 local exists, remote is missing, remote no match, other no matches - upload

* [S3MetaDataEnricherSuite] Tests #1-3 rename variables consistently

* [S3MetadataEnricherSuite] #4 local exists, remote exists, remote no match, other matches - copy

* [S3MetadataEnricherSuite] #5 local exists, remote exists, remote no match, other no matches - upload

* [S3MetadataEnricherSuite] drop test #6 - no way to make request

* [ActionGeneratorSuite] standardise tests 2-4

* [ActionGeneratorSuite] #1 local exists, remote exists, remote matches - do nothing

* [ActionGeneratorSuite] Comment expected outcome

* [ActionGeneratorSuite] #5 local exists, remote exists, remote no match, other no matches - upload

* [Action] Add ToDelete case class

* Use ToDelete and fix up return types for DeleteS3Action

* [ActionGenerator] Add explicit case for #1

* [ActionGenerator] Add explicit check for local exists in #2

* [ActionGenerator] match case against #3

* [ActionGenerator] simplify case and match against #5

* [ActionGenerator] Add case for #4

* [ActionGenerator] Remove explicit checks for file existing

If we are called with a LocalFile parameter then we assume the file exists.

* [ActionGenerator] Avoid #1 matching condition #5

* [ActionGeneratorSuite] enable tests

* [test] remove stray println

* [SyncSuite] Add test helper RecordingSync

* [SyncSuite] Use RecordingSync

* [SyncSuite] enable rename test - excluding delete test

* [Sync] log and increment counters for copy and delete

* [Sync] Use case matched RemoteKey in log message

* [Sync] Reorder actions to do copy then upload then delete

* [S3Action] Drop Move as a distinct action

Can be implemented as a Copy followed by a Delete.

* [S3Action] Actions are ordered Copy, Upload then Delete

This allows sequencing of actions so that all the quick-to-accomplish
copies take place before bandwidth/time-costly uploads or destructive
deletes. Deletes come last, after they have had the opportunity to be
used as the source for any copies.

* [Sync] Use S3Action's default sorting

* [Sync] extract logging of activity

* [SyncLogging] Extract logging out of Sync

Single Responsibility principle - Sync knows nothing about how it
logs, it just delegates to SyncLogging.

* [Sync] Rename variables and extract sort into private def

* [SyncLogging] Use IO context

* [SyncLogging] Remove moved counter

* [SyncLogging] Clean up and log start of run config info

* Verify that IO actions are evaluated before the program terminates

* [Sync] ensure logging runs

* [ActionGenerator] Don't upload files every time

* [ActionGenerator] fix remote hash for #5

* [SyncSuite] Add tests for delete and delete after rename

* [RemoteKey] Add asFile and isMissingLocally helpers

* [Sync] Generate delete actions

* Remove old extensions upon MD5HashGenerator

* [MD5Hash] prevent confusion by never allowing quotes

This means we need to filter quotes from md5hash values at source

* [Sync] ensure start log message is run

* [ThorpS3Client] Fix passing parameters for source key

* [ThorpS3Client] reformat byKey for clarity

* [S3Client] Add level 5 logging around s3 sdk calls

* fix up tests
2019-05-22 13:55:03 +01:00
00743c425c
Add configurable logging levels, selected from command line argument (#12)
* [config,parseargs] Accept v/verbose command line argument

* [parseargs] lowercase program name

* [logging] Log messages based on command line argument

* [readme] update usage
2019-05-16 21:59:40 +01:00
56a45b6e2a [readme] Move to do items to Github issues 2019-05-16 17:09:27 +01:00
608b9a9e7f [readme] minor tweaking 2019-05-16 16:40:33 +01:00
1c3e2676d6 [readme] update to do list 2019-05-13 22:52:33 +01:00
64b6585f47 [readme] add instructions to create native image 2019-05-13 22:49:50 +01:00
11cbcb2312 Use logging in place of println 2019-05-11 20:18:55 +01:00
49411b546b [readme] Add a couple more todo items 2019-05-11 18:09:00 +01:00
8affe49ce4 [readme] Update readme 2019-05-11 07:55:09 +01:00
960e336867 [readme] rewritten readme 2019-05-10 22:44:27 +01:00
a4c61e264c [readme] Add impression of aws-s3-sync-by-hash process 2019-05-06 17:15:43 +01:00
1b0d1ebdbf [readme] Added 2019-04-29 20:10:38 +01:00