Commit graph

145 commits

Author SHA1 Message Date
44c66c042c
Include and Exclude behave more like the AWS CLI (#48)
* [domain] rename Filter as Include

* [cli]ParseArgs allow exclude and include parameters to be repeated

* [core] don't include include/exclude details in logging

* [domain] combine Include and Exclude into Filter

Config now collects Includes and Excludes into a single list and passes
each file to the Filter.isIncluded method, with the list of Filters,
to determine if a file should be included.
2019-06-08 20:31:20 +01:00
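
The combined filter list described in this commit can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code (which appears to be Scala). The `Include` and `Exclude` classes and the `is_included` function are hypothetical names. It assumes the AWS CLI convention that every file starts out included and that later filters take precedence over earlier ones; regex matching is also an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Include:
    pattern: str  # hypothetical: a regular expression matched against the path


@dataclass(frozen=True)
class Exclude:
    pattern: str  # hypothetical: a regular expression matched against the path


def is_included(filters, path):
    """Apply the filters in order; the last filter that matches decides."""
    included = True  # assumption: a file is included unless a filter says otherwise
    for f in filters:
        if re.search(f.pattern, path):
            included = isinstance(f, Include)
    return included


filters = [Exclude(r"\.log$"), Include(r"^keep/")]
print(is_included(filters, "app/debug.log"))   # excluded by the first filter
print(is_included(filters, "keep/debug.log"))  # re-included by the later filter
print(is_included(filters, "src/Main.scala"))  # no filter matches
```

Keeping both kinds of filter in one ordered list is what lets a later include override an earlier exclude, which two separate lists could not express.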
7ffa386b29
[core] MD5HashGenerator uses IO to return where there is file IO (#47)
* [core] MD5HashGenerator uses IO to return where there is file IO

This required that LocalFile in the domain module no longer be
supplied with a function to convert a File into an MD5Hash. Because
such a function requires reading the file it now must use IO, which we
don't allow in the domain module.

Unfortunate ripple effects out to users of MD5HashGenerator and
LocalFile.

* [aws-lib] Add own copy of test class MD5HashData
2019-06-08 18:19:15 +01:00
f6bf2700ff
Fetch all hashes (#45)
* [core] close files after calculating their MD5 hash

FileInputStream was never closed, so it eventually ran into a
"too many open files" error.

Will come back to look at this again with IO.bracket for better
guarantee that FIS is closed.

Signed-off-by: Paul Campbell <pcampbell@kemitix.net>

* [aws-lib] Fetch all MD5 hashes under prefix

Initial request only returns the first 1000.
2019-06-08 11:44:22 +01:00
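
Both fixes in this commit follow common patterns, sketched here in Python for illustration only (the project appears to be Scala, and none of these names come from it). `md5_of_file` closes the file through a `with` block, and `list_all` keeps requesting pages until no continuation token is returned. `make_fake_service` is a hypothetical stand-in for a listing API that returns at most 1000 keys per request.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Hash a file in chunks; the with-block closes it even if reading fails."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_all(fetch_page):
    """Keep requesting pages until the service returns no continuation token."""
    items, token = [], None
    while True:
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if token is None:
            return items


def make_fake_service(keys, page_size=1000):
    """Hypothetical paged listing API; the token is the index of the next key."""
    def fetch_page(token):
        start = token or 0
        end = start + page_size
        next_token = end if end < len(keys) else None
        return keys[start:end], next_token
    return fetch_page


keys = [f"prefix/file-{i}" for i in range(2500)]
all_keys = list_all(make_fake_service(keys))
print(len(all_keys))  # all three pages are collected, not just the first
```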
Scala Steward
07f12ac19f Update aws-java-sdk-s3 to 1.11.568 (#42) 2019-06-08 08:21:07 +01:00
Scala Steward
bb4e7cff4f Update scalamock to 4.2.0 (#43) 2019-06-08 08:20:33 +01:00
742a6f130f
Create LICENSE 2019-06-07 21:25:23 +01:00
aa7fb1eb24
Drop AWS SDK V2 client (#41)
* [sbt] add scalamock as a test dependency

* [aws-lib]SyncSuite: minor layout changes

* [aws-lib]SyncSuite: remove test

* [core] move SyncSuite to same module as subject it tests

* [aws-lib]ThorpS3Client: remove commented lines

* [aws-lib] remove PutObject versions of Uploader

* [aws-lib] rename to TransferManager to remove Multi-part from name

* [aws-lib]TransferManager: change logging prefix

* [aws-lib] convert logging classes to objects

* [aws-lib] convert ObjectLister to use V1 SDK

* [aws-lib] convert Copier to use V1 SDK

* [aws-lib] extract S3ObjectsBy{Hash,Key} to objects

* [aws-lib]S3ClientSuite: rewrite test using mocks

* [aws-lib]TransferManager rewrite using for-comprehension

* [aws-lib]Copier: remove bucket name from target remote key

* [aws-lib]TransferManager: refactor logging to use IO themselves

* [aws-lib] Remove test class MyAmazonS3

* [aws-lib]ObjectLister: optimise imports

* [aws-lib] S3ClientSuite remove commented code

* [aws-lib]ThorpS3ClientSuite update to V1 api

* [aws-lib]S3ClientSuite: mark test as pending

It works okay on its own, but when run as part of a suite it fails.

Will look at this again once all V2 SDK is removed.

* [aws-lib] convert Deleter to use V1 SDK

* [aws-lib] Client Logging remove redundant braces

* [aws-lib] stop injecting the V2 SDK

* [sbt] remove v2 SDK dependencies

* [aws-lib] remove redundant helpers for v2 SDK

* [sbt] upgrade aws jackson dependencies

The jackson libraries used by AWS have security flaws, but are Java 6
compatible, which AWS want to preserve.

* [aws-lib] clean up TransferManager tests
2019-06-07 21:17:14 +01:00
Scala Steward
908ac855ff Update aws-java-sdk-s3 to 1.11.567 (#38) 2019-06-07 21:04:00 +01:00
Scala Steward
e5078fd0b6 Update cats-effect to 1.3.1 (#39) 2019-06-07 21:03:45 +01:00
Scala Steward
96fc2812dd Update aws-java-sdk-s3 to 1.11.566 (#35) 2019-06-06 19:54:17 +01:00
8ec667343a
Improve Code Quality (#37)
* [core] convert QuoteStripper to an object and move to core

* [aws-lib]S3ClientUploader: use case matching instead of else if blocks

* [aws-lib] put imports at top of file

* [domain] remove redundant braces after class definition

* [aws-lib] remove redundant braces after class definition

* [core] avoid using head on a collection
2019-06-06 19:49:07 +01:00
f54c50aaf3
Split into subprojects (#36)
* [sbt] define existing single module project as legacyRoot

* [sbt] add empty cli module depending on legacyRoot

* [cli] move Main to cli module

* [cli] move ParseArgs to cli module

* [sbt] limit scope of scopt dependency to cli module

* [cli] moved logging config to cli module

* [cli] rename module directory

* [aws-api] added empty module

* [sbt] aggregate builds from cli

* [aws-lib] add empty module

* [core] add empty module

* [sbt] add comment graphing module dependencies

* [sbt] adjust module dependencies to reflect plan

Include legacyRoot at the base until it can be redistributed

* [legacy] make some awssdk classes non-private

during this transition, these classes being private would cause problems

* [aws-lib] create S3ClientBuilder

This is copied from the legacy S3Client companion object

* [domain] add empty module

* [domain] move Bucket into module

* [legacy] RemoteKey no longer has dependency on Config

* [domain] move RemoteKey into module

* [domain] move MD5Hash into module

* [legacy] LocalFile no longer has dependency on MD5HashGenerator

* [domain] move LocalFile into module

* [domain] move LastModified into module

* [domain] move RemoteMetaData into module

* [domain] move S3MetaData into module

* [domain] move Exclude into module

* [domain] move Filter into module

* [domain] move KeyModified into module

* [domain] move HashModified into module

* [domain] RemoteKey.resolve added

* [domain] add dependency on scalatest

* [domain] LocalFile.resolve added

* [legacy] Remove UnitTest

* [legacy] optimise imports

* [domain] move S3ObjectsData into module

* [legacy] wrapper for using GeneralProgressListener

* [domain] move Config into module

* [sbt] move aws-api below legacyRoot in dependencies

This will allow us to move S3Client into the aws-api module

* [legacy] rename S3Client companion as S3ClientBuilder

Preparation to move this into its own file.

* Inject Logger via CLI (#34)

* [S3Client] refactor defaultClient()

* [S3Client] transfermanager explicitly uses the same s3client

* [S3ClientPutObjectUploader] refactor putObjectRequest creation

* [cli] copy in Logging trait as Logger class

* [cli] Main uses Logger

* [cli] simplify Logger and pass to Sync.run

* [legacy] SyncLogging converted to companion

* [cli] Logger info can more easily use levels again

* [legacy] LocalFileStream uses injected info

* [legacy] S3MetaDataEnricher remove unused Logging

* [legacy] ActionGenerator remove unused Logging

* [legacy] convert ActionGenerator to an object

* [legacy] import log methods from SyncLogging

* [legacy] move getS3Status from S3Client to S3MetaDataEnricher

* [legacy] convert ActionsSubmitter to an object

* [legacy] convert LocalFileStream to an object

* [legacy] move Action case classes inside companion

* [legacy] move UploadEvent case classes inside companion and rename

* [legacy] move S3Action case classes into companion

* [legacy] convert Sync to an object

* [cli] Logger takes verbosity level at construction

No longer needs to be passed the whole Config implicitly for each info
call.

* [legacy] stop passing implicit Config for logging purposes

Pass a more specific implicit info: Int => String => Unit instead

* [legacy] remove DummyS3Client

* [legacy] remove Logging

* [legacy] convert MD5HashGenerator to an object

* [aws-api] move S3Client into module

* [legacy] convert KeyGenerator to an object

* [legacy] don't use IO.unsafeRunSync directly

* [legacy] refactor/rewrite Sync.run

* [legacy] Rewrite sort using a for-comprehension

* [legacy] Sync inline sorting

* [legacy] SyncLogging rename method

* [legacy] repair tests

* [sbt] move core module to a dependency of legacyRoot

* [sbt] add test dependencies to core module

* [core] move classes into module

* [aws-lib] move classes into module

* [sbt] remove legacy root
2019-06-06 19:24:15 +01:00
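
The logging change in this commit (a Logger that takes its verbosity at construction, passed around as `info: Int => String => Unit`) can be illustrated with a curried function. This is a Python sketch under assumptions, not the project's Scala code: `make_info`, the `sink` parameter and the message format are all hypothetical.

```python
def make_info(verbosity, sink=print):
    """Return info(level)(message); messages above the verbosity are dropped."""
    def info(level):
        def emit(message):
            if level <= verbosity:
                sink(f"[{level}] {message}")  # assumed format, for illustration
        return emit
    return info


lines = []
info = make_info(verbosity=2, sink=lines.append)
info(1)("Scanning local files")
info(4)("Considering file: a.txt")  # dropped: level 4 exceeds verbosity 2
print(lines)
```

Fixing the verbosity once means callers only need the `info` function itself, rather than the whole configuration.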
Scala Steward
b7e79c0b36 Update aws-java-sdk-s3 to 1.11.564 (#33) 2019-06-06 13:11:27 +01:00
bb283a12d4
Final case classes (#32)
* case classes are final

* [Counters] Extract to its own file

* [LocalFile] allow overriding Hash
2019-06-01 22:43:26 +01:00
Scala Steward
0386fde322 Update aws-java-sdk-s3 to 1.11.563 (#31) 2019-05-31 09:34:44 +01:00
86bb802468 [readme] add note about broken native images 2019-05-30 18:38:23 +01:00
574d4c5885
Display upload progress (#29)
* [S3ClientMultiPartTransferManager] use request object

* [ActionSubmitter] unwrap RemoteKey in log messages

* [ActionSubmitter] rename variable

* [Logging] include log level in info messages

* [LocalFileStream] log when entering directory at level 2

* [UploadProgress{Listener,Logging}] add initial implementations

* [S3Client] def upload now requires an UploadProgressListener as a parameter

* [UploadProgressListener] rename method

* [S3ClientPutObjectUploader] Log upload progress for file <5Mb

Switched to using the AWS SDK V1 for PutObject as the V2 doesn't
support progress callbacks.

* Fix up tests

* Adjust logging levels
2019-05-30 16:59:37 +01:00
Scala Steward
602c5ef150 Update aws-java-sdk-s3 to 1.11.562 (#30) 2019-05-30 15:12:34 +01:00
Scala Steward
5011779007 Update aws-java-sdk-s3 to 1.11.561 (#26) 2019-05-29 09:42:40 +01:00
b9bc7dc957
Add filter to select files to be synced (#24)
* [Filter] added

* [Config] Add filters field

* [ParseArgs] Add '-f'/'--filter' parameters

* [LocalFileStream] apply filters

* [SyncLogging] show filter(s)

* [LocalFileStream] Don't apply filter to directories

The filter may match on a file within a directory, but if the filter
fails on the directory alone, then we weren't recursing into the
directory at all.
2019-05-28 12:24:09 +01:00
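
The directory fix in this commit amounts to recursing into every directory and applying the filter to files only. A minimal Python sketch of that idea, for illustration only: `find_files` and `is_wanted` are hypothetical names, and the regex filter is an assumption.

```python
import os
import re
import tempfile


def find_files(root, is_wanted):
    """Recurse into every directory; apply the filter to files only."""
    found = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            found.extend(find_files(path, is_wanted))  # directories are never filtered
        elif is_wanted(path):
            found.append(path)
    return found


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs"))
    for name in ("notes.txt", os.path.join("docs", "readme.md")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write("x")
    matches = find_files(root, lambda p: re.search(r"\.md$", p) is not None)
    relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in matches]

print(relative)  # the file inside docs/ is found although "docs" itself does not match
```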
35e85702aa
Rename filter as exclude (#23)
* [Filter => Exclude] rename class

* [Config] rename filters field as excludes

* [readme,ParseArgs] change commandline arg f to x and filters to excludes

* [SyncSuite] rename val

* [ExcludeSuite] rename vars

* [SyncLogging] Update message
2019-05-27 23:23:13 +01:00
082babb94d
Use multi-part upload for large files (i.e. files > 5Mb) (#22)
* [ThorpS3Client] Extract QuoteStripper and S3ClientObjectLister

* [ThorpS3Client] Extract S3ClientUploader

* [ThorpS3Client] Extract S3ClientCopier

* [ThorpS3Client] Extract S3ClientDeleter

* [ThorpS3Client] Can select upload strategy based on file size

Currently switches to an alternate that is a clone of the original
method.

* [MD5HashGenerator] Add md5FilePart

Reimplement md5File using md5FilePart

* [MyS3CatsIOClient] extracted

* [S3ClientMultiPartUploader] add tests for accept def

* [S3ClientMultiPartUploader] initiate multi-part upload

* [Md5HashGenerator] add tests reading part of a file = failing test

* [Md5HashGenerator] fix when reading part of a file

* [S3ClientMultiPartUploader] create UploadPartRequests

* [S3ClientMultiPartUploader] uploadPart delegates to an S3Client

* [S3ClientMultiPartUploader] uploadParts uploads each part

* [S3ClientMultiPartUploader] complete upload should completeUpload

* [S3ClientMultiPartUploader] upload file tests when all okay

* [S3ClientMultiPartUploader] Use Recording client in component tests

* [s3ClientMultiPartUploader] remove unused variable

* [S3ClientMultiPartUploader] failing test for init upload error

* [S3ClientMultiPartUploader] Handle errors during multi-part upload

* [S3ClientMultiPartUploader] Retry uploads

* [S3Action] ErroredS3Action now holds the error

* [S3ClientMultiPartUploader] Add logging

* [S3ClientMultiPartUploader] Display warning messages

* [S3ClientMultiPartUploader] test creation of CreateMultipartUploadRequest

* [S3ClientMultiPartUploader] specify bucket in UploadPartRequest

* [S3ClientMultiPartUploader] verify complete request has upload id

* [S3ClientMultiPartUploader] verify abort request contains upload id

* [S3ClientMultiPartUploader] add logging around retry errors

* [S3ClientMultiPartUploader] verify upload part request had remote key

* [S3ClientMultipartuploaderLogging] refactoring/rewriting strings

* [S3ClientMultiPartUploader] add bucket to abort request

* [S3ClientMultiPartUploader] part numbers must start at 1

* [S3ClientMultiPartUploader] fix capitalisation in comment

* [Config] define maxRetries

* [S3ClientMultiPartUploader] abort request should have the remote key

* [S3ClientMultiPartUploader] display remote key properly

* [S3ClientMultiPartUploader] rename method for plural parts

* [S3ClientMultiPartUploader] log hash and part number

* [MD5HashGenerator] support creating hash from a byte array

* [sbt] add aws-java-sdk-s3 (v1) for multi-part uploads

The reactive-aws-s3-* library is based on the V2 of the Java library,
which doesn't support multi-part uploads.

* [S3ClientMultiPartUploader] use Amazon S3 Client (from v1 sdk)

* [S3ClientMultiPartUploader] include file and offset in upload part request

* [S3ClientMultiPartUploader] Add part etags to complete request

* [S3ClientMultiPartUploader] Use withers to create requests

* [S3ClientMultiPartUploader] don't bounce responses to tags when client accepts them as is

* [MD5HashGenerator] use MD5Hash

* [S3ClientMultiPartUploader] include hash in sending log message

* [S3ClientMultiPartUploader] tests throw correct exception

* [S3ClientMultiPartUploader] Include returned hash in error and log when send is finished

* [S3ClientUploader] Extract as trait, renaming implementations

* [S3Client] upload def now requires tryCount

* [S3ClientUploader] add accepts to trait

* [S3ClientMultiPartUploaderSuite] remove ambiguity over class import

* [S3ClientMultiPartTransferManager] implement and use
2019-05-27 20:37:59 +01:00
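
Several of the steps in this commit (hashing part of a file, including the offset in each part request, numbering parts from 1) come down to slicing a file into parts. A rough Python sketch, for illustration only: `plan_parts` and `md5_of_part` are hypothetical names, and the 5 MiB part size is an assumption based on the 5Mb threshold in the commit title.

```python
import hashlib

PART_SIZE = 5 * 1024 * 1024  # assumption: 5 MiB parts, matching the 5Mb threshold


def plan_parts(file_size, part_size=PART_SIZE):
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)  # the last part may be shorter
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def md5_of_part(data, offset, length):
    """Hash only the bytes belonging to one part."""
    return hashlib.md5(data[offset:offset + length]).hexdigest()


data = bytes(range(25))
for number, offset, length in plan_parts(len(data), part_size=10):
    print(number, offset, length, md5_of_part(data, offset, length))
```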
2ff5d68b4f [Sync] Log when starting to scan local files 2019-05-24 08:00:04 +01:00
44cf67c9cc [SyncLogging] just logs don't return an IO 2019-05-24 07:59:38 +01:00
4cff0dd0c9 fix up tests to handle new stream return types 2019-05-24 07:52:36 +01:00
a5311fec72 [sbt] remove unused fs2 Stream dependency 2019-05-24 07:49:19 +01:00
fa31882e51 [S3MetaDataEnricher,ActionSubmitter] return streams
Help to perpetuate the map/flatMap structure within for-comprehension
in Sync's run method.

Added DoNothing and DoNothingS3Action
2019-05-24 07:48:43 +01:00
bffc6c032c
Support multiple filters (#18)
* Support multiple filters

* Clean up imports

* [S3ClientLogging] log the remote key value

* Update changelog, readme and long arg name

* [SyncSuite] update test
2019-05-23 19:35:48 +01:00
37ac41093e
Improved S3Client logging (#17)
* [ThorpS3Client] Log event when event actually occurs

* [MD5HashGenerator] log activity reading md5 hash for local files

* [awssdk] Extract logging into S3ClientLogging

* [S3ClientLogging] raise logging levels

* [SyncLogging] Remove per-file logging

* [S3ClientLogging] More readable messages
2019-05-23 18:19:51 +01:00
0fe9b86471
Simple Exclusion Filter (#16)
* [filter] Parse filter from command line and add to config

* [filter] exclude files that match the filter
2019-05-23 09:21:09 +01:00
eacfc37095
Handle renames (#14)
* [sync] move thunks to s3client to bottom of class

Also, use the thunk methods from within run rather than accessing the
s3client object directly.

* Layout tweaks to put each parameter on own line

* [syncsuite] value renames and move sync.run outside it() call

Future tests will be evaluating the result of that call, so this
avoids repeatedly calling it.

* Add first pass at copy methods and some delete stubs

* [Bucket] Convert from type alias for String to a case class

* [SyncSuite] mark new tests as pending

* [RemoteKey] Convert from type alias for String to a case class

* [MD5Hash] Convert from type alias for String to a case class

* [LastModified] Convert from type alias for String to a case class

* [LocalFile] Revert to using a normal File

* [Sync] Use a for-comprehension and restructure S3MetaData

The for-comprehension will make it easier to generate multiple actions
out of the stream of enriched metadata. The restructured S3MetaData
avoids the need to wrap it in an Either in some cases.

* [ToUpload] Add a wrapper to indicate action required on File

* [S3Action] Stub actions for IO events

* [S3Action] Use UploadS3Action

* [Sync] Fix formatting when echoing parameters

* [logging] Change log level down to 4 for listing every file considered

* [Sync] Use a case class to hold counters

* [HashModified] Add case class to replace MD5Hash, LastModified tuples

* [logging] Move file considered logging to source of files

Rather than logging this where adding meta data, move to where the
files are being initially identified.

* [logging] Log all final counters

* Pass Config and HashLookup as implicit parameters

* [LocalFileStream] rename method as findFiles

* [S3MetaDataEnricher] rename method as getMetadata

* Rename selection filter and uploader trait and methods

* [MD5HashGenerator] Extract as trait

* [Action] Convert ToUpload into an Action sealed trait

* [ActionGenerator] refactored and removed logging

* fix up tests

* [LocalFileStream] adjust logging

* [RemoteMetaData] Added

* [ActionGenerator] remove redundant braces

* [LocalFile] Added as wrapper for File

* [Sync] run: remove redundant braces

* [Sync] run: rename HashLookup as S3ObjectsData

* WIP - toward copy action

* Extract S3ObjectsByHash for grouping

* extract internal wrapper for S3CatsIOClient

Remove some boiler plate from the middle of a test

* Explicitly name the Map parameters in expected result

* All lastModified are the same to avoid confusion

We aren't testing this field, just that the keys and hash values are correct.

* Rename variable

* space out object creation

* Fix test - error in expected result

Code has been working for ages!

* [readme] condense and simplify behaviour table, adding option delete

Reduce the complexity by only noting the distinct attributes leading
to each action.

Add the action of delete when a local file is missing.

* [S3MetaDataEnricherSuite] rename tests and note missing tests

* [ActionGeneratorSuite] rename tests and note missing tests

* Note unwritten tests as such

* [ActionGenerator]  #2 local exists, remote is missing, other matches

* [S3ClientSuite] fix tests

* [S3MetaDataEnricherSuite] #2a local exists, remote is missing, remote matches, other matches - copy

* [S3MetaDataEnricherSuite] drop 'remote is missing, remote matches'

Impossible to represent this combination

* [S3MetaDataEnricherSuite] #3 local exists, remote is missing, remote no match, other no matches - upload

* [S3MetaDataEnricherSuite] Tests #1-3 rename variables consistently

* [S3MetadataEnricherSuite] #4 local exists, remote exists, remote no match, other matches - copy

* [S3MetadataEnricherSuite] #5 local exists, remote exists, remote no match, other no matches - upload

* [S3MetadataEnricherSuite] drop test #6 - no way to make request

* [ActionGeneratorSuite] standardise tests 2-4

* [ActionGeneratorSuite] #1 local exists, remote exists, remote matches - do nothing

* [ActionGeneratorSuite] Comment expected outcome

* [ActionGeneratorSuite] #5 local exists, remote exists, remote no match, other no matches - upload

* [Action] Add ToDelete case class

* Use ToDelete and fix up return types for DeleteS3Action

* [ActionGenerator] Add explicit case for #1

* [ActionGenerator] Add explicit check for local exists in #2

* [ActionGenerator] match case against #3

* [ActionGenerator] simplify case and match against #5

* [ActionGenerator] Add case for #4

* [ActionGenerator] Remove explicit checks for file existing

If we are called with a LocalFile parameter then we assume the file exists.

* [ActionGenerator] Avoid #1 matching condition #5

* [ActionGeneratorSuite] enable tests

* [test] remove stray println

* [SyncSuite] Add test helper RecordingSync

* [SyncSuite] Use RecordingSync

* [SyncSuite] enable rename test - excluding delete test

* [Sync] log and increment counters for copy and delete

* [Sync] Use case matched RemoteKey in log message

* [Sync] Reorder actions to do copy then upload then delete

* [S3Action] Drop Move as a distinct action

Can be implemented as a Copy followed by a Delete.

* [S3Action] Actions are ordered Copy, Upload then Delete

This allows sequencing of actions so that all the quick to accomplish
copies take place before bandwidth/time costly updates or destructive
deletes. Deletes come last after they have had the opportunity to be
used as the source for any copies.

* [Sync] Use S3Action's default sorting

* [Sync] extract logging of activity

* [SyncLogging] Extract logging out of Sync

Single Responsibility principle - Sync knows nothing about how it
logs, it just delegates to SyncLogging.

* [Sync] Rename variables and extract sort into private def

* [SyncLogging] Use IO context

* [SyncLogging] Remove moved counter

* [SyncLogging] Clean up and log start of run config info

* Verify that IO actions are evaluated before the program terminates

* [Sync] ensure logging runs

* [ActionGenerator] Don't upload files every time

* [ActionGenerator] fix remote hash for #5

* [SyncSuite] Add tests for delete and delete after rename

* [RemoteKey] Add asFile and isMissingLocally helpers

* [Sync] Generate delete actions

* Remove old extensions upon MD5HashGenerator

* [MD5Hash] prevent confusion by never allowing quotes

This means we need to filter quotes from md5hash values at source

* [Sync] ensure start log message is run

* [ThorpS3Client] Fix passing parameters for source key

* [ThorpS3Client] reformat byKey for clarity

* [S3Client] Add level 5 logging around s3 sdk calls

* fix up tests
2019-05-22 13:55:03 +01:00
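
The numbered cases (#1 to #5) and the Copy, Upload, Delete ordering described in this commit can be sketched as a small planner. This is an illustrative Python sketch, not the project's ActionGenerator: `choose_action`, `plan` and the tuple layout are hypothetical, and both sides are reduced to a mapping of key to MD5 hash.

```python
ORDER = {"copy": 0, "upload": 1, "delete": 2}  # cheap copies first, deletes last


def choose_action(local_hash, remote_hash, other_keys_with_hash):
    """Pick an action for one local file from the hashes known remotely."""
    if remote_hash == local_hash:
        return ("do-nothing", None)                # case 1: remote already matches
    if other_keys_with_hash:
        return ("copy", other_keys_with_hash[0])   # cases 2 and 4: same content elsewhere
    return ("upload", None)                        # cases 3 and 5: content not on remote


def plan(local, remote):
    """local and remote map key -> MD5 hash; returns ordered (kind, key, source)."""
    by_hash = {}
    for key, md5 in remote.items():
        by_hash.setdefault(md5, []).append(key)
    actions = [("delete", key, None) for key in remote if key not in local]
    for key, md5 in local.items():
        kind, source = choose_action(md5, remote.get(key), by_hash.get(md5, []))
        if kind != "do-nothing":
            actions.append((kind, key, source))
    return sorted(actions, key=lambda action: ORDER[action[0]])


local = {"new.txt": "h1", "same.txt": "h2", "fresh.txt": "h3"}
remote = {"old.txt": "h1", "same.txt": "h2"}
for action in plan(local, remote):
    print(action)
```

In this example a rename shows up as a copy from `old.txt` to `new.txt` followed by a delete of `old.txt`, which is why deletes must run after copies.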
00743c425c
Add configurable logging levels, selected from command line argument (#12)
* [config,parseargs] Accept v/verbose command line argument

* [parseargs] lowercase program name

* [logging] Log messages based on command line argument

* [readme] update usage
2019-05-16 21:59:40 +01:00
74afb288cc
[localfilestream] Compare test files within a Set (#11)
Fixes #10 

* [localfilestream] Compare test files within a Set

Removes issue of files being read in different orders.

* [localfilestream] add missing parameter type
2019-05-16 19:59:06 +01:00
e834702923
Merge pull request #9 from kemitix/gh2-isolate-awssdk-in-tests
Configure travis to fake enough AWS SDK to run tests
2019-05-16 19:44:19 +01:00
65ca11e2fa [travis] define AWS_REGION environment variable 2019-05-16 19:28:50 +01:00
56a45b6e2a [readme] Move to do items to Github issues 2019-05-16 17:09:27 +01:00
608b9a9e7f [readme] minor tweaking 2019-05-16 16:40:33 +01:00
d66e450cd8 [changelog] Added 2019-05-16 16:37:25 +01:00
ed6550e134 [sync] use listObjects and show count of files uploaded at end 2019-05-16 16:09:32 +01:00
74be5ec1ac [awssdk] add listObjects 2019-05-15 07:06:10 +01:00
64bf42921d [awssdk] Typo/rename class Throp* => Thorp* 2019-05-14 20:14:08 +01:00
40848882f8
Merge pull request #1 from scala-steward/update/slf4j-log4j12-1.7.26
Update slf4j-log4j12 to 1.7.26
2019-05-14 07:38:19 +01:00
ac8cb6241c [travis] Add minimal config file 2019-05-14 07:34:59 +01:00
2397c178eb [gitignore] ignore zip files 2019-05-14 07:27:14 +01:00
1f9acbe386 [github] Add stale configuration 2019-05-14 07:05:48 +01:00
Scala Steward
f296fc05f2
Update slf4j-log4j12 to 1.7.26 2019-05-14 04:45:44 +02:00
1c3e2676d6 [readme] update to do list 2019-05-13 22:52:33 +01:00
419a9f7c36 [gitignore] ignore any generated native-image 2019-05-13 22:49:50 +01:00
64b6585f47 [readme] add instructions to create native image 2019-05-13 22:49:50 +01:00
11cbcb2312 Use logging in place of println 2019-05-11 20:18:55 +01:00