S3 Sync
src/main/scala/net/kemitix/s3thorp [config] Add stub config class 2019-05-06 17:15:43 +01:00
.gitignore Initial commit 2019-04-29 20:05:41 +01:00
build.sbt [sbt] Add cats-effect recommended scalaOptions 2019-05-06 17:15:43 +01:00
README.org [readme] Add impression of aws-s3-sync-by-hash process 2019-05-06 17:15:43 +01:00

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Based on Alex Kudlick's JavaScript implementation aws-s3-sync-by-hash.

The standard aws s3 sync ... command decides which files to copy from their size and timestamp, not from their contents. This utility compares the MD5 hash of the file contents instead.
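As a minimal sketch of the hashing step, the following computes a hex-encoded MD5 digest with the JDK's MessageDigest. The names Md5Sketch, md5Hex and md5File are illustrative, not taken from this project, and the file is read fully into memory, which is an assumption that only suits small files.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object Md5Sketch {
  // Hex-encoded MD5 digest of a byte array (32 lowercase hex characters).
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5")
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Assumption: the file is small enough to read into memory in one go.
  def md5File(path: Path): String = md5Hex(Files.readAllBytes(path))
}
```

Two files with the same contents produce the same digest whatever their modification times are, which is the property the comparison relies on.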

How does aws-s3-sync-by-hash do it?

The following is a rough, first-draft impression of the process, written in pseudo-Scala.

constructor

val options = Load command line arguments and AWS security keys.

def sync(): Promise[Upload]

val uploadPromise = createUploadPromise()
if options contains delete then createDeletePromise()
else return uploadPromise
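One possible reading of this step, sketched with the standard library's Future in place of a JavaScript promise: run the uploads, then run the deletions only when the delete option is set. Options, SyncResult and the two function parameters are hypothetical stand-ins, and running the deletions after the uploads is an assumption, since the pseudo-code does not say how the two promises are ordered.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SyncSketch {
  // Hypothetical option holder; only the delete flag matters here.
  final case class Options(delete: Boolean)

  // Keys that were uploaded and keys that were deleted.
  final case class SyncResult(uploaded: List[String], deleted: List[String])

  // upload and delete are supplied by the caller, so no AWS call is made here.
  def sync(
      options: Options,
      upload: () => Future[List[String]],
      delete: () => Future[List[String]]
  )(implicit ec: ExecutionContext): Future[SyncResult] =
    upload().flatMap { uploaded =>
      if (options.delete) delete().map(deleted => SyncResult(uploaded, deleted))
      else Future.successful(SyncResult(uploaded, List.empty[String]))
    }
}
```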

def createUploadPromise(): Promise[Upload]

readdir(options(root))
loadS3MetaData
filterByHash
uploadFile
callback(file => uploadedFiles += file)
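The stages above could be composed as a plain collection pipeline. This is a sketch under assumptions: each stage is passed in as a function so that nothing here touches the file system or S3, the S3MetaData fields are simplified, and the list of uploaded files is returned instead of being accumulated through a callback.

```scala
object UploadPipelineSketch {
  // Simplified metadata; remoteHash is None when the key does not exist in the bucket.
  final case class S3MetaData(localFile: String, key: String, remoteHash: Option[String])

  // Looks up metadata for each local file, keeps those that need uploading,
  // uploads them, and returns the local files that were uploaded.
  def uploadAll(
      localFiles: List[String],
      loadMetaData: String => S3MetaData,
      needsUpload: S3MetaData => Boolean,
      uploadFile: S3MetaData => Unit
  ): List[String] =
    localFiles
      .map(loadMetaData)
      .filter(needsUpload)
      .map { m => uploadFile(m); m.localFile }
}
```

Passing the stages in as functions keeps the pipeline testable without AWS credentials; a real implementation would supply functions backed by an S3 client.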

def loadS3MetaData: Stream[S3MetaData]

HEAD(bucket, key)
map (metadata => S3MetaData(localFile, bucket, key, metadata.hash, metadata.lastModified))

def filterByHash(p: S3MetaData => Boolean): Stream[S3MetaData]

md5File(localFile)
filter(localHash => options.force || localHash != metadataHash)
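The filter condition can be written as a small pure predicate. In this sketch the S3MetaData shape is simplified and localHash is a hypothetical stand-in for md5File; treating a missing remote object as needing upload is an assumption that the pseudo-code does not spell out.

```scala
object FilterByHashSketch {
  // Simplified metadata; remoteHash is None when the object is not in the bucket.
  final case class S3MetaData(localFile: String, remoteHash: Option[String])

  // True when the file should be uploaded: either uploads are forced,
  // or the remote hash is absent or differs from the local hash.
  def needsUpload(force: Boolean, localHash: String => String)(m: S3MetaData): Boolean =
    force || !m.remoteHash.contains(localHash(m.localFile))
}
```

Because `||` short-circuits, the local file is not hashed at all when force is set.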

def uploadFile(upload: Upload): IO[Unit]

S3Upload(bucket, key, localFile)

def createDeletePromise(): Promise[Delete]

S3AllKeys(bucket, key)
filter(remoteKey => !localFileExists(remoteKey))
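Selecting the keys to delete comes down to a set difference. The sketch below assumes the local files have already been mapped to the key names they would have in the bucket; keysToDelete is an illustrative name, not part of this project.

```scala
object DeletePlanSketch {
  // Remote keys with no matching local file, in the order they were listed.
  def keysToDelete(remoteKeys: List[String], localKeys: Set[String]): List[String] =
    remoteKeys.filterNot(key => localKeys.contains(key))
}
```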

def deleteFile(delete: Delete): IO[Unit]

S3Delete(bucket, key, remoteKey)