thorp

S3 Sync

Paul Campbell d45f590c15 [config] Add stub config class		2019-05-06 17:15:43 +01:00
src/main/scala/net/kemitix/s3thorp	[config] Add stub config class	2019-05-06 17:15:43 +01:00
.gitignore	Initial commit	2019-04-29 20:05:41 +01:00
build.sbt	[sbt] Add cats-effect recommended scalaOptions	2019-05-06 17:15:43 +01:00
README.org	[readme] Add impression of aws-s3-sync-by-hash process	2019-05-06 17:15:43 +01:00

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Based on Alex Kudlick's JavaScript implementation aws-s3-sync-by-hash.

The normal aws s3 sync ... command decides which files to copy using only file size and last-modified time. This utility instead compares the MD5 hash of the file contents.
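As an illustration of the hashing step, here is a minimal sketch of computing the MD5 hash of a file's contents with the JDK's `MessageDigest`. The names `Md5Sketch`, `md5Hex` and `md5File` are assumptions made for this example and are not taken from the project's code.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object Md5Sketch {
  // Hex-encoded MD5 digest of a byte array.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest
      .getInstance("MD5")
      .digest(bytes)
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Reads the whole file into memory, which is fine for a sketch;
  // large files would be better hashed in chunks.
  def md5File(path: Path): String = md5Hex(Files.readAllBytes(path))
}
```

Two files with identical contents yield the same hash regardless of their timestamps, which is the property this utility relies on.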

How does aws-s3-sync-by-hash do it?

The following is a rough, first-draft, pseudo-Scala impression of the process.

constructor

val options = Load command line arguments and AWS security keys.

def sync(): Promise[Upload]

val uploadPromise = createUploadPromise()
if options contains delete then createDeletePromise()
else return uploadPromise
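One possible reading of this step, sketched with `scala.concurrent.Future` standing in for the JavaScript promise: run the upload first, then run the delete only when the delete option is set. The `Upload` and `Delete` case classes, the `deleteEnabled` flag and the function parameters are assumptions for illustration; the README names the types but does not define them, and the ordering of upload and delete is also assumed.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SyncSketch {
  // Hypothetical result types: the README mentions Upload and Delete
  // without defining their contents.
  final case class Upload(keys: List[String])
  final case class Delete(keys: List[String])

  // Run the upload step, then the delete step only if it was requested.
  def sync(
      deleteEnabled: Boolean,
      upload: () => Future[Upload],
      delete: () => Future[Delete]
  )(implicit ec: ExecutionContext): Future[(Upload, Option[Delete])] =
    upload().flatMap { uploaded =>
      if (deleteEnabled) delete().map(deleted => (uploaded, Option(deleted)))
      else Future.successful((uploaded, Option.empty[Delete]))
    }
}
```

Passing the two steps in as functions keeps the sketch free of any AWS dependency, so the control flow can be exercised with stubbed futures.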

def createUploadPromise(): Promise[Upload]

readdir(options(root))
loadS3MetaData
filterByHash
uploadFile
callback(file => uploadedFiles + file)
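The first stage of this pipeline, reading the local directory tree, could look roughly like the sketch below. The names `LocalFilesSketch`, `listFiles` and `keyFor` are hypothetical, and deriving the S3 key from the path relative to the root is an assumption; the README does not say how keys are chosen.

```scala
import java.io.File

object LocalFilesSketch {
  // Recursively list the regular files under root.
  // File.listFiles returns null for a non-directory or on I/O error,
  // so wrap it in Option before traversing.
  def listFiles(root: File): List[File] =
    Option(root.listFiles).map(_.toList).getOrElse(Nil).flatMap { f =>
      if (f.isDirectory) listFiles(f) else List(f)
    }

  // Assumed key scheme: the path relative to root, with '/' separators.
  def keyFor(root: File, file: File): String =
    root.toPath.relativize(file.toPath).toString.replace(File.separatorChar, '/')
}
```

Each file found this way would then flow through the metadata lookup, the hash filter and the upload step in turn.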

def loadS3MetaData: Stream[S3MetaData]

HEAD(bucket, key) map (metadata => S3MetaData(localFile, bucket, key, metadata.hash, metadata.lastModified))

def filterByHash(p: S3MetaData => Boolean): Stream[S3MetaData]

md5File(localFile)
filter(localHash => options.force || localHash != metadataHash)
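The filter predicate can be written as a small pure function: a file needs uploading when the force option is set or when its local hash differs from the hash recorded for the remote object. This is a sketch only; the simplified `S3MetaData` shape, with an optional `remoteHash` to cover keys that do not yet exist in the bucket, is an assumption.

```scala
object FilterSketch {
  // Hypothetical, simplified model: remoteHash is None when the key
  // does not exist in the bucket yet.
  final case class S3MetaData(localFile: String, key: String, remoteHash: Option[String])

  // Upload when forced, or when the remote hash is missing or different.
  def needsUpload(force: Boolean, localHash: String, meta: S3MetaData): Boolean =
    force || !meta.remoteHash.contains(localHash)
}
```

For example, `needsUpload(force = false, "abc", S3MetaData("f", "k", Some("abc")))` is `false`, so an unchanged file is skipped even if its timestamp has moved.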

def uploadFile(upload: Upload): IO[Unit]

S3Upload(bucket, key, localFile)

def createDeletePromise(): Promise[Delete]

S3AllKeys(bucket, key)
filter(remoteKey => !localFileExists(remoteKey))
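Selecting the keys to delete reduces to keeping every remote key that has no matching local file. A minimal sketch, assuming the remote keys have already been listed and that `localFileExists` is supplied by the caller (both names are illustrative):

```scala
object DeleteSketch {
  // Keep only the remote keys for which no local file exists.
  def keysToDelete(remoteKeys: Seq[String], localFileExists: String => Boolean): Seq[String] =
    remoteKeys.filterNot(localFileExists)
}
```

With remote keys `a.txt` and `b.txt` and only `a.txt` present locally, the result contains just `b.txt`.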

def deleteFile(delete: Delete): IO[Unit]

S3Delete(bucket, key, remoteKey)