S3 Sync
Paul Campbell 7af4004c75 [s3client] objectHead returns an IO[Option[...]]
If the remote file is missing then return None.

S3MetaDataEnricher.enrichWithS3MetaData now returns an IO[Either[File,
S3MetaData]]. If objectHead returns None, then this returns the file,
otherwise, the Some[Hash, LastModified] from objectHead is used to
create the S3MetaData as before.
2019-05-09 07:11:27 +01:00
src [s3client] objectHead returns an IO[Option[...]] 2019-05-09 07:11:27 +01:00
.gitignore [git] update gitignore 2019-05-06 17:15:43 +01:00
build.sbt [sbt] add reactive-aws-s3 1.1.3 as dependency 2019-05-07 08:58:22 +01:00
README.org [readme] Add impression of aws-s3-sync-by-hash process 2019-05-06 17:15:43 +01:00

s3thorp

Synchronisation of files with S3 using the hash of the file contents.

Based on Alex Kudlick's JavaScript implementation aws-s3-sync-by-hash.

The normal aws s3 sync ... command only compares file size and last-modified time to decide which files need to be copied. This utility compares the MD5 hash of the file contents instead.

How does aws-s3-sync-by-hash do it?

The following is a rough, first-draft, pseudo-Scala impression of the process.

constructor

val options = Load command line arguments and AWS security keys.
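A minimal sketch of the options step, assuming a hypothetical command line of the form <root> <bucket> [prefix] [--delete] [--force] and credentials taken from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The Options fields and flag names are illustrative assumptions, not the real interface of either tool.

```scala
object OptionsLoader {
  // Hypothetical option set: field names and flags are assumptions, not a real CLI.
  final case class Options(
      root: String,
      bucket: String,
      prefix: String,
      delete: Boolean,
      force: Boolean,
      accessKeyId: Option[String],
      secretAccessKey: Option[String]
  )

  private val knownFlags = Set("--delete", "--force")

  // Expects <root> <bucket> [prefix], plus optional --delete / --force flags.
  // The environment is passed in as a Map so the function is easy to test.
  def load(args: List[String], env: Map[String, String]): Either[String, Options] = {
    val (flags, positional) = args.partition(_.startsWith("--"))
    val unknown = flags.filterNot(knownFlags)
    if (unknown.nonEmpty) Left(s"unknown flag(s): ${unknown.mkString(", ")}")
    else positional match {
      case root :: bucket :: rest if rest.size <= 1 =>
        Right(Options(
          root = root,
          bucket = bucket,
          prefix = rest.headOption.getOrElse(""),
          delete = flags.contains("--delete"),
          force = flags.contains("--force"),
          accessKeyId = env.get("AWS_ACCESS_KEY_ID"),
          secretAccessKey = env.get("AWS_SECRET_ACCESS_KEY")
        ))
      case _ => Left("usage: <root> <bucket> [prefix] [--delete] [--force]")
    }
  }
}
```

In an application this would be called as OptionsLoader.load(args.toList, sys.env); returning an Either keeps argument errors out of the sync logic.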

def sync(): Promise[Upload]

val uploadPromise = createUploadPromise()
if options contains delete then createDeletePromise()
else return uploadPromise
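The control flow of sync() can be sketched with Scala's Future standing in for the JavaScript Promise. Upload, Delete and SyncResult are hypothetical placeholder types, and running the delete pass only after the uploads complete (rather than concurrently) is an assumption of this sketch.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SyncSketch {
  // Hypothetical result types standing in for the Upload and Delete promises.
  final case class Upload(key: String)
  final case class Delete(key: String)
  final case class SyncResult(uploaded: List[Upload], deleted: List[Delete])

  // The delete pass is optional; when enabled it starts only after the uploads finish.
  def sync(deleteEnabled: Boolean,
           createUploads: () => Future[List[Upload]],
           createDeletes: () => Future[List[Delete]])(implicit ec: ExecutionContext): Future[SyncResult] =
    createUploads().flatMap { uploaded =>
      if (deleteEnabled) createDeletes().map(deleted => SyncResult(uploaded, deleted))
      else Future.successful(SyncResult(uploaded, Nil))
    }
}
```

Passing the two passes in as functions keeps sync() independent of how uploads and deletes are actually performed.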

def createUploadPromise(): Promise[Upload]

readdir(options(root))
loadS3MetaData
filterByHash
uploadFile
callback(file => uploadedFiles += file)
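A sketch of the upload pipeline as a plain sequential loop, assuming each stage can be passed in as a function. readdir is implemented with java.nio.file.Files.walk; the stage signatures and the generic metadata type M are assumptions, and a real implementation would likely run the HEAD requests and uploads concurrently.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object UploadPipeline {
  // "readdir": every regular file under root, recursively, in a stable order.
  def readdir(root: Path): List[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList.sortBy(_.toString)
    finally stream.close()
  }

  // The other stages are injected; their shapes are assumptions for this sketch.
  def createUploads[M](root: Path,
                       loadS3MetaData: Path => M,
                       needsUpload: M => Boolean,
                       uploadFile: M => Unit): List[Path] = {
    val uploadedFiles = List.newBuilder[Path]
    for (file <- readdir(root)) {
      val meta = loadS3MetaData(file)
      if (needsUpload(meta)) {
        uploadFile(meta)
        uploadedFiles += file // the callback(file => uploadedFiles += file) step
      }
    }
    uploadedFiles.result()
  }
}
```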

def loadS3MetaData: Stream[S3MetaData]

HEAD(bucket, key) map (metadata => S3MetaData(localFile, bucket, key, metadata.hash, metadata.lastModified))
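A sketch of loadS3MetaData following the Option/Either shape described in the commit message: a HEAD result of None yields Left(localFile), ready for upload, and Some((hash, lastModified)) yields Right(S3MetaData). The HEAD call is modelled by a hypothetical Head function type so the sketch runs without AWS; the S3MetaData field types are assumptions.

```scala
import java.io.File
import java.time.Instant

object MetaDataSketch {
  // Field names follow the pseudocode; the concrete types are assumptions.
  final case class S3MetaData(localFile: File, bucket: String, key: String,
                              hash: String, lastModified: Instant)

  // Stand-in for a HEAD request: Some((hash, lastModified)) if the object exists, None if not.
  type Head = (String, String) => Option[(String, Instant)]

  // Right(metadata) when the object exists; Left(localFile) when it does not,
  // so a missing object can be routed straight to the upload step.
  def loadS3MetaData(head: Head, bucket: String)(localFile: File, key: String): Either[File, S3MetaData] =
    head(bucket, key)
      .map { case (hash, lastModified) => S3MetaData(localFile, bucket, key, hash, lastModified) }
      .toRight(localFile)
}
```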

def filterByHash(p: S3MetaData => Boolean): Stream[S3MetaData]

md5File(localFile) filter(localHash => options.force || localHash != metadataHash)
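A sketch of the hash filter, reading the predicate as force || localHash != remoteHash. It compares against the object's ETag, which for a single-part upload without SSE-C or SSE-KMS encryption is typically the hex MD5 of the content; multipart uploads produce a different ETag format, which this sketch does not handle. The names shouldUpload and md5Hex are hypothetical.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object HashFilter {
  // Hex-encoded MD5 of the file contents (reads the whole file into memory).
  def md5Hex(path: Path): String =
    MessageDigest.getInstance("MD5")
      .digest(Files.readAllBytes(path))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // S3 ETags arrive wrapped in double quotes, so strip them before comparing.
  private def unquote(etag: String): String = etag.stripPrefix("\"").stripSuffix("\"")

  // remoteHash is None when the object does not exist yet, which always triggers an upload.
  def shouldUpload(force: Boolean)(localFile: Path, remoteHash: Option[String]): Boolean =
    force || !remoteHash.map(unquote).contains(md5Hex(localFile))
}
```

For very large files a streaming digest (feeding MessageDigest.update in chunks) would avoid loading the whole file at once.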

def uploadFile(upload: Upload): IO[Unit]

S3Upload(bucket, key, localFile)
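A sketch of uploadFile as a deferred action, approximating IO[Unit] with a plain () => Unit so it needs no effect library. The S3Client trait, its putObject method and the in-memory fake are hypothetical stand-ins for an AWS SDK client; deriving the key as prefix plus the path relative to root is also an assumption.

```scala
import java.io.File
import java.nio.file.{Files, Path}
import scala.collection.mutable

object UploadSketch {
  final case class Upload(bucket: String, key: String, localFile: Path)

  // Hypothetical minimal client; a real implementation would wrap an AWS SDK client.
  trait S3Client {
    def putObject(bucket: String, key: String, bytes: Array[Byte]): Unit
  }

  // In-memory fake used for testing.
  final class InMemoryS3 extends S3Client {
    val objects = mutable.Map.empty[(String, String), Array[Byte]]
    def putObject(bucket: String, key: String, bytes: Array[Byte]): Unit =
      objects.update((bucket, key), bytes)
  }

  // Key = prefix + path relative to root, always using '/' as the separator.
  def toUpload(bucket: String, prefix: String, root: Path, localFile: Path): Upload = {
    val relative = root.relativize(localFile).toString.replace(File.separatorChar, '/')
    val key = if (prefix.isEmpty) relative else s"${prefix.stripSuffix("/")}/$relative"
    Upload(bucket, key, localFile)
  }

  // Deferred, IO-like action: nothing is read or sent until the function is invoked.
  def uploadFile(client: S3Client)(upload: Upload): () => Unit =
    () => client.putObject(upload.bucket, upload.key, Files.readAllBytes(upload.localFile))
}
```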

def createDeletePromise(): Promise[Delete]

S3AllKeys(bucket, key) filter(remoteKey => localFileExists(remoteKey).negate)
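A sketch of the delete pass: given the remote keys under the prefix, keep those whose corresponding local file no longer exists. Listing the bucket is left out (in practice it would be a paginated ListObjectsV2 call), and the key-to-path mapping simply inverts a prefix-plus-relative-path scheme, which is an assumption.

```scala
import java.nio.file.{Files, Path}

object DeleteSketch {
  final case class Delete(bucket: String, key: String)

  // Inverse of the assumed upload key mapping: strip the prefix and resolve against root.
  def localPathFor(root: Path, prefix: String, remoteKey: String): Path = {
    val relative =
      if (prefix.isEmpty) remoteKey
      else remoteKey.stripPrefix(prefix.stripSuffix("/") + "/")
    root.resolve(relative)
  }

  // remoteKeys are passed in here rather than listed from the bucket.
  def createDeletes(bucket: String, prefix: String, root: Path, remoteKeys: List[String]): List[Delete] =
    remoteKeys
      .filterNot(key => Files.exists(localPathFor(root, prefix, key)))
      .map(key => Delete(bucket, key))
}
```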

def deleteFile(delete: Delete): IO[Unit]

S3Delete(bucket, key, remoteKey)
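The pseudocode deletes one key per call. As an alternative that the pseudocode does not describe, this sketch batches deletes, since the S3 DeleteObjects API accepts up to 1,000 keys per request. The S3Client trait, its deleteObjects method and the recording fake are hypothetical, and each batch is returned as a deferred () => Unit action.

```scala
import scala.collection.mutable

object DeleteFileSketch {
  final case class Delete(bucket: String, key: String)

  // Hypothetical client; S3's DeleteObjects request accepts at most 1000 keys.
  trait S3Client {
    def deleteObjects(bucket: String, keys: List[String]): Unit
  }

  // Fake client that only records the requests it receives.
  final class RecordingS3 extends S3Client {
    val calls = mutable.ListBuffer.empty[(String, List[String])]
    def deleteObjects(bucket: String, keys: List[String]): Unit = {
      calls += ((bucket, keys))
      ()
    }
  }

  val MaxKeysPerRequest = 1000

  // One deferred action per bucket and batch of at most MaxKeysPerRequest keys.
  def deleteFiles(client: S3Client)(deletes: List[Delete]): List[() => Unit] =
    deletes.groupBy(_.bucket).toList.sortBy(_._1).flatMap { case (bucket, ds) =>
      ds.map(_.key).grouped(MaxKeysPerRequest).toList.map { keys =>
        () => client.deleteObjects(bucket, keys)
      }
    }
}
```

Batching reduces a large delete from one request per key to one request per thousand keys.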