paperoni

Author	SHA1	Message	Date
Kenneth Gitere	1cbbc7527f	Update version	2021-05-24 20:33:05 +03:00
Kenneth Gitere	538a65f6fd	Update dependencies in lockfile	2021-04-30 08:34:09 +03:00
Kenneth Gitere	cae9227ab0	Update documentation	2021-04-30 06:55:02 +03:00
Kenneth Gitere	ae52cc4e13	Add features for logging and CLI: display of partial downloads in the summary; custom file name that is displayed after the summary, ensuring it is visible; log-to-file flag, which specifies that logs will be sent to the default directory; verbose flag (v) used to configure the log levels; disabling the progress bars when logging to the terminal is active	2021-04-29 20:02:08 +03:00
Kenneth Gitere	00d704fdd6	Move initializing logger to logs module	2021-04-28 07:47:45 +03:00
Kenneth Gitere	a9787d7b5a	Add colored output and configuring of a paperoni root directory for logs	2021-04-24 15:13:44 +03:00
Kenneth Gitere	910c45abf7	Add logging configured to send to a file by default	2021-04-24 13:56:02 +03:00
Kenneth Gitere	313041a109	Update dependencies and restore redirect middleware in `download_images`	2021-04-22 18:01:23 +03:00
Kenneth Gitere	b217448601	Add printing of tables upon successful extraction	2021-04-20 14:02:56 +03:00
Kenneth Gitere	04a1eed4e2	Add progress indicators for the cli	2021-04-17 17:28:07 +03:00
Kenneth Gitere	7e9dcfc2b7	Add custom error types and ignore failed image downloads. Using this custom error type, many instances of unwrap are replaced with mapping to errors that are then logged in main.rs. This allows paperoni to stop crashing when downloading articles when the errors are possibly recoverable or should not affect other downloads. This subsequently introduces ignoring the failed image downloads and instead leaving the original URLs intact.	2021-04-17 12:04:06 +03:00
Kenneth Gitere	165b2187be	Bump version	2021-02-24 13:03:52 +03:00
Kenneth Gitere	003953332f	Refactor downloading of HTML pages. This change allows parallel downloads of HTML pages up to a maximum number of concurrent HTTP requests, which is more efficient than before, when all HTTP requests were likely to begin at the same time.	2021-02-06 17:06:03 +03:00
Kenneth Gitere	b98c0a69a6	Bump version	2021-01-24 17:54:33 +03:00
Kenneth Gitere	8407c613df	Bug fixes: prevent downloading images with base64 strings as the source; add escaping of quotation characters in the serializer; disable redirects when downloading images which fails on multiple sites; remove invalid characters for making the epub export file name; fix version number in release	2020-12-24 14:03:36 +03:00
Kenneth Gitere	3bfa82ba60	Update README and version	2020-11-24 18:39:51 +03:00
Kenneth Gitere	37cb4e1fd2	Change from structopt to clap. This allows printing the help message if no args are passed.	2020-11-24 09:58:50 +03:00
Kenneth Gitere	ef3efdba81	Refactor to use temp directory and update surf. Change from using the res directory for image downloads to using temp directories. Update surf to v2, which required changing the way Content-Type headers are read.	2020-11-23 13:38:58 +03:00
Kenneth Gitere	be48cc1e47	Fix alignment in README. Update manifest file. Add fix in serialized file to have self closing tags which is invalid xhtml	2020-10-22 19:18:18 +03:00
Kenneth Gitere	87ff21b676	Add regex and lazy_static crates	2020-10-07 20:44:35 +03:00
Kenneth Gitere	e1debf5630	Add moz_readability initial code and accompanying unit tests. This currently contains the preprocessing code of Readability. It is a port of Readability.js by Mozilla.	2020-08-31 19:30:09 +03:00
Kenneth Gitere	9f56c58dd9	Add simple CLI wrapper	2020-05-16 10:09:44 +03:00
Kenneth Gitere	4e8812c1ee	Add first attempt to save an epub file	2020-05-02 19:25:31 +03:00
Kenneth Gitere	78ba40f57a	Add image download functionality	2020-05-02 18:33:45 +03:00
Kenneth Gitere	4527fb07d9	Initial extraction code to get meta information on a blog	2020-04-30 11:05:53 +03:00

25 commits
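
Commit 7e9dcfc2b7 describes replacing `unwrap` calls with errors that are collected and logged, and leaving the original image URL in place when an image download fails. The sketch below illustrates that idea using only the standard library. It is not paperoni's actual code: `DownloadError`, `fetch_image`, and `localize_images` are hypothetical names, and `fetch_image` is a stand-in that only pretends to perform an HTTP request.

```rust
use std::fmt;

/// Hypothetical error type for illustration; paperoni's real error type may differ.
#[derive(Debug, PartialEq)]
enum DownloadError {
    InvalidUrl(String),
    Http(u16),
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::InvalidUrl(url) => write!(f, "invalid URL: {}", url),
            DownloadError::Http(status) => write!(f, "HTTP status {}", status),
        }
    }
}

impl std::error::Error for DownloadError {}

/// Stand-in for a real HTTP fetch (assumption: no network access here).
/// Returns the local path the image would be saved to, or an error.
fn fetch_image(url: &str) -> Result<String, DownloadError> {
    // Non-HTTP sources, such as base64 data URLs, are rejected rather than fetched.
    if !url.starts_with("http") {
        return Err(DownloadError::InvalidUrl(url.to_string()));
    }
    // Simulate a server-side failure for one well-known test URL.
    if url.ends_with("/missing.png") {
        return Err(DownloadError::Http(404));
    }
    let name = url.rsplit('/').next().unwrap_or("image");
    Ok(format!("images/{}", name))
}

/// Rewrites each image source to a local path where the download succeeds.
/// On failure the original URL is kept and the error is returned for logging,
/// so one bad image does not abort the rest.
fn localize_images(urls: &[&str]) -> (Vec<String>, Vec<DownloadError>) {
    let mut sources = Vec::with_capacity(urls.len());
    let mut errors = Vec::new();
    for &url in urls {
        match fetch_image(url) {
            Ok(path) => sources.push(path),
            Err(err) => {
                sources.push(url.to_string());
                errors.push(err);
            }
        }
    }
    (sources, errors)
}
```

A caller would pass the image URLs found in an article to `localize_images`, write the returned sources back into the document, and log each returned error instead of panicking.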