Commit graph

25 commits

Author SHA1 Message Date
Kenneth Gitere
1cbbc7527f Update version 2021-05-24 20:33:05 +03:00
Kenneth Gitere
538a65f6fd Update dependencies in lockfile 2021-04-30 08:34:09 +03:00
Kenneth Gitere
cae9227ab0 Update documentation 2021-04-30 06:55:02 +03:00
Kenneth Gitere
ae52cc4e13 Add features for logging and cli
- display of partial downloads in the summary
- custom file name that is displayed after the summary ensuring it is visible
- log-to-file flag which specifies that logs will be sent to the default directory
- verbose flag (v) used to configure the log levels
- disabling the progress bars when logging to the terminal is active
2021-04-29 20:02:08 +03:00
Kenneth Gitere
00d704fdd6 Move initializing logger to logs module 2021-04-28 07:47:45 +03:00
Kenneth Gitere
a9787d7b5a Add colored output and configuring of a paperoni root directory for logs 2021-04-24 15:13:44 +03:00
Kenneth Gitere
910c45abf7 Add logging configured to send to a file by default 2021-04-24 13:56:02 +03:00
Kenneth Gitere
313041a109 Update dependencies and restore redirect middleware in download_images 2021-04-22 18:01:23 +03:00
Kenneth Gitere
b217448601 Add printing of tables upon successful extraction 2021-04-20 14:02:56 +03:00
Kenneth Gitere
04a1eed4e2 Add progress indicators for the cli 2021-04-17 17:28:07 +03:00
Kenneth Gitere
7e9dcfc2b7 Add custom error types and ignore failed image downloads
Using this custom error type, many instances of unwrap are replaced
with mapping to errors that are then logged in main.rs. This allows
paperoni to stop crashing when downloading articles when the errors
are possibly recoverable or should not affect other downloads.

This subsequently introduces ignoring the failed image downloads
and instead leaving the original URLs intact.
2021-04-17 12:04:06 +03:00
Kenneth Gitere
165b2187be Bump version 2021-02-24 13:03:52 +03:00
Kenneth Gitere
003953332f Refactor downloading of HTML pages
This change allows for parallel downloads of HTML pages upto a maximum
number of concurrent HTTP requests which is more efficient than
before where all HTTP requests are likely to begin at the same time.
2021-02-06 17:06:03 +03:00
Kenneth Gitere
b98c0a69a6 Bump version 2021-01-24 17:54:33 +03:00
Kenneth Gitere
8407c613df Bug fixes
- Prevent downloading images with base64 strings as the source
- Add escaping of quotation characters in the serializer
- Disable redirects when downloading images which fails on multiple sites
- Remove invalid characters for making the epub export file name
- Fix version number in release
2020-12-24 14:03:36 +03:00
Kenneth Gitere
3bfa82ba60 Update README and version 2020-11-24 18:39:51 +03:00
Kenneth Gitere
37cb4e1fd2 Change from structopt to clap
This allows printing the help message if no args are passed
2020-11-24 09:58:50 +03:00
Kenneth Gitere
ef3efdba81 Refactor to use temp directory and update surf
Change from using res directory for image downloads to using temp directories.
Update surf to v2 which required changing the way Content-Type headers are
read from.
2020-11-23 13:38:58 +03:00
Kenneth Gitere
be48cc1e47 Fix alignment in README
Update manifest file
Add fix in serialized file to have self closing tags which is invalid
xhtml
2020-10-22 19:18:18 +03:00
Kenneth Gitere
87ff21b676 Add regex and lazy_static crates 2020-10-07 20:44:35 +03:00
Kenneth Gitere
e1debf5630 Add moz_readability initial code and accompanying unit tests
This currently contains the preprocessing code of the Readability.
It is a port of Readability.js by Mozilla.
2020-08-31 19:30:09 +03:00
Kenneth Gitere
9f56c58dd9 Add simple CLI wrapper 2020-05-16 10:09:44 +03:00
Kenneth Gitere
4e8812c1ee Add first attempt to save an epub file 2020-05-02 19:25:31 +03:00
Kenneth Gitere
78ba40f57a Add image download functionality 2020-05-02 18:33:45 +03:00
Kenneth Gitere
4527fb07d9 Initial extraction code to get meta information on a blog 2020-04-30 11:05:53 +03:00