paperoni

Archived

Author	SHA1	Message	Date
Kenneth Gitere	cae9227ab0	Update documentation	2021-04-30 06:55:02 +03:00
Kenneth Gitere	c00582ac29	Fix verbosity levels ordering	2021-04-30 06:42:08 +03:00
Kenneth Gitere	ae52cc4e13	Add features for logging and cli - display of partial downloads in the summary - custom file name that is displayed after the summary ensuring it is visible - log-to-file flag which specifies that logs will be sent to the default directory - verbose flag (v) used to configure the log levels - disabling the progress bars when logging to the terminal is active	2021-04-29 20:02:08 +03:00
Kenneth Gitere	00d704fdd6	Move initializing logger to logs module	2021-04-28 07:47:45 +03:00
Kenneth Gitere	36c3eb65c6	Add appendix page for listing the source of the article	2021-04-28 07:46:07 +03:00
Kenneth Gitere	088699b2c3	Add debug flag	2021-04-24 15:50:43 +03:00
Kenneth Gitere	a9787d7b5a	Add colored output and configuring of a paperoni root directory for logs	2021-04-24 15:13:44 +03:00
Kenneth Gitere	65f8ebda56	Add logs crate for dealing with printing out the final download summary	2021-04-24 13:58:03 +03:00
Kenneth Gitere	a3de3fb6ff	Add ImgError struct for representing errors in downloading article images	2021-04-24 13:57:06 +03:00
Kenneth Gitere	910c45abf7	Add logging configured to send to a file by default	2021-04-24 13:56:02 +03:00
Kenneth Gitere	c0323a6ae4	Minor refactor and add non zero exit upon failure to download any article - Move printing of the successfully downloaded articles into main.rs - Add summary text	2021-04-24 09:00:18 +03:00
Kenneth Gitere	b496abb576	Fix serialization issue with poorly defined attribute names	2021-04-22 19:00:32 +03:00
Kenneth Gitere	313041a109	Update dependencies and restore redirect middleware in `download_images`	2021-04-22 18:01:23 +03:00
Kenneth Gitere	960f114dc6	Minor fixes in moz_readability - swap unwrap for if let statement in `get_article_metadata` - add default when extracting the title from a possible `<title>` element - fix extracting alternative titles from h1 tags	2021-04-21 19:52:41 +03:00
Kenneth Gitere	dbac7c3b69	Refactor `grab_article` to return a Result - Add ReadabilityError field - Refactor `article` getter in Extractor to return a &NodeRef. This relies on the assumption that the article has already been parsed and should otherwise panic.	2021-04-21 19:11:57 +03:00
Kenneth Gitere	ae1ddb9386	Add printing of table for failed article downloads - Map errors in `fetch_html` to include the source url - Change `article_link` to `article_source` - Add `Into` conversion for `UTF8Error` - Collect errors in `generate_epubs` for displaying in a table	2021-04-20 21:33:24 +03:00
Kenneth Gitere	60fb30e8a2	Add url field in Extractor struct	2021-04-20 21:06:54 +03:00
Kenneth Gitere	b217448601	Add printing of tables upon successful extraction	2021-04-20 14:02:56 +03:00
Kenneth Gitere	04a1eed4e2	Add progress indicators for the cli	2021-04-17 17:28:07 +03:00
Kenneth Gitere	217cd3e442	Minor refactor Change cli to grab version from the Cargo manifest Rename fetch_url to fetch_html	2021-04-17 12:37:53 +03:00
Kenneth Gitere	7e9dcfc2b7	Add custom error types and ignore failed image downloads Using this custom error type, many instances of unwrap are replaced with mapping to errors that are then logged in main.rs. This allows paperoni to stop crashing when downloading articles when the errors are possibly recoverable or should not affect other downloads. This subsequently introduces ignoring the failed image downloads and instead leaving the original URLs intact.	2021-04-17 12:04:06 +03:00
Kenneth Gitere	d6cbbe405b	Fix bug in `inline_css_str_to_map`	2021-04-14 18:07:39 +03:00
Kenneth Gitere	2762bc5086	Merge pull request #7 from hipstermojo/dev Update README	2021-02-24 13:28:56 +03:00
Kenneth Gitere	b8c0cf29f1	Update README	2021-02-24 13:27:43 +03:00
Kenneth Gitere	e9f96d2970	Merge pull request #6 from hipstermojo/dev Update to 0.3.0	2021-02-24 13:13:36 +03:00
Kenneth Gitere	165b2187be	Bump version	2021-02-24 13:03:52 +03:00
Kenneth Gitere	912bc9d915	Add flag for configuring maximum concurrent requests Change printing macro for error messages to go out to stderr	2021-02-21 13:11:26 +03:00
Kenneth Gitere	b0c4c47413	Add support for merging articles into a single epub This is still experimental as it lacks validation of the target file name	2021-02-11 13:51:21 +03:00
Kenneth Gitere	f0a610c2ac	Bug fix with empty titles The code for title retrieval previously assumed that meta tags concerned with the title would always contain a value but some sites leave the value empty thus it had to be checked for as well.	2021-02-09 12:56:07 +03:00
Kenneth Gitere	65fdd967c1	Refactor image downloading and update README Image downloading uses streams instead of spawned tasks to ensure that it does not start an unbounded number of spawned tasks	2021-02-09 10:34:35 +03:00
Kenneth Gitere	003953332f	Refactor downloading of HTML pages This change allows for parallel downloads of HTML pages up to a maximum number of concurrent HTTP requests, which is more efficient than before, where all HTTP requests were likely to begin at the same time.	2021-02-06 17:06:03 +03:00
Kenneth Gitere	6b62051942	Add `replace_metadata_value` function	2021-02-06 13:53:04 +03:00
Kenneth Gitere	b402472ba6	Add http and epub modules	2021-02-06 12:59:03 +03:00
Kenneth Gitere	08f847531f	Remove empty lines when reading from an input file	2021-02-03 07:39:51 +03:00
Kenneth Gitere	3d56023592	Add -f flag for adding links from a file instead of needing to use cat	2021-02-01 11:31:24 +03:00
Kenneth Gitere	c82071a871	Merge pull request #5 from hipstermojo/dev Merge 0.2.2-alpha-1	2021-01-24 18:00:50 +03:00
Kenneth Gitere	b98c0a69a6	Bump version	2021-01-24 17:54:33 +03:00
Kenneth Gitere	21c3ffd922	Refactor fetch_url This adds: - More validation of responses to ensure the HTML response is valid. - Better handling of redirecting URLs which allows for fetching of links proxied to Medium.	2021-01-24 17:52:31 +03:00
Kenneth Gitere	1dc7b3432b	Bug fixes The bug fixes include: - `<html>` nodes being added to the replaced image when `unwrap_noscript_tags` is called. - Remove `srcset` attribute of <img> tags after downloading the image. This prevented readers like Foliate from displaying the downloaded image	2021-01-12 10:27:46 +03:00
Kenneth Gitere	ca1f9e2800	Merge pull request #4 from hipstermojo/dev Update to 0.2.1-alpha1	2020-12-24 14:11:42 +03:00
Kenneth Gitere	8407c613df	Bug fixes - Prevent downloading images with base64 strings as the source - Add escaping of quotation characters in the serializer - Disable redirects when downloading images which fails on multiple sites - Remove invalid characters for making the epub export file name - Fix version number in release	2020-12-24 14:03:36 +03:00
Kenneth Gitere	3c7dc9a416	Merge pull request #3 from hipstermojo/dev 0.2.0 update	2020-11-24 18:42:29 +03:00
Kenneth Gitere	3bfa82ba60	Update README and version	2020-11-24 18:39:51 +03:00
Kenneth Gitere	725c73c83f	Add basic redirect provided by surf and early exit of the program if the response is not a 200	2020-11-24 18:31:16 +03:00
Kenneth Gitere	5f99bddc10	Add custom serializer for XHTML	2020-11-24 14:54:23 +03:00
Kenneth Gitere	37cb4e1fd2	Change from structopt to clap This allows printing the help message if no args are passed	2020-11-24 09:58:50 +03:00
Kenneth Gitere	cdfbc2b3f6	Refactor inline_css_str_to_map to use a better tokenizer	2020-11-24 08:29:00 +03:00
Kenneth Gitere	aff4054ca9	Update crates and fix bugs The bug fixes are for: - <base> elements with "/" as the href - articles containing an ampersand in the title which would create corrupted manifest files.	2020-11-23 15:55:58 +03:00
Kenneth Gitere	ef3efdba81	Refactor to use temp directory and update surf Change from using res directory for image downloads to using temp directories. Update surf to v2, which required changing the way Content-Type headers are read.	2020-11-23 13:38:58 +03:00
Kenneth Gitere	ab800d0174	Bug fix and add printing of the name of the extracted EPUB The fix prevents creating the res directory if it already exists	2020-11-23 09:06:13 +03:00


142 commits