paperoni

Archived

Author	SHA1	Message	Date
KOVACS Tamas	8ec491ff06	http.rs: check response status for fetched images. This patch checks whether fetching an image resulted in a non-success status code. On a non-success status, the response is discarded and an error is emitted. This relies on 3xx codes already being handled by surf's Redirect middleware, so we should only see 4xx and 5xx codes here. Fixes hipstermojo/paperoni#11	2021-05-09 14:35:55 +02:00
KOVACS Tamas	4581f07330	http.rs: extract process_img_response function	2021-05-08 21:32:15 +02:00
Kenneth Gitere	a3de3fb6ff	Add ImgError struct for representing errors in downloading article images	2021-04-24 13:57:06 +03:00
Kenneth Gitere	910c45abf7	Add logging configured to send to a file by default	2021-04-24 13:56:02 +03:00
Kenneth Gitere	313041a109	Update dependencies and restore redirect middleware in `download_images`	2021-04-22 18:01:23 +03:00
Kenneth Gitere	dbac7c3b69	Refactor `grab_article` to return a Result: add ReadabilityError field; refactor `article` getter in Extractor to return a &NodeRef. This relies on the assumption that the article has already been parsed and should otherwise panic.	2021-04-21 19:11:57 +03:00
Kenneth Gitere	ae1ddb9386	Add printing of table for failed article downloads: map errors in `fetch_html` to include the source url; change `article_link` to `article_source`; add `Into` conversion for `UTF8Error`; collect errors in `generate_epubs` for displaying in a table	2021-04-20 21:33:24 +03:00
Kenneth Gitere	04a1eed4e2	Add progress indicators for the cli	2021-04-17 17:28:07 +03:00
Kenneth Gitere	217cd3e442	Minor refactor: change cli to grab version from the Cargo manifest; rename fetch_url to fetch_html	2021-04-17 12:37:53 +03:00
Kenneth Gitere	7e9dcfc2b7	Add custom error types and ignore failed image downloads. Using this custom error type, many instances of unwrap are replaced with mapping to errors that are then logged in main.rs. This allows paperoni to stop crashing while downloading articles when the errors are possibly recoverable or should not affect other downloads. This in turn introduces ignoring failed image downloads and leaving the original URLs intact.	2021-04-17 12:04:06 +03:00
Kenneth Gitere	65fdd967c1	Refactor image downloading and update README. Image downloads use streams instead of spawned tasks to ensure that an unbounded number of tasks is never spawned.	2021-02-09 10:34:35 +03:00
Kenneth Gitere	003953332f	Refactor downloading of HTML pages. This change allows parallel downloads of HTML pages up to a maximum number of concurrent HTTP requests, which is more efficient than before, where all HTTP requests were likely to begin at the same time.	2021-02-06 17:06:03 +03:00
Kenneth Gitere	b402472ba6	Add http and epub modules	2021-02-06 12:59:03 +03:00

13 commits
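
The status check described in commit 8ec491ff06 can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in http.rs: the fields of `ImgError`, the function name `check_img_status`, and the use of a plain `u16` status are all assumptions made for this example. It also assumes, as the commit message does, that redirects were already followed before the check runs.

```rust
use std::fmt;

/// Hypothetical error type for this sketch. The real `ImgError` in
/// paperoni may carry different fields.
#[derive(Debug, PartialEq)]
pub struct ImgError {
    pub url: String,
    pub status: u16,
}

impl fmt::Display for ImgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "failed to fetch image {}: HTTP status {}",
            self.url, self.status
        )
    }
}

impl std::error::Error for ImgError {}

/// Hypothetical helper: accept only 2xx statuses and report anything else
/// as an error, so the caller can discard the response body.
pub fn check_img_status(url: &str, status: u16) -> Result<(), ImgError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(ImgError {
            url: url.to_string(),
            status,
        })
    }
}
```

A caller would run such a check before reading the image body and, on `Err`, log the error and keep the original image URL in the article, in line with the behaviour described in commit 7e9dcfc2b7.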