Commit graph

17 commits

Author SHA1 Message Date
Kenneth Gitere
07479afeac refactor: refactor update_imgs_base64
chore: add doc comment on ResourceType alias

fix: add error when an image's MIME type is invalid
2021-07-28 10:00:45 +03:00
Kenneth Gitere
e6f901eb5a refactor: rename Extractor to Article 2021-07-24 12:43:40 +03:00
Kenneth Gitere
5fbfb9c806 refactor: move download function to http module
feat: add rendering of table for partial downloads
feat: add help message for enabling --log-to-file
chore: format flags to kebab-case and shorten --output-directory flag
2021-06-08 07:58:52 +03:00
Mikhail Gorbachev
13ad14e73d Add output_dir to cli argument
- Add `output_dir` to cli argument
    - This argument lets you save output files to a chosen folder, not just the current directory
- Refactor 'cli.rs'
    - Add `Builder` for `AppConfig`
    - Add `Error` instead of separate panics
- Upgrade dependencies
2021-06-01 18:18:14 +03:00
KOVACS Tamas
8ec491ff06 http.rs: check response status for fetched images
This patch checks if fetching an image resulted in a non-success status
code. In case of non-success status, the response is discarded and an
error is emitted.

This relies on having 3xx codes already handled by surf's Redirect
middleware, so we should see 4xx and 5xx codes here.

Fixes hipstermojo/paperoni#11
2021-05-09 14:35:55 +02:00
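
The status check described in 8ec491ff06 could look roughly like the sketch below. It is a standalone illustration, not the code from http.rs: `StatusError` and `check_img_status` are hypothetical names, and the status is passed in as a plain `u16` rather than taken from a surf response. It assumes, as the commit message does, that redirects were already followed before the check runs.

```rust
/// Hypothetical error type for this sketch; not paperoni's actual error type.
#[derive(Debug, PartialEq)]
pub struct StatusError {
    pub url: String,
    pub status: u16,
}

/// Returns `Ok(())` for 2xx status codes and an error for anything else.
/// Assumption: 3xx redirects were handled earlier, so the non-success
/// codes seen here are expected to be 4xx or 5xx.
pub fn check_img_status(url: &str, status: u16) -> Result<(), StatusError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        // The response body would be discarded at this point.
        Err(StatusError {
            url: url.to_string(),
            status,
        })
    }
}
```

A caller would run this check before reading the image body and report the returned error instead of saving the response.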
KOVACS Tamas
4581f07330 http.rs: extract process_img_response function 2021-05-08 21:32:15 +02:00
Kenneth Gitere
a3de3fb6ff Add ImgError struct for representing errors in downloading article images 2021-04-24 13:57:06 +03:00
Kenneth Gitere
910c45abf7 Add logging configured to send to a file by default 2021-04-24 13:56:02 +03:00
Kenneth Gitere
313041a109 Update dependencies and restore redirect middleware in download_images 2021-04-22 18:01:23 +03:00
Kenneth Gitere
dbac7c3b69 Refactor grab_article to return a Result
- Add ReadabilityError field
- Refactor `article` getter in Extractor to return a &NodeRef. This
  assumes that the article has already been parsed and panics
  otherwise.
2021-04-21 19:11:57 +03:00
Kenneth Gitere
ae1ddb9386 Add printing of table for failed article downloads
- Map errors in `fetch_html` to include the source url
- Change `article_link` to `article_source`
- Add `Into` conversion for `UTF8Error`
- Collect errors in `generate_epubs` for displaying in a table
2021-04-20 21:33:24 +03:00
Kenneth Gitere
04a1eed4e2 Add progress indicators for the cli 2021-04-17 17:28:07 +03:00
Kenneth Gitere
217cd3e442 Minor refactor
Change cli to grab version from the Cargo manifest
Rename fetch_url to fetch_html
2021-04-17 12:37:53 +03:00
Kenneth Gitere
7e9dcfc2b7 Add custom error types and ignore failed image downloads
With this custom error type, many instances of unwrap are replaced
by mapping to errors that are then logged in main.rs. This stops
paperoni from crashing while downloading articles when an error is
possibly recoverable or should not affect other downloads.

As a consequence, failed image downloads are now ignored and the
original URLs are left intact.
2021-04-17 12:04:06 +03:00
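
The approach in 7e9dcfc2b7 (map failures to a custom error, collect them for logging, and keep the original image URL when a download fails) can be sketched as follows. Everything here is an assumption made for illustration: `DownloadError`, `fake_download` and `resolve_img_sources` are hypothetical names, and the download is simulated so the example runs without network access.

```rust
use std::fmt;

/// Hypothetical error type; paperoni's real error types may differ.
#[derive(Debug)]
pub enum DownloadError {
    Http(u16),
    InvalidMime(String),
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DownloadError::Http(code) => write!(f, "request failed with status {}", code),
            DownloadError::InvalidMime(mime) => write!(f, "invalid image MIME type: {}", mime),
        }
    }
}

impl std::error::Error for DownloadError {}

/// Stand-in for a real download. It fails for URLs containing "missing"
/// and for URLs ending in ".txt"; otherwise it returns a local file path.
fn fake_download(url: &str) -> Result<String, DownloadError> {
    if url.contains("missing") {
        Err(DownloadError::Http(404))
    } else if url.ends_with(".txt") {
        Err(DownloadError::InvalidMime("text/plain".to_string()))
    } else {
        let name = url.rsplit('/').next().unwrap_or("img");
        Ok(format!("local/{}", name))
    }
}

/// Returns the image sources to write into the article plus the errors
/// to log. A failed download keeps its original URL instead of aborting.
pub fn resolve_img_sources(urls: &[&str]) -> (Vec<String>, Vec<DownloadError>) {
    let mut resolved = Vec::new();
    let mut errors = Vec::new();
    for url in urls {
        match fake_download(url) {
            Ok(path) => resolved.push(path),
            Err(err) => {
                errors.push(err);
                resolved.push(url.to_string());
            }
        }
    }
    (resolved, errors)
}
```

Returning the errors alongside the results, rather than unwrapping, lets the caller log every failure while the remaining downloads continue.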
Kenneth Gitere
65fdd967c1 Refactor image downloading and update README
Image downloading uses streams instead of spawned tasks so that it
never starts an unbounded number of tasks
2021-02-09 10:34:35 +03:00
Kenneth Gitere
003953332f Refactor downloading of HTML pages
This change allows for parallel downloads of HTML pages up to a maximum
number of concurrent HTTP requests, which is more efficient than
before, when all HTTP requests were likely to begin at the same time.
2021-02-06 17:06:03 +03:00
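
The idea behind 003953332f, capping how many requests are in flight at once, can be illustrated with the sketch below. It is not the project's implementation: the commit works with async HTTP requests, whereas this example substitutes scoped standard-library threads so it runs without extra crates. `run_bounded` and its parameters are hypothetical names, and `job` stands in for the actual HTTP fetch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `job` over every URL with at most `max_concurrent` jobs in flight.
/// Results are returned in the same order as the input URLs.
pub fn run_bounded<F>(urls: Vec<String>, max_concurrent: usize, job: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    // Index of the next URL that has not been claimed by a worker yet.
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<String>>> = Mutex::new(vec![None; urls.len()]);
    // Never start more workers than there are URLs, and always at least one.
    let workers = max_concurrent.max(1).min(urls.len().max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Each worker claims one URL at a time until none are left.
                let i = next.fetch_add(1, Ordering::SeqCst);
                if i >= urls.len() {
                    break;
                }
                let out = job(&urls[i]);
                results.lock().unwrap()[i] = Some(out);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("every slot is filled by a worker"))
        .collect()
}
```

Because only `workers` threads exist, no more than that many jobs run at the same time, regardless of how many URLs are passed in.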
Kenneth Gitere
b402472ba6 Add http and epub modules 2021-02-06 12:59:03 +03:00