- refactor comments
- move `cli::Error` to `errors::ErrorCli`
- stop mixing up the order of input urls
- move the pure functionality of `init_logger` into a separate function
- Add `output_dir` cli argument
- This argument allows you to save output files in a specified folder, not just the current dir
- Refactor 'cli.rs'
- Add `Builder` for `AppConfig`
- Add `Error` instead of separate panics
- Upgrade dependencies
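
The builder and error changes above could look roughly like the sketch below. This is not the code in 'cli.rs': `AppConfigBuilder`, `ConfigError`, the `urls` and `max_conn` fields and the default of 8 connections are all assumed for illustration; only `AppConfig` and `output_dir` come from the list above.

```rust
use std::fmt;

/// Hypothetical error type returned instead of panicking.
#[derive(Debug, PartialEq)]
enum ConfigError {
    NoUrls,
    ZeroMaxConn,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::NoUrls => write!(f, "no urls were provided"),
            ConfigError::ZeroMaxConn => write!(f, "max_conn must be greater than 0"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Field names other than `output_dir` are assumptions.
#[derive(Debug, PartialEq)]
struct AppConfig {
    urls: Vec<String>,
    max_conn: usize,
    output_dir: Option<String>,
}

#[derive(Default)]
struct AppConfigBuilder {
    urls: Vec<String>,
    max_conn: Option<usize>,
    output_dir: Option<String>,
}

impl AppConfigBuilder {
    fn urls(mut self, urls: Vec<String>) -> Self {
        self.urls = urls;
        self
    }

    fn max_conn(mut self, n: usize) -> Self {
        self.max_conn = Some(n);
        self
    }

    fn output_dir(mut self, dir: &str) -> Self {
        self.output_dir = Some(dir.to_string());
        self
    }

    /// Validates the collected values and reports problems as errors.
    fn build(self) -> Result<AppConfig, ConfigError> {
        if self.urls.is_empty() {
            return Err(ConfigError::NoUrls);
        }
        // Assumed default; the real default may differ.
        let max_conn = self.max_conn.unwrap_or(8);
        if max_conn == 0 {
            return Err(ConfigError::ZeroMaxConn);
        }
        Ok(AppConfig {
            urls: self.urls,
            max_conn,
            output_dir: self.output_dir,
        })
    }
}
```

Returning a `Result` from `build` lets the caller decide how to report invalid arguments instead of panicking in the middle of argument parsing.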
fix_lazy_images checks whether an img node is lazily loaded. An img is
considered lazily loaded if it does not have an src/srcset attribute, or
if its class contains the 'lazy' string. If an img is considered lazy,
fix_lazy_images will make attempts to replace its src.
However, if an img was missing the class attribute, it was incorrectly
assumed to be lazy and had its src replaced.
Fixes hipstermojo/paperoni#13
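
A minimal sketch of the corrected check, not the actual fix_lazy_images code: `ImgAttrs` and `is_lazy` are hypothetical names, attributes are modelled as plain `Option<&str>` values rather than DOM nodes, and "does not have an src/srcset attribute" is read here as "has neither".

```rust
/// Attribute values of an img node; `None` means the attribute is absent.
/// Hypothetical stand-in for the real DOM node type.
struct ImgAttrs<'a> {
    src: Option<&'a str>,
    srcset: Option<&'a str>,
    class: Option<&'a str>,
}

/// Returns true if the img should be treated as lazily loaded.
fn is_lazy(img: &ImgAttrs) -> bool {
    // Assumption: lazy when neither src nor srcset is present.
    let no_source = img.src.is_none() && img.srcset.is_none();
    // A missing class attribute must not count as lazy.
    let lazy_class = img.class.map_or(false, |c| c.contains("lazy"));
    no_source || lazy_class
}
```

With this shape, an img that has a src but no class attribute is left alone, which is the case the fix addresses.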
This patch checks if fetching an image resulted in a non-success status
code. In case of non-success status, the response is discarded and an
error is emitted.
This relies on having 3xx codes already handled by surf's Redirect
middleware, so we should see 4xx and 5xx codes here.
Fixes hipstermojo/paperoni#11
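
The status check can be sketched as below. This does not use surf; the status is passed in as a plain `u16`, and `ImgError` and `check_status` are hypothetical names chosen for illustration.

```rust
/// Hypothetical error emitted when an image fetch does not succeed.
#[derive(Debug, PartialEq)]
enum ImgError {
    BadStatus { url: String, status: u16 },
}

/// Accepts only 2xx responses. Redirects (3xx) are assumed to have been
/// followed before this point, so failures here are normally 4xx or 5xx.
fn check_status(url: &str, status: u16) -> Result<(), ImgError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(ImgError::BadStatus {
            url: url.to_string(),
            status,
        })
    }
}
```

On `Err`, the caller would discard the response body and report the error.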
- display of partial downloads in the summary
- custom file name that is displayed after the summary ensuring it is visible
- log-to-file flag which specifies that logs will be sent to the default directory
- verbose flag (v) used to configure the log levels
- disabling the progress bars when logging to the terminal is active
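
One way the verbose flag could map to log levels is sketched below. The mapping itself is an assumption, not taken from the project, and `LogLevel` and `level_from_verbosity` are hypothetical names.

```rust
/// Hypothetical log levels, ordered from least to most verbose.
#[derive(Debug, PartialEq, Clone, Copy)]
enum LogLevel {
    Off,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Maps the number of times `-v` was passed to a log level.
/// The exact thresholds are assumed for illustration.
fn level_from_verbosity(count: u64) -> LogLevel {
    match count {
        0 => LogLevel::Off,
        1 => LogLevel::Error,
        2 => LogLevel::Warn,
        3 => LogLevel::Info,
        4 => LogLevel::Debug,
        _ => LogLevel::Trace,
    }
}

/// Progress bars are only drawn when nothing is being logged to the terminal.
fn show_progress_bars(level: LogLevel, log_to_file: bool) -> bool {
    level == LogLevel::Off || log_to_file
}
```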
- swap unwrap for if let statement in `get_article_metadata`
- add default when extracting the title from a possible `<title>` element
- fix extracting alternative titles from h1 tags
- Add ReadabilityError field
- Refactor `article` getter in Extractor to return a &NodeRef. This
relies on the assumption that the article has already been parsed
and should otherwise panic.
- Map errors in `fetch_html` to include the source url
- Change `article_link` to `article_source`
- Add `Into` conversion for `UTF8Error`
- Collect errors in `generate_epubs` for displaying in a table
Using this custom error type, many instances of unwrap are replaced
with mapping to errors that are then logged in main.rs. This stops
paperoni from crashing while downloading articles when the errors
are possibly recoverable or should not affect other downloads.
As a consequence, failed image downloads are now ignored and the
original URLs are left intact.
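
A rough sketch of the pattern, assuming a much simpler error type than the real one: `DownloadError`, `decode_body` and `decode_all` are hypothetical names, and decoding bytes as UTF-8 stands in for any fallible step that used to be an unwrap.

```rust
use std::fmt;

/// Hypothetical error that remembers which article it belongs to.
#[derive(Debug)]
struct DownloadError {
    article_source: String,
    message: String,
}

impl fmt::Display for DownloadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.article_source, self.message)
    }
}

impl std::error::Error for DownloadError {}

/// Maps the UTF-8 error to `DownloadError` instead of unwrapping,
/// attaching the source url for later reporting.
fn decode_body(url: &str, bytes: &[u8]) -> Result<String, DownloadError> {
    std::str::from_utf8(bytes)
        .map(str::to_owned)
        .map_err(|e| DownloadError {
            article_source: url.to_string(),
            message: e.to_string(),
        })
}

/// Processes every body; one failure does not stop the others.
/// Errors are collected so they can be displayed together at the end.
fn decode_all(bodies: &[(&str, &[u8])]) -> (Vec<String>, Vec<DownloadError>) {
    let mut pages = Vec::new();
    let mut errors = Vec::new();
    for (url, bytes) in bodies {
        match decode_body(url, bytes) {
            Ok(html) => pages.push(html),
            Err(e) => errors.push(e),
        }
    }
    (pages, errors)
}
```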
The code for title retrieval previously assumed that meta tags concerned
with the title would always contain a value. Some sites leave the value
empty, so an empty value now has to be checked for as well.
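
The empty-value check can be sketched as follows. `pick_title` is a hypothetical helper working on already extracted strings, and the "Untitled" fallback is an assumption rather than the project's actual default.

```rust
/// Picks the first non-empty candidate: the meta tag value, then the
/// `<title>` text, then a fallback.
fn pick_title<'a>(meta_title: Option<&'a str>, title_tag: Option<&'a str>) -> &'a str {
    meta_title
        .map(str::trim)
        // A meta tag that is present but empty is treated as missing.
        .filter(|t| !t.is_empty())
        .or_else(|| title_tag.map(str::trim).filter(|t| !t.is_empty()))
        // Assumed fallback value.
        .unwrap_or("Untitled")
}
```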
This change allows for parallel downloads of HTML pages up to a maximum
number of concurrent HTTP requests. This is more efficient than
before, where all HTTP requests were likely to begin at the same time.
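
The idea of capping concurrency can be illustrated with the sketch below. It deliberately swaps in OS threads from the standard library for the async HTTP client, so it is not how the change itself is implemented; `fetch`, `fetch_all` and `max_conn` are hypothetical names, and `fetch` returns a fake body instead of making a request.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an HTTP request; returns a fake body.
fn fetch(url: &str) -> String {
    format!("<html>{}</html>", url)
}

/// Fetches all urls with at most `max_conn` fetches running at once.
/// Results are returned in the same order as the input urls.
fn fetch_all(urls: Vec<String>, max_conn: usize) -> Vec<String> {
    let total = urls.len();
    // Shared queue of (index, url) pairs; workers pop until it is empty.
    let queue = Arc::new(Mutex::new(
        urls.into_iter().enumerate().collect::<Vec<_>>(),
    ));
    let results = Arc::new(Mutex::new(vec![String::new(); total]));
    // Never spawn more workers than there are urls, and at least one.
    let workers = max_conn.max(1).min(total.max(1));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // The queue lock is released at the end of this statement.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some((i, url)) => {
                        let body = fetch(&url);
                        results.lock().unwrap()[i] = body;
                    }
                    None => break,
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    // All worker clones are dropped once the threads have been joined.
    Arc::try_unwrap(results).unwrap().into_inner().unwrap()
}
```

Because each worker only takes a new url after finishing the previous one, no more than `max_conn` fetches are in flight at any time.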