Commit graph

114 commits

Author SHA1 Message Date
Kenneth Gitere
d4a23088a9 test: add cli tests 2021-08-24 07:20:23 +03:00
Kenneth Gitere
07479afeac refactor: refactor update_imgs_base64
chore: add doc comment on ResourceType alias

fix: add error when image MIME type is invalid on an image
2021-07-28 10:00:45 +03:00
Kenneth Gitere
0b19376f59 test: add tests for html module 2021-07-27 18:43:08 +03:00
Kenneth Gitere
0357eaebb6 fix: fix insert_appendix function when inserting HTML nodes
refactor: remove check for `<head>` in inline_css

The `<head>` element is automatically added when parsing an HTML document,
so the program should panic if it still cannot find the `<head>`
element.
2021-07-27 18:42:17 +03:00
Kenneth Gitere
9c2232e37f fix: add validation when passing inline-images flag 2021-07-27 18:38:01 +03:00
Kenneth Gitere
40cf5b06c9 chore: update README
chore: bump version
2021-07-24 13:29:55 +03:00
Kenneth Gitere
e6f901eb5a refactor: rename Extractor to Article 2021-07-24 12:43:40 +03:00
Kenneth Gitere
eac28da798 fix: add validation when passing --inline-toc
feat: add coloring when displaying CLI errors
2021-07-24 12:36:33 +03:00
Kenneth Gitere
2f4da824ba feat: add HTML exports with inlining of images
fix: typo fix
refactor: refactor `add_stylesheets` function
2021-07-24 12:08:18 +03:00
Kenneth Gitere
d1d1a0f3f4 feat: add no-css and no-header-css flags for #19
refactor: change to yaml configuration for the CLI

refactor: change all flags to kebab case
2021-07-22 08:50:08 +03:00
Kenneth Gitere
d67169425d fix: fix serialization of element attributes 2021-07-16 07:45:20 +03:00
Kenneth Gitere
92c97ca2cf fix: add .epub extension as fallback
chore: update dependencies and update README
chore: bump version
2021-06-24 08:26:40 +03:00
Kenneth Gitere
754365a42a feat: add inline-toc flag 2021-06-17 17:32:53 +03:00
Kenneth Gitere
c6c10689eb fix: fix broken links in toc generation
the fix involves ensuring the ToC is generated prior to serialization
because it mutates the document and will not work otherwise.

chore: add .vscode config to .gitignore
2021-06-16 18:09:05 +03:00
Kenneth Gitere
282d229754 fix: fix ordering issue with merged articles
This commit adds the itertools crate which is used to dedup the Vec
when downloading urls

fix: fix error message
feat: change the serif and mono fonts declarations
2021-06-11 14:21:41 +03:00
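
The ordering fix in commit 282d229754 above relies on deduplicating the URL list without reshuffling it. A minimal sketch of that idea using only the standard library follows. The function name `dedup_preserving_order` is hypothetical, and the commit itself uses the itertools crate rather than this hand-rolled version.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drop repeated URLs while keeping the first
/// occurrence of each one in its original position.
fn dedup_preserving_order(urls: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    urls.into_iter()
        // `insert` returns false when the value was already present,
        // so later duplicates are filtered out.
        .filter(|url| seen.insert(url.clone()))
        .collect()
}
```

Keeping first occurrences in place means the merged output follows the order in which the URLs were given.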
Kenneth Gitere
4247fab1ea feat: add css library for EPUB exports 2021-06-09 08:04:50 +03:00
Kenneth Gitere
d50bbdfb58 fix: minor fixes
- restore default debug level when logging to file
- return early from generating epubs if there are no articles
- fix serialization bug in creating attributes
2021-06-09 07:26:52 +03:00
Kenneth Gitere
8691b0166f fix: fix panic when unwrapping a base URI
chore: add message when downloading articles to a specified output-dir
2021-06-08 20:37:20 +03:00
Kenneth Gitere
5fbfb9c806 refactor: move download function to http module
feat: add rendering of table for partial downloads
feat: add help message for enabling --log-to-file
chore: format flags to kebab-case and shorten --output-directory flag
2021-06-08 07:58:52 +03:00
Kenneth Gitere
95bd22f339 Merge branch 'dev' of github.com:hipstermojo/paperoni into dev 2021-06-07 22:44:51 +03:00
Kenneth Gitere
5b41e785b8 Fix get_header_level_toc_vec 2021-06-07 22:42:14 +03:00
Kenneth Gitere
16dc83ac62
Merge pull request #15 from sadsnake42/output-directory
Add `output_dir` to cli argument
2021-06-06 16:01:38 +03:00
Mikhail Gorbachev
67e86e4d74 Refactor LogError 2021-06-06 15:53:47 +03:00
Mikhail Gorbachev
aa9258e122 Fix from PR#15
- refactor comments
- move `cli::Error` to `errors::ErrorCli`
- removed mixing of order of input urls
- move pure functionality of `init_logger` to clear function
2021-06-06 13:25:28 +03:00
Kenneth Gitere
a1156e10fc Add generate_header_ids function
Add h4 to header level ToC and update implementation
Add tests
2021-06-06 13:02:31 +03:00
Kenneth Gitere
8220cf29f7 Change function replace_metadata_value to replace_escaped_characters 2021-06-06 12:59:25 +03:00
Kenneth Gitere
5548ba4ba5 Merge branch 'dev' of github.com:hipstermojo/paperoni into dev 2021-06-06 09:24:17 +03:00
Philip Wrenn
fd161455b4 Removed unwrap to prevent unexpected panic. 2021-06-05 23:17:55 -04:00
Mikhail Gorbachev
13ad14e73d Add output_dir to cli argument
- Add `output_dir` to cli argument
    - This argument allows you to save output files in a specified folder, not just the current directory
- Refactor 'cli.rs'
    - Add `Builder` for `AppConfig`
    - Add `Error` instead of separate panics
- Upgrade dependencies
2021-06-01 18:18:14 +03:00
Kenneth Gitere
8c9783b596 feat: add header level table of contents for articles 2021-05-24 20:40:41 +03:00
Kenneth Gitere
3a8160412c refactor short_summary function in logs.rs to be less redundant 2021-05-24 20:40:41 +03:00
Kenneth Gitere
102304544d
Merge pull request #14 from kxt/13-fix-lazy-images-laziness-check
Fix laziness check in fix_lazy_images
2021-05-12 07:12:46 +03:00
KOVACS Tamas
7649f6aa18 moz_readability/mod.rs: fix laziness check in fix_lazy_images
fix_lazy_images checks whether an img node is lazily loaded. An img is
considered lazily loaded if it does not have an src/srcset attribute, or
if its class contains the 'lazy' string. If an img is considered lazy,
fix_lazy_images will attempt to replace its src.

However, if an img was missing the class attribute, it was incorrectly
assumed to be lazy and had its src replaced.

Fixes hipstermojo/paperoni#13
2021-05-10 10:08:33 +02:00
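
The laziness rule described in commit 7649f6aa18 above can be sketched as a small predicate. This is an illustration under stated assumptions, not the code in moz_readability/mod.rs: the `Img` struct and `is_lazy` function are hypothetical stand-ins for the real DOM node handling, and "no src/srcset" is read here as "neither attribute is present".

```rust
/// Hypothetical stand-in for an `<img>` node, holding only the
/// attributes the laziness check reads.
struct Img<'a> {
    src: Option<&'a str>,
    srcset: Option<&'a str>,
    class: Option<&'a str>,
}

/// An image counts as lazy if it has neither `src` nor `srcset`,
/// or if its class contains the string "lazy".
fn is_lazy(img: &Img<'_>) -> bool {
    let has_source = img.src.is_some() || img.srcset.is_some();
    // A missing class attribute must not count as lazy (issue #13).
    let class_says_lazy = img.class.map_or(false, |class| class.contains("lazy"));
    !has_source || class_says_lazy
}
```

The key point is the `map_or(false, ...)` default: an absent class attribute contributes nothing to the decision, so an image with a valid `src` and no class keeps its source.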
KOVACS Tamas
d50f08b875 moz_readability/mod.rs: add testcase for issue #13
This patch adds a testcase for issue #13, where an img node without
a class attribute is automatically assumed to be lazy and its src is
replaced.
2021-05-10 10:08:25 +02:00
KOVACS Tamas
8ec491ff06 http.rs: check response status for fetched images
This patch checks if fetching an image resulted in a non-success status
code. In case of non-success status, the response is discarded and an
error is emitted.

This relies on having 3xx codes already handled by surf's Redirect
middleware, so we should see 4xx and 5xx codes here.

Fixes hipstermojo/paperoni#11
2021-05-09 14:35:55 +02:00
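
The status check described in commit 8ec491ff06 above amounts to accepting only success codes once redirects have already been followed. A minimal sketch, assuming a plain `u16` status code instead of surf's own status type; the function name `check_img_status` and the error text are hypothetical.

```rust
/// Hypothetical helper: accept only 2xx responses for fetched images.
/// Redirects (3xx) are assumed to be resolved earlier by middleware,
/// so anything arriving here outside 2xx is treated as a failure.
fn check_img_status(status: u16) -> Result<(), String> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        // The caller would discard the response body and report this error.
        Err(format!("image request failed with status {}", status))
    }
}
```

Returning a `Result` lets the caller skip writing the response body to disk and record the failure instead.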
KOVACS Tamas
4581f07330 http.rs: extract process_img_response function 2021-05-08 21:32:15 +02:00
Kenneth Gitere
4fd71311a1 Fix bug when validating the download file name in merged mode 2021-04-30 07:47:25 +03:00
Kenneth Gitere
cae9227ab0 Update documentation 2021-04-30 06:55:02 +03:00
Kenneth Gitere
c00582ac29 Fix verbosity levels ordering 2021-04-30 06:42:08 +03:00
Kenneth Gitere
ae52cc4e13 Add features for logging and cli
- display of partial downloads in the summary
- custom file name that is displayed after the summary ensuring it is visible
- log-to-file flag which specifies that logs will be sent to the default directory
- verbose flag (v) used to configure the log levels
- disabling the progress bars when logging to the terminal is active
2021-04-29 20:02:08 +03:00
Kenneth Gitere
00d704fdd6 Move initializing logger to logs module 2021-04-28 07:47:45 +03:00
Kenneth Gitere
36c3eb65c6 Add appendix page for listing the source of the article 2021-04-28 07:46:07 +03:00
Kenneth Gitere
088699b2c3 Add debug flag 2021-04-24 15:50:43 +03:00
Kenneth Gitere
a9787d7b5a Add colored output and configuring of a paperoni root directory for logs 2021-04-24 15:13:44 +03:00
Kenneth Gitere
65f8ebda56 Add logs crate for dealing with printing out the final download summary 2021-04-24 13:58:03 +03:00
Kenneth Gitere
a3de3fb6ff Add ImgError struct for representing errors in downloading article images 2021-04-24 13:57:06 +03:00
Kenneth Gitere
910c45abf7 Add logging configured to send to a file by default 2021-04-24 13:56:02 +03:00
Kenneth Gitere
c0323a6ae4 Minor refactor and add non zero exit upon failure to download any article
- Move printing of the successfully downloaded articles into main.rs
- Add summary text
2021-04-24 09:00:18 +03:00
Kenneth Gitere
b496abb576 Fix serialization issue with poorly defined attribute names 2021-04-22 19:00:32 +03:00
Kenneth Gitere
313041a109 Update dependencies and restore redirect middleware in download_images 2021-04-22 18:01:23 +03:00