paperoni

Archived

Author	SHA1	Message	Date
Kenneth Gitere	16dc83ac62	Merge pull request #15 from sadsnake42/output-directory Add `output_dir` to cli argument	2021-06-06 16:01:38 +03:00
Mikhail Gorbachev	67e86e4d74	Refactor `LogError`	2021-06-06 15:53:47 +03:00
Mikhail Gorbachev	aa9258e122	Fix from PR#15 - refactor comments - move `cli::Error` to `errors::ErrorCli` - removed mixing of order of input urls - move pure functionality of `init_logger` to clear function	2021-06-06 13:25:28 +03:00
Kenneth Gitere	751b5702fe	Merge pull request #17 from philwrenn/dev Removed unwrap to prevent unexpected panic.	2021-06-06 09:23:01 +03:00
Philip Wrenn	fd161455b4	Removed unwrap to prevent unexpected panic.	2021-06-05 23:17:55 -04:00
Mikhail Gorbachev	13ad14e73d	Add `output_dir` to cli argument - Add `output_dir` to cli argument - This argument allows you to save output files in a special folder, not just current dir - Refactor 'cli.rs' - Add `Builder` for `AppConfig` - Add `Error` instead of separate panics - Upgrade dependencies	2021-06-01 18:18:14 +03:00
Kenneth Gitere	1cbbc7527f	Update version	2021-05-24 20:33:05 +03:00
Kenneth Gitere	c916fb8493	Edit README	2021-05-13 12:26:23 +03:00
Kenneth Gitere	5ccbe1a17a	Merge branch 'dev' of github.com:hipstermojo/paperoni into dev	2021-05-13 12:25:11 +03:00
Kenneth Gitere	102304544d	Merge pull request #14 from kxt/13-fix-lazy-images-laziness-check Fix laziness check in fix_lazy_images	2021-05-12 07:12:46 +03:00
KOVACS Tamas	7649f6aa18	moz_readability/mod.rs: fix laziness check in fix_lazy_images fix_lazy_images checks whether an img node is lazily loaded. An img is considered lazily loaded if it does not have an src/srcset attribute, or if its class contains the 'lazy' string. If an img is considered lazy, fix_lazy_images will make attempts to replace its src. However, if an img was missing the class attribute, it was incorrectly assumed to be lazy and had its src replaced. Fixes hipstermojo/paperoni#13	2021-05-10 10:08:33 +02:00
KOVACS Tamas	d50f08b875	moz_readability/mod.rs: add testcase for issue #13 This patch adds a testcase for issue #13, where an img node without a class attribute is automatically assumed to be lazy and its src is replaced.	2021-05-10 10:08:25 +02:00
Kenneth Gitere	312dff95e2	Merge pull request #12 from kxt/11-image-status-codes Check response status for fetched images	2021-05-10 10:58:23 +03:00
KOVACS Tamas	8ec491ff06	http.rs: check response status for fetched images This patch checks if fetching an image resulted in a non-success status code. In case of non-success status, the response is discarded and an error is emitted. This relies on having 3xx codes already handled by surf's Redirect middleware, so we should see 4xx and 5xx codes here. Fixes hipstermojo/paperoni#11	2021-05-09 14:35:55 +02:00
KOVACS Tamas	4581f07330	http.rs: extract process_img_response function	2021-05-08 21:32:15 +02:00
Kenneth Gitere	474d97c6bd	Merge pull request #10 from hipstermojo/dev v0.4.0 release	2021-04-30 08:48:11 +03:00
Kenneth Gitere	538a65f6fd	Update dependencies in lockfile	2021-04-30 08:34:09 +03:00
Kenneth Gitere	f93017ab73	Fix README formatting	2021-04-30 08:29:08 +03:00
Kenneth Gitere	4fd71311a1	Fix bug when validating the download file name in merged mode	2021-04-30 07:47:25 +03:00
Kenneth Gitere	cae9227ab0	Update documentation	2021-04-30 06:55:02 +03:00
Kenneth Gitere	c00582ac29	Fix verbosity levels ordering	2021-04-30 06:42:08 +03:00
Kenneth Gitere	ae52cc4e13	Add features for logging and cli - display of partial downloads in the summary - custom file name that is displayed after the summary ensuring it is visible - log-to-file flag which specifies that logs will be sent to the default directory - verbose flag (v) used to configure the log levels - disabling the progress bars when logging to the terminal is active	2021-04-29 20:02:08 +03:00
Kenneth Gitere	00d704fdd6	Move initializing logger to logs module	2021-04-28 07:47:45 +03:00
Kenneth Gitere	36c3eb65c6	Add appendix page for listing the source of the article	2021-04-28 07:46:07 +03:00
Kenneth Gitere	088699b2c3	Add debug flag	2021-04-24 15:50:43 +03:00
Kenneth Gitere	a9787d7b5a	Add colored output and configuring of a paperoni root directory for logs	2021-04-24 15:13:44 +03:00
Kenneth Gitere	65f8ebda56	Add logs crate for dealing with printing out the final download summary	2021-04-24 13:58:03 +03:00
Kenneth Gitere	a3de3fb6ff	Add ImgError struct for representing errors in downloading article images	2021-04-24 13:57:06 +03:00
Kenneth Gitere	910c45abf7	Add logging configured to send to a file by default	2021-04-24 13:56:02 +03:00
Kenneth Gitere	c0323a6ae4	Minor refactor and add non zero exit upon failure to download any article - Move printing of the successfully downloaded articles into main.rs - Add summary text	2021-04-24 09:00:18 +03:00
Kenneth Gitere	b496abb576	Fix serialization issue with poorly defined attribute names	2021-04-22 19:00:32 +03:00
Kenneth Gitere	313041a109	Update dependencies and restore redirect middleware in `download_images`	2021-04-22 18:01:23 +03:00
Kenneth Gitere	960f114dc6	Minor fixes in moz_readability - swap unwrap for if let statement in `get_article_metadata` - add default when extracting the title from a possible `<title>` element - fix extracting alternative titles from h1 tags	2021-04-21 19:52:41 +03:00
Kenneth Gitere	dbac7c3b69	Refactor `grab_article` to return a Result - Add ReadabilityError field - Refactor `article` getter in Extractor to return a &NodeRef. This relies on the assumption that the article has already been parsed and should otherwise panic.	2021-04-21 19:11:57 +03:00
Kenneth Gitere	ae1ddb9386	Add printing of table for failed article downloads - Map errors in `fetch_html` to include the source url - Change `article_link` to `article_source` - Add `Into` conversion for `UTF8Error` - Collect errors in `generate_epubs` for displaying in a table	2021-04-20 21:33:24 +03:00
Kenneth Gitere	60fb30e8a2	Add url field in Extractor struct	2021-04-20 21:06:54 +03:00
Kenneth Gitere	b217448601	Add printing of tables upon successful extraction	2021-04-20 14:02:56 +03:00
Kenneth Gitere	04a1eed4e2	Add progress indicators for the cli	2021-04-17 17:28:07 +03:00
Kenneth Gitere	217cd3e442	Minor refactor Change cli to grab version from the Cargo manifest Rename fetch_url to fetch_html	2021-04-17 12:37:53 +03:00
Kenneth Gitere	7e9dcfc2b7	Add custom error types and ignore failed image downloads Using this custom error type, many instances of unwrap are replaced with mapping to errors that are then logged in main.rs. This allows paperoni to stop crashing when downloading articles when the errors are possibly recoverable or should not affect other downloads. This subsequently introduces ignoring the failed image downloads and instead leaving the original URLs intact.	2021-04-17 12:04:06 +03:00
Kenneth Gitere	d6cbbe405b	Fix bug in `inline_css_str_to_map`	2021-04-14 18:07:39 +03:00
Kenneth Gitere	2762bc5086	Merge pull request #7 from hipstermojo/dev Update README	2021-02-24 13:28:56 +03:00
Kenneth Gitere	b8c0cf29f1	Update README	2021-02-24 13:27:43 +03:00
Kenneth Gitere	e9f96d2970	Merge pull request #6 from hipstermojo/dev Update to 0.3.0	2021-02-24 13:13:36 +03:00
Kenneth Gitere	165b2187be	Bump version	2021-02-24 13:03:52 +03:00
Kenneth Gitere	912bc9d915	Add flag for configuring maximum concurrent requests Change printing macro for error messages to go out to stderr	2021-02-21 13:11:26 +03:00
Kenneth Gitere	b0c4c47413	Add support for merging articles into a single epub This is still experimental as it lacks validation of the target file name	2021-02-11 13:51:21 +03:00
Kenneth Gitere	f0a610c2ac	Bug fix with empty titles The code for title retrieval previously assumed that meta tags concerned with the title would always contain a value but some sites leave the value empty thus it had to be checked for as well.	2021-02-09 12:56:07 +03:00
Kenneth Gitere	65fdd967c1	Refactor image downloading and update README Image downloading uses streams instead of spawned tasks to ensure that it does not start an unbounded number of spawned tasks	2021-02-09 10:34:35 +03:00
Kenneth Gitere	003953332f	Refactor downloading of HTML pages This change allows for parallel downloads of HTML pages up to a maximum number of concurrent HTTP requests which is more efficient than before where all HTTP requests are likely to begin at the same time.	2021-02-06 17:06:03 +03:00


111 commits
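
Commit 7649f6aa18 describes the laziness rule used by fix_lazy_images only in prose. Below is a minimal sketch of that rule in Rust, for illustration only. The function name `is_lazy_img` and its `Option<&str>` parameters are hypothetical and are not taken from moz_readability/mod.rs, which operates on parsed DOM nodes. Matching "lazy" case-insensitively is also an assumption.

```rust
/// Returns true when an img should be treated as lazily loaded, following the
/// rule stated in the commit message: it has neither `src` nor `srcset`, or
/// its `class` contains the string "lazy".
///
/// Hypothetical helper: each attribute is passed as an `Option<&str>`, where
/// `None` means the attribute is absent from the element.
pub fn is_lazy_img(src: Option<&str>, srcset: Option<&str>, class: Option<&str>) -> bool {
    let has_source = src.is_some() || srcset.is_some();
    // A missing class attribute counts as "not lazy". Treating it as lazy is
    // the behaviour the commit message describes as the bug.
    let class_is_lazy = class.map_or(false, |c| c.to_lowercase().contains("lazy"));
    !has_source || class_is_lazy
}
```

Under this rule an img that has a src and no class attribute is left untouched, which is the situation covered by the testcase added in d50f08b875.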