diff --git a/Cargo.lock b/Cargo.lock
index 2824333..a40cb8e 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1552,7 +1552,7 @@ dependencies = [
 [[package]]
 name = "paperoni"
-version = "0.5.0-alpha1"
+version = "0.6.0-alpha1"
 dependencies = [
  "async-std",
  "base64",
diff --git a/Cargo.toml b/Cargo.toml
index fced683..ec637c9 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -3,7 +3,7 @@ description = "A web article downloader"
 homepage = "https://github.com/hipstermojo/paperoni"
 repository = "https://github.com/hipstermojo/paperoni"
 name = "paperoni"
-version = "0.5.0-alpha1"
+version = "0.6.0-alpha1"
 authors = ["Kenneth Gitere "]
 edition = "2018"
 license = "MIT"
diff --git a/README.md b/README.md
index 5c547b0..8f08536 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
-Paperoni is a CLI tool made in Rust for downloading web articles as EPUBs. There is provisional\* support for exporting to PDF as well.
+Paperoni is a CLI tool made in Rust for downloading web articles as EPUB or HTML files. There is provisional\* support for exporting to PDF as well.
 > This project is in an alpha release so it might crash when you use it. Please open an [issue on Github](https://github.com/hipstermojo/paperoni/issues/new) if it does crash.
@@ -23,7 +23,7 @@ Check the [releases](https://github.com/hipstermojo/paperoni/releases) page for
 Paperoni is published on [crates.io](https://crates.io). If you have [cargo](https://github.com/rust-lang/cargo) installed, then run:
 ```sh
-cargo install paperoni --version 0.5.0-alpha1
+cargo install paperoni --version 0.6.0-alpha1
 ```
 _Paperoni is still in alpha so the `version` flag has to be passed._
@@ -48,39 +48,54 @@ USAGE:
     paperoni [OPTIONS] [urls]...
 OPTIONS:
+        --export
+            Specify the file type of the export. The type must be in lower case. [default: epub] [possible values:
+            html, epub]
     -f, --file
             Input file containing links
     -h, --help
             Prints help information
+        --inline-images
+            Inlines the article images when exporting to HTML using base64.
+            This is used when you do not want a separate folder created for images during HTML export.
+            NOTE: It uses base64 encoding on the images which results in larger HTML export sizes as each image
+            increases in size by about 25%-33%.
         --inline-toc
-            Add an inlined Table of Contents page at the start of the merged article.
-
+            Add an inlined Table of Contents page at the start of the merged article. This does not affect the Table of Contents navigation
         --log-to-file
             Enables logging of events to a file located in .paperoni/logs with a default log level of debug. Use -v to specify the logging level
-        --max-conn
+        --max-conn
             The maximum number of concurrent HTTP connections when downloading articles. Default is 8.
             NOTE: It is advised to use as few connections as needed i.e between 1 and 50. Using more connections can end up overloading your network card with too many concurrent requests.
-    -o, --output-dir
-            Directory for saving epub documents
-
-        --merge
+        --no-css
+            Removes the stylesheets used in the EPUB generation.
+            The EPUB file will then be laid out based on your e-reader's default stylesheets.
+            Images and code blocks may overflow when this flag is set and layout of generated
+            PDFs will be affected. Use --no-header-css if you want to only disable the styling on headers.
+        --no-header-css
+            Removes the header CSS styling but preserves styling of images and codeblocks. To remove all the default
+            CSS, use --no-css instead.
+        --merge
             Merge multiple articles into a single epub that will be given the name provided
+    -o, --output-dir
+            Directory to store output epub documents
+
     -V, --version
             Prints version information
     -v
             This takes upto 4 levels of verbosity in the following order.
-            - Error (-v)
-            - Warn (-vv)
-            - Info (-vvv)
-            - Debug (-vvvv)
-            When this flag is passed, it disables the progress bars and logs to stderr.
-            If you would like to send the logs to a file (and enable progress bars), pass the log-to-file flag.
+            - Error (-v)
+            - Warn (-vv)
+            - Info (-vvv)
+            - Debug (-vvvv)
+            When this flag is passed, it disables the progress bars and logs to stderr.
+            If you would like to send the logs to a file (and enable progress bars), pass the log-to-file flag.
 ARGS:
     ...
@@ -112,6 +127,41 @@ These can also be read from a file using the `-f/--file` flag.
 paperoni -f links.txt
 ```
+### Exporting articles
+
+By default, Paperoni exports to EPUB files but you can change to HTML by passing the `--export html` flag.
+
+```sh
+paperoni https://en.wikipedia.org/wiki/Pepperoni --export html
+```
+
+HTML exports allow you to read the articles as plain HTML documents on your browser but can also be used to convert to PDF as explained [here](#).
+
+When exporting to HTML, Paperoni will download the article's images to a folder named similar to the article. Therefore the folder structure would look like this for the command ran above:
+
+```
+.
+├── Pepperoni - Wikipedia
+│   ├── 1a9f886e9b58db72e0003a2cd52681d8.png
+│   ├── 216f8a4265a1ceb3f8cfba4c2f9057b1.jpeg
+│   ...
+└── Pepperoni - Wikipedia.html
+```
+
+If you would instead prefer to have the images inlined directly to the HTML export, pass the `inline-images` flag, i.e.:
+
+```sh
+paperoni https://en.wikipedia.org/wiki/Pepperoni --export html --inline-images
+```
+
+This is especially useful when exporting multiple links.
+
+**NOTE**: The inlining of images for HTML exports uses base64 encoding which is known to increase the overall size of images by about 25% to 33%.
+
+### Disabling CSS
+
+The `no-css` and `no-header-css` flags can be used to remove the default styling added by Paperoni. Refer to `--help` to see the usage of the flags.
+
 ### Merging articles
 By default, Paperoni generates an epub file for each link. You can also merge multiple links
@@ -153,7 +203,11 @@ There are also web pages it won't work on in general such as Twitter and Reddit
 ## PDF exports
-As of version 0.5-alpha1, you can now export to PDF using a third party tool.
 This requires that you install [Calibre](https://calibre-ebook.com/) which comes with a ebook conversion. You can convert the epub to a pdf through the terminal with `ebook-convert`:
+PDF conversion can be done using a third party tool. There are 2 options to do so:
+
+### EPUB to PDF
+
+This requires that you install [Calibre](https://calibre-ebook.com/) which comes with a ebook conversion. You can convert the epub to a pdf through the terminal with `ebook-convert`:
 ```sh
 # Assuming the downloaded epub was called foo.epub
@@ -161,3 +215,25 @@ ebook-convert foo.epub foo.pdf
 ```
 Alternatively, you can use the Calibre GUI to do the file conversion.
+
+### HTML to PDF
+
+The recommended approach is to use [Weasyprint](https://weasyprint.org/start/), a free and open-source tool that converts HTML documents to PDF. It is available on Linux, MacOS and Windows. Using the CLI, it can be done as follows:
+
+```sh
+paperoni https://en.wikipedia.org/wiki/Pepperoni --export html
+weasyprint "Pepperoni - Wikipedia.html" Pepperoni.pdf
+```
+
+Inlining images is not mandatory as Weasyprint will be able to find the files on its own.
+
+### Comparison of PDF conversion methods
+
+Either of the conversion methods is sufficient for most use cases. The main differences are listed below:
+|                      | EPUB to PDF                | HTML to PDF      |
+|----------------------|----------------------------|------------------|
+| Wrapping code blocks | Yes                        | No               |
+| CSS customization    | No                         | Yes              |
+| Generated file size  | Slightly larger            | Slightly smaller |
+
+The difference in file size is due to the additional fonts added to the PDF file by `ebook-convert`.
diff --git a/src/cli_config.yml b/src/cli_config.yml
index 88be611..4f86d52 100644
--- a/src/cli_config.yml
+++ b/src/cli_config.yml
@@ -49,7 +49,7 @@ args:
       long: inline-toc
       requires: output-name
       help: Add an inlined Table of Contents page at the start of the merged article.
-      long_help: Add an inlined Table of Contents page at the start of the merged article.
 This does not affect the Table of Contents navigation"
+      long_help: Add an inlined Table of Contents page at the start of the merged article. This does not affect the Table of Contents navigation
   - no-css:
       long: no-css
       conflicts_with: no-header-css