<p align="center"><img src="./paperoni-dark.png"></p>
<p align="center"><i>Salami not included</i></p>
<div align="center">
  <a href="https://crates.io/crates/paperoni">
    <img alt="crates.io version" src="https://img.shields.io/crates/v/paperoni.svg">
  </a>
</div>
Paperoni is a CLI tool made in Rust for downloading web articles as EPUBs.
> This project is in an alpha release, so it might crash when you use it. Please open an [issue on GitHub](https://github.com/hipstermojo/paperoni/issues/new) if it does crash.
## Installation
### Precompiled binaries
Check the [releases](https://github.com/hipstermojo/paperoni/releases) page for precompiled binaries. Currently there are only builds for Debian and Arch.
### Installing from crates.io
Paperoni is published on [crates.io](https://crates.io). If you have [cargo](https://github.com/rust-lang/cargo) installed, then run:
```sh
cargo install paperoni --version 0.4.1-alpha1
```
_Paperoni is still in alpha, so the `--version` flag has to be passed._
### Building from source
This project uses `async/.await`, so it should be compiled using a minimum Rust version of 1.39. Preferably use the latest version of Rust.
```sh
git clone https://github.com/hipstermojo/paperoni.git
cd paperoni
## You can build and install paperoni locally
cargo install --path .
## or use it from within the project
cargo run -- # pass your url here
```
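For example, when running from within the project you can pass a URL after the `--` separator (using a URL from the usage examples below):

```sh
cargo run -- https://en.wikipedia.org/wiki/Pepperoni
```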
## Usage
```
USAGE:
    paperoni [OPTIONS] [urls]...

OPTIONS:
    -f, --file <file>
            Input file containing links

    -h, --help
            Prints help information

        --log-to-file
            Enables logging of events to a file located in .paperoni/logs with a default log level of debug. Use -v to
            specify the logging level

        --max-conn <max_conn>
            The maximum number of concurrent HTTP connections when downloading articles. Default is 8.
            NOTE: It is advised to use as few connections as needed i.e. between 1 and 50. Using more connections can
            end up overloading your network card with too many concurrent requests.

    -o, --output-dir <output_directory>
            Directory for saving epub documents

        --merge <output_name>
            Merge multiple articles into a single epub that will be given the name provided

    -V, --version
            Prints version information

    -v
            This takes up to 4 levels of verbosity in the following order.
             - Error (-v)
             - Warn (-vv)
             - Info (-vvv)
             - Debug (-vvvv)
            When this flag is passed, it disables the progress bars and logs to stderr.
            If you would like to send the logs to a file (and enable progress bars), pass the log-to-file flag.

ARGS:
    <urls>...
            URLs of web articles
```
To download a single article, pass in its URL:
```sh
paperoni https://en.wikipedia.org/wiki/Pepperoni
```
Paperoni also supports passing multiple links as arguments.
```sh
paperoni https://en.wikipedia.org/wiki/Pepperoni https://en.wikipedia.org/wiki/Salami
```
Alternatively, if you are on a Unix-like OS, you can simply do something like this:
```sh
cat links.txt | xargs paperoni
```
The links can also be read from a file using the `-f/--file` flag:
```sh
paperoni -f links.txt
```
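The other options shown in the usage output can be combined with any of these forms. For example, the following limits Paperoni to 4 concurrent connections and saves the generated EPUBs in a separate directory (`./articles` here is purely an illustrative path):

```sh
paperoni -f links.txt --max-conn 4 -o ./articles
```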
### Merging articles
By default, Paperoni generates an epub file for each link. You can also merge multiple links into a single epub using the `--merge` flag and specifying the output file.
```sh
paperoni -f links.txt --merge out.epub
```
### Logging events
Logging is disabled by default. It can be activated by using either the `-v` flag or the `--log-to-file` flag. If the `--log-to-file` flag is passed, the logs are sent to a file in the default Paperoni directory `.paperoni/logs`, which is in your home directory. The `-v` flag configures the verbosity levels such that:
```
-v Logs only the error level
-vv Logs only the warn level
-vvv Logs only the info level
-vvvv Logs only the debug level
```
If only the `-v` flag is passed, the progress bars are disabled. If both `-v` and `--log-to-file` are passed then the progress bars will still be shown.
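For example, the following logs info-level events to a file under `.paperoni/logs` while keeping the progress bars (flags as documented in the usage output above):

```sh
paperoni -vvv --log-to-file https://en.wikipedia.org/wiki/Pepperoni
```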
## How it works
The URL passed to Paperoni is fetched and the returned HTML response is passed to the extractor.
This extractor retrieves a possible article using a [custom port](https://github.com/hipstermojo/paperoni/blob/master/src/moz_readability/mod.rs) of the [Mozilla Readability algorithm](https://github.com/mozilla/readability). This article is then saved in an EPUB.
> The port of the algorithm is still unstable as well, so it is not fully compatible with all the websites that can be extracted using Readability.
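As a rough sketch, the flow looks like this. The function names and bodies below are hypothetical stand-ins for illustration only; the real implementation is asynchronous and also handles images, metadata and error reporting.

```rust
// Conceptual sketch of the pipeline described above, NOT the actual paperoni source.
// All names and bodies are hypothetical placeholders.

struct Article {
    title: String,
    content_html: String,
}

// Stand-in for the HTTP fetch (the real tool performs an async GET).
fn fetch_html(url: &str) -> Result<String, String> {
    Ok(format!("<html><body><h1>Stub page for {}</h1></body></html>", url))
}

// Stand-in for the custom port of the Readability algorithm.
fn extract_article(html: &str) -> Result<Article, String> {
    Ok(Article {
        title: "Stub title".to_string(),
        content_html: html.to_string(),
    })
}

// Stand-in for serializing the cleaned article into an EPUB file.
fn write_epub(article: &Article, path: &str) -> Result<(), String> {
    println!(
        "would write \"{}\" ({} bytes of HTML) to {}",
        article.title,
        article.content_html.len(),
        path
    );
    Ok(())
}

fn main() -> Result<(), String> {
    let html = fetch_html("https://en.wikipedia.org/wiki/Pepperoni")?;
    let article = extract_article(&html)?;
    write_epub(&article, "Pepperoni.epub")
}
```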
## How it (currently) doesn't work
This program is still in alpha, so a number of things won't work:
- Websites that only run with JavaScript cannot be extracted.
- Website articles that cannot be extracted by Readability cannot be extracted by Paperoni either.
- Code snippets on Medium articles that are lazy loaded will not appear in the EPUB.
There are also web pages it won't work on in general, such as Twitter and Reddit threads.