Salami not included

Paperoni is a CLI tool made in Rust for downloading web articles as EPUBs.

This project is in an alpha release, so it might crash when you use it. If it does, please open an issue on GitHub.

Installation

Precompiled binaries

Check the releases page for precompiled binaries. Currently there are only builds for Debian and Arch.

Installing from crates.io

Paperoni is published on crates.io. If you have cargo installed, then run:

cargo install paperoni --version 0.4.1-alpha1

Paperoni is still in alpha, and cargo does not select pre-release versions by default, so the version flag has to be passed.

Building from source

This project uses async/.await, which was stabilized in Rust 1.39, so it needs at least that version to compile. Preferably use the latest version of Rust.

git clone https://github.com/hipstermojo/paperoni.git
cd paperoni
## You can build and install paperoni locally
cargo install --path .
## or use it from within the project
cargo run -- # pass your url here

Usage

USAGE:
    paperoni [OPTIONS] [urls]...

OPTIONS:
    -f, --file <file>            Input file containing links
    -h, --help                   Prints help information
        --log-to-file            Enables logging of events to a file located in .paperoni/logs with a default log level
                                 of debug. Use -v to specify the logging level
        --max_conn <max_conn>    The maximum number of concurrent HTTP connections when downloading articles. Default is
                                 8
        --merge <output_name>    Merge multiple articles into a single epub
    -V, --version                Prints version information
    -v                           Enables logging of events and set the verbosity level. Use -h to read on its usage

ARGS:
    <urls>...    Urls of web articles

To download a single article, pass in its URL:

paperoni https://en.wikipedia.org/wiki/Pepperoni

Paperoni also supports passing multiple links as arguments.

paperoni https://en.wikipedia.org/wiki/Pepperoni https://en.wikipedia.org/wiki/Salami

Alternatively, on a Unix-like OS, you can pipe the links in with xargs:

cat links.txt | xargs paperoni

Links can also be read from a file using the -f/--file flag.

paperoni -f links.txt
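The layout of the links file is not specified here. As a rough sketch only, assuming one URL per line with blank lines ignored, reading such a file could look like the following. The `parse_links` function is a hypothetical name for illustration and is not taken from Paperoni's source.

```rust
/// Hypothetical helper (not Paperoni's actual code): turns the contents of a
/// links file into a list of URLs.
/// Assumption: one URL per line; surrounding whitespace and blank lines are skipped.
fn parse_links(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    // Inlined here so the sketch runs without touching the filesystem.
    let contents = "https://en.wikipedia.org/wiki/Pepperoni\n\nhttps://en.wikipedia.org/wiki/Salami\n";
    for url in parse_links(contents) {
        println!("{}", url);
    }
}
```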

Merging articles

By default, Paperoni generates an epub file for each link. You can also merge multiple links into a single epub by passing the --merge flag and specifying the output file.

paperoni -f links.txt --merge out.epub

Logging events

Logging is disabled by default. It can be enabled by passing either the -v flag or the --log-to-file flag. If the --log-to-file flag is passed, the logs are written to a file in the default Paperoni directory .paperoni/logs, which is in your home directory. The -v flag configures the verbosity level such that:

-v Logs only the error level
-vv Logs only the warn level
-vvv Logs only the info level
-vvvv Logs only the debug level

If only the -v flag is passed, the progress bars are disabled. If both -v and --log-to-file are passed then the progress bars will still be shown.
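The flag rules above can be summarised in a small sketch. The names `Level`, `log_level` and `show_progress_bars` are hypothetical and are not Paperoni's real code. The sketch also assumes two things the text does not state: that more than four v's behave like -vvvv, and that progress bars are shown when no logging flag is passed at all.

```rust
/// The log levels named in the verbosity list.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
}

/// Hypothetical mapping from the CLI flags to a log level.
/// `verbosity` is the number of times -v was passed.
fn log_level(verbosity: u8, log_to_file: bool) -> Option<Level> {
    match verbosity {
        0 if log_to_file => Some(Level::Debug), // --log-to-file defaults to debug
        0 => None,                              // logging is disabled by default
        1 => Some(Level::Error),
        2 => Some(Level::Warn),
        3 => Some(Level::Info),
        _ => Some(Level::Debug), // assumption: extra v's saturate at debug
    }
}

/// Progress bars are hidden only when -v is used without --log-to-file.
/// Assumption: they are shown when no logging flag is passed.
fn show_progress_bars(verbosity: u8, log_to_file: bool) -> bool {
    verbosity == 0 || log_to_file
}

fn main() {
    for (v, to_file) in [(0, false), (0, true), (2, false), (4, true)] {
        println!(
            "-v count {} log_to_file {}: level {:?}, progress bars {}",
            v,
            to_file,
            log_level(v, to_file),
            show_progress_bars(v, to_file)
        );
    }
}
```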

How it works

The URL passed to Paperoni is fetched and the returned HTML response is passed to the extractor. This extractor retrieves a possible article using a custom port of the Mozilla Readability algorithm. This article is then saved in an EPUB.

The port of the algorithm is still unstable, so it does not yet handle every website that the original Readability can extract.
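To make the fetch, extract, save flow concrete, here is a deliberately naive sketch of the extraction step. It is not the Readability algorithm and not Paperoni's code: as a stand-in, it simply takes whatever sits inside the first `<article>` element. The `Article`, `between` and `extract` names are hypothetical, the HTML is inlined instead of fetched, and the EPUB step is reduced to a print statement.

```rust
/// A simplified article, as an extractor might hand it to an EPUB writer.
#[derive(Debug, PartialEq)]
struct Article {
    title: String,
    body: String,
}

/// Returns the text between the first `<tag ...>` and the following `</tag>`,
/// if both exist. Plain string search, no real HTML parsing.
fn between<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let start = html.find(&open)?;
    let content_start = start + html[start..].find('>')? + 1;
    let end = content_start + html[content_start..].find(&close)?;
    Some(&html[content_start..end])
}

/// Stand-in for the Readability port: returns None when no <article> is found.
fn extract(html: &str) -> Option<Article> {
    let body = between(html, "article")?;
    let title = between(html, "title").unwrap_or("Untitled");
    Some(Article {
        title: title.trim().to_string(),
        body: body.trim().to_string(),
    })
}

fn main() {
    // In the real tool the HTML comes from an HTTP response.
    let html = "<html><head><title>Pepperoni</title></head>\
                <body><nav>menu</nav><article><p>A spicy salami.</p></article></body></html>";
    match extract(html) {
        Some(article) => println!(
            "would write an EPUB titled {:?} ({} bytes of content)",
            article.title,
            article.body.len()
        ),
        None => println!("no article found"),
    }
}
```

Returning an `Option` mirrors the wording "a possible article": the extractor may find nothing worth saving, and the caller has to handle that case.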

How it (currently) doesn't work

This program is still in alpha so a number of things won't work:

  • Websites that only run with JavaScript cannot be extracted.
  • Website articles that cannot be extracted by Readability cannot be extracted by Paperoni either.
  • Code snippets on Medium articles that are lazy loaded will not appear in the EPUB.

Paperoni also does not work on some kinds of pages in general, such as Twitter and Reddit threads.