I've switched to using https://readeck.org/en/. Even though it isn't a CLI, it produces very nice EPUBs.

Salami not included

Paperoni is a web article downloader written in Rust. The downloaded articles are then exported as EPUB files.

This project is in an alpha release, so it is pretty unstable.

Usage

paperoni https://en.wikipedia.org/wiki/Pepperoni

Paperoni also supports passing multiple links as arguments. If you are on a Unix-like OS, you can simply do something like this:

cat links.txt | xargs paperoni

How it works

The URL passed to Paperoni is fetched, and the returned HTML response is passed to the extractor. The extractor retrieves a possible article using a port of the Mozilla Readability algorithm, and the article is then saved as an EPUB.
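In code, the flow looks roughly like the sketch below. This is only an illustration, not Paperoni's actual source: the Readability step is a placeholder, and surf v2, epub-builder, and async-std are assumed as dependencies.

// A sketch of the fetch -> extract -> EPUB flow described above.
// Not Paperoni's actual code: the Readability step is a stub, and
// surf v2, epub-builder and async-std are assumed dependencies.
use epub_builder::{EpubBuilder, EpubContent, ZipLibrary};
use std::fs::File;

fn main() {
    async_std::task::block_on(async {
        // 1. Fetch the page and read the HTML body.
        let html = surf::get("https://en.wikipedia.org/wiki/Pepperoni")
            .recv_string()
            .await
            .expect("request failed");

        // 2. Extract the readable article. Paperoni runs its own port of
        //    Mozilla Readability here; this placeholder keeps the raw HTML.
        let article_xhtml = html;

        // 3. Bundle the extracted content into an EPUB file.
        let mut output = File::create("article.epub").expect("create file");
        let mut epub = EpubBuilder::new(ZipLibrary::new().expect("zip backend"))
            .expect("EPUB builder");
        epub.metadata("title", "Pepperoni").expect("set metadata")
            .add_content(EpubContent::new("article.xhtml", article_xhtml.as_bytes()))
            .expect("add content");
        epub.generate(&mut output).expect("write EPUB");
    });
}

The real program also downloads any images referenced in the article (into a temporary directory) before the EPUB is assembled.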

The port of the algorithm is itself still unstable, so it is not fully compatible with all the websites that Readability can extract.

How it (currently) doesn't work

This program is still in alpha, so a number of things currently break:

  • Links that redirect will crash the program, as it has no redirect-handling logic.
  • Websites that only render their content with JavaScript cannot be extracted.
  • Articles that Readability cannot extract cannot be extracted by Paperoni either.

Running locally

Precompiled binaries

Check the releases page for precompiled binaries. Currently there are only builds for Debian and Arch.

Building from source

This project uses async/.await, so it must be compiled with Rust 1.39 or later (the release that stabilized async/.await). Preferably use the latest version of Rust.

git clone https://github.com/hipstermojo/paperoni.git
cd paperoni
## You can build and install paperoni locally
cargo install --path .
## or use it from within the project
cargo run -- # pass your url here