kemitix/paperoni: I've switched to using https://readeck.org/en/; even though it isn't a CLI, it does produce very nice EPUBs


This repository has been archived on 2024-11-22. You can view files and clone it, but cannot push or open issues or pull requests.

Kenneth Gitere 7e9dcfc2b7 Add custom error types and ignore failed image downloads Using this custom error type, many instances of unwrap are replaced with mapping to errors that are then logged in main.rs. This allows paperoni to stop crashing when downloading articles when the errors are possibly recoverable or should not affect other downloads. This subsequently introduces ignoring the failed image downloads and instead leaving the original URLs intact.		2021-04-17 12:04:06 +03:00
src	Add custom error types and ignore failed image downloads	2021-04-17 12:04:06 +03:00
test_html	Add moz_readability initial code and accompanying unit tests	2020-08-31 19:30:09 +03:00
.gitignore	Merge branch 'master' into dev	2020-05-16 10:35:47 +03:00
Cargo.lock	Add custom error types and ignore failed image downloads	2021-04-17 12:04:06 +03:00
Cargo.toml	Add custom error types and ignore failed image downloads	2021-04-17 12:04:06 +03:00
LICENSE	Initial commit	2020-04-30 08:06:07 +03:00
paperoni-dark.png	Add README	2020-10-22 16:03:57 +03:00
README.md	Update README	2021-02-24 13:27:43 +03:00

README.md

Salami not included

Paperoni is a web article downloader written in Rust. The downloaded articles are then exported as EPUB files.

This project is an alpha release, so it might crash when you use it. If it does, please open an issue on GitHub.

Installation

Precompiled binaries

Check the releases page for precompiled binaries. Currently there are only builds for Debian and Arch.

Installing from crates.io

Paperoni is published on crates.io. If you have cargo installed, then run:

cargo install paperoni --version 0.3.0-alpha1

Paperoni is still in alpha, so the --version flag has to be passed: cargo does not install pre-release versions unless one is requested explicitly.

Building from source

This project uses async/.await, which was stabilized in Rust 1.39, so it needs at least that version to compile. Preferably use the latest stable version of Rust.

git clone https://github.com/hipstermojo/paperoni.git
cd paperoni
## You can build and install paperoni locally
cargo install --path .
## or use it from within the project
cargo run -- # pass your url here

Usage

paperoni https://en.wikipedia.org/wiki/Pepperoni

Paperoni also supports passing multiple links as arguments.

paperoni https://en.wikipedia.org/wiki/Pepperoni https://en.wikipedia.org/wiki/Salami

Alternatively, if you are on a Unix-like OS, you can pipe a list of links to Paperoni through xargs:

cat links.txt | xargs paperoni

Links can also be read from a file using the -f/--file flag.

paperoni -f links.txt
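
As a rough illustration of what reading links from a file involves, the sketch below splits a file's contents into URLs. It is not Paperoni's actual code: the function names `parse_links` and `read_links_file` are hypothetical, and the format (one link per line, blank lines ignored) is an assumption made for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Splits the contents of a links file into individual URLs.
/// Assumption for this sketch: one link per line. Surrounding whitespace
/// is trimmed and blank lines are skipped.
pub fn parse_links(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Reads a links file from disk and returns the URLs it contains.
/// I/O errors are returned to the caller instead of panicking.
pub fn read_links_file<P: AsRef<Path>>(path: P) -> io::Result<Vec<String>> {
    let contents = fs::read_to_string(path)?;
    Ok(parse_links(&contents))
}
```

Keeping the parsing separate from the file access means the parsing rule can be checked on plain strings without touching the disk.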

Merging articles

By default, Paperoni generates one EPUB file per link. To merge multiple links into a single EPUB instead, pass the --merge flag followed by the name of the output file.

paperoni -f links.txt --merge out.epub
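
The difference between the two modes can be sketched as a small planning step that decides which EPUB files get written. Everything here is hypothetical and only meant to illustrate the behaviour described above: the `Article` and `EpubPlan` types, the `plan_outputs` function, and the rule for deriving a file name from a title are assumptions, not Paperoni's implementation.

```rust
/// A downloaded article, reduced to the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Article {
    pub title: String,
    pub body: String,
}

/// One EPUB to be written: its file name and the articles it contains.
#[derive(Debug, PartialEq)]
pub struct EpubPlan {
    pub file_name: String,
    pub chapters: Vec<Article>,
}

/// Hypothetical naming rule: keep alphanumeric characters and replace
/// everything else with an underscore.
fn file_name_for(title: &str) -> String {
    let stem: String = title
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { '_' })
        .collect();
    format!("{}.epub", stem)
}

/// Without a merge target, every article gets its own EPUB.
/// With a merge target, all articles become chapters of a single EPUB.
pub fn plan_outputs(articles: Vec<Article>, merge: Option<&str>) -> Vec<EpubPlan> {
    match merge {
        Some(out) if !articles.is_empty() => vec![EpubPlan {
            file_name: out.to_string(),
            chapters: articles,
        }],
        // Nothing was downloaded, so there is nothing to merge.
        Some(_) => Vec::new(),
        None => articles
            .into_iter()
            .map(|article| EpubPlan {
                file_name: file_name_for(&article.title),
                chapters: vec![article],
            })
            .collect(),
    }
}
```

In this sketch, `plan_outputs(articles, Some("out.epub"))` corresponds to the --merge invocation, and `plan_outputs(articles, None)` to the default behaviour.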

How it works

The URL passed to Paperoni is fetched and the returned HTML response is passed to the extractor. This extractor retrieves a possible article using a port of the Mozilla Readability algorithm. This article is then saved in an EPUB.

The port of the algorithm is itself still unstable, so some websites that Readability can extract may not yet work with Paperoni.
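
The fetch, extract and export stages described above can be sketched as a three-step pipeline. This is a minimal illustration under stated assumptions, not Paperoni's code: all names (`Fetcher`, `StaticFetcher`, `ExtractedArticle`, `PipelineError`, `process`) are hypothetical, the extractor is a naive tag search standing in for the Readability port, and the export step only builds an XHTML chapter string rather than a full EPUB archive. No network access is performed; the fetcher returns canned HTML.

```rust
use std::fmt;

/// Errors a stage can report instead of panicking.
#[derive(Debug, PartialEq)]
pub enum PipelineError {
    Fetch(String),
    Extract(String),
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::Fetch(msg) => write!(f, "fetch failed: {}", msg),
            PipelineError::Extract(msg) => write!(f, "extraction failed: {}", msg),
        }
    }
}

impl std::error::Error for PipelineError {}

/// Stage 1: anything that can turn a URL into an HTML string.
pub trait Fetcher {
    fn fetch(&self, url: &str) -> Result<String, PipelineError>;
}

/// Offline stand-in for an HTTP client: returns the same HTML for any URL.
pub struct StaticFetcher(pub String);

impl Fetcher for StaticFetcher {
    fn fetch(&self, _url: &str) -> Result<String, PipelineError> {
        Ok(self.0.clone())
    }
}

#[derive(Debug, PartialEq)]
pub struct ExtractedArticle {
    pub title: String,
    pub content: String,
}

/// Returns the text between the first `open` and the next `close`, if both exist.
fn between<'a>(html: &'a str, open: &str, close: &str) -> Option<&'a str> {
    let start = html.find(open)? + open.len();
    let end = html[start..].find(close)? + start;
    Some(&html[start..end])
}

/// Stage 2: placeholder extractor. This is NOT the Readability algorithm;
/// it only looks for a literal <article> element and a <title>.
pub fn extract(html: &str) -> Result<ExtractedArticle, PipelineError> {
    let content = between(html, "<article>", "</article>")
        .ok_or_else(|| PipelineError::Extract("no <article> element found".to_string()))?;
    let title = between(html, "<title>", "</title>").unwrap_or("Untitled");
    Ok(ExtractedArticle {
        title: title.trim().to_string(),
        content: content.trim().to_string(),
    })
}

/// Stage 3 stand-in: builds the XHTML for one chapter. A real EPUB is a
/// ZIP archive with additional metadata files, which this sketch omits.
pub fn to_chapter(article: &ExtractedArticle) -> String {
    format!(
        "<html><head><title>{}</title></head><body>{}</body></html>",
        article.title, article.content
    )
}

/// Runs the three stages in order, stopping at the first error.
pub fn process<F: Fetcher>(fetcher: &F, url: &str) -> Result<String, PipelineError> {
    let html = fetcher.fetch(url)?;
    let article = extract(&html)?;
    Ok(to_chapter(&article))
}
```

Putting the fetch stage behind a trait keeps the extractor testable on saved HTML, in the same spirit as the test_html directory in the repository.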

How it (currently) doesn't work

This program is still in alpha so a number of things won't work:

Websites that render their content only with JavaScript cannot be extracted.
Website articles that cannot be extracted by Readability cannot be extracted by Paperoni either.
Code snippets on Medium articles that are lazy loaded will not appear in the EPUB.