Commit graph

65 commits

Author SHA1 Message Date
Kenneth Gitere
003953332f Refactor downloading of HTML pages
This change allows for parallel downloads of HTML pages upto a maximum
number of concurrent HTTP requests which is more efficient than
before where all HTTP requests are likely to begin at the same time.
2021-02-06 17:06:03 +03:00
Kenneth Gitere
b98c0a69a6 Bump version 2021-01-24 17:54:33 +03:00
Kenneth Gitere
8407c613df Bug fixes
- Prevent downloading images with base64 strings as the source
- Add escaping of quotation characters in the serializer
- Disable redirects when downloading images which fails on multiple sites
- Remove invalid characters for making the epub export file name
- Fix version number in release
2020-12-24 14:03:36 +03:00
Kenneth Gitere
3bfa82ba60 Update README and version 2020-11-24 18:39:51 +03:00
Kenneth Gitere
37cb4e1fd2 Change from structopt to clap
This allows printing the help message if no args are passed
2020-11-24 09:58:50 +03:00
Kenneth Gitere
aff4054ca9 Update crates and fix bugs
The bug fixes are for:
- <base> elements with "/" as the href
- articles containing an ampersand in the title which would create
  corrupted manifest files.
2020-11-23 15:55:58 +03:00
Kenneth Gitere
ef3efdba81 Refactor to use temp directory and update surf
Change from using res directory for image downloads to using temp directories.
Update surf to v2 which required changing the way Content-Type headers are
read from.
2020-11-23 13:38:58 +03:00
Kenneth Gitere
b0e402d685 Resize logo 2020-10-24 08:20:47 +03:00
Kenneth Gitere
be48cc1e47 Fix alignment in README
Update manifest file
Add fix in serialized file to have self closing tags which is invalid
xhtml
2020-10-22 19:18:18 +03:00
Kenneth Gitere
87ff21b676 Add regex and lazy_static crates 2020-10-07 20:44:35 +03:00
Kenneth Gitere
e1debf5630 Add moz_readability initial code and accompanying unit tests
This currently contains the preprocessing code of the Readability.
It is a port of Readability.js by Mozilla.
2020-08-31 19:30:09 +03:00
Kenneth Gitere
9f56c58dd9 Add simple CLI wrapper 2020-05-16 10:09:44 +03:00
Kenneth Gitere
4e8812c1ee Add first attempt to save an epub file 2020-05-02 19:25:31 +03:00
Kenneth Gitere
78ba40f57a Add image download functionality 2020-05-02 18:33:45 +03:00
Kenneth Gitere
4527fb07d9 Initial extraction code to get meta information on a blog 2020-04-30 11:05:53 +03:00