Kenneth Gitere
|
ab800d0174
|
Fix bug and print the name of the extracted EPUB
The fix skips creating the res directory if it already exists
|
2020-11-23 09:06:13 +03:00 |
|
Kenneth Gitere
|
be48cc1e47
|
Fix alignment in README
Update manifest file
Fix the serialized file to use self-closing tags, since unclosed
tags are invalid XHTML
|
2020-10-22 19:18:18 +03:00 |
|
Kenneth Gitere
|
1b4c4ee658
|
Change CLI option to allow for multiple arguments
Add basic looping in async runtime
|
2020-10-22 15:22:56 +03:00 |
|
Kenneth Gitere
|
db11e78d8c
|
Add template for epub output
Change output format to name file with the title name
Add getters in MetaData
|
2020-10-22 13:55:02 +03:00 |
|
Kenneth Gitere
|
703de7e3bf
|
Merge the readability module with the rest of the extractor
|
2020-10-22 12:12:30 +03:00 |
|
Kenneth Gitere
|
75018894ae
|
Add regexes module in moz_readability that contains the regular
expressions used. For optimal performance, the regular expressions
are compiled once into static values to avoid recompiling them in loops
|
2020-10-15 22:25:10 +03:00 |
|
Kenneth Gitere
|
e1debf5630
|
Add moz_readability initial code and accompanying unit tests
This currently contains only the preprocessing code of Readability.
It is a port of Readability.js by Mozilla.
|
2020-08-31 19:30:09 +03:00 |
|
Kenneth Gitere
|
9f56c58dd9
|
Add simple CLI wrapper
|
2020-05-16 10:09:44 +03:00 |
|
Kenneth Gitere
|
271d3c8951
|
Change download code to save images to a folder
Add downloaded images to the output epub file
|
2020-05-05 12:24:11 +03:00 |
|
Kenneth Gitere
|
4e8812c1ee
|
Add first attempt to save an epub file
|
2020-05-02 19:25:31 +03:00 |
|
Kenneth Gitere
|
e5a318282d
|
Update img tags with new src values to point to the local files
|
2020-05-02 19:06:03 +03:00 |
|
Kenneth Gitere
|
78ba40f57a
|
Add image download functionality
|
2020-05-02 18:33:45 +03:00 |
|
Kenneth Gitere
|
f24e72e70f
|
Change signature of extract_content to copy the reference to the article
DOM node instead of writing to a file
|
2020-05-02 14:51:53 +03:00 |
|
Kenneth Gitere
|
529704d227
|
Add test for extract content
|
2020-05-01 20:42:41 +03:00 |
|
Kenneth Gitere
|
b5336e078d
|
Factor out text extraction into extractor module
|
2020-05-01 16:17:59 +03:00 |
|
Kenneth Gitere
|
4527fb07d9
|
Initial extraction code to get meta information on a blog
|
2020-04-30 11:05:53 +03:00 |
|
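
Commit ab800d0174 describes a fix that avoids creating the res directory when it already exists. The log does not show the actual code, so the following is only a minimal sketch of that kind of guard using the Rust standard library. The function name `ensure_dir` and its boolean return value are hypothetical and are not taken from the repository.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not from the repository): creates `dir` only when it
/// is not already a directory.
/// Returns Ok(true) if the directory was created, Ok(false) if it already existed.
fn ensure_dir(dir: &Path) -> io::Result<bool> {
    if dir.is_dir() {
        // Nothing to do: a second run must not fail on an existing directory.
        return Ok(false);
    }
    // `create_dir` returns an error if the path exists, which is why the
    // check above is needed when the program runs more than once.
    fs::create_dir(dir)?;
    Ok(true)
}
```

Calling `ensure_dir(Path::new("res"))` twice in a row should succeed both times, creating the directory only on the first call. Where the return value is not needed, `std::fs::create_dir_all` is a simpler alternative, since it also succeeds when the directory already exists.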