Kenneth Gitere
|
db11e78d8c
|
Add template for epub output
Change output format to name file with the title name
Add getters in MetaData
|
2020-10-22 13:55:02 +03:00 |
|
Kenneth Gitere
|
703de7e3bf
|
Merge the readability module with the rest of the extractor
|
2020-10-22 12:12:30 +03:00 |
|
Kenneth Gitere
|
aacb442b7a
|
Move MetaAttr to moz_readability and rename to MetaData
Add get_article_metadata, get_article_title and unescape_html_entities and their tests
|
2020-10-20 22:27:40 +03:00 |
|
Kenneth Gitere
|
6dab011cac
|
Fixed img resolving bug
|
2020-05-16 10:22:49 +03:00 |
|
Kenneth Gitere
|
c30d5f732e
|
Fix test data
|
2020-05-06 14:01:49 +03:00 |
|
Kenneth Gitere
|
271d3c8951
|
Change download code to save images to a folder
Add downloaded images to the output epub file
|
2020-05-05 12:24:11 +03:00 |
|
Kenneth Gitere
|
f02973157d
|
Refactor downloading code to download images in parallel
|
2020-05-05 09:40:44 +03:00 |
|
Kenneth Gitere
|
e5a318282d
|
Update img tags with new src values to point to the local files
|
2020-05-02 19:06:03 +03:00 |
|
Kenneth Gitere
|
78ba40f57a
|
Add image download functionality
|
2020-05-02 18:33:45 +03:00 |
|
Kenneth Gitere
|
f24e72e70f
|
Change signature of extract_content to copy the reference to article DOM node instead of writing to file
|
2020-05-02 14:51:53 +03:00 |
|
Kenneth Gitere
|
529704d227
|
Add test for extract content
|
2020-05-01 20:42:41 +03:00 |
|
Kenneth Gitere
|
b5336e078d
|
Factor out text extraction into extractor module
|
2020-05-01 16:17:59 +03:00 |
|