Commit graph

12 commits

Author SHA1 Message Date
Kenneth Gitere
db11e78d8c Add template for epub output
Change output format to name file with the title name
Add getters in MetaData
2020-10-22 13:55:02 +03:00
Kenneth Gitere
703de7e3bf Merge the readability module with the rest of the extractor 2020-10-22 12:12:30 +03:00
Kenneth Gitere
aacb442b7a Move MetaAttr to moz_readability and rename to MetaData
Add get_article_metadata, get_article_title and unescape_html_entities and their tests
2020-10-20 22:27:40 +03:00
Kenneth Gitere
6dab011cac Fixed img resolving bug 2020-05-16 10:22:49 +03:00
Kenneth Gitere
c30d5f732e Fix test data 2020-05-06 14:01:49 +03:00
Kenneth Gitere
271d3c8951 Change download code to save images to a folder
Add downloaded images to the output epub file
2020-05-05 12:24:11 +03:00
Kenneth Gitere
f02973157d Refactor downloading code to download images in parallel 2020-05-05 09:40:44 +03:00
Kenneth Gitere
e5a318282d Update img tags with new src values to point to the local files 2020-05-02 19:06:03 +03:00
Kenneth Gitere
78ba40f57a Add image download functionality 2020-05-02 18:33:45 +03:00
Kenneth Gitere
f24e72e70f Change signature of extract_content to copy the reference to article DOM node instead of writing to file
2020-05-02 14:51:53 +03:00
Kenneth Gitere
529704d227 Add test for extract content 2020-05-01 20:42:41 +03:00
Kenneth Gitere
b5336e078d Factor out text extraction into extractor module 2020-05-01 16:17:59 +03:00