10 commits

Author          SHA1        Date                        Message
Kenneth Gitere  aacb442b7a  2020-10-20 22:27:40 +03:00  Move MetaAttr to moz_readability and rename to MetaData
                                                        Add get_article_metadata, get_article_title and unescape_html_entities and their tests
Kenneth Gitere  6dab011cac  2020-05-16 10:22:49 +03:00  Fixed img resolving bug
Kenneth Gitere  c30d5f732e  2020-05-06 14:01:49 +03:00  Fix test data
Kenneth Gitere  271d3c8951  2020-05-05 12:24:11 +03:00  Change download code to save images to a folder
                                                        Add downloaded images to the output epub file
Kenneth Gitere  f02973157d  2020-05-05 09:40:44 +03:00  Refactor downloading code to download images in parallel
Kenneth Gitere  e5a318282d  2020-05-02 19:06:03 +03:00  Update img tags with new src values to point to the local files
Kenneth Gitere  78ba40f57a  2020-05-02 18:33:45 +03:00  Add image download functionality
Kenneth Gitere  f24e72e70f  2020-05-02 14:51:53 +03:00  Change signature of extract_content to copy the reference to article DOM node instead of writing to file
Kenneth Gitere  529704d227  2020-05-01 20:42:41 +03:00  Add test for extract content
Kenneth Gitere  b5336e078d  2020-05-01 16:17:59 +03:00  Factor out text extraction into extractor module