Kenneth Gitere
350447d1c4
Change calls on replacing regexes to replace_all
...
Add `fix_relative_uris`, `clean_classes`, `clean_readability_attrs`
and `post_process_content`
2020-10-21 19:55:22 +03:00
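Note on 350447d1c4: assuming the project uses the `regex` crate, `Regex::replace` rewrites only the leftmost match, while `Regex::replace_all` rewrites every match. The sketch below shows the same distinction with the standard library's `str::replacen` and `str::replace` as stand-ins. The function names and sample input are illustrative and not taken from the repository.

```rust
/// Replaces only the first double space (analogous to `Regex::replace`).
fn normalize_first(text: &str) -> String {
    text.replacen("  ", " ", 1)
}

/// Replaces every non-overlapping double space (analogous to `Regex::replace_all`).
fn normalize_all(text: &str) -> String {
    text.replace("  ", " ")
}

fn main() {
    let input = "a  b  c";
    // The first call leaves the second double space untouched.
    println!("{:?}", normalize_first(input));
    println!("{:?}", normalize_all(input));
}
```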
Kenneth Gitere
aacb442b7a
Move MetaAttr to moz_readability and rename to MetaData
...
Add get_article_metadata, get_article_title and unescape_html_entities
and their tests
2020-10-20 22:27:40 +03:00
Kenneth Gitere
d99b1c687b
Fix counting of h2 nodes in prep_article
...
Add test for prep_article
2020-10-20 10:13:34 +03:00
Kenneth Gitere
94fa8db218
Fix bug in deletion of multiple nodes.
...
When calling `detach` in a for loop or `for_each` iterator consumer,
only the first node is ever deleted.
Fix replacement of table nodes in prep_article
Edit clean_conditionally to remove unnecessary assignment.
2020-10-20 10:04:12 +03:00
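Note on 94fa8db218: one plausible reading of the bug is that detaching a node clears its own sibling links, so a traversal that is still standing on that node has nowhere to go next. The sketch below reproduces that effect with a hypothetical index-based sibling list. It is a simplified model and not the DOM library's actual implementation. The fix shown is to collect the matching nodes first and detach them afterwards.

```rust
/// Hypothetical sibling node stored in an arena; links are indices.
struct Node {
    tag: &'static str,
    prev: Option<usize>,
    next: Option<usize>,
}

struct Siblings {
    nodes: Vec<Node>,
    head: Option<usize>,
}

impl Siblings {
    fn new(tags: &[&'static str]) -> Self {
        let len = tags.len();
        let nodes = tags
            .iter()
            .enumerate()
            .map(|(i, &tag)| Node {
                tag,
                prev: if i > 0 { Some(i - 1) } else { None },
                next: if i + 1 < len { Some(i + 1) } else { None },
            })
            .collect();
        Siblings { nodes, head: if len > 0 { Some(0) } else { None } }
    }

    /// Unlinks an attached node and clears its own sibling links.
    fn detach(&mut self, id: usize) {
        let prev = self.nodes[id].prev.take();
        let next = self.nodes[id].next.take();
        match prev {
            Some(p) => self.nodes[p].next = next,
            None => self.head = next,
        }
        if let Some(n) = next {
            self.nodes[n].prev = prev;
        }
    }

    /// Tags of the nodes that are still linked, in order.
    fn tags(&self) -> Vec<&'static str> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(id) = cur {
            out.push(self.nodes[id].tag);
            cur = self.nodes[id].next;
        }
        out
    }
}

/// Buggy: detaches during the walk, so the walk ends after the first match.
fn remove_while_walking(list: &mut Siblings, tag: &str) {
    let mut cur = list.head;
    while let Some(id) = cur {
        if list.nodes[id].tag == tag {
            list.detach(id);
        }
        cur = list.nodes[id].next; // `None` once `id` has been detached
    }
}

/// Fixed: snapshot the matching nodes first, then detach each one.
fn remove_after_collecting(list: &mut Siblings, tag: &str) {
    let mut matches = Vec::new();
    let mut cur = list.head;
    while let Some(id) = cur {
        if list.nodes[id].tag == tag {
            matches.push(id);
        }
        cur = list.nodes[id].next;
    }
    for id in matches {
        list.detach(id);
    }
}

fn main() {
    let tags = ["p", "script", "div", "script", "p"];
    let mut buggy = Siblings::new(&tags);
    remove_while_walking(&mut buggy, "script");
    println!("while walking:    {:?}", buggy.tags());

    let mut fixed = Siblings::new(&tags);
    remove_after_collecting(&mut fixed, "script");
    println!("after collecting: {:?}", fixed.tags());
}
```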
Kenneth Gitere
ccdbbb5a16
Add initial implementation of grabArticle
...
Change function signature of setNodeTag to return a NodeRef
Minor fix in clean, clean_headers and clean_conditionally
2020-10-20 07:42:32 +03:00
Kenneth Gitere
3254064c0d
Fix calls to select to return an iterator excluding the original calling node.
...
Edit `next_element` to return either an element node only or an element/text node
2020-10-17 07:13:39 +03:00
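Note on 3254064c0d: a minimal sketch of what a `next_element` helper with an "element only" switch could look like. The `Node` enum, the slice-based sibling list and the `must_be_element` parameter are assumptions made for illustration. The real function operates on DOM node references.

```rust
/// Hypothetical, simplified sibling node.
enum Node {
    Element(&'static str),
    Text(&'static str),
    Comment(&'static str),
}

/// Returns the index of the first qualifying node at or after `start`.
/// Elements always qualify. Non-whitespace text qualifies only when
/// `must_be_element` is false. Comments never qualify.
fn next_element(siblings: &[Node], start: usize, must_be_element: bool) -> Option<usize> {
    for (index, node) in siblings.iter().enumerate().skip(start) {
        let qualifies = match node {
            Node::Element(_) => true,
            Node::Text(text) => !must_be_element && !text.trim().is_empty(),
            Node::Comment(_) => false,
        };
        if qualifies {
            return Some(index);
        }
    }
    None
}

fn main() {
    let siblings = [
        Node::Text("  \n"),
        Node::Text("hello"),
        Node::Comment("ad slot"),
        Node::Element("p"),
    ];
    println!("{:?}", next_element(&siblings, 0, false));
    println!("{:?}", next_element(&siblings, 0, true));
}
```

Returning an `Option` lets the caller tell "no further node" apart from a real result without any sentinel value.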
Kenneth Gitere
6377c01fb3
Add tests for clean_conditionally and fix_lazy_images
...
Minor refactor in `fix_lazy_images`
Fix incorrect boolean expression and bug in element node name comparison
in `clean_conditionally`
2020-10-16 08:03:01 +03:00
Kenneth Gitere
78d6e16618
Add unit tests for clean, clean_styles, clean_headers and `clean_matched_nodes`
...
Add missing function calls in `prep_article`
2020-10-16 08:00:47 +03:00
Kenneth Gitere
b661211f0f
Refactored code to use regexes from regexes module
...
Extracted constants from the code for easier reusability in some cases.
Change select queries for multiple elements to use the `,` operator
instead of calling `chain`.
Remove check for "null" in `fix_lazy_images`. That check mitigates a JSDOM
issue, so removing it doesn't affect the Rust code in any way.
2020-10-15 22:45:18 +03:00
Kenneth Gitere
75018894ae
Add regexes module in moz_readability that contains the regular expressions used.
...
For optimal performance, the regular expressions are compiled to static
values to prevent recompiling in loops
2020-10-15 22:25:10 +03:00
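Note on 75018894ae: the compile-once idea can be sketched with the standard library's `OnceLock`. This is an assumption-laden illustration: `Pattern` is a hypothetical stand-in for a compiled regular expression, `unlikely_candidates` is an illustrative name, and the actual module may well use a different mechanism (for example a lazy-static macro) together with a real regex type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (pretend) expensive compilation runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regular expression.
struct Pattern {
    keywords: Vec<&'static str>,
}

impl Pattern {
    /// "Compiles" a `|`-separated list of lowercase keywords.
    fn compile(alternatives: &'static str) -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        Pattern { keywords: alternatives.split('|').collect() }
    }

    /// Case-insensitive substring match against any keyword.
    fn is_match(&self, haystack: &str) -> bool {
        let haystack = haystack.to_lowercase();
        self.keywords.iter().any(|keyword| haystack.contains(keyword))
    }
}

/// Returns the shared pattern, compiling it on first use only.
fn unlikely_candidates() -> &'static Pattern {
    static PATTERN: OnceLock<Pattern> = OnceLock::new();
    PATTERN.get_or_init(|| Pattern::compile("banner|comment|sidebar"))
}

fn main() {
    let class_names = ["sidebar-left", "article-body", "Comment-list"];
    let mut hits = 0;
    for name in class_names.iter() {
        // Every iteration reuses the same static value.
        if unlikely_candidates().is_match(name) {
            hits += 1;
        }
    }
    println!("{} hits, {} compilation(s)", hits, COMPILE_COUNT.load(Ordering::SeqCst));
}
```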
Kenneth Gitere
d2bd31dc47
Add helper functions for the grabArticle function
2020-10-07 20:46:08 +03:00
Kenneth Gitere
7219198524
Change function signature of next_element to return an Option rather than mutate a given value.
...
The new function signature reads a little easier than before.
Remove TODO task in replace_brs
2020-09-23 22:52:07 +03:00
Kenneth Gitere
7fb09130e8
Add calls to remove_scripts and prep_document
2020-08-31 20:40:37 +03:00
Kenneth Gitere
e1debf5630
Add moz_readability initial code and accompanying unit tests
...
This currently contains the preprocessing code of Readability.
It is a port of Readability.js by Mozilla.
2020-08-31 19:30:09 +03:00