Kenneth Gitere
78d6e16618
Add unit tests for `clean`, `clean_styles`, `clean_headers` and `clean_matched_nodes`
Add missing function calls in `prep_article`
2020-10-16 08:00:47 +03:00
Kenneth Gitere
b661211f0f
Refactored code to use regexes from regexes module
...
Extracted constants from the code for easier reusability in some cases.
Change select queries for multiple elements to use the `,` operator
instead of calling `chain`.
Remove check for "null" in `fix_lazy_images`. The check only mitigates a JSDOM
issue, so it does not affect the Rust code in any way.
2020-10-15 22:45:18 +03:00
Kenneth Gitere
75018894ae
Add regexes module in moz_readability that contains the regular expressions used.
For optimal performance, the regular expressions are compiled to static values
to prevent recompiling in loops.
2020-10-15 22:25:10 +03:00
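The compile-once idea described in 75018894ae can be sketched with the standard library alone. This is a minimal illustration, not the code from the commit: `CompiledPattern`, `byline_pattern`, `count_bylines` and `COMPILE_COUNT` are hypothetical names, the substring matcher merely stands in for a real compiled regular expression, and `std::sync::OnceLock` is an assumed mechanism (the log does not say how the statics are built).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical stand-in for a compiled regular expression.
struct CompiledPattern {
    needle: String,
}

// Counts how often `compile` runs, so the saving is observable.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

impl CompiledPattern {
    // Stands in for an expensive compilation step.
    fn compile(source: &str) -> CompiledPattern {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        CompiledPattern { needle: source.to_lowercase() }
    }

    // Case-insensitive substring match (assumption: a real regex would go here).
    fn is_match(&self, text: &str) -> bool {
        text.to_lowercase().contains(&self.needle)
    }
}

// The pattern is built on first use and then shared by every later call.
fn byline_pattern() -> &'static CompiledPattern {
    static BYLINE: OnceLock<CompiledPattern> = OnceLock::new();
    BYLINE.get_or_init(|| CompiledPattern::compile("byline"))
}

// Calls the accessor inside a loop without recompiling the pattern.
fn count_bylines(class_names: &[&str]) -> usize {
    class_names
        .iter()
        .filter(|name| byline_pattern().is_match(name))
        .count()
}
```

However many times `count_bylines` is called, `compile` runs once per process; compiling inside the loop body would instead pay the cost on every iteration.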
Kenneth Gitere
d2bd31dc47
Add helper functions for the grabArticle function
2020-10-07 20:46:08 +03:00
Kenneth Gitere
7219198524
Change function signature of next_element to return an Option
rather than mutate a given value.
The new function signature reads a little easier than before.
Remove TODO task in replace_brs
2020-09-23 22:52:07 +03:00
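The signature change in 7219198524 can be illustrated with a simplified sketch. The `Node` enum, the slice of siblings and the `start` index are assumptions made for the example (the real code works on a DOM tree), and `next_element_into` is a hypothetical name for the earlier mutate-a-slot style.

```rust
// Hypothetical, simplified stand-in for a DOM node.
#[derive(Debug, PartialEq)]
enum Node {
    Element(String),
    Text(String),
}

// Earlier style (assumed shape): the result is written through a mutable slot.
fn next_element_into<'a>(siblings: &'a [Node], start: usize, out: &mut Option<&'a Node>) {
    *out = siblings
        .iter()
        .skip(start)
        .find(|n| matches!(n, Node::Element(_)));
}

// Newer style: the result is returned, with `None` meaning "no element found".
fn next_element(siblings: &[Node], start: usize) -> Option<&Node> {
    siblings
        .iter()
        .skip(start)
        .find(|n| matches!(n, Node::Element(_)))
}
```

With the `Option` return, callers can write `if let Some(node) = next_element(&nodes, 0)` directly instead of declaring a slot first and inspecting it afterwards.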
Kenneth Gitere
7fb09130e8
Add calls to remove_scripts and prep_document
2020-08-31 20:40:37 +03:00
Kenneth Gitere
e1debf5630
Add moz_readability initial code and accompanying unit tests
...
This currently contains the preprocessing code of Readability.
It is a port of Readability.js by Mozilla.
2020-08-31 19:30:09 +03:00