6 commits

Author SHA1 Message Date
Kenneth Gitere
b661211f0f Refactored code to use regexes from regexes module
Extracted constants from the code for easier reusability in some cases.
Change select queries for multiple elements to use the `,` operator
instead of calling `chain`.

Remove check for "null" in `fix_lazy_images`. That check works around a
JSDOM issue, so it is not needed in the Rust code.
2020-10-15 22:45:18 +03:00
Kenneth Gitere
75018894ae Add regexes module in moz_readability that contains the regular
expressions used. For optimal performance, the regular expressions
are compiled to static values to prevent recompiling in loops.
2020-10-15 22:25:10 +03:00
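
The compile-once idea in this commit can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: `CompiledPattern`, `byline_pattern`, `count_matches` and `BUILD_COUNT` are hypothetical names, and a plain substring matcher stands in for a real compiled regular expression so the example needs no external crates. It assumes Rust 1.70 or later for `std::sync::OnceLock`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (pretend) expensive compile step runs.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled regular expression.
struct CompiledPattern {
    needle: &'static str,
}

impl CompiledPattern {
    // Pretend this is expensive, like compiling a regex.
    fn compile(needle: &'static str) -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        CompiledPattern { needle }
    }

    fn is_match(&self, text: &str) -> bool {
        text.contains(self.needle)
    }
}

// The pattern lives in a static and is built on first use only.
fn byline_pattern() -> &'static CompiledPattern {
    static PATTERN: OnceLock<CompiledPattern> = OnceLock::new();
    PATTERN.get_or_init(|| CompiledPattern::compile("byline"))
}

// Calls the accessor inside a loop; the compile step is not repeated.
fn count_matches(inputs: &[&str]) -> usize {
    let mut count = 0;
    for text in inputs {
        if byline_pattern().is_match(text) {
            count += 1;
        }
    }
    count
}

fn main() {
    let classes = ["byline", "author byline", "content"];
    println!("matches: {}", count_matches(&classes));
    println!("compiles: {}", BUILD_COUNT.load(Ordering::SeqCst));
}
```

Keeping the static inside the accessor function means callers cannot reach the pattern before it is initialised, at the cost of one cheap check per call.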
Kenneth Gitere
d2bd31dc47 Add helper functions for the grabArticle function
2020-10-07 20:46:08 +03:00
Kenneth Gitere
7219198524 Change function signature of next_element to return an Option
rather than mutate a given value.

The new function signature reads a little easier than before.
Remove TODO task in replace_brs
2020-09-23 22:52:07 +03:00
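
The signature change described in this commit can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the repository's code: the `Node` enum and the slice of siblings are hypothetical stand-ins for a real HTML tree, and the skipping rule (stop at the first non-whitespace text node) is chosen only for the example. The point is that the result is returned as an `Option` instead of being written into a value supplied by the caller.

```rust
// Hypothetical, simplified stand-in for DOM sibling nodes.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Element(String),
}

/// Returns the first element at or after index `start`, skipping
/// whitespace-only text nodes. Returns `None` if a non-whitespace text
/// node or the end of the siblings is reached first.
fn next_element(siblings: &[Node], start: usize) -> Option<&Node> {
    for node in siblings.iter().skip(start) {
        match node {
            Node::Element(_) => return Some(node),
            Node::Text(text) if text.trim().is_empty() => continue,
            Node::Text(_) => return None,
        }
    }
    None
}

fn main() {
    let siblings = vec![
        Node::Text("  \n".to_string()),
        Node::Element("br".to_string()),
        Node::Text("hello".to_string()),
    ];

    // The caller handles both outcomes explicitly.
    match next_element(&siblings, 0) {
        Some(node) => println!("found {:?}", node),
        None => println!("no element"),
    }
}
```

With this shape the caller can use `match`, `if let` or `?` on the result, and there is no partially updated value to reason about when nothing is found.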
Kenneth Gitere
7fb09130e8 Add calls to remove_scripts and prep_document
2020-08-31 20:40:37 +03:00
Kenneth Gitere
e1debf5630 Add moz_readability initial code and accompanying unit tests
This currently contains the preprocessing code of the Readability.
It is a port of Readability.js by Mozilla.
2020-08-31 19:30:09 +03:00