+1 for your project, YaCy’s the future!
The pages that YaCy indexes are often quite “noisy”: a lot of their content does not need to be taken into account.
I read about Mercury Parser, which extracts the bits that humans care about from the chaos of any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more (similar to Readability.js, a standalone version of the readability library used for Firefox Reader View).
Where (in the code) and how (respecting YaCy architecture) can I:
- add/develop a “View as: Reader View” to the “URL Metadata” of YaCy?
- index/parse pages using the Reader View content of pages?