We just discovered YaCy as an alternative to OpenSearchServer. We want to use it to provide a portal search on our website https://www.ffh.de
YaCy is already installed and running. The crawler finds pages and adds them to the index, but not all subpages seem to be crawled, even though the crawling depth is set to 4, which should be high enough.
For example, the subpage "Aktuelle Nachrichten & Sport aus Hessen – FFH.de" is not crawled, even though it is linked in the main menu on every page. The regex matches it too.
What could be the reason for this? The log shows no details about this URL.
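To rule out the obvious causes outside of YaCy, a check along these lines could help. It is only a minimal sketch: the `diagnose` function, the example URLs, and the `yacybot` user-agent token are assumptions for illustration, and it assumes the crawl filter has to match the whole URL rather than just a part of it, which may not be how the filter is actually evaluated.

```python
import re
from urllib.robotparser import RobotFileParser


def diagnose(url, must_match, robots_txt, user_agent="yacybot"):
    """Report basic reasons why a crawler might skip `url`.

    `must_match` is the crawl filter regex, `robots_txt` the raw text of
    the site's robots.txt. The user agent token is an assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        # True only if the regex covers the entire URL.
        "regex_full_match": re.fullmatch(must_match, url) is not None,
        # True if the regex matches anywhere inside the URL.
        "regex_partial_match": re.search(must_match, url) is not None,
        # True if robots.txt allows this user agent to fetch the URL.
        "robots_allowed": parser.can_fetch(user_agent, url),
    }


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(diagnose("https://www.example.org/news/a.html",
                   r".*example\.org/news/.*", robots))
```

If the partial match succeeds but the full match fails, the regex probably needs a leading and trailing `.*`.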
An additional question: How does YaCy handle canonical URLs? If more than one page has the same canonical URL, does it add only the main page to the index?
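To see which of our pages share a canonical URL in the first place, something like the following could be used. This is a rough sketch using only the Python standard library; `CanonicalFinder` and `find_canonical` are hypothetical helper names, and it says nothing about what YaCy itself does with the value.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "canonical nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = '<html><head><link rel="canonical" href="https://www.example.org/a"></head></html>'
    print(find_canonical(page))
```

Running this over the pages that are missing from the index would show whether they all point to the same canonical URL.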