I’m a new YaCy user and I have two questions about using it:
I’ve crawled and indexed a few sites I’m most interested in, but when I search them, the peer-to-peer search sometimes completely ignores my local index (local peer 0) and returns seemingly random results that are usually irrelevant to the original query string. Is there any way to make the search always consult the local peer as well?
The sites I’m most interested in are quite large, so the question is how to make the crawler tolerably fast while keeping the crawl feasible. I guess that at tens of pages per minute one can’t reasonably index the whole of Wikipedia (as an example). When I increase the crawler speed I can reach around 1-2k pages per minute, but then I wonder whether some web server will kick me out as a robot. Can the crawler detect such a kick-out and then limit its speed for that particular server? I’m also curious whether remote peers can somehow be used to speed up crawling. I’ve enabled remote crawling on my side, but so far I have not figured out how to initiate a remote crawl. Also, does “remote crawling” mean distributed crawling? Last question: does the crawler rotate user-agent strings to better “confuse” the web server?
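To make the kick-out question concrete, here is a rough sketch of the behaviour I have in mind. It is not YaCy code and says nothing about how YaCy actually works; the class name `PerHostThrottle`, the choice of HTTP 429 and 503 as "slow down" signals, and the delay values are all my own assumptions for illustration.

```python
from collections import defaultdict

# Assumption: these HTTP status codes mean "you are crawling too fast".
THROTTLE_STATUSES = {429, 503}


class PerHostThrottle:
    """Hypothetical helper: keeps a separate request delay for every host."""

    def __init__(self, base_delay=0.05, max_delay=60.0, factor=2.0):
        self.base_delay = base_delay  # seconds between requests when all is well
        self.max_delay = max_delay    # never wait longer than this
        self.factor = factor          # multiply/divide the delay by this amount
        self._delay = defaultdict(lambda: base_delay)

    def delay_for(self, host):
        """Return the current delay (in seconds) for one host."""
        return self._delay[host]

    def record(self, host, status):
        """Update the delay for a host after a response and return the new delay."""
        if status in THROTTLE_STATUSES:
            # The server pushed back: slow down for this host only.
            self._delay[host] = min(self._delay[host] * self.factor, self.max_delay)
        else:
            # Normal response: recover gradually towards the base delay.
            self._delay[host] = max(self._delay[host] / self.factor, self.base_delay)
        return self._delay[host]
```

The idea is that only the server that complained gets slowed down, while crawling of all other hosts continues at full speed.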