I have a list of 9,000 universities worldwide, all on different domains. The crawl started at up to 3,000 ppm (pages per minute), but dropped to 5–20 ppm after two hours. Crawls started afterwards seem to have to wait, as if they are queued behind it.
Lots of URLs hit my blacklist, which has all the commercial junk on it, like tumblr, facebook, twitter and amazon. I got the impression that many Asian URLs respond fairly slowly, so IMHO simply increasing the number of parallel threads should help. But where is that parameter?
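The "slow hosts, so more threads" reasoning can be made concrete with Little's law: with a fixed number of concurrent fetches, throughput is roughly the number of threads divided by the average response time. A minimal Python sketch, assuming every thread spends its time waiting on network latency and ignoring any per-host delays YaCy applies; the function names and latency values are hypothetical, not YaCy settings:

```python
import math

def expected_ppm(threads: int, avg_latency_s: float) -> float:
    """Pages per minute if `threads` fetches run concurrently and each one
    takes `avg_latency_s` seconds on average (Little's law: L = lambda * W)."""
    return threads * 60.0 / avg_latency_s

def required_threads(target_ppm: float, avg_latency_s: float) -> int:
    """Concurrent fetches needed to sustain `target_ppm` at the given latency."""
    return math.ceil(target_ppm / 60.0 * avg_latency_s)

# Hypothetical latencies: fast hosts at 1 s per page, slow hosts at 10 s per page.
for latency in (1.0, 10.0):
    print(f"{latency:>4} s/page: 50 threads -> {expected_ppm(50, latency):.0f} ppm, "
          f"3000 ppm needs {required_threads(3000, latency)} threads")
```

With 50 threads, a drop from 1 s to 10 s per page would cut throughput from 3,000 to 300 ppm. If the crawl still stalls after adding threads, the remaining queue may be concentrated on a few hosts, in which case per-host politeness delays rather than thread count could be the limit.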
CPU (12 cores) is at 15%, network at 5% and disk at 3%. 30 GB of RAM is allowed, but only 20 GB is used.
There are 3,200,000 URLs in the local crawler queue.
DNS is definitely a suspect: too many lookups of unknown domains lead to lockouts after a few hours. OpenNIC is a good hint, thanks; I will try it. Have a look at DNSJumper, too.
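To check whether DNS really is the bottleneck before switching resolvers, one could time a batch of lookups for the domain list. A rough Python sketch, assuming the system resolver behaves like the one the crawler uses and that a plain `socket.getaddrinfo` call is representative; the function names and the `domains.txt` file are hypothetical:

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

def time_lookup(host, resolver=socket.getaddrinfo):
    """Resolve one host name and return (host, seconds taken, succeeded)."""
    start = time.monotonic()
    try:
        resolver(host, 80)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return host, time.monotonic() - start, ok

def time_lookups(hosts, workers=20, resolver=socket.getaddrinfo):
    """Resolve many hosts in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda h: time_lookup(h, resolver), hosts))

def summarize(results):
    """Count failures and report median and worst lookup time in seconds."""
    times = sorted(t for _, t, _ in results)
    failures = sum(1 for _, _, ok in results if not ok)
    return {
        "count": len(results),
        "failures": failures,
        "median_s": times[len(times) // 2] if times else 0.0,
        "max_s": times[-1] if times else 0.0,
    }

# Example usage (hypothetical file with one domain per line):
# hosts = [l.strip() for l in open("domains.txt") if l.strip()]
# print(summarize(time_lookups(hosts)))
```

Running it once against the default resolver and once against an OpenNIC server would show whether lookup times or failure counts change noticeably.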
I am thinking about a parallel setup: 10 YaCy instances on one machine, with their Solr indexes connected. If that works, I'll increase the number of machines. Unfortunately, the Index Browser no longer works if several Solr instances are connected. Even worse, the results depend on which YaCy instance you use for searching; they are totally different. It looks like search almost always uses the local Solr.