Database full of website URLs

jjdesign · 23 October 2021 05:31

Hi, can you force YaCy to access a URL database of 25,000 websites that have been curated?

Orbiter · 24 October 2021 09:09

Easy: just convert those 25,000 websites to a text-based URL list, one URL per line, and paste that list into the Crawl Start URL window. I tried this just a few weeks ago.
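If the curated list lives in a database, a short script can produce that one-URL-per-line file. The sketch below is only an illustration and assumes a SQLite database with a hypothetical table `websites` holding a `url` column. The function name `export_url_list` and the file names are also made up, so adapt the query to your own schema. It trims whitespace and drops blanks and duplicates before the list is pasted into Crawl Start.

```python
import sqlite3
from pathlib import Path


def export_url_list(conn: sqlite3.Connection, out_path: str) -> int:
    """Write every URL from the (hypothetical) `websites` table to a plain-text
    file, one URL per line, for pasting into the Crawl Start URL window.

    Returns the number of URLs written.
    """
    seen = set()
    urls = []
    # Assumed schema (hypothetical): CREATE TABLE websites (url TEXT)
    for (url,) in conn.execute("SELECT url FROM websites ORDER BY url"):
        if url is None:
            continue
        url = url.strip()
        # Skip empty entries and duplicates so the crawl list stays clean
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    # One URL per line, with a trailing newline when the list is non-empty
    text = "\n".join(urls) + ("\n" if urls else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(urls)
```

Usage would look like `export_url_list(sqlite3.connect("curated_sites.db"), "crawl_start_urls.txt")`; then open the text file and paste its contents into the Crawl Start URL field.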

It will take about one hour for YaCy to ingest that huge batch of URLs in the Crawl Start, but it works.

You can adjust the crawl start with some proper or crazy settings: for example (proper) crawl depth = 0 to index only the given URLs, or a greater depth if you want a full crawl of each of those 25,000 URLs.

nhaas · 27 October 2021 05:16

Let us know how long it takes. I am interested.

jjdesign · 27 October 2021 12:27

Do you know if you can push URLs into the URL window from a database and update the pasted crawl list, i.e. remove and add entries?

Thank you