Index Migration to Solr 8.8.1

Hi,

I had an issue bootstrapping YaCy v.1.925/10082. According to the log, the new Solr version doesn’t accept the “old” Index Data Files from Solr version 7.7.3. Do I need to dump the whole Index via the Export feature and put it in /yacy/DATA/SURROGATES/in on the new YaCy release or is there a smoother way for an Index transition to Solr 8.8.1?

Greetings

LA_FORGE

Yes, that code change was made in a bit of a hurry and should have supported a migration better.
So far there is no migration. I am working on it.

Thank you very much. It would be great if the export and import step weren’t necessary, because the PCIe SSD I’m using for YaCy has “only” 70 petabytes of I/O lifespan (MTBF), and 30 terabytes of I/O per week isn’t a rarity when operating the peer with the largest index in Freeworld :slight_smile: (I’m continuously recrawling very old index entries, too).
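To put those numbers in perspective, here is a small back-of-the-envelope sketch in Java. It assumes the 70 PB figure is the total rated I/O volume of the SSD, that the 30 TB per week rate stays constant, and that decimal units apply (1 PB = 1000 TB). The class and method names are made up for illustration.

```java
public class SsdLifespan {
    /**
     * Weeks until the rated I/O volume is used up at a constant weekly rate.
     * Assumes decimal units: 1 PB = 1000 TB.
     */
    public static double weeksRemaining(double ratedPetabytes,
                                        double usedPetabytes,
                                        double terabytesPerWeek) {
        double remainingTerabytes = (ratedPetabytes - usedPetabytes) * 1000.0;
        return remainingTerabytes / terabytesPerWeek;
    }

    public static void main(String[] args) {
        // Figures from the post: 70 PB rated, 30 TB of I/O per week.
        double weeks = weeksRemaining(70.0, 0.0, 30.0);
        System.out.println("Weeks at 30 TB/week: " + Math.round(weeks));
        System.out.println("Approximate years:   " + Math.round(weeks / 52.0));
    }
}
```

With a fresh drive this comes to roughly 2,300 weeks. Passing the 20 PB of already consumed I/O that is mentioned later in the thread as `usedPetabytes` shortens the estimate accordingly.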

I just committed changes that bring the latest YaCy git version to a state where it can start up again. There were many more dependency and configuration problems with Solr 8, but it works now.

Sad to say that there is currently no automatic migration of indexes… Here is what you must do:

  • use YaCy in version (up to) 1.924 (not latest git), i.e. https://download.yacy.net/yacy_v1.924_20201214_10042.tar.gz
  • Open Index Export/Import
  • click on the “Export” button (leave all other settings at their defaults, export as JSON)
  • let it run and it will create a (large) single dump file in /DATA/EXPORT/

If you loaded a newer version already, don’t worry: the original 6.6 index files are neither touched nor removed. Just move the DATA folder to an older YaCy version and the old index is still there.

I will make sure that this dump can be imported with the Solr 8.8.1 version.
Sorry for the inconvenience, but I searched for Solr-embedded tools and could not find one that migrates from 6.6 to 8.8.
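For the hand-off between the two versions, here is a minimal Java sketch that picks the newest file from the old installation's DATA/EXPORT folder and moves it into DATA/SURROGATES/in of the new one. It assumes the Solr 8.8.1 release will pick up dumps from SURROGATES/in, as suggested in the first post; that is not confirmed yet. The class and method names are hypothetical and not part of YaCy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class MoveExportDump {
    /** Returns the most recently modified regular file in exportDir, if any. */
    public static Optional<Path> newestDump(Path exportDir) throws IOException {
        try (Stream<Path> files = Files.list(exportDir)) {
            return files
                .filter(Files::isRegularFile)
                .max(Comparator.comparingLong((Path p) -> p.toFile().lastModified()));
        }
    }

    /** Moves the dump into surrogatesIn, creating that folder if it is missing. */
    public static Path moveToSurrogates(Path dump, Path surrogatesIn) throws IOException {
        Files.createDirectories(surrogatesIn);
        return Files.move(dump, surrogatesIn.resolve(dump.getFileName()));
    }

    public static void main(String[] args) throws IOException {
        Path oldData = Paths.get(args[0]); // DATA folder of the 1.924 installation
        Path newData = Paths.get(args[1]); // DATA folder of the Solr 8.8.1 installation
        Path exportDir = oldData.resolve("EXPORT");
        // Assumption: the new release imports from DATA/SURROGATES/in.
        Path surrogatesIn = newData.resolve("SURROGATES").resolve("in");

        Optional<Path> dump = newestDump(exportDir);
        if (dump.isPresent()) {
            System.out.println("Moved to " + moveToSurrogates(dump.get(), surrogatesIn));
        } else {
            System.out.println("No dump found in " + exportDir);
        }
    }
}
```

Run it with the two DATA folders as arguments while neither YaCy instance is writing the dump. If both folders are on different disks, the move is carried out as a copy followed by a delete, so the target disk needs room for the whole file.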

Thank you very much for outstanding work!

Sad to say that there is currently no automatic migration of indexes

No problem, I’m using an old SAS-HDD with good average access time for the dump and the restore. Is the JSON export method more powerful than the XML one?

The JSON method is the chosen alternative since I started with YaCy Grid, which also has a JSON dump format with almost the same attribute fields.

I just uploaded https://download.yacy.net/yacy_v1.924_20210209_10069.tar.gz as the “latest 1.924” before the commit to solr 8.8.1 happened - to be used for an export.

Great! Thanks

Cicero:index stefan$ ls -S -lah
total 8001171072
-rw-r--r--    1 user  staff   1.2T Nov 15 16:48 _1nkhb.fdt
-rw-r--r--    1 user  staff   381G Oct 23 06:47 _1nkhb_Lucene50_0.pos
-rw-r--r--    1 user  staff   260G Nov 15 09:41 _1nkhb_Lucene54_0.dvd

The whole Solr data directory is 3.6 TB. Well, that would take a few weeks, I guess :rofl: :rofl: :rofl:

Oh wow, that is not good. Looks like I should make some performance enhancements to the export process. But then they would need to be backported. Hm.

I could try to increase the process priority if you can tell me the filename of the Java class. That boosted performance remarkably in another workflow (recrawling the whole index).

Well, everything happens within net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector, but how do you boost a single class?

Hello Michael,

I still have no clue about programming :frowning: I continue to admire what you are doing.

I mean the thread, of course :slight_smile: I found this line in RecrawlBusyThread.java:

this.setPriority(Thread.MAX_PRIORITY);

Since I set that to MAX, 200 GB of traffic per week while recrawling is no longer a rarity :slight_smile: Can I influence the priority of the index export as well? If so, what is the name of the source file?

Best regards

Stefan

That is actually not a bad idea at all. I have now put exactly this line into AbstractSolrConnector.java at line 371. I don’t know yet whether it helps…
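On the question of how to boost a single class: in Java the priority belongs to a thread, not to a class, so the line only has an effect in code that runs on the thread in question. Below is a minimal sketch of both variants. `ExportWorker` is a hypothetical stand-in and not the actual YaCy export code. Keep in mind that the priority is only a hint to the scheduler, and whether it changes throughput depends on the operating system and JVM.

```java
public class ExportPriorityDemo {
    /** Hypothetical stand-in for a long-running export worker; not a YaCy class. */
    public static class ExportWorker extends Thread {
        public volatile int observedPriority = -1;

        public ExportWorker() {
            super("export-worker");
            // Inside a Thread subclass, "this" is the thread, so the line
            // quoted from RecrawlBusyThread.java can be used as it is.
            this.setPriority(Thread.MAX_PRIORITY);
        }

        @Override
        public void run() {
            // Record the priority the worker actually runs with.
            observedPriority = Thread.currentThread().getPriority();
            // ... the export loop would go here ...
        }
    }

    /** For code that is not a Thread subclass: raise the calling thread's priority. */
    public static int raiseCurrentThreadPriority() {
        Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
        return Thread.currentThread().getPriority();
    }

    public static void main(String[] args) throws InterruptedException {
        ExportWorker worker = new ExportWorker();
        worker.start();
        worker.join();
        System.out.println("Worker ran with priority " + worker.observedPriority);
    }
}
```

The second variant is the one to use when the export runs inside a method of a class that does not extend `Thread`, because `this.setPriority(...)` would not compile there.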

Once I have finished exporting from my “special peer” (Endeavour), which has only 40 million entries (I used it to crawl an open source intelligence platform), I will make the same change in the source code of 1.924/10042 and then use it to export the 206 million documents from the Epistemophilia peer. I have mounted a fast SCSI disk via AFP for the dump because I want to spare the PCIe SSD that YaCy runs on. It has “only” 70 PB until MTBF, and I have certainly burned through 20 PB already because I always have so many crawl jobs and always set the depth to 5 :slight_smile:

So, the export of the 206 million documents is running:

sh-3.2# iostat -d -n 3
          disk0               disk1               disk3
KB/t  tps  MB/s     KB/t  tps  MB/s     KB/t  tps  MB/s
51.05    4  0.21    41.25   27  1.08   900.86   50 43.94

disk3 is the hard disk for the dump. It runs quite fast considering that it is not an SSD. I also made the changes in the code the way you described and recompiled. I am looking forward to the new version with Solr 8.8.1. Thank you very much for everything!