Cannot import JSON flat dump

LA_FORGE · 20 March 2021 14:34

Hi,

I’m successively migrating all of my YaCy peers to the new release with Solr 8.8.1. I just did a JSON flat dump before migrating to 1.925/10086. After the version upgrade I put the JSON flat dump into /DATA/SURROGATES/in, but the import doesn’t work. After a few seconds the log shows the following:

E 2021/03/20 14:21:13 org.apache.solr.handler.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=-ql5IgPCpqc4] Error adding field 'last_modified'='Sat Dec 02 17:46:10 GMT 2017' msg=Invalid Date String:'Sat Dec 02

The full stack trace is on Pastebin.com, in the paste titled “E 2021/03/20 14:21:13 org.apache.solr.handler.RequestHandlerBase org.apache.solr”.
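For context on the error in the log line above: Solr date fields expect ISO 8601 UTC timestamps such as `2017-12-02T17:46:10Z`, while the rejected value looks like the output of Java’s `Date.toString()`. The following is only a minimal sketch of converting such a value, assuming the zone token is always `GMT` as in the log; the function names `to_solr_date` and `normalize_date_field` are hypothetical and not part of YaCy or Solr.

```python
from datetime import datetime, timezone

# Layout of the rejected value from the log line.
# Assumption: the zone token is always the literal "GMT".
JAVA_DATE_LAYOUT = "%a %b %d %H:%M:%S GMT %Y"


def to_solr_date(value: str) -> str:
    """Convert 'Sat Dec 02 17:46:10 GMT 2017' to an ISO 8601 UTC timestamp."""
    parsed = datetime.strptime(value, JAVA_DATE_LAYOUT).replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def normalize_date_field(doc: dict, field: str = "last_modified") -> dict:
    """Rewrite one date field of a document in place (hypothetical helper)."""
    value = doc.get(field)
    if isinstance(value, str):
        try:
            doc[field] = to_solr_date(value)
        except ValueError:
            pass  # value is not in the Java layout; leave it untouched
    return doc


print(to_solr_date("Sat Dec 02 17:46:10 GMT 2017"))
```

Values that do not match the assumed layout are left unchanged, so running the helper over a dump that already contains ISO timestamps should be harmless.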

A fix would be great.

Greetings

LA_FORGE

sixcooler · 20 March 2021 16:24

Hi,
in order to have something imported via SURROGATES/in you need the full-blown XML export.
(the JSON export is for imports into Elasticsearch)

I’ve done some tuning on the Solr 8.8.1 topic, so check the latest version.

Cu, sixcooler.

LA_FORGE · 20 March 2021 16:49

Hi sixcooler,

thx for the info. OK, I’ll fetch the latest code from the repo.

Thank you very much.

LA_FORGE · 20 March 2021 17:41

I’m sad that the data can’t be imported into a 1.925 YaCy. Is YaCy Grid ready to dock to freeworld right now? Or is it possible to do a “backport” export as an XML dump for the “old” YaCy? I can provide the JSON flat file as soon as the upload is finished.

LA_FORGE · 20 March 2021 21:37

Here is my dump in JSON flat format:

https://archive.org/download/yacy_dump_f197001010100_l202103170000_n202103170846_c000016709862_tc

Orbiter · 21 March 2021 10:32

I will make it possible to do the JSON import, to get compatibility with YaCy Grid.

LA_FORGE · 21 March 2021 11:16

Thank you very much. I’m very glad that the data in the dump I created will soon enrich our freeworld network again. There is some special metadata in the dump that imho is valuable for the community.

e.g. https://archive.li/Ld5ov

Orbiter · 29 March 2021 16:50

I just fixed the import.

However, it is a bit slow because of an enrichment process that can re-annotate synonyms and facets if such things are defined in the importing peer. It is possible to speed up that process, but it needs extra care.

LA_FORGE · 29 March 2021 18:09

Thank you very much

Orbiter · 29 March 2021 23:34

Do not start huge imports right now, I will work on the performance!

LA_FORGE · 30 March 2021 05:34

Ok thx. This is what is theoretically possible:

Orbiter · 30 March 2021 10:08

Now I have added concurrency and removed superfluous tokenization in case no synonyms or semantic tags are defined.
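The two changes described above can be illustrated with a rough sketch. This is not YaCy’s code (YaCy is written in Java); it only shows the general idea of processing documents on a thread pool and skipping tokenization entirely when no synonyms are defined. The field names `text` and `synonyms`, and the functions `tokenize`, `enrich` and `import_docs`, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize(text: str) -> list:
    """Very simple whitespace tokenizer (stand-in for a real one)."""
    return text.lower().split()


def enrich(doc: dict, synonyms: dict) -> dict:
    """Annotate a document with synonyms of the words it contains."""
    if not synonyms:
        # Nothing to annotate, so the tokenization step is skipped entirely.
        return doc
    tokens = tokenize(doc.get("text", ""))
    found = sorted({syn for token in tokens for syn in synonyms.get(token, ())})
    return {**doc, "synonyms": found}


def import_docs(docs: list, synonyms: dict, workers: int = 4) -> list:
    """Enrich documents concurrently; the result keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: enrich(d, synonyms), docs))


docs = [{"text": "Fast car"}, {"text": "slow boat"}]
print(import_docs(docs, {}))                        # documents pass through unchanged
print(import_docs(docs, {"car": ["automobile"]}))   # documents get a synonyms list
```

With an empty synonym table the per-document work drops to almost nothing, which is the case the short-cut is meant for.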

LA_FORGE · 30 March 2021 12:57

Yeah THX

Just cloning our repo now. The benchmark results shown above were made on my PCIe NVMe SSD, which I bought just for YaCy. But some operating systems’ NVMe drivers aren’t very mature yet. I had many kernel panics with it on macOS versions before Catalina. Linux works fine, but I’m not sure which filesystem is the fastest. I’m currently using XFS.