Cannot import JSON flat dump

Hi,

I’m successively migrating all of my YaCy peers to the new release with Solr 8.8.1. I made a JSON flat dump just before migrating to 1.925/10086. After the version upgrade I put the JSON flat dump into /DATA/SURROGATES/in, but the import doesn’t work. After a few seconds the log shows the following:

E 2021/03/20 14:21:13 org.apache.solr.handler.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=-ql5IgPCpqc4] Error adding field 'last_modified'='Sat Dec 02 17:46:10 GMT 2017' msg=Invalid Date String:'Sat Dec 02

The full stack trace is on Pastebin: "E 2021/03/20 14:21:13 org.apache.solr.handler.RequestHandlerBase org.apache.solr - Pastebin.com"
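For readers hitting the same message: Solr date fields expect ISO 8601 timestamps in UTC (for example `2017-12-02T17:46:10Z`), while the rejected value looks like the output of Java's `Date.toString()`. The following is a minimal sketch of converting one format to the other. It is not the fix that went into YaCy; the function name `java_date_to_solr` is made up for illustration, and it assumes the zone token is always GMT or UTC and that month and weekday names are English.

```python
from datetime import datetime, timezone


def java_date_to_solr(value: str) -> str:
    """Convert 'Sat Dec 02 17:46:10 GMT 2017' to '2017-12-02T17:46:10Z'.

    Hypothetical helper: only GMT/UTC zone tokens are accepted.
    """
    parts = value.split()
    if len(parts) != 6 or parts[4] not in ("GMT", "UTC"):
        raise ValueError(f"unsupported date string: {value!r}")
    # Drop the zone token and parse the remaining fields.
    # %a and %b rely on English day and month names (the default C locale).
    without_zone = " ".join(parts[:4] + parts[5:])
    parsed = datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")
    # The input was already in GMT, so attach UTC and emit Solr's format.
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(java_date_to_solr("Sat Dec 02 17:46:10 GMT 2017"))  # 2017-12-02T17:46:10Z
```

Rejecting unknown zone tokens instead of guessing keeps the sketch from silently shifting timestamps.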

A fix would be great

Greetings

LA_FORGE

Hi,
To import something via SURROGATES/in you need the full-blown XML export.
(The JSON export is meant for import into Elasticsearch.)

I’ve done some tuning on the Solr 8.8.1 topic, so check the latest version.

Cu, sixcooler.


Hi sixcooler,

thx for the info. Ok, I’ll fetch the latest code at the repo.

Thank you very much.

I’m sad that the data can’t be imported into a 1.925 YaCy. Is YaCy Grid ready to dock to freeworld right now? Or is it possible to do a “backport” export as an XML dump for the “old” YaCy? I can provide the JSON flat file as soon as the upload is finished.

Here is my dump in JSON flat format:

https://archive.org/download/yacy_dump_f197001010100_l202103170000_n202103170846_c000016709862_tc


I will make the JSON import possible, to get compatibility with YaCy Grid.


Thank you very much. I’m very glad that the data in the dump I created will soon enrich our freeworld network again. There is some special metadata in the dump that, IMHO, is valuable for the community.

e.g. https://archive.li/Ld5ov

I just fixed the import.

However, it runs a bit slowly because of an enrichment process that can re-annotate synonyms and facets if the importing peer defines them. It is possible to speed up that process, but it needs extra care.


Thank you very much

Do not start huge imports right now, I will work on the performance!

Ok thx. This is what is theoretically possible:

I have now added concurrency and removed superfluous tokenization when no synonyms or semantic tags are defined.
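To illustrate the idea only (this is not the actual YaCy code, which is written in Java), here is a rough sketch of those two optimizations: documents are enriched concurrently, and tokenization is skipped entirely when no synonyms are defined. The function names, the synonym table layout, and the field names `text_t` and `synonyms_sxt` are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(doc: dict, synonyms: dict) -> dict:
    """Hypothetical enrichment step: annotate a document with synonyms."""
    if not synonyms:
        # Nothing to annotate, so skip tokenization and return the doc as-is.
        return doc
    tokens = doc.get("text_t", "").lower().split()
    found = sorted({s for t in tokens for s in synonyms.get(t, [])})
    # Return a copy with the extra field instead of mutating the input.
    return {**doc, "synonyms_sxt": found}


def import_docs(docs: list, synonyms: dict, workers: int = 4) -> list:
    """Enrich documents on a thread pool; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: enrich(d, synonyms), docs))


docs = [{"text_t": "fast car"}, {"text_t": "slow train"}]
print(import_docs(docs, {"car": ["automobile"]}))
```

In Python a thread pool shows the structure but gives little speedup for CPU-bound tokenization; the early return for an empty synonym table is the part that carries over directly.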


Yeah THX

Just cloning our repo now. The benchmark results shown above were made on my PCIe NVMe SSD, which I acquired only for YaCy. But some operating systems’ NVMe drivers aren’t very mature yet. I had many kernel panics with it on macOS versions before Catalina. Linux works fine, but I’m not sure which filesystem is the fastest. I’m currently using XFS.