YaCy web interface hangs after some time

Dear all,
My YaCy installation on Windows 10 works for a while (the web interface), but after some idle time it just goes into an endless loop. The debug window (I started YaCy in debug mode) shows that it keeps crawling without any issue, though. Your help is highly appreciated; I’m really worried. :unamused:

My version: 1.922/9966

Thanks
Vamsi


My primary instance on a FreeBSD-based installation stops suddenly, too.

The log does not show any useful information. On my side, though, the complete process stops, without any log entry explaining why.

I’m using version 1.922/9964.


That’s completely normal with JAVA software.
Restarting regularly helps.


I solved it with a cron job that handles this. It works, but it’s still annoying… In the end there is always a short downtime, and during it a gap in the network :confused:
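A cron job can restart YaCy unconditionally, but it could also restart only when the web interface stops answering. The following is a hypothetical health check, not part of YaCy, written for Java 11+ (`java.net.http`). It assumes the web interface listens on `http://localhost:8090/` (YaCy’s usual default port; adjust to your setup) and treats any HTTP response within 30 seconds as healthy. The restart itself (for example calling stopYACY.sh / startYACY.sh, or killing the process when stopYACY.sh cannot reach a hung GUI) is left to the cron wrapper and is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Hypothetical health check: reports whether the web interface answers
 * within a timeout, so a cron job can restart YaCy only when it hangs.
 */
class UiWatchdog {

    /** Returns true if the URI answers with any HTTP status within the timeout. */
    static boolean isResponsive(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(timeout)          // limits the TCP connect phase
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout)                 // limits waiting for a response, which a hung UI never sends
                .GET()
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
            return true;                          // any status code counts as "alive"
        } catch (IOException e) {
            // Connection refused, connect timeout, or HttpTimeoutException (a subclass of IOException)
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: default port 8090 on localhost; pass another URL as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:8090/");
        boolean ok = isResponsive(uri, Duration.ofSeconds(30));
        System.out.println(ok ? "UI responsive" : "UI not responding");
        // A non-zero exit status lets the cron wrapper decide to restart YaCy.
        System.exit(ok ? 0 : 1);
    }
}
```

Run from cron with, for example, `java UiWatchdog.java` (single-file launch, Java 11+) and restart only on a non-zero exit status. Counting any status code as healthy is a deliberate choice: the problem described in this thread is a UI that never answers, not one that answers with an error.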

I came to this forum to report the very same issue. I definitely disagree with the statement that hanging is normal for any software, Java software included. The issue with Java software tends to be that it is often enterprise software. Enterprise software tends to be built for deadlines, not quality. For historical reasons, Java tends to be the technology for creating enterprise software, which might explain the impression that all Java software is of low quality and has problems. Another aspect of anything with the word “Enterprise” in it is that, unlike freelancers, home users, and small businesses, enterprises have a lot of money to waste, and fast delivery is more important than efficient use of money. Optimized, efficient software takes longer to develop, so it makes sense to leave things unoptimized and just buy a lot of hardware with plenty of CPU power and RAM. That may also explain the typical “good” software development practice of first writing really low-quality, crappy software and then trying to gradually turn shit into gold by rewriting parts of software with a dirty, hackish, flawed architecture, hoping to fix the speed, RAM, and other resource consumption issues that way, instead of taking far longer to deliver the very first version but delivering one that is reasonably optimized from the very start. Well, enterprises have a lot of money to burn, and the software companies that do that kind of contract work certainly do not complain when a client offers a lot of money for an additional project.

I find the YaCy project very inspirational and, from a functionality point of view, a step in the right direction. However, my observation is that the GUI problems seem to start after the YaCy instance has indexed some pages, and that they do not occur if the YaCy instance’s Java VM has a lot of RAM allocated to it. In my case, with 600MiB of RAM allocated to the JVM, YaCy keeps running but without a usable GUI, while with about 1.2GiB the GUI kind of works. I also noticed that the original authors of the YaCy project seem to run a consultancy business where they use YaCy as their core technology. That may explain why they might not test YaCy in low-RAM scenarios. After all, businesses have the money to just buy computers with a lot of RAM, and speed and comfort have a higher priority than RAM consumption. Unfortunately, the business use case differs substantially from the hacktivist use case. In my opinion, it is perfectly OK for a P2P search engine to crawl at a speed of 1 HTML page per 5 s or even 10 s and to keep a ~5GiB index on a USB stick connected to a single-core, 700MHz Raspberry Pi with 512MiB of RAM and a USB storage read/write speed of about 1MB/s. It seems to me that YaCy is totally unoptimized for that kind of use case.

I mean, think of 512MiB of RAM and a single 700MHz CPU core as something far more powerful than desktop computers were in the late 1990s. Even if the Java VM takes about 200MiB and needs some warm-up, it should be possible to run a pretty decent P2P node on such a machine, provided that it runs in server mode, id est the GUI/X11 of the Raspberry Pi has been switched off and not started. Excluding video and sound files, a well-crafted company website made of proper static pages, with maybe a JavaScript-based menu and none of the JavaScript framework bloat, is about 300MiB, most of it photos. If one Raspberry Pi indexes just a few such sites, maybe the home page of a small business owner, then those Raspberry Pis can already offer a lot of data to the P2P search engine network. Given that people tend to have old Android phones that they do not use and that have at least 256MiB of RAM, maybe at some point old Android phones could just lie next to the home WiFi router and run a small P2P search engine node. That is basically “free” hardware that consumes relatively little electrical power and has more computational power than 1990s desktop computers. The Android phones might be used only as servers, id est there is no need to create an elaborate native mobile app GUI for them. A very primitive GUI for setting up the administrator username, password, and port number(s) might do.

I remember that one of the texts on the YaCy command-line console kindly asked for feedback on how YaCy could be updated to fit the end-user use case better. I’m not sure it makes sense to start rewriting the current YaCy implementation, but it certainly makes a lot of sense to CREATE AN API THAT CAN BE USED BY OTHER P2P SEARCH ENGINES, so that different P2P search engines can use the same P2P search engine network and YaCy could also use the indexes of those other P2P search engines.

The idea of defining a common P2P interface for different P2P software projects is not new. For example, there are multiple BitTorrent clients that all use the same protocol and can therefore exchange files, despite being written in different programming languages. Another concrete example is the Fediverse of P2P social network software, where different P2P social network applications support multiple P2P social networking protocols and form a common, huge network. The details of that might be found at

So, all in all, I think YaCy is IDEOLOGICALLY a very nice project, but its implementation is absolutely ill-suited for non-enterprise use cases. Still, it’s a nice first step, and if it is possible to agree on a P2P protocol, then things can start moving :smiley:

Thank you for reading my comment.


After more than 2 years with YaCy now, and more than 15 years with all kinds of search engines, including one I developed myself, I can say the following:

YaCy is vastly superior to all the open-source search engine software I have had my hands on so far.

I also had performance problems and crashes at first, probably because my hardware was too small. (8 GB of RAM and a fast disk (at least SATA 6 Gb/s) are the minimum.) Now I have no more hangs. Maybe it is also due to the latest version and the “out-of-the-box” setup, which runs veeery stably.

One difficulty I see is that the freeworld index is heavily biased in one direction. This probably lies in the nature of P2P and of the many opportunists who like to take part but bring no start URLs of their own, let alone crawl sensibly.

I am now setting up several machines with several hundred GB of RAM and several TB of storage running YaCy, to lay a possible foundation for a real alternative to G and co.

As for JAVA… tastes differ. In music, too, every genre has some genuinely good stuff, just as classical music has its crap.

For my taste, JAVA is surpassed in ugliness of code only by JavaScript.
Unfortunately, JAVA has a few cumbersome design flaws, but then again Microsoft is the market leader too…

Marketing is everything :wink: and long live YaCy!!

I found that the issue causing the main interface to hang was the JVM memory allocation. The default setting was 600MB, which is not nearly enough. With 8GB it ran fairly reliably, except when I tried to start multiple crawls. Now the main interface works, but the admin interface hangs.

It was a memory issue, but to make it reliable I had to increase the JVM allocation alone to 24GB of RAM; 8GB was not enough. The machine has 48GB now. And 1TB filled up very fast, so I’m kind of stuck for now until this coronavirus gets under control; then I’ll buy a couple of 16TB drives and make a larger partition for it.
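When raising the JVM allocation like this, it helps to confirm which heap limit a JVM actually received. The standard `Runtime` API reports it; the small program below is a generic Java sketch, not part of YaCy, so run it with the same `-Xmx` value you give YaCy to see what limit that value produces.

```java
/**
 * Generic sketch (not YaCy code): prints the heap limits the running JVM
 * actually received, e.g. to confirm that a larger -Xmx value was picked up.
 */
class HeapReport {

    static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole MiB, rounding down. */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // upper bound from -Xmx (or the JVM's default)
        long committed = rt.totalMemory();  // heap currently reserved from the OS
        long used = committed - rt.freeMemory();
        System.out.printf("max heap: %d MiB, committed: %d MiB, used: %d MiB%n",
                toMiB(max), toMiB(committed), toMiB(used));
    }
}
```

For example, `java -Xmx8g HeapReport.java` should report a max heap close to 8192 MiB; depending on the garbage collector, `maxMemory()` can come out slightly below the `-Xmx` value.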

It’s just a UI thing; the process keeps running in the background. There are no issues at all here.

@martin_vahi
Yeah, it takes a lot of resources and is not optimized for small, regular users.
It would be better as a federated, decentralized solution, but maybe that means rebuilding everything from scratch with more modern tools.

Everyone just uses meta search engines right now, but there are a lot of open-source web crawlers waiting to join another kind of project.
I also found this https://gitlab.com/infinitysearch/infinity-search
We’ll see how that goes.

Could you tell us more about the UI process, please?

There is an interesting option in the memory settings (RAM):

Memory state: correct
Requires a minimum of 50 MiB of free space. Disable DHT-in below.

Periodically, the system reaches the maximum memory consumption and possibly disables DHT.

But I’m wondering: how is the web interface connection implemented? Not via DHT, but with Jetty?

Thus, YaCy continues to work but becomes unavailable via the web interface.

It is also interesting that the JVM always takes the maximum and always reaches the maximum RAM of the physical server… no matter how much it is given. So you give it memory, it takes all of it, disables DHT, and blocks the web interface…

And it blocks the web interface after the user has not accessed it for a while…
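A rule like “requires a minimum of 50 MiB of free space, disable DHT-in below” can be modelled with the same `Runtime` figures. The sketch below is hypothetical and is not YaCy’s actual code: it treats “free space” as the heap that can still be allocated, i.e. the `-Xmx` limit minus the heap currently in use. One plausible reason such a check trips periodically is that the JVM grows its heap toward the limit and only reclaims unreachable objects when the garbage collector runs, so “used” here also counts garbage that has not been collected yet.

```java
/**
 * Hypothetical model of a "minimum free memory, otherwise disable DHT-in"
 * rule as shown on the memory settings page. Not YaCy's actual code.
 */
class DhtInGuard {

    static final long MIB = 1024L * 1024L;

    /** Heap that can still be allocated: the max heap minus what is currently in use. */
    static long availableBytes(long maxMemory, long totalMemory, long freeMemory) {
        long used = totalMemory - freeMemory; // includes garbage not yet collected
        return maxMemory - used;
    }

    /** True if the available heap has dropped below the configured minimum (e.g. 50 MiB). */
    static boolean shouldDisableDhtIn(long maxMemory, long totalMemory, long freeMemory, long minFreeMiB) {
        return availableBytes(maxMemory, totalMemory, freeMemory) < minFreeMiB * MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        boolean disable = shouldDisableDhtIn(rt.maxMemory(), rt.totalMemory(), rt.freeMemory(), 50);
        System.out.println(disable ? "below threshold: DHT-in would be disabled" : "enough headroom");
    }
}
```

For example, with a 600 MiB heap that is fully committed and has only 40 MiB free, `shouldDisableDhtIn` returns true; with 300 MiB committed and 100 MiB of that free, 400 MiB remain available and it returns false.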

What do you think about it?

I run YaCy under FreeBSD now: 96GB of RAM, 3 instances with 30GB each.
The hanging GUI is a pain in the ass.

I had to hack startYACY.sh to make it run under FreeBSD at all. stopYACY.sh only works if the GUI does not hang.

Does anyone have a script to run YaCy under FreeBSD as a service?

I run one instance in “Robinson Mode” to crawl a defined list of domains (~1 million), and the GUI comes up but hangs after a few minutes of crawling. Java is at ~700% CPU. Maybe this is the reason for the hanging? When I start the crawl job (1 level deep, max. 100 links per domain), it first reads the robots.txt files. No clue why, because I unchecked the option to obey them. Then the domains themselves are crawled. When I reboot the machine after a while, once the GUI has fucked up, it continues to crawl the links at level 1. Then it runs like hell, and the GUI hangs again after a few minutes. Not funny.

Sorry for all the inconvenience. I am working on this, but it looks like a hard problem because there are not enough hints to identify the cause. I’m on it.

What I have done so far:

  • set up several old versions and ran them to see whether the problem occurs only in recent versions
  • added some configuration to the startup of the HTTP server process to provide more servlet threads
  • added a forced garbage collection every 10 minutes; maybe this somehow helps take load off the host
  • added automated storage of thread dumps to DATA/LOG/threaddump.txt, so that a dump can be obtained even when the web interface is not available and kill -3 does not work (for example inside Docker containers)
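The forced-GC and thread-dump items in the list above can be sketched in plain Java 11+: a single daemon thread that every 10 minutes requests a garbage collection and overwrites DATA/LOG/threaddump.txt with the stack of every live thread. The class and method names are hypothetical and this is not YaCy’s actual implementation; note that `System.gc()` is only a hint that the JVM may ignore (for example with `-XX:+DisableExplicitGC`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: periodically request a GC and write all thread stacks
 * to a file, so a dump exists even when the web UI hangs. Not YaCy's code.
 */
class PeriodicThreadDump {

    /** Formats the stack of every live thread, similar in spirit to a jstack dump. */
    static String formatThreadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Overwrites the dump file with the current thread stacks, creating parent directories. */
    static void writeThreadDump(Path file) throws IOException {
        Files.createDirectories(file.toAbsolutePath().getParent());
        Files.writeString(file, formatThreadDump(), StandardCharsets.UTF_8);
    }

    /** Starts the 10-minute maintenance task on a single daemon thread. */
    static ScheduledExecutorService start(Path dumpFile) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "maintenance");
            t.setDaemon(true); // do not keep the JVM alive just for this task
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            System.gc(); // only a hint to the JVM
            try {
                writeThreadDump(dumpFile);
            } catch (IOException e) {
                // Catching here keeps the periodic task scheduled after a failed write.
                System.err.println("thread dump failed: " + e);
            }
        }, 10, 10, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) throws IOException {
        // Writes a single dump immediately, for demonstration.
        writeThreadDump(Path.of("DATA", "LOG", "threaddump.txt"));
    }
}
```

A long-running server would call `start(...)` once at startup. Because the scheduler thread is a daemon, it does not keep the JVM alive on its own, which is why `main` only writes a single dump.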

Unfortunately, the bug did not show up in any of the freshly set-up test peers, so I either have to wait until it happens or find the cause through debugging (which looks hard right now, as nothing can be seen).

  • updated Jetty from 9.4.17 to 9.4.35 and fixed a (bad!) bug in SSI handling (how did this ever work?)

However, that’s just another attempt.

OK. Solved for the moment. I guess YaCy runs best when it is alone on a machine. So I decided to run every YaCy instance in a dedicated FreeBSD VM using VMware Player on Windows Server 2008 machines (verrry old school). But it runs like hell… and… STABLE!! A crazy setup, but after 3 years of trial and error with YaCy, it is the first solution that did not f*up for days :wink:

BTW: every VM has its own unbound (DNS resolver) service.

This way my 2008 host (4 YaCy VMs) crawls ~1000 PPM (pages per minute) per VM, which is fine for now.


Orbiter,

there was no inconvenience at all.

Cheers
Markus