Understanding YaCy

Are there any documents for learning and understanding YaCy? The videos are fine for reproducing some things, but some pages of the wiki are missing …
How can I get a deeper understanding of the principles of search engines, and is there documentation explaining the software point by point?

This 16-year-old project has a long history of “please explain everything” requests. I understand that things are confusing, but where should anyone start? For example: I was once an assistant teacher for the lecture “Information Retrieval”. We could start with a one-semester lecture, and after that we would still not have touched the basics of YaCy, because we would have covered only the theoretical basics. I doubt you want that.

It’s the same with Linux: you never ask “please explain everything”. It is too much. The better choice is an adventure where you peek around and try to understand pieces here and there. Please have a look around in this forum and in the FAQ on yacy.net, and then try out some things yourself.

To be perfectly honest, I love YaCy, and digging into it and exploring all its mysteries is rewarding at times, but the lack of clear, consistent, thorough step-by-step documentation, all in one place, can be very frustrating.

For example, my recent post addressed how to use the “site:” search to instruct YaCy to search the content of just one website or domain.

The instructions I stumbled upon here: https://wiki.yacy.net/index.php/En:SearchParameters didn’t seem to work. About a year ago I spent untold hours struggling, using trial and error to try to figure out this and probably half a dozen other recondite aspects of the program.
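For anyone retracing this, the query form can be sketched as a URL. This is only a minimal sketch in Python, assuming a peer running at http://127.0.0.1:8090, a `yacysearch.json` endpoint that takes `query` and `maximumRecords` parameters, and that the “site:” operator is simply part of the query text. The helper name `build_site_search_url` is hypothetical and not part of YaCy.

```python
from urllib.parse import urlencode


def build_site_search_url(term, site, base="http://127.0.0.1:8090"):
    """Build a search URL restricted to one site (assumed query syntax)."""
    # Assumption: the "site:" operator is written inside the query text itself.
    query = f"{term} site:{site}"
    params = {"query": query, "maximumRecords": 10}
    # urlencode escapes the space and the colon so the URL stays valid.
    return f"{base}/yacysearch.json?{urlencode(params)}"


print(build_site_search_url("documentation", "yacy.net"))
```

Pasting the printed URL into a browser is a quick way to check whether a given peer honours the operator at all, before spending hours on trial and error.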

YaCy is a GIFT to the world of, IMO, untold value and importance, and it is vital that more people learn how to use it if we are to have a free internet. By free, I do not necessarily mean without cost; I mean “by the people, for the people”. But how can that be if “the people” give up because they can’t understand how to use it?

It’s fine to say, you shouldn’t be using a search engine without understanding it. Great. But some of the inner workings of YaCy are so esoteric, I’m not sure there is more than one person on the planet who really knows what all is inside.

Language is an issue. Some time back I spent hours meticulously trying to translate some of the available documentation from German to English, as that was mostly all I could find.

These aren’t really criticisms, but praise and appreciation. People love the IDEA of YaCy, I read that in review after review, but people just aren’t really able to use it due to the complexity and lack of understanding.

The developers are too busy to spend all their time explaining, over and over again, what may be obvious to them; they are busy on the cutting edge, developing something new. And most people, like myself, are too busy with the struggles of day-to-day life to figure it out by trial and error or in-depth research with little guidance.

People sincerely want “a deeper understanding”.

I have tried, and would be happy to write documentation, but I don’t understand half of what YaCy can do myself, and only wish I had the freedom to spend more time trying to figure it all out.

I suppose if I knew Java, it would all be transparent: just look at the source code. But that isn’t really true either. How many other things have been incorporated into YaCy that require in-depth study? A lot.

If someone competent enough with YaCy could work on documentation, I would be glad to contribute financially to such a project.

I totally agree. This is such a fascinating project, but it is too hard to understand and to make work. One of the reasons behind the success of evil corporations is that their services are easy to use, while one has to have a degree in computer science to use many of the alternatives (a bit exaggerated, but you get my point). Consequence: surveillance capitalism is winning, and the decay of the open net continues.

In my case, and for many others, it would be helpful to have simple instructions on:

  1. How to set up your own search engine for your website
    1.1. How to add the domain(s) you would like to be crawled
    1.2. How to exclude folders from site search
    1.3. How to automate crawling
    1.4. How to edit what I just entered (site search settings)

  2. Regular expressions: Please provide examples

I have now spent many hours trying to set this up. Some things work now, but many don’t. I tried to exclude folders from search and rerun the indexer, without success. Regular expressions remain a mystery even after some googling.

Running a search engine is not trivial, but YaCy is AFAIK the only package available, which runs out-of-the-box within 5 minutes and contains almost all features you need.

I recommend setting up a “virgin” public P2P instance to play around with, and a second one (or more) as soon as you understand what you are doing.

1.1. http://127.0.0.1:8090/CrawlStartExpert.html - Either enter some domains into the textarea or (which I prefer) maintain lists of starting points as text files. For that I always have a subdir “starturls” where I keep the files.
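Keeping start URLs in text files could look roughly like the sketch below. It assumes one URL per line in `.txt` files inside the “starturls” subdir; the `#` comment convention and the helper name `load_start_urls` are made up for illustration and are not YaCy features.

```python
from pathlib import Path


def load_start_urls(directory="starturls"):
    """Collect start URLs from every .txt file in the directory, one per line."""
    urls = []
    for path in sorted(Path(directory).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines and comment lines (the '#' convention is an assumption).
            if line and not line.startswith("#"):
                urls.append(line)
    # Drop duplicates while keeping the first-seen order.
    return list(dict.fromkeys(urls))


if __name__ == "__main__":
    # Print the merged list so it can be pasted into the crawl start textarea.
    print("\n".join(load_start_urls()))
```

The printed list can then be pasted into the textarea on the crawl start page, so the files stay the single place where starting points are maintained.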

1.2 ?? What do you mean? Search syntax like “not containing something”?

1.3. Menu “Crawler Monitor” -> http://127.0.0.1:8090/CrawlProfileEditor_p.html (upper right)
All crawls are saved and listed. The last 2 columns can be edited to schedule a crawl again.

1.4. Same table, left side. You cannot edit the record, but you can copy it into a new one.

  2. Useless for crawls. Weird non regex standard syntax for blacklists. Good starting point to play: https://regex101.com/

I stopped messing around with blacklists. Now I kick out spammers from time to time by simply deleting stuff from the index, or better: tell the crawls in more detail what you want, restrict them to dedicated domains, and crawl the levels separately.
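Since regex examples were asked for, here is a minimal sketch in Python of what “restrict to dedicated domains” and “exclude a folder” can look like as ordinary regular expressions. The domain `example.org`, the folder names, and the names `MUST_MATCH`, `MUST_NOT_MATCH` and `would_crawl` are all placeholders. Matching against the whole URL is an assumption about how such a filter would be applied, and none of this covers the blacklist syntax mentioned above.

```python
import re

# Hypothetical filters: stay on example.org, but skip /private/ and /tmp/.
MUST_MATCH = re.compile(r"https?://(www\.)?example\.org/.*")
MUST_NOT_MATCH = re.compile(r".*/(private|tmp)/.*")


def would_crawl(url):
    """Return True if the URL passes both filters (whole-URL match assumed)."""
    allowed = MUST_MATCH.fullmatch(url) is not None
    excluded = MUST_NOT_MATCH.fullmatch(url) is not None
    return allowed and not excluded


for url in [
    "https://example.org/docs/index.html",
    "https://example.org/private/notes.html",
    "https://other.net/docs/index.html",
]:
    print(url, would_crawl(url))
```

The same two patterns can be pasted into https://regex101.com/ together with a few real URLs to see which parts match before trying them in a crawl.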