Noob question: Can one installation power the site search of two different websites?

I am de-googling my website and would like to replace my Google site search with a self-hosted alternative. I just installed YaCy on a VPS for testing purposes, and so far it looks impressive.

Now I am considering using it for two of my websites and wonder: can I use one installation of YaCy to power the site search of two different websites? The search on domain1 should only return results for pages on domain1, while the search on domain2 should only return results for pages on domain2.

Do I need two instances of YaCy for this, or is one enough? And how do I set it up? I tried to figure it out myself without success, so I hope some of you experienced users or developers can help me with my noob question. Thanks a lot!

Very simple:
One instance is definitely enough for just two domains (YaCy is made to serve an index of millions of domains).
http://127.0.0.1:8090/ConfigBasic.html -> Use Case: “Search portal for your own web pages”
http://127.0.0.1:8090/ConfigNetwork_p.html -> Network Config: “Robinson Mode”
http://127.0.0.1:8090/Crawler_p.html -> Start one crawl per domain, with a crawl depth of 3 or more, depending on your site. Alternatively, you can provide a list of your URLs to crawl.
http://127.0.0.1:8090/CrawlProfileEditor_p.html -> Schedule the two crawls to run periodically.
That’s it.

Thanks for your answer. Not sure if I was clear enough. I know that YaCy can index multiple domains. I want to create two separate search engines for two different sites, domain1 and domain2. Each site needs its own site search that only displays results from its own domain, not from both. How can I do that, if possible?

Another thing. You said:

It is not clear to me how to start a crawl on that page. I have used http://127.0.0.1:8090/CrawlStartSite.html instead.

Thanks!

PS: I looked through all the settings and options, and it seems one needs a separate instance for each website search; there is no way to create a search for only one domain. But I might have overlooked something, it’s a very complex admin area!

If you want 2 independent indices (which do not overlap), then you need 2 instances of YaCy.

Well, YaCy is the only search engine which you can use out of the box without coding or scripting knowledge :wink: Running a search engine is not trivial. You should know what you are doing :wink:

Thanks. Alright. To start a second instance with Docker, I tried to run
docker run -d --name yacy3 -p 8093:8093 -p 8445:8445 yacy/yacy_search_server
but this did not work.

I have only recently started learning this stuff and would like to learn more, so sorry for my noob questions. Maybe we can add your answers to the FAQ?

EDIT: I experimented a bit with the ports, and this one worked:

docker run -d --name yacy1 -p 8091:8090 -p 8444:8443 yacy/yacy_search_server

Two instances mean 1 GB of memory usage instead of 700 MB.
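A minimal sketch of running both instances side by side, assuming the `yacy/yacy_search_server` image listens on 8090 (HTTP) and 8443 (HTTPS) inside the container and keeps its data in `/opt/yacy_search_server/DATA`, as the image's README describes at the time of writing. Docker's `-p` takes `host:container`, so only the host-side port changes per instance; the first attempt above mapped host port 8093 to container port 8093, where nothing is listening. The container names, volume names and host ports are illustrative, and `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch: two independent YaCy containers on one host, one per website.
# Assumptions (check the image's README): YaCy listens on 8090 (HTTP) and
# 8443 (HTTPS) inside the container and keeps its data in
# /opt/yacy_search_server/DATA. Names and host ports are illustrative.

# Run a command, or just print it when DRY_RUN=1 or docker is not installed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ] || ! command -v docker >/dev/null 2>&1; then
    echo "$*"
  else
    "$@"
  fi
}

# start_instance NAME HOST_HTTP_PORT HOST_HTTPS_PORT
# Docker's -p is host:container, so only the host side changes per instance.
# Each instance gets its own named volume, i.e. its own separate index.
start_instance() {
  run docker run -d --name "$1" \
    -p "$2:8090" -p "$3:8443" \
    -v "${1}_data:/opt/yacy_search_server/DATA" \
    yacy/yacy_search_server
}

start_instance yacy_domain1 8090 8443  # site search for domain1
start_instance yacy_domain2 8091 8444  # site search for domain2
```

Saved as, say, `start-yacy.sh` (a hypothetical file name), `DRY_RUN=1 sh start-yacy.sh` shows the two commands before anything is started. The separate volumes keep the two indices apart, which is what makes each instance answer only for its own site.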

You are right, and I am eager to learn! On the other hand, I wish it were easier for beginners to become independent of Google & Co. Right now, most solutions are still too geeky.

I run one YaCy instance on the default port 8090 to a) support the global index (recommended) and b) have direct access to the global index myself.

My other, internal YaCy indices run in Robinson Mode on 8091, 8092 …, which my firewall NATs so they are reachable from the outside.

Starting a fresh instance with 1 GB is always a good idea. Later, /Status.html shows how much memory your instance really consumes (log in as admin first), so you can tune the parameters.

I run my instances in VMs.

Advanced setup! I don’t run my own server at home, so I use a rented VPS instead, now with one instance on port 8090 and the other on 8091. The next step will be setting up a reverse proxy, connecting it to a domain and then integrating it with my two websites. Let’s see if I can make it, or if I will have to go back to using Google, which I would hate…

Your setup will surely work! Rescheduling the periodic crawl can be a little tricky, as it needs slightly different parameters than the initial one. I recommend another YaCy instance as a playground.

Thanks, yes, I will play around a bit over the coming days. Very impressive search results, and a VPS with 2 GB RAM and 1 CPU seems to be enough for two instances. Thanks a lot for your help, very much appreciated!

@TheNomad11 Be a bit careful with certain VPS companies regarding YaCy :warning:… There are quite a few shady actors out there.

E.g. I recently got straight-up scammed by a company called :angry: greencloudvps . com / greenkvm . com :angry:

I already had a VPS in Japan with them, running YaCy, which I was quite pleased with. I was looking to set up a US-based VPS there as well, so I could consolidate an existing US Linode VPS with them and have just one company to deal with.

Then, after their own sales rep had recommended a package to me through chat and I had paid for it, greencloudvps suddenly decided to change the terms of use via email, citing “New VPS info” and stating that e.g. ‘P2P’ was not allowed. This was even though I had specifically stated YaCy as the intended use to their own sales rep mere hours before, and had already been working on setting up the node for over an hour…

Thankfully, I had not signed up for more than a 1-month plan, because they denied me a refund on it. So I cancelled all services with them and reported their Delaware branch to the U.S. FTC for breach of contract, and to DNB ASA/BankID for their scam (I paid with VISA).

And most fortunately, I had not gotten so far into my plan as to actually cancel and delete the existing Linode VPS that I had intended to replace with greencloudvps’ fcuking sh * t.

Yeah, I am following the discussions on lowendtalk.com; there are indeed lots of dubious actors in the market. I used Hetzner, Germany’s largest hoster, which has a good reputation. It is ideal for testing as they have hourly billing, so I only paid 8 cents for testing YaCy. Besides that, I don’t do P2P, just a simple website search. I have also heard good things about Linode; they are in the same league as Digital Ocean, Vultr, Hetzner, etc.

Take some old PC hardware. Install FreeBSD. Install YaCy. NAT port 8090 to that box and go!
All you have to pay for is the electricity. Back up / export your URLs as HTML from time to time and
enjoy watching your YaCy instance being part of the most fascinating search engine since the WWW.

Professional hosters are good for hosting static content (which YaCy isn’t). A reasonable (dedicated) machine to run YaCy will cost you several hundred bucks per month. Everything else is a lie.

All the cloud crap is made to rip off big companies after locking them in first. Nothing for private freaks to play around with :wink:

@zooom

meh… If something’s worth doing, it’s worth doing right :wink:

I agree, but I don’t know which dimensions you are talking about. All I want to say is that if you plan to index more than a few websites with YaCy, professionally hosted hardware will quickly cost a lot of money.

If your experience is different, please let me know.

Yeah. It is for a home-hosted YaCy machine.

The thing with greencloudvps / greenkvm was that they fancied YaCy to be P2P software.

Though I guess in their stupid minds, even an httpd being accessed by more than one client at a time would be P2P… It’s not like P2P actually reduces bandwidth use or anything… :upside_down_face:

The multi-site-search-in-one-YaCy thing should also be doable with the collection attribute. That’s the idea behind it: the user assigns every crawl run to a specific collection, and a search can, by default, pick out only results from a given collection.

Cool. What would the parameters in the search box look like if I also want to pick results from a specific collection?
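A minimal sketch of such a query against a single instance, under two assumptions: each site's crawl was started with its own collection name in the crawl start form (the names `domain1` and `domain2` are hypothetical), and your YaCy version accepts a `collection:` modifier in the query text, which you should confirm in your instance's search help. In the search box, that would mean typing something like `privacy policy collection:domain1`; the function below sends the same query to the instance's `yacysearch.json` endpoint. Host and port are placeholders.

```shell
#!/bin/sh
# Sketch: one YaCy instance serving two site searches via collections.
# Assumptions: each site's crawl was assigned its own collection name
# (hypothetical "domain1"/"domain2"), and this YaCy version understands
# the "collection:" query modifier. Host and port are placeholders.
YACY=${YACY:-http://127.0.0.1:8090}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# site_search COLLECTION TERMS...
# Appends "collection:COLLECTION" to the user's terms, as one would type it
# into the search box, and sends the query to the JSON search endpoint.
site_search() {
  collection=$1
  shift
  # -G sends the URL-encoded --data-urlencode pair as a GET query string.
  run curl -sG "$YACY/yacysearch.json" \
    --data-urlencode "query=$* collection:$collection"
}
```

For example, domain1's search page would call `site_search domain1 privacy policy` and domain2's would call `site_search domain2 privacy policy`, so both sites share one index but each only sees its own collection.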