Noob question: Can one installation power the site search of two different websites?

Your setup will surely work! Rescheduling the periodic crawl can be a little tricky, as it needs slightly different params than the initial one. I recommend another YaCy instance as a playground.

Thanks, yes, I will play around a bit in the coming days. Very impressive search results, and it seems the VPS with 2 GB RAM and 1 CPU is enough for two instances. Thanks a lot for your help, very much appreciated!

@TheNomad11 Be a bit careful with certain VPS companies regarding YaCy :warning:… There are quite a few unserious actors out there.

E.g. I recently got straight-up scammed by a company called :angry:greencloudvps . com / greenkvm . com:angry:

I already had a Japan VPS with them running YaCy, which I was quite pleased with, and was looking to set up a US-based VPS there as well, so I could consolidate an existing US Linode VPS with them and have just one company to deal with.

Their own sales rep had recommended a package to me through chat, and I had paid for it. Then greencloudvps suddenly decided to change the terms of use via email, citing “New VPS info” and stating that e.g. ‘P2P’ was not allowed. This was even though I had specifically stated my intended use as YaCy to their own sales rep mere hours before, and had already been working on setting up the node for over an hour…

Thankfully, I had not signed up for longer than a 1-month plan, because they denied me a refund on it. So I cancelled all services with them and reported their Delaware branch to the U.S. FTC for breaking the contract, and to DNB ASA/BankID for the scam (I paid with VISA).

And most fortunately, I had not gotten so far into my plan as to actually cancel and delete the existing Linode VPS that I had intended to replace with greencloudvps’ fcuking sh * t.

Yeah, I follow the discussions on lowendtalk.com, and there are indeed lots of dubious actors in the market. I used Hetzner, Germany’s largest hoster, which has a good reputation. It is ideal for testing as they have hourly billing, so I only paid 8 cents for testing YaCy. Besides that, I don’t do P2P, just a simple website search. I have also heard good things about Linode; they are in the same league as DigitalOcean, Vultr, Hetzner etc.

Take some old PC hardware. Install FreeBSD. Install YaCy. Forward (NAT) port 8090 to that box and go!
All you have to pay for is the electricity. Back up / export your URLs as HTML from time to time, and
enjoy watching your YaCy instance being part of the most fascinating search engine since the www.

Professional hosters are good for hosting static content (which YaCy isn’t). A reasonable (dedicated) machine to run YaCy will cost you several hundred bucks per month. Everything else is a lie.

All the cloud crap is made to rip off big companies after locking them in first. Nothing for private freaks to play around with :wink:

@zooom

meh… If something’s worth doing, it’s worth doing right :wink:

I agree, but I don’t know what dimensions you are talking about. All I want to say is that if you plan to index more than a few websites using YaCy, professionally hosted hardware will quickly consume a lot of money.

If you have had a different experience, please let me know.

Yeah. It is for a home-hosted YaCy machine.

The thing with greencloudvps / greenkvm was that they fancied YaCy to be P2P software.

Though I guess in their stupid minds, even an httpd being accessed by more than one client at a time would be P2P… It’s not like P2P actually reduces bandwidth use or anything… :upside_down_face:

The multi-site-search-in-one-YaCy thing should also be doable with the collection attribute. That’s the idea behind it: every crawl run is assigned to a specific collection by the user, and a search can by default pick out only results from a given collection.

Cool. What would the parameters in the search box look like if I also want to pick results from a specific collection?

I would think it would be a trivial matter to use the environment variables, assuming the site search form field is on the same site as the pages being searched, and YaCy has some means of sorting and returning search results by domain.

Have the forms point to some intermediary script like: if HTTP referrer = domain 1 do this, if HTTP referrer = domain 2 do that.

Or, better, match on the referrer itself (only return results where the associated URL matches the referrer URL).

Basically, if YaCy stores search results by domain, linked to or associated with a domain in some way (I don’t know how it could be otherwise), then it should certainly be possible to filter results by domain according to which domain the request came from, which could be read from the environment variables.
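To make the referrer idea concrete: this would live in an intermediary script, not inside YaCy. A CGI script sees the browser’s Referer header as the `HTTP_REFERER` environment variable, looks up which site the request came from, and prefixes the visitor’s terms with a `site:` restriction. The following is a minimal Python sketch of that lookup (standing in for the Perl the thread mentions). The domain names and the mapping table are hypothetical; the value on the right is whatever `site:` value works for your own crawl.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from the host in the Referer header to the domain
# used in the site: restriction. Replace these with your real domains.
SITE_FOR_REFERRER_HOST = {
    "www.my-first-site.com": "my-first-site.com",
    "my-first-site.com": "my-first-site.com",
    "www.my-second-site.com": "my-second-site.com",
    "my-second-site.com": "my-second-site.com",
}


def site_for_referrer(referrer: Optional[str]) -> Optional[str]:
    """Return the site to restrict the search to, or None if unknown."""
    if not referrer:
        # Browsers or privacy settings may omit the Referer header entirely.
        return None
    host = urlsplit(referrer).hostname  # urlsplit lower-cases the host name
    return SITE_FOR_REFERRER_HOST.get(host) if host else None


def restricted_query(user_terms: str, referrer: Optional[str]) -> str:
    """Prefix the visitor's terms with site:<domain> when the referrer is known."""
    site = site_for_referrer(referrer)
    return f"site:{site} {user_terms}" if site else user_terms
```

For example, `restricted_query("map", "http://www.my-first-site.com/about.html")` gives `"site:my-first-site.com map"`, while an unknown or missing referrer leaves the query as just `"map"`. Because the Referer header can be stripped by browser privacy settings, a hidden form field naming the site is probably the more reliable signal.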

I’m rather certain I could implement something like that in Perl, but I don’t know Java and don’t know enough about YaCy internals to implement it there. That is probably how I would do it if YaCy were written in Perl, or if I knew Java.

Those are my thoughts anyway.

I think by simply matching the referrer, one instance of YaCy could handle site search requests coming from any domain anywhere. Couldn’t it?

Thanks, but I can’t follow you here; I am no coder (yet), sorry. Any chance that a non-developer can make it work?

Poking around, I found this, which I thought I had seen before but wasn’t sure: YaCy apparently has, like pretty much all search engines, a built-in site search filter. I say apparently only because I’ve never actually used it myself, but I assume it works, since it is mentioned in the documentation: https://wiki.yacy.net/index.php/En:SearchParameters

site:
findsomething site:yacy.net will limit the results to the domain yacy.net

Assuming this is sent to YaCy as a GET request from the search form field, and assuming this function is actually implemented in YaCy (which I’m not 100% certain of), I assume it could be appended to the GET request one way or another, so that you and your site’s visitors don’t have to type it in.

What I mentioned above (in my previous post) was a possible feature that I think the developers could theoretically incorporate into YaCy (I’m not one of them, as I don’t know Java). But I believe the respective search forms could be modified to add the necessary switch, i.e. the term findsomething site:your-first-website to the form on your first site and findsomething site:your-second-website to the form on your other site.

Possibly this could be achieved very simply by adding a query string to the get request or by including a “hidden field” to the HTML search form on each respective website.

In other words, YaCy generates the search form, which looks something like:

<form method="get" action="http://127.0.0.1:8090/yacysearch.html">
    <input type="hidden" name="maximumRecords" value="20" />

etc.

That could be changed to something like:

<form method="get" action="http://127.0.0.1:8090/yacysearch.html?query=findsomething%20site:http://www.my-first-site.com">

All in one line, same as above.

Or perhaps, alternatively, the parameter could be passed as an additional hidden field, something like:

<input type="hidden" name="site" value="http://www.my-first-site.com" />

One way or the other this, or something similar SHOULD mimic or substitute for actually typing the parameter into the text field manually.
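One thing worth knowing before trying the action-URL variant: when a form with `method="get"` is submitted, browsers encode the form fields as the *new* query string and throw away any query already present in the `action` URL. Extra hidden fields are sent as separate parameters; they are not merged into the `query` text. The following Python sketch approximates that browser behaviour (it is not YaCy code). The URLs are the placeholder ones from the post, and whether YaCy’s `yacysearch.html` understands a separate `site` parameter is not confirmed here.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def simulate_get_submit(action: str, fields: list[tuple[str, str]]) -> str:
    """Approximate the URL a browser navigates to when a method="get" form is submitted.

    The form fields are urlencoded and become the new query string; any query
    already present in the action URL is discarded rather than merged.
    """
    scheme, netloc, path, _discarded_query, fragment = urlsplit(action)
    return urlunsplit((scheme, netloc, path, urlencode(fields), fragment))
```

So `simulate_get_submit("http://127.0.0.1:8090/yacysearch.html?query=findsomething%20site:http://www.my-first-site.com", [("query", "onions"), ("maximumRecords", "20")])` yields `http://127.0.0.1:8090/yacysearch.html?query=onions&maximumRecords=20`, and the `site:` part never reaches YaCy. That would explain why pure-HTML tweaks tend not to work, and why something (a script or JavaScript) has to combine the site restriction and the typed terms into one `query` value.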

For me to say EXACTLY what this would look like, I would have to write a little Perl script to see just what YaCy receives when findsomething site:your-first-website.com is typed into the form text field, then try those methods to see which one works and exactly what syntax or URL encoding reproduces the same results as actually typing it in.

But I’m not actually entirely sure that this site search feature is working or activated in the current distribution of YaCy; I’m just assuming.

If you can confirm that the “standard” method for doing a site search works as described in the documentation cited above, and maybe post a copy of your form so I can use it to send the data to my script, I might be able to figure it out.

The Perl script would not be much more than “echo”, just so I can see what YaCy is receiving from the form.
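Such an echo script could look roughly like the following Python sketch (Python rather than Perl, standard library only). Point a copy of the search form’s `action` at it, and it answers every GET with the decoded parameters and the Referer header, so you can see exactly what a form submission contains. The port number is an arbitrary assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def describe_request(path: str) -> dict:
    """Split a request path into its path and its decoded query parameters."""
    parts = urlsplit(path)
    return {"path": parts.path,
            "params": parse_qs(parts.query, keep_blank_values=True)}


class EchoHandler(BaseHTTPRequestHandler):
    """Answers every GET with a JSON dump of what the form actually sent."""

    def do_GET(self) -> None:
        report = describe_request(self.path)
        report["referer"] = self.headers.get("Referer")
        body = json.dumps(report, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8091) -> None:
    """Serve on localhost until interrupted (port 8091 is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), EchoHandler).serve_forever()
```

Calling `run()` and submitting a form with `site:mywebsite.com term` typed in would show `"query": ["site:mywebsite.com term"]`, because `parse_qs` undoes the `%3A` and `+` encoding.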

Also, it might help to know the actual URLs of your sites where the forms are located, but strictly speaking it is not necessary, as long as you know that when I write your-site-dot-com or whatever, you have to substitute your actual site URL.

Hmmmm…

Sort of coincidentally, this seems to be related to collections:

collection = The name of a collection or a comma-separated list of collections. This collections can be used to separate search results into different subsets which is used with the GSA search interface using the ‘site’ parameter in the search request.

Don’t really know what all that means, but I see “‘site’ parameter” in there, which I believe is the same thing. Right?

https://wiki.yacy.net/index.php/Dev:APICrawler

Edit: GSA=Google Search Appliance?

In other words, an HTML form field or “search interface”. Yes?

OK, looks like YaCy already has this built in.

Look at this

From this page:

Putting something in the “collection” field at the time the form is generated:

should, I assume, create the appropriate form element.

If I understand this right, just use a different word or term in the “collection” box and YaCy will generate the HTML form with the appropriate GET info appended, or with a hidden form field, presumably.

I don’t actually see that result in the example, but again I assume.

I’m not at all sure that what I’m saying is actually correct. I don’t have an instance of YaCy going at the moment to test this.

edit:

I’m thinking that this collection feature would really help to sort out YaCy search results if people running YaCy actually understood what it is for and how to use it.

Say, for example, everyone running YaCy on their gardening website designated that website as part of the “gardening” collection of websites.

I’m a little vague on how this is supposed to be implemented, but I get the idea that it is a very important but neglected feature.

I’m not certain that the word “findsomething” is actually to be literally included. It seems very cumbersome and unnecessary.

Probably the way this would be implemented is “search-term site:website-to-search”

Where “search-term” is the search term(s) and “website-to-search” is the URL of the website to be searched.

So searching for “onions” on gardenweb.com would look like:

onions site:http://www.gardenweb.com

This would find instances of “onions” on the site.

Perhaps the http://www. part is not necessary.

Or would it be site:gardenweb.com onions ?

Presumably it works the same as Google.
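If visitors (or a form script) may supply the site either as a full URL or as a bare domain, a small normalisation step avoids guessing. The Python sketch below strips any scheme and path and puts `site:` first, which is the order that later worked in this thread (`site:peoplesresearchcenter.com map`). It deliberately keeps a leading `www.`, since whether `www.gardenweb.com` and `gardenweb.com` match the same index entries depends on how the site was crawled. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def site_operand(site: str) -> str:
    """Reduce 'http://www.gardenweb.com/' or 'gardenweb.com' to a bare host name."""
    site = site.strip()
    if "://" not in site:
        # Without a scheme, urlsplit would treat the whole text as a path;
        # a leading '//' makes it parse as a network location instead.
        site = "//" + site
    return urlsplit(site).hostname or ""


def site_query(terms: str, site: str) -> str:
    """Build 'site:<host> <terms>' with the operator first."""
    return f"site:{site_operand(site)} {terms.strip()}"
```

With this, `site_query("onions", "http://www.gardenweb.com")` gives `"site:www.gardenweb.com onions"`.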

Everyone, disregard @Tom_Booth 's dreadful, heinous video :warning::exclamation: DO NOT GO TO GUGGLE DAWT COM :exclamation::warning: :rage:

LOL.

I was certainly not saying to go to (the forbidden zone), but it appears YaCy uses much of the same technology and implements various things the same way as most other search engines, the “site:” advanced search apparently being one of them.

Orbiter stated:

The multi-site-search-in-one-YaCy thing should also be doable with the collection attribute. That’s the idea behind it: every crawl run is assigned to a specific collection by the user, and a search can by default pick out only results from a given collection.

TheNomad11

Cool. What would the parameters in the search box look like if I also want to pick results from a specific collection?

All I have ever come across that might answer that question is to use “site:”, but that is just a stab in the dark.

Now that I have YaCy running again, I did some experimenting, and indeed, it seems YaCy “site” search is implemented in exactly the same way.

I know, for example, that the word “map” appears several times on my website.
If I search YaCy for just “map”, I get these results:

989 hits from here there and everywhere.

When I search YaCy with just “site:peoplesresearchcenter.com map” in the search form field, I get this:

Just 4 hits from my http://peoplesresearchcenter.com website and only that website.

Of course, this only works after the site has been crawled and indexed. It does not generally seem to work on the fly, without crawling and indexing the site first.

I don’t really know how “collections” factors into this.

Also I tested this with a fresh restart, with YaCy in Portal mode. I have not been able to verify if it works in community-based mode.

edit: verified, it works in either mode, but only after the site is crawled and included in the local index (or it may just be that these sites are not in any peer’s index).

The GET query string looks like:

http://localhost:8090/yacysearch.html?query=site%3Amywebsite.com+term&Enter=&auth=&verify=iffresh&contentdom=text&nav=location%2Chosts%2Cauthors%2Cnamespace%2Ctopics%2Cfiletype%2Cprotocol%2Clanguage&startRecord=0&indexof=off&meanCount=5&resource=global&prefermaskfilter=&maximumRecords=10&timezoneOffset=300
where “mywebsite.com” is whatever site is the target of the site search.

%3A is simply the colon (:), URL-encoded.

“term” is the search term

Everything after that seems to be just standard. What all of it is for, I don’t have much of a clue.

Including the http:// prefix does not seem to be necessary.

So the important part is:

http://localhost:8090/yacysearch.html?query=site%3Amywebsite.com+term
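Building that short link programmatically is straightforward. The following is a minimal Python sketch that reproduces exactly the encoding shown above. `localhost:8090` is the default from this thread, and passing the other parameters through `**extra` is just a convenience, not something YaCy requires.

```python
from urllib.parse import urlencode


def yacy_site_search_url(term: str, site: str,
                         base: str = "http://localhost:8090/yacysearch.html",
                         **extra: str) -> str:
    """Build a yacysearch.html link restricted to one site.

    Only 'query' is filled in by default; other parameters from the long URL
    (e.g. maximumRecords) can be passed through **extra.
    """
    params = {"query": f"site:{site} {term}"}
    params.update(extra)
    # urlencode escapes ':' as %3A and turns spaces into '+', matching the
    # query string shown above.
    return f"{base}?{urlencode(params)}"
```

`yacy_site_search_url("term", "mywebsite.com")` returns `http://localhost:8090/yacysearch.html?query=site%3Amywebsite.com+term`, and `yacy_site_search_url("map", "peoplesresearchcenter.com", maximumRecords="10")` appends `&maximumRecords=10`.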

“site:yourwebsite” could be used to just pre-populate the search text field.

instead of:

  <input type="text" name="query" value="" maxlength="80" 
           style="width:300px; font-size:16px; float:left;" />

do:

  <input type="text" name="query" value="site:yourdomain.com" maxlength="80" 
           style="width:300px; font-size:16px; float:left;" />

might work (notice the only change is to put site:yourdomain.com in value="" instead of leaving it blank).

Or it might not work. I haven’t tried it. Not with YaCy anyway.

After giving my own suggestions a try, so far I’ve not been able to get any of them to work. The browser wants to insert something different or standard rather than whatever is placed into a text or hidden field.

Possibly some JavaScript could do the trick, but I haven’t found any scripts for that purpose.

I think I could very easily write a Perl script to accept the form input and modify it, then construct a link with all the parameters included in the GET request in the appropriate format, then forward the browser to that link. But it would have to run on a server with Perl support (which I already have available), and for that I also need to get YaCy up on a subdomain, which won’t cost me anything.
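The redirect approach can be sketched in a few lines of Python using the standard-library WSGI interface (the thread plans Perl; this is the same idea, not the author’s script). Each website’s form would carry a hidden field `site` with its own domain and a text field `q`, and would point its `action` at wherever this script runs. The field names `q` and `site`, the allowed-domain list and the YaCy address are all assumptions.

```python
from urllib.parse import parse_qs, urlencode
from wsgiref.simple_server import make_server

# Assumptions: where YaCy answers, and which domains the forms may ask for.
YACY_SEARCH = "http://localhost:8090/yacysearch.html"
ALLOWED_SITES = {"my-first-site.com", "my-second-site.com"}  # hypothetical


def redirect_app(environ, start_response):
    """Turn ?q=<terms>&site=<domain> into a redirect to a site-restricted YaCy search."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("q", [""])[0].strip()
    site = params.get("site", [""])[0].strip().lower()
    # Only honour domains we know about; otherwise search without a restriction.
    query = f"site:{site} {term}" if site in ALLOWED_SITES else term
    location = f"{YACY_SEARCH}?{urlencode({'query': query})}"
    start_response("302 Found", [("Location", location),
                                 ("Content-Type", "text/plain; charset=utf-8")])
    return [b"Redirecting to YaCy\n"]


def run(port: int = 8092) -> None:
    """Serve the redirector with the standard-library WSGI server (fine for testing)."""
    make_server("", port, redirect_app).serve_forever()
```

A request for `?q=map&site=my-first-site.com` gets redirected to `http://localhost:8090/yacysearch.html?query=site%3Amy-first-site.com+map`. Keeping an allow-list means a visitor cannot point the redirector at arbitrary `site:` values. Since the visitor’s browser follows the redirect, `YACY_SEARCH` would have to be the publicly reachable address of the YaCy instance rather than localhost once this is deployed.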

I’ll let you know how it all goes.

I’ve set up a page here: http://yacy-kiosk.calypso53.com/index.html

But it will be in flux for a while, and some weird behavior may show up there, so use it with caution, or better yet, don’t go there at all. But if you are very, very daring with an insatiable curiosity, let me know what behavior you encounter.

Efforts are, of course, being directed towards having two or more sites with their own “site search” using just one YaCy instance. And why not? YaCy can index gazillions of websites and do a site search on all of them, so why not just two or six or ten, without anyone having to manually type “site:my-personal-domain-place.abracadabra”? Visitors to a website can’t be expected to do that for every site search.

edit - continuation:

Well, actually, now that I’m using the “portal for your own webpages”, I see that these features are already mostly implemented.

If you are only getting results from a few websites, check boxes appear on the left to narrow the search to one website or another; the “site:” operator and the URL are auto-populated. There does not seem to be a way to do this in advance of a search though, and it doesn’t seem to be “sticky”, which I think would be desirable.

It seems if there is already a checkbox for this,…

Anyway, I worked on a Perl script and thought I almost had it working, but not quite.

Anyway, the exercise was fun(ish).

I’ve finally managed to build a website with an embedded Porteus+YaCy Kiosk.

I’ve been playing around with the search results sidebar (accessible through “Portal Configuration”). Getting it to look the same without direct access to the CSS files is challenging, but I made two additional “About” columns for local resources: News and Events, and Area Attractions.

The search results are mostly limited to local websites in some way associated with the Fort Plain NY area.

I actually think YaCy is very, very wonderful as a regional portal to run on a “virtual kiosk” or whatever this Porteus+YaCy amalgam has become. Porteus and YaCy complement and enhance each other, and I think they work very well together.

The entire Porteus/YaCy Kiosk is still running LIVE from a 32 GB USB “pen drive” on a “dead” laptop with no hard drive (it crashed about a year ago), sort of serving a domain to a subdomain, and all remotely configurable by logging into YaCy’s admin.

What I haven’t quite figured out how to do is access the YaCy and/or Porteus Kiosk files directly. I did attempt to set up “file access” during configuration, but am not sure how to actually get file access on a running kiosk, without powering it down and unpacking the ISO.

What sort of file explorer does Slackware use? Maybe that could be installed on the Kiosk along with Firefox and YaCy, though it may already be there and I just don’t know how to use it.