Noob question: Can one installation power the site search of two different websites?

I would think it would be a trivial matter to use the environment variables, assuming the site search form field is on the same page as the page searched, and YaCy has some means of sorting and returning search results by domain.

Have the forms point to some intermediary script that does something like: if the HTTP referrer is domain 1 do this, if the HTTP referrer is domain 2 do that.

Or, better, match results against the referrer: only return results whose URL matches the domain of the referring URL.
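To make the referrer idea concrete, here is a minimal sketch in Python (not Perl, and not anything YaCy itself provides). The function name, the host list, and the example domains are all hypothetical. It only shows how an intermediary script might turn a Referer header into a `site:` filter. Keep in mind the Referer header can be missing or spoofed, so the script only accepts hosts it already knows.

```python
from urllib.parse import urlparse

# Hypothetical list of the sites that share one YaCy instance.
ALLOWED_HOSTS = {"my-first-site.com", "my-second-site.com"}

def site_filter_for_referrer(referrer, allowed_hosts):
    """Return a 'site:host' filter for a known referring host, else None."""
    host = (urlparse(referrer).hostname or "").lower()
    # Treat www.example.com and example.com as the same site.
    if host.startswith("www."):
        host = host[4:]
    return f"site:{host}" if host in allowed_hosts else None

print(site_filter_for_referrer("http://www.my-first-site.com/search.html", ALLOWED_HOSTS))
```

A request from an unknown or empty referrer yields `None`, so the script could fall back to an unfiltered search or refuse the request.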

Basically, if YaCy stores searches by domain, in some way linked to or associated with a domain, (I don't know how it could be otherwise), then certainly it should be possible to sort results by domain according to which domain the request came from which could be read from the environment variables.

I'm rather certain I could implement some such thing in Perl, but I don't know Java and don't know enough about YaCy internals to implement it, but that is probably how I would do it if YaCy was written in Perl, or if I knew Java.

Those are my thoughts anyway.

I think by simply matching the referrer, one instance of YaCy could handle site search requests coming from any domain anywhere. Couldn't it?

Thanks, but I can't follow you here, I am no coder (yet), I am sorry. Any chance that a non-developer can make it work?

Poking around, I found this. I thought I had seen it before but wasn't sure: YaCy apparently has, like pretty much all search engines, a built-in site search filter. I say apparently only because I've never actually made use of it myself, but I assume it works, since it is mentioned in the documentation: https://wiki.yacy.net/index.php/En:SearchParameters

site:
findsomething site:yacy.net will limit the results to the domain yacy.net

Assuming the form sends this to YaCy as a GET request, and assuming the feature is actually implemented in YaCy (which I'm not 100% certain of), the site: term could presumably be appended to the GET request in one way or another, so that neither you nor the visitors to your site have to type it in.

What I mentioned in my previous post was a possible feature that I think the developers could, in theory, build into YaCy (I'm not one of them, as I don't know Java). But I believe the respective search forms could be modified to add the necessary switch, i.e. findsomething site:your-first-website to the form on your first site and findsomething site:your-second-website to the form on your other site.

Possibly this could be achieved very simply by adding a query string to the GET request or by including a "hidden field" in the HTML search form on each respective website.

In other words, YaCy generates the search field which looks something like:

<form method="get" action="http://127.0.0.1:8090/yacysearch.html">
    <input type="hidden" name="maximumRecords" value="20" />

etc.

That could be changed to something like:

<form method="get" action="http://127.0.0.1:8090/yacysearch.html?query=findsomething%20site:http://www.my-first-site.com">

All in one line, same as above.

or perhaps alternatively the parameter could be passed as an additional hidden field, something like:

<input type="hidden" name="site" value="http://www.my-first-site.com" />

One way or the other this, or something similar SHOULD mimic or substitute for actually typing the parameter into the text field manually.

For me to say EXACTLY what this would look like, I would have to make up a little Perl script to see just what it is YaCy is receiving when findsomething site:your-first-website.com is typed into the form text field, then try those methods to see which works and exactly what syntax or URL encoding to use to reproduce the same results as actually typing it in.

But I'm not entirely sure that this site search feature is working, functional, or activated in the current distribution of YaCy. I'm just assuming it is.

If you can confirm that the "standard" method for doing a site search works as described in the documentation cited above, and maybe post a copy of your form so I can use it to send the data to my script, I might be able to figure it out.

The Perl script would not be much more than an "echo", just so I can see what YaCy is receiving from the form.
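As a rough stand-in for that "echo" idea, here is a small Python sketch (not the Perl script itself) that decodes a GET request URL into its individual parameters. The sample URL is a made-up illustration of what a form might send, not a captured request, and `decode_query` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

def decode_query(url):
    """Split a GET request URL into a dict of parameter name -> list of values."""
    # keep_blank_values=True so that empty fields still show up in the result.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

# Hypothetical example of what a search form might submit.
sample = ("http://127.0.0.1:8090/yacysearch.html"
          "?query=findsomething+site%3Amy-first-site.com&maximumRecords=20")

for name, values in decode_query(sample).items():
    print(name, "=", values)
```

Seeing the decoded values side by side makes it easier to compare what a hand-typed search sends with what a modified form sends.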

Also it might help to know the actual URLs of your sites where the forms are located, but strictly speaking that is not necessary. Just as long as you know that when I write your-site-dot-com or whatever, you have to substitute your actual site URL.

Hmmmm...

Sort of coincidentally, this seems to be related to collections:

collection = The name of a collection or a comma-separated list of collections. This collections can be used to separate search results into different subsets which is used with the GSA search interface using the 'site' parameter in the search request.

Don't really know what all that means, but I see "'site' parameter" in there, which I believe is the same thing. Right?

https://wiki.yacy.net/index.php/Dev:APICrawler

Edit: GSA=Google Search Appliance?

In other words, an HTML form field or "search interface". Yes?

OK, looks like YaCy already has this built in.

Look at this

From this page:

putting something in the "collection" field at the time the form is generated:

should, I assume, create the appropriate form element.

If I understand this right, just use a different word or term in the "collection" box and YaCy will generate the HTML form with the appropriate "Get" info appended or a hidden form field, presumably.

I don't actually see that result in the example, but again I assume.

I'm not at all sure that what I'm saying is actually correct. I don't have an instance of YaCy going at the moment to test this.

edit:

I'm thinking that this collection feature would really help to sort out YaCy search results if people running YaCy actually understood what it was for and how to use it.

Say for example if everyone with YaCy on their, for example "Gardening" website, designated their gardening website as included in the gardening "collection" of websites.

I'm a little vague on how this is supposed to be implemented, but I get the idea that it is a very important but neglected feature.

I'm not certain that the word "findsomething" is actually to be literally included. It seems very cumbersome and unnecessary.

Probably the way this would be implemented is "search-term site:website-to-search"

Where "search-term" is the search term(s) and "website-to-search" is the URL of the website to be searched.

So searching for "onions" on gardenweb.com would look like:

onions site:http://www.gardenweb.com

This would find instances of "onions" on the site.

perhaps the http://www. part is not necessary.

Or would it be site:gardenweb.com onions ?

Presumably it works the same as Google


Everyone, disregard @Tom_Booth 's dreadful, heinous video :warning::exclamation: DO NOT GO TO GUGGLE DAWT COM :exclamation::warning: :rage:


LOL.

I was certainly not saying to go to (the forbidden zone), but it appears YaCy utilizes much of the same technology and implements various things in the same way as most other search engines. The "Site:" advanced search apparently being one of them.

Orbiter stated:

the multi-site-search-in-one-YaCy thing should also be doable with the collection attribute. That's the idea behind it: every crawl run is assigned to a specific collection by the user and a search can pick out by default only results from a given collection.

TheNomad11

Cool. What would the parameters in the search box look like if I also want to pick results from a specific collection?

All I have ever come across that might answer that question is to use "site:", but that is just a stab in the dark.


Now that I have YaCy running again, I did some experimenting, and indeed, it seems YaCy "site" search is implemented in exactly the same way.

I know for example that the word "map" appears several times on my website.
If I use YaCy to search for just "map" I get these results:

989 hits from here there and everywhere.

When I search with YaCy with just "site:peoplesresearchcenter.com map" in the search form field I get this:

Just 4 hits from my http://peoplesresearchcenter.com website and only that website.

Of course, this only works after the site has been crawled and indexed. It does not generally seem to work on the fly, without crawling and indexing the site first.

I don't really know how "collections" factors into this.

Also I tested this with a fresh restart, with YaCy in Portal mode. I have not been able to verify if it works in community-based mode.

edit: verified, works in either mode, but only after the site is crawled and included in the local index, (or it may just be that these sites are not in any peer index)

The get query string looks like:

`http://localhost:8090/yacysearch.html?query=site%3Amywebsite.com+term&Enter=&auth=&verify=iffresh&contentdom=text&nav=location%2Chosts%2Cauthors%2Cnamespace%2Ctopics%2Cfiletype%2Cprotocol%2Clanguage&startRecord=0&indexof=off&meanCount=5&resource=global&prefermaskfilter=&maximumRecords=10&timezoneOffset=300`

where "mywebsite.com" is whatever site is the target of the site search.

%3A is simply the colon ( : ) URL encoded

"term" is the search term.

Everything after that seems to be just standard parameters. What they are all for, I don't have much of a clue.

Including the http:// prefix does not seem to be necessary.

So the important part is:

http://localhost:8090/yacysearch.html?query=**site%3Amywebsite.com+term**
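For anyone who wants to generate that link rather than type it, here is a minimal Python sketch that reproduces the same encoding (colon as %3A, space as +). The function name is hypothetical. The base URL and the "query" parameter are taken from the query string above, and all the other parameters are left out on the assumption that YaCy falls back to defaults for them.

```python
from urllib.parse import urlencode

def yacy_site_search_url(base, site, term):
    """Build a YaCy search URL whose query is restricted to one site."""
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return base + "?" + urlencode({"query": f"site:{site} {term}"})

print(yacy_site_search_url("http://localhost:8090/yacysearch.html",
                           "mywebsite.com", "term"))
# prints http://localhost:8090/yacysearch.html?query=site%3Amywebsite.com+term
```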

"Site:yourwebsite" could be made to just pre-populate the search text field.

instead of:

  <input type="text" name="query" value="" maxlength="80" 
           style="width:300px; font-size:16px; float:left;" />

do:

  <input type="text" name="query" value="site:yourdomain.com" maxlength="80" 
           style="width:300px; font-size:16px; float:left;" />

might work (notice the only change is to put site:yourdomain.com in value="" instead of leaving it blank).

Or it might not work. I haven't tried it. Not with YaCy anyway.

After giving my own suggestions a try, so far I've not been able to get any of them to work. The browser wants to insert something different or standard rather than whatever is placed into a text or hidden field.

Possibly some JavaScript could accomplish the trick, but I haven't found any scripts for such a purpose.

I think I could very easily write a Perl script to accept the form input and modify it, then construct a link with all the parameters included in the GET request in the appropriate format, then forward the browser to that link. But it would have to run on a server with Perl support (which I already have available), and for that I also need to get YaCy up on a subdomain, which won't cost me anything.
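Here is roughly what such a forwarding script could look like, sketched as a tiny Python WSGI application rather than Perl. Everything specific in it is an assumption: the YaCy address, the fixed site name, and the idea that the form on the website submits a field named "query" to this script instead of to YaCy directly. It has not been tested against a live YaCy instance.

```python
from urllib.parse import parse_qs, urlencode

# Assumed values; replace with the real YaCy address and the site to search.
YACY_SEARCH = "http://localhost:8090/yacysearch.html"
SITE = "mywebsite.com"

def redirect_app(environ, start_response):
    """Prefix the visitor's search term with site:SITE and redirect to YaCy."""
    form = parse_qs(environ.get("QUERY_STRING", ""))
    term = form.get("query", [""])[0].strip()
    query = f"site:{SITE} {term}".strip()
    target = YACY_SEARCH + "?" + urlencode({"query": query})
    start_response("302 Found", [("Location", target)])
    return [b""]

# To try it locally, one could serve it with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, redirect_app).serve_forever()
```

Each website would get its own copy (or its own SITE value), so the visitor only ever types the search term.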

Let you know how it all goes.

I've set up a page here: http://yacy-kiosk.calypso53.com/index.html

It will be in flux for a while, and some weird behavior may be manifesting there, so use with caution, or better yet, don't go there at all. But if you are very, very daring with an insatiable curiosity, let me know what behavior you encounter.

Efforts are, of course, being directed towards having two or more sites with their own "site search" utilizing just one YaCy instance. And why not? YaCy can index gazillions of websites and do a site search on all of them. Why not just two or six or ten, without having to manually type "site:my-personal-domain-place.abracadabra"? Visitors to a website can't be expected to do that for every site search.

edit - continuation:

Well, actually, now that I'm using the "portal for your own webpages" I see that these features are mostly already implemented.

If you are only getting results from a few websites, then check boxes appear on the left to narrow the search to one website or the other. The "site:" modifier and URL are auto-populated. There does not seem to be a way to do this in advance of a search though, and it doesn't seem to be "sticky", which I think could be desirable.

It seems if there is already a checkbox for this,...

Anyway, I worked on a Perl script and thought I almost had it working, but not quite.

Anyway, the exercise was fun(ish).

I've finally managed to build a website with an embedded Porteus+YaCy Kiosk.

I've been playing around with the search results sidebar (accessible through "Portal Configuration"). Getting it to look the same without direct access to some CSS file is challenging, but I made two additional "About" columns for local resources: News and Events, and Area Attractions.

The search results are mostly limited to local websites in some way associated with the Fort Plain NY area.

I actually think YaCy is very, very wonderful as a regional portal to run on a "virtual kiosk" or whatever this Porteus+YaCy amalgam has become. Porteus and YaCy complement and enhance each other and, I think, work very well together.

The entire Porteus/YaCy Kiosk is still running LIVE from a 32 GB USB "pen drive" on a "dead" laptop with no hard drive, which crashed about a year ago. It is sort of serving a domain to a subdomain, and it is all remotely configurable by logging into YaCy's admin.

What I haven't quite figured out how to do is access the YaCy and/or Porteus Kiosk files directly. I did attempt to set up "file access" during configuration, but am not sure how to actually get file access on a running kiosk, without powering it down and unpacking the ISO.

What sort of file explorer does Slackware use? Maybe that could be installed on the Kiosk along with Firefox and YaCy, though it may be there already and I just don't know how to use it.

Update: I was able to make it work: one YaCy installation and multiple search forms, each only for its own domain.

One corona winter later I took a look at YaCy again. Searching by collection does not work.

BUT:

After dealing with common setup issues like the search index suddenly disappearing and other stuff, I looked at the search form that YaCy kindly provides under "Portal configuration" - "Search Box Anywhere". And there it is. The important input field, the urlmaskfilter:

<input type="hidden" name="urlmaskfilter" value=".*" />

Enter your domain (as a RegEx) in the value field and only the search results for this domain are displayed, perfect! So easy!

<input type="hidden" name="urlmaskfilter" value=".*mydomain.com.*" />
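One thing worth knowing about that pattern: an unescaped dot matches any character, and the leading `.*` allows anything before the name, so the filter can also let through look-alike hosts. Below is a small Python sketch comparing the pattern above with a stricter variant. The stricter pattern and the test URLs are my own hypothetical examples. YaCy is written in Java, so I am assuming its regex engine treats these simple patterns the same way and matches them against the whole URL, which I have not verified.

```python
import re

# Pattern from the post above, and a hypothetical stricter alternative.
loose = re.compile(r".*mydomain.com.*")
strict = re.compile(r"https?://([^/]*\.)?mydomain\.com(/.*)?")

urls = [
    "http://www.mydomain.com/page.html",
    "http://notmydomain.com/",
    "http://mydomainxcom.example/",
]

for url in urls:
    # fullmatch requires the whole URL to match, not just a part of it.
    print(url, bool(loose.fullmatch(url)), bool(strict.fullmatch(url)))
```

For a small private index the loose pattern is probably fine. The stricter one only matters if similarly named domains end up in the same index.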
