What became of YaCy's GSA interface? (collection feature)

I'm in the process of exploring the mysteries of “collections”, which appear in the basic as well as the advanced crawler. In the advanced crawler, the information pop-up has a link to the GSA interface, which seemingly no longer exists:

Mention is made of GSA in the forum in The Story of YaCy Grid:

YaCy Grid: A Scalable Search Appliance

As YaCy does not only provide a rich, opensearch-based search API but also an implementation of the Google Search Appliance XML API. That means, YaCy Grid may be a drop-in replacement of existing GSA user. As Google abandoned the GSA, users should switch to YaCy Grid.

Yes, while I did a Solr upgrade, the implementation of the GSA API in ‘legacy’ YaCy became faulty because it was based on an old Solr Index Writer API which was no longer there (or at least no longer the same) in the new Solr version. I decided to end maintenance of that API because it is still available in YaCy Grid.

The YaCy Grid implementation is independent of Solr because it uses Elasticsearch. The GSA API will continue to be maintained in YaCy Grid. As there were no users of the GSA API in legacy YaCy, afaik, I believed it could disappear.

If you need the GSA API, please turn to YaCy Grid.

So, absent a GSA interface in “legacy YaCy”, does the “collection(s)” feature or field still have any lingering functionality or alternative implementation (in legacy YaCy)?

In a recent discussion, reference was made to “collections” for differentiating index content, and/or search results in some way.

As something that appears on even the most basic crawler interface, it appeared to be a core feature. Is “collections” entirely dependent on a GSA interface? Or can it still be utilized by a "site: " search, or other method?

Does the collections input field any longer serve any purpose or retain any functionality?

Don’t know that I “need” any “Google search appliance” gizmo. I have little idea about it. I’m just following your lead about “collections” and trying to understand YaCy’s various features, i.e. “collections” etc.

YaCy Grid… I have no inkling about whatsoever.

The readme here: GitHub - yacy/yacy_grid_mcp: The YaCy Grid Master Connect Program for YaCy Grid seems rather elaborate.

On the other hand there is a paragraph that makes it sound quite easy:

How do I install the yacy_grid_mcp: Download, Build, Run

At this time, yacy_grid_mcp is not provided in compiled form, you easily build it yourself. It’s not difficult and done in one minute! The source code is hosted at GitHub - yacy/yacy_grid_mcp: The YaCy Grid Master Connect Program, you can download it and run loklak with:

> git clone https://github.com/yacy/yacy_grid_mcp.git
> cd yacy_grid_mcp
> gradle run

Loklak?

If I were to run that in Linux, does that make a useable YaCy something? Or is it also necessary to install all the other ingredients on that page: Apache ftp, Rabbitmq, erlang, etc.?

All these ports:

The default port number of the MCP is 8100

Other port numbers will be:

8200: webloader, a http(s) loader acting as headless browser which is able to enrich http with AJAX content
8300: webcrawler, a crawler which loads a lot of documents from web documents
8400: warcmanager, a process which combines single WARC files to bigger ones to create archives
8500: yacyparser, a parser service which turns WARC into YaCy JSON
8600: yacyenricher, a semantic enricher for YaCy JSON objects
8700: yacyindexer, a loader which pushes parsed/enriched YaCy JSON content to a search index
8800: aggregation, a search front-end which combines different index sources into one
8900: moderation, a search front-end which for content moderation, i.e. search index account management
10100: successmessages, a service which handles the successful operation messages
10200: errormessages, a service which handles failure messages and broken action chains
2121: ftp, a FTP server to be used for mass data / file storage
5672: rabbitmq, a rabbitmq message queue server to be used for global messages, queues and stacks
9300: elastic, an elasticsearch server or main cluster address for global database storage
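To see which of those pieces are actually running on a machine, something like the following rough sketch could help. It only tries to open a TCP connection to each port from the list above; it assumes every service runs on the same host (127.0.0.1 here), and the function names are made up for illustration, not part of YaCy Grid.

```python
import socket

# Port numbers and service names copied from the list above.
GRID_PORTS = {
    8100: "mcp", 8200: "webloader", 8300: "webcrawler", 8400: "warcmanager",
    8500: "yacyparser", 8600: "yacyenricher", 8700: "yacyindexer",
    8800: "aggregation", 8900: "moderation", 10100: "successmessages",
    10200: "errormessages", 2121: "ftp", 5672: "rabbitmq", 9300: "elastic",
}


def is_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as "not running".
        return False


def port_report(host="127.0.0.1", ports=GRID_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for port, name in ports.items()}


if __name__ == "__main__":
    # Assumption: all services are expected on the local machine.
    for name, up in port_report().items():
        print(f"{name:16s} {'listening' if up else 'not reachable'}")
```

An open port only shows that something is listening there, not that it is the expected YaCy Grid service.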

Legacy YaCy is complicated, and just installing YaCy Grid seems more so. What are the hardware requirements? Would any Linux system run YaCy Grid?

There must be a misunderstanding. The collection feature is not dependent on the GSA API. Removing the GSA API did not touch the collection functionality anywhere; the GSA function also never had any connection to collections. Where did you get that?

In the first screenshot posted above, the info popup for the collections (lower right of the screenshot) mentions GSA and links to a page in a “GSA” folder.

There was also something in a YaCy Wiki that seemed to make some connection.

collection = The name of a collection or a comma-separated list of collections. This collections can be used to separate search results into different subsets which is used with the GSA search interface using the ‘site’ parameter in the search request.

https://wiki.yacy.net/index.php/Dev:APICrawler

I certainly may be misreading or misinterpreting something. It’s all soup to me.

@Tom_Booth, you’re not alone.
I experienced the same “surprise”, after updating YaCy.

I would also be happy to find a way to use the GSA/collection feature in legacy YaCy, when updating to 1.924.x.

Thanks for any pointer!

The collection feature has not been removed. It is available in the normal search and also in the XML and JSON APIs of YaCy.

Just go to Portal Design → Search Page Layout → look at the screenshot:

You can add additional navigator fields and there is also the collection field. Just click on the +.

Next, you will find the collection in the search result. If you click on the “API” button (upper right) you get an XML (essentially RSS) version of the result. If you then replace the .rss extension with .json you get the same result as JSON. Note how the search query is constructed with the collection attribute; you can do the same with an automated URL construction.
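For the automated URL construction, here is a minimal sketch in Python. Several things in it are assumptions rather than anything confirmed in this thread: the base address `http://localhost:8090`, the endpoint name `yacysearch`, the `query` parameter, and above all the `collection:{name}` form of the modifier. Check them against the URL your own “API” button produces and adjust the `modifier` template if it differs. The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base, query, collection=None, fmt="json",
                     modifier="collection:{name}"):
    """Build a search URL, optionally restricted to one collection.

    ASSUMPTION: the collection is passed as a query modifier of the form
    'collection:<name>' appended to the search terms. Copy the real form
    from the URL behind the "API" button on your own peer.
    """
    terms = query
    if collection:
        terms = f"{query} {modifier.format(name=collection)}"
    # fmt is the extension to request: "rss" for XML, "json" for JSON.
    return f"{base.rstrip('/')}/yacysearch.{fmt}?{urlencode({'query': terms})}"


if __name__ == "__main__":
    # Hypothetical peer address and collection name.
    print(build_search_url("http://localhost:8090", "solar power",
                           collection="energy"))
```

Switching `fmt` between `"rss"` and `"json"` mirrors the extension swap described above. The sketch only builds the URL and does not fetch anything.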