YaCy on a flash drive

I’ve been using MX Linux on a bootable flash drive for about a year now. Out of curiosity I tried installing YaCy. No real problem with the installation. Everything seems to function, probably a little better and faster than running YaCy on Windows on the same laptop.

I tried searching for my local college website, FMCC, without any luck. I put the URL into the crawler and let it run a while, but terminated the crawl as it was taking some time to index several thousand pages. The college website has a lot of pages, more than I ever knew.

Searching for FMCC again produced a wealth of results.

Still running on the flash drive while making this post.

A personal portable search engine on a flash drive. Truly YaCy is “a completely useless piece of software”. Not!


By way of explanation, if needed for anyone reading this:

MX Linux is one of a few “live” Linux operating systems (ones that run directly from a USB flash drive) that also has “persistence”, which means updates, installed programs and files created on the USB can be saved between reboots. It also has provisions for cloning the entire OS, along with all the newly installed programs, to an ISO file or directly to another USB.

MX Linux boots and runs great directly from the USB on any laptop I’ve plugged it into, generally much better and faster than the natively installed OS.

Anyway I was anxious to see if YaCy could run in such an environment. Seems it can.

I was then able to successfully clone the system to another Flash Drive, which I then booted up as a new install, choosing language, time zone, password etc, and now have a brand new MX Linux on a new BIGGER flash drive with YaCy pre-installed.

During installation to the new flash drive I pushed the persistence files to the maximum (20 Gigs each), which seems to have satisfied YaCy. With the old flash drive, which had a much smaller persistence file, I was getting messages that YaCy was pausing due to too little free space on the “hard drive”.
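For anyone who wants to keep an eye on free space before YaCy starts complaining, here is a minimal Python sketch. The 3 GiB threshold is only an assumed example value, not YaCy's actual pause limit, and the path should point at wherever the YaCy data lives on the persistence file.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def enough_space(path=".", min_free_gib=3.0):
    """True if free space is at or above the threshold.

    `min_free_gib` is an assumed example value, not a YaCy setting.
    """
    return free_gib(path) >= min_free_gib


if __name__ == "__main__":
    print(f"{free_gib('.'):.1f} GiB free")
    print("OK to keep crawling" if enough_space(".") else "Running low on space")
```

Run it from inside the YaCy folder, or pass that folder as `path`, so the number reflects the persistence storage rather than RAM.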

The only slight bother is that the Linux install does not put a start icon on the desktop or in the start menu, so when the USB is removed and plugged into another computer, or the system is just shut down, it becomes necessary to open a command prompt and start YaCy manually. For distribution, it would be nice to have a startup icon for YaCy on Linux.

I am having some difficulty opening the port to make my YaCy visible to other peers, but I don’t think that has anything to do with its running off a USB, as I have the same problem with the Windows installation on my laptop. It is probably because I’m connecting to the internet through one of those little mobile “hot spots” that connect wirelessly to the cell-phone tower.
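A quick way to separate "YaCy is not listening" from "the network is blocking it" is to test the port locally. The sketch below is a plain TCP connection test in Python; 8090 is YaCy's default port, and it assumes that default has not been changed. A local success only shows that YaCy is listening. It says nothing about whether peers can reach it from outside, which depends on the router or carrier, and mobile hotspots often sit behind carrier NAT that cannot be opened from the user's side.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    # Assumes YaCy is running on this machine on its default port.
    print("YaCy listening locally:", port_is_open("127.0.0.1", 8090))
```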

Anyway, the whole system runs much nicer off a USB 3.0 flash drive than on the laptop’s hard drive. Don’t know about anyone else but this is very exciting to me. Just thought I’d let people know that it is possible to do, if everybody doesn’t already know it.

It was an experiment for me but perhaps it is old news to everyone else.

At shutdown, MX Linux also gives the option to save to the persistence file or forget the session, as it runs, or can be made to run, entirely in RAM. In other words, nothing gets written to the flash drive unless it is intentionally saved at shutdown. I can’t think of anything that could be more secure. Of course, if some new sites were spidered, the changes can be saved if you choose to do so.

There are many “live” Linux OSes that can run on a flash drive, but most don’t have a persistence option, or implementing persistence can be difficult. MX Linux is built for running on a USB with persistence (inherited from antiX, I believe), so they try to make it easy.

I managed to figure out how to make a desktop icon and make it executable. Not as difficult as I thought. Just right-click the desktop, open a utility and fill in the fields.
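For those who would rather script it than click through the utility, a launcher is just a small text file in the freedesktop `.desktop` format. The following is a minimal Python sketch that writes one and marks it executable. The install folder (`~/yacy`) and desktop folder (`~/Desktop`) are assumptions and will differ per setup; `startYACY.sh` is the start script shipped with the Linux release of YaCy. The sketch also assumes the paths contain no spaces, since the `Exec` value is not quoted.

```python
import os
import stat
from pathlib import Path


def make_launcher(yacy_dir, desktop_dir, name="YaCy"):
    """Write an executable .desktop launcher that runs startYACY.sh."""
    yacy_dir = Path(yacy_dir)
    entry = "\n".join([
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={yacy_dir / 'startYACY.sh'}",
        f"Path={yacy_dir}",  # working directory for the script
        "Terminal=false",
        "",
    ])
    launcher = Path(desktop_dir) / f"{name}.desktop"
    launcher.write_text(entry)
    # Add the owner-execute bit so the desktop will run the launcher.
    os.chmod(launcher, launcher.stat().st_mode | stat.S_IXUSR)
    return launcher


if __name__ == "__main__":
    # Both paths are assumptions; adjust to the actual install location.
    home = Path.home()
    print(make_launcher(home / "yacy", home / "Desktop"))
```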


Why did you choose MX Linux?

Having YaCy on a USB stick within its own Linux is quite a good idea. We should make a kind of USB stick release!

It would be nice to have a USB stick or VM with the new YaCy Grid with Elasticsearch for testing.

I didn’t choose MX Linux for YaCy in particular. I’ve been distro-hopping, going through every Linux distribution that still exists on the chart on this wiki page: https://upload.wikimedia.org/wikipedia/commons/1/1b/Linux_Distribution_Timeline.svg and some that aren’t on the chart. I haven’t tried them all yet though. Maybe 50 or 100, if you include ones I couldn’t even get running or get online with.

MX Linux is not the most stable version of Linux I’ve run, but it has been optimized to run the Compiz window manager, which I love for the effects and for making tutorials, as it allows live annotation on the desktop. MX will run virtually everything Linux and has lots of cool features. Most of all, what I like is how easy it is to clone.

If you customize MX Linux and get everything running how you like, with all the programs you want installed, it has a built-in graphical utility to live-copy the entire system to a fresh USB flash drive while it is still running, without rebooting, so you can make as many copies as you want for backups or to share. Another utility copies the live system to an ISO for distribution.

I have one flash drive with all my favorite Linux distros (about a dozen or so) using Easy2Boot:

http://www.easy2boot.com/

But MX is the one distribution that grew on me and that I have customized how I like and found worth putting onto its own high speed USB drive by itself to have as much “persistence” as possible. Oh and of course it HAS persistence by design which few Live distributions actually have. Most other Live distributions are for trial only and “forget” everything if rebooted.

For a few years I’ve been interested in, or have actually been in the process of, customizing a distribution based on MX to burn to an ISO and share online. Having YaCy included is like the final icing on the cake. It is like the whole internet in your pocket: your own personal OS with your own personal search engine. It’s an ideal combination. Boot it up on any machine anywhere without using any intrusive, tracking online search engines and, if you like, save the session, or erase all tracks and reboot without saving, as everything runs in RAM until shutdown, at which time it can either be saved or discarded.

I’m a bit command-line phobic. With MX I’ve been able to install everything I’ve been able to find on any other distribution and add it to my MX system using the graphical installer.

If you want “YaCy on a USB stick within its own Linux”, it’s already done. I can burn it to an ISO, upload it somewhere and post a link here. Probably the MX Linux team would be interested in helping work out any bugs and getting it optimized.

I can see it being expected that every Linux distribution would include YaCy by default in the very near future.


Out of curiosity, how many websites might be indexed on a USB (or any other storage device, for that matter), on average, per Meg or Gig of available storage?

I really have little or no idea about the storage space requirements of current web-indexing, other than Google: https://www.youtube.com/watch?v=XZmGGAbHqa0

By comparison I’m currently running Linux/YaCy, seemingly quite comfortably, on a USB stick. The laptop battery time estimate is 3.5 hours.

Watching that video about Google’s data centers, and knowing something about Stirling engines, I’m a bit aghast at the energy waste. All that heat just being dissipated by cooling towers?

In other words, how long might I expect to be able to spider and index the web using YaCy on a flash drive?

Typically, on average, how many websites can be indexed per Megabyte or Gig of storage space?

I wrote some very bare-bones programs for indexing websites using faceted indexing and estimated they could index about 5 million websites per Gig, and could at least double that if using only conceptual indexing. That could just about index the entire internet on a 256 Gig flash drive, in theory anyway.
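The arithmetic behind that estimate, written out as a small Python sketch. The 5 million per Gig figure is the estimate from those bare-bones programs, not a YaCy measurement, and the sketch assumes the whole 256 Gig drive is available for the index and that a Gig means 10^9 bytes.

```python
SITES_PER_GB = 5_000_000  # estimate for faceted indexing (not a YaCy figure)
DRIVE_GB = 256            # assumes the entire drive holds index data


def capacity(sites_per_gb, drive_gb):
    """Total number of sites that fit at a given density."""
    return sites_per_gb * drive_gb


faceted = capacity(SITES_PER_GB, DRIVE_GB)         # 1,280,000,000 sites
conceptual = capacity(2 * SITES_PER_GB, DRIVE_GB)  # 2,560,000,000 sites

# Storage budget per site at the faceted density, taking 1 GB = 10**9 bytes.
bytes_per_site = 10**9 / SITES_PER_GB              # 200 bytes per site

if __name__ == "__main__":
    print(f"faceted:    {faceted:,} sites")
    print(f"conceptual: {conceptual:,} sites")
    print(f"budget:     {bytes_per_site:.0f} bytes per site")
```

The last figure is the useful sanity check: at that density each site gets about 200 bytes, which is room for a handful of facets but nothing like a full-text index.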

I’m interested in trying out/activating Geolocalization. I’m a little reluctant to do so on this USB stick, not knowing what the sizes of the files to be downloaded might be.

The instructions in the admin area note: “These libraries are not included in the main release of YaCy because they would increase the application file too much.” I probably have about 20 Gigs of free space allocated for storage on the flash drive running MX Linux / YaCy.

Of course I suppose I could just grit my teeth and click the LOAD buttons and see what happens but… don’t really want to crash the system or something.

I should probably go out and spring for another high speed / high capacity flash drive and make another clone just for experimenting. Anyway, in the meantime, if anyone can tell me about how big these files might be, that would be greatly appreciated.
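One way to find out a download's size without committing to it is to ask the server with an HTTP HEAD request and read the `Content-Length` header. The sketch below does that in Python. The URL is left as a parameter because the actual download locations of the YaCy libraries would have to be looked up, and the expansion factor of 5 for unpacked data is purely an assumed safety margin.

```python
import urllib.request


def remote_size_bytes(url, timeout=10):
    """Ask the server for a file's size via an HTTP HEAD request.

    Returns None if the server sends no Content-Length header.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def fits(size_bytes, free_bytes, expansion=5.0):
    """Rough check that a download still fits once unpacked.

    `expansion` is an assumed compressed-to-unpacked ratio, not a
    measured value for any particular file.
    """
    return size_bytes * expansion <= free_bytes
```

For example, `fits(remote_size_bytes(url), 20 * 10**9)` would compare a download against roughly 20 Gigs of free space, where `url` is whatever address the library is actually served from.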

I’m asking this here because it is mostly a “running YaCy on a flash drive” specific question, but I might also start a Geolocalization topic area for additional, more general questions.

Well, I got brave and went ahead and tried loading the files, starting with the smallest: only cities with populations over 100,000. That went well, loading surprisingly fast. I tried it out and searched for a few cities, and maps appeared in the search results, so I went ahead and loaded the next larger file (cities with 5,000 or more, was it?). That took much, much longer to load, but was still reasonably fast. I haven’t been brave enough to load the big file of all cities with populations of 1,000 or more. I think I’ll check my free disk space first, and perhaps do a remaster. MX Linux keeps a backup copy of the OS before a remaster, in case something goes wrong and you need to roll back. Remastering combines and compresses the files into one new master ISO, which frees up a lot of disk space.

Sometimes if you have too much new data there will not be enough room left to perform the remastering and some files will have to be deleted.

Anyway, to show this is working so far, here is a screenshot of a search for Albany.

I’m still booted up in YaCy-MX on a USB 3.0 64 Gig flash drive and the whole system is still running smoothly.

Well, apparently there is still more than 50 Gigs of free disk space on the flash drive, but I haven’t shut down, so the new data hasn’t been written to the disk yet.

[Screenshot: disk space before saving]


OK, shut down and rebooted and still works. YaCy did seem to take a little longer than usual to start up.

This time I searched for Hong Kong.

Odd though, the reported free disk space has not changed. Not sure how to account for that. Perhaps MX Linux reports free disk space in a way that accounts for what is in RAM that will be written to disk. I don’t really know.

Regardless, it seems it may be safe to go ahead and load the last file. Still have not done any remaster on this drive though.

Enough for today. I’ll report on how it goes tomorrow.

Personally I find this quite amazing and exciting that it could even be possible to have anything like this running off a flash drive and using, so far, what seems like a minimal amount of disk space.


Thinking more on the subject, it certainly could be worthwhile to create a USB stick release of YaCy, pre-configured for usability and security and optimized in various ways, whatever those might be.

That would be completely over my head I’m afraid though. MX Linux might not be the best option.

Since the new YaCy Grid (with Elasticsearch) is not available with an easy installer the way YaCy 1.92 is (for example on Windows), I would be glad to see it in a pre-configured Linux install.

I don’t know much about YaCy Grid or completely understand it, and if it is centralized I may not be interested in it, but if it runs on Linux, it should be possible to clone it using dd if nothing else.

That is, get it working on whatever Linux (they all have dd), then copy the entire system with YaCy (or presumably YaCy Grid or any other desired programs) to a bootable flash drive or DVD.

I’m quite sure this works with YaCy peer-to-peer, as I’ve already cloned my own MX Linux system running YaCy to a larger flash drive, which I’m running now.
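As a rough illustration of the dd route, here is a small Python sketch that only builds and prints the command rather than running it, since dd overwrites the target device without asking. The device names `/dev/sdX` and `/dev/sdY` are placeholders, not real devices. The options used (`bs`, `status=progress`, `conv=fsync`) are standard GNU dd options. It assumes the target is at least as large as the source and that the source is not the running system.

```python
import shlex


def dd_clone_command(source, target, block_size="4M"):
    """Build (but do not run) a dd command that copies source to target."""
    if source == target:
        raise ValueError("source and target must be different devices")
    return [
        "dd",
        f"if={source}",      # device or image to read from
        f"of={target}",      # device to overwrite
        f"bs={block_size}",  # larger blocks copy faster than the default
        "status=progress",   # show progress while copying
        "conv=fsync",        # flush data to the device before dd exits
    ]


if __name__ == "__main__":
    # Placeholder device names; check the real ones with lsblk first.
    cmd = dd_clone_command("/dev/sdX", "/dev/sdY")
    print("Would run (as root):", shlex.join(cmd))
```

Printing the command first gives a chance to double-check the device names before anything is written.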

I don’t know, however, what YaCy Grid requires for hardware, memory, free disk space etc.

YaCy Grid uses Elasticsearch instead of Solr. I have a lot of PDFs that weren’t indexed, even though they are searchable PDFs. So I would like to try YaCy Grid, but it’s kind of tricky for someone who doesn’t know Linux well to configure it and its dependencies. It would be nice to have it done, at least for testing.

Installing YaCy Grid does seem quite involved. If it can run on a Linux desktop, I would think it should be possible to clone the entire setup, but, looking into it very briefly and superficially, it appears that the only (?) running instance of YaCy Grid currently is on some kind of cloud computing thing.