Perhaps not particularly related to the topic, but various services/platforms seem to be using “shortening” or some other form of substitution (sometimes lengthening), apparently to control access to posted links as a form of “safeguard”: to “protect” platform members from “inappropriate” or potentially “dangerous” content that might lead to wrongthink, or from “misleading” news articles, and no doubt also for tracking, advertising, data collection, etc.
Generally, these “redirects” have no content of their own, so I’m not sure what YaCy might want to index anyway. But Google seems to sometimes, if not often or always, display a website within a frame of some sort, so the URL in the nav bar is actually something like https-googles-url-long-string-of-gibberish-:http:-actual-website-url-more-gibberish…
This extended, bracketed, or frame-encapsulated URL is another case.
I’ve often had to strip off the surrounding garbage when copying and pasting links to a forum, so as to bypass Google’s apparent tracking/hijacking (or whatever one wishes to call it) of posted links, which may otherwise spread like a virus as people share the link to some news story or whatever.
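For what it’s worth, the manual stripping I describe could be sketched roughly like this in Python. This is only an illustration under assumptions: the parameter names in `TARGET_PARAMS` and the function name `unwrap` are my own guesses at where a wrapper might carry the real address, not anything taken from YaCy or from any platform’s documentation.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed (hypothetical) query parameters in which a wrapper link
# might carry the real destination. Adjust per platform.
TARGET_PARAMS = ("q", "url", "u")


def unwrap(link: str) -> str:
    """Return the embedded target URL if one is found, else the link unchanged."""
    # parse_qs percent-decodes the values for us.
    query = parse_qs(urlsplit(link).query)
    for name in TARGET_PARAMS:
        for value in query.get(name, []):
            # Only accept values that themselves look like web addresses.
            if value.startswith(("http://", "https://")):
                return value
    return link
```

A link whose query string carries no embedded address (an ordinary search for “cats”, say) comes back untouched, so the function is safe to run over every link before it is posted or indexed.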
There again, I’m not sure Google’s “frame”, or whatever it is that brackets the actual URL, has any REAL content to index.
This sort of link hijacking/redirection/tracking seems to be getting more and more prevalent, whether it involves apparent shortening or lengthening of the actual URL.
I personally have been infuriated when I post a link to share, then go back and find that Twitter/Facebook/YouTube/Google/whoever has programmatically substituted a link that is not the one I posted but some filter, redirection, or tracking link.
So, I guess the question is: how should, or how might, or how does YaCy handle such links?
I’m also curious.
I know there are services like TinyURL that may be in a different category, ones that people use intentionally for one reason or another. Again, though, the short link is not THE REAL URL of the actual resource.
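To make the idea of getting back to the real URL concrete, here is a small sketch of following a chain of redirects to its end. It is deliberately network-free: `next_hop` is a hypothetical callable that returns the redirect target for a URL, or `None` when there is none. In a real crawler that would presumably be an HTTP request that reads the `Location` header of a 3xx response; here a plain dictionary with made-up hosts stands in for it. I am not claiming YaCy works this way.

```python
from typing import Callable, Optional
from urllib.parse import urljoin


def resolve(url: str,
            next_hop: Callable[[str], Optional[str]],
            max_hops: int = 10) -> str:
    """Follow redirects until a URL with no further redirect is reached."""
    seen = {url}
    for _ in range(max_hops):
        location = next_hop(url)
        if location is None:
            return url  # no more redirects: this is the real resource
        # A redirect target may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            raise ValueError("redirect loop detected")
        seen.add(url)
    raise ValueError("too many redirects")


# Stand-in for the network: invented hosts, purely for illustration.
fake_redirects = {
    "https://short.example/abc": "https://tracker.example/r?id=1",
    "https://tracker.example/r?id=1": "https://example.org/article",
}

final_url = resolve("https://short.example/abc", fake_redirects.get)
```

The hop limit and the loop check matter because a crawler cannot trust a redirecting service to ever terminate.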
I tend to think of all such URL impersonation as just so much garbage and internet congestion blocking the free flow of information; it should probably just be stripped away and discarded if possible.
The entire domain name system itself is a kind of redirect on top of the actual IP address.
Perhaps a solution might be to discard any and all links that do not resolve to an actual IP address, though I cannot offer anything regarding how that could be implemented.
I don’t think YaCy should in any way, directly or indirectly, support such third-party tracking and data mining by indexing links that would redirect traffic through such a link shortening “service”, regardless of how seemingly innocuous.
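One crude way to keep such links out of an index would be a simple host filter. The sketch below is only an assumption about how that might look: the `SHORTENER_HOSTS` set is a tiny illustrative sample that someone would have to maintain by hand, and the function names are mine, not part of YaCy.

```python
from urllib.parse import urlsplit

# Illustrative sample only; a real list would be much longer.
SHORTENER_HOSTS = {"tinyurl.com", "bit.ly", "t.co"}


def is_shortener(link: str) -> bool:
    """True if the link's host is a listed shortener or a subdomain of one."""
    host = (urlsplit(link).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in SHORTENER_HOSTS)


def keep_for_index(links):
    """Drop shortener links, keeping everything else in its original order."""
    return [link for link in links if not is_shortener(link)]
```

Matching on the exact host or a dot-prefixed suffix avoids false hits on unrelated domains that merely end in the same letters.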
After reading through these policies, I’m not sure there are any that could be considered entirely innocuous. Sorry to say.