Posted
1 hour ago, Mathwiz said:

Unrelated to original problem, but WTF is this?

FWIW, r3dfox passes whatever this is and lets you in (eventually).

The WWW has become such an unpleasant place.

It's Cloudflare, Jim, but not as we know it. Anubis is another clunky piece of software developed by people who believe that only selected browsers are allowed to use the internet. It seems to be predicated on the idea that any non-conformant browser must be evil.

Ironically, https://forums.mozillazine.org/ are using the exact same thing.

Ben.

Posted

In New Moon I can pass the anime girl hash quicker than a Cloudflare check, and it doesn't come back for some time. Cloudflare doesn't like it when I switch to another application while it is milling; it wants me to keep it in the foreground and look at it.

Posted (edited)
1 hour ago, Ben Markson said:

It's Cloudflare, Jim, but not as we know it. Anubis is another clunky piece of software developed by people who believe that only selected browsers are allowed to use the internet. It seems to be predicated on the idea that any non-conformant browser must be evil.

Ironically, https://forums.mozillazine.org/ are using the exact same thing.

Ben.

Anubis, the clunky piece of software developed by non-conformant-browser-hating people, protects those kinds of websites from AI crawling.

Edited by Leokids123
Posted
3 hours ago, Ben Markson said:

Anubis is another clunky piece of software developed by people who believe that only selected browsers are allowed to use the internet. It seems to be predicated on the idea that any non-conformant browser must be evil.

Anubis (from Egyptian mythology) was also the name of a villainous character on the Stargate SG-1 television series.

2 hours ago, Leokids123 said:

Anubis, the clunky piece of software developed by non-conformant-browser-hating people, protects those kinds of websites from AI crawling.

AI crawling sounds bad but I'm not sure why, what it is, how it differs from ordinary Web crawling, or why robots.txt cannot be relied on.

Posted (edited)
11 hours ago, Mathwiz said:

Anubis (from Egyptian mythology) was also the name of a villainous character on the Stargate SG-1 television series.

AI crawling sounds bad but I'm not sure why, what it is, how it differs from ordinary Web crawling, or why robots.txt cannot be relied on.

because AI crawlers don't respect robots.txt.
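For context, robots.txt is purely advisory: a well-behaved crawler asks it for permission before every fetch, but nothing enforces that. A minimal sketch using Python's standard library (the user-agent string and URLs are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# A polite crawler consults robots.txt before fetching anything;
# an impolite one can simply skip this step -- there is no enforcement.
rp = RobotFileParser("https://example.org/robots.txt")
rp.read()

if rp.can_fetch("ExampleBot/1.0", "https://example.org/some/page"):
    pass  # fetch the page
else:
    pass  # a compliant crawler stops here; a scraper just fetches anyway
```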

Edited by roytam1
Posted
10 minutes ago, roytam1 said:

because AI crawlers don't respect robots.txt.

... This only means further/wider proliferation of such MITM anti-crawler "services", with the web becoming even more "unpleasant" as time moves on (with a greater impact, of course, on older browser engines, on old/weak H/W); what's the point of having 500 (or more) Mbps fiber internet speeds when every tab you open in your browser will need additional time to pass complex JS tests? :angry:

FTR, I keep, for historical purposes, a "portable copy" of NM27 (not the last one published, though), which isn't able to pass the Anubis test (e.g., on RPO Gitea repos); and even Serpent 52 struggles to pass that test, especially if I don't disable "security/privacy"-oriented solutions (extensions and/or userscripts); privacy also falls victim here, because you're given no other choice on such web-policing "services": either "open" your "privacy fences" or be denied access to the site you're after :angry: ...

This is all depression-inducing :} ...

Posted

@nicolaasjan

Are you the one that's maintaining yt-dlp for Windows 7? If so, thanks for that; it works great with the MPC-BE player and should extend the player's life when they stop supporting it on Win 7. I also didn't know how useful yt-dlp is for streaming almost every video out there until recently. Can't live without it now:

https://github.com/nicolaasjan/yt-dlp
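FWIW, yt-dlp can also be scripted as a Python library; a minimal sketch (the URL is a placeholder, and "best" assumes the site offers a single pre-muxed format):

```python
from yt_dlp import YoutubeDL

# Resolve a page URL to a direct media URL without downloading,
# so an external player (MPC-BE, mpv, ...) can stream it.
opts = {"format": "best", "quiet": True}
with YoutubeDL(opts) as ydl:
    info = ydl.extract_info("https://example.com/some-video", download=False)
    print(info["url"])  # hand this to the player
```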

Posted
9 hours ago, roytam1 said:

because AI crawlers don't respect robots.txt.

I sort of figured, but why don't AI crawlers respect robots.txt, when other Web crawlers do? That's what I was really after.

Which leads to another question: why do public repos need to block AI crawlers so badly that Gitea resorted to Anubis to do the job? Is it a bandwidth issue or a legal one?

Posted
2 hours ago, Mathwiz said:

that Gitea resorted to Anubis to do the job?
Is it a bandwidth issue or a legal one?

RPO (repo.palemoon.org) is a server owned and paid for by Moonchild himself; Gitea is just the version control software deployed on that private server to manage source development for UXP, Pale Moon and related projects; it's what is called a "private (self-hosted) Gitea instance" ;) ; on the Anubis test page itself, one can read:

Quote

 You are seeing this because the administrator of this website has set up Anubis to protect the server against the scourge of AI companies aggressively scraping websites. This can and does cause downtime for the websites, which makes their resources inaccessible for everyone.

Anubis is a compromise. Anubis uses a Proof-of-Work scheme in the vein of Hashcash, a proposed proof-of-work scheme for reducing email spam. The idea is that at individual scales the additional load is ignorable, but at mass scraper levels it adds up and makes scraping much more expensive.

Ultimately, this is a hack whose real purpose is to give a "good enough" placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG: via how they do font rendering) so that the challenge proof of work page doesn't need to be presented to users that are much more likely to be legitimate.

Please note that Anubis requires the use of modern JavaScript features that plugins like JShelter will disable. Please disable JShelter or other such plugins for this domain.

This website is running Anubis version 1.19.1.

... so I believe MC is mainly concerned with excessive bandwidth consumption; please also note that he had already implemented geolocation-based ACLs (access-control-lists) to counter automated DDoS attacks and, as a consequence, in Hong Kong (where our own roytam1 lives :P ) RPO is blocked :( ...
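For the curious, the "Proof-of-Work scheme in the vein of Hashcash" from the quoted notice boils down to something like the following Python sketch; the real Anubis challenge runs as JavaScript in the browser, and the names and parameters here are purely illustrative:

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty_bits: int) -> int:
    """Find a nonce such that SHA-256(challenge + nonce) starts with
    difficulty_bits zero bits (a Hashcash-style proof of work)."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    # Checking costs the server a single hash; solving costs the client
    # about 2**difficulty_bits hashes on average -- cheap for one
    # visitor, expensive for a scraper making millions of requests.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```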

Posted

So it is a bandwidth issue. Fair enough.

I had no idea that AI crawling had become such a burden for Web servers. Still having a hard time grokking why the AI crawlers don't respect robots.txt though. AIUI, their purpose is just to gather content to train AI engines; surely there's plenty of content even without violating such a longstanding norm!

In any case, I question Anubis's assertion that "The idea is that at individual scales the additional load is ignorable." It took r3dfox v139 several seconds to complete the challenge, to say nothing of UXP browsers. But I suppose there was a silver lining: MC probably had to ensure UXP could pass the challenge before using it to protect his own repo! It would be quite embarrassing if RPO couldn't be accessed by Pale Moon....
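Some back-of-envelope arithmetic backs that up. Assuming a 16-bit difficulty (4 leading zero hex digits; the actual Anubis setting may differ), a client needs about 2**16 ≈ 65,000 hashes on average, and the hash rates below are ballpark guesses rather than measurements:

```python
expected_hashes = 2 ** 16  # ~65,536 attempts on average at 16-bit difficulty

# Assumed SHA-256 throughput (illustrative, not measured):
modern_js = 2_000_000  # hashes/s with an optimized JIT + WebCrypto
old_js = 50_000        # hashes/s hashing in pure JS on an old engine

print(expected_hashes / modern_js)  # ~0.03 s -- indeed "ignorable"
print(expected_hashes / old_js)     # ~1.3 s -- and worse still on weak hardware
```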

Posted (edited)
On 6/29/2025 at 1:15 AM, Leokids123 said:

Anubis, the clunky piece of software developed by non-conformant-browser-hating people, protects those kinds of websites from AI crawling.

I have a lot more run-ins with Cloudflare than I do with Anubis. Both of them can get stuck in an infinite validation loop. Cloudflare can get insanely aggressive: it will quite happily lock your browser into an irretrievable Loop of Death (who would write code like that?). At least Anubis can easily be stopped.

I think the thing I object to the most is that they focus on the way a browser looks rather than what it is actually doing. In civil society this is characterised as profiling rather than being intelligence-led.

All that will happen is that the bots will better disguise themselves and their behaviour will go unchecked.

Ben.

Edited by Ben Markson
