All this discussion about captchas raises a question for me: if fingerprinting is so accurate and easy that uBlock, no cookies and a VPN don't help... then why the fuck do I have to keep doing captchas?
Is that why I can no longer go from a web search (e.g. DDG, Ecosia) or a forum link to Stack Overflow without going through three CF captchas? If AI hadn't already killed SO for me, this would.
Yeah, it's only anecdotal, but I feel like hobbyists like us, who do slightly unusual things without nefarious intent, are the ones who get hit with these sorts of issues the most. For example, I've noticed that some websites start throwing captchas at me or even just straight-up refuse to load with 403 Forbidden errors because I have my router set up to load-balance across two Internet connections. (At least, that's my guess as to why it's happening.)
I maintain several multi-WAN commercial setups and they don't have this problem. I obviously don't know what your setup is, but I'd guess something is wrong with how it's handling flows/connections. Once a connection is established between your edge and an internet resource, that flow should remain "stuck" to whatever WAN port it started on, and it sounds like that isn't happening.
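To make the "sticky flow" idea concrete, here is a minimal Python sketch of per-flow WAN assignment, assuming a hypothetical router that identifies each connection by its 5-tuple (source IP/port, destination IP/port, protocol). New flows are handed out round-robin across the WANs; once a flow has a WAN, every later packet of that flow reuses it. The `StickyWanBalancer` class, its `per_destination` option and the WAN names are illustrative only, not any real router's configuration.

```python
from itertools import cycle
from typing import Dict, Hashable, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol) identifying one connection
FiveTuple = Tuple[str, int, str, int, str]


class StickyWanBalancer:
    """Hypothetical per-flow balancer: each new flow gets the next WAN in
    round-robin order; every later packet of that flow reuses the same WAN."""

    def __init__(self, wans: List[str], per_destination: bool = False) -> None:
        self._next_wan = cycle(wans)           # round-robin over WAN links
        self._table: Dict[Hashable, str] = {}  # flow key -> assigned WAN
        self.per_destination = per_destination

    def _key(self, flow: FiveTuple) -> Hashable:
        src_ip, _src_port, dst_ip, _dst_port, _proto = flow
        # per_destination=True pins all connections from one client to one
        # site on a single WAN; otherwise each connection is pinned separately.
        return (src_ip, dst_ip) if self.per_destination else flow

    def route(self, flow: FiveTuple) -> str:
        key = self._key(flow)
        if key not in self._table:             # first packet: pick a WAN
            self._table[key] = next(self._next_wan)
        return self._table[key]                # later packets: stay "stuck"


if __name__ == "__main__":
    conn_a = ("192.168.1.10", 50000, "203.0.113.5", 443, "tcp")
    conn_b = ("192.168.1.10", 50001, "203.0.113.5", 443, "tcp")

    per_flow = StickyWanBalancer(["wan1", "wan2"])
    print(per_flow.route(conn_a), per_flow.route(conn_a), per_flow.route(conn_b))
    # prints: wan1 wan1 wan2

    per_site = StickyWanBalancer(["wan1", "wan2"], per_destination=True)
    print(per_site.route(conn_a), per_site.route(conn_b))
    # prints: wan1 wan1
```

With per-flow keys, two separate connections to the same site can still leave through different WANs, so the site may see one browsing session arriving from two public IPs. Keying on (source, destination) instead, as `per_destination=True` does here, keeps all of a client's connections to a given site on one WAN at the cost of coarser balancing.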
I've seen captchas for years, well before the recent influx of AI. It's the way I go about obfuscating my network activity: site security can't determine whether I'm a bot or not. There is a captcha-solving extension for Firefox called Buster. If the captcha is 'Pick the three buses from this blurry, pixelated set of pictures', then I can solve those easily. It's when the captcha is a full page of a motorcycle and you have to check all the relevant pieces, then on to the next full picture, that chaps me. So you click Buster and it 'listens' to the audio portion of the captcha, then solves it. It's not 100% on all types of captchas, but 90% of the time it works every time. It's interesting to me that after a while, you start to notice patterns in the captcha images. For instance, if the directions are 'Pick the fire hydrants', there will be at least 5 you have to pick. Crosswalks are the same way.
I'd much rather have to do captchas than have my jimmy out in the ether traffic. Anecdotal, but Stack Overflow doesn't trigger a captcha for me. All I get is the cookie popup.
@irmadlad@lambalicious I just do the audio captcha manually, every time, because the picture captchas often don't work correctly for me.
It does bug me a little that I don't know what the audio captcha is being used for. Am I helping an Amazon Echo transcribe whatever it is surreptitiously listening to?
How does it differentiate an "AI crawler" from any other crawler?
Search engine crawler?
Someone monitoring data to offer statistics?
Archiving?
This is not good. They are most likely doing the crawling themselves and then selling the data to the highest bidder. That bidder could be OpenAI, for all we know.
They just know that if they say "this is anti-AI", a lot of people won't question anything.
I've seen plenty of people who think this is a bad thing. Do they just want everything to be crawled? I don't think this is the saviour, but it has got to be better than wholesale theft.
Yes. Web crawling has been a normal and vital part of the web from day 1. We'd have no search engines without crawlers.
The web is user-centric by design. I'm sick of tech companies trying to flip the script and hoard information, most of which is not theirs to begin with (e.g. Google, Reddit, Twitter, Facebook).
I don’t think this blocks crawlers. About 1 in 5 websites uses Cloudflare; the significant thing here is that AI scraping is now blocked by default on most of those sites, NOT crawling.
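One common, purely voluntary way a site can treat AI-training crawlers differently from other crawlers is by user-agent token in robots.txt. The sketch below uses Python's standard-library `urllib.robotparser` to check a hypothetical robots.txt that disallows `GPTBot` (OpenAI) and `ClaudeBot` (Anthropic) while allowing everyone else. It is an illustration only, not how Cloudflare itself identifies AI crawlers, and robots.txt only works against crawlers that honour it and report their user agent truthfully.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of two AI-training crawlers by their
# user-agent tokens, while leaving every other crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(user_agent_token: str, url: str) -> bool:
    """Return True if the robots.txt above permits this crawler to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent_token, url)


if __name__ == "__main__":
    page = "https://example.com/questions/12345"
    for bot in ("GPTBot", "ClaudeBot", "Googlebot", "Bingbot"):
        print(bot, may_crawl(bot, page))
    # prints False for GPTBot and ClaudeBot, True for Googlebot and Bingbot
```

Search-engine crawlers still get in, which is the "blocks AI scraping, not crawling" distinction; a scraper that lies about its user agent would need to be caught by other means.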