Future

  • Nick Heer:

    Robots.txt is an open standard that is specifically intended to communicate access rules. Thus, while an open web is averse to centralization and proprietary technologies, it does not necessarily mean a porous web. The open web does not necessarily come without financial cost to human users. I see no reason the same principle should not be applied to robots, too.

    Therein lies the problem. Site authors can use open standards to restrict access to their content, but the approach for restricting incoming traffic from AI bots has the unintended effect of restricting access to human beings who use AI to navigate the open web. Remember, AI is another tool to surface content. It may be misused/abused in practice, but the philosophical drift of what we know as the open web should allow it.

    It’s a convergence of concerns: What is an “open” web that restricts access to tools that extract the content site owners create, maintain, and publish, only for that content to be repurposed in proprietary services and platforms that are effectively walled gardens?

    And if you’re thinking that scraping open content is inherently wrong (there’s good reason to think so), it’s worth mentioning that the Internet Archive itself is a giant scraper, albeit one used for the noble purpose of archiving and preserving a web that is constantly changing and evolving.

    Websites like 404 Media have explicitly cited A.I. scraping as the reason for imposing a login wall. A cynical person might view this as a convenient excuse to collect ever-important email addresses and, while I cannot disprove that, it is still a barrier to entry. Then there are the unintended consequences of trying to impose limits on scraping. After Reddit announced it would block the Internet Archive, probably to comply with some kind of exclusivity expectations in its agreements with Google and OpenAI, it implied the Archive does not pass along the robots.txt rules of the sites in its collection. If a website administrator truly does not want the material on their site to be used for A.I. training, they would need to prevent the Internet Archive from scraping as well — and that would be a horrible consequence.
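
    Mechanically, at least, the robots.txt side of this is simple. Here’s a minimal sketch that waves off a few of the commonly documented AI crawlers. The user-agent tokens below are published by their operators, but honoring them is entirely voluntary, which is rather the point of this whole discussion:

    ```
    # robots.txt: a minimal sketch for opting out of AI crawling.
    # These user-agent tokens are documented by their operators, but
    # compliance is voluntary and the list is far from exhaustive.

    User-agent: GPTBot           # OpenAI's crawler
    Disallow: /

    User-agent: ClaudeBot        # Anthropic's crawler
    Disallow: /

    User-agent: CCBot            # Common Crawl
    Disallow: /

    User-agent: Google-Extended  # opts out of Google's AI training, not Search
    Disallow: /

    User-agent: *                # everyone else, including human-facing search
    Allow: /
    ```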

    This is the first time I’ve heard of the Really Simple Licensing (RSL) standard, which debuted yesterday:

    One thing that might help, not suggested by Masnick, is improving the controls available to publishers. Today marked the launch of the Really Simple Licensing standard offering publishers a way to define machine-readable licenses. These can be applied site-wide, sure, but also at a per-page level. It is up to A.I. companies to adhere to the terms but with an exception — there are ways to permit access to encrypted material.
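
    I haven’t dug into the spec itself yet, so take the following as a hypothetical sketch of the idea rather than the actual RSL schema. The gist is a machine-readable license that can be attached to a whole site or a single page, spelling out what automated consumers may do with the content and on what terms:

    ```
    <!-- Hypothetical illustration of a per-page, machine-readable license.
         The element and attribute names here are made up for the example;
         consult the published RSL spec for the real schema. -->
    <license for="https://example.com/articles/some-post">
      <permits use="search-indexing" />
      <denies  use="ai-training" />
      <terms   use="ai-summarization"
               compensation="per-crawl"
               attribution="required" />
    </license>
    ```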

    Compensation and attribution is the nail that the RSL hammer appears to be hitting. Unfortunately, that does nothing to prevent a move towards what Heer describes as the web splitting in two:

    I, too, am saddened by an increasingly walled-off web, whether through payment gates or the softer barriers of login or email subscriptions.

    Walled gardens. We’ve been concerned about them forever, most notably with the emergence of Facebook and its propensity to restrict access to shared content behind a login. The same is true even of publishing platforms like Medium. It’s a curated version of the web that feels a lot like the AOL pattern of yesteryear. The difference is that we’re talking about the entire corpus of the open web scraped, repurposed, and redistributed in a completely separate corner of some other web.

  • I wrote about Bill Gross and GoTo.com once. He’s a fascinating individual who figured out the key to monetizing search years before Google would eventually copy him. He has a certain way of understanding technology as inevitable and rolling along with it rather than trying to resist it.

    He appears to have made the same determination about AI. I’m not sure I agree that we should give up on the resistance part, but if anybody’s going to save at least some semblance of the open web from the onslaught of AI, it may very well be Bill Gross. John Battelle, who wrote the literal book on Google, appears to agree.

    Battelle has taken an interest in Gist.AI, a new startup that grew out of Gross’s startup accelerator and that Gross now helms. Gross is approaching the problem of AI with his usual pragmatism, proposing a solution that focuses on partnerships between publishers and AI search.

    Those ravenous AI bots hoovering up websites at a rate of thousands of crawls a day? They’re shoplifting, Gross says. AI services should pay for the privilege of ransacking the open Internet, he argues. This concept – “pay per crawl” – has already taken root: Internet infrastructure giant Cloudflare has implemented a pay-per-crawl marketplace premised on a similar philosophy. Publishers that aren’t being paid by those data-hungry AI bots can now avail themselves of a free service from Cloudflare that blocks them at the door.
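
    The plumbing behind pay-per-crawl is worth a quick aside. Cloudflare’s implementation reportedly leans on HTTP’s long-dormant 402 Payment Required status code. The exchange below is a conceptual sketch of that flow, with illustrative header names rather than Cloudflare’s actual ones:

    ```
    # Conceptual sketch of a pay-per-crawl exchange. HTTP 402 is a real,
    # rarely used status code; the Crawl-Price headers are illustrative
    # stand-ins, not Cloudflare's actual header names.

    GET /articles/some-post HTTP/1.1
    Host: example.com
    User-Agent: SomeAIBot/1.0

    HTTP/1.1 402 Payment Required
    Crawl-Price: USD 0.01            # the publisher's asking price per crawl

    # The crawler retries, signaling that it accepts the price. Billing is
    # settled through the marketplace operator sitting in the middle.
    GET /articles/some-post HTTP/1.1
    Host: example.com
    User-Agent: SomeAIBot/1.0
    Crawl-Price-Accept: USD 0.01     # illustrative header

    HTTP/1.1 200 OK
    ```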

    Battelle seems to think that Gist.AI might give publishers the tools to fight back against the larger AI companies. I’ve actually heard rumblings about Gist in the publisher world, so maybe he’s right. He certainly has been before.

  • I could chuck christopher.org on Pressable and it would surely have a good long life there, but then it’s tied to my own future death and legacy plan. Automattic has 100-year domains ($2,000) and 100-year hosting ($38,000, which includes the domain). Jesse mentioned we could get christopher.org onto that as well.

    Ari (Christopher’s partner in life and business), David (Christopher’s brother), and I talked it over and agreed it would be a good plan. 

    100 years! This is longer than any of us can promise good stewardship of Christopher’s digital footprint. 

    I imagine that maintaining someone else’s digital footprint following their death has to feel like a major responsibility. And while the thought of a 100-year domain and hosting bundle for $40,000 might seem ludicrous to those of us who are living, I get what a relief it would be to take those responsibilities off your plate and wipe your hands clean of that overhead for the rest of your life.

    It’s not a bad idea for an estate plan, either. Just as we plan ahead for funeral costs, there’s something reassuring about planning in advance for what happens to your digital presence once you’re gone.

    And Christopher, you should know I got your Grunt build process running again. You’re welcome. But I’m not going to fix those Sass warnings. They’re just deprecation warnings; it’s fine.

    LOL, I can only hope that someone would do me the favor of maintaining my outdated Gulp scripts and dependencies after I’m gone. Then again, a backwards-compatible web should ensure that I never have to worry about that, so long as my website is properly archived.
