• Nick Heer:

    Robots.txt is an open standard that is specifically intended to communicate access rules. Thus, while an open web is averse to centralization and proprietary technologies, it does not necessarily mean a porous web. The open web does not necessarily come without financial cost to human users. I see no reason the same principle should not be applied to robots, too.

    Therein lies the problem. Site authors can use open standards to restrict access to their content, but the approach for restricting incoming traffic from AI bots has the unintended effect of restricting access to human beings who use AI to navigate the open web. Remember, AI is another tool to surface content. It may be misused/abused in practice, but the philosophical drift of what we know as the open web should allow it.

    It’s a convergence of concerns: What is an “open” web that blocks the tools that extract the content site owners create, maintain, and publish, so it can be used in proprietary services and platforms that are effectively walled gardens?

    And if you’re thinking that scraping open content is inherently wrong (there’s good reason to think so), it’s worth mentioning that the Internet Archive itself is a giant scraper, albeit one used for the noble purpose of archiving and preserving a web that is constantly changing and evolving.

    Websites like 404 Media have explicitly cited A.I. scraping as the reason for imposing a login wall. A cynical person might view this as a convenient excuse to collect ever-important email addresses and, while I cannot disprove that, it is still a barrier to entry. Then there are the unintended consequences of trying to impose limits on scraping. After Reddit announced it would block the Internet Archive, probably to comply with some kind of exclusivity expectations in its agreements with Google and OpenAI, it implied the Archive does not pass along the robots.txt rules of the sites in its collection. If a website administrator truly does not want the material on their site to be used for A.I. training, they would need to prevent the Internet Archive from scraping as well — and that would be a horrible consequence.
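
    For the record, the blunt instrument at the center of all of this is a plain text file. Here’s a rough sketch of the sort of robots.txt site owners are reaching for, naming a handful of the publicly documented AI crawlers (illustrative, not exhaustive, and honored entirely on a voluntary basis):

      # robots.txt: block a few known AI crawlers, leave the rest of the site open.
      # These user-agent names are publicly documented, but the list is a sketch,
      # not a complete inventory, and crawlers comply with it voluntarily.
      User-agent: GPTBot
      Disallow: /

      User-agent: CCBot
      Disallow: /

      User-agent: Google-Extended
      Disallow: /

      # Everyone (and everything) else can keep crawling.
      User-agent: *
      Disallow: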

    This is the first time I’ve heard of the Really Simple Licensing (RSL) standard, which debuted yesterday:

    One thing that might help, not suggested by Masnick, is improving the controls available to publishers. Today marked the launch of the Really Simple Licensing standard offering publishers a way to define machine-readable licenses. These can be applied site-wide, sure, but also at a per-page level. It is up to A.I. companies to adhere to the terms but with an exception — there are ways to permit access to encrypted material.

    Compensation and attribution are the nails the RSL hammer appears to be hitting. Unfortunately, that does nothing to prevent a move towards what Heer describes as the web splitting in two:

    I, too, am saddened by an increasingly walled-off web, whether through payment gates or the softer barriers of login or email subscriptions.

    Walled gardens. We’ve been concerned about them forever, but most notably with the emergence of Facebook and its propensity to restrict access to shared content behind a login. The same is true even of publishing platforms like Medium. It’s a curated version of the web that feels a lot like the AOL pattern of yesteryear. The difference is that we’re talking about the entire corpus of the open web scraped, repurposed, and redistributed in a completely separate corner of some other web.

  • I love the web, but I’ve also spent a large part of my life loving movies. And it feels like I’ve continuously witnessed an industry that keeps trying to put cinema to death. It’s happened a few times, and the current cycle of never-ending franchises stamped out to appeal to the largest possible global audiences seems like the latest attempt.

    Today I saw this trailer, which was co-written by Kareem Rahma of Subway Takes fame:

    Without having seen it yet, I can tell you that there’s at least something about it I love. It’s this wonderful, small-market, self-contained indie flick that clearly has something to say and a style to say it in. It’s the kind of film that Noah Baumbach and the Duplass brothers were making when I was coming up in that world and falling in love with movies.

    You can’t kill cinema. The industry will try, trust me. But people won’t let it happen. They will just keep making films that they love and that they hope others will love. They will make them with heart. They will pour themselves into their art. Every time you try to knock cinema down, a new generation of filmmakers will bring it back to life. It’s unkillable.

    The industry tries to kill the web off from time to time, too. I’ve certainly witnessed plenty of those cycles. But it’s unkillable as well.

  • I saw this on Bluesky recently.

    And I think it gets to the root of what I don’t like about AI. It’s not about how good it is at certain things, or how magical it can feel, or whether or not it can trick people into thinking a human might have been behind something it created. Honestly, I don’t really care. Human beings are incredible, and we are worth far more than machines.

  • I wrote about Bill Gross and Goto.com once. He’s a fascinating individual who figured out the key to monetizing search years before Google would eventually copy him. He has a certain way of understanding technology as inevitable and rolling along with it, rather than trying to resist it.

    He appears to have made the same determination about AI. I’m not sure I agree that we should give up on resisting, but if anybody’s going to save at least some semblance of the open web from the onslaught of AI, it may very well be Bill Gross. John Battelle, who wrote the literal book on Google, appears to agree.

    Battelle has taken an interest in Gist.AI, a new startup that grew out of Gross’ startup accelerator and that he is now at the helm of. Gross is approaching the problem of AI with his usual pragmatism, proposing a solution that focuses on partnerships between publishers and AI search.

    Those ravenous AI bots hoovering up websites at a rate of thousands of crawls a day? They’re shoplifting, Gross says. AI services should pay for the privilege of ransacking the open Internet, he argues. This concept – “pay per crawl” – has already taken root: Internet infrastructure giant CloudFlare has implemented a pay-per-crawl marketplace premised on a similar philosophy. Publishers that aren’t being paid by those data-hungry AI bots can now avail themselves of a free service from CloudFlare that blocks them at the door. 

    Battelle seems to think that Gist.AI might give publishers the tools to fight back against the larger AI companies. I’ve actually heard rumblings about Gist in the publisher world, so maybe he’s right. He certainly has been before.

  • Matthew Phillips:

    Furthermore, the nature of engagement itself has been subtly reshaped. Algorithms often favor content that elicits strong, quick reactions – the kind that can be easily signaled with an emoji or a “reaction thumbnail.” Nuanced discussion and thoughtful communication, the traditional hallmarks of blog comment sections and the communities around them, take a backseat to attention-grabbing, often polarizing, content. The algorithm, in its quest for maximum engagement, can inadvertently filter out the very depth and thoughtfulness that blogs once championed.

    Look around at the average blog and you’re bound to notice the dearth of comments. It wasn’t long ago that the number of comments on a post was a solid indicator that it struck a chord. As someone who runs a long-running blog, I’ve seen this happen seemingly in real time, and all you have to do is compare an article from 2015 like this to a similarly provocative article from 2025 like this. Reactions are the new currency and they’re happening on some other platform.

    I’ve toe-dipped into the IndieWeb over on my personal blog because it helps mitigate some of the silence by porting social media interactions into the site so they can be rendered in both places. It’s sorta like a modern-day version of Pingbacks. But it also feels like a FOMO-driven response because the interactions are happening elsewhere and all I’m doing is collecting artifacts. I’m still required to engage outside of my web home and feed the algorithm.
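
    If you’re curious what the plumbing looks like, this sort of setup typically runs on the Webmention spec, usually with a bridge service translating the social media reactions into webmentions before they reach the blog. It really is the modern Pingback: one site tells another that it linked to it with a tiny form-encoded POST. A minimal sketch in Python, with placeholder URLs standing in for real sites:

      # Minimal sketch of sending a Webmention by hand; the URLs are placeholders.
      import requests

      source = "https://my-blog.example/notes/reply"        # my post, which links to theirs
      target = "https://their-blog.example/posts/original"  # the post I'm responding to

      # In practice you discover the endpoint by fetching `target` and looking for
      # a rel="webmention" link in its HTTP headers or HTML. Hardcoded here.
      endpoint = "https://their-blog.example/webmention"

      # The notification itself is just two form-encoded fields.
      resp = requests.post(endpoint, data={"source": source, "target": target})
      print(resp.status_code)  # 200, 201, or 202 means the receiver took it from here

    The receiving site is supposed to verify that the source really does link to the target before displaying anything, which is most of what separates this from the spam magnet that Pingbacks became.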

  • Cleanse your palate with a collection of Abandoned Blogs, curated by Lucy Pham. I truly don’t remember where I stumbled on this anymore, which I suppose is fitting. But it’s an incredible testament to the vastness and peculiarities of the web. Fight back. Build a blog. Make it awesome (but don’t abandon it).

  • Samuel Arbesman is publishing a new book about code, but with a different slant than what you’ll usually see. He once said this about code:

    I like to think of code as a sort of reverse centrifuge, spinning huge numbers of topics together and intimately connecting them. These topics range from our attempts to model the world, the nature of history, how we think and use language (both natural and computational), to even biology, philosophy, and serendipity.

    This book is an attempt to peel back that curtain and lay bare the artifice that’s been constructed around computing. And to have some fun.

    Those are some very lofty goals, and sorely needed right now, as the programming world spends much of its time trying to abstract away the best bits of itself.

    This one is shooting to the top of my list.

  • The Internet Phonebook is sold out. There should be more copies in stock soon though. It’s a cool idea from Kristoffer Tjalve and Elliott Cost that collects lovely personal websites into a physical directory, a book you can carry around with you. Each site has a phone number that, when dialed through the phonebook’s dial-a-site feature, will direct you to the right place.

    This is paired with some lovely essays that give you a chance to feel the weight of (a corner of) the Internet in the real world. I love any opportunity to bring a caring side of the web out of our screens and into the world.

    That phone to web connection makes me think of net artist Heath Bunting, who created an online directory of phone numbers for payphones at King’s Cross Station in London. Visitors to the site were encouraged to call around 5PM for maximum effect and to connect with other web citizens that might drift towards the phone at that time.

    The web and the real world are the same thing. I like projects that acknowledge that.

  • Yes, yes, it’s Global Accessibility Awareness Day. While that’s deservedly today’s focal point, it shouldn’t go unnoticed that the W3C published a set of Privacy Principles as well:

    This document is intended to help its audiences address privacy concerns as early as possible in the life cycle of a new web standard or feature, or in the development of web products. Beginning with privacy in mind will help avoid the need to add special cases later to address unforeseen but predictable issues or to build systems that turn out to be unacceptable to users.

    There are 30 principles (and sub-principles) in all. A few choice selections, starting with restricting the sort of data that is transferred around to what’s strictly necessary:

    Principle 2.2.1: Sites, user agents, and other actors should restrict the data they transfer to what’s either necessary to achieve their users’ goals or aligns with their users’ wishes and interests.

    People have rights when the data is about them:

    This one’s particularly damning to browsers and marketers:

    Principle 2.9.2: User agents and sites must take steps to protect their users from abusive behaviour, and abuse mitigation must be considered when designing web platform features.

    And let’s ditch legal jargon when explaining how data is handled:

    Principle 2.11.2: Information about privacy-relevant practices should be provided in both easily accessible plain language form and in machine-readable form.

    How many times have you agreed to or confirmed cookie notices? Wouldn’t it be great to have access to your choices after the fact?

    Principle 2.12.3: It should be as easy for a person to check what consent they have given, to withdraw consent, or to opt out or object, as to give consent.

    Lastly, let’s make sure we don’t punish someone for wanting to protect their privacy:

    Principle 2.14: Actors must not retaliate against people who protect their data against non-essential processing or exercise rights over their data.

  • It’s the 13th Global Accessibility Awareness Day today. A good reminder that there are a lot of web folks who care a hell of a lot about this kind of thing. And it’s the little things, in aggregate, that can help us shake off efforts that are hostile to accessible experiences.