Geoff

  • Nick Heer:

    Robots.txt is an open standard that is specifically intended to communicate access rules. Thus, while an open web is averse to centralization and proprietary technologies, it does not necessarily mean a porous web. The open web does not necessarily come without financial cost to human users. I see no reason the same principle should not be applied to robots, too.

    Therein lies the problem. Site authors can use open standards to restrict access to their content, but the approach for restricting incoming traffic from AI bots has the unintended effect of restricting access for the human beings who use AI to navigate the open web. Remember, AI is another tool for surfacing content. It may be misused and abused in practice, but the philosophy of what we know as the open web should allow for it.

    It’s a convergence of concerns: What does it mean for an “open” web to restrict access to tools that extract the content site owners create, maintain, and publish, for use in proprietary services and platforms that are effectively walled gardens?

    And if you’re thinking that scraping open content is inherently wrong (there’s good reason to think so), it’s worth mentioning that the Internet Archive itself is a giant scraper, albeit one used for the noble purpose of archiving and preserving a web that is constantly changing and evolving.

    Websites like 404 Media have explicitly cited A.I. scraping as the reason for imposing a login wall. A cynical person might view this as a convenient excuse to collect ever-important email addresses and, while I cannot disprove that, it is still a barrier to entry. Then there are the unintended consequences of trying to impose limits on scraping. After Reddit announced it would block the Internet Archive, probably to comply with some kind of exclusivity expectations in its agreements with Google and OpenAI, it implied the Archive does not pass along the robots.txt rules of the sites in its collection. If a website administrator truly does not want the material on their site to be used for A.I. training, they would need to prevent the Internet Archive from scraping as well — and that would be a horrible consequence.
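
    To make the mechanics concrete, here’s a minimal robots.txt sketch of the sort of opt-out rules in question. These user-agent tokens are ones the crawlers publicly document (GPTBot for OpenAI, Google-Extended for Google’s A.I. training, CCBot for Common Crawl, ia_archiver for the Internet Archive), but treat the list as illustrative rather than exhaustive, and remember that compliance is entirely voluntary:

    ```
    # Opt out of A.I.-training crawlers while leaving the open web open.
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Truly keeping material out of training sets would mean blocking the
    # Internet Archive too, the horrible consequence noted above:
    # User-agent: ia_archiver
    # Disallow: /

    # Everyone else is welcome.
    User-agent: *
    Allow: /
    ```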

    This is the first time I’ve heard of the Really Simple Licensing (RSL) standard, which debuted yesterday:

    One thing that might help, not suggested by Masnick, is improving the controls available to publishers. Today marked the launch of the Really Simple Licensing standard offering publishers a way to define machine-readable licenses. These can be applied site-wide, sure, but also at a per-page level. It is up to A.I. companies to adhere to the terms but with an exception — there are ways to permit access to encrypted material.
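
    RSL defines its own machine-readable vocabulary, which I won’t try to reproduce here. For a rough feel of the per-page idea, though, HTML has long had a registered license link type; consider this a generic illustration, not RSL syntax, with a placeholder URL:

    ```
    <!-- A per-page, machine-readable license declaration. Generic HTML
         illustration only; RSL’s actual vocabulary is its own. -->
    <link rel="license" href="https://example.com/licenses/ai-training-terms">
    ```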

    Compensation and attribution are the nails the RSL hammer appears to be hitting. Unfortunately, that does nothing to prevent what Heer describes as the web splitting in two:

    I, too, am saddened by an increasingly walled-off web, whether through payment gates or the softer barriers of login or email subscriptions.

    Walled gardens. We’ve been concerned about them forever, but most notably with the emergence of Facebook and its propensity to restrict access to shared content behind a login. The same is true even of publishing platforms like Medium. It’s a curated version of the web that feels a lot like the AOL pattern of yesteryear. The difference is that we’re now talking about the entire corpus of the open web being scraped, repurposed, and redistributed in a completely separate corner of some other web.

  • Matthew Phillips:

    Furthermore, the nature of engagement itself has been subtly reshaped. Algorithms often favor content that elicits strong, quick reactions – the kind that can be easily signaled with an emoji or a “reaction thumbnail.” Nuanced discussion and thoughtful communication, the traditional hallmarks of blog comment sections and the communities around them, take a backseat to attention-grabbing, often polarizing, content. The algorithm, in its quest for maximum engagement, can inadvertently filter out the very depth and thoughtfulness that blogs once championed.

    Look around at the average blog and you’re bound to notice the dearth of comments. It wasn’t long ago that the number of comments on a post was a solid indicator that it struck a chord. As someone who maintains a long-running blog, I’ve seen this happen in seemingly real time, and all you have to do is compare an article from 2015 like this to a similarly provocative article from 2025 like this. Reactions are the new currency, and they’re happening on some other platform.

    I’ve toe-dipped into the IndieWeb over on my personal blog because it helps mitigate some of the silence by porting social media interactions into the site so they can be rendered in both places. It’s sorta like a modern-day version of Pingbacks. But it also feels like a FOMO-driven response because the interactions are happening elsewhere and all I’m doing is collecting artifacts. I’m still required to engage outside of my web home and feed the algorithm.
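
    For the curious, the IndieWeb plumbing that does the porting is typically Webmention: your site advertises an endpoint, a receiver collects mentions sent from elsewhere, and the blog renders them alongside the post. A minimal sketch, assuming the hosted webmention.io service and a placeholder domain:

    ```
    <!-- Tell other sites where mentions of this page should be sent.
         webmention.io is one hosted receiver; example.com is a placeholder. -->
    <link rel="webmention" href="https://webmention.io/example.com/webmention">
    ```

    (Backfeeding the social media reactions themselves is usually a job for a service like Bridgy.)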

  • Yes, yes, it’s Global Accessibility Awareness Day. While that’s deservedly today’s focal point, it shouldn’t go unnoticed that the W3C published a set of Privacy Principles as well:

    This document is intended to help its audiences address privacy concerns as early as possible in the life cycle of a new web standard or feature, or in the development of web products. Beginning with privacy in mind will help avoid the need to add special cases later to address unforeseen but predictable issues or to build systems that turn out to be unacceptable to users.

    There are 30 principles (and sub-principles) in all. A few choice selections, starting with restricting the data that gets passed around to what’s strictly necessary:

    Principle 2.2.1: Sites, user agents, and other actors should restrict the data they transfer to what’s either necessary to achieve their users’ goals or aligns with their users’ wishes and interests.

    People have rights when the data is about them:

    This one’s particularly damning to browsers and marketers:

    Principle 2.9.2: User agents and sites must take steps to protect their users from abusive behaviour, and abuse mitigation must be considered when designing web platform features.

    And let’s ditch legal jargon when explaining how data is handled:

    Principle 2.11.2: Information about privacy-relevant practices should be provided in both easily accessible plain language form and in machine-readable form.

    How many times have you agreed to or confirmed cookie notices? Wouldn’t it be great to have access to your choices after the fact?

    Principle 2.12.3: It should be as easy for a person to check what consent they have given, to withdraw consent, or to opt out or object, as to give consent.

    Lastly, let’s make sure we don’t punish someone for wanting to protect their privacy:

    Principle 2.14: Actors must not retaliate against people who protect their data against non-essential processing or exercise rights over their data.

  • A great bird’s-eye view of the visual programming historical landscape, starting with Visual Basic in 1991 and ending with what is ultimately a push to use Nordcraft’s product in 2025.

    Salma’s actual point, however, is that visual coding apps and platforms have failed to get it “right” even after 30 years of attempts.

    It’s no surprise we weren’t getting it right in 1995, if we still can’t get it right 30 years later with all of this knowledge, experience, and empathy under our belts. And I’m not even going to mention at this point how AI can’t get this right, either. Of course it can’t; it doesn’t possess the capacity for empathy.

    Which, of course, is an indirect response to Figma introducing its own visual site builder, Figma Sites. The public response to Figma Sites has been abysmal because of the inaccessible HTML that the tool generates.

    This week, on May 7th 2025, Figma announced Figma Sites, a tool to publish your designs built in Figma directly to the web. But this new product has not been well received. Adrian Roselli warns us: Do not publish your designs on the web with Figma Sites.

    Adrian’s post doesn’t even have to delve deeply into the accessibility issues Figma Sites produces. All he needs to do is run simple automated tests to demonstrate just how deep the dumpster fire goes.

    It feels relevant to bring up Jakob Nielsen’s recent remarks that AI will completely eliminate accessibility issues:

    Accessibility will disappear as a concern for web design, as disabled users will only use an agent that transforms content and features to their specific needs.

    Will it? Even if it does, perhaps Jony Ive’s warning to designers from Stripe Sessions 2025 this past week still applies:

    Even if you’re innocent in your intention, if you’re involved in something that has poor consequences, you need to own it.

  • W3C Technical Architecture Group:

    Third-party (AKA cross-site) cookies are harmful to the web, and must be removed from the web platform. 

    […]

    We are strongly in favor of innovations to build sustainable business models on the web platform, but an in-depth discussion of the various possibilities are outside of the scope of this document. From an architectural standpoint, web standards should avoid encoding particular business models that are available to authors, publishers, and web content creators.

    Those are some strong words from the W3C, and they leave no doubt about where it stands on removing third-party cookies from the web. We recently noted that Google is sidestepping COPPA regulations. Something tells me the W3C is publishing this in response to Google dropping its own plans to remove third-party cookies from Chrome. Let the battle begin!
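
    If the TAG gets its way, the practical direction for developers is cookies that never cross site boundaries. A sketch of what that looks like at the response-header level, with placeholder names and values:

    ```
    # A first-party session cookie that is never sent on cross-site requests:
    Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Strict

    # CHIPS: the Partitioned attribute scopes an embedded widget’s cookie to
    # the top-level site it appears on, so it cannot follow users across sites:
    Set-Cookie: widget_state=xyz; Secure; Path=/; SameSite=None; Partitioned
    ```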

  • I could chuck christopher.org on Pressable and it would have a good long life there surely, but now it’s tied to my own future death and legacy plan. Automattic has 100-year domains ($2,000) and 100-year hosting ($38,000, includes domain). Jesse mentioned we could get christopher.org onto that as well.

    Ari (Christopher’s partner in life and business), David (Christopher’s brother), and I talked it over and agreed it would be a good plan. 

    100 years! This is longer than any of us can promise good stewardship of Christopher’s digital footprint. 

    I imagine that maintaining someone else’s digital footprint following their death has to feel like a major responsibility. And while the thought of a 100-year domain and hosting bundle for $40,000 might seem ludicrous to those of us who are living, I get what a relief it would be to take those responsibilities off your plate and wash your hands of that overhead for the rest of your life.

    It’s not a bad idea for an estate plan, either. Much like planning ahead for funeral costs, there’s something reassuring about deciding in advance what happens to your digital presence once you’re gone.

    And Christopher, you should know I got your Grunt build process running again. You’re welcome. But I’m not going to fix those Sass warnings. They are just deprecation warnings, it’s fine.

    LOL, I can only hope that someone would do me the favor of maintaining my outdated Gulp scripts and dependencies after I’m gone. Then again, a backwards-compatible web should ensure that I never have to worry about that, so long as my website is properly archived.