Austin Lanari

Coding and comics and comics and coding.

Give Me RSS or Give Me Death

I've got a few websites (mainly this one and my comic crit site), but neither is really a center of operations for me. Previously, I used twitter for that, but I've stepped away from twitter both to take a break from the endless negative feedback loop of social media (and of twitter in particular) and to dial back the proprietary software I use.

I wanted to start putting content out into the internet on my own terms, meaning on my own sites, my own servers, with my own format, and my own style.

The first step was maintaining some kind of social media presence, which I'm doing on Mastodon. The second step was to make a site that would aggregate my existing content in addition to being my base of operations for tinkering.

You can find that site at jumanji.io.

The Goal: Aggregation

To aggregate content, I needed to get posts from austinlanari.com and fuckupsomecomics.com as a start. Ghost (the blogging platform I use for fuckupsomecomics.com) has a public API that can be used for fetching data about a given blog and its posts. This site, however, has no such API, since it's just a big ol' bundle of statically generated JavaScript goodness.

What it does have is an RSS feed.

Of course, the Ghost blog has an RSS feed too. Because, well, nearly everything on the internet has a damn RSS feed. With the death of Google Reader, a lot of folks tossed their habit of feed-reading aside. And from a developer perspective, we should be miffed about this: RSS is one of the closest things we have to a standard on the incredibly fragmented internet. It exists out of the box on most major website platforms. It provides a tried-and-tested, standardized format (XML) with predictable results (unless some custom generation of the feed gets in the way).

In an age where Google wants to centralize everything to the point of re-serving your mobile pages under their own domain, we should be re-embracing technology like RSS that allows us to both distribute and aggregate the content we want to serve/view on the internet on our own terms.

The Front-End: gatsby-source-rss...-fork

There's really only one way to pull in an RSS feed in terms of actually retrieving it: you GET the requisite /rss endpoint (for instance https://austinlanari.com/rss.xml) and you use a library to parse the XML into JS objects or JSON as necessary, et voilà.
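In practice, that step looks roughly like this (a minimal sketch assuming the rss-parser npm package; any RSS/XML parsing library works the same way, and the exact item fields vary a bit from feed to feed):

```js
// A sketch of the fetch-and-parse step using the rss-parser package.
const Parser = require('rss-parser');

const parser = new Parser();

async function getFeedItems(feedUrl) {
  // parseURL GETs the feed and turns the XML into plain JS objects
  const feed = await parser.parseURL(feedUrl);
  return feed.items.map(item => ({
    title: item.title,
    link: item.link,
    date: item.isoDate,
  }));
}

// e.g. getFeedItems('https://austinlanari.com/rss.xml').then(console.log);
```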

The question is, when should this be done?

If it's a live fetch in the browser when a user goes to my site, it's going to take too long. On top of requesting jumanji.io at /, they now have to wait for at least one other route to fetch. Then, they have to wait for the parsing to occur, which is meaty and takes time, on top of whatever actual data manipulation is happening to sanitize it for the client.

The only upside of live fetching is that as soon as a post goes up on one of my RSS feeds, visits to jumanji.io will show that post to users. But since we're aggregating long-form content, it's not as if we're updating a Mastodon feed widget. It does not need to be that up to date.

So, instead of doing expensive fetching live, we can do it at build time. Each time we statically generate jumanji.io, we'll fetch the RSS feed data and bake it in. One of the upsides here is that there are tools for doing this kind of data sourcing in Gatsby such that we don't need to rely on fetching feeds and writing the data ourselves. Source plugins are made exactly for this. By using one of these plugins, RSS feed data is exposed in Gatsby via a series of graphql queries and the actual act of fetching is abstracted completely from the declarative code which renders it.
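On the rendering side, that means the page component just runs a query and maps over the results. Here's a rough sketch of what that looks like; the node and field names (allFeedItem, title, link, pubDate) are assumptions, since the real names depend on what the source plugin creates:

```js
// src/pages/index.js — a sketch of consuming feed items sourced at build time.
import React from 'react';
import { graphql } from 'gatsby';

const IndexPage = ({ data }) => (
  <ul>
    {data.allFeedItem.edges.map(({ node }) => (
      <li key={node.link}>
        <a href={node.link}>{node.title}</a> ({node.pubDate})
      </li>
    ))}
  </ul>
);

// The fetching already happened at build time; this query just reads the
// nodes out of Gatsby's data layer.
export const query = graphql`
  query {
    allFeedItem {
      edges {
        node {
          title
          link
          pubDate
        }
      }
    }
  }
`;

export default IndexPage;
```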

Unfortunately, gatsby-source-rss doesn't actually work, as far as I can tell. All the code looked right to me but the plugin wasn't hooked up to the Gatsby ecosystem correctly. Luckily, a plugin search yielded gatsby-source-rss-fork which worked correctly.
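Wiring the fork up happens in gatsby-config.js, one plugin entry per feed. The option names below are assumptions on my part; check the plugin's README for the exact shape it expects:

```js
// gatsby-config.js — a sketch of sourcing each RSS feed at build time.
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-source-rss-fork',
      options: {
        url: 'https://austinlanari.com/rss.xml',
      },
    },
    {
      resolve: 'gatsby-source-rss-fork',
      options: {
        url: 'https://fuckupsomecomics.com/rss/',
      },
    },
  ],
};
```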

Except it only worked for this site and not my Ghost blog. Despite the fact that I could curl https://fuckupsomecomics.com/rss/ in my konsole, any GET requests made by Gatsby or in the browser were failing without so much as an error message. Which could only mean one thing:

God damn stinking CORS.

The Back-End: Stop Ghost RSS from *looks into camera* Ghosting.

You can see some discussion about the issue here, and the latest PR regarding the issue (3 years ago!) here. The long and the short of it is that at least one Ghost maintainer thinks there's no reason an /rss endpoint should be publicly accessible in a cross-origin fashion. Here's the relevant comment (emphasis mine).

The use case you're suggesting here is being able to get your latest X posts on an external site of your choice, but by specifying global CORS headers, what you're actually allowing for is anyone to show any Ghost blog's latest X posts on any site. That's an enormous leap to add to Ghost core, and I don't think there's a justification for it.

The JSON API is intended to allow for this sort of thing in a controlled way (via OAuth clients) which means that the owner of the blog would always have absolute control over who can do what with their content.

I don't want to have to learn an API just to display links to posts on one of my blogs: RSS is literally made for this. Additionally, since I chose to have an RSS feed on my blog, I clearly want my posts to be publicly available. The only thing CORS blocking does is stop people who want to do stuff with my posts in a browser: folks can still write server-side scripts to grab my entire RSS feed and do whatever they want with it. What's more, even if I had chosen to have no RSS feed at all, someone clever enough to abuse my RSS feed could easily just scrape my site. The logic is nearly identical, just slightly more fragmented.

To briefly rant, and to re-underscore my point, this kind of thing is so indicative of the modern web ecosystem. There are all these pseudo-proprietary ways of asking for data, driven by APIs that think they are solving a problem when really they're just putting their preferred brand of dressing on an issue that is already half-solved, sometimes by tested standards (*cough* RSS *cough*). I should be able to run three blogs on three different platforms and aggregate data from all of them in a unified manner.

It's not a security issue: it's a common sense issue.

After initially trying to add CORS headers to the /rss route via my nginx config (that doesn't work because of the way Ghost apparently reverse-proxies things internally), I opted for the method implemented by the aforementioned rejected PR, which just slaps the appropriate headers on the response in Node. The only problem is that, since the PR is 3 years old, there isn't even a core/server/controllers/frontend.js anymore. Luckily, there is a core/server/controllers/rss.js (snaps for solid naming), so adding the headers in the exact same manner as the original PR is possible within the generate function, which exposes the res object needed for setting the headers on the response.
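In case it helps anyone doing the same surgery, the change amounts to something like this (a sketch only; the exact placement inside generate and the headers you allow will depend on your Ghost version and how paranoid you want to be):

```js
// Inside core/server/controllers/rss.js, in the generate function, before the
// feed is written to the response. res is the Express response object, so we
// can set the CORS headers the same way the old frontend.js PR did.
res.set({
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'Origin, X-Requested-With, Content-Type, Accept'
});
```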
