I’ve also been wading into this the past few weeks while writing my own ActivityPub server. It seems simple at first but once you start interacting with other server implementations, you have to keep expanding your code and dealing with all of these interoperability issues or redesigning for the “worst case”.
It isn’t possible to implement ActivityPub without a server and a database. You can’t do it with just a static site.
Not only a server, but one that has a queue and will retry sending messages. A lot of smaller Mastodon servers aren’t scaling well right now, so you will frequently get 500s or timeouts trying to send to them.
Also note that quite a few interactions require that you fetch something before processing, which can take many seconds or even timeout, so you’re supposed to queue up things received at your inbox and then process them out-of-band. For instance, if you receive a signed message from a user you’ve never interacted with before, you have to fetch their profile and key through an HTTP fetch, then verify the signature on the original message they sent you. If you can’t reach their server, you’ll have to wait and retry before verifying their original message.
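The queue-then-process pattern described above can be sketched roughly like this, assuming an in-memory queue (a real server would persist jobs, e.g. in SQLite, so retries survive restarts); `fetch_actor` and `verify` are stand-ins for the HTTP fetch and the signature check:

```python
import queue

RETRY_LIMIT = 5

class InboxWorker:
    def __init__(self, fetch_actor, verify):
        self.jobs = queue.Queue()
        self.fetch_actor = fetch_actor   # raises IOError on 500/timeout
        self.verify = verify             # (activity, public_key) -> bool

    def receive(self, activity):
        # The inbox endpoint only enqueues and returns 202 immediately.
        self.jobs.put((activity, 0))

    def process_one(self):
        activity, attempt = self.jobs.get()
        try:
            actor = self.fetch_actor(activity["actor"])
        except IOError:
            if attempt + 1 < RETRY_LIMIT:
                # Real code would re-enqueue with exponential backoff.
                self.jobs.put((activity, attempt + 1))
            return None
        return activity if self.verify(activity, actor["publicKey"]) else None
```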
This is especially important when talking to servers like GoToSocial, which require every message to be signed, even basic GETs of public information. So you have to sign your request to the GoToSocial server, but then that server has to reach out to you and fetch your key to verify it. I’m not sure how two GoToSocial servers handle talking to each other: Server A sends a signed request to Server B, B has to fetch A’s key to verify it, but that key-fetch request from B must itself be signed with something A can already verify.
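For reference, the signing scheme in question is the draft-cavage HTTP signatures that Mastodon and GoToSocial speak. A rough sketch of the signing-string construction (the actual RSA-SHA256 signing needs a crypto library, so it's left out; the `key_id` URL here is a made-up example):

```python
def signing_string(method, path, host, date):
    # The order of lines must match the "headers" list in the
    # Signature header exactly.
    return "\n".join([
        f"(request-target): {method.lower()} {path}",
        f"host: {host}",
        f"date: {date}",
    ])

def signature_header(key_id, sig_b64):
    # key_id is your actor's key URL, e.g.
    # https://example.org/actor#main-key
    return (
        f'keyId="{key_id}",algorithm="rsa-sha256",'
        f'headers="(request-target) host date",signature="{sig_b64}"'
    )
```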
Another example is if someone you follow forwards/“boosts” a post from someone you don’t follow. Now you have to fetch that remote user’s post, then fetch the JSON-LD for the user that made that post, then fetch all of the attachments and the user’s avatar, etc. You can see how this quickly spreads around a lot of data and a lot of traffic. After running my single-user server for 2 weeks following ~35 people, my SQLite database is already 556 MB, mostly from storing avatars and post attachments.
Also, every reply made by users you follow, and every reply to posts from users you follow, will come your way, and you’re supposed to cache those so that when viewing posts from users you follow, you have semi-accurate threading and like/forward counters.
Oh and currently Mastodon sends user account deletions to every server it has ever interacted with, even if those individual users have never interacted with your server. This means that just mastodon.social alone will send you a random user deletion request every 30 seconds or so. And how do you verify the key of a user you’ve never interacted with? You have to call out to mastodon.social to fetch their key, but mastodon.social will respond with a 404 because that user already deleted their account. So you just end up throwing the message away.
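The pragmatic handling I ended up with looks something like this sketch: skip Deletes from actors you've never stored anything about, and if the key fetch 404s because the account is already gone, drop the message rather than retrying forever (`fetch_key` stands in for the HTTP fetch):

```python
def handle_delete(activity, known_actors, fetch_key):
    actor = activity["actor"]
    if actor not in known_actors:
        return "ignored"        # never interacted; nothing to delete
    try:
        key = fetch_key(actor)  # may raise LookupError on a 404
    except LookupError:
        # The account is already deleted upstream, so the signature
        # can never be verified; purge our copy and move on.
        known_actors.discard(actor)
        return "dropped"
    # ...verify the signature with key, then delete the stored data...
    return "verified"
```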
Step one is WebFinger
Ackshually, step one is to query /.well-known/host-meta, which returns some XRD XML that says where the WebFinger endpoint is. I’ve run into two hosts so far that didn’t have it at /.well-known/webfinger, so this preliminary step was required to find it.
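The discovery step can be sketched like this: parse the host-meta XRD for an “lrdd” link template, and fall back to the default WebFinger location when host-meta is missing or has no such link.

```python
import xml.etree.ElementTree as ET

XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"

def webfinger_template(host, host_meta_xml=None):
    if host_meta_xml:
        root = ET.fromstring(host_meta_xml)
        for link in root.iter(XRD_NS + "Link"):
            if link.get("rel") == "lrdd" and link.get("template"):
                return link.get("template")
    # Most hosts just serve the default location.
    return f"https://{host}/.well-known/webfinger?resource={{uri}}"
```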
Every time I publish something, I send an update to every subscriber. If this blog gets popular, it’ll send an enormous amount of updates. Maybe there’s a more efficient way to get this done, but I couldn’t find it.
Mastodon JSON-LD defines a “sharedInbox” that is supposed to help with this. If you have 100 users at the same domain, you should be able to submit once to that domain’s sharedInbox instead of 100 individual inboxes.
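Collapsing deliveries with sharedInbox amounts to grouping followers by their shared inbox; a sketch, assuming minimal follower records shaped like the actor JSON (an `inbox` plus an optional `endpoints.sharedInbox`):

```python
def delivery_targets(followers):
    targets = set()
    for f in followers:
        # Prefer the instance-wide shared inbox when one is advertised.
        shared = f.get("endpoints", {}).get("sharedInbox")
        targets.add(shared or f["inbox"])
    return targets
```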
There’s nothing like a “feed reader” in the world of ActivityPub. If you want to subscribe to someone’s content, you need an account and to send and receive messages. You need to be addressable on the internet.
Technically you could just read the user’s outbox URL defined in their JSON-LD, which should include a list of all of the latest messages. I don’t think this is required by ActivityPub (just an Inbox) but most servers are doing it.
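A “feed reader” style poll of an outbox might look like this sketch, assuming the common OrderedCollection shape: either `orderedItems` inline, or a `first` page (URL or embedded object) with `next` links; `fetch_json` stands in for an HTTP GET with `Accept: application/activity+json`:

```python
def latest_activities(outbox_url, fetch_json, limit=20):
    col = fetch_json(outbox_url)
    if "orderedItems" in col:
        return col["orderedItems"][:limit]
    items, page = [], col.get("first")
    while page and len(items) < limit:
        if isinstance(page, str):        # "first"/"next" may be a URL
            page = fetch_json(page)
        items.extend(page.get("orderedItems", []))
        page = page.get("next")
    return items[:limit]
```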
If anyone is interested in some plumbing that makes building ActivityPub services in Go a little easier, I’m working intensively on a library called Go-ActivityPub, which you can find on GitHub.
It is composed of:
a vocabulary package where the default types are defined
a client package, which can be used to retrieve ActivityPub data and gives you back the objects based on the vocab types.
a “processing” package which contains the actual logic of the ActivityPub exchanges for client to server and server to server interactions.
a number of custom “storage” backends that can be used for persisting the activities.
a simple webfinger server that can load from these backends for Mastodon user discovery compatibility.
a reference ActivityPub server.
The packages’ APIs try to stick as close as possible to the actual specification, to the point where the spec is used as documentation for functions and types in a lot of places.
That being said, a lot of work went into it, and because it’s a one-person show, not everything is as easy to understand as it could be. I am posting here in the hope that someone is interested enough to lend a hand. On the GitHub org page there are links to the mailing list and wiki page where a would-be developer can start.
The code looks great and you’ve accomplished a lot already.
But what’s your end goal here? Are you going to maintain these repos over the next five years if people decide to rely on them? How is that sustainable for you?
I don’t have a 5 year plan, to be honest. Outside of unforeseen circumstances, I don’t see why I can’t maintain this for at least that long, though.
The goal of the libraries is to provide a good foundation for the link aggregator I wanted to build when I started: something like (old) reddit and lobste.rs, only powered by ActivityPub. Once I achieve that goal I don’t know what I’ll want to do next, so activity might decrease, but as long as I have an internet connection I doubt I’ll let them die. My hope is that there’ll be at least one other person/community interested in using the library, so they can do some of the maintenance work. So far there has been only marginal interest, from the Gitea devs.
Now that I’ve written that, I realize that I do have a longer-term plan, which involves building a full federated identity platform (something that groups different products under the same service: social media through ActivityPub, email, chat through Matrix, an online SAML provider, maybe a Nextcloud instance, etc.). However, I doubt this is achievable if I can’t gather enough mindshare and people to help me build it. :D
This blog runs on Jekyll, one of the original static site generators.
Jekyll came out in 2008. When I was in college in…well, trust me, before that, there were already at least a half dozen generators, all written in Perl, of which MovableType[1] was just the biggest/most popular. (And I think “half a dozen” is probably off by a good order of magnitude.) The rest of the article was solid, but that one line gave me a bit of a spit-take.
[1]: MovableType does generate static HTML; the hosted part just handles the admin interface. That’s part of why MovableType sites used to scale so much better than WordPress. (I am also aware that, somewhere around MT5, they started adding some dynamic bits, but those were actually done by injecting PHP into the static templates, so I’m not sure it counts.)
You’re correct, but Jekyll did kick off a wave of next generation static site generators, and helped popularize the nomenclature “static site generator”.
That seems a bit heavy - my Pleroma has been up for 4 years, going from 5 to 55 follows, and the Postgres database is only 310 MB with 113k stored activities.
Are you not caching attachments?
No[1], although if I were, Pleroma stores them on disk, not in the DB.
[1] Not currently inclined to take the chance that someone posts something UKGOV objects to and it being stored on my server landing me in trouble.
This is the best entrypoint I’ve seen for getting started with AP. Usually people just point at the RFCs and don’t tell you enough to get started.
Thanks for finding this. The ones I’ve found before were definitely not this good.
I found a Mastodon with only 6 files to be an interesting read. I’m not sure how well it integrates with Mastodon, but it does go over the very basics of the protocol.
That link doesn’t so much “integrate with Mastodon” as imitate one Mastodon workflow with static files.
Personally I used https://en.wikipedia.org/wiki/Greymatter_(software)
I’m using Blosxom in static mode, simple is best.
Are you running the old codebase or a fork? I have so many fond memories of Blosxom :-)
I’m running the original (v. 2.1.2).
However, lately I’ve started being disenchanted with the rigid “date & title required” format of most blogging platforms and have started using a “tumblelog” format instead. It’s also bridging the gap between blogging and posting on social media for me, as I repost most entries to Mastodon.
I don’t like self-promoting in comments but this feels incredibly coincidental. TODAY I published an iOS app called Beluga which tries to reproduce a “Twitter-like” experience on top of JSON Feed (with some extensions) and RSS. The app does all the work on the device and uses an S3 compatible account to publish a feed and a statically generated mini-site.
Here’s my mini-site https://beluga.gcollazo.com
That is funny, I had almost the same idea. Well, mine is a reader as well, and you can only retweet from existing feeds. This is a design decision I am still on the fence about. My project is currently for Linux and Windows, and although I could publish an Apple version, I would need Apple hardware to do so, which I don’t have. Now I don’t need to: I can just point iOS and Mac users to your app.
FWIW I think you want http or https at the beginning of your link; lobste.rs is treating it as a relative link right now.
For the lazy: http://gametheatre.org/porifera/
Beluga looks neat!
This is pretty neat, seems like it would be useful for the tildeverse since it can just publish to a subdirectory of public_html/
A pre-release version had support for SFTP; maybe I should reconsider bringing that back.
Anyone know the rationale for doing the opposite of RSS?
You mean push instead of pull?
Originally this was done for faster update speed, and then also for scaling so you’re not polling thousands of feeds with no updates
There is WebSub for efficient pings, and of course RSS is proven to be planetary-scale.
Yes, WebSub/PuSH/PSHB is exactly what I was referring to, since everything about the AP architecture is descended from that.
WebSub is an optional add-on to RSS and if your hub is broken, everything falls back on polling. Whereas, IIUC, in AP the server has to retry sending messages to other servers, or they will miss them.
Technically you can do that with ActivityPub: you just need to know who you’re following and query their actor/outbox endpoints whenever you want to know what they’re saying. I think that, while this is a bit difficult to scale sometimes (especially for personal servers), it works out really well for social feeds, because posts come in so much faster and are so much smaller.
Typical RSS feeds include longer-form blog posts or news articles, that are paragraphs of text you need to sit down and read. Tweets and social media posts aren’t really like that. When I scroll through Twitter I’m skipping over a lot of stuff I’m not necessarily interested in, and refreshing to see if anything interesting is coming in. I don’t want to have to wait for my client app to check every one of the users I follow for updates, because by the time I’m done with that, many of their feeds will have most likely changed. So now I need to fetch new posts.
IMHO, RSS is only “planetary-scale” because of the kind of information it transmits. You can’t create IRC with RSS, and vice versa.
To be honest, WebSub (it used to be PubSubHubbub, no wonder the name was changed) doesn’t seem to be actively maintained. This post from 2015, linked from the wiki entry, suggests using http://push-pub.appspot.com/ as a test endpoint, but that returns a 500.

It also states

PubSubHubbub is based on webhooks, which means you need to have an HTTP server able to handle requests coming from the web (not behind a firewall, and no localhost).

This is a no-go for most people, especially if they’re on mobile.
AP and most other Webby protocols require this as well, but this is fine, as you usually don’t run your server on your mobile.
WebSub isn’t so much “unmaintained” as it is “done”. I’ve been using it continuously for years. It’s simple and it works.
IIRC related to how it’s easier to maintain privacy of posts than with everyone polling unauthenticated RSS (or Atom) feeds.
[Comment removed by author]
Wait, it’s per individual subscriber, not per instance? Why? Am I missing something non-obvious? SMTP can do that.
You can send once to an instance that hosts many of your subscribers. Here’s the relevant section of the spec: https://www.w3.org/TR/activitypub/#shared-inbox-delivery
Thank you! However, just from reading that part it isn’t clear how exactly “receiving server participates in determining targeting”.
If B follows A and then A blocks B, B’s server can still deliver messages from A if A’s server delivers to B’s instance shared inbox. A bad faith instance can evade a block in that sense. Or even in good faith if instance B did not process the block notification somehow.
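A guard on the receiving side can mitigate this: when fanning out a shared-inbox delivery to local timelines, re-check each local user’s block list instead of trusting the sender’s addressing. A sketch:

```python
def fan_out(activity, local_recipients, blocks):
    # blocks maps a local user id to the set of remote actors they block
    sender = activity["actor"]
    return [u for u in local_recipients
            if sender not in blocks.get(u, set())]
```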
[Comment removed by author]
[Comment removed by author]
There are quite a few things that email does better than ActivityPub, and you just hit on a big one. jcs has quite a few above as well, such as the fact that receiving a message from someone you’ve not contacted requires several round-trips between the servers. Quite a lot of additional baggage compared to a simple EHLO.

Very helpful simple summary of basic ActivityPub. Frustrating it’s so complicated!
For Pleroma and honk, at least, you can follow an RSS of someone’s posts - it’s how I was using IFTTT to crosspost from honk to Twitter.
This feels like an enormous miss.
What’s the over-under on whatever Mastodon does becoming the de facto standard?
When you say “whatever Mastodon does”, what do you mean exactly? It’s still generally operating within the ActivityPub spec, though I guess it has some extensions that make things a bit easier at scale. The standard itself is open-ended and can be used in different ways, it mostly just describes a format and how a set of existing technologies should be used together to create a social networking experience.
Hopefully, Mastodon’s influence in the fediverse will drive evolution of the ActivityPub spec to help with managing large-scale instances. Personally I think the killer feature of federated social networking is the ability to expose your work to an audience beyond whatever community/app you happen to be posting from. If I post music on FunkWhale, I don’t need to repost it to Mastodon, you can just follow my FunkWhale account on your Mastodon account.