I did this once when designing an in-house programming language. It was a DSL for statisticians and it compiled down to C++. To get the users working quickly, I changed the parser so that anything between double ‘@’ pairs was emitted directly as C++. It was a good way to get people working as the language was being developed.
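A rough sketch of how that escape hatch worked (illustrative Python with a stub for the real codegen, not the actual parser): everything between `@@` pairs bypasses compilation and is emitted verbatim.

```python
import re

def compile_dsl(src: str) -> str:
    """Toy compiler: even-indexed chunks go through normal translation,
    odd-indexed chunks (between @@ pairs) are emitted as raw C++."""
    out = []
    for i, chunk in enumerate(re.split(r"@@", src)):
        if i % 2 == 1:
            out.append(chunk)  # raw C++, passed through untouched
        elif chunk.strip():
            out.append(f"/* compiled: {chunk.strip()} */")  # stand-in for real codegen
    return "".join(out)
```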
I wonder whether the introduced bugs or any pattern they form are detectable? If they are, attackers would move on to other targets rather than get trapped in ‘flypaper.’ Making attackers believe that the bugs are exploitable would be the real win. It would be like the tactic of keeping the telemarketer on the line to keep them from calling others.
Making attackers believe that the bugs are exploitable would be the real win.
That’s a really common strategy, called a honeypot system. Some even fake entire networks.
I believe the initial assumption is that people treat large classes of bugs, like “the program crashes on invalid input”, as promising exploit candidates, in part because there is tooling to find those kinds of bugs (fuzzers and such). So you can maybe make that search harder if you inject a bunch of non-exploitable bugs for each of those common categories, so that fuzzers turn up far too many false positives. But yeah, then you have the usual arms race: can people just narrow their heuristics to exclude your fake bugs? There’s a small discussion of that from one of the authors on Twitter.
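A sketch of the idea (illustrative only, not the paper’s implementation): plant a crash that fires on rare inputs, so a fuzzer dutifully reports it, yet nothing exploitable happens.

```python
import hashlib

def parse(data: bytes) -> str:
    # Decoy ("chaff") bug: on roughly 1 in 256 inputs, abort as if the
    # parser crashed. A fuzzer flags this, but no state is corrupted and
    # no attacker-controlled memory is touched.
    if hashlib.sha256(data).digest()[0] == 0:
        raise SystemExit("internal parser error")
    return data.decode("utf-8", errors="replace")
```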
No need to speculate on control theory or continuous models in software: they’re heavily used in the embedded field, in industrial control, aerospace, and so on. It would certainly be beneficial for more developers to look into that stuff, though.
Yes, I agree that they are used to create software that controls or models things in the world. I was getting at the point that those tools can also be used to control or model software systems. We do this to some degree in site reliability, but there’s likely more we can borrow.
ActivityPub strikes me as the invention of people who believe that the internet = HTTP, and who know about JSON but not RFC822.
Some of the example message bodies just look like JSON-ized SMTP headers, “inReplyTo” etc. It looks like it has a MIME-inspired “mediaType” attribute too, but does it allow only one media type per message?
Can someone who is more familiar with ActivityPub give me the sales pitch about why existing protocols don’t suffice?
RFC822 is ASCII-only to begin with, which is one of the biggest limitations of email-related “standards”.
Some 6.5 billion people around the globe use non-ASCII characters, and the old standards only have layers of hacks upon them to support their use cases to some extent.
Why not create new standards from the ground up for current use cases? I’m not interested in ActivityPub currently, but I have some experience with email and related technologies, and email badly needs a redesign. It won’t happen, as none of the parties capable of organising it are interested.
My uninformed guess is that with the slow decline of email, there are more & better JSON parsers than there are MIME or email parsers. I would have made the same choice, but my reason would have revolved around JSON’s ability to store structured data, for future flexibility.
HTTP headers are the same format as MIME headers; browsers already have everything one would need for mail. Multipart documents (email attachments) are the same format as HTTP file uploads via forms. There are a number of headers both share.
I think it comes down to tooling. Protocol A could be 10x as widely deployed as protocol B, but if protocol B has better libraries, I’ll give that more weight in my decision of which to use. I had to assemble a multipart MIME message for work a few weeks ago, and everything about the experience was inferior to “create a data structure and convert it to JSON”.
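For comparison, here’s the same message both ways in Python with the standard library (field values made up). The JSON side really is one data structure and one call; the MIME side isn’t terrible, but it’s noticeably more ceremony.

```python
import json
from email.message import EmailMessage

# JSON side: a plain data structure and one serialization call.
payload = {"to": "alice@example.com", "subject": "hi", "body": "hello"}
as_json = json.dumps(payload)

# MIME side: the same message as multipart MIME via the stdlib.
msg = EmailMessage()
msg["To"] = "alice@example.com"
msg["Subject"] = "hi"
msg.set_content("hello")
msg.add_attachment(b"raw bytes", maintype="application",
                   subtype="octet-stream", filename="data.bin")
# Adding the attachment upgrades the message to multipart/mixed.
```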
Coders are likely to pick the easiest path, if everything else is roughly equal.
SMTP is forever tainted by spam. ISPs like to block ports, spam filters like to eat mail from new unknown servers, etc.
Giving a pitch for Webmention instead of ActivityPub: Webmention requires the sender to publish an html page that actually links to the target URL. You can be stricter and require a valid microformats2 reply/like/repost/bookmark. That already stops old school pingback spam. For stronger protection, there are clever schemes based on “this non-spam domain you linked to has linked to me”.
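The receiver-side check is small; here’s a minimal stdlib-only sketch (fetching the source page is elided, and the microformats2 validation is left out):

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Scans fetched HTML for an <a href> pointing at our target URL."""
    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target:
            self.found = True

def verify_webmention(source_html: str, target_url: str) -> bool:
    # Accept the mention only if the claimed source page really links to us.
    finder = LinkFinder(target_url)
    finder.feed(source_html)
    return finder.found
```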
I’ve been reading a lot of Nancy Leveson’s work, and she has an amazing explanation for why software engineering is so different from “conventional” engineering. In mechanical engineering, for example, the main danger is component failure: something breaks, and the failure cascades through the machine. In software engineering, the main danger is emergence: the combination of multiple interacting pieces, all working perfectly, leads to a global problem.
It’s not a case of “we’re more incompetent than the REAL software engineers.” She studied the designers of missile systems, power plants, and aircraft, all places that take software engineering extremely seriously. But they’re all still buggy, for emergence reasons.
It sure feels like the emergence is the consequence of the sheer scale.
I came across this tweet recently https://twitter.com/nikitonsky/status/1014411340088213504 Bet that missile systems, power plants, and aircraft all have less code than many relatively simple desktop apps.
That’s interesting. A quick search indicates that the F-35, which has had numerous delays and reliability issues (I read somewhere that pilots have to reboot one of the onboard computers every 10 minutes or so) has over 8 million lines of code.
It’s true. I don’t think it counters the point, though. How many of those systems are designed with integration patterns or analyses that ensure the individual components work together properly? I doubt many. The few I’ve seen came out of the correct-by-construction approaches. Even they usually have simplified mechanisms for the integration that make it easier to analyze the system. Many real-world systems use unnecessarily complicated forms of integration, from how they couple modules up to the build systems they use.
I think emergence will have a mix of intrinsic and accidental complexity as usual. I think many failures are caused by accidental, though.
I’m basing a lot of this off her free online book, Engineering a Safer World. She also has a seminar on it here: https://youtu.be/8bzWvII9OD4
Does anyone know any more about this? I’ve never heard of it and it seems very new, but there is already a BallerinaCon in July? Looks like it’s owned by WSO2 who I’ve never heard of before either.
It has been about 3 years in development, but we really started talking about it earlier this year. Its origins are indeed in WSO2’s efforts in the integration space (WSO2 is an open-source integration company and had a research project on a code-first approach to integration). Ballerina is an open-source project; at this moment it has 224 contributors.
It is getting a lot of interest in the microservices and cloud-native (CNCF) space because it supports all the modern data formats and protocols (HTTP, WebSockets, gRPC, etc.), has native Docker and Kubernetes integration (it builds directly into a Docker image and K8s YAMLs), is type-safe, compiled, has parallel programming and distributed constructs baked in, etc.
You can see lots of language examples in Ballerina by Example and Ballerina Guides.
I actually posted this hoping someone would have more info. The language looks interesting and surprisingly far along to be under the radar.
The company seems to be based in Sri Lanka. It is nice to see cool tech coming from countries like that.
The company seems to be based in Sri Lanka. It is nice to see cool tech coming from countries like that.
The project has non-WSO2 contributors as well, and WSO2 also has offices in Mountain View, New York, São Paulo, London, and Sydney, but indeed Colombo (Sri Lanka) is the biggest office, so at the moment my guess would be that Ballerina is 90% from Sri Lanka - which indeed is a fantastic place! :)
The author mentions that ASCII-friendly APL successors (see J) “are all ugly far beyond the possibility of public success.” While I don’t necessarily agree, I feel like ligature fonts would be a perfect fit for a language like J. It could be used to map verbs onto their APL equivalents, and just make things look a bit more cohesive.
J is beautiful and, in terms of semantics, is even more elegant than APL. The notation is its primary drawback, to me, for two reasons: there is no longer a one-to-one mapping of action to symbol (because some symbols are digraphs or semi-trigraphs); and because the symbols used already have well-known meanings, causing cognitive burden when switching between the J and the everywhere-else meaning.
Also:
This is the Game of Life in APL
I love APL but I swear if all you read is pop-CS articles about APL you’d think it’s Life: The Language
Would you have some recommendation of array language snippets that are more representative of the things people end up writing?
As someone extremely tired of seeing fibonacci examples for functional languages, I’m very interested in knowing what real APL looks like
The Co-dfns compiler (https://github.com/Co-dfns/Co-dfns) is an APL-to-C++ compiler written in APL.
GNU APL has a pretty nice community page at https://www.gnu.org/software/apl/Community.html where they list some APL projects (some written in APL and others in other languages).
J has an extensive standard library and a complete relational database written in J, all at https://jsoftware.com
Array languages get the most use today in finance, I believe. The K language from Kx Systems (and the Q query language strongly related to it) are widely used there and have a free-as-in-beer version available with some documentation.
(I don’t remember who said it, but the statement “every time you buy stock you’re using K” is probably a reasonably true statement.)
Q is worth a look. https://en.wikipedia.org/wiki/Q_(programming_language_from_Kx_Systems)
Seems like he had the answer in the first few paragraphs: Ideally, the people determining the scope and timing of the work should be the people doing the work. After that, he went elsewhere.
I thought it would actually be about std::optional, not workspace issues that have nothing to do with the problem at hand.
TL;DR: keep your toolchain up to date if you want to use recent language features.
Yeah. I suspect a better title would have avoided leaving people feeling like the article was going somewhere it didn’t.
I think it’s funny because the reader’s experience parallels the author’s experience of wanting to get someplace.
Sorry folks :(. But std::optional works as one expects: you can write functions that accept std::optional, check early on whether it evaluates to true, and return empty as needed, so you can chain functions neatly.
Now, if only we could have pattern matching …
I think the consensus of languages with options and pattern matching is “don’t use pattern matching, use combinators”.
Hmm, as a full-time Haskeller, “don’t use pattern matching” is news to me. Do you mean “don’t use pattern matching for fundamental vocabulary types like Maybe or Either”? In which case it’s a reasonable guideline. For types representing your business domain, pattern matching is perfectly good practice. IMHO, exhaustiveness checking of pattern matching is an indispensable feature for modelling your domain with types.
Do you mean “don’t use pattern matching for fundamental vocabulary types like Maybe or Either”?
Yes.
Consensus, really? I’m a big fan of combinators, but I’ll still match on option types sometimes if I think it looks clearer.
Agreed. I read all the way down and nothing significant about std::optional.
I thought it was going to be some sort of piece about how using std::optional could lead to yak shaving or something :(
It looks similar to ideas I have as I work on my keywordless language [1]. A simple example would be:
? t < v : ^ ERANGE , v;
(where ? is IF, : is THEN and ^ is RETURN). A more complex example is:
{? err,c = getc()
== EOF , _ : ^ 0 , v;
!= 0 , _ : ^ err , v;
_ , _ : { ungetc(c) ; ^ 0 , v; }
_ , is_digit(c) : n = c - '0';
_ , is_upper(c) : n = c - 'A' + 10;
_ , is_lower(c) : n = c - 'a' + 10;
}
(where _ is a “don’t care” placeholder). Internally, the compiler will [2] re-order the tests from “most-specific” to “least-specific” (so the _ , _ : bit is like ELSE). Also, here, getc() returns two values [3], both of which are checked. I don’t have exceptions, because I’m not fond of them [5], so there’s no syntax for them.
[1] Based off an idea I read about in the mid-80s [4].
[2] I’m still playing around with syntax.
[3] I had a hard time moving from assembly to C, simply because I could not return multiple values easily.
[4] It’s a long term PONARV of mine.
[5] It’s a dynamic GOTO and abused way too much in my opinion.
Very nice. Re [2], does that mean that the sequence of the checks in this construct really is immaterial?
Non-existent. I’m still working (even after all these years) on the syntax. It was only after I posted the above that I realized that trying to go from “most-specific” to “least-specific” is problematic in the above example. Of these two:
== EOF , _
!= 0 , _
Which one is more specific? It’s for these reasons (and some more) that this is taking a long time.
This article makes me think about the difference between constructs where we have to impose an order on the checks and ones where we don’t. The latter seems nicer but they would lead us to drop ‘else’ and be explicit about the full decision space.
Author here, thought this might create some interesting discussion!
TL;DR:
Why have multiple distinct syntactic constructs for if-then-else, pattern matching and things like if-let, when they are largely doing the same thing (check some condition to make a decision on how the program should go on)?
The core idea is having a single, unified condition syntax that scales from simple one-liners to complex pattern matches that subsumes the existing syntax options.
Are they the same?
Why do we even use if statements anyway?
k/q doesn’t use them very often, since they rarely make things clearer. Function application is indexing, and decode, projection, each-left, and so on make it possible to write much less code.
for example, if x == 1.0 then "a" else "z" could be simply "za"1=
“one comparison operator on multiple targets” is: "zba"2 sv 1 2=\:
“different comparison operators, equality and identity” is: "zna"2 sv(1=;0w=)@\:
“method calls” are "zne"2 sv(isempty;0 in)@\:
Scala is an atom-language, though. It can only do one thing at a time, so there seems to be a need to “check some condition to make a decision on how the program should go on.” But say those lists are big: we can trivially parallelise “each”. In a data-parallel language, you very infrequently check some condition to decide how the program should go on.
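The first trick carries over outside k, too: selection by indexing instead of branching. A rough Python rendering of the "za"1= example:

```python
def pick(x: float) -> str:
    # Index the result table with the comparison: no if statement needed.
    # x == 1.0 gives True/False, which indexes "za" as 1/0.
    return "za"[int(x == 1.0)]
```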
Your “simply” is my “incomprehensibly”.
Computer languages need to strike a balance between human-language intuition and machine-parser explicitness. Simply slamming the slider all the way to the right isn’t a solution, so much as an admission of defeat, IMO.
My idea was totally different. I’ve noticed that what people comprehend depends on their thinking style, background (especially prior languages), and so on. However, there are fewer underlying concepts or structures in play than there are surface syntaxes or APIs. So I was thinking that maybe languages should try multiple front-ends with different syntaxes, coding styles, etc. As a start, C and Python. Each client has a tool that automatically translates code to their style with the same meaning.
maybe languages should try multiple front-ends with different syntaxes, coding styles, etc.
is it just me or does it sound like racket’s #lang?
Probably also not a coincidence that Racket is at the top of my list for a future project doing something similar. ;)
Your “intuition” is really mediocrity.
Code that is shorter has a higher chance of being correct. If you can’t read it now, learning how to read it will make you a better programmer, and that benefits you, and everyone you work with.
(laughs)
Downvote my thoughtful response as a troll, insult me, and then talk down to me. Really hit the internet trifecta, huh?
You’re the one who said you can’t comprehend something, and yet you believe you have something important to comment on it?
How is that not mediocrity?
Either indentation-based, or requiring some delimiter.
I’m largely in the indentation-based camp these days, so I haven’t spent much time thinking about how to make the delimitation to look nice. I’d probably just go with mandatory curly braces around the branches.
On HN someone said: “This will be a tongue in cheek comment, but there’s another thing Datomic isn’t making you do either: GDPR compliance.”
Immutable data stores are great but the world wants some level of mutability. I’ll link to the comment and responses if anyone is interested.
I’m probably biased, but I think Datomic’s model of deletion is perfect for GDPR.
When you delete something permanently, we call it “excision.” (As in “cutting out”.) After the excision, the data is gone, gone, gone. Any old storage segments that held the excised data get reindexed and garbage collected.
But, we record the event of the excision as its own transaction. So there’s a permanent record of what got deleted (by the matching criteria) and when. And like any transaction, the excision transaction can have extra attributes attached like who did it and in response to what document number, etc.
With any other database, once data is deleted, you don’t know that it ever existed and you don’t know who deleted it, when, or why.
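The shape of that design, as a toy Python model (nothing like the actual storage engine, just the idea): deletion destroys the data itself but appends a record that the deletion happened.

```python
log = []  # append-only transaction log

def transact(fact: dict):
    log.append({"op": "assert", "fact": fact})

def excise(predicate, who: str, reason: str):
    """Permanently remove matching facts, but record that a removal happened."""
    removed = [e for e in log if e["op"] == "assert" and predicate(e["fact"])]
    for e in removed:
        log.remove(e)                          # the data itself is gone for good
    log.append({"op": "excision", "count": len(removed),
                "by": who, "reason": reason})  # ...the event of deletion remains
```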
The link @mfeathers posted says that excision is very expensive, but it’s unclear what that means in practice. Do you have any guidance on that?
Excision does require a full scan of the log, plus a full index job. Depending on the size of your database that can take a while. Because this has to be done atomically, the transactor can’t do anything else while that excision runs.
This is for the on-prem version of the product. I don’t know how the cloud version does it… it may be friendlier to throughput there.
EDIT: Sorry for the wall of text… I wanted to say a bit more than “you have to design for it.”
I have seen installations where we had to get creative to work around 3 or 4 hour excision times. But I’ve also seen installations where it took a couple of minutes. But even on the low end, it requires design work to handle those delays.
There’s a cluster of related design techniques to achieve high throughput with Datomic. I’m still learning these, even after 6 years with the product. But it turns out that designing for stability under high throughput makes you less sensitive to excision time.
Mostly it comes down to the queue. Datomic clients send transactions to the transactor via a queue. (This is actually true for any database… most just don’t make the queues evident.) Any time you look at the result of a transaction, you’re exposed to queuing time. “Transaction” here specifically means changing data, not queries. Those are unaffected by excision or the tx queue.
I design my systems to start with a DB value that I capture at the beginning of a request. That means I freeze a point in time and all my queries are based at that point in time. This would be similar to a BEGIN TRAN with repeatable read isolation. Then while processing a request, I accumulate all the transaction data that I want to submit. At the end of the request, I make a single transaction out of that data so all the effects of the request happen atomically.
When I call the transact function, I get back a future. I pass that future off to an in-memory queue (really a core.async channel, if you’re a Clojurist.) A totally different thread goes through and checks the futures for system errors.
All this means that even if the tx-queue is slow or backed up, I can keep handling requests.
As a separate mechanism, I’m also exploring the idea of separating databases by the calendar. So like you’d roll a SALES table over each year and keep a history of SALES_2016, SALES_2017, etc. Since I can query across multiple databases quite easily, I can keep my working set smaller by doing that.
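Reduced to a sketch (hypothetical names; the real client API differs): requests hand off their transaction future to a queue instead of blocking on it.

```python
import queue
from concurrent.futures import Future

pending: "queue.Queue[Future]" = queue.Queue()

def handle_request(db_snapshot, make_changes, transact):
    # All reads use the frozen snapshot; writes accumulate into one transaction.
    tx_data = make_changes(db_snapshot)
    fut = transact(tx_data)   # returns a Future; the tx queue may be slow
    pending.put(fut)          # hand it off instead of blocking the request
    return "ok"

def check_one():
    """Runs on a separate thread: surfaces transaction failures after the fact."""
    fut = pending.get()
    return fut.exception()    # blocks until the transaction settles
```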
All this means that even if the tx-queue is slow or backed up, I can keep handling requests.
Can you? For example, let’s say we have a request from the web that is updating my Tinder profile, and we’re running an excision to remove old GPS coordinates that takes 3 minutes. That means my request will hang 3 minutes, right? While you might technically be correct, from a UX perspective, you’re not continuing to handle requests. Or did I misunderstand your description? If I understand you correctly, if you were pitching this technology to me I would probably reject it. I can’t have multi-minute write outages in my super important most popular product ever.
like you’d roll a SALES table over each year and keep a history of SALES_2016, SALES_2017,
I haven’t used Datomic, so maybe the model is so great that putting up with things like this is worth it, but I really dislike having to decide a sharding strategy (should I do years? months? weeks? how do I know? How expensive is it to change after I decide?). Certainly most databases have pretty miserable tradeoffs here, though. Also, is excision just within one DB, or across all DBs?
It supports complete deletion of data.
Yes. More here: https://news.ycombinator.com/item?id=16891425
That was one of my points against blockchains: pollution attacks leave the repos permanently encumbered. I had ideas for dealing with it, but each had tradeoffs. It’s a tricky paradox to address.
A lot of clunky error handling is a side effect of returning data. When we call getLocation we get back the retrieved location and an error code. If we were messaging, we could simply message someone the location when we can get it, and message an entirely different part of code if we can’t get it.
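A minimal sketch of the messaging style as continuation-passing in Python (hypothetical getLocation-style function with a stubbed lookup):

```python
def get_location(query, on_success, on_failure):
    places = {"office": (51.5, -0.1)}   # hypothetical stand-in for a real lookup
    if query in places:
        on_success(places[query])       # message the happy-path receiver...
    else:
        # ...or message an entirely different receiver on failure:
        on_failure(f"unknown place: {query}")
```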
Good point about beginners, but the thing about non-trivial abstractions is that they always leak someplace.
He uses memory management as an example of something programmers don’t have to think about day to day, but you kind of need to have a mental model of memory. I’ve visited teams that didn’t, and they’d dug themselves into deep holes.
The cases for data are real. The question is how much rework you have to do when you come across one.
Yeah, most of these techniques would work in any modern language. I assume the target audience is recidivist C coders.
The goal was to demonstrate how modern code, as in code you’d write today in Swift, would solve the same problems without relying on goto or multiple inheritance. I don’t claim that Swift pioneered any of this or that any of it is novel.
This a fascinating case. It’s very unfortunate that the cyclist had to die for it to come before us. However, had the car been driven by a human, nobody would be talking about it!
That said, the law does not currently hold autonomous vehicles to a higher standard than human drivers, even though it probably could do so given the much greater perceptiveness of LIDAR. But is there any precedent for doing something like this (having a higher bar for autonomous technology than humans)?
Autonomous technology is not an entity in law, and if we are lucky, it never will be. Legal entities designed or licensed the technology, and those are the ones the law finds responsible. This is similar to the argument that some tech companies have made that “it’s not us, it’s the algorithm.” The law does not care. It will find a responsible legal entity.
This is a particularly tough thing for many of us in tech to understand.
It’s hard for me to understand why people in tech find it so hard to understand. Someone wrote the algorithm. Even in ML systems where we have no real way of explaining its decision process, someone designed the system, someone implemented it, and someone made the decision to deploy it in a given circumstance.
Not only that, but there’s one other huge aspect that probably nobody is thinking about: this incident is probably going to start the ball rolling on certification and liability for software.
“Move fast and break things” is probably not going to fly in the face of too many deaths from autonomous cars. Even if they’re safer than humans, there are going to be repercussions.
Even if they’re safer than humans, there are going to be repercussions.
Even if they are safer than humans, a human must be held accountable for the deaths they will cause.
Well… it depends.
When a bridge collapses and kills people due to bad construction practices, do you put the bricklayers in jail?
And what about a free software that you get from me “without warranty”?
Indeed. The same would work for software.
At the end of the day, whoever is accountable for the company’s products is accountable for the deaths those products cause.
Somewhat relevant article that raised an interesting point re: the VW emissions-test cheating. I think we should ask ourselves if there is a meaningful difference between these two cases that would require us to shift responsibility.
Very interesting read.
I agree that the team of AI experts shares a moral responsibility for this death, just like the developers at Volkswagen of America shared a moral responsibility for the fraud.
But, at the end of the day, the software developers and statisticians were working for a company that is accountable for the whole artifact it sells. So the legal accountability must be assigned to the company’s board of directors/CEO/stockholders… whoever is accountable for the activities of the company.
What I’m saying is this is a case where those “without warranty” provisions may be deemed invalid due to situations like this.
I don’t think it’ll ever be the programmers. It would be negligence either on the part of QA or management. Programmers just satisfy specs and pass QA standards.
It’s hard to take responsibility for something evolving in such a dynamic environment, potentially used for billions of hours every day for the next X years. I mean, knowing that, you would expect to have 99.99% of cases tested, but here that’s impossible.
It’s expensive, not impossible.
It’s a business cost and an entrepreneurial risk.
If you can’t take the risks and pay the costs, that business is not for you.
It’s only a higher bar if you look at it from the perspective of “some entity replacing a human.” If you look at it from the perspective of a tool created by a company, the focus should be on whether there was negligence in the implementation of the system.
It might be acceptable and understandable for the average human to not be able to react that fast. It would not be acceptable and understandable for the engineers on a self-driving car project to write a system that can’t detect an unobstructed object straight ahead, for the management to sign off on testing, etc.
But did GitHub really centralize something decentralized? Git, as a VCS is still decentralized, nearly everyone who seriously uses it has a git client on their computer, and a local repository for their projects. That part is still massively decentralized.
GitHub as a code sharing platform, that allows issues to be raised and discussed, patches/pull requests to be submitted, etc. didn’t previously exist in a decentralized manner. There seems to have always been some central point of reference, be it a website or just a mailing list. It’s not as if whole projects were just based around cc’ing email to one another all the time. How would new people have gotten involved if that were the case?
The only thing I could see as centralising is the relative number of projects hosted on GitHub, but that isn’t really a system which can properly be described as “decentralized” or “centralized”.
It’s the degree to which people are dependent on the value-adds that GitHub provides beyond git. It’s like a store having a POS that relies on communication with a central server. Sure, they can keep records on paper and do sales, but it’s not their normal course, so they don’t. This comment on HN sums it up: https://news.ycombinator.com/item?id=16124575
Email would be a prominent one. Most people (and I can’t say I am innocent) use gmail, hotmail, yahoo mail, etc. I believe there is some general law that describes this trend in systems, which can then be applied to the analysis of different topics, for example matter gathering around other matter in physics, or money accumulating around organizations with more money, etc.
On the other side you have decentralized systems which didn’t really centralize significantly, for whatever reason, such as IRC, but which saw a decrease in users over time, which I also find to be an interesting trend.
Many businesses run their own email server and also I don’t have to sign up to gmail to send a gmail user an email but I do have to sign up to github.
A tendency towards centralisation doesn’t mean that no smaller email servers exist, I’m sorry if you misunderstood me there. But on the other hand, I have heard of quite a few examples where businesses just use gmail with a custom domain, so there’s that.
And it’s true that you don’t have to be on gmail to send an email to a hotmail server, for example. But most of the time, if a normal person were to set up their own mail server, all the major mail providers would automatically view this new host as suspicious and potentially harmful, and thus be more likely to misclassify normal messages as spam. This wouldn’t be so common if the percentage distribution of mail servers weren’t so centralised.
Did a talk using them. This cuts to the chase: https://www.youtube.com/watch?v=MgbmGQVa4wc#t=11m35s
… federation is about data/communications between servers. But seeing as you asked, yes it does: https://manpages.debian.org/stretch/git-man/gitweb.1.en.html
Right, and there are literally dozens of git web interfaces. You can “federate” git and use whichever web ui you prefer.
But you then miss out on issue tracking, PR tracking, stats, etc. I agree that Git itself provides a decentralized version control system. That’s the whole point. But a federated software development platform is not the same thing. I would personally be very interested to see a federated or otherwise decentralized issue tracking, PR tracking, etc platform.
EDIT: I should point out that any existing system on par with Gitea, Gogs, GitLab, etc could add ActivityPub support and instantly solve this problem.
git-appraise exists. Still waiting for the equivalent for issues to come along.
huh git appraise is pretty cool.
I was going to suggest some kind of activitypub/ostatus system for comments. A bit like peertube does to manage comments. But a comment and issue system that is contained within the history of the project would be really interesting. Though it would make git repos take a lot more space for certain projects no?
I’d assume that those could potentially be compressed, but yes, it’s definitely not ideal. https://www.fossil-scm.org/index.html/doc/tip/www/index.wiki
^^^^ Unless I’m mistaken, Fossil also tracks that kind of stuff internally. I really like the idea that issues, PRs, and documentation could live in the same place, mostly on account of being able to “go back in time”, and see when you go back to a given version, what issues were open. Sounds useful.
BugsEverywhere (https://gitlab.com/bugseverywhere/bugseverywhere), git-issues (https://github.com/duplys/git-issues), sit (https://github.com/sit-it/sit) all embed issues directly in the git repo.
Don’t blame the tool because you chose a service that relies on vendor lock-in.
If I recall correctly the problem here is that to create an issue you need write access to the git repo.
Having issues separated out of the repositories can make it easier, if the web interface can federate between services, that’s even better. Similar to Mastodon.
There's nothing to say that a web interface couldn't provide the ability for others to submit issues.
Right, and there are literally dozens of git web interfaces.
Literally dozens of git web interfaces that the majority of developers either don't know about or don't care about. Those developers use GitHub for various reasons. When voronoipotato and LeoLamda say a "federated GitHub", they mean the alternative needs to look like, or work with, GitHub well enough that people currently on GitHub (who are ignoring the other options you mentioned) will switch over to it. I'm not sure what that would take, or whether it's even legal as far as copying the appearance goes. It does sound like a more practical goal than telling those developers that there are piles of git web interfaces out there.
I'm going to respond to two points in reverse order, deliberately:
or care about.
Well, clearly the person I replied to does care about a git web interface that isn’t reliant on GitHub.com. Otherwise, why would they have replied?
Literally dozens of git web interfaces the majority of developers either don’t know [about]
Given the above: the official git project's wiki has a whole page dedicated to tools that work with git, including web interfaces. That wiki page is result 5 on Google and result 3 on DuckDuckGo when searching for "git web interface". If a developer wants a git web interface and can't find that information for themselves, nothing you, I, or a magic genie do will help them.
It won't happen for a while due to network effects. They made it easy to get the benefits of a DVCS without directly dealing with one. Being a web app, it can be used on any device. Being free, it naturally pulls people in. There are also lots of write-ups, a Google search away thanks to its popularity, on using it or solving problems with it. Any of these advantages can be copied and improved on. The remaining problem is the huge amount of code already there.
The next solution won't be able to copy that, since such mass migrations are rare events in general. Like SourceForge and GitHub did, it will have to create a compelling reason for massive numbers of people to move their code into it while intentionally sacrificing the benefits of having their code on GitHub specifically. I can't begin to guess what that would take. I think those wanting no dependency on GitHub or its alternatives will be targeting a niche market. It can still be a good one, though.
I hear the "network effects" story every time, but we are not mindless automatons who have to use GitHub because other people are doing it. I'm hosting the code for my open-source projects on a self-hosted GitLab server, and I'm getting contributions from other people without problems. Maybe there would be more if the code were on GitHub, but being popular isn't the most important thing for everyone.
Just look at SourceForge: if everyone had had to set up their own CVS/SVN server back in the day, do you think all those projects would have made it onto the internet?
Now we have a similar situation with git: if GitHub/Bitbucket/etc. didn't exist, I'm sure most people would have stuck with SourceForge (or not bothered at all if they had to self-host).
You can also look at Google Code to see the problem with not reaching critical mass (IMHO). There were some high-profile projects there, but then I'm sure execs asked: why are we bothering to host 1% (a guess) of what's on GitHub?
"Network effects" doesn't mean you're mindless automatons. It means people are likely to jump on bandwagons. It also means that making it easy to connect people together, especially by removing friction, gets more of them doing stuff together. The massive success of GitHub over other interfaces argues my point for me.
“Maybe it would be more if the code was on github”
That's what I was telling you, rephrased. It also extends to the average project: some will get contributions, some won't, etc.
I thought about a project along these lines a while ago. Something like cgit, which could offer a more or less clean and consistent UI and an easy-to-set-up backend, making federation viable in the first place. Ideally, it wouldn't even need accounts; email + GPG could be used instead, for example by including an external mailing list in the repo, with a few additional markup features such as internal linking and code highlighting. This "web app" would then effectively serve only as an aggregator of external information onto one site, making it even easier to federate the entire structure, since the data wouldn't necessarily be bound to one server! If one were to be really evil, one could even use GitHub as a backend…
I thought about all of this for a while, but the big downsides from my perspective were: a lack of reliable servers (which is sadly something we've come to expect with tools such as NPM and Go's packaging); asynchronous updates, which could mess things up unless there were a central reference repo per project; and the social element in social coding, which could be hard to achieve. Think of stars, follows, likes, fork overviews, etc.: these are all factors that help projects and devs display their reputation, for better or for worse.
Personally, I'm a bit sceptical that something along these lines would manage to gain real traction, at least for now.
Lacks a web interface, but there are efforts to use ipfs for a storage backend.
I think there have been proposals for GitLab and Gitea/Gogs to implement federated pull requests. I would certainly love it, since I put most of my projects in my personal Gitea instance anyway; GitHub is merely a code mirror where people happen to be able to file issues.
I'm personally a bit torn on whether a federated GitHub-like platform should handle it like a fork, i.e., somebody opens an issue on their own instance, you get a small notification, and you can follow the issue in your own repo,
Or whether it should merely allow people to use my instance to file issues directly there, like with OAuth or OpenID Connect. Probably something we'll have to figure out in the process.
Just make it work like GNU social/Mastodon: username@server.com posted an issue on your repo. You can block a server, have a whitelist, or let anyone in; the world is your oyster.
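To make the Mastodon comparison concrete, a federated issue event could ride on an ActivityStreams activity like the sketch below. "Create" and "Note" are real ActivityStreams types, but using a Note to represent an issue, and `inReplyTo` to point at the target repo, is my own improvisation, not any published federation spec for forges.

```python
def issue_activity(actor, repo_url, title, body):
    """Build an ActivityPub-style 'user posted an issue on your repo' event.

    The field choices here are illustrative; a real spec would likely
    define a dedicated Issue object type.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,  # e.g. "https://server.com/users/username"
        "object": {
            "type": "Note",          # stand-in for a proper Issue type
            "attributedTo": actor,
            "inReplyTo": repo_url,   # the repo the issue targets
            "name": title,
            "content": body,
        },
    }
```

The receiving instance would then apply exactly the moderation the comment describes: check the actor's domain against a blocklist or allowlist before accepting the issue into its tracker.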
I always thought it would be neat to try to implement this via Upspin, since it already provides identity, permissions, and a global (secure) namespace. Basically, my handwavy thought is: design what your "federated GitHub" repo looks like in terms of files. That becomes the API, or contract, for federation. Maybe certain files aren't really files but essentially RPCs, implemented by a custom Upspin server. You have an issues directory, your actual git directory, and whatever else you feel is important for managing a software project on git, all represented in a file tree. Now create a local stateless web interface that anyone can fire up (assuming you have an Upspin user), and you can browse the global Upspin filesystem and interact with repos, make pull requests, and file issues.
I was thinking that centralized versions of this could exist, like GitHub, for usability for most users. In that case users' private keys are managed by the GitHub-like service itself, as a base case to achieve equal usability for the masses. The main difference is that the GitHub-like service exports all the important information via Upspin for others to interact with via their own clients.
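The "file tree as federation contract" idea above could be pinned down as a tiny schema check. Everything here is hypothetical: these path names don't come from Upspin or any existing tool, they just illustrate how the directory layout itself becomes the API any client can validate and consume.

```python
# Purely hypothetical layout for a "federated repo as a file tree".
# The structure itself is the federation contract: any client that can
# read the tree can browse issues and pull requests.
REPO_CONTRACT = {
    "git/":      "the actual git data, fetchable by any client",
    "issues/":   "one file per issue, e.g. issues/42.md",
    "pulls/":    "one directory per pull request: patch plus discussion",
    "meta/owner": "user name of the maintainer (Upspin identity)",
}

def validate_tree(paths):
    """Check that a published tree exposes the minimal contract entries."""
    required = {"git/", "issues/", "pulls/"}
    return required <= set(paths)
```

A stateless web client would call something like `validate_tree` on any tree it is pointed at, then render issues and pulls from plain files; the "RPC-like files" mentioned above would be the escape hatch for operations that can't be plain reads.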
Like the author, I've been at this a long time. Most recently, I've been working with people who are practicing "mob programming." I like it, but I have mixed feelings about it. As a technical coach, it's great: when you have a design discussion, everyone is there, and learning and knowledge diffusion are maximized. But I can't help feeling that it really is slower and not ideal for all circumstances. Distributed work has been around for decades, but the trend to make it "the way we do things" at companies is recent. Now people are saying it's not just convenient, it's better.
It's interesting that we have these two diametrically opposed ways of working, and that advocates of each don't just say "Hey, I like to work this way." They proselytize and make the pitch that their way of working is best. In fairness, the author is saying something more nuanced, but yes, it is a pitch. I agree with him when he says context matters. It's OK to admit that preference is part of the context.