I’ve been thinking a lot about how AI changes programming, and what effects it will have on programming languages and tools. Would love to hear thoughts (about darklang or in general)
I suspect the languages which embed the most semantic information about the problem within the source code will become more important.
The biggest change will be from programming languages focusing on describing what to do (“terseness” and “expressiveness”), to describing constraints on inputs and outputs. Rather than writing code directly, specifying and verifying tests (unit, property, etc.), types and their invariants, and pre/post-conditions will be much more prominent.
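Concretely, the workflow might look less like writing `sort` and more like pinning down its contract. A sketch in Python with the (real) `hypothesis` property-testing library, where the properties are the human-authored artifact and `my_sort` stands in for a generated implementation:

```python
# The properties are what a human writes and verifies; the body of
# my_sort is the part a model (or a person) could fill in.

from collections import Counter
from hypothesis import given, strategies as st

def my_sort(xs: list[int]) -> list[int]:
    return sorted(xs)  # imagine this body was produced by a model

@given(st.lists(st.integers()))
def test_sort_properties(xs):
    out = my_sort(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))  # output is ordered
    assert Counter(out) == Counter(xs)                # and a permutation

test_sort_properties()  # hypothesis checks many generated inputs
```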
Hm yeah I was thinking something similar the other day …
The complement of that could be that languages that DISCARD the unimportant info could also be more easily written by LLMs. Languages which are high level, memory safe, etc.
If true, this bodes well for the “Middle Out” style :-)
(The downside of Middle Out is basically the curse of Lisp, although we’re doing things to avoid that)
I wasn’t very excited about LLMs, since I think they could create a testing burden and so forth.
I tried ChatGPT on 3 problems I had recently, and it didn’t quite help me, but I could see how it COULD possibly help me. (It also flat out lied a few times, which is still kinda shocking, but I’m getting used to it by Googling everything it says afterward.)
So I guess now I’m thinking of it like Google and StackOverflow. Would I want to go back to a world without those? No.
https://lobste.rs/s/iualxr/ai_enhanced_development_makes_me_more#c_ubybec
Did they put me out of a job? No again. Probably the opposite was true – I might not have had a programming job if I could only look stuff up in stacks of paper manuals :) It was way more tedious back then.
Does StackOverflow create a testing burden? Probably a little bit. But on balance I would rather it exist.
Dependently typed languages sound like a perfect match for that. Especially Idris2, which is designed around programming being a conversation with the type checker. One problem with dependently typed languages is that one has to find the right balance of how much information to put in the types before they stop helping you and start getting in the way. A code assistant could change that.
Maybe we will get that with lean4? It’s already a Microsoft project but I would guess that being able to annotate what parts of the type should be erased at runtime like one can do in Idris2 would really be helpful for a code assistant.
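For a taste of what invariants-in-the-types buys you, here is the classic length-indexed vector as a tiny Lean 4 sketch (not tied to any particular code assistant): the checker statically rules out calling `head` on a possibly-empty vector.

```lean
-- Length-indexed vectors: the length is part of the type.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons : α → Vec α n → Vec α (n + 1)

-- `head` is total: it can only be applied when the type
-- already guarantees the vector is non-empty.
def Vec.head : Vec α (n + 1) → α
  | .cons x _ => x
```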
But I’m too pessimistic to think that that’s what we’re gonna get. Programming language theory and type theory are usually just ignored.
I see a lot of “word salad” in this post that’s close to something I know as a software engineer but far enough away that I can’t precisely understand what the author is talking about. For example:
Darklang’s major features: Deployless, Invisible Infrastructure, and Trace-Driven Development, were all difficult before Dark, because they occurred at the intersection points of the editor, language and infra. By removing that intersection in Darklang, these features were very obvious and fell out fairly naturally.
I’ve been programming for nearly 30 years and I have no idea what that means. Anyone want to ELI5? The post gives the vibe of “programming talk” for sure, but it’s also nonsense. Anyone else see that?
Hey, sorry those are features of Dark. I didn’t explain them in the post cause it got too wordy (and the target audience was kinda Darklang users who would somewhat know what they are).
I rescued this from a previous draft:
Darklang is a combined programming language, editor, and cloud platform. Integrating the three parts of the development experience gives us some really cool features:
“invisible infra” - in Darklang, you don’t provision databases or servers or even serverless things, it’s all just sorta there for you without having to think about it
I’ve been programming for over 30 years, and I have questions.
“invisible infra” …
At some point, there is some computer somewhere that runs the database and runs the code. Someone has had to set that up. How does it end up being “just sorta there for you without having to think about it”?
“deployless” …
So bugs get through to production in 50ms (but as stated elsewhere, the buggy code can be reverted in 50ms as well). I know I don’t trust this, and neither would my manager at my previous job (who didn’t want me to fix a bug based on undefined behavior, but that’s another story).
“trace-driven development” …
Okay, how does this work? Which user? Does this run afoul of the GDPR? Isn’t this a form of telemetry that is currently being argued over in some other threads?
Okay, let me ask—what type of development are you targeting? Or platform? Because a lot of the tech I see is geared for web delivery, and that’s just not the type of work I do.
EDIT: Okay, I checked the site, and it seems to be for web applications. Not the type of work I do (or did).
At some point, there is some computer somewhere that runs the database and runs the code. Someone has had to set that up. How does it end up being “just sorta there for you without having to think about it”?
We run it. You code in our editor and the code you write “just runs” - which is to say you write a DB schema and can just immediately store data, emit to a worker you haven’t initialized, write the HTTP handler and it’s just there (no running a webserver or LB or whatever). Here’s a demo: https://youtu.be/iUDjx1HdA5k?t=506
So bugs get through to production in 50ms (but as stated elsewhere, the buggy code can be reverted in 50ms as well). I know I don’t trust this, and neither would my manager at my previous job (who didn’t want me to fix a bug based on undefined behavior, but that’s another story).
The idea is you use feature flags, so the code is in production but that doesn’t mean the users are running it. Basically, collapse the concepts of environments, branches, and feature flags into just one concept (feature flags). The blog I linked above goes into detail.
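Something like this toy sketch (hypothetical names, not Dark’s actual API): both code paths are deployed, and the flag decides who actually runs which.

```python
# Both old and new paths are "in production"; the flag routes traffic.

class Flags:
    def __init__(self, rollout: dict[str, set[str]]):
        self.rollout = rollout  # flag name -> user ids opted in

    def enabled(self, flag: str, user: str) -> bool:
        return user in self.rollout.get(flag, set())

def old_checkout(user: str) -> str:
    return f"old checkout for {user}"

def new_checkout(user: str) -> str:
    return f"new checkout for {user}"

def handler(user: str, flags: Flags) -> str:
    # The new code exists in production the moment it is written,
    # but only flagged users execute it.
    if flags.enabled("new-checkout", user):
        return new_checkout(user)
    return old_checkout(user)

flags = Flags({"new-checkout": {"alice"}})
print(handler("alice", flags))  # new path
print(handler("bob", flags))    # old path
```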
Okay, how does this work? Which user? Does this run afoul of the GDPR? Isn’t this a form of telemetry that is currently being argued over in some other threads?
All requests are stored, all the time. If there are too many, we fall back to sampling. Almost certainly a GDPR violation, but no reason you can’t control that. Yes, it’s very similar to OpenTelemetry/observability; the difference is it’s automatic, tailored to the language, and complete. Also the traces are combined with an interpreter in the editor to fill in the blanks and update as you change code, so you see actual values. It’s basically just a perfect debugger that you don’t need to enable or replay.
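As a rough sketch of the mechanism (a toy, not the real implementation): record every real invocation, then replay stored inputs against edited code so the editor shows actual values.

```python
# Record real invocations; replay their inputs against edited code.

import functools
import itertools

TRACES: list[dict] = []
_ids = itertools.count()

def traced(handler):
    @functools.wraps(handler)
    def wrapper(request):
        result = handler(request)
        TRACES.append({"id": next(_ids), "handler": handler.__name__,
                       "input": request, "output": result})
        return result
    return wrapper

@traced
def greet(request):
    return {"body": f"hello {request['name']}"}

greet({"name": "alice"})  # a real request arrives and is traced

# Later, in the editor: run the edited handler against the stored
# real input and display the value it now produces.
def replay(trace, new_handler):
    return new_handler(trace["input"])

print(replay(TRACES[0], lambda req: {"body": req["name"].upper()}))
```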
Okay, let me ask—what type of development are you targeting? Or platform? Because a lot of the tech I see is geared for web delivery, and that’s just not the type of work I do.
Yeah, it’s all for cloud/web.
Do we expect that in 3-5 years time developers will still be typing out artisanal code at a keyboard?
I expect that we will still be doing this, quite a bit, if only because AI will only have the patterns that were on the internet.
Also, because adversarial code reading is currently vastly under practiced compared to defensive code writing, and that takes longer than 3-5 years to shift across the industry.
Also, people are still writing Fortran and Cobol and running their code on mainframes. Things take a long time to change, and there’s always a long tail.
I think, paradoxically, we’re going to see more ultra-terse languages, so that the AI can store more context and you can save money on tokens.
That might not require changing the languages themselves!
For example, if you can have a $LANG <-> $L translation, where $L is a “compressed” version of $LANG optimized for model consumption, but which can be losslessly re-expanded into $LANG for human consumption, that might get you close enough to what you’d get from a semantically terser language that you’d rather continue to optimize for human consumption in $LANG.
So all those years of golfing in esolangs will pay off?? I’ve thought about this too, and you might be able to store more code in your context window if the embeddings are customized for your language, like a Python specific model compressing all Python keywords and common stdlib words to 1 or 2 bytes. TabNine says they made per-language models, so they may already exhibit this behavior.
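A toy version of that round trip (naive word substitution rather than a real tokenizer, so treat it as a sketch): keywords become single private-use characters for the model and expand back losslessly for humans.

```python
# Python keywords become single private-use-area characters for the
# model side, and expand back losslessly for the human side.

import keyword
import re

CODES = {kw: chr(0xE000 + i) for i, kw in enumerate(keyword.kwlist)}
DECODE = {v: k for k, v in CODES.items()}
WORD = re.compile(r"[A-Za-z_]+")

def compress(src: str) -> str:
    return WORD.sub(lambda m: CODES.get(m.group(), m.group()), src)

def expand(compressed: str) -> str:
    return "".join(DECODE.get(ch, ch) for ch in compressed)

src = "def f(x):\n    return x if x else None\n"
assert expand(compress(src)) == src   # the round trip is lossless
print(len(compress(src)), "<", len(src))
```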
Or perhaps there will be huge investment in language models for popular languages like Python, and none for Clojure. I have a big fear around small languages going away - already it’s hard to find SDKs and standard tools for them.
I don’t think small languages will be going anywhere. For one thing, they’re small, which means the level of effort to keep them up and going isn’t nearly as large as popular ones. For another, FFI exists, which means that you often have access to either the C ecosystem, or the system of the host language, and a loooot of SDKs are open source these days, so you can peer into their guts and pull them into your libraries as needed.
Small languages are rarely immediately more productive than more popular ones, but the sort of person that would build or use one (hi!) isn’t going to disappear because LLMs are making the larger language writing experience more automatic. Working in niche programming spaces does mean you have to sometimes bring or build your own tools for things. When I was writing a lot of Janet, I ended up building various things for myself.
Timely that I’ve started learning https://mlochbaum.github.io/BQN :)
Perl’s comeuppance!
I think that copyright will not survive this sort of shift. One feature I’ve tried to build in Cammy – and which I don’t see in Unison or Dark – is the ability to assimilate unknown hives, adding their optimizations and functions to the current project. I’m not talking about adding packages or frameworks, but a kind of image-based development where images can be monoidally merged. Such a system no longer can decide who authored what first, and thus cannot support copyright.
I think that some sort of parameter-efficient fine-tuning is required for languages that aren’t literally Python, Java, or C. The shape of the training data matters a lot. For most languages, the correct framing is text completion of a module of source code; however, for Cammy, I’m thinking of a more direct text-to-text transformation which sends human-written documentation (“docstrings”, “trails”, etc.) to Cammy expressions. This should work because every Cammy expression is a valid program and composition in Cammy is homomorphic; I’m not sure whether Unison or Dark have this property.
Interesting! Where can I read about Cammy - couldn’t find it using Google.
I’ll add technical details (JSON encoding) to the wiki page in the sibling comment, but I’ll explain the relevant high-level parts here.
Cammy is just a basic syntax for categorical logic. We can use any syntax with trivial alpha-equivalence (no variables or lambdas), really; Cammy’s keywords are drawn from standard academic literature and alternative keywords are documented.
A hive is essentially a union-find structure with bookmarks. Each time an expression is added to the hive, it is hash-cons’d in terms of its components. If we can find a more efficient expression which is equivalent, then we can simplify the entire hive by replacing a single reference. Because the semantics are always homomorphic (each semantics is a category!) this is always valid, even for user-defined templates. Instead of mutating the hive directly, users mutate their collection of bookmarks.
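Roughly, as a toy sketch (illustrative only, not the actual implementation): expressions are hash-consed into a union-find, and bookmarks name equivalence classes rather than raw expressions.

```python
# Hash-consed expressions in a union-find; bookmarks point at classes.

class Hive:
    def __init__(self):
        self.interned: dict[tuple, int] = {}   # hash-consing table
        self.parent: list[int] = []            # union-find forest
        self.bookmarks: dict[str, int] = {}    # user-facing names

    def add(self, head: str, *kids: int) -> int:
        key = (head, *map(self.find, kids))
        if key not in self.interned:
            self.parent.append(len(self.parent))  # fresh class
            self.interned[key] = self.parent[-1]
        return self.find(self.interned[key])

    def find(self, i: int) -> int:
        while self.parent[i] != i:  # path-halving find
            self.parent[i] = i = self.parent[self.parent[i]]
        return i

    def merge(self, a: int, b: int) -> None:
        # Record that two expressions are equivalent (e.g. the optimizer
        # found a cheaper form); every reference to either now agrees.
        self.parent[self.find(a)] = self.find(b)

hive = Hive()
two = hive.add("2")
expr = hive.add("plus", two, two)    # the expression (plus 2 2)
lit = hive.add("4")
hive.bookmarks["nat/4"] = lit        # bookmarks, not expressions, mutate
hive.merge(expr, lit)                # simplify the whole hive at once
assert hive.find(hive.bookmarks["nat/4"]) == hive.find(expr)
```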
For a concrete example, the other day I found that I had accidentally computed 7 instead of 8. My bookmark was called nat/8, so I changed the bookmark to be nat/7 instead. As I understand it, this is a dramatic contrast with Unison or Dark, which instead focus on mutating versioned (content-addressed) modules.
In order to make all of this practical at scale, we do need some sort of aggressive algebraic optimizer which can improve both small and large expressions (here’s mine, using egg) and also a GC for hives (not yet written). But the overall upshot is that hive data is homogeneous and easy to exfiltrate, making it impractical to build up a private repository of Cammy expressions.
Interesting. Though copyright still exists I presume. “What color are your bits”, right?
I’m genuinely not sure whether Cammy expressions are copyrightable. Documentation, including docstrings, is copyrightable. I pair the documentation with the bookmarks, rather than with the expressions; if a corporate hive leaked, then we would require a communal practice of stripping the bookmarks prior to redistribution. I’m working on implementing basic theorem-proving, and theorem documentation would also be easily stripped.
Such a system no longer can decide who authored what first, and thus cannot support copyright.
This statement confuses me. I think a system does not have to have direct support for copyright—indeed, most do not—to be susceptible to it. If I write something down on a piece of paper, and you write down the same thing on another piece of paper, the pieces of paper have no sense of when they were written on (to a first approximation, at any rate); and we also know that real-world events have a tendency to not be totally ordered. Yet, copyright still applies to things written on paper.
So if I put in my shadow hive a routine with the comment ‘all rights reserved; do not redistribute’, and you grab that routine and start distributing it, I think I have a solid case for suing you. (I don’t know if you have comments, but I expect it is possible to encode arbitrary bitstrings somehow as cammy code.)
I think that copyright will not survive this sort of shift.
An interesting proposition. The tech companies seem to be the ones which wield the most power and influence at the moment; they stand to benefit the most from language models; and their unrestricted usage is contingent on there being no copyright. But it doesn’t seem particularly likely to me. What seems more likely is that using ML to launder copyright remains illegal, and indeterminate cases remain in limbo and continue to be decided on a case-by-case basis.
There are no comments, of course; they can’t be algebraically rewritten. I see your point, but consider the following inductive argument. Small expressions aren’t copyrightable; they’re just too common and easy to find by computer search. Compositions of expressions (including applications of supercombinators) are copyrightable, but any valid composition in a hive can be found by computer search too, because all expressions are simply typed. This makes a mockery out of the originality requirement: should I really be entitled to copyright just because I took two existing expressions in a hive and composed them? And then note that every big expression contains a composition, by the pigeonhole principle.
Copyright only works for programmers as long as the typical programs are (quoting Feist) “the fruits of intellectual labor”. In this model, one programmer works hard and spends a lot of time writing programs which are cheap to republish. However, we are moving towards a model in which the hard-working programmer produces program-producing programs; the toolchains and compilers are copyrightable, but their outputs may not be.
If there’s enough time before super-intelligent AI completely takes over all aspects of software development, I suspect we’ll see more interest in ultra high level (but still executable) specification languages, which are “compiled” to lower level languages and/or machine code by LLMs. A properly trained LLM (which likely will require RL) should be able to crank out optimized code much faster than humans. That code could then be compared against the specification using a combination of proofs (written by the LLM), model checking, and testing, in a fully automated loop until a desired performance and correctness threshold is achieved.
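Sketched as a loop (with toy stubs in place of a real model, prover, model checker, and test runner):

```python
# llm_compile and run_checks are stand-ins for a code model and for
# proof search + model checking + tests; only the loop shape matters.

def llm_compile(spec: str, feedback: str) -> str:
    # Stub: pretend the model fixes the performance issue on round two.
    return "fast_impl" if feedback else "slow_impl"

def run_checks(spec: str, code: str) -> tuple[bool, str]:
    # Stub for checking the candidate against the specification.
    if code == "slow_impl":
        return False, "performance threshold not met"
    return True, ""

def refine(spec: str, max_rounds: int = 10) -> str:
    feedback = ""
    for _ in range(max_rounds):
        code = llm_compile(spec, feedback)
        ok, feedback = run_checks(spec, code)
        if ok:
            return code
    raise RuntimeError("no candidate met the spec")

print(refine("stable sort, O(n log n), proven equal to the spec"))
```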
That depends on whether we continue to consider aggregation for AI training to be fair use. If we start insisting that people be paid for the training data they create, then probably not much.