1. 7

I’m increasingly of the opinion that it’s better to have gotchas than two subtly different ways to do the same thing.

1.

Yeah. They talk about this as removing a “wart”, but nothing is actually being removed from the language! You cannot eliminate a gotcha by adding to the language: anyone who knows to use the late binding syntax has already sidestepped the gotcha. The advantage of this proposal isn’t that it removes a wart or fixes a gotcha, it’s that it improves convenience for those who already know Python’s default value behavior.
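For anyone who hasn’t hit the gotcha being discussed, a minimal demonstration (plain current Python; `append_to` is just an illustrative name):

```python
def append_to(element, to=[]):  # the default list is created once, at def time
    to.append(element)
    return to

first = append_to(1)
second = append_to(2)
print(first, second)  # [1, 2] [1, 2] -- both calls returned the very same list
```

The default `[]` is evaluated when the function is defined, not when it is called, so every call that relies on the default shares one list.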

1.

I am hopeful that once this gets added, the previous default-argument syntax will get increasingly flagged by linters and such, reserving that syntax exclusively for optimization, and preventing newbies from accidentally stepping on it.

1. 4

I can’t help but think that what they call a late-bound argument default is needed only because of the design mistake of evaluating the parameter default value at definition time. If this proposal gets us normal default values for parameters (even if under a slightly surprising syntax), I would be really happy.

(The proposal would change the current idiom:)

def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)


to:

def append_to(element, to=:[]):
    to.append(element)

1. 8

Python’s approach may require fewer concepts / abstractions, and that may be one of the reasons for its success, even if with the latest releases they seem to be increasing quickly in number (I’m thinking of async/await, typing, the walrus operator and pattern matching). What really empowers the Ruby mental model seems to be the code block, which is at the same time more powerful and more expressive than Python’s lambdas.

The Python for-enabled class is complete fiction:

class Stuff:
    def __init__(self):
        self.a_list = [1, 2, 3, 4]
        self.position = 0

    def __next__(self):
        try:
            value = self.a_list[self.position]
            self.position += 1
            return value
        except IndexError:
            self.position = 0
            raise StopIteration

    def __iter__(self):
        return self


A shared position? What’s that for? You would instead write or see something like:

class Stuff:
    def __init__(self):
        self.a_list = [1, 2, 3, 4]

    def __iter__(self):
        for item in self.a_list:
            yield item


which is much more similar to the Ruby one. Or:

class Stuff:
    def __init__(self):
        self.a_list = [1, 2, 3, 4]

    def __iter__(self):
        return iter(self.a_list)
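From the caller’s side all of these versions behave identically; a quick standalone check with the last one (class reproduced so the snippet runs on its own):

```python
class Stuff:
    def __init__(self):
        self.a_list = [1, 2, 3, 4]

    def __iter__(self):
        return iter(self.a_list)

s = Stuff()
print(list(s))  # [1, 2, 3, 4]
print(list(s))  # [1, 2, 3, 4] -- no shared position; each loop gets a fresh iterator
```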

1. 3

I wouldn’t say the number of required elements really increased that significantly: async/await - it’s optional and trivial if you were doing twisted/green before; typing - completely optional; walrus and pattern matching - those are new elements, but only the patterns go beyond syntax sugar. Python continues to be fairly conservative with regards to actually required changes.

But yeah, the python iteration from the post is super contrived and basically wrong.

1. 1

Shared position is not correct, but you’re kinda missing the point. They could’ve omitted the body entirely; what matters is the interface: def __iter__(self) does not take a code block. It is a generator producing values for the outer code that’s driving the outer loop.

1. 1

I haven’t missed that, I was just saying that while the Python source may seem more complex, no one would code an iteration-enabled class like that.

2. 1

Python’s approach may require fewer concepts / abstractions

Why do you say that? From working in Ruby some time back, and working with Python now, it seems to me that the amount of abstractions ruby uses at the language level seems to be much smaller than Python. (e.g. No with, no separate concept of iterators, no comprehensions etc. – everything is blocks).

1. 1

The Python for-enabled class is complete fiction:

Author here, thanks. I kinda knew this when I wrote it and wanted to avoid teaching the user half a dozen concepts to get to the more concise version. In the end, nobody would iterate a list of consecutive numbers this way anyway, they’d use range :)

There’s a similar issue in Ruby: you would use the Enumerable mixin, not define select, map etc. yourself. But the effect and point are the same.

But perhaps I should add a note about these better practices so nobody copy-pastes the code.

1. 3

I have been on the lookout for a good beginner language that is similar to Python, and it seems Kuroko is a strong contender.

1. 1

I’m curious why you are asking. Having the project contributors’ buy-in and commitment is really essential to success. For a large project, consider how Dropbox completed type checking across most of its Python code.

https://dropbox.tech/application/our-journey-to-type-checking-4-million-lines-of-python

1. 1

I want to know what decisions were made in the process. For example, how far did you go in converting certain data structures to use types, where did you leave things for the future (or never), and why? How much of the project were you able to convert? Did it result in some tests being removed or added?

1. 1

I see. I’ve only converted small projects with fewer than 50 files. I was able to do the conversion one package at a time and complete it over multiple commits.

1. 1

Thanks! Do you have these project repositories online somewhere?

1. 1

Not open source, and I’m not able to confirm this personally, but Instagram?

1. 1

Thanks! I want to have a look at the commits at least, so I can see what kind of discussions took place.

1. 2

pip is pretty heavily annotated, though not 100%

1. 1

Thanks! I will check that out.

1. 6

The Zulip project has been using mypy since 2016 and has around 1,600 files.

1. 1

Thank you!

1. 14

Flask recently added type stubs, around the time 2.0 was being released. It turns out it was rather easy(?), because it didn’t take that long, was done by a single contributor, and took 800 lines of type hints.

As for my experience with adding types to a project, it’s usually fairly seamless, and more often than not, you’ll end up finding subtle bugs in your code instead of the type checker getting in the way.

1. 1

Thanks! Much appreciated.

1. 2

The problem I found with parser combinator libraries is that left recursion seems to be essentially impossible to tackle, thus rendering them useless for things as simple as arithmetic expressions. I’ve also heard that Packrat fails to intelligently parse A ::= a A a | ε: presumably, only strings whose length is a power of two are accepted. All in all, after hearing about PEG being the one true way to solve real-life parsing, I’ve been pretty disappointed… Open to refutation though!

1. 4

I think you can do good engineering work with PEG parsers, even if they don’t handle certain difficult cases. Your job as a programmer is to understand the limitations of the available tools and e.g. rewrite a grammar with a A a in it into something the parser system can handle. I don’t think the problem of taking a completely arbitrary CFG and generating an efficient parser for it has been solved. https://tratt.net/laurie/blog/entries/parsing_the_solved_problem_that_isnt.html

One of the positives of PEG parsers for parsing things like programming languages is that the left biasing is useful for expressing common PL syntax idioms, where a more theoretically pure description could contain ambiguity. (relevant terms: ordered choice / soft cut)

If I were using Rust, I would want to take advantage of nom for its zero-copy properties.

1. 2

Interesting link, thanks! The author refers to “boolean grammars”; I wonder if these got any traction, either in research or applications.

2. 4

If you are not too worried about the performance impact, you can limit the recursion to num(nonterminals) * (1 + |input_length|), which lets you escape the infinite recursion.

See “Richard A. Frost, Rahmatullah Hafiz, and Paul C. Callaghan. Modular and efficient top-down parsing for ambiguous left recursive grammars. IWPT 2007”

1. 3

There’s a clever way that PEG / packrat parsers can use the memoization cache to support left recursion quite simply: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1

I still think that expressing infix operators with the precedence climbing / Pratt parser algorithm is both more natural to write and more performant than via left recursion, though.

(I maintain another PEG library in Rust that implements both of these.)
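For comparison, precedence climbing is small enough to sketch in a few lines. This is a toy evaluator over integer tokens in Python, not taken from either library mentioned; the tokenizer and operator table are my own illustration:

```python
import re
from operator import add, sub, mul, truediv

# each operator maps to (precedence, implementation)
OPS = {'+': (1, add), '-': (1, sub), '*': (2, mul), '/': (2, truediv)}

def tokenize(src):
    return re.findall(r'\d+|[-+*/()]', src)

def parse(tokens, min_prec=1):
    # parse a primary: a number or a parenthesized expression
    tok = tokens.pop(0)
    if tok == '(':
        lhs = parse(tokens, 1)
        tokens.pop(0)  # consume ')'
    else:
        lhs = int(tok)
    # climb: keep consuming operators of at least min_prec
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        prec, fn = OPS[tokens.pop(0)]
        rhs = parse(tokens, prec + 1)  # +1 makes operators left-associative
        lhs = fn(lhs, rhs)
    return lhs

def evaluate(src):
    return parse(tokenize(src))

print(evaluate('2+3*4'))   # 14
print(evaluate('10-4-3'))  # 3, i.e. (10-4)-3: left associative
```

No left recursion anywhere, yet the result associates the way the left-recursive grammar would.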

1. 1

I looked at your library before deciding on Pest for my attempt at Make A Lisp. It looks quite robust but I had trouble with the pseudo-Rust syntax in the parser macro. What made you decide to go that route instead of a more traditional PEG syntax?

(Not to say Pest is “traditional”, it is fairly different than the PEG grammar described in the original paper, but it’s at least close enough to make translation easy.)

1. 1

Using a procedural macro makes it integrate better into the Rust ecosystem. Many of the tokens are passed through to the generated Rust code with span information intact, so error messages point to the code you wrote rather than a generated file. Even works live in an editor with rust-analyzer! It does mean the syntax has to be compatible with the Rust lexer, though.

The biggest departure in syntax is the use of Rust’s patterns (['a'..='z']) for matching a character vs the regex-like syntax in most other PEG libraries ([a-z]). It’s more verbose, but this is to enable operation over data types other than strings – for example you can use an external lexer and match tokens like [Identifier(n)].

Other than that, it’s heavily inspired by PEG.js, and some of the operators like $ come from there.

2. 1

As far as I can tell, most every left recursive expression can be reformed into a non-left recursive form: term <- term "+" num / num can be rewritten as term <- num ("+" num)*, in the simplest example. This isn’t an ideal solution as the resulting parse tree is not as “natural” as the left recursive parse tree would be, but it is possible.
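The “natural” shape can be recovered after the fact: once term <- num ("+" num)* gives you a head and a list of (op, operand) tails, a left fold rebuilds the left-leaning tree the left-recursive rule would have produced. An illustrative Python sketch (not tied to any particular library):

```python
from functools import reduce

def rebuild(head, tails):
    """Fold (op, operand) pairs back into a left-associative tree."""
    return reduce(lambda lhs, pair: (pair[0], lhs, pair[1]), tails, head)

# suppose "1+2+3" parsed as head=1, tails=[('+', 2), ('+', 3)]
tree = rebuild(1, [('+', 2), ('+', 3)])
print(tree)  # ('+', ('+', 1, 2), 3) -- same shape as term <- term "+" num
```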

For your “only even forms” objection, Pest has added specific operators for “range” matching: A = @{ "a" ~ ("a"{2})* } matches 1 “a” plus any number of pairs of “a”. (This could be accomplished more easily with @{ "a" ~ "aa"* } but that’s less fun. ;) )

There are methods for handling left recursion directly (Python’s new PEG parser implements one such method, and the Pegged library for D implements another), but they’re more complicated than the base case.

1. 1

Re left recursion: see here - https://www.youtube.com/watch?v=b2AUW6psVcE Solves left recursion for BNF/EBNF/… https://github.com/Engelberg/instaparse

1. 2

I am actually finding it difficult to figure out from this post which errors were caught by using mypy, as opposed to simply reviewing the code (I am looking for bugs caught because of the better typing). I found one. Were there any others?

1. 6

surely there are more subtle ones, like these:

keep in mind that urllib3 is a very mature, well-tested library. this is not true to the same extent for many other projects. strict type hinting prevents many subtle bugs, even with good test coverage, especially when callers (accidentally) deviate from ‘intended’ usage.

on top of that, tools that check code means reviews become easier, and the same applies to refactoring, which means it becomes easier to keep a code base maintainable over time.

1. 3

keep in mind that urllib3 is a very mature, well-tested library. this is not true to the same extent for many other projects. strict type hinting prevents many subtle bugs, even with good test coverage, especially when callers (accidentally) deviate from ‘intended’ usage.

I think the question isn’t “did adding mypy prevent bugs”, but did it prevent enough bugs to justify the investment? He did say that “Adding type hints to urllib3 was clearly a huge amount of work, hundreds of engineer hours across several months”.

1. 8

i am convinced it does, especially if you consider the wider ecosystem.

downstream consumers benefit massively from this. applications with subtle bugs triggering only in production (even with okay test coverage) cost a tremendous amount of engineering work (and overhead). preventing these right at the moment the mistake was introduced (this can literally be seconds after typing the code!) is a huge win.

on top of that it simplifies code reviews for urllib3 itself, prevents regressions, doubles as documentation etc.

1. 1

somewhere on pytype’s infinite todo list is better support for inferring and then automatically adding type hints to a codebase. we had that functionality at one point, but no one really used it (possibly because inference wasn’t as good back then), so it’s unmaintained and needs an overhaul.

1. 3

Perhaps I have missed the point completely, but why should one use a macro for this? Why not a function (with a lambda parameter)?

1. 2

A lambda can be slightly less efficient (depending on the compiler) but mostly it screws up the indentation real bad. Not a huge deal really and doing it without a macro would be fine too.

1. 4

What’s this? How does a pretty simple snake turn into a chimera? Because, space like black hole? Do we really paint flowers and rainbow colors so that we can ignore the sharp fang?

1. 4

Do you mean the sharp bang (#!)?

(sorry)

1. 8

Missing: dataflow languages, represented by LabVIEW, Lustre, etc.

1. 1

Interesting suggestion. This may just be a gap in my background. What would you say the key mental structures you get from them are?

1. 3

Dataflow programming is about modelling continuous systems. The focus is on how the data flows, rather than on the control flow as in other languages. This answer explains it very well.

1. 1

Is there something there that isn’t already present in, say, generators in Python or laziness and composition of functions in Haskell?

1. 2

The entire concept is orthogonal to them, despite the way everything coalesces in the metal. Dataflow programming is much closer to excel than anything else in this list.

1. 1

Thank you. I’ll do some more studying and consider how to update the list.

2. 2

LabVIEW is the predecessor of Unreal’s Blueprints, if you want to try these ideas out in a modern setting. It’s also the origin of reactive programming.

1. 3

The paper is here.

1. 19

I have a problem with how some of the results in this paper are interpreted (and that is not considering the statistical problem of how test suite size is accounted for).

The essential trouble here is that the authors are using normalized effectiveness score. What is a normalized effectiveness score? “The normalized effectiveness measurement is the number of mutants a test suite detected divided by the number of non-equivalent mutants it covers”. It is only when the authors compare the correlation of normalized effectiveness score to statement coverage that they find the correlation dropping. It is also from this result that they get the title of the paper.

Now, why is this problematic? Consider, for argument’s sake, that you have perfect test cases. That is, each test case kills any non-equivalent mutant it covers. Should we now have a high correlation with coverage? If what they are measuring is a proxy for the effectiveness of the test suite, and you have perfect test cases, then you should have high correlation, right?

If a test case kills every mutant it covers, then the normalized score for every test suite would be 1. That is, no correlation at all.

My take here is that the results have been misinterpreted. You can’t conclude from this paper that coverage is not a good measure of test suite effectiveness.
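To make the objection concrete, here is the arithmetic with invented numbers: four hypothetical “perfect” suites at wildly different coverage levels all receive the same normalized score, so no correlation with coverage is possible no matter how informative coverage actually is:

```python
# (mutants covered, mutants killed) for four hypothetical perfect suites:
# each kills every non-equivalent mutant it covers, at very different coverage.
suites = [(10, 10), (50, 50), (200, 200), (999, 999)]

# the paper's normalized effectiveness: killed / covered non-equivalent mutants
normalized = [killed / covered for covered, killed in suites]
print(normalized)  # [1.0, 1.0, 1.0, 1.0] -- a constant, which cannot
# correlate with coverage (or with anything else)
```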

1. 6

I have been on the lookout for an indentation-based language to replace Python for some time now as an introductory language to teach students. Python has too many warts (bad scoping, bad implementation of default parameters, a not-well-thought-out distinction between statements and expressions, comprehensions are a language within the language that makes students’ lives difficult, and so on). Is Nim the best in this space at this point? Am I missing warts in Nim that make the grass look greener on the other side? Anyone who has experience with both Nim and Python: can you tell me what the trade-offs are?

1. 9

I am uncomfortable with statements like (from this article) “if you know Python, you’re 90% of the way to knowing Nim.” The two languages are not IMO as similar as that. It’s sort of like saying “if you know Java, you’re 90% of the way to knowing C++.” Yes, there is a surface level syntactic similarity, but it’s not nearly as deep as with Crystal and Ruby. Nim is strongly+statically typed, doesn’t have list comprehensions, doesn’t capitalize True, passes by value not reference, has very different OOP, etc.

That said, there’s definitely evidence that Nim has a smooth learning curve for Pythonistas! This isn’t the first article like this I’ve read. Just don’t assume that whatever works in Python will work in Nim — you don’t want to be like one of those American tourists who’s sure the locals will understand him if he just talks louder and slower :)

So yes, Nim is excellent. It’s quite easy to learn, for a high performance compiles-to-machine-code language; definitely easier than C, C++ or Rust. (Comparable to Go, but for various reasons I prefer Nim.) When programming in it I frequently forget I’m not using a scripting language!

1. 2

Thank you for your perspective. Much appreciated.

1. 1

passes by value not reference

The terminology here is very muddied by C, so forgive me if this sounds obvious, but do you mean that if you pass a data structure from one function to another in Nim, it will create a copy of that data structure instead of just passing the original? That seems like a really odd default for a modern language to have.

1. 4

At the language level, it’s passing the value not a reference. Under the hood it’s passing a pointer, so this isn’t expensive, but Nim treats function arguments as immutable, so it’s still by-value semantically: if I pass an array or object to a function, it can’t modify it.

Obviously you don’t always want that. There is a sort-of-kludgey openarray type that exists as a parameter type for passing arrays by reference. For objects, you can declare a type as ref which makes it a reference to an object; passing such a type is passing the object by reference. This is very common since ref is also how you get dynamic allocation (with GC or more recently ref-counting.) It’s just like the distinction in C between Foo and *Foo, only it’s a safe managed pointer.

This works well in practice (modulo some annoyance with openarray which I probably noticed more than most because I was implementing some low-level functionality in a library) … but this is going to be all new, important info to a Python programmer. I’ve seen this cause frustration when someone approaches Nim as though it were AOT-compiled Python, and then starts either complaining or asking very confused questions on the Nim forum.
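For the Python side of that contrast: a Python callee can freely mutate an argument it was passed, which is exactly what Nim’s default parameter immutability rules out. A plain-Python illustration:

```python
def mutate(xs):
    xs.append(99)  # legal in Python: the callee sees (and changes) the caller's list

data = [1, 2, 3]
mutate(data)
print(data)  # [1, 2, 3, 99] -- the caller's object was modified
```

In Nim the equivalent proc would not compile unless the parameter were declared var (or a ref type were used).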

I recommend reading the tutorial/intro on the Nim site. It’s well written and by the end you’ll know most of the language. (Even the last part is optional unless you’re curious about fancy stuff like macros.)

(Disclaimer: fate has kept me away from Nim for about 6 months, so I may have made some dumb mistakes in my explanation.)

1. 4

Gotcha; I see. I wonder if it’d be clearer if they just emphasized the immutability. Framing it in terms of “by value” opens up a big can of worms around inefficient copying. But if it’s just the other function that’s prevented from modifying it, then the guarantee of immutability isn’t quite there. I guess none of the widely-understood terminology from other languages covers this particular situation, so some new terminology would be helpful.

2. 5

Python has too many warts (bad scoping, bad implementation of default parameters

I don’t want to sound like a Python fanboy, but those reasons are very weak. Why do you need to explore the corner cases of scoping? Just stick to a couple of basic styles. Relying on many scoping rules is a bad idea anyway. Why do you need default parameters at all? Many languages have no support for default parameters and do fine. Just don’t use them if you think their implementation is bad.

Less is more. I sometimes flirt with the idea of building a minimal indentation-based language with just a handful of primitives, just as a proof of concept of the practicality of something very simple and minimal.

1. 7

At least for python and me, it’s less a matter of exploring the corner cases in the scoping rules and more a matter of tripping over them involuntarily.

I only know three languages that don’t do lexical scoping at this point:

1. Emacs Lisp, which does dynamic scoping by default for backwards compatibility but offers lexical scoping as an option and strongly recommends lexical scoping for new code.

2. Bash, which does dynamic scoping but kind of doesn’t claim to be a real programming language. (This is wrong but you know what I mean.)

3. Python, which does neither dynamic nor lexical scoping, very much does claim to be a real programming language, and has advocates defending its weird scoping rules.

I mean, access to variables in the enclosing scope has copy-on-write semantics. Wtf, Python?

(Three guesses who started learning python recently after writing a lexically scoped language for many years. Thank you for indulging me.)

1. 4

It is weirder than copy on write. Not tested because I’m on my iPad, but given this:

x = 1
def f(cond):
    if cond:
        x
    x = 2


f(False) does nothing, but f(True) will throw an undefined-variable exception (an UnboundLocalError).

1. 4

I think you need nonlocal x but I don’t quite get why this is weird/nonlexical.

It has lexical scoping but requires you to mark variables you intend to modify locally with ‘nonlocal’ or ‘global’, as a speed bump on the way to accidental aliasing. I don’t think I’d call Python “not lexically scoped”.
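The speed bump in action (standard CPython behavior; `counter` is just an illustrative name):

```python
def counter():
    n = 0
    def bump():
        nonlocal n  # without this line, `n += 1` raises UnboundLocalError
        n += 1
        return n
    return bump

c = counter()
print(c(), c(), c())  # 1 2 3 -- the closure rebinds the enclosing n
```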

1. 3

Have you tried declaring a variable inside an if?

if True:
    X = 1
print(X)

1. 1

Yeah, if doesn’t introduce scope. Nonlexical scope doesn’t IMO mean “there exist lexical constructs that don’t introduce scope”, it is more “there exist scopes that don’t match any lexical constructs”

1. 2

I just learned the idea of variable hoisting thanks to this conversation. So the bizarre behavior with carlmjohnson’s example can be understood as the later assignment declaring a new local variable that comes into scope at the start of the function. Because python does block scope instead of expression scope.

I guess I’ve been misusing “lexical scope” to mean expression-level lexical scope.

I still find the idea of block scope deeply unintuitive, but at least I can predict its behavior now. So, thanks!

1. 1

Yeah I’m not a huge fan either tbh, but I guess I’ve never thought of it as weird cause JavaScript has similar behavior.

2. 2

I agree. This is more of a quirk due to python not having explicit variable declaration syntax.

1. 2

It’s multiple weird things. It’s weird that Python has† no explicit local variable declarations, and it’s weird that scoping is per function instead of per block, and it’s weird that assignments are hoisted to the top of a function.

† Had? Not sure how type declarations make this more complicated than when I learned it in Python 2.5. The thing with Python is it only gets more complicated. :-)

Different weird thing: nonlocal won’t work here, because nonlocal only applies to functions within functions, and top level variables have to be referred to as global.

2. 3

JavaScript didn’t have it either until the recent introduction of declaration keywords. It only had global and function (not block) scope. It’s much trickier.

But I am puzzled by why/how people stumble upon scoping problems. It doesn’t ever happen to me. Why do people feel the urge to access a symbol in a block outside the one where it was created? If you just don’t do it, you will never have a problem, in any language.

1. 1

For me it’s all about closures. I’m used to using first class functions and closures where I suspect an object and instance variables would be more pythonic.

But if you’re used to expression level lexical scope, then it feels very natural to write functions with free variables and expect them to close over the thing with the same name (gestures upward) over there.

I’m curious, do you use any languages with expression-level scope? You’re not the first Python person I’ve met who thinks Python’s scope rules make sense, and it confuses me as much as my confusion seems to confuse you.

1. 2

I don’t need to remember complicated scoping rules because I don’t ever use a symbol in a block higher up in the tree than the one it is defined in. Nor do I understand the need to re-assign variables, let alone re-use their names. (Talking about Python now.) Which languages qualify as having expression-level scope? Is that the same as block scope? So… Java, modern JavaScript, C#, etc.?

I am confused. What problems does python pose when using closures? How is it different than other languages in that respect?

1. 1

I use closures in python code all the time. I just tend not to mutate the free variable. If you do that, then you don’t need to reference the free variable as global or nonlocal. If I was mutating the state then I might switch over to an object.

3. 5

Nim is pretty strongly typed; that is certainly different from Python. I’m currently translating something with Python and TypeScript implementations, and I’m mostly reading the TypeScript because the typing makes it easier to understand. With Nim you might spend time working on typing that you wouldn’t for Python (or not, Nim is not object oriented), but it’s worth it for later readability.

1. 4

Nim is less OO than Python, but more so than Go or Rust. To me the acid test is “can you inherit both methods and data”, and Nim passes.

Interestingly you can choose to write in OO or functional style, and get the same results, since foo(bar, 3, 4) is equivalent to foo.bar(3, 4).

IIRC, Nim even has multimethods, but I think they’re deprecated.

1. 2

what? don’t you mean foo(bar, 3, 4) and bar.foo(3, 4)? AFAIK the last token before a parenthesis is always invoked as a function.

1. 1

Oops, you’re right!

2. 3

The latest release of Scala 3 is trying to be more appealing to Python developers with this: https://medium.com/scala-3/scala-3-new-but-optional-syntax-855b48a4ca76

So I guess you could make it an option.

1. 2

Thanks! This certainly looks interesting. Would it make a good introductory language, though? By which I mean that I want to explain a small subset of the language to the pupil, and that restricted language should be sufficient to achieve fairly reasonable tasks. The student should then be able to pick up the advanced concepts of the language by self-exploration (and those implementations should be wart-free; for example, I do not want to explain again why one shouldn’t use an array as a default parameter value in Python).

1. 2

There is no such thing as a programming language that is “wart free”, and while initially you may want to present a language as not having difficulties or weirdness, in the long run you do need to introduce them to the student, otherwise they will not be prepared for “warts” in other languages.

2. 1

Depending on what you’re trying to teach, Elm does fit your description of an indentation-based introductory language for teaching students. I know there’s a school that uses Elm for teaching kids how to make games, so there’s definitely precedent for using it in education too. Though if you’re looking to teach things like file I/O, HTTP servers, or other back-end-specific things, then it’s probably a poor choice.

1. 14

I liked Nim when I played with it but I haven’t used it for data processing specifically. Often in Python you lean a lot on libraries (numpy, and more recently pandas) to do things like read CSV files in weird formats, screw around with big matrices, etc. I don’t know if Nim has any of those.

What I do know is that Nim’s greatest achievement is convincing people that it’s a compiled Python, and not a 21st-century Pascal dressed up to look like Python. It takes a lot of inspiration from Pascal-family languages, and I mean that as a compliment.

1. 6

Nim’s greatest achievement is convincing people that it’s a compiled Python, and not a 21st-century Pascal dressed up to look like Python.

Interesting; I too look very favorably on the Pascal family for having gotten most things right.

1. 3

I used Pascal a lot in the 80s — long enough to get sick of the lack of a return statement — and also Mesa (one of Wirth’s lesser known languages) at an internship at Xerox; but I’m not familiar with other languages in the family.

Nim doesn’t remind me that much of Pascal, beyond simple syntactic things. What would you say are the deeper features inspired by it?

1. 1

The stack [interpreter] has done three instructions, whereas the register [interpreter] has only done two instructions

Generally, stack machines need more instructions, in order to shuffle data around. You see this especially with getlocal and storelocal (or equivalent) instructions, which are completely obviated on a register machine.

And re ‘reasonable encoding’: there are much more compact encodings. There are also much more extensive encodings (which you almost certainly want). ‘Increment’ and ‘add immediate’ are almost certainly things you want, in either style of VM.

[compiler is not cheap]

Both Truffle/Graal and Jitter are able to automatically generate a compiler from a ‘cheap’ interpreter. I believe PyPy does similarly. Additionally, compilers are not actually as hard as the video makes them out to be; a very dumb compiler is of similar complexity to a very smart interpreter, and similarly performant as well.

1. 1

Would you have a link for Jitter? I searched, but not sure what I am finding is related.

1. 2
1. 1

Thanks!

1. 11

Ah.

My pet hobby horse.

Let me ride it.

It’s a source of great frustration to me that formal methods academia, compiler writers, and programmers are missing the great opportunity of our lifetime.

Design by Contract.

(Small digression: the industry is hopelessly confused about what is meant by an assert, and subtle disagreements about what is meant or implied by it are an unending source of programmers talking past each other.)

You’re welcome to your own opinion, but for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment.

By an assert in the following I mean: a programmer-written boolean expression that, if it ever evaluates to false, tells the programmer that the preceding code has an unknown bug that can only be fixed or handled by a new version of the code.

If it evaluates to true, the programmer fully expects the subsequent code to work, and that code will fully rely on the assert expression being true.

In fact, if the assert expression is false, the programmer is certain that the subsequent code will fail to work, so much so, there is no point in executing it.

So going back to DbC and formal methods.

Seriously. Writing postconditions is harder than just writing the code. Formal methods are way harder than just programming.

But we can get 90% of the benefit by specializing the postconditions to a few interesting cases… a.k.a. unit testing.

So where can Formal Methods really help?

Assuming we’re choosing languages that aren’t packed with horrid corner cases… (eg. Signed integer overflow in C)…

Given a Design by Contract style of programming, where every function has a bunch of precondition asserts and a bunch of specializations of the postconditions……

My dream is a future where formal methods academics team up with compiler writers and give us…

• Assuming that every upstream assert expression is true, if it can be determined that any downstream assert will fail, the compilation will fail with a useful warning.
• Where for every type, the programmer can associate an invariant expression, and the compiler will attempt to verify that it is true at the end of the constructor, and the start and end of every public method and at the start of the destructor. If it can’t, it will fail the compile with a warning.
• Wherever a type is used, the invariant expression can be used in this reasoning described above.
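The second bullet can be approximated at runtime today; here is a hedged Python sketch (the `Account` type and its invariant are hypothetical) of an invariant checked at the end of the constructor and at the start and end of every public method, which is what the dream compiler would instead discharge statically:

```python
class Account:
    """Hypothetical type with an associated invariant expression."""

    def __init__(self, balance):
        self.balance = balance
        assert self._invariant()  # checked at the end of the constructor

    def _invariant(self):
        return self.balance >= 0

    def withdraw(self, amount):
        assert self._invariant()            # start of every public method
        assert 0 <= amount <= self.balance  # precondition
        self.balance -= amount
        assert self._invariant()            # ...and at the end
        return self.balance
```

In the proposal above, these checks would not execute at all: the compiler would either prove them or fail the build with a warning.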

So far, you might say: why involve the compiler? Why not a standalone linter?

The answer is simple… allow the optimizer to rely on these expressions being true, and make any downstream optimizations and simplifications based on the validity of these expressions.

A lot of optimizations are based on dataflow analysis. If that analysis can be informed by asserts, can check the asserts, and can be made more powerful and insightful by relying on them, then we will get a massive step forward in performance.

My experience of using a standalone linter like splint is that it forces you to write in a language that is almost, but not quite, like C. I’d much rather that whatever the compiler parses as a valid (although perhaps buggy) program in the language is parsed and accepted as a valid program by the linter (although hopefully it will warn if it is buggy), and vice versa.

I can hear certain well-known lobste.rs users starting to scream about C optimizers relying on no signed integer overflow, since overflow is, according to the standard, undefined, resulting in generated assembler that leaves programmers with surprised-Pikachu faces.

I’m not talking about C. C has too much confused history.

I’m talking about a new language that out of the gate takes asserts to have the meaning I describe and explains carefully to all users that asserts have power, lots and lots of power, to both fail your compile AND optimize your program.

1. 5

As someone who has been using Frama-C quite a lot lately, I can’t but agree with this. There’s potential for a “faster than C” language that is also safer than C because you have to be explicit with things like overflow and proving that some code can’t crash. Never assume. Instead, prove.

1. 3

for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment

Didactic/pedagogical critique: in such a case, it may be more appropriate to introduce a new term rather than using one which has a common lay meaning.

My dream is a future where formal methods academics team up with compiler writers and give us […]

Sounds a lot like symbolic execution.

1. 3

using one which has a common lay meaning.

Do assertions have a different meaning than the one given here?

1. 4

I have colleagues for whom it means, “Gee, I didn’t think that input from outside the system was possible, so I want to know about it if I see it in unit test, and log it in production, but I must still handle it as a possibility”.

When I personally put in an assert at such a point, I mean, “a higher layer has validated the inputs already, and such a value is by design not possible; this assert documents AND checks that this is true, so in my subsequent code I clearly don’t and won’t handle that case”.

I have also seen debates online where people clearly use it to check for stuff during debugging, and then assume it is compiled out in production and hence has no further influence or value in production.
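Python itself is a concrete example of that last camp: running the interpreter with `-O` strips `assert` statements entirely. A small demonstration (the message string is mine):

```python
import subprocess
import sys

# A one-liner whose assert always fires, followed by a print.
code = "assert False, 'never checked under -O'; print('still ran')"

# Normal run: the assert fires and the process exits with an error.
checked = subprocess.run([sys.executable, "-c", code],
                         capture_output=True, text=True)

# Optimized run: -O compiles the assert out, so the print executes.
stripped = subprocess.run([sys.executable, "-O", "-c", code],
                          capture_output=True, text=True)

print(checked.returncode)  # non-zero: the assert fired
print(stripped.stdout)     # the assert was stripped, the print ran
```

So anyone relying on a Python assert for production validation is in for exactly the surprise described above.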

1. 1

GTest (Google Test), which I’ve recently had to use for C++ school assignments, refers to this in their macro names as EXPECT. Conditions whose failure is fatal are labelled with ASSERT. This makes intuitive sense to me: if you expect something to be true you accept its potential falsehood, whereas when you assert something to be true you reject its potential falsehood.

1. 1

TIL! Thanks! I only knew about assertions from the contract perspective.

2. 1

A common design choice is that assertions are evaluated in test environments, but not production. In that case, plus a test environment that you’re not confident fully covers production use cases, you might use assertions for hypotheses about the system that you’re not confident you can turn into an error yet.

I’m not sure that’s a good idea, but it’s basically how we’ve used assertions at my current company.

3. 2

Alas, my aim there was to point out that people think there is a common lay meaning… but I have been involved in enough long, raging arguments online and in person to realize that everybody means whatever they damn well want when they write assert. And most get confused and angry when you corner them and ask exactly what they meant.

However, the DbC meaning is pretty clear and has for decades explicitly used the term “assert”… except a lot of people get stuck on their own meaning of “assert” and conclude DbC is useless.

Sounds a lot like symbolic execution.

Ahh, there is that black or white thinking again that drives me nuts.

Symbolic execution and program proving as an end in themselves is a false aim. The halting problem, horrible things like busy beavers, and horribly fuzzy requirements at the UX end of things make it certain that automated end-to-end program proving simply will never happen.

That said, it can be incredibly useful. It’s limited, for sure, but within its limits it can be extraordinarily valuable.

Will symbolic execution prove a production-scale system correct end to end? Not a chance.

However, it will be able to reason from assert A to assert B: given A, B will fail in these oddball corner cases… i.e. you have a bug. Hey, that’s your grandfather’s lint on steroids!

4. 2

You might find the Lean theorem proving language meets some of your requirements. As an example:

structure Substring :=
( original : string )
( offset length : ℕ )
( invariant : offset + length ≤ original.length )


In order to construct an instance of this Substring type, my code has to provide a proof of that invariant proposition. Any function that consumes this type can rely on the invariant being enforced by the compiler, and can also make use of it to prove facts about the function’s postcondition.
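For illustration, a hedged sketch (assuming Lean 3, matching the snippet above; the name and values are mine) of what constructing such an instance looks like, with the invariant discharged as a proof term:

```lean
-- Anonymous constructor: the fourth field is the proof obligation
-- that offset + length ≤ original.length, i.e. 1 + 3 ≤ 5 here,
-- which is decidable and so can be discharged by dec_trivial.
def hello_sub : Substring :=
⟨ "hello", 1, 3, dec_trivial ⟩
```

If the arithmetic didn’t work out (say offset 3, length 3), the proof would fail and the program would not compile — which is exactly the “fail your compile” behavior argued for upthread.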