1. 4

My research and development project that I worked on during my honours year of 2022 was on the topic of “Explicit sequencing of C programs” whereby I looked at ways to improve the current state of Software Verification in Lazy-CSeq with regards to the underspecified evaluation ordering of function arguments and binary operators in the C programming language. The transformation module was written in Python2 (as is standard with CSeq).

  1. 1

    Why did you choose to have such a short abstract? For me it’s not clear what improvements you add with your research and what Lazy-Cseq is. I need to dig deeper in your paper to find that information, which I’m not ready to do because I don’t know if it’s worth it.

    1. 1

      That’s described within the introduction - my supervisor says what goes and he is German - so it basically is always what goes.

      In any case such an abstract would be the length of my introduction, it’s various concepts that need explaining that would have taken up the same space. (Also Lazy CSeq is linked in the link I sent - see above).

      Hope that helps.

      1. 2

        You don’t need to explain the all concepts in the abstract. However the abstract is the first serious contact of the reader with your writing so you need to make sure that the reader has as much context as possible to be able to further give interest to your work.

        1. 1

          I totally agree with you, don’t get me wrong but it’s just how the paper turned out. I was never asked to expand that section, some supervisors have their tastes.

          In any case, petty things outside of my control - the paper is done and handed in - thought I’d share the results.

          1. 2

            Can you also give us a tl;dr version of your work?

            1. 2

              Sure thing!

              So undefined behavior is something very common in C programs, but to clarify I was looking at underspecified behavior which has to do with behavior which is not specified in the C spec and therefore transitively left up to the compiler implementors to decide.

              One of these things left up to the implementors is the order in which function arguments are evaluated. One can imagine that with non-pure function calls as arguments how such can lead to bugs when manipulating state.

              For example given the program below, compile it with gcc, then run ./a.out and then compile it with clang and run ./a.out, one will pass the assertion one won’t:

              #include<assert.h>
              int counter = 1;
              int f1() {
                  counter = counter*2;
                  return counter;
              }
              
              int f2() {
                  counter = counter*3;
                  return counter;
              }
              
              int f(int x, int y) {
                  return x+y;
              }
              
              int main() {
                  int res = f(f1(), f2());
                  assert(res==8);
                  return 0;
              }
              

              Underspecified behavior is something many people are of course aware of however, my project was to now effectively update a software verification software known as LazyCSeq such that it would account for such behavior. Before my additions, Lazy CSeq would assume left to right evaluation ordering, effectively not counting gcc’s right to left behavior and therefore leaving out potential paths to be explored in the symbolic execution phase.


              I implemented an algorithm which generates algorithmic code that CBMC (the verifier) can run and ensure it “sees” those pathways and therefore makes the software verification more accurate.

              1. 2

                Why did you feel the need to create a new type of behavior called “underspecified behavior”. C89 for example states that the following [1] is unspecified behavior (A.6.1): The order in which the function designator and the arguments in a function call are evaluated ($3.3.2.2).

                So at the end of the day you improved Lazy CSeq so that more program states are visited by changing the evaluation ordering of function calls. Is that right?

                Did you find some interesting bugs in software projects that use this kind of unspecified behavior?

                [1] https://port70.net/~nsz/c/c89/c89-draft.txt

                1. 1

                  Under specified == unspecified. Just a synonym that was used interchangeably. I think we went with the former because it emphasizes the lacking nature of it. Implying, as one would think rightfully (at least in the right mind), that argument ordering should be, specified.

                  I’m yet to see how it will perform in the real life scenario. There is a competition that lazy CSeq partakes in quite often (let me pull up the link) and that is where the real catches will/should appear. Various test cases are tried and I presume some are written with the knowledge of several outcomes (I never entered the competition, it is done by my supervisor who works on CSeq). So I will keep you guys posted!


                  There is definitely a performance improvement I would have liked to make but it was a bit iffy in terms of getting it to work alongside ensuring other things worked.


                  So at the end of the day you improved Lazy CSeq so that more program states are visited by changing the evaluation ordering of function calls. Is that right?

                  Exactly! And it was very fun. The rewarding experience of seeing CBMC execute semantically correct was great - it was a new tool for me - first time doing symbolic execution. Seeing that I enjoy compilers etc., CSeq was great as it is based on PyCParser to traverse C code input as a string, tokenized/ parserd and then I would output heavily modified strings. In a stream like fashion, my module was one of many. String of code -> transform based on AST info -> string of code


                  Now let me find that link…

                    1. 1
                2. 1

                  I recommend looking at the slides below if you are more of a visual person - they were used in my presentation to CS professors/lecturers and those who had no idea about the topic (such as ML lecturers) easily grasped it.

                  1. 1

                    Slides: http://deavmi.assigned.network/blobs/research/presentation.pdf

                    These include information about CBMC-specific C pragmas which only make sense in the context of CBMC. The references section of the paper refers to a tutorial on CBMC to help you understand how things such as __CPROVER_assume() work and what it does!