1. 8
  2. 5

    I thought this was kind of neat but not very practical until they got to Section 3.3, where they suggest putting a norm on the free vector space of stack frames and using that to automatically detect performance regressions by comparing the distance between two successive flame graphs. Now that is something I’d really like to have in a CI pipeline. They say that

    When the flame graphs originate from a sampling process, the similarity score σ is of no much use and it is perhaps best to consider conducting a statistical analysis of the collected data.

    but while that’s technically correct, I’m not as convinced as they are that it’s a problem in practice. There’s always some machine-level variation between test runs anyway, and having something raise its hand when σ crosses some threshold between CI runs would be very nice.
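To make the CI idea concrete, here is a rough sketch of what a threshold check could look like. It is not the paper's definition of σ: it assumes each flame graph is a mapping from a stack (a tuple of frame names) to a sample count, normalizes both graphs, and uses half the L1 distance between them as a stand-in metric. The names `fold`, `distance`, and `regressed`, and the 0.1 threshold, are made up for illustration.

```python
from collections import Counter


def fold(samples):
    """Collapse sampled stacks into a {stack tuple: sample count} mapping."""
    return Counter(tuple(stack) for stack in samples)


def normalize(graph):
    """Scale weights so they sum to 1, making runs of different length comparable."""
    total = sum(graph.values())
    return {stack: weight / total for stack, weight in graph.items()} if total else {}


def distance(a, b):
    """Half the L1 distance between two normalized flame graphs, in [0, 1].

    This is an assumed stand-in for the paper's norm, not its actual formula.
    """
    a, b = normalize(a), normalize(b)
    stacks = set(a) | set(b)
    return 0.5 * sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in stacks)


def regressed(baseline, candidate, threshold=0.1):
    """Flag a CI run whose flame graph drifted past the (hypothetical) threshold."""
    return distance(baseline, candidate) > threshold


baseline = fold([["main", "parse"]] * 50 + [["main", "render"]] * 50)
candidate = fold(
    [["main", "parse"]] * 50 + [["main", "render"]] * 25 + [["main", "gc"]] * 25
)
print(distance(baseline, candidate), regressed(baseline, candidate))
```

With sampled data the score will jitter from run to run, so in practice the threshold would probably have to be calibrated against the spread seen between repeated runs of the same commit.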

    1. 2

      I’ve been playing (struggling) around with a similar but different problem, and your comment has made me download the paper to read.

      The basic idea is to first build a graph of the import dependencies of the entire project (say, on the master branch), then recalculate the graph on each pull request and somehow highlight a diff of what has changed. The project I’m doing this for is in Python, so it sometimes has cyclic imports or weird dynamic import behaviour. There might also not be a single root (main.py), or the entry point might be in a 3rd-party framework, etc. I would eventually want to tag some parts of the codebase as critical, so that if those change they “taint” other paths. This could be useful to

      • get a “risk” profile of a pull request. High risk ~= more approvals (from a senior engineer), etc.
      • run a reduced set of unit and integration tests depending on the path checked

      Currently I’m a bit in over my head calculating a reasonable graph (or a collection of graphs, a forest?) representation of the project and figuring out how to diff it. But performance regression detection over flame graphs seems to have some overlap with this problem.
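For what it's worth, the static part of this can be sketched in a few lines with the standard `ast` module. This is a rough illustration under heavy assumptions, not a working tool: it only sees plain `import x` and `from x import y` statements, skips relative and dynamic imports, drops anything outside the project, and treats the graph as a flat set of edges so that cycles and missing roots are not a problem. The names `imports_of`, `build_graph`, `diff_graph`, and `tainted` are made up.

```python
import ast
from collections import defaultdict, deque


def imports_of(source):
    """Return the module names statically imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            found.add(node.module)
    return found


def build_graph(sources):
    """Map {module name: source text} to a set of (importer, imported) edges.

    Imports of modules outside `sources` (stdlib, 3rd party) are dropped.
    """
    return {
        (module, dep)
        for module, text in sources.items()
        for dep in imports_of(text)
        if dep in sources
    }


def diff_graph(old_edges, new_edges):
    """Return (added, removed) edges between two versions of the graph."""
    return new_edges - old_edges, old_edges - new_edges


def tainted(edges, changed):
    """Modules that transitively import any changed module (cycle-safe BFS)."""
    importers = defaultdict(set)
    for importer, imported in edges:
        importers[imported].add(importer)
    seen, queue = set(changed), deque(changed)
    while queue:
        for parent in importers[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


master = {
    "app": "import core\nimport util",
    "core": "import util",
    "util": "",
    "cli": "from app import main",
}
pull_request = dict(master, core="import util\nimport app")  # introduces a cycle

added, removed = diff_graph(build_graph(master), build_graph(pull_request))
print(added, removed)
print(tainted(build_graph(pull_request), {"util"}))
```

Because edges are plain sets, diffing is just set difference, and the taint walk needs no root: it starts from the changed (or critical) modules and follows importers backwards. The tainted set could then drive both the risk profile and the reduced test selection.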

      There are some SaaS apps that roughly do the “diagram” part (see http://codesee.io/), but I didn’t find any prior open source work around this (though I’ve likely not looked in the right places). There are a few projects I found that at least try to calculate the dependency tree, like https://github.com/thebjorn/pydeps, but I’m not happy with the results or the performance yet.