  1.

    I am very interested in this topic: at work we are developing some new bioinformatics software and a major part of this is comparing our pipeline to an existing tool.

    > We judge other projects by our own objectives rather than the objectives under which that project was developed

    We may take the other tool outside its comfort zone, but as long as we are clear about this it’s fine - we may be trying to show that, for this particular use case, our tool is better. What is disingenuous is to purposely compare against a competitor tool that is weak in the test domain while ignoring another one that might be stronger.

    > We fail to use other projects with the same expertise that we have for our own

    This is indeed a big problem. However, if the other tool needs special handling while ours does not (e.g. it comes with a feature that inspects the input data and adjusts its parameters automatically), that’s a plus - see the sketch below. In bioinformatics there is a push to have some standardized data sets against which folks are allowed to run their tool themselves and submit their results, controlling for the expertise factor.
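
    A toy sketch of what I mean by a tool that inspects its input and picks its own parameters - the file name, the read-length cutoff and the parameter names below are invented for illustration, not taken from any real tool:

    ```python
    # Toy illustration: a tool that inspects its input and chooses its own
    # run parameters, so the user needs no special expertise to run it fairly.
    # The cutoff and parameter names are invented for this example.
    from pathlib import Path

    def inspect_input(fastq: Path) -> dict:
        """Collect simple properties of the input (here: mean read length)."""
        lengths = []
        with fastq.open() as fh:
            for i, line in enumerate(fh):
                if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
                    lengths.append(len(line.strip()))
        return {"mean_read_length": sum(lengths) / len(lengths) if lengths else 0}

    def choose_parameters(props: dict) -> dict:
        """Map input properties to run parameters instead of asking the user."""
        if props["mean_read_length"] < 100:  # invented cutoff
            return {"seed_length": 19, "mode": "short-read"}
        return {"seed_length": 31, "mode": "long-read"}

    params = choose_parameters(inspect_input(Path("sample.fastq")))
    print(params)
    ```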

    > We naturally gravitate towards cases at which our project excels

    I don’t see this as a negative; it is similar to the first point. However, we should go to some pains to be fair, and I would say that we have a fiduciary duty to test our panel of tools against a wide array of benchmarks so that the reader can determine each tool’s comfort zone (i.e. its area of applicability).

    > We improve our software during the benchmarking process

    Ok, so what? That’s called R & D.

    > We don’t release negative results

    Well, it depends on what this means. Yes, we don’t publish/release results until we think we are competitive, and that is OK. However, if we only show the results where we are better than the competing panel of tools and suppress those where we trail, that is disingenuous, and it dovetails with some of the previous points.

    In short, for fairer benchmarking one should (a rough sketch of such a workflow follows the list):

    1. Collect a set of standardized test cases that has general acceptance in the community
    2. Allow tool makers to run their tool (in good faith - avoid special tuning) on the test cases themselves
    3. Have a wide range of test cases
    4. Show results for all test cases
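
    To make points 2-4 concrete, here is a rough sketch of the kind of harness I have in mind. The tool commands, file layout and the toy scoring metric are all placeholders; the only point it illustrates is that every tool runs on every standardized test case and every result goes into the report, whether or not it flatters our own pipeline.

    ```python
    # Sketch of a "run everything, report everything" benchmark harness.
    # Tool commands, dataset layout and the scoring metric are placeholders.
    import csv
    import subprocess
    from pathlib import Path

    TOOLS = {  # each tool maker supplies their own good-faith command line
        "our_pipeline": "our_pipeline --in {input} --out {output}",
        "tool_a": "tool_a run {input} {output}",
        "tool_b": "tool_b --input {input} --output {output}",
    }

    TEST_CASES = sorted(Path("standardized_test_cases").glob("*.fastq"))

    def score(output: Path, truth: Path) -> float:
        """Toy metric: fraction of truth-set lines recovered by the tool."""
        expected = set(truth.read_text().splitlines())
        found = set(output.read_text().splitlines())
        return len(expected & found) / len(expected) if expected else 0.0

    with open("all_results.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["tool", "test_case", "score"])
        for case in TEST_CASES:
            truth = case.with_suffix(".truth")
            for tool, template in TOOLS.items():
                out = Path(f"{tool}_{case.stem}.out")
                subprocess.run(template.format(input=case, output=out),
                               shell=True, check=True)
                # Every (tool, test case) result is recorded -- nothing is suppressed.
                writer.writerow([tool, case.name, score(out, truth)])
    ```
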
    1.

      > In bioinformatics there is a push to have some standardized data sets against which folks are allowed to run their tool themselves and submit their results, controlling for the expertise factor.

      I am also very interested in this topic. What are some of these standardized data sets? I know about Genome in a Bottle. Anything else?

      1.

        We use GiAB. Illumina has the Platinum Genomes data set, though we have not used it.