1. 11

I’m writing a Web service that needs to get lots of information from people as gently as possible, and that means asking questions that are simple, easy to answer, aren’t asked unless needed, and are asked in an intelligent order. For example, if many available questions are about one’s children, the system should ask whether one has children early, and the childless respondent should not get follow-up questions beginning ‘If you have children, …’

The program will need to be able to display questions, store answers, (RESTfully) select the next question depending on previous answers, and output information when appropriate. A more-or-less general solution is desirable since it will need to handle indefinite numbers of questions and questionnaire structures. To me, this implies some ‘just data’ encoding of the questions and questionnaires.

GOV.UK, whose interface work I generally admire, writes that we should start with one ‘thing’ per page. This gives me a fair idea of how to do the ‘front end’ of the solution, and to a lesser extent the way to structure the gathered data for storage in an RDBMS; but I have only vague ideas on how to implement the question selection/response routines. I’d like to make the inference system as intelligent as possible (e.g., if subject died before 1998 & was married: skip spouse gender question), which to me suggests some kind of logic programming or rules engine, but that’s probably overkill for the task at hand.

I note that there are existing programs of this nature that are not garbage: I recently went through a very complex interaction with a Dutch mortgage broker’s Web service (for some reason skinned as a chat bot). Unfortunately, while I can find many third-party services that say they will do this job for me, I can’t find any open-source solutions whose mathematical underpinnings I can emulate.

So, the question is: please share the name of

  1. a book (chapter) or other material with guidance on this specific problem
  2. an open-source program (especially a JVM library) that implements a solution

Thanks in advance for any responses.


  2. 2


    WRT “select the next question depending on previous answers, and output information when appropriate”:

    this is called ‘branch logic’. The systems implementing this stuff are called, usually, ‘Survey’ systems.

    A PHP example of it is: https://github.com/LimeSurvey/LimeSurvey

    a JVM example is: https://github.com/JD-Software/JDeSurvey

    Basically, search GitHub for ‘survey jvm’, you will get many hits.

    I am not associated with any of the above, never used them – so do not know how well they work/how secure/performant they are/etc.

    If you would like to build it yourself and leverage an existing client-side rules engine (based on the RETE formalism), you can try the no-longer-maintained nools js: https://github.com/noolsjs/nools

    If you want to do rules-based evaluation on the server side, you can use Drools or another RETE-centric JVM solution. There are rule editors that are either integrated with full solutions, or can be leveraged as a component.

    It is also possible that a full-blown RETE engine is way too much for your needs (they are meant to evaluate 100s or 1000s of conditionals)… so something to think about…

    1. 2

      What do you need the information for?

      I think per-page logic is OK if you are asking one question per page, but your service had better take less than 100 ms to get the next question, even assuming your customer’s worst internet connection.

      1. 1

        The service needs the information to do most of its job. For example, people wanting to use the service will need to supply different types of information (to comply with anti-money-laundering & -terrorism-financing regulations and other statutory requirements, and to protect the service against fraud) depending on where they live.

        On latency we are agreed. The 100 ms figure is for the time to first byte?

      2. 2

        Two of these goals are at odds:

        • You want it to be fully general.
        • You want non-developers to be able to add new questions

        In addition to a predefined list of kinds of question (multi-choice, select-from-a-list), I’d suggest a set of ‘eligibility criteria’ types:

        • Answer for question number X is ( greater-than 3, less-than 2, greater-or-equal-to 4, exists)
        • Same, but for the sum of answers to questions X, Y, Z
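
        A minimal sketch of how such criteria could be kept as ‘just data’, in Python. The names here (`Criterion`, `OPERATORS`, `eligible`) and the rule that every listed criterion must hold are assumptions made for illustration, not taken from any existing survey system. A single-question criterion is treated as a sum over one question.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, Mapping, Tuple

# Hypothetical operator names; extend this table as new comparisons are needed.
OPERATORS = {
    "greater-than": operator.gt,
    "less-than": operator.lt,
    "greater-or-equal-to": operator.ge,
}


@dataclass(frozen=True)
class Criterion:
    questions: Tuple[str, ...]  # ids of the questions whose answers are summed
    op: str                     # a key of OPERATORS, or "exists"
    value: float = 0            # comparison threshold; ignored for "exists"

    def is_met(self, answers: Mapping[str, float]) -> bool:
        # A criterion about an unanswered question is never met.
        if any(q not in answers for q in self.questions):
            return False
        if self.op == "exists":
            return True
        total = sum(answers[q] for q in self.questions)
        return OPERATORS[self.op](total, self.value)


def eligible(criteria: Iterable[Criterion], answers: Mapping[str, float]) -> bool:
    """A question is eligible when all of its criteria are met (assumed AND semantics)."""
    return all(c.is_met(answers) for c in criteria)
```

        Because a criterion is plain data, it can be stored in a table row or a JSON document and edited without touching code, at the cost of only supporting the comparisons listed in the operator table.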
        1. 1

          I don’t know of anything off-the-shelf to handle things like this, but the idea of doing it in a general way makes my sensors twinge with the “don’t create your own programming language” anti-pattern.

          This means that my first thought would be to implement the server in a dynamic language, like Python or Ruby, and implement the question series as an external Gem or module that can be plugged in. This way, the logic about what answers exclude or include which other questions, or modify them or whatever, can be implemented in a real programming language, instead of guessing what you’ll need, and then re-doing it when you find you need more features.

          That means question series plugins would basically just implement a “next_question” method that takes a hash/JSON blob of the answers so far, and returns either a question and set of choices, or an indication that the series is done. No concerns about request types or data storage or anything. Then the server module would handle the data storage, listening for requests, formatting data, and rendering the next questions.
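
          Roughly, a question-series plugin along those lines might look like the sketch below. The question ids, the `ask_if` key, and the shape of the returned dict are all made up for illustration; the only fixed idea is a `next_question` function that takes the answers so far and returns the next question, or `None` when the series is done.

```python
from typing import Any, Mapping, Optional

# Hypothetical question series. "ask_if" is ordinary Python code, so any
# condition on earlier answers can be expressed without a custom rule language.
QUESTIONS = [
    {"id": "has_children", "text": "Do you have children?",
     "choices": ["yes", "no"]},
    {"id": "child_count", "text": "How many children do you have?",
     "choices": None,
     "ask_if": lambda answers: answers.get("has_children") == "yes"},
    {"id": "country", "text": "Where do you live?", "choices": None},
]


def next_question(answers: Mapping[str, Any]) -> Optional[dict]:
    """Return the first unanswered, applicable question, or None when done."""
    for question in QUESTIONS:
        if question["id"] in answers:
            continue  # already answered
        ask_if = question.get("ask_if")
        if ask_if is None or ask_if(answers):
            return {
                "id": question["id"],
                "text": question["text"],
                "choices": question["choices"],
            }
    return None
```

          The server would call `next_question` with the stored answers on each request and render whatever comes back; the plugin itself knows nothing about HTTP or storage.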

          1. 4

            “don’t create your own programming language”

            This quote is too often truncated from the form I first heard it in - “don’t create your own programming language by accident”.

            That is, if you’re going to end up with something fully general, you need to design it.

            1. 1

              Thanks, that’s actually a better expression of what I was trying to say.

              I’ve seen more than a few systems that started out as just implementing the ability to create a few custom rules in SQL or XML or JSON or something. Then you keep adding more options until it becomes almost a programming language itself - a truly awful one that nobody can understand.

          2. 1

            Very likely you will naively arrive at EAV (Entity Attribute Value). See https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model#Alternatives for some pointers on how to avoid that anti-(debatable)-pattern.