Consider this snippet of C:

void f(int *x);

void g(void) {
	struct {
		int foo;
		char bar;
		struct foo *baz;
		int *biz;

		void (*booze)(void);

		struct {
			int foo;
			char bar;
		} bones;
	} x;

	f(&x.

The cursor is at the very end of the file, and the editor is asked for an autocompletion. To me, a human, it seems obvious that the most viable completions are foo and bones (or, maybe, bones.foo): f takes an int *, so whatever follows &x. should be an int. Clangd, however, is blissfully unaware and simply lists all the members of x in order. (Maybe Visual Studio does better?)
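The ranking described above is easy to state as code. Here is a toy sketch of it (Java 17+, to match the later examples). Everything in it is an assumption for illustration: types are plain strings, there is no real C type checker, and the only rule is that because &x. sits in an argument of type int *, a member of type int is a perfect fit and a nested struct containing an int is the next best thing. It is not how clangd works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the C struct above; not clangd's real data structures.
final class CStructRanking {
    // A member has a name, a type spelled as a string, and (for nested structs) its own members.
    record Member(String name, String type, List<Member> nested) {
        Member(String name, String type) {
            this(name, type, List.of());
        }
    }

    // Orders completions: exact type matches, then nested structs holding a match, then the rest.
    static List<String> rank(List<Member> members, String wanted) {
        List<String> exact = new ArrayList<>();
        List<String> nested = new ArrayList<>();
        List<String> rest = new ArrayList<>();
        for (Member m : members) {
            if (m.type().equals(wanted)) {
                exact.add(m.name());
                continue;
            }
            List<String> hits = new ArrayList<>();
            for (Member n : m.nested()) {
                if (n.type().equals(wanted)) hits.add(m.name() + "." + n.name());
            }
            if (hits.isEmpty()) {
                rest.add(m.name());
            } else {
                // Offer the struct itself as a prefix ("bones") and each matching path ("bones.foo").
                nested.add(m.name());
                nested.addAll(hits);
            }
        }
        List<String> out = new ArrayList<>(exact);
        out.addAll(nested);
        out.addAll(rest);
        return out;
    }

    public static void main(String[] args) {
        List<Member> x = List.of(
            new Member("foo", "int"),
            new Member("bar", "char"),
            new Member("baz", "struct foo *"),
            new Member("biz", "int *"),
            new Member("booze", "void (*)(void)"),
            new Member("bones", "struct { int foo; char bar; }",
                List.of(new Member("foo", "int"), new Member("bar", "char"))));
        // &x.<member> is passed where int * is expected, so the member itself should be an int.
        System.out.println(rank(x, "int")); // [foo, bones, bones.foo, bar, baz, biz, booze]
    }
}
```

Nothing is filtered out; non-matching members just sink to the bottom, so the list stays usable if the guess about intent is wrong.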

But OK, that’s C. Java is known for its high-quality IDEs; how does it fare? Here’s a Java snippet (with the cursor at the _):

public class Bar {
	int foo;
	byte bar;
	String baz;
	int[] booze;
	Bar boop;
}

public class Foo {
	public static void main(String[] argv) {
		Bar bar = new Bar();

		test(bar._
	}

	static void test(int x) {}
}

Viable completions are foo, bar, boop, and booze (or, for bonus points, boop. and booze[). I tried this in both IntelliJ and Eclipse, with the same result: foo and bar are both suggested, but boop and booze are not. (Also, foo is not preferred to bar, even though bar is a byte and would have to be promoted to int, but this is not as big a deal.)
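The preferences listed above (exact match over promotion, with boop. and booze[ as one-step candidates) can be expressed as a score. This is a hypothetical sketch (Java 17+), not how IntelliJ or Eclipse rank items: the class table is hand-written, and WIDENS_TO covers only the primitive widening conversions to int.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical scoring of completion candidates against an expected parameter type.
final class ExpectedTypeScore {
    record Field(String name, String type) {}

    // Primitive widening conversions into int only; deliberately incomplete.
    static final Map<String, Set<String>> WIDENS_TO = Map.of(
        "int", Set.of("byte", "short", "char"));

    // Hand-written class table: the fields of Bar from the example.
    static final Map<String, List<Field>> CLASSES = Map.of(
        "Bar", List.of(
            new Field("foo", "int"),
            new Field("bar", "byte"),
            new Field("baz", "String"),
            new Field("booze", "int[]"),
            new Field("boop", "Bar")));

    static boolean assignable(String from, String to) {
        return from.equals(to) || WIDENS_TO.getOrDefault(to, Set.of()).contains(from);
    }

    // 3 = exact match, 2 = needs widening, 1 = one step away (index or field access), 0 = no evidence.
    static int score(Field f, String expected) {
        if (f.type().equals(expected)) return 3;
        if (assignable(f.type(), expected)) return 2;
        if (f.type().endsWith("[]")
                && assignable(f.type().substring(0, f.type().length() - 2), expected)) return 1;
        for (Field inner : CLASSES.getOrDefault(f.type(), List.of())) {
            if (assignable(inner.type(), expected)) return 1;
        }
        return 0;
    }

    // Sorts (stable, highest score first) instead of filtering, so nothing disappears.
    static List<String> ranked(String cls, String expected) {
        return CLASSES.get(cls).stream()
            .sorted(Comparator.comparingInt((Field f) -> score(f, expected)).reversed())
            .map(Field::name)
            .toList();
    }

    public static void main(String[] args) {
        // test(bar._) where test takes an int
        System.out.println(ranked("Bar", "int")); // [foo, bar, booze, boop, baz]
    }
}
```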


This sort of structural introspection is effectively destructuring pattern matching, which compilers are great at when it is an explicitly supported feature (as in the ML family). So why are they so bad at it in an interactive environment?

    (I write IDEs for a living and worked at JetBrains)

    Answer for Java:

    I think the premise is wrong? At least, IntelliJ behaves differently on this example than OP describes.

    If I invoke default completion (ctrl+space), I get everything (including boop and booze).

    If I invoke smart completion (ctrl+shift+space), I get foo, bar, and hashCode.

    If I invoke smart completion twice (asking for more indirect suggestions), I get, among other things, boop.bar and booze[].

    Why doesn’t default completion filter by type? Because that’s not necessarily what the user wants. Perhaps they intend to write bar.baz == "hello" ? 0 : 1, or to call some not-yet-existing method and then use “create from usage”.
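The narrow-then-extend behaviour of repeated smart completion can be modeled as a depth-bounded search for expressions whose type fits the expected one. A hypothetical sketch (Java 17+), not IntelliJ's implementation: depth 1 checks the receiver's fields, depth 2 also follows one field access or one array index, and the depth bound is what keeps the self-referential boop field from recursing forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical depth-bounded search for expressions of an expected type.
final class SmartCompletionDepth {
    record Field(String name, String type) {}

    // Hand-written class table mirroring Bar from the example.
    static final Map<String, List<Field>> CLASSES = Map.of(
        "Bar", List.of(
            new Field("foo", "int"),
            new Field("bar", "byte"),
            new Field("baz", "String"),
            new Field("booze", "int[]"),
            new Field("boop", "Bar")));

    // Exact match, or one of the primitive widening conversions to int.
    static boolean assignable(String from, String to) {
        return from.equals(to) || (to.equals("int") && Set.of("byte", "short", "char").contains(from));
    }

    // Collects paths reachable from a receiver of `type` within `depth` steps.
    static void search(String prefix, String type, String expected, int depth, List<String> out) {
        if (depth == 0) return;
        for (Field f : CLASSES.getOrDefault(type, List.of())) {
            String path = prefix + f.name();
            if (assignable(f.type(), expected)) out.add(path);
            // Indexing an array counts as one more step, so it needs remaining depth.
            if (depth > 1 && f.type().equals(expected + "[]")) out.add(path + "[]");
            // Following a field also costs a step; the bound stops the Bar -> boop -> Bar cycle.
            search(path + ".", f.type(), expected, depth - 1, out);
        }
    }

    static List<String> complete(String receiverType, String expected, int depth) {
        List<String> out = new ArrayList<>();
        search("", receiverType, expected, depth, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(complete("Bar", "int", 1)); // [foo, bar]
        System.out.println(complete("Bar", "int", 2)); // [foo, bar, booze[], boop.foo, boop.bar]
    }
}
```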

    Answer for C++:

    I don’t know. I suspect CLion might do better, but I haven’t tried.

    My guess would be that it’s a combination of two things:

    • The LSP completion protocol is severely underpowered. It isn’t even possible to sort completions without jumping through some super weird hoops. I know that clangd supports a custom extension for sorting, but not all clients implement it.
    • On-the-fly semantic analysis for C++ is relatively immature, and relatively hard.
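For reference, the LSP specification does give CompletionItem an optional sortText string, which clients compare in place of the label; the catch is that a client may still re-rank items by its own fuzzy matching as the user types. A common way to push a server-side score through that field is to encode it as a fixed-width string. This is a minimal sketch under that assumption (Java 17+); Item and Scored are hypothetical stand-ins, not types from any LSP library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an LSP CompletionItem: only label and sortText are modeled.
final class SortTextEncoding {
    record Item(String label, String sortText) {}
    record Scored(String label, int score) {}

    // Higher scores should come first, so clamp to 0..9999, invert, and zero-pad;
    // the label is appended so that equal scores fall back to alphabetical order.
    static List<Item> encode(List<Scored> scored) {
        List<Item> items = new ArrayList<>();
        for (Scored s : scored) {
            int clamped = Math.max(0, Math.min(s.score(), 9999));
            items.add(new Item(s.label(), String.format(Locale.ROOT, "%04d%s", 9999 - clamped, s.label())));
        }
        return items;
    }

    public static void main(String[] args) {
        List<Item> items = encode(List.of(
            new Scored("bar", 2), new Scored("baz", 0), new Scored("foo", 3)));
        // Simulate a client that orders items purely by comparing sortText strings.
        items.stream()
            .sorted(Comparator.comparing(Item::sortText))
            .forEach(i -> System.out.println(i.label() + " " + i.sortText()));
    }
}
```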

      TIL about ctrl+shift+space, thanks! (hadn’t used IntelliJ in many years).

        > suspect CLion might do better

        I was using CLion.

        > premise is wrong

        Fair enough.

        > If I invoke smart completion twice (asking for more indirect suggestions), I get, among other things, boop.bar and booze[]

        Ahh, that’s interesting! Do you know why that isn’t the default smart completion behaviour?

        > On-the-fly semantic analysis for C++ is relatively immature, and relatively hard.

        Out of curiosity, do you know of any other languages (aside from Java) with more mature tooling?

          > Out of curiosity, do you know of any other languages (aside from Java) with more mature tooling?

          I very much like how Merlin works for OCaml (Merlin is also exposed through LSP via the ocamllsp package). Completions are contextual and sorted. Merlin’s parser also recovers very well from syntax errors, so completions can be provided even for syntactically invalid buffers.

            > Do you know why that isn’t the default smart completion behaviour?

            My guess would be that, in a big project, you’d get a bajillion suggestions. The first smart completion serves to narrow the set of results; the second serves to extend it.

            > Out of curiosity, do you know of any other languages (aside from Java) with more mature tooling?

            C# with ReSharper/Rider should be in good shape. Maybe with Roslyn as well.

          Here are some (basically wild) guesses:

          • This stuff is really performance sensitive, and involves parsing code that is not syntactically complete. So there’s a bunch of caching, but also messy and ever-changing input.

          • There’s a heavy reliance on incremental input for high-quality results (I imagine if you start typing bo you might get boop and booze to show up). It’s not meant for the “I know nothing about this” case. After all, most “real world” data structures have tens, if not hundreds, of fields!

          • The results are “good enough”, so you’re in the dreaded valley of “not enough issues for good reports” and “reports are very complicated, so they aren’t being filed”.

            Thinking about the problem a bit more, from the perspectives of 1) language grammar and 2) structured editing environments with typed holes (like Hazel), I think I understand part of the problem with the Java example.

            Say the code I have in mind is this:

            test(bar.booze[0])
            

            In the language grammar, this is:

            • a function call
              • whose sole argument is an indexing expression
                • whose array is a member access
                  • whose structure is bar
                  • and whose member is booze
                • and whose index is an integer literal 0
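The same tree can be written down as data. The node names below are hypothetical (they are not javac's or IntelliJ's classes, and the code needs Java 17+ for sealed interfaces); the point is only that the index node is the parent of the member-access node, even though bar.booze is typed before [0].

```java
// Hypothetical AST for test(bar.booze[0]); these are not javac's or IntelliJ's real node classes.
final class AstDemo {
    sealed interface Expr {}
    record Call(String function, Expr argument) implements Expr {}
    record Index(Expr array, Expr index) implements Expr {}
    record Member(Expr structure, String member) implements Expr {}
    record Name(String name) implements Expr {}
    record IntLit(int value) implements Expr {}

    // Renders the tree one node per line, children indented under their parent.
    static String show(Expr e, String indent) {
        String next = indent + "  ";
        if (e instanceof Call c) return indent + "call " + c.function() + "\n" + show(c.argument(), next);
        if (e instanceof Index i) return indent + "index\n" + show(i.array(), next) + show(i.index(), next);
        if (e instanceof Member m) return indent + "member ." + m.member() + "\n" + show(m.structure(), next);
        if (e instanceof Name n) return indent + "name " + n.name() + "\n";
        if (e instanceof IntLit l) return indent + "int " + l.value() + "\n";
        throw new IllegalArgumentException("unknown node");
    }

    public static void main(String[] args) {
        Expr tree = new Call("test",
            new Index(new Member(new Name("bar"), "booze"), new IntLit(0)));
        System.out.print(show(tree, ""));
    }
}
```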

            In other words, the actual structure of the code is an indexing expression first and a member access second. But we type it the other way around. In hole language:

            test(_a); // _a is a hole which must have type int
            

            refine:

            test(_b[_c]); // _b must have type int[], and _c must be a valid type for an index
            

            refine:

            test(_d._e[_c]); // _d must be an object with field _e, which has type int[]
            

            and so on. Refining in this way, it’s very easy to generate extremely high-quality autocompletions for each hole. But in an unstructured editing environment, where the order in which expressions are typed doesn’t necessarily correspond to the hierarchy in which they exist in the language grammar, it’s difficult for the compiler to predict what sort of expression might ultimately be formed.

            (This also resolves the comparison with pattern matching: a pattern exists in its entirety, already completely parsed.)
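The refinement steps above can be sketched as a function from a hole's type to candidate expressions. This is a hypothetical toy (Java 17+), not Hazel's actual algorithm: the field table is hand-written, assignability is exact string equality, and only two rules exist (index into an array of the hole's type, or read a field of the hole's type).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type-directed refinement of a typed hole.
final class HoleRefinement {
    record Field(String owner, String name, String type) {}

    // Hand-written table of the fields visible at the cursor (class Bar from the example).
    static final List<Field> FIELDS = List.of(
        new Field("Bar", "foo", "int"),
        new Field("Bar", "bar", "byte"),
        new Field("Bar", "baz", "String"),
        new Field("Bar", "booze", "int[]"),
        new Field("Bar", "boop", "Bar"));

    // Lists one level of refinements for a hole of type holeType.
    static List<String> refine(String holeType) {
        List<String> out = new ArrayList<>();
        // Rule 1: an index expression _b[_c] fills the hole if _b is an array of holeType.
        out.add("_b[_c] where _b : " + holeType + "[]");
        // Rule 2: a field access _d.f fills the hole if some class declares f with type holeType.
        for (Field f : FIELDS) {
            if (f.type().equals(holeType)) {
                out.add("_d." + f.name() + " where _d : " + f.owner());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // test(_a): the hole _a must be an int.
        refine("int").forEach(System.out::println);
        // After choosing _b[_c], the new hole _b must be an int[].
        refine("int[]").forEach(System.out::println);
    }
}
```

Because each step starts from a known type, every candidate is well-typed by construction; the difficulty, as noted above, is that a text editor receives these pieces in the opposite order.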

              Try Tabnine. It does autocompletion via machine learning using GPT-2. The suggestions range from good to creepy (because they are so good). They offer an offline version (the model is a few hundred megabytes, and running it is only viable when connected to a power supply because of the power consumption) and an online version.

                This may well be an interesting project, but it’s not really responsive to OP’s question?

                You don’t need shotgun statistics “machine learning” just to sort a list of already known lexical completions by how well they match an argument’s type, and I too am curious if there is any good reason this isn’t commonly done.

                  Came here to say that. It’s one of the few, well, actually I think it’s the only coding tool that I pay for, and it’s 100% worth it.

                  Maybe also your IntelliJ installation is just messed up?

                  Java example: https://imgur.com/a/S4C59p0

                  Sorry for the name change; my demo project already had a Foo.java. It changes nothing if I also rename “int foo;” to “int lol;” in Bar.

                    Yes, but notice how baz (which is a string and couldn’t possibly be useful) is ranked above booze and boop; it has no confidence in the latter two, it’s just showing them because they’re members of Bar.

                    If you ask for completion earlier, it only shows the two members.

                      > which is a string and couldn’t possibly be useful

                      What if you wanted to write bar.baz.length()?
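That counterpoint can be checked in the same toy style: if one-step lookahead also considers zero-argument methods, baz becomes reachable as baz.length(). The member tables below are hand-written assumptions (Java 17+), listing only String.length() and String.isEmpty() from the JDK.

```java
import java.util.List;
import java.util.Map;

// Hypothetical one-step lookahead that includes zero-argument methods, not just fields.
final class MethodLookahead {
    record Member(String name, String type, boolean isMethod) {}

    // Hand-written member tables; String lists only two of its real no-arg methods.
    static final Map<String, List<Member>> MEMBERS = Map.of(
        "String", List.of(
            new Member("length", "int", true),
            new Member("isEmpty", "boolean", true)),
        "Bar", List.of(
            new Member("foo", "int", false),
            new Member("baz", "String", false)));

    // Returns expressions one step past `receiver` (of type `type`) that yield `expected`.
    static List<String> oneStep(String receiver, String type, String expected) {
        return MEMBERS.getOrDefault(type, List.of()).stream()
            .filter(m -> m.type().equals(expected))
            .map(m -> receiver + "." + m.name() + (m.isMethod() ? "()" : ""))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(oneStep("bar.baz", "String", "int")); // [bar.baz.length()]
    }
}
```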