
The title seems misleading. I just quickly skimmed it and it’s pretty dense. So, do correct me if I’m wrong.

It looks like it’s an O(N x M) algorithm (pre-processing) followed by an O(N) algorithm (the actual sorting). Then they say the O(N) part takes O(N x L) in the worst case. So, a two-step sorting algorithm that delivers O(N x M) + (O(N) or O(N x L)) performance?
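For concreteness, here’s a minimal sketch of that two-phase shape — my own reading of it, with the neural network swapped out for a crude linear CDF estimate, which is emphatically not the paper’s actual model. Phase 1 plays the O(N x M) role (M being the cost of one model evaluation per element); phase 2 is insertion sort, which is O(N x L) when no element lands more than L slots from its final position:

```python
def learned_sort(xs):
    """Two-phase sketch: predicted-rank placement, then insertion-sort cleanup."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1
    # Phase 1 (stand-in for the neural net): predict each element's rank
    # from a linear CDF estimate and drop it into that bucket.
    buckets = [[] for _ in range(n)]
    for x in xs:
        rank = min(n - 1, int((x - lo) / span * (n - 1)))
        buckets[rank].append(x)
    nearly_sorted = [x for b in buckets for x in b]
    # Phase 2: insertion sort. On a nearly sorted array where no element
    # is more than L slots out of place, this runs in O(N x L).
    for i in range(1, n):
        x, j = nearly_sorted[i], i - 1
        while j >= 0 and nearly_sorted[j] > x:
            nearly_sorted[j + 1] = nearly_sorted[j]
            j -= 1
        nearly_sorted[j + 1] = x
    return nearly_sorted
```

The whole argument rests on phase 1 keeping L small; the sketch makes it obvious that a bad placement step silently turns phase 2 into plain O(N^2) insertion sort.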

The other thought that has been rolling around in the back of my head is that the test inputs they use are generated according to nicely behaved, friendly probability distributions.

I suspect that you could generate pathological inputs that would cause the neural network first step to fail to get the input “almost sorted” enough for the second step to work. That would invalidate the theoretical complexity claim they make, and then the question becomes, in practice, how hard is it to generate pathological inputs and how likely is it that a real-world input would be pathological?
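To make that concrete, here’s a toy experiment (my own construction, not from the paper) where the placement step is a simple linear CDF estimate standing in for a model trained on friendly data. On data matching that assumption the placement is nearly perfect; a single extreme outlier crushes everything into one bucket and the “almost sorted” property disappears:

```python
import random

def placement_error(xs):
    """Max distance between where a linear (uniform-CDF) rank estimate
    places each element and its true sorted position. The estimate is a
    stand-in for a trained model. Assumes distinct values."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1
    buckets = [[] for _ in range(n)]
    for x in xs:
        buckets[min(n - 1, int((x - lo) / span * (n - 1)))].append(x)
    placed = [x for b in buckets for x in b]
    true_pos = {x: r for r, x in enumerate(sorted(placed))}
    return max(abs(i - true_pos[x]) for i, x in enumerate(placed))

random.seed(0)
n = 10_000
friendly = [random.random() for _ in range(n)]             # matches the estimate
hostile = [random.random() for _ in range(n - 1)] + [1e9]  # one extreme outlier
print(placement_error(friendly))  # small: cleanup pass stays cheap
print(placement_error(hostile))   # near n: everything lands in one bucket
```

One adversarial value is enough here because the estimate only sees min and max; a real trained model would be harder to break, but the question of how much harder is exactly what the paper doesn’t answer.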

I figured it was an elaborate prank to disguise a lookup table…

They claim M and L are both constants. This is the part of the claim I find dubious… I suspect that as problem sizes grow it might turn out these aren’t actually constants.

They also don’t seem to include model training in the complexity, apparently because they use the same size training set for every problem size. This also might not be a valid assumption if the input is big enough.

They did problem sizes from 10^3 to 10^7. The log2 difference is only about 13. If they were a little conservative in selecting their “constants”, their measurements would still look linear even if the “constants” were actually logarithmic.
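Concretely (my arithmetic, not theirs), a hidden log factor barely moves across that entire range, so it would be nearly indistinguishable from a constant in their plots:

```python
import math

# Across the whole tested range, log2(N) grows only from ~10 to ~23,
# so a "constant" that was secretly logarithmic would look almost flat.
for n in (10**3, 10**5, 10**7):
    print(f"N=10^{round(math.log10(n))}: log2(N) = {math.log2(n):.1f}")
```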

The thing that made me wonder is they said one of the constants was something that could change the results of their analysis if they changed its size. That made me conclude it wasn’t a constant so much as a constant for this instance of the method. The next set of problems might require that number to change. Given the online nature of these algorithms, it might even have to change over time while doing the same job. I don’t think we can know yet.

It was interesting work, though.

I agree with you that it’s interesting work. It just feels like they really wanted their paper to stand out, and felt like doing sorting with a neural network wasn’t an exciting enough title, so they made a really big claim (that being, an O(N) sorting algorithm). The problem is that the claim they made has a specific, rigorous meaning and I don’t think they did the analysis to PROVE the claim is true (although it might still be true anyway).

How is this possible? The theoretical lower bound on comparison-based sorting is Ω(n log n) (without knowledge of the structure of the data); machine learning or not.

I suspect that the training phase encodes knowledge of the structure of the data into an algorithm that only works on similar data.
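That would put it in the same family as the classic non-comparison sorts: once you assume structure in the data, linear time is already achievable without any machine learning. Counting sort, for example, sorts bounded integer keys in O(N + K) time:

```python
def counting_sort(xs, key_bound):
    """Sort non-negative ints < key_bound in O(N + K) time -- linear,
    but only because we assumed structure (bounded integer keys)."""
    counts = [0] * key_bound
    for x in xs:
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)
    return out

print(counting_sort([3, 1, 4, 1, 5, 2], key_bound=6))  # [1, 1, 2, 3, 4, 5]
```

It beats the Ω(n log n) bound precisely because it never compares elements; the bound only applies to comparison sorts, which is why the title’s claim needs that qualifier spelled out.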

In that case the title is very, very misleading :º

Yeah, I don’t think the claims in this paper would be accepted as-written if the authors submitted it to a machine-learning conference or journal. You can title a paper anything you want on arXiv though, for better or worse.