1. 3

Corresponding blog post: https://medium.com/towards-data-science/facebook-research-just-published-an-awesome-paper-on-learning-hierarchical-representations-34e3d829ede7

Corresponding paper: https://arxiv.org/abs/1705.08039

This paper explores the Poincaré disk model, instead of Euclidean space, for embedding hierarchical data.
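For reference, the distance the paper works with on the Poincaré ball (the open unit ball) is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). Here is a minimal NumPy sketch of that formula. The function name and the `eps` clamp are my own additions for illustration, not the authors' code.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_u = np.dot(u, u)
    sq_norm_v = np.dot(v, v)
    sq_dist = np.dot(u - v, u - v)
    # Assumption: clamp the denominator so points very close to the boundary
    # don't divide by zero (points with norm >= 1 are outside the model anyway).
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return float(np.arccosh(1.0 + 2.0 * sq_dist / denom))
```

Distances blow up as points approach the boundary, which is what gives the model room to place a tree's root near the origin and its many leaves near the rim.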


  2. 4

    It’s kind of weird to me to see computer people keep naming more things after mathematicians who already have a lot of things named after them. It’s a funny cultural phenomenon: for computer people, the mathematicians are more like distant relatives than close friends, so cluttering the namespace further doesn’t seem like a problem to them. Everything gets called Bayes this, Gauss that, even when those mathematicians have only the most tenuous relationship to the things computer people are naming.

    I had to think about what a Poincaré embedding could be… maybe an embedding of a higher-dimensional manifold into R^2 or R^3? Higher-dimensional topology is something Poincaré really is known for. But no, it’s just the disk model of the hyperbolic plane. Most of the time, I don’t even grace that with Poincaré’s name, kind of like how mathematicians usually just call finite fields “finite fields” and only rarely “Galois fields”.

    My nitpick isn’t purely pedantic: this unfamiliarity with mathematics has caused some real problems, such as an AI winter. It’s blindingly obvious that a perceptron was just defining a plane to separate inputs and that lots of data sets couldn’t be separated by a plane, but because of hype, and because Minsky pointed out what really should have been obvious to everyone, connectionism, neural networks, deep learning, or whatever the next rebranding will be, all fell into a deep AI winter. I know I sound like an ass, but it’s both cute and worrying to see computer people struggling with and rediscovering my familiar friends. It almost reminds me of Tai’s model, but a little less severe.
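    To make the "just a plane" point concrete, here is a minimal sketch of the classic perceptron update rule in NumPy (function and variable names are mine, for illustration). It finds a separating line for AND, but on XOR it can never finish an epoch without mistakes, because no line separates XOR's two classes.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Classic perceptron rule with labels in {-1, +1}.
    Returns (w, b, converged), where converged means an epoch had zero mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A point is misclassified if it lies on the wrong side of (or on) the plane.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])   # linearly separable
y_xor = np.array([-1, 1, 1, -1])    # not linearly separable
```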

    1. 2

      But clearly Poincaré embedding is a reference to https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model. It’s not like machine learning researchers chose the name arbitrarily. The embedding is named after the metric you choose to use for it. These names are informative. When someone says Euclidean, I know nothing fancy is happening. When someone says Gaussian, I know somewhere is a normal distribution in the formulation. When someone says Bayesian, I know I can expect a space to inject priors. The naming isn’t arbitrary.

      You suggest using the terms mathematicians use, but it’s not clear that makes the work any more accessible. For non-mathematicians it just means they are more likely to end up on some unrelated paper that doesn’t help them understand the idea. I get where you are coming from: I remember when kernels were a big thing, watching people struggle with what are essentially inner products and properties of inner products. It never helped to tell them they needed to understand inner products. I just had to give them LADR (Axler’s Linear Algebra Done Right) and that was enough.
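      As a small illustration of "kernels are essentially inner products", here is a sketch (my own example, with hypothetical function names) showing that the degree-2 polynomial kernel k(x, y) = (x · y)² on 2-D inputs equals an ordinary inner product after the explicit feature map φ(x) = (x1², √2·x1·x2, x2²).

```python
import numpy as np

def poly2_kernel(x, y):
    # Degree-2 homogeneous polynomial kernel: k(x, y) = (x . y)^2
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    # Explicit feature map for 2-D inputs whose inner product reproduces poly2_kernel
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
```

Both `poly2_kernel(x, y)` and `np.dot(poly2_features(x), poly2_features(y))` give 121 here; the kernel just skips building the feature vectors.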

      I think there is some confusion between the deep learning hype and the people practicing it. The practitioners are mostly aware of the mathematics. It’s everyone downwind that gives the impression of ignorance.

      1. 3

        When someone says Gaussian, I know somewhere is a normal distribution in the formulation.

        For example, Gaussian integers and Gaussian curvatures, right?

        1. 1

          I think I could make a connection for Gaussian curvature, but fair point.

          1. 2

            I know both probability theory and differential geometry, and I don’t see the connection (pun not originally intended, but thoroughly enjoyed).

            1. 1

              Sorry for the delay in responding. One connection I might draw is if you sample points from a multivariate Gaussian, that cloud of points resembles a sphere with Gaussian curvature. It’s a bit of a reach.

        2. 3

          I agree the researchers seem to usually know the mathematics, but they speak with such a funny foreign accent. Learning rate instead of step size, learning instead of optimising, backpropagation instead of chain rule, PCA instead of SVD… everything gets a weird, new name that seems to inspire certain superstitions about the nature of the calculations (neural! learning! intelligent!). And they keep coming up with new names for the same thing; inferential statistics becomes machine learning and descriptive statistics becomes unsupervised learning. Later they both become data science, which is, like we say in my country, the same donkey scuffled about.

          There are other consequences of this cultural divide. For example, the first thing any mathematician in an optimisation course learns is steepest descent and why it sucks, although it’s easy to implement. The rest of the course is spent seeing better alternatives and discussing how particulars such as its line search can be improved (the classic text by Nocedal & Wright, for example, proceeds in this manner). People who learn optimisation without the optimisation vocabulary never proceed beyond gradient descent and are writing the moral equivalent of bubble sort because it’s more familiar than quicksort and has less scary mathematics.
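          For a concrete picture of the line-search part: a minimal sketch of steepest descent with a backtracking (Armijo) line search, instead of a fixed "learning rate". The function name and the default constants (`rho=0.5`, `c=1e-4`) are my choices for illustration; the test problem is an ill-conditioned quadratic, where plain steepest descent is known to zig-zag.

```python
import numpy as np

def steepest_descent_backtracking(f, grad, x0, alpha0=1.0, rho=0.5, c=1e-4,
                                  tol=1e-8, max_iter=10_000):
    """Steepest descent with a backtracking (Armijo) line search.
    Returns (x, iterations_used)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = alpha0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g):
            alpha *= rho
        x = x - alpha * g
    return x, max_iter

# Usage on f(x) = 0.5 * x^T A x with A = diag(1, 100); the minimiser is the origin.
A = np.diag([1.0, 100.0])
x_min, iters = steepest_descent_backtracking(lambda x: 0.5 * x @ A @ x,
                                             lambda x: A @ x,
                                             [1.0, 1.0])
```

The step size adapts at every iteration, but the search direction is still the raw gradient; the "better alternatives" (conjugate gradient, quasi-Newton methods) change the direction as well.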

          1. 1

            Is PCA really the same thing as SVD? I suspect I may finally be able to understand PCA!

            1. 2

              It’s essentially the SVD. The right singular vectors are the directions of greatest variation and the singular values measure the size of this variation. You do need to recentre your data before you take the SVD, but it’s, like we say in the business, isomorphic.

              And if you know it’s SVD, then you also know that there are better algorithms to compute it than eigendecomposition of the covariance matrix.
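              A minimal NumPy sketch of PCA done the way described above: centre the data, take the SVD, read the principal directions off the right singular vectors and the variances off the squared singular values divided by n − 1. The function name and the synthetic data are made up for illustration.

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the centred data matrix (rows = samples, columns = features)."""
    Xc = X - X.mean(axis=0)                      # recentre the data first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # principal directions (right singular vectors)
    variances = S[:k] ** 2 / (X.shape[0] - 1)    # variance along each direction
    scores = Xc @ components.T                   # coordinates of each sample in the new basis
    return components, variances, scores

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.2]])
components, variances, scores = pca_svd(X, k=2)
```

The variances match the top eigenvalues of `np.cov(X, rowvar=False)`, and the components match its eigenvectors up to sign. Working on the centred data directly avoids forming XᵀX, which squares the condition number; that is the usual numerical argument for SVD over eigendecomposition of the covariance matrix.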

              1. 1

                SVD is a tool you can use to perform a PCA.