1. 14

Here are the slides I will use in my upcoming talk for the University of Milan.

The title of the talk mirrors my recent article about AI misconceptions and misuse, but the talk will take a more technical perspective, assuming a bit of multidisciplinary knowledge about the topic.

They are rather condensed (the original ones required a 45-minute exposition), but I post them here in the hope of getting some useful feedback to improve my exposition.

In particular, I’m looking for questions.

I can’t promise I’ll answer all of them here, but I will write something to answer them after the talk.

  1.  

  2. 8

    We call artificial neural networks a class of deterministic algorithms that can statistically approximate any function

    they are just applied statistics, not an inscrutable computer brain

    The counterpoint to this is that we don’t actually know yet whether our brain isn’t just a mechanism that can statistically approximate any function. The difference, of course, is that even if brains were analogous to neural nets, which we currently don’t know enough to say either way, the complexity is just not there. Current AI is like a guppy or a tadpole: very, very good at some specific task like swimming, but it isn’t doing any “thinking” as we do, because it simply is nowhere near complex enough.

    I’m not saying our brains are analogous to neural nets; I am saying we don’t actually know enough to pinpoint the importance of the structural covariance of human brains. The structure could be entirely where the intelligence comes from, or it could matter very little. The important thing, instead of saying it’s not a computer brain, is to say that it’s more like reflexes: completely unconscious, but potentially very skilled. This will help prevent people doubting your “not an inscrutable computer brain” claim, because when something does a task better than them they’re going to think it’s smarter than them, when really for that AI it’s more of a reflex.
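
    For concreteness, here is a minimal sketch (hypothetical, using scikit-learn; the target function, network size and sample counts are arbitrary) of what the slide’s “statistically approximate any function” amounts to in practice. It also illustrates the narrowness discussed above: the fit is only good inside the region it was trained on.

    ```python
    from sklearn.neural_network import MLPRegressor
    import numpy as np

    # Fit a small feed-forward net to noisy samples of sin(x):
    # approximation from data, nothing more mysterious than that.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X).ravel() + 0.05 * rng.standard_normal(500)

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0).fit(X, y)

    print(net.predict([[1.0]])[0], np.sin(1.0))    # close: inside the sampled region
    print(net.predict([[10.0]])[0], np.sin(10.0))  # usually far off: outside it
    ```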

    1. 2

      it’s more like reflexes. Completely unconscious but potentially very skilled

      You’re making a good point but I stumbled on this bit. Intelligence and consciousness are potentially two very different things, so that’s an entirely different line of enquiry. It might be worth framing this as “very skilled, but in a very small set of tasks” instead.

      1. 1

        It could be very skilled in a very large set of tasks and still be totally incapable of metacognition.

        1. 2

          Yes, we are in agreement. I was just trying to say that it’s better to talk about (current) AI purely in terms of skill and intelligence, as bringing consciousness into it complicates things and is an entirely separate discussion.

          1. 1

            It’s what the original author was doing, whether they were trying to or not, and it’s what I was responding to.

      2. 1

        Thanks for your advice!
        I get your point, but I do not think that AI is like a reflex, since a reflex needs way less data to train.

        While it’s true we know next to nothing about how our brain actually works, it does not seem to be a statistical tool, given how few attempts we need to learn something.

        However, we are making some progress in our understanding.
        Here is an interesting article about the topic. I strongly suggest you follow the links there: the article is nice, but the linked sources are great!

        1. 6

          I would not be so confident in saying the brain takes little data to learn. Take human development for example. Babies take well over a year consuming a constant stream of experience (unlabeled training data) to become competent enough to even perform simple actions.

          In my opinion, learning probably seems to occur quickly once the brain has matured a bit and has built a sufficiently large set of lower-level concepts. Such that new high-level concepts can be reasonably represented by a subset of the previously understood lower-level concepts working in tandem.

          1. 2

            Some years of human experience is little compared to the huge amounts of data necessary to build a competent AI. To make a comparison you have to decide how to measure the data of human experience, but the amount that enters your perception is much smaller than, e.g., recorded HD footage.

            1. 5

              That’s simply untrue. One eye alone has roughly a resolution of 576 megapixels; 4K is 8.3 megapixels. One square inch of skin has on average 19,000 sensory cells. Also keep in mind that the human brain has orders upon orders of magnitude more complexity. It can afford to relate each thing to everything else, instead of having the kind of amnesia that even our most advanced neural networks have. That allows much more complex patterns to be formed much faster.

              1. 3

                Yes, but the brain discards most of this information, and apparently the visual cortex works as a sort of lossy compression filter.

                1. 2

                  This is some significant hand-waving. Does the brain filter? No doubt; it would not be able to pay attention to specific things if it didn’t. Does it also process the entire visual field? How else could it find some specific feature? Keep in mind the brain can identify when around 9 photons hit the eye within less than 100 ms. One study recently claims to confirm with significance that humans can see a single photon, but the study was small, so maybe just those people can. Either way, I’m going to call bullshit on that: it is not smaller than recorded HD footage.

                  1. 3

                    Does it also process the entire visual field? How else could it find some specific feature?

                    You’re falling for your brain’s convincing suggestion that you have full HD in your entire visual field. Really you don’t, and your brain just fills most of it in. Your visual system finds specific features by quickly shifting the eye from place to place, until it finds something worth looking at. That’s how it picks out features without processing the entire visual field.

                    Your peripheral vision has very poor color and shape detection. Mostly it has special cells designed to detect motion, and when it detects motion you often shift your focus to it, thus picking up the color and shape.

                    The fact that you can perceive a few photons hitting the eye within 100 ms is a matter of sensitivity and latency; it has no bearing on the data bandwidth of your visual system.

                    This video might help:

                    https://www.youtube.com/watch?v=fjbWr3ODbAo&t=8m38s

                    1. 1

                      Even 1% of my vision has more complexity than HD.

                      1. 3

                        Based on what? Your fovea covers only about 2 degrees of the visual field, while the visual field extends about 75 degrees in either direction. The width of the foveal part is therefore less than 1/10 of the width of your entire visual field, so its area is much less than 1% of it. If you view an HD screen from far enough away that it appears the same size as your thumbnail at arm’s length, can you distinguish every pixel?

                        https://en.wikipedia.org/wiki/Fovea_centralis#Function https://en.wikipedia.org/wiki/Visual_field#Normal_limits https://en.wikipedia.org/wiki/Fovea_centralis#Angular_size_of_foveal_cones
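
                        A rough back-of-the-envelope version of that estimate (the overall field extent of roughly 150° by 120° is an assumption for illustration; the ~2° fovea comes from the links above):

                        ```python
                        import math

                        fovea_diameter_deg = 2.0     # ~2 degrees of visual angle (Fovea_centralis links above)
                        field_width_deg = 150.0      # assumed horizontal extent (~75 degrees to either side)
                        field_height_deg = 120.0     # assumed vertical extent

                        fovea_area = math.pi * (fovea_diameter_deg / 2) ** 2                   # ~3 deg^2
                        field_area = math.pi * (field_width_deg / 2) * (field_height_deg / 2)  # ~14,000 deg^2

                        print(f"foveal fraction of the visual field: {fovea_area / field_area:.3%}")  # ~0.02%
                        ```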

                        1. 1

                          I’m realizing now we’re talking about two entirely different things. I’m talking about the complexity of the input; you’re talking about the complexity of perception. The former is measurable, and the latter is very nebulous at best.

                          1. 3

                            What’s the difference between input and perception? If you don’t perceive something why would it be considered input?

                    2. 2

                      You’re entirely right. His statements are totally wrong. Here is a recent-ish article that does a decent job of describing at least part of the story: https://www.sciencedirect.com/science/article/pii/S089662731200092X

                      1. 1

                        Can you explain the relevance of that article? It didn’t seem to say much about the amount of sensory data that enters our perception, based on the abstract.

                        1. 1

                          It’s because the original poster said

                          Yes, but the brain discards most of this information, and apparently the visual cortex works as a sort of lossy compression filter.

                          That’s just not how human vision works.

                          You are right, though: you only have high-acuity vision in the fovea. But notions of resolution don’t map well to human vision. It’s also just not a thing that’s worth debating. It’s much better to discuss questions of “how much information do you need to recognize X” (generally very little; human vision works well with very small images) than “how many bits per second are coming in”. The second is ill-defined in any case. If I have 20/20 vision, does it really mean that it’s good to think of that as me seeing HD video and someone else seeing SD video? Not really. It just doesn’t answer any useful questions about human vision.

                          1. 2

                            The right questions to ask depend on what you’re interested in. Asking how many bits are coming in isn’t useful for the study of human vision, but it is useful if you’re trying to relate AI to the human mind. Humans “learn” things with much less data than machines, because they have innate capacities built in, which (so far) are too complex to build into a machine learning algorithm. This has been well established since psychology’s departure from behaviorism, but tech people tend to forget it when comparing the brain to computers. Granted there’s no way to determine how many bits are entering your mind, but with enough understanding of visual perception I think we can make the judgement call that you get less data than full-color HD video. Understanding that highlights how little data humans require to learn things about the world.

                            I’m also not convinced that lossy compression is not a good metaphor for human vision. Clearly we’re not using mp4 or mkv, but if you take a wider view of the concept of lossy compression, it makes sense.

                            1. 1

                              It’s actually not a good proxy for comparing humans to machines either. Far more important than # of bits in is what kind of data you’re getting. For example, data where you have some control (like you get to manipulate relevant objects) seems to be far more important for humans. The famous sticky mittens experiments show this very nicely.

                              In any case, HD video is mostly irrelevant for actual AI. Most vision algorithms use fairly small images because it’s better to have lots of processing over smaller images than less processing over bigger ones.

                              I think it’s worth separating “tech people” from people that actually do AI / CV / ML. People that work on these topics aren’t being confused by this. There’s a big push in CV and NLP to try to include semantically relevant features.

                              It’s worth reading the article I linked to. Human vision is not lossy compression and this model doesn’t fit the data that we have from either human behavior or from neuroscience. Once upon a time people thought this but those days are long long over.

            2. 5

              A reflex takes millions of years to train. You aren’t taking into account the entire lifespan of the human. They are using other patterns to infer the present context.

              I’m not saying we are a statistical tool. I’m saying we can’t actually tell that we aren’t with confidence just like we can’t tell that we are with confidence. I’ll read the links.

              1. 2

                It seems like additional complexity implies that each neuron itself approximates a smallish neural network. This doesn’t really change much outside of the obvious complexity growth and design considerations.

                1. 2

                  Disclaimer: I’m a programmer, not a biologist.

                  The fact that biological neurons exchange RNA (genetic code) looks like something that no artificial neural network can do. If I understand this correctly, it means that each neuron can slowly program the others.

                  Still, you can see in the slides that this is not something I base my reasoning on.
                  I just thought the article could be interesting to you, given your reasoning about reflexes. :-)

                  1. 2

                    I’m also a programmer and not a biologist :V. However, complexity theory hints at the possibility that simple setups can lead to emergent complexity approaching that of a system with more complex agents. Basically, the complexity of the system as it grows exceeds the additive complexity of the individual agents.

                    I think it’s totally reasonable to say that a node-and-edge cluster comes nowhere near the complexity of an individual neuron. You can’t, however, use that to say that a neural network isn’t able to achieve the same level of complexity or cognition. Programming also isn’t different from any function that takes arbitrarily many inputs, which we already know NNs can approximate.

                    That is of course not to say that it CAN do all the above, merely to say that we should be cautious of any claims that are conclusive either way talking about the future.

                    1. 2

                      Nice! You are approaching one of the core arguments of my talk! :-D

                      Programming also isn’t different from any function that takes arbitrarily many inputs, which we already know NNs can approximate.

                      No AI technique that I know of, whether supervised, unsupervised, or based on reinforcement learning, can remotely approach a function that produces functions as output.

                      For sure, no technique based on artificial neural networks: there is no isomorphism between the set of outputs of the continuous functions they can approximate (i.e. ℝ) and the set of functions. So whatever the size of your Deep Learning ANN, no current technique can produce an intelligence, simply because an ANN cannot express a function through its output.

                      It’s quite possible that, a couple of centuries from now, we will be able to build an artificial general intelligence, but the techniques we will use will be completely different from the ones we use now.
                      Moreover, I guess that the role played by ANNs will be peripheral, if not marginal.

                      That’s the worst threat that the current marketing hype poses to AI.
                      It’s the most dangerous one.

                      Eager to attract funds, most of the research community is looking in the wrong direction.

                      1. 1

                        There’s a difference between “no current technique can ever produce” and “no technique has produced”. It’s not impossible that a dumb technique could have complex consequences as the complexity increases. Sure, we may not see it in our lifetime, but I think it’s very premature to call it a dead end for intelligence. We should definitely also travel down other paths, but calling it a dead end is, I think, severely jumping the gun.

                        1. 3

                          Just to clear something up, since this person is still spreading FUD: there’s plenty of AI that deals with program induction, i.e., ML that learns functions. There’s a lot of NN work on this now. Just search for “program induction neural networks” on Google Scholar.

                          The world is full of these people that don’t know anything about a topic but think they’re the next messiah and that they’re the only ones who see the truth. My physicist and historian friends always complain about the crackpots they have to fight. Guess it’s the turn of the AI folks!

                          1. 1

                            ROTFL! :-D

                            Thanks for the suggestion.

                            I know nothing about program induction and will surely study the papers I find on Google Scholar.

                            I suggest you open your mind too.

                            I do not pretend to know something I don’t.

                            But if someone tells me that current computers are not deterministic, I can’t help but doubt their understanding of them.

                            I guess you have never debugged a concurrent multithreaded program.

                            I did.
                            And I’ve also debugged multiprocessor concurrent kernel schedulers.

                            Trust me: they are buggy but still deterministic.

                            They just look crazy and non-deterministic if you do not understand the whole input of your program, which includes time, for example.

                            The input of a program is everything that affects its computation.

                            Also, assuming that I am spreading FUD makes it impossible to address the real issues in my slides.

                            I may suspect that you are spreading hype. ;-)

                            But I’m still eager for links and serious objections.

                            Because I want to know. I want to learn.
                            This requires the acceptance of one’s ignorance.

                            Do you want to learn too? ;-)

                            1. 1

                              Re determinism: I’ve heard (from a reputable source) that you can get pretty good entropy by turning up the gain on an unplugged microphone (electrons tunnel about, causing just enough voltage fluctuations to hiss).

                              Would love to have a reason to need it…
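
                              If anyone wants to try the idea, here is a rough sketch of one way to do it (assuming the third-party sounddevice package, that the default input is the noisy one, and using a hash only as a simple way to whiten the samples):

                              ```python
                              import hashlib
                              import sounddevice as sd

                              fs = 44100                  # sample rate in Hz
                              samples = sd.rec(fs, samplerate=fs, channels=1, dtype="int16")  # 1 second of "silence"
                              sd.wait()                   # block until the recording is done

                              # Whiten the raw noise by hashing it; the digest is at most 256 bits of entropy,
                              # and only as good as the underlying physical noise actually is.
                              seed = hashlib.sha256(samples.tobytes()).digest()
                              print(seed.hex())
                              ```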

                              1. 1

                                What about GPG key generation?

                                1. 2

                                  I can get enough entropy for that by mashing my keyboard and waving the mouse about. I’d combine the soundcard approach with an HSM if I needed a lot of entropy on a headless box and didn’t want to trust the HSM vendor.

                            2. 1

                              Thanks for being a voice of reason about all this; I honestly don’t have enough domain experience to really hold my ground on “Let’s not jump to conclusions”.

            3. 8

              A huge number of issues. I work on this and sadly this is spreading a lot of misinformation.

              “We call artificial neural networks a class of deterministic algorithms that can statistically approximate any function”

              This isn’t true for many reasons.

              There are many NN approaches that are not deterministic. Actually, there are NN approaches that rely on noise.

              Also, there are other algorithms that can approximate any function. And single-layer networks are ANNs, but they can’t approximate any function.

              I’m also not sure what “statistically approximate” means.

              their output can always be explained (till quantum computing)

              I don’t know what this means. QC can also be explained just fine. It’s a bunch of linear transformations. That’s not what people mean when they say explain.

              there is no way to prove they are approximating a specific discrete function

              I don’t know what this means. I can make lots of continuous functions that will be epsilon within whatever discrete function you pick.
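
              For what it’s worth, a tiny illustration of that last point (a hypothetical example; the steepness k and the window around the jump are arbitrary): a steep logistic curve stays within a small epsilon of the Heaviside step everywhere except an arbitrarily narrow window around the discontinuity.

              ```python
              import numpy as np

              def step(x):                 # the discrete target: 0 for x < 0, 1 for x >= 0
                  return (x >= 0).astype(float)

              def smooth_step(x, k=100.0): # continuous (even differentiable) approximation
                  return 1.0 / (1.0 + np.exp(-k * x))

              x = np.linspace(-1, 1, 10001)
              mask = np.abs(x) > 0.1       # ignore a small window around the jump
              print(np.max(np.abs(step(x[mask]) - smooth_step(x[mask]))))  # ~4.5e-05
              ```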

              AI is not accountable, so it cannot take decisions over humans

              Maybe shouldn’t? But it certainly can.

              The whole “Function” slide is needlessly complex. I wouldn’t let my students put that up.

              If you suspect that a function exists

              I don’t know what this means. If functions are just maps what do you mean that a function doesn’t exist?

              you can try to statistically approximate it with a neural network

              “statistically approximate it” doesn’t mean anything.

              This is the strongest strength of neural networks.

              Other models can approximate any function. That’s not the magic of NNs.

              we need a big data set to filter out unwanted functions with each sample we feed to it.

              Except that this is really bad intuition. NNs seem to overfit much less than people expect. Folks train networks with more parameters than data points and they still seem to generalize to new data.

              Still, infinitely many functions fit our samples!

              Eh. That’s always true. Infinitely many graphical models will fit something…

              We can not really know which function a complex ANN will approximate.

              I don’t even know what to make of this. Is this a statement that you don’t know what your network will learn? Well.. ok. I mean, that’s the point of training something isn’t it?

              Can we move from narrow intelligence to general intelligence?

              Domain and Codomain depends on “hardware”, intelligence does not

              What? Brains take input. They produce outputs.

              Domain and Codomain are (potentially) infinite sets

              So? RNNs can make any number of sequences of tokens.

              No equality relation in the Codomain

              shrug Who cares?

              The whole domain -> codomain thing being transitioned into perception -> action isn’t deep at all. Heck, people have been training models to do that for decades. I even do that.

              I’m not even going to touch the whole knowledge section. It’s not even wrong. It’s naive. This isn’t how people conceive of models or knowledge in cognitive psychology, neural networks, linguistics or philosophy.

              (to prove) to be general, an Artificial Intelligence should be able to discover and explain us new abstractions and functions over them

              Eh.. Kittens have general intelligence. Good luck getting them to do any of this! This is a deep question, how do you know something is intelligent. But this isn’t one of the answers.

              Artificial General Intelligence is Artificial Super Intelligence!

              Oh God… There are so many constraints on intelligence. The speed of light (which puts constraints on how big a chip can be and how big your brain can be). The speed of chemical reactions. The energy density at which your CPU melts into a heap / your brain swells / you run out of food, etc. The idea that AI is automatically superintelligent is pop science.

              So, where is the intelligence?

              This was actually ok! I wouldn’t say it that way but it’s not objectionable!

              I stopped here. :(

              1. 3

                Yes, I suspected something was up with the level of certainty the author had given the evidence they were proposing. Obviously I’m a novice, but I have some degree of intuition about the things we don’t or can’t know yet. Saying with certainty that neural nets will never achieve intelligence reeks of anthropocentric bias, even if it were true.

                1. 1

                  I don’t think the author necessarily implies that. It would be kind of a crazy thing to claim that something that we vaguely have a handle on can or can’t do something we can’t even define or agree about when we observe.

                  1. 1

                    That’s very generous of you.

                    1. 1

                      I’m kind of a crazy guy. :-)

                      As said, I’m not an expert in statistics.

                      And you are right, many AI researchers have only a vague handle on ANNs.

                      As for intelligence, maybe we cannot agree, but for sure we can define it. There are several definitions actually. Legg and Hutter describe some of them.

                      I propose my own, as a composition of a few other functions. It has some advantages over other definitions and obviously it has some disadvantages.

                      For sure it shows how far we are from AGI. Is it an advantage? Boh! :-D

                  2. 1

                    Huh! Thanks for your answer, and sorry if you feel somehow sad about this!

                    I think that most of your objections come from a shallow read of the slides. But I can assure you that other colleagues, who also work with ANNs in particular, find them pretty well-founded and clear.

                    Here I pick out and answer only those of your objections that are actually connected to what is meant there.

                    There are many NN approaches that are not deterministic. Actually, there are NN approaches that rely on noise.

                    False. Computers are deterministic.

                    When you add true entropy to an algorithm that runs on a computer, you randomize the input, not the algorithm.

                    The random contribution to the computation can be recorded so that the computation can be replicated.
                    And you will always get the same outputs.

                    Obviously, to do that, you must have a clear understanding of what your input is.
                    Obviously, if you forget a piece (the random part) you cannot reproduce your own results!
                    But if so… are you sure you are working in the field?
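
                    To make this concrete, a minimal sketch (a hypothetical toy example): record the noise you feed to a “stochastic” computation, replay it, and you get bit-identical outputs.

                    ```python
                    import numpy as np

                    def noisy_update(weights, noise):
                        # Toy "stochastic" step: the randomness is just one more input.
                        return weights + 0.1 * noise

                    rng = np.random.default_rng(seed=42)   # record the seed (or the drawn noise itself)
                    noise = rng.standard_normal(3)

                    w1 = noisy_update(np.zeros(3), noise)
                    w2 = noisy_update(np.zeros(3), noise)  # replay the recorded noise

                    assert np.array_equal(w1, w2)          # same recorded inputs -> same outputs
                    ```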

                    I’m also not sure what “statistically approximate” means.

                    Nothing exotic: “statistically approximate” means to approximate through the use of statistics.

                    An ANN is a statistical algorithm. Just like K-means is.
                    It’s obvious that it’s statistics: you can only use it if you have tons of data.

                    Had you read all the slides, you would have seen that I explain how the fancy names we are using are good for literature (and business) but not for science. They fool the experts too!

                    we need a big data set to filter out unwanted functions with each sample we feed to it. … We can not really know which function a complex ANN will approximate.

                    I don’t even know what to make of this. Is this a statement that you don’t know what your network will learn? Well.. ok. I mean, that’s the point of training something isn’t it?

                    You have been fooled by the language you use: ANNs do not learn anything. They approximate.

                    I mean that underfitting and overfitting are two faces of the same coin: you do not know which one, of the infinitely many functions that fit your dataset, your ANN will approximate.

                    If you are lucky, it starts to approximate a function that is similar to the one you actually desire.

                    Otherwise it starts to approximate a function that works well in the region of your training dataset but not outside it (overfitting), or it does not even fit the whole training dataset well enough (underfitting).
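
                    As a toy illustration (a hypothetical example; the data, polynomial degrees and evaluation point are arbitrary): many different functions fit the same noisy samples, and a model that matches the training region can still behave very differently outside it.

                    ```python
                    import numpy as np

                    rng = np.random.default_rng(0)
                    x = np.linspace(0, 3, 10)
                    y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

                    underfit = np.polynomial.Polynomial.fit(x, y, deg=1)   # too simple for the data
                    overfit = np.polynomial.Polynomial.fit(x, y, deg=9)    # passes (almost) through every point

                    x_out = 4.0                                            # outside the training region
                    print(np.sin(x_out), underfit(x_out), overfit(x_out))  # both can be far from sin(4)
                    ```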

                    AI is not accountable, so it cannot take decisions over humans

                    Maybe shouldn’t? But it certainly can.

                    How? We put it in jail? We turn it off? How?

                    Are you sure you work in the field? Oh… yes I can see… you are! ;-)

                    No equality relation in the Codomain

                    shrug Who cares?

                    The child killed by a self-driving car that turned left by 3 cm.

                    It turned 3 cm to the left just like every other time. Just in the wrong place at the wrong time.

                    I’m not even going to touch the whole knowledge section. It’s not even wrong. It’s naive.

                    Oh, this is a good objection! Can you back it up with some papers or even a book we can read?

                    The question is not rhetorical. I would really appreciate such links.

                    (to prove) to be general, an Artificial Intelligence should be able to discover and explain us new abstractions and functions over them

                    Eh.. Kittens have general intelligence. Good luck getting them to do any of this!

                    Ehm… funny you should talk about kittens… :-)

                    Did you see the cat in the slides? Did you understand what it means?

                    You are so good at pattern matching that you see a cat even if you know that there is no cat there.

                    The same happens when we see a neural network at work: we see an intelligence, but there is no intelligence there.

                    It’s also what happened with the Lumière brothers’ first train films.
                    People saw a train coming towards them, but there was no train.

                    And it’s the same with kittens.
                    You see an intelligence because you project your own experience to explain their behavior.

                    But there is no intelligence there.

                    The idea that AI is automatically superintelligent is pop science.

                    First, I was not talking about Artificial Intelligence, but about Artificial General Intelligence.

                    But had you read more carefully, you would have understood that saying that AGI is ASI is an obvious consequence of the definition of AGI that I propose.

                    In particular, to be general it must be able to abstract. That means to be able to identify concepts and functions on its own.

                    Now, we as humans first react to perceptions and then learn from them. Sometimes days later.
                    This is an effect of our biological evolution. But it also means that our reactions are always suboptimal when we face an event that we cannot explain, because it contradicts our knowledge and predictions.

                    A machine would not have this limit. It can integrate the new perception into its knowledge first and use the new knowledge to react. This gives it an edge over humans.

                    So when we will create an artificial general intelligence, we will create an intelligence with an edge over us.

                    1. 3

                      sorry if you feel somehow sad about this!

                      I feel very sad. I fight against such misinformation all the time. If you had someone with a PhD in machine learning look this over and they were ok with it, I fear for the state of our field.

                      False. Computers are deterministic.

                      Ugh. No. That’s silly. If you can produce noise indistinguishable from uniform noise then all of this is an irrelevant detail. It makes zero difference if I happen to hook up a better source of entropy to my computer or not.

                      This shows that you don’t understand what’s going on here from a mathematical point of view, a recurrent theme on these slides. It leads to a lot of needless confusion.

                      Nothing exotic: “statistically approximate” means to approximate through the use of statistics.

                      It doesn’t. That’s just not how people use the terminology. By teaching your audience bad and confusing terms you make it harder for them to communicate with anyone. That’s a meaningless phrase.

                      It’s obvious that it’s statistics: you can only use it if you have tons of data.

                      This is wrong on many levels. Statistics has 0 to do with large amounts of data. Some graphical models have a lot of free parameters and some require 0 training.

                      An ANN is a statistical algorithm

                      This is a meaningless statement. I can use methods from a field to analyze an algorithm, but then I can use whatever methods I feel like from any field. We can talk about deterministic algorithms or non-deterministic algorithms or randomized algorithms, etc. Each of these have a very technical meaning and none of these terms mean what you refer to with the words “statistical algorithm”.

                      that I explain how the fancy names we are using are good for literature (and business) but not for science.

                      The fancy names we use are fine for science. It’s just that you’re misusing them, as I’ve pointed out in numerous places.

                      ANNs do not learn anything. They approximate

                      This is what I mean. I haven’t been fooled. You’re coming up with your own nonsense terminology that no one in ML or AI uses because you object to something for some vague philosophical reasons. “Learn” has a technical and mathematical meaning in ML that everyone understands.

                      I mean that underfitting and overfitting are two faces of the same coin

                      This is nonsense. The two happen for totally different reasons in different models and you do different things when they happen. It sounds nice, but it’s useless.

                      AI is not accountable, so it cannot take decisions over humans

                      Maybe shouldn’t? But it certainly can.

                      How? We put it in jail? We turn it off? How? Are you sure you work in the field? Oh… yes I can see… you are! ;-)

                      You said that AI can’t take decisions over humans and I said that maybe it shouldn’t but it actually is. I don’t see how putting it in jail has anything to do with that.

                      And now I’m done. If you’re going to insult the people that actually do the things that you purport to “explain” without actually understanding anything I’m out of here.

                      But I can tell you. You are doing your audience and everyone that happens to listen to you a massive disservice by spreading incorrect terminology, bad ideas, and just all around ignorance.

                        1. 2

                          False. Computers are deterministic.

                          Ugh. No. That’s silly. If you can produce noise indistinguishable from uniform noise then all of this is an irrelevant detail. It makes zero difference if I happen to hook up a better source of entropy to my computer or not.

                          Dude, until the advent of quantum computing, computers will be deterministic machines.

                          Their output can always be reproduced from their input.

                          I’m afraid for your students if you do not understand this.

                          If you ignore part of the input of your algorithm (e.g. the time at which concurrent events occur, or the noise you use, or a random seed, or anything else that affects the execution), you can fool yourself into thinking that it’s not deterministic.

                          But the algorithm is deterministic anyway. You just need a crash course in debugging.

                          If you’re going to insult…

                          Your first line, in your first response was

                          A huge number of issues. I work on this and sadly this is spreading a lot of misinformation.

                          Maybe it was unintended, but it didn’t sound very polite to me… ;-)

                          I have no intention to insult anybody. And, as I wrote in the slides, I’m not an expert in statistics.

                          I’m very open to learning from you if you have something to teach.
                          But you shouldn’t assume you can bullshit me with a lot of vague objections.

                          For example you said that my definition of knowledge is naive. Fine!
                          Please provide alternative definitions! I’m eager to read them and study them in the documents you will propose. Really!

                          You say that neural networks are not statistical tools.
                          I can understand that you want to distinguish your field of competence (and your marketing segment) from that of statisticians, but I argue that ANNs are statistical tools. Just like K-means clustering is. And the rest of ML, for what it’s worth.

                          Indeed, according to Wikipedia:

                          Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.

                          Guess what? It’s just what you do with ANNs! You analyze and organize data.

                          By considering ML and ANNs as simple statistical tools the whole field will progress faster!

                          You say that “training” is a good term for science.

                          I argue that it’s too anthropomorphic: you do not train anyone, you just calibrate an algorithm!
                          Indeed you establish weights!

                          You do not create an “artificial intelligence”, you just “simulate an intelligence”. And so on…

                          To me, all this hype about AI seems a huge collective hallucination that hurts the research itself, because it fools researchers.

                          1. 1

                            until the advent of quantum computing, computers will be deterministic machines

                            A year working on a project that deploys exactly the same code to 1000 identical hosts will smash this preconception, at least in a practical sense. Even code that is intended to be deterministic will sometimes pick up stochastic meta-inputs. There is much zen in treating all code as probabilistic. Any proof based on a “perfect” computation is theoretical at best.

                            1. 1

                              I can feel your pain, really!

                              Actually, I work with a couple of systems deployed onto a bit fewer than 2000 heterogeneous machines, which include clumsy stacks (you know… browsers :-D).
                              And in the past I’ve worked with systems distributed over a couple of hundred thousand such machines.

                              You are just confusing what is expensive with what is concretely possible.

                              Good architects and developers that keep things simple on a large scale are pretty expensive.

                              Experienced hackers that can analyze large amount of logs are even more expensive.

                              But it’s always a matter of what is at stake.

                              Believe me: when a well-known bank realizes that one of its customers faced a bug in their system that might cause the bank to be sued (and lose in court), it does not matter how much it costs: the bug will be reproduced, understood and fixed (and several other unrelated bugs will be identified and fixed in the process! There’s a great irony in this!).

                              And the reputation of a bank is not worth a human life. Or the discrimination of a minority.

                              Computer systems can be complex.

                              Actually, the best ones have a low ratio between complexity and value provided. Thus they can evolve in a predictable and smooth way.

                              But they are always deterministic.

                              It’s just a matter of cost and competence.

                        2. 1

                          There are many NN approaches that are not deterministic. Actually, there are NN approaches that rely on noise.

                          False. Computers are deterministic.

                          So are humans?

                          1. 1

                            What do you mean? :-D

                        3. 1

                          No equality relation in the Codomain

                          shrug Who cares?

                          Re-reading the slide I realized that it might not be clear why this is relevant.

                          It’s related to the “complex” slide on Function.

                          Two functions are equal, if and only if

                          • they have the same domain
                          • they have the same codomain
                          • they follow the same rule

                          The rule part is what makes the equality relation relevant.

                          If we state the equality as f(x) = g(x), we are assuming an equality relation in the codomain.

                          Instead, stating that f(x) = y <=> g(x) = y does not assume such a relation.

                          But to avoid the need for an equality relation in the codomain, you need the co-implication.

                          But the only way to prove the co-implication without an equality relation in the codomain is to prove that each of the rules followed by the two functions can be logically deduced from the other one.

                          That is: the two rules are the same one, expressed in different ways.