Threads for rpglover64

  1. 4

    The mention of (now deprecated) btrfs had me scroll up to check the date: 2015. Time flies!

    1. 5

      (now deprecated) btrfs

      Link to deprecation notice? I was under the impression that it was still under active development.

      1. 10

        I assume @varjag is referring to this redhat doc, stating that:

        Btrfs has been deprecated. The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a fully supported feature and it will be removed in a future major release of Red Hat Enterprise Linux. The Btrfs file system did receive numerous updates from the upstream in Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise Linux 7 series. However, this is the last planned update to this feature.

        1. 3

          Some people are still developing it, but Red Hat is no longer interested.

        2. 4

          SuSE still uses btrfs by default, AFAIR, so it’s not deprecated as such, but it also doesn’t have a lot to recommend it…

          There is bcachefs, still in development; but even if it is successful, I would assume it would be at least a decade before it would be a real competitor for even present-day ZFS (which presumably would not stand still).

        1. 3

          Another technique I’ve wanted to explore, but haven’t yet, is property-based testing. As far as I understand, it’s related to and complementary to fuzzing.

          One of the best resources I’ve found for property-based testing is this. Its target audience is just-barely-not-beginners, and it really helps get over the hump of “What properties do I even write?”

          1. 3

            I’m reminded of Lauren Ipsum, a much more ahem traditional story for kids, reminiscent of Alice in Wonderland and The Phantom Tollbooth which lightly touches on some crypto and security.

            1. 10

              This isn’t about debugging or UNIX for that matter. It’s about “normalization of deviance”, that one gets used to an incomplete, unfinished tool/feature/system, come to rely on it, and then something doesn’t work and isn’t transparent about “why”.

              Reminds me of Ken Thompson’s “ed” editor and its sole error message, “?”, because you should be able to figure out why on your own.

              So he’s right that UNIX is filled with the issue from the beginning, but he’s a bit obscure about the connection to the debugger. gdb is no different from tons of similar debuggers I’ve used for decades on systems unrelated to UNIX as well, so don’t blame it.

              If something refuses to function, there needs to be a means to say “hey, this part over here needs attention”. In this case, it’s like an uncaught exception, which is shorthand for “figure it out yourself”, i.e. back to Ken Thompson’s world as above.

              These things happen because someone doesn’t have the time or interest to finish the job. All kinds of software are left similarly uncompleted, because it works well enough to get the immediate job done, and the rest is quickly forgotten. Hardware examples abound, like the Pentium floating-point errors and similar variants.

              1. 2

                I’m reminded of this presentation, aimed at language (in a very broad sense) designers about how to make better error messages.

                1. 1

                  Thanks for that link. It is exactly what I needed at this time.

              1. 4

                Great article. Everyone assumes that if you know Ruby you can pick up Python trivially and vice versa. The truth is you CAN, but as the article details, the devil is in the details, and in particular, the idioms.

                One that still bites me TO THIS DAY, after ~3 years of Python, is that my brain still wants to default to:

                def myfunc()
                    correct_record = find_correct_record(yada,yada)
                    correct_record
                end
                

                Which will TOTALLY fail under Python because you didn’t put the ferschlugging “return” statement in there :)
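
                For the record, what my fingers are supposed to produce is just:

                def myfunc():
                    correct_record = find_correct_record(yada, yada)
                    return correct_record  # the explicit return is the whole fix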

                1. 4

                  THIS

                  3 times today as I was writing new Python code, my test failed due to forgetting a return. 3 separate times! Granted, I was not at my peak today.

                1. 12

                  Thanks for sharing.

                  In Zig, types are first-class citizens.

                  I get what you are trying to say but for types to be first-class citizens you’d have to have a dependently typed programming language like Coq or Agda. Your comptime restriction is commonly known as the “phase distinction”. The ML module system works similarly and can be completely understood in terms of a preprocessor for a non-dependently typed language.

                  I think this contradicts what you say later:

                  But more importantly, it does so without introducing another language on top of Zig, such as a macro language or a preprocessor language. It’s Zig all the way down.

                  I would rather say that the preprocessor language is Zig, and you’re using comptime to drive preprocessing. And using a language to preprocess itself is the classic Lisp approach to macros, so I don’t know if trying to distance yourself from macros makes sense here.

                  Re your approach to generics: does this flag errors at all call-sites, instead of at definition-site? This is one of the major reasons people don’t like using templates to implement polymorphism.

                  1. 5

                    to be first-class citizens you’d have to have a dependently typed programming language

                    This is the definition I used of “first-class citizen”:

                    In programming language design, a first-class citizen (also type, object, entity, or value) in a given programming language is an entity which supports all the operations generally available to other entities. These operations typically include being passed as an argument, returned from a function, and assigned to a variable.

                    Scott, Michael (2006). Programming Language Pragmatics. San Francisco, CA: Morgan Kaufmann Publishers. p. 140. (quote and citation copied from Wikipedia)

                    In Zig you can pass types as an argument, return them from a function, and assign them to variables.

                    But if some people have a different definition than this, that’s fine, I can stop using the term. My driving force is not an academic study in language design; rather it is the goal of creating a language/compiler that is optimal, safe, and readable for the vast majority of real world programming problems.

                    Re your approach to generics: does this flag errors at all call-sites, instead of at definition-site? This is one of the major reasons people don’t like using templates to implement polymorphism.

                    It flags errors in a sort of callstack fashion. For example:

                    https://gist.github.com/andrewrk/fd54e7453f8f6e8becaeb13cba7b9a7d

                    People don’t like this because the errors are tricky to track down, is that right?

                    1. 2

                      That isn’t what I mean. I’d try this on Zig itself but I can’t get it to install. Here is an example in Haskell:

                      foo :: (a -> Int) -> a -> Int
                      foo f x = f (0, 1)
                      
                      test1 = foo length "hello"
                      test2 = foo (\x -> x + 1) 9
                      

                      This raises one type error, saying that a doesn’t match (Int, Int). This is what I mean by flagging at the definition site.

                      In a language where errors are flagged at the call-sites (aka, just about everything that uses macros), this would raise two errors: length doesn’t expect a tuple, and + doesn’t expect a tuple. Which is it in Zig?

                      1. 2

                        If I understand your example correctly, this is the similar Zig code:

                        fn foo(comptime A: type, f: fn(A) -> i32, x: A) -> i32 {
                            f(0, 1)
                        }
                        
                        fn addOne(x: i32) -> i32 {
                            x + 1
                        }
                        
                        const test1 = foo(i32, "hello".len);
                        const test2 = foo(i32, addOne, 9);
                        

                        Produces the output:

                        ./test.zig:9:18: error: expected 3 arguments, found 2
                        const test1 = foo(i32, "hello".len);
                                         ^
                        ./test.zig:1:1: note: declared here
                        fn foo(comptime A: type, f: fn(A) -> i32, x: A) -> i32 {
                        ^
                        ./test.zig:2:6: error: expected 1 arguments, found 2
                            f(0, 1)
                             ^
                        ./test.zig:10:18: note: called from here
                        const test2 = foo(i32, addOne, 9);
                                         ^
                        ./test.zig:5:1: note: declared here
                        fn addOne(x: i32) -> i32 {
                        ^
                        

                        If I understand correctly, you are pointing out that the generic function foo in this case is not analyzed or instantiated until a callsite calls it. That is correct - that is how it is implemented.

                        1. 1

                          Yup, I believe this is a correct understanding of cmm’s comment. In Rust (and Haskell, OCaml, etc.) generic functions are analyzed without instantiation, which leads to better diagnostics. C++ doesn’t do this, and that’s why C++’s template-related diagnostics are terrible.

                      2. 1

                        […] supports all the operations generally available to other entities.

                        You cannot pass types at runtime, at least that’s the impression I got from the article.

                    1. 1

                      With Go and No-Limit Poker falling to bots I don’t think there are any games left where humans dominate bots, just games that bots haven’t been written for yet.

                      1. 3

                        Ha! I can’t wait to see the Starcraft results. The complexity is way higher than these toy games that so simplify the interactions between combatants. I want to see it combine 3D object recognition, planning, replanning, detecting/making bluffs, and so on. These games, especially simple mechanics + billions of moves analyzed, make it too easy for the AIs to outdo humans. I want to see harder situations closer to the messiness that is the real world. Then, see them do the same with as little data as we work with on the games themselves.

                        1. 2
                          1. 2

                            I know. So far, though, all the best bots for Starcraft were defeated by humans trivially despite defeating other bots with amazing displays of micro or planning prowess. The humans just spotted their patterns then worked around them. Occasionally, they bluffed them to hilarious end.

                            An example was the bot who did battles between units based on a scoring mechanism to determine the strength of one, a group, and so on. Human sensed this. Counter was to send large group of units at an enemy group to make them scatter. Then, split that group into smaller ones to go after each individual enemy unit. Scoring made enemy units weaker than what was sent after them. So, enemy units always would flee without firing a single shot back at human players despite actually being able to defeat a bunch of them. Another example was even simpler where human just ran some units randomly through the enemy bases while building an army to hit them. Enemy AI was unprepared as it wasted all its focus countering the “attack” since it couldn’t call a bluff.

                            Examples:

                            https://www.cs.mun.ca/~dchurchill/starcraftaicomp/history.shtml

                          2. 2

                            It’s not StarCraft-specific, but here’s some progress on the graphics recognition you wanted: https://arxiv.org/abs/1805.11592

                            1. 1

                              That’s really neat. Thanks! On the other end of things, there’s some circumstantial evidence that my prediction in favor of humans on Starcraft will be right.

                            2. 2

                              I agree with Nick that we’re far from bots dominating all games. I’m not sure Starcraft will hold out that long once they get going on it because computers have an inherent edge in the reflex-based/micro parts of the game.

                              I’d love to see someone work on competitive bots for more complex games of imperfect information, like Magic or Netrunner. Although I suppose computers have a similar edge to the one they have in Starcraft, in that they can count cards and compute probabilities much faster and more reliably than humans.

                              I’m also interested in competitive card games because metagaming (i.e. having a good understanding of the dominant strategies and styles of play, how to beat the others with yours, etc.) is necessary to do well. How will we give the machines metagame knowledge? Having a metagame also complicates the definition of “fair” for human vs. computer play.

                              1. 2

                                Indeed, even amateur bot authors report a significant edge:

                                This achieves an “idle” APM of about 500 and its battle APM is between 1000 and 2000 (as measured by SC2’s replay counter.)

                                I’m not into SC2, but I found some fans saying tournament-level humans operate around 150 APM with spikes into the 300-600 range.

                                I think metagaming is a distinction without a difference to AIs like Alpha Go.

                                1. 1

                                  I think metagaming is a distinction without a difference to AIs like Alpha Go.

                                  I don’t think you’re correct, at least not for all games.

                                  Consider this scenario:

                                  I’m trying to build an AI for a rock-paper-scissors-lizard-spock tournament. An AI like Alpha Go will ultimately find the optimal strategy: pick uniformly at random. However humans suffer from biases, so a human who has studied their opponent’s habits may place better in the tournament than the AI.

                                  In games like Go, this doesn’t seem to end up being particularly important. In games like Starcraft, I could be convinced that it’s not too important. In games like MtG, I think it’s quite important, at least until the AI successfully models deck construction as part of the game, which is qualitatively different than what Alpha Go has already done.
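
                                  A toy sketch of the kind of mismatch I mean (hypothetical bots, standard RPSLS rules; nothing rigorous): the uniform player is unexploitable but can’t exploit anyone, while a simple frequency-counter punishes a biased opponent.

                                  import random
                                  from collections import Counter

                                  MOVES = ["rock", "paper", "scissors", "lizard", "spock"]
                                  BEATS = {  # which moves each move defeats
                                      "rock": {"scissors", "lizard"},
                                      "paper": {"rock", "spock"},
                                      "scissors": {"paper", "lizard"},
                                      "lizard": {"spock", "paper"},
                                      "spock": {"scissors", "rock"},
                                  }

                                  def uniform_bot(opponent_history):
                                      # Game-theoretically optimal: ignore the opponent entirely.
                                      return random.choice(MOVES)

                                  def exploiting_bot(opponent_history):
                                      # Counter the opponent's most frequent move so far; this beats a biased
                                      # opponent, but gains nothing against a uniform one.
                                      if not opponent_history:
                                          return random.choice(MOVES)
                                      favourite, _ = Counter(opponent_history).most_common(1)[0]
                                      return random.choice([m for m in MOVES if favourite in BEATS[m]])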

                                  1. 2

                                    There’s a metagame in Chess and Go, though it’s certainly much smaller than in MtG. The popularity of openings and responses to common gambits has changed over the years, just like it does in MtG. It didn’t matter.

                                    “Metagame” is a convenient distinction for people to talk about playing matches in the larger game of deckbuilding, but that doesn’t make it any less of a game. The task of analyzing and responding to “metagame” strategies is exactly the same as the task of playing the live game itself.

                                    It’s going to be years before a state-of-the-art AI implementer turns to MtG, but I’m not going out on a limb when I say that humans will lose.

                                    1. 1

                                      but I’m not going out on a limb when I say that humans will lose.

                                      I agree that humans will lose eventually, but I think your original claim of “distinction without a difference” is too strong, because it asserts that humans will lose to AI that is largely of the same form as the AI now.

                                      I would not be surprised (though I am not confident it is the case) if the success of Alpha Go’s architecture is limited in MtG and if competing with humans requires a different architecture.

                                2. 1

                                  computers have an inherent edge in the reflex-based/micro parts of the game

                                  Tournament organizers might initially allow computers to have that edge, but eventually, to keep the games interesting, I expect they would cap the AI’s actions per minute to the level of a top human.

                                  1. 3

                                    That’s what I recommend at first. It will help us assess what they can pull off given similar constraints to humans. I’m sure someone will complain that it’s basically cheating for humans because the computers could do better. In that case, I would be fair by limiting the computer to the size and energy consumption of the human brain but removing the actions per minute limit. The bot authors would ask to reinstate the APM limit instead.

                                3. 1

                                  I want to see it combine 3D object recognition

                                  Here’s an approach.

                                4. 1

                                  Your comment made me look up the state of Arimaa, a game designed to be difficult for computers using then-standard techniques: TIL that computers beat humans at it in 2015 (source).

                                1. 10

                                  You don’t want to require programmers to have a degree, so C++ is out.

                                  As a Haskell programmer, this made me laugh.

                                  1. 3

                                    I can only report from personal experience, but Johns Hopkins might be a good consideration. They have a very practical CS department, and there are plenty of theory-heavy courses as well. There is nothing Haskell-specific, though. Let me know if you have questions.

                                    1. 9

                                      It should be pointed out that this experiment was abandoned after 10 days. But it was done entirely in a text-based terminal: no display servers at all. The criticisms of the experience are what you would expect of such a stunt.

                                      1. 1

                                        This kinda sours me on the rest of the article:

                                        But for security-related applications it is completely opposite: you definitely don’t want anyone to be able to figure out your randomly generated private key. If we wanted to generate random values for cryptographic purpose, we definitely wouldn’t want the random generator to be so explicitly predictable.

                                        It incorrectly implies that deterministic randomness is incompatible with cryptography, when the truth is the exact opposite. It conflates being able to predict future values using the key with being able to recover the key given outputs; the latter is anathema to crypto, but the former is reasonable.
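
                                          To make the distinction concrete, here’s a toy deterministic generator (illustrative only, not a real cryptographic construction): with the key in hand every output is trivially predictable, which is fine; what has to be infeasible is going the other way.

                                          import hashlib
                                          import hmac

                                          def keystream(key: bytes, blocks: int):
                                              # Fully deterministic given `key`: anyone holding the key can "predict"
                                              # the whole stream. Recovering `key` from the outputs would require
                                              # breaking HMAC-SHA256, and that is the property that actually matters.
                                              for counter in range(blocks):
                                                  yield hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()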

                                        1. 1

                                          Yes, a stream cipher is perfectly dependent and predictable based on the key, but that’s really just rearranging the problem. Where does the seed for your RNG come from in a functional world?

                                          1. 1

                                            Plausibly, a KDF and a user’s password. Of course, now the question is how you get the salt.

                                            I understand that you need an IO Something; I just really didn’t like the way the above paragraph tried (and IMO failed) to communicate that fact.
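
                                              Something like this is what I had in mind (Python as a stand-in; the iteration count is an arbitrary placeholder):

                                              import hashlib

                                              def seed_from_password(password: str, salt: bytes) -> bytes:
                                                  # Deterministically derive a 32-byte seed from the password;
                                                  # the salt still has to come from somewhere (stored config, or IO).
                                                  return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)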

                                        1. 1

                                          I get that leaking associated contact info and potential passwords is bad but the author seems really against username enumeration. However, I don’t see how that’s avoidable. If usernames are unique, then there’s probably some preemptive username availability checker that will help you enumerate. Even if there isn’t, you can just automate account creation and if you’re rejected because a username exists already, then you hit a collision.

                                          1. 1

                                            You can make it much harder.

                                            First of all, you can set rate limits to make username enumeration by account creation infeasible.

                                            Second of all, you can do away with usernames in favor of emails, at which point you don’t need to worry about notifying the web page user that account creation failed; just send an email saying that the account already exists, but that doesn’t pose an enumeration risk because the information is only available to someone who has access to the email. I guess if you and your spouse both sign up for Ashley Madison using your shared email, that’s a problem, but I think you have bigger things to worry about in that case.
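
                                              Roughly this shape, sketched in Python (the helpers are hypothetical stand-ins for a real account store and mailer):

                                              def account_exists(email): ...   # hypothetical stand-in
                                              def create_account(email): ...   # hypothetical stand-in
                                              def send_email(to, body): ...    # hypothetical stand-in

                                              def handle_signup(email: str) -> str:
                                                  if account_exists(email):
                                                      send_email(email, "You already have an account here; did you mean to log in?")
                                                  else:
                                                      create_account(email)
                                                      send_email(email, "Welcome! Click the link to confirm your address.")
                                                  # Same response either way, so the web client learns nothing.
                                                  return "Check your email to continue."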

                                          1. 3

                                            “Literally ten billion people whose surnames start with “O’” live in Ireland”

                                            There are fewer than 5 million people living in Ireland. Makes me worry about how well researched this is…

                                                1. 6

                                                  https://en.wiktionary.org/wiki/literally#Usage_notes

                                                    It’s literally in the dictionary that using “literally” as a generic intensifier for figurative statements (i.e. to mean “not literally”) has been part of the language since the 1800s.

                                                  1. 3

                                                    It also says that anyone who isn’t a complete savage rejects this, although not literally.

                                                    1. 2

                                                      Only if you take a prescriptivist view on language; descriptivists have no choice but to accept it.

                                                2. 2

                                                    If I was doing some research about how to write my schema, I wouldn’t appreciate unnecessary rhetoric and exaggerations.

                                                  EDIT: To clarify, I’m not making the same argument as adsouza, I don’t doubt that the article is well-researched.

                                                  1. 8

                                                      I don’t think that’s the point, though; this isn’t a well-researched, peer-reviewed document. This is somebody expressing frustrations with existing systems and pointing out some good “you should do this” points, in contrast to the existing articles, which are all “don’t do this thing”.

                                                3. 1

                                                  They only have to fill in 6 web forms a day to hit 10 billion O' a year :~)

                                                1. 3

                                                    “What is there in a name? It is merely an empty basket, until you put something into it.”

                                                  – Charles Babbage, Passages from the Life of a Philosopher

                                                  1. 4

                                                    I’m sorry, your basket has the wrong shape and won’t fit into our narrowly constrained schema.

                                                  1. 1

                                                      When I think about security, in both the virtual and physical world, it really all just comes down to time. How long can you stall until someone breaks into something? A cheap No. 3 Masterlock with a piece of wire - 3 seconds; a Kryptonite bike lock - 4 minutes with a saw; a heavy 1-ton vault bolted to the ground - a few hours with a drill and other tools. I then thought, what if we got a 3rd party to hide our valuables somewhere on the earth without our knowledge, and then they died? Those valuables are STILL not safe. It’s only a matter of time until someone finds them. Cryptography seems to be the same: it’s only a matter of time until we develop techniques that make breaking through encryption feasible within a reasonable period.

                                                      So far the only secure place to store information is within your own mind. But even then, you may succumb to torture or bribery and give up that information. The mind itself has one gatekeeper, and that’s the owner. There are no passwords; there are no identifications.

                                                    When I think about it, we already sort of have this in the virtual world. If you want someone to know something, you can send them a file. In real life, you would speak to this person face to face or give them a document in person.

                                                    AFAIK there is only one unbreakable encryption, and that’s XORPADs. I would opt to use them for sensitive information in combination with modern cryptographic methods to create a very temporary file or something. I also wonder, are there time-sensitive cryptographic techniques? Like a document expires after n seconds, hours, days?

                                                    Just my 2 cents, and probably nothing new.

                                                    1. 3

                                                      AFAIK there is only one unbreakable encryption, and that’s XORPADs. I would opt to use them for sensitive information in combination with modern cryptographic methods to create a very temporary file or something.

                                                      I’ve got two big complaints about this:

                                                        1. There is no meaningful way to combine one-time pads (OTPs, what you probably mean when you say XORPADs) with any other technique without losing all the benefits of the OTP. OTPs are, in many ways, not a form of encryption (the key is not reusable and must be as long as the message); they are better thought of as a form of information splitting (like Shamir secret sharing). There’s a small XOR sketch just after this list.

                                                      2. Is encryption still considered breakable if, under the wildest, most optimistic assumptions of computational growth, the chance of breaking it before the sun goes nova is less than the chance of the oceans spontaneously boiling? Because cryptography is making steady progress in that direction?
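
                                                        A minimal Python illustration of the first point (toy code, obviously not something to use):

                                                        import secrets

                                                        def xor_bytes(a: bytes, b: bytes) -> bytes:
                                                            return bytes(x ^ y for x, y in zip(a, b))

                                                        message = b"attack at dawn"
                                                        pad = secrets.token_bytes(len(message))       # the key is as long as the message
                                                        ciphertext = xor_bytes(message, pad)
                                                        assert xor_bytes(ciphertext, pad) == message  # decryption is the same XOR

                                                        # Reuse destroys it: XORing two ciphertexts made with the same pad cancels
                                                        # the pad and leaves message1 XOR message2 for the attacker to pick apart.
                                                        ciphertext2 = xor_bytes(b"retreat at six", pad)
                                                        leaked = xor_bytes(ciphertext, ciphertext2)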

                                                      I also wonder, are there time-sensitive cryptographic techniques? Like a document expires after n seconds, hours, days?

                                                      Not without quantum computers. Let me elaborate:

                                                      A fundamental property of (classical, as opposed to quantum) digital data is that it can be copied. Normal encryption techniques are about taking the data and transforming it into a form that’s unusable without the right other piece of data. This data is now inert, and can be decrypted whenever anyone wants.

                                                      One attempt to get around this is DRM, but that would be a cop-out answer, since it relies on some external service to grant permission; a safety deposit box at a bank is not a safe.

                                                      Quantum data, on the other hand, cannot generally be copied, and without intervention degrades over time (these are part of the reason building quantum computers is hard).

                                                      1. 1

                                                        If 2 is satisfied, then I would be happy. Otherwise, I would like to see one time use keys or something that prevents the document from being shared more than once…at least the original.

                                                        And yeah, I meant OTP.

                                                    1. 6

                                                      I really wonder if everyone in China looks at her and thinks yeah, weird, but doesn’t look like a prostitute. She seems to indicate that only westerners get a rise out of her appearance? She posted albums where she’s walking around Shenzhen and “nobody cares”. I find this so incredibly hard to believe, as if sexism were over. I’m sure most people are merely just politely looking away? Isn’t she going to feel some sort of societal ostracism in any way?

                                                      I just can’t believe that China is the feminist utopia we should be striving for, where a woman can modify her appearance like that just for fun, without any societal pressure to do so, and walk around completely free of all judgement. It seems way too incredible.

                                                      1. 8

                                                        It’s possible that sexism in China takes different forms than slut-shaming (like compulsory sexuality and glass ceilings); it could be that body modifications to improve appearance are explicitly encouraged, and that women who do not engage are shamed (not exactly prude-shaming or ugly-shaming, but similar).

                                                        There are plenty of ways for the culture to be vastly different and lack slut-shaming without being a feminist utopia.

                                                        1. 2

                                                          I think this is more like it. After all, China’s Missing Women indicates a different kind of problem.

                                                        2. 5

                                                          I for one would like to take her at her word.

                                                          Some day, one hopes, sexism may well be over and part of that will be giving up on trying to explain away lived experiences.

                                                          1. 3

                                                            After looking at her 360 video, I think what’s going on here is a bit of an internet misunderstanding.

                                                            She posts on Reddit and online she gets a lot of reactions, many of them negative and name-calling. I don’t know if this happens to her on the Chinese internet or not. Online, though, people tend to voice and say a lot of things that they would be more quiet about in person.

                                                              Now look at the reaction she gets in person where she’s walking around. Most people ignore her but some do stare at her, as you can see in that 360 video, both men and women (cool thing, these 360 videos, btw). That’s more or less equal to the response I would expect she would get around a big city like here in Montréal, except maybe with more cat-calling from men. (Catcalling is reportedly rare in China and East Asia? Wow.) I expect that in a more rural area, both in China and in America (América), she might get more of a reaction?

                                                            She acknowledges that she changed her appearance to get a reaction. What I think is the internet misunderstanding is that she’s expecting to get an angry reaction if she walks around like that in the West, because she gets angry reactions online. She might get anger in some places, and she’ll certainly get street harassment, but I doubt in most large urban centres most people would do much more than discreetly stare.

                                                        1. 5

                                                          How do you “explain” a multilayer neural network?

                                                          1. 12

                                                            That’s a good point, and one of the reasons that certain industries can’t use neural networks. I’ve heard that credit card companies have to use something like a decision tree because they have to be able to prove that race wasn’t a factor in the decision.

                                                            1. 7

                                                              Wow, that’s interesting to know. Leaving aside the clash of politics and science/engineering, why can’t they just use a NN and leave out race data from the feature dimensions? I would expect the result is the same.

                                                              1. 11

                                                                From TFA:

                                                                It is important to note that paragraph 38 and Article 11 paragraph 3 specifically address discrimination from profiling that makes use of sensitive data. In unpacking this mandate, we must distinguish between two potential interpretations. The first is that this directive only pertains to cases where an algorithm is making direct use of data that is intrinsically sensitive. This would include, for example, variables that code for race, finances, or any of the other categories of sensitive information. However, it is widely acknowledged that simply removing certain variables from a model does not ensure predictions that are, in effect, uncorrelated to those variables (e.g. Leese (2014); Hardt (2014)). For example, if a certain geographic region has a high number of low income or minority residents, an algorithm that employs geographic data to determine loan eligibility is likely to produce results that are, in effect, informed by race and income.

                                                                Thus a second interpretation takes a broader view of ‘sensitive data’ to include not only those variables which are explicitly named, but also any variables with which they are correlated. This would put the onus on a data processor to ensure that algorithms are not provided with datasets containing variables that are correlated with the “special categories of personal data” in Article 10.

                                                                However, this interpretation also suffers from a number of complications in practice. With relatively small datasets it may be possible to both identify and account for correlations between sensitive and ‘non-sensitive’ variables. However, as datasets become increasingly large, correlations can become increasingly complex and difficult to detect. The link between geography and income may be obvious, but less obvious correlations—say between browsing time and income—are likely to exist within large enough datasets and can lead to discriminatory effects (Barocas & Selbst, 2016). For example, at an annual conference of actuaries, consultants from Deloitte explained that they can now “use thousands of ‘non-traditional’ third party data sources, such as consumer buying history, to predict a life insurance applicant’s health status with an accuracy comparable to a medical exam” (Robinson et al., 2014). With sufficiently large data sets, the task of exhaustively identifying and excluding data features correlated with “sensitive categories” a priori may be impossible. The GDPR thus presents us with a dilemma with two horns: under one interpretation the non-discrimination requirement is ineffective, under the other it is infeasible.

                                                                1. 4

                                                                  Right. And depending on your threshold for correlated, you can’t use ANY variable.

                                                                  It’s also interesting that gender, marital status and age are not excluded - at least in the US. Car insurance rates are gender, age and marital status dependent.

                                                                  1. 1

                                                                    Right. And depending on your threshold for correlated, you can’t use ANY variable.

                                                                    That’s the second horn of the dilemma mentioned in the last line.

                                                                    It’s also interesting that gender, marital status and age are not excluded - at least in the US. Car insurance rates are gender, age and marital status dependent.

                                                                    I think this is because it is possible to make a specific business case for it; all three are considered protected classes, and are forbidden from discrimination in other cases (like employment and housing).

                                                                    Also responding to your previous comment, there are variables which can be used with high fidelity as proxies for e.g. race or sex, like name.

                                                                  2. 2

                                                                    Is there any way to look for correlations with protected classes in the data, and remove those correlations, while still preserving the ability of making inferences off whatever information remains?

                                                                    1. 1

                                                                      If there is, it’s probably related to differential privacy. It is subject to a problem of incentives, though; what motivation does anyone have to make the filter good?
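
                                                                        For the purely linear notion of “correlation”, one crude sketch of such a filter (it removes only linear association with the protected attribute, and it does nothing about the incentive problem):

                                                                        import numpy as np

                                                                        def remove_linear_correlation(X, z):
                                                                            # X: (n_samples, n_features) feature matrix
                                                                            # z: (n_samples,) protected attribute, numeric or 0/1 encoded
                                                                            # Replace each column of X with its residual after regressing on z, so the
                                                                            # result has zero *linear* correlation with z; nonlinear effects can still leak.
                                                                            A = np.column_stack([np.ones(len(z)), np.asarray(z, dtype=float)])
                                                                            coef, *_ = np.linalg.lstsq(A, X, rcond=None)
                                                                            return X - A @ coef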

                                                              2. 3

                                                                    Map each connection to an input for a Markov generator?

                                                                1. 4

                                                                  :) One could also print out the weight matrix …

                                                              1. 3

                                                                I’ve been faced with this problem lately - finding a robust, online way to calculate the mean of a stream of numbers coming in. It is indeed a harder problem than it seems.

                                                                  My approach is to take an N-ary tree and prune the branches that aren’t needed. So, effectively, for X numbers I’d be keeping log_N(X) nested running averages: one over the last 1 to N values (updated on each insert), one over the last N to N×N values (updated every N inserts, i.e. the running average of the previous complete running averages), and so on, repeated log_N(X) times, plus balancing/rollover operations, which boil down to adding a new node at the top. At any point, the mean is an appropriately weighted combination of these running averages.

                                                                  Each running average involves summing P numbers where P < N and then dividing by N, so you need a double-double, but you should get “minimal” error overall: you shouldn’t get accumulating error or other bad things.
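
                                                                  A rough Python sketch of the shape I have in mind (fixed block size N, ignoring the balancing/rollover details; purely illustrative):

                                                                  class BlockedMean:
                                                                      # Level k holds running means, each standing for block**k raw values.
                                                                      def __init__(self, block=16):
                                                                          self.block = block
                                                                          self.levels = [[]]

                                                                      def add(self, x):
                                                                          self._push(0, float(x))

                                                                      def _push(self, level, value):
                                                                          if level == len(self.levels):
                                                                              self.levels.append([])
                                                                          self.levels[level].append(value)
                                                                          if len(self.levels[level]) == self.block:
                                                                              # Collapse a full block into one mean one level up; the only
                                                                              # division here is by the constant block size.
                                                                              block_mean = sum(self.levels[level]) / self.block
                                                                              self.levels[level] = []
                                                                              self._push(level + 1, block_mean)

                                                                      def mean(self):
                                                                          # Weight each stored mean by the number of raw values it represents.
                                                                          total, count = 0.0, 0
                                                                          for level, values in enumerate(self.levels):
                                                                              weight = self.block ** level
                                                                              for v in values:
                                                                                  total += v * weight
                                                                                  count += weight
                                                                          return total / count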

                                                                  If anyone knows of, or can think of, any holes/gotchas in this approach, I’d love to hear them.

                                                                1. 3

                                                                  What goes wrong with the naive streaming approach of m_1=x_1, m_n = m_(n-1) * (n-1)/n + x_n/n (possibly with some standard fp math tricks I don’t know about)?
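
                                                                    For concreteness, the naive version I mean:

                                                                    def streaming_mean(xs):
                                                                        # m_1 = x_1; m_n = m_(n-1) * (n-1)/n + x_n/n
                                                                        m = 0.0
                                                                        for n, x in enumerate(xs, start=1):
                                                                            m = m * (n - 1) / n + x / n
                                                                        return m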

                                                                  1. 3

                                                                      The problem I’d see with the naive running average is that as n becomes large, (n-1)/n is going to round to 1 and 1/n will underflow to zero, and you’re screwed. Plus, errors accumulate with each insertion.

                                                                      The virtue I’d claim for my approach is that you’re never dividing by more than a fixed constant. And you can make most of the divisions be by this constant, which you can choose to be a power of 2, which should give minimal error if done appropriately.

                                                                  2. 2

                                                                    It seems like something that would be in the published literature, but google scholar isn’t finding much for me. This paper has something related, an addition algorithm that minimizes error: http://dx.doi.org/10.1109/ARITH.1991.145549 - maybe it could be inspiration, or there might be something useful in that journal / by that author?

                                                                    1. 2

                                                                        Ah, it seems like anything that emulates arbitrary-precision arithmetic would naturally guarantee exactness. And if you keep a running sum with arbitrary-precision arithmetic and divide only at the end, the resulting algorithm is more or less identical to the approach I have been thinking of - if you break the process down into operations on regular floats.
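
                                                                        Something like this, with Python’s Fraction standing in for arbitrary precision:

                                                                        from fractions import Fraction

                                                                        def exact_mean(xs):
                                                                            # Exact running sum; the only division happens once, at the end.
                                                                            total = Fraction(0)
                                                                            count = 0
                                                                            for x in xs:
                                                                                total += Fraction(x)   # floats convert to exact rationals
                                                                                count += 1
                                                                            return total / count       # still exact; wrap in float(...) if needed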

                                                                  1. 4

                                                                    One threat to validity not mentioned is male vs. female speech patterns; if gender was still identifiable, bias could still creep in.

                                                                    Pretty interesting data nonetheless.

                                                                    1. 4

                                                                        I disagree with the relativization of static typing’s value. In my opinion, explicit encoding of our assumptions about the code is a good thing. And unlike dynamic typing, static typing can validate the whole program before running it and thus prove our assumptions to be absolutely correct.

                                                                      Multiple researchers are currently working on static contract verification, which might end this debate once and for all. But until it has been deployed, I would suggest giving statically typed languages a chance.

                                                                      And it is not (only) about speed.

                                                                      1. 1

                                                                        I’m not aware of multiple researchers (unless you mean one research group, which happens to have multiple members); do you have references?

                                                                        1. 1

                                                                          I first encountered it in Racket SCV but a cursory search returned at least one paper from Microsoft Research with different authors, so I assumed the “multiple”. Specifically PLT and SPJ’s students.

                                                                          Aside: How many research clusters are there anyway? Haskell, Racket, OCaml, .NET… anyone else?

                                                                          1. 1

                                                                            Lots. I definitely don’t know anywhere near all of them, and most of them don’t fall into any of the above (either working on a more esoteric language, e.g. Coq, Agda, Idris, Kitten, or not focusing on any particular language). Also, within the language groupings, there are typically many clusters.