1. 5

    Google contributes surprisingly little back in terms of open source compared to the size of the company and the number of developers they have. (They do reciprocate a bit, but not nearly as much as they could.)

    For example, this is really visible in areas where they do some research and/or set a standard, like compression algorithms (zopfli, brotli) and network protocols (HTTP/2, QUIC): the code and glue they release is minimal.

    It’s my feeling that Google “consumes”/relies on a lot more open source code than they then contribute back to.

    1. 10

      Go? Kubernetes? Android? Chromium? Those four right there are gargantuan open source projects.

      Or are you specifically restricting your horizon to projects that aren’t predominantly run by Google? If so, why?

      1. 11

        I’m restricting my horizon to projects that aren’t run by Google because it better showcases the difference between running and contributing to a project. Discussing how Google runs open source projects is another interesting topic though.

        Edit: running a large open source project for a major company is in large part about control. Contributing to a project where the contributor is not the main player running the project is more about cooperation and being a nice player. It just seems to me that Google is much better at the former than the latter.

        1. 2

          It would be interesting to attempt to measure how much Google employees contribute back to open source projects. I would bet that it is more than you think. When you get PRs from people, they don’t start off with, “Hey so I’m an engineer at Google, here’s this change that we think you might like.” You’d need to go and check out their Github profile and rely on them listing their employer there. In other words, contributions from Google may not look like Contributions From Google, but might just look like contributions from some random person on the Internet.

          1. 3

            I don’t have the hat, but for the next two weeks (I’m moving teams) I am in Google’s Open Source office that released these docs.

            We do keep a list of all Googlers who are on GitHub, and we used to have an email notification for patches that Googlers sent out before our new policy of “If it’s a license we approve, you don’t need to tell us.” We also gave blanket approval after the first three patches approved to a certain repo. It was ballpark 5 commits a day to non-Google code when we were monitoring, which would exclude those which had been given the 3+ approval. Obviously I can share these numbers because they’re all public anyway ;)

            For reasons I can’t remember, we haven’t used the BigQuery datasets to track commits back to Googlers and get a good idea of where we are with upstream patches now. I know I tried myself, and it might be different now, but there was some blocker that prevented me doing it.

            I do know that our policies about contributing upstream are less restrictive than other companies, and Googlers seem to be happy with what they have (particularly since the approved licenses change). So I disagree with the idea that Google the company doesn’t do enough to upstream. It’s on Googlers to upstream if they want to, and that’s no different to any other person/group/company.

            1. 2

              So I disagree with the idea that Google the company doesn’t do enough to upstream.

              Yeah, I do too. I’ve worked with plenty of wonderful people out of Google on open source projects.

              More accurately, I don’t even agree with the framing of the discussion in the first place. I’m not a big fan of making assumptions about moral imperatives and trying to “judge” whether something is actually pulling its weight. (Mostly because I believe it’s unknowable.)

              But anyway, thanks for sharing those cool tidbits of info. Very interesting! :)

              1. 3

                Yeah, sorry I think I made it sound like I wasn’t agreeing with you! I was agreeing with you and trying to challenge the OP a bit :)

                Let me know if there’s any other tidbits you are interested in. As you can tell from the docs, we try to be as open as we can, so if there’s anything else that you can think of, just ping me on this thread or cflewis@google.com and I’ll try to help :D

                1. 1

                  FWIW I appreciate the effort to shed some light on Google’s open source contributions. Do you think that contributions could be more systemic/coordinated within Google though, as opposed to left to individual devs?

                  1. 1

                    Do you think that contributions could be more systemic/coordinated within Google though, as opposed to left to individual devs?

                    It really depends on whether a patch needs to be upstreamed or not, I suppose. My gut feeling (and I have no data for this), which is an entirely personal opinion and not representative of my employer, is that teams as a whole aren’t going to worry about it if they can avoid it… often the effort to convince the upstream maintainers to accept the patch can suck up a lot of time, and if the patch isn’t accepted then that time was wasted. It’s also wasted time if the project is going in a direction that’s different to yours, and no-one really ever wants to make a competitive fork. It’s far simpler and a 100% guarantee of things going your way if you just keep a copy of the upstream project and link that in as a library with whatever patches you want to do.

                    The bureaucracy of upstreaming, of course, is working as intended. There does have to be guidance and care to accepting patches. Open source != cowboy programming. That’s no problem if you are, say, a hobbyist who is doing it in the evenings here and there, where timeframes and so forth are less pressing. But when you are a team with directives to get your product out as soon as you can, it generally isn’t something a team will do.

                    I don’t think this is a solved problem by any company that really does want to commit back to open source like Google does. And I don’t think the issue changes whether you’re a giant enterprise or a small mature startup.

                    This issue is also why you see so many more open source projects released by companies rather than working with existing software: you know your patches will be accepted (eventually) and you know it’ll go in your direction. It’s a big deal to move a project to community governance as you now lose that guarantee.

        2. 0

          Chromium?

          Did you ever try to compile it?

          1. 2

            Yeah, and?

            1. 0

              How much time did it take? On which hardware?

              1. 1

                90 minutes, on a mid-grade desktop from 2016.

                1. 1

                  Cool! You should really explain to Google your build process!

                  And to everybody else, actually.

                  Because a convoluted and long build process concretely reduces the freedom that an open source license gives you.

                  1. 1

                    Cool! You should really explain to Google your build process!

                    Google explained it to me actually. https://chromium.googlesource.com/chromium/src/+/lkcr/docs/linux_build_instructions.md#faster-builds

                    Because a convoluted and long build process concretely reduces the freedom that an open source license gives you.

                    Is the implication that Google intentionally makes the build for Chromium slow? Chromium is a massive project and uses the best tools for the job and has made massive strides in recent years to improve the speed, simplicity, and documentation around their builds. Their mailing lists are also some of the most helpful I’ve ever encountered in open source. I really don’t think this argument holds any water.

        3. 5

          The amount Google invests in securing open source software basically dwarfs everyone else’s investment, it’s vaguely frightening. For example:

          • OSS-Fuzz
          • Patch Rewards for OSS projects
          • Their work on Clang’s Sanitizers and libFuzzer
          • Work on the kernel’s self protection program and syzkaller
          • Improvements to Linux kernel sandboxing technologies, e.g. seccomp-bpf

          I don’t think anyone else is close, either by number (and severity) of vulnerabilities reported or in proactive work to prevent and mitigate them.

          1. 2

            Google does care a lot about security and I know of plenty of positive contributions that they’ve made. We probably could spend days listing them all, but in addition to what you’ve mentioned, Project Zero, pushing the PKI towards sanity, Google Summer of Code (of which I was one recipient about a decade ago), etc. all had a genuinely good impact.

            OTOH Alphabet is the world’s second largest company by market capitalization, so there should be some expectation of activity based on that :)

            Stepping out of the developer bubble, it is an interesting thought experiment to consider if it would be worth trading every open source contribution Google ever made for changing the YouTube recommendation algorithm to stop promoting extremism. (Currently I’m leaning towards yes.)

        1. 26

          I’ve talked to some people in and close to this industry and it feels like we’re a good 15 years away from autonomous vehicles. The other major issue we’re not addressing is that these cars cannot be closed source like they are now. At a minimum, the industry needs to share with each other and be using the same software or same algorithms. We can’t enter a world where Audi claims their autonomous software is better than Nissan’s in adverts.

          People need to realize they won’t be able to own these cars or modify them in any way if they ever do come to market. The safety risks would be too great. If the cars are all on the same network, one security failure could mean a hacker could kill thousands of people at once.

          I really think the current spending on this is a huge waste of money, especially in America, where tax money given to companies to subsidize research could be used to get back the train system we lost and move cities back inward like they were in the early 1900s. I’ve written about this before:

          http://penguindreams.org/blog/self-driving-cars-will-not-solve-the-transportation-problem/

          1. 20

            If the cars are all on the same network

            Any company that is connecting these cars to the Internet is being criminally negligent.

            I say that as an infosec person who worked on self-driving cars.

            1. 3
              1. 2

                They have to be able to communicate though to tell other cars where they intend to go or if there is danger ahead.

                1. 7

                  It’s called blinkers and hazard lights.

                  1. 9

                    That’s just networking with a lot of noise in the signal.

                    1. 7

                      Networking that doesn’t represent a national security threat, and nothing that a self-driving car shouldn’t already be designed to handle.

                      1. 3

                        What happens when someone discovers a set of blinker indications that can cause the car software to malfunction?

                    2. 1

                      Serious question (given that you’ve worked on self-driving cars): is computer vision advanced enough today to be able to reliably and consistently detect the difference between blinkers and hazards for all car models on the roads today?

                      1. 2

                        As often is the case, some teams will definitely be able to do it, and some teams won’t.

                        Cities and States should use it as part of a benchmark to determine which self-driving cars are allowed on the road, in exactly the same way that humans must pass a test before they’re allowed a driver’s license.

                        The test for self-driving cars should be harder than the test for humans, not easier.

                    3. 2

                      They could use an entirely separate cell network that isn’t connected to the Internet. All Internet-enabled devices, like the center console, could use the standard cell network and have a read-only bus between the two for sensor data like speed, oil pressure, etc.

                  2. 11

                    The other major issue we’re not addressing is that these cars cannot be closed source like they are now.

                    I strongly agree with this. I believe autonomous vehicles are the most important advancement in automotive safety since the seatbelt. Can you imagine if Volvo had kept a patent on the seatbelt?

                    The autonomous vehicle business shouldn’t be about whose car drives the best, it should be about who makes the better vehicles. Can you imagine the ads otherwise? “Our vehicles kill 10% fewer people than our competitors!” Ew.

                    1. 2

                      I don’t buy your initial claims.

                      When you said “we’re 15 years away from autonomous vehicles”, what did you mean exactly? That it’ll be at least 15 years before the general public can ride in them? Waymo claims this will happen in Phoenix this year: https://amp.azcentral.com/amp/1078466001 That the majority of vehicles on US roads will be autonomous? Yeah, that’ll definitely take over 15 years!

                      We can have a common/standard set of rigorous tests that all companies need to pass but we don’t need them to literally all use the same exact code. We don’t do that for aeroplanes or elevators either. And the vanguard of autonomous vehicles are large corporations that aren’t being funded by tax dollars.

                      That said, I agree that it would be better to have more streetcars and other light rail in urban areas.

                      1. 6

                        It will be at least 15 years before fully autonomous vehicles are available for sale or unrestricted lease to the general public. (In fact, my estimate is more like twice that.) Phoenix is about the most optimal situation imaginable for an autonomous vehicle that’s not literally a closed test track. Those vehicles will be nowhere near equipped to deal with road conditions in, for example, a northeastern US winter, which is a prerequisite to public adoption, as opposed to tests which happen to involve the public.

                        Also, it’s a safe bet this crash will push back everyone’s timelines even further.

                        1. 1

                          I think you are correct about sales to the public but a managed fleet that the public can use on demand in the southern half of the country and the west coast seems like it could happen within 15 years.

                    1. 1

                      Excellent with pockets of Poor and Exceptional

                      Excellent is pretty standard, but sometimes we come up against deadlines and I push myself pretty hard. Once the deadline has passed, I swing into the Exceptional category and then taper back to Excellent. Thankfully, we have a management chain who appreciates the work we put in and as a result when we’re up against deadlines most of us are more than happy to put in the extra time.

                      1. 4

                        How do you know that this isn’t their long term plan? Porting Edge to Android would be an absolutely massive effort. There is no way anyone at Microsoft would do that without at least getting something on the market first.

                        1. 31

                          Anyone who refers to their own teammates as their “opponent” must be an absolute pleasure to work with…

                          1. 3

                            Precisely this. Code review is impossible to get “right” when you have culture problems like folks wanting to help their friends get ahead (turning a blind eye to messy but working code) and see others fall behind (and nitpick senseless details). In almost every pre-commit review productive team I have been on, folks make little alliances to review quickly and honestly and ship each other’s code.

                          1. 24

                            This article starts by using an example of 8 “bad” password rules, but by the end of the post, Jeff ends up suggesting 5 of the 8 anyway. This post really should have been called “special character requirements are bullshit.”

                            I’d be willing to bet that a future version of Discourse will also disallow using your previous password as well. Then we’ll get another password blog post talking about how hard passwords are and how we need more rules for passwords. Experience is a funny thing.

                            1. 3

                              This is a bit clickbait-y. I doubt that slower fsync calls would actually surface as 10x lag from the user’s perspective.

                              1. 54

                                Ugh. I’m pretty happy sticking with Python 2, but this post is so bad I’m tempted to switch to Python 3. Even as a joke the Turing complete section is just stupid.

                                1. 23

                                  I couldn’t tell whether the Turing Complete section was a joke or just profoundly confused, to be honest. Conflating the language with the bytecode, claiming that one VM can’t run the bytecode generated by another language (or maybe complaining that the bytecode changed? or that there’s no py2 compiler targeting py3 bytecode?), and trying to tie that to a fundamental property that even “languages” like SQL and Excel have ….

                                  It was all very muddle-headed.

                                  Don’t get me wrong, I know he’ll claim that it was a joke, later. That’s not in question. I’m just not sure if it actually is a joke.

                                  1. 20

                                    I don’t think this is meant as a joke. The “post” is a chapter in his book Learn Python the Hard Way.

                                    Difficult To Use Strings

                                    Wait.. what? Strings are a mess in Python 2. They have been cleaned up in Py3.

                                    Python 3 Is Not Turing Complete

                                    No comment.

                                    Purposefully Crippled 2to3 Translator

                                    It took me about one day of work to port a 70k-line Django application to Py3 with the 2to3 tool. That was one year ago. Since then I’ve only found two bugs caused by the conversion. Doesn’t seem that bad to me.

                                    Too Many Formatting Options

                                    Yes, I can agree with that. This is the only valid criticism in that chapter.

                                    1. 15

                                      I agree, but just as a data point, it’s taken about 3 people-weeks for us to port a large (~200kloc) Django application (over the course of several months).

                                      The points that made this take a while:

                                      • We tried to maximize the amount of changes that were Py2 + Py3 compatible. This meant pulling things in over time, heavy usage of six, and catching a lot of bugs early on. Highly recommended for easy reviewing by third parties. For example: a couple changesets that were just “use six for more imports”.

                                      • we deal with many different encodings, and a lot of old text-to-bytes conversion code was pretty brittle. Py3 forced us to fix a lot of this

                                      • imports! 2to3 generated way too many false positives for our taste (hard to review the real changes), so it took a while to find the right solution (for us: a monkey-patched six that would minimize changes)

                                      • changes of standard API from lists to iterators generated a decent amount of busy work. Changes for the better, though.

                                      • Handling binary files. Here, too, we were often doing the wrong thing in Py2, but it would “just work” before.

                                      • Lots of dependencies that we needed to upgrade to get the Python 3 support.

                                      • Pretty minor things around bytes’s default __str__ method. For example, checking the output of a process call, we would do if output == "'0'", and that would fail because "b'0'" != "'0'", but that turned out to cause more issues down the road.

                                      • issues around pickling + celery. Our solution mostly centered around lowering usage of pickle even more (dangerous)

                                      • deployment issues. Juggling Python 2 tooling and Python 3 tooling in the CI pipeline would sometimes mess things up.
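
To make the bytes pitfall from the list above concrete, here is a minimal Python 3 sketch. The sample value and the helper name output_is_zero are made up for illustration; the application's real code is not shown in this thread, so treat this as one plausible shape of the bug rather than the actual one.

```python
# Hypothetical stand-in for the raw output of a process call under Python 3,
# where such calls hand back bytes rather than text by default.
raw_output = b"0"

# str() on bytes produces the repr-style text, including the b prefix.
as_text = str(raw_output)
prefixed = as_text == "b'0'"      # holds on Python 3
naive_match = as_text == "'0'"    # the old-style check no longer matches


def output_is_zero(raw: bytes) -> bool:
    """Decode explicitly, then compare text with text (helper name is made up)."""
    return raw.decode("utf-8").strip() == "0"
```

Decoding at the boundary and comparing text with text avoids depending on how bytes happen to be rendered as a string.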

                                      1. 4

                                        I can only recommend https://pypi.python.org/pypi/modernize

                                        Instead of translating x.iteritems() to x.items(), it translates it to six.iteritems(x), and adds the six import. Fixes 80% of the boring stuff and you only need to focus on unicode/bytes, etc.
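
For anyone who hasn't seen the pattern, here is a rough sketch of what a call like six.iteritems(x) buys you. The iteritems function below is a simplified stand-in written for this comment, not six's real implementation, and the counts dict is made up.

```python
def iteritems(mapping):
    """Simplified stand-in for six.iteritems, NOT six's real implementation."""
    # Python 2 dicts offer iteritems(); Python 3 dicts only offer items().
    method = getattr(mapping, "iteritems", mapping.items)
    return iter(method())


# The same loop body then runs unchanged on either interpreter.
counts = {"spam": 2, "eggs": 1}
total = 0
for name, count in iteritems(counts):
    total += count
```

The call site stays identical on both versions, which is what keeps the 2/3-compatible changesets small and easy to review.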

                                        1. 2

                                          The idea of Python 3 was to make iteration cleaner and easier to read and understand. Now we have to insert calls to six in every loop and for every string. The result is hideous code that’s harder to read, not easier.

                                          1. 2

                                            I was writing this because

                                            We tried to maximize the amount of changes that were Py2 + Py3 compatible

                                            If that is not your objective, you can just use 2to3 and be ready.

                                            By the way, I really do not understand why the CPython devs haven’t kept iteritems() as an alias to items() in Python 3 with a deprecation warning, to be removed in Python 4. I cannot imagine that it would have been a massive maintenance effort. But on the other hand, making a function call instead of a method call does not render code unreadable. I have never heard a Python dev complain about calling str(foo).

                                            Essentially, Python 3 adoption has not been slow because of “readability”, but because Python 3 bundles fairly boring changes (iteritems->items) with this massive unicode change (requiring quite a few changes to code bases) and few breathtaking new features that weren’t available in 2.7. This changes at the moment with Async, matrix multiplication operators, etc.
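
As a rough idea of how small the alias described above could have been, here is a sketch on a dict subclass. CompatDict is a hypothetical name; CPython's dict has no such method in Python 3, and this only illustrates the idea rather than proposing a patch.

```python
import warnings


class CompatDict(dict):
    """Hypothetical dict subclass carrying a deprecated iteritems() alias."""

    def iteritems(self):
        # Warn at the caller's line, then defer to the Python 3 API.
        warnings.warn(
            "iteritems() is deprecated; use items() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return iter(self.items())


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pairs = list(CompatDict(a=1).iteritems())
```

Old call sites keep working while the warning points at the line that needs updating.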

                                      2. 3

                                        Python 3 Is Not Turing Complete

                                        No comment.

                                        Did you read the note?

                                        1. 12

                                          If anything, the note makes it seem like he’s really serious about his Turing FUD.

                                      3. 9

                                        Joke or not, the fact that we even have to ask that question significantly harms the credibility of the article.

                                      4. 5

                                        Just out of curiosity, why are you sticking with Python2 for now? At this point all of the major libraries have been ported to be 2/3 compatible. In addition, while there aren’t any huge new features in Python3 (besides the byte/Unicode separation) there has been an accumulation of small quality of life improvements that from my point of view make it very much worth it to switch.

                                        1. 1

                                          Not sure about the comment author, but I’ve personally moved over some of my open source libraries to support both Python 2 and 3. Moving a larger project is tricky though, as it involves updating the language, all dependencies, and forking the ones that don’t support Python 3.

                                        2. -7

                                          haha

                                        1. 2

                                          It is funny to me that their current solution for whitelisting Speedtest servers seems much more complicated than simply managing a whitelist. Before people start poking fun, I’m genuinely curious why they might have decided to do it this way.

                                          1. 1

                                            Started working on designs for an Electron-based IRC client. We use IRC at work and I find myself yearning for many of the features that clients like Slack provide, but unfortunately, most IRC clients are severely lacking in the…everything department. Just do a Google search for “inline images IRC.” I’ll wait.

                                            1. 2

                                              This was written in 2011 so a few important things have happened since then. That being said, I regularly work within the Chromium code base and I find most of this information to still be fairly accurate.

                                              1. 1

                                                What has changed?

                                                I know very little about front-end development and have been looking to learn it. Assume that I know nothing and will take whatever is said in the article (minus what you tell me is out-of-date) to be true.

                                              1. 12

                                                It’s 2016 and Node’s developers haven’t figured out yet that global names are never a good idea. Using UUIDs or, as the article suggests, namespaces would be a straightforward solution, but it’s amazing that they didn’t do that from the start.

                                                1. 4

                                                  What exactly would namespacing have solved in this instance?

                                                  Let’s review what happened (within the package repo):

                                                  • current version of left-pad is unpublished
                                                  • npm decides to break their own rules to avoid breakage: re-publishes old version (normally impossible) under new author

                                                  If this scenario happened with a namespaced package, what would stop npm from breaking their own rules again and transferring ownership of the (now namespaced) package?

                                                  1. 6

                                                    Namespacing would prevent someone from republishing a malicious new version to anyone who uses caret dependencies.

                                                    Hell, namespacing could have prevented this entire fiasco since kik would really only need ownership over the “kik” namespace, not every package named kik.
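
For anyone unfamiliar with caret dependencies, here is a simplified Python sketch of how a caret range decides which versions it accepts. It follows the usual npm semver rule (the leftmost non-zero component must not change), ignores pre-release tags, and uses helper names made up for this comment rather than any real npm API.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base):
    """Exclusive upper bound: bump the leftmost non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def caret_allows(range_base, candidate):
    """True if candidate satisfies ^range_base under the simplified rule."""
    base = parse(range_base)
    return base <= parse(candidate) < caret_upper_bound(base)
```

Under this rule a dependency on ^1.0.0 picks up 1.0.1 or 1.4.0 automatically but not 2.0.0, which is why whoever controls a package name can reach every caret user with a new minor or patch release.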

                                                    1. 6

                                                      Namespacing would prevent someone from republishing a malicious new version to anyone who uses caret dependencies.

                                                      Again, if I unpublish a package from my namespace, and npm decides to break the rules to avoid massive breakage – what has changed?

                                                      Also keep in mind that I’m not arguing against namespacing in general, but I don’t think it would’ve helped with anything in this case.

                                                      Hell, namespacing could have prevented this entire fiasco since kik would really only need ownership over the “kik” namespace, not every package named kik.

                                                      IANAL but that seems speculative to me.

                                                      EDIT: See this DMCA for GitHub, which does have namespaces: https://github.com/github/dmca/blob/master/2014-02-12-WhatsApp.md

                                                      1. 4

                                                        No one can stop npm from doing whatever, but namespacing done by npm username would prevent a malicious third-party from uploading a new version of the package. I’m saying that is an improvement, not that it would have changed anything.

                                                        And yes, kik could still send a DMCA, but they would have much less reason for doing so. In this case, they probably wanted to publish a module for accessing their API and found that the logical name was already in use.

                                                        1. 1

                                                          namespacing done by npm username would prevent a malicious third-party from uploading a new version of the package

                                                          Something that hasn’t happened, and as far as I can see doesn’t happen. ~/^ dependencies are always a tradeoff and it’s not clear to me that upgrading to newer versions after a package was handed over to a third party isn’t the intended behaviour in that case.

                                                  2. 2

                                                    It’s 2016, and people who are still using NPM as a build tool even after being explicitly told not to deserve their builds breaking because of political shit like this.

                                                  1. 1

                                                    Working on a Github issue auditor inspired by Docker’s “Gordon the Turtle.” I want to create a fluent interface that can be easily read and understood by contributors and have out-of-the-box support for deploying to AWS Lambda.

                                                    I’ve also been working on a blog post about how to design clear and maintainable shell scripts. I’ve been writing shell scripts for quite some time, but if you think you have any neat tricks I should incorporate, I’d appreciate it!

                                                    1. 1

                                                      Am I the only one who thinks this looks like the Comic Sans of monospace fonts?

                                                      1. 2

                                                        This one is the Comic Sans of monospace fonts http://www.dafont.com/pointfree.font

                                                        It’s actually fun to use every once in a while :)

                                                      1. 3

                                                        This is a really unique application of Lambda, I love it!

                                                        1. 1

                                                          Thanks!

                                                        1. 1

                                                          I’m surprised I haven’t yet found an alternative to GitHub that isn’t managed by a for-profit corporation. Couldn’t an open source or FOSS-promoting organisation do the same thing but then funnel profits over to the FOSS projects they host?

                                                          1. 3

                                                            funnel profits over to the foss projects they host?

                                                            I highly doubt that it would even make enough money to keep the lights on, let alone give back.

                                                            1. 1

                                                              Perhaps, but it could also support hosting commercial open source projects.

                                                          1. 1

                                                            Pretty cool project, but it just makes me sad to think about redis-py. 86 open issues, 60 pull requests, and the last commit was in November. The maintainer also isn’t open to inviting collaborators. :(

                                                            1. 5

                                                              I must(?) be misunderstanding something, but:

                                                              confirms next Android version won’t implement Oracle’s proprietary Java APIs

                                                              and

                                                              is replacing its implementation of the Java application programming interfaces (APIs) in Android with OpenJDK, the open source version of Oracle’s Java Development Kit (JDK)

                                                              feel like two contradictory statements.

                                                              1. 11

                                                                Crappy reporting indeed. Google is moving from Google’s open source API to Oracle’s open source API. When I read badly reported stories like this, I always wonder whether stories about things I have less insight into are just as inaccurate.

                                                                1. 1

                                                                  That’s definitely something to wonder, yeah :|

                                                                  1. 1

                                                                    Michael Crichton came up with a name for this: the Gell-Mann Amnesia effect.

                                                                    http://www.goodreads.com/quotes/65213-briefly-stated-the-gell-mann-amnesia-effect-is-as-follows-you

                                                                    1. 3

                                                                      It really depends on the journalist. Some reporting doesn’t suck, although a lot of it does. Make friends with smart people with domain experience and they can usually recommend good articles. Would be great if there were a better way, but people pushing biased narratives seems to be the norm.

                                                                1. 2

                                                                  Looks really well developed, but doesn’t seem like it is very popular? Does anyone have any experience using this?

                                                                  1. 11

                                                                    The fact that works not explicitly licensed for public use are automatically copyrighted in the US is something that could probably be better explained when creating a new repository on GitHub and the like. I know that GitHub encourages you to choose a license when you create a repository, but it would be useful even after creation for the owners of repos without a license to receive some notice indicating that their work is not open source (as they very probably believe it is).

                                                                    Back on the article itself, it is definitely true that there are inherent risks in using an unsupported or poorly maintained open source library. Most of what’s available (95% or more, I imagine) is not intended for direct use by others. An open source project may be a proof of concept, or a record of personal programming history. It may be made available as an example or inspiration to others, or as the accompaniment to a blog post or book or tweet. A great many things published on sites like GitHub are never intended for use as a component in a production system. There are no particular standards to hold things like this to.

                                                                    1. 2

                                                                      GitHub’s ToS actually say that if your code is in a public repo, other users have the right to fork and build upon it.

                                                                      1. 5

                                                                        Looking at the document, I don’t imagine that to be nearly the same as the rights afforded to users of a repo with an explicit license. GitHub’s TOS enumerates the right to view and fork, but not to modify the source in any way, nor to redistribute or make use of the works provided. I am not a lawyer, but I would be very interested to see some sort of analysis of the GitHub TOS’s effect on the rights afforded to users of a GitHub repo without an explicit license, and how the wording squares with US intellectual property law.