Threads for sirwart

  1. 42

    I do not understand why “Don’t spy on people without their consent” is such a hard thing for programmers to accept.

    1. 21

      On the other hand, I don’t understand how collecting anonymous usage data that is trivial to opt out of is at all equivalent to spying, or how it harms anyone. When reading the original post, I was hopeful that an example of a well-designed anonymous telemetry system would encourage other people to adopt that approach, but given that the community treated it no differently from non-anonymous telemetry, I don’t know why anyone would go to the effort.

      1. 23

        There is no such thing as “anonymous data” when it’s paired with an IP address.

        Even when it’s trivial to opt out, it’s usually extremely difficult to never use the software in a context where you haven’t set the opt-out flag or whatever. Opting out for one operation might be trivial; remaining opted out continuously across decades without messing up once is not.

        Just. Don’t. Spy. On. People. Without. Consent.

        1. 6

          I agree an IP address is not anonymous, which is why this system doesn’t collect it. Most privacy laws also draw the line at collecting PII as the point where consent is required, and I think that’s a reasonable place to draw it.

          Most software and websites I use have far more invasive telemetry than this proposal, and I think my net privacy would be higher under an approach like the one Go proposed than under the status quo, which is why I was excited about it as a positive example of responsible telemetry. Good for you if you can go decades without encountering any of the existing telemetry that’s out there.

          1. 12

            How does the telemetry get sent to Google’s servers in a way which doesn’t involve giving Google the IP address?

            I agree that website telemetry is also an issue. But this discussion is about Go. There is no good example of responsibly spying on users without their consent.

            1. 11

              You do have to trust that Google won’t retain the IP addresses, but the default Go module proxy also involves exposing IP addresses to Google. I think “on by default, but turn it off if you don’t trust Google” is reasonable. I also trust that the pre-built binaries don’t contain backdoors or other bad code, but if you don’t want to trust that, you can always compile the binaries from source.

              Anyways, I’m not trying to change your mind, just trying to explain why some people don’t consider anonymous opt-out telemetry to be non-consensual spying.

          2. 3

            The guidance of both the GDPR and the CCPA is that an IP address is not considered PII until it is actively correlated with or connected to an individual.

            None of the counters that are proposed to be collected contain your name, email, phone number or anything else that could personally identify you.

            1. 3

              IANAL, but collecting data associated with an IP address (or some other unique identifier) definitely requires consent under the GDPR.

              An IP address or UUID is considered pseudonymous data:

              ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;

              https://gdpr-info.eu/art-4-gdpr/

              Pseudonymous data is subject to the GDPR:

              What differs pseudonymisation from anonymisation is that the latter consists of removing personal identifiers, aggregating data, or processing this data in a way that it can no longer be related to an identified or identifiable individual. Unlike anonymised data, pseudonymised data qualifies as personal data under the General Data Protection Regulation (GDPR). Therefore, the distinction between these two concepts should be preserved.

              https://edps.europa.eu/press-publications/press-news/blog/pseudonymous-data-processing-personal-data-while-mitigating_en
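To make the quoted distinction concrete, here is a minimal Python sketch under made-up assumptions (the key, event names and documentation-range IPs are all hypothetical, and this is not how any real telemetry system works). A keyed hash of the IP is pseudonymous, because whoever holds the key, the "additional information", can recompute tokens and re-link records; counting events with the identifier dropped is closer to what the EDPS text calls aggregation.

```python
from __future__ import annotations

import hashlib
import hmac
from collections import Counter

# Hypothetical key: the "additional information" that must be kept separately.
SECRET_KEY = b"stored-separately-from-the-data"


def pseudonymise(ip: str, key: bytes = SECRET_KEY) -> str:
    """Replace an IP address with a keyed hash (HMAC-SHA256).

    Whoever holds `key` can recompute the token for a known IP and re-link
    the records, so the result is pseudonymous data, not anonymous data.
    """
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(events: list[tuple[str, str]]) -> Counter[str]:
    """Drop the identifier entirely and keep only aggregate counts per event."""
    return Counter(event for _ip, event in events)


events = [
    ("203.0.113.7", "build"),
    ("198.51.100.2", "build"),
    ("203.0.113.7", "test"),
]
pseudonymous = [(pseudonymise(ip), event) for ip, event in events]
aggregated = anonymise(events)
print(aggregated)  # Counter({'build': 2, 'test': 1})
```

Note that in the pseudonymous list the first and third records still share a token, which is exactly why that form remains linkable.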

              1. 1

                That is some really creative copy pasting you did there. I am also not a lawyer but I don’t think it is super relevant for this proposal since they follow the first principle of data collection: “do not collect personal data”.

                Imagine the discussion goes like this:

                You: “Hello Google, I am a Go user and according to the GDPR I would like you to send me a dump of my personal data that was sent via the Go tooling telemetry. To which I OPTED-IN when it was released.”

                Google: “That data is anonymized. It is not connected to any personal data. We have the data you submitted but we cannot connect it to individuals.”

                You: “Here is my IP address, will that help?”

                Google: “No, we do not process or store the IP address for this data. (But thank you! now we know your IP! Just kidding!)”

                You: “Here is the UUID that was generated for my data, will that help?”

                Google: “Unfortunately, we cannot verify that this is actually your UUID for this telemetry, and thus we don’t know whether you are requesting data for yourself.”

                …

                1. 1

                  That is some really creative copy pasting you did there.

                  You can find all this in the GDPR. At any rate, I wasn’t criticizing the Go proposal, only the statement:

                  The guidance of both the GDPR and the CCPA is that an IP address is not considered PII until it is actively correlated with or connected to an individual.

                  But I see now that this is a bit ambiguous. I read it as saying that analytics associated with IP addresses are not PII, which is not really relevant, since that is pseudonymisation according to the GDPR and pseudonymous data is subject to the GDPR. But I think what you meant (which becomes clear from your example) was that in this case there is no issue, because even though Google may temporarily have your IP address (they have to if you contact their servers), they are not storing the IP address with the analytics. I completely agree that the analytics data is then not subject to the GDPR. (Still IANAL.)
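The arrangement described here, where the server necessarily sees the client IP but persists only the analytics, might look roughly like this. The request shape, counter names and in-memory store are hypothetical; this is not Go’s or Google’s actual telemetry pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadRequest:
    """Hypothetical shape of an incoming telemetry upload."""

    client_ip: str             # visible to the server because the client connects to it
    counters: dict[str, int]   # e.g. {"go/build": 3}; names are made up


class CounterStore:
    """Persists only aggregated counters; the request's IP is never written."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def ingest(self, req: UploadRequest) -> None:
        # Only the counter values are kept; req.client_ip is deliberately ignored.
        self.totals.update(req.counters)


store = CounterStore()
store.ingest(UploadRequest("203.0.113.7", {"go/build": 3, "go/test": 1}))
store.ingest(UploadRequest("198.51.100.2", {"go/build": 2}))
print(dict(store.totals))  # {'go/build': 5, 'go/test': 1}
```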

        2. 6

          For programmers, or the rest of the business?

        1. 55

          The standard library in Go.

          1. 30

            I have done lots and lots of Java and Python in my career before I used Go. I honestly find the Go stdlib just okay. There is usually something that does the trick, but I am not a super fan. I am also not buying this consistency thing. Unfortunately, I deal a lot with strings, and this mix of fmt, strings, strconv, and bytes to do anything with strings is not intuitive. I understand where Go is coming from historically and from a design philosophy, yet I don’t find it that superior.

            (Personally, I would love to see a language that is a bit more high-level/expressive like Python, but with the Go runtime/deployment model.)

            1. 7

              I started a security project specifically because of the high-quality cryptography code in Go’s standard library, which is like no other language’s.

              1. 13

                You mean the one where you can’t multiply a duration by a number, but you can multiply a duration by a duration?

                1. 3

                  Maybe I’m missing something, but dur := time.Hour * 2, as well as dur := 2 * time.Hour compile just fine.

                  1. 6

                    The literal is being implicitly converted to a duration. Try it with a variable instead of 2.

                    1. 5

                      Got it. However, that’s not really a limitation of the standard library, but rather a limitation of the language that prevents implicit type casting.

                      1. 7

                        The point is that mathematically, multiplying a number with a duration should work, whereas multiplying a duration with a duration should not.

                        1. 2

                          It never occurred to me that people would expect to be able to multiply an int by a duration and not multiply two durations together. Personally I’m grateful that Go doesn’t implicitly convert ints to durations or vice versa–I suspect this has prevented quite a few bugs.

                          1. -5

                            Have you ever had physics in school? You might want to repeat it.

                            I’m not talking about implicit conversions.

                            1. 3

                              I think the physics repeat remark might be a little heated for this context: we can all take a breath here and try to understand each other.

                              I’m personally of the opinion that multiplying an int by a duration implicitly is a bit of an anti-feature: I expect it to work in loosey-goosey languages like Python or Ruby, I even expect it to work in languages like Rust where the Into trait lets someone, somewhere, define explicitly how the conversion should occur (this starts getting into the realm of the newtype pattern from eg. Haskell), but I don’t expect two disparate types to multiply or add together, no, regardless of what those are.

                              To be extra clear: I think Into is the correct way to solve for the expected ergonomics here, and wish more languages had this type of explicit control.

                              1. 12

                                Well, thing is:

                                • Adding two durations is obviously okay.
                                • So is subtracting two durations.
                                • Negative durations are okay too.
                                • Adding a duration to itself n times is okay.
                                • We just defined multiplication of durations by natural numbers. Therefore it is okay.
                                • Since negative durations are a thing, we can extend this to all integers too.
                                • Actually, multiplication can be extended to real numbers as well.
                                • All real numbers except zero have an inverse, so it’s okay to divide durations by any non-zero number.

                                On the other hand:

                                • It is not okay to add (or subtract) a duration and a number together.
                                • It is not okay to multiply (or divide) a duration by another duration.

                                So if I want to be super-strict with my operations and allow zero implicit conversions, I would have the following functions:

                                seconds s_add(seconds, seconds);
                                seconds s_sub(seconds, seconds);
                                seconds s_mul(seconds, double);
                                seconds s_div(seconds, double);
                                

                                Or if we’re in something like ML or Haskell:

                                s_add : seconds -> seconds -> seconds
                                s_sub : seconds -> seconds -> seconds
                                s_mul : seconds -> real -> seconds
                                s_div : seconds -> real -> seconds
                                

                                Now the binary operators +, -, *, and / are functions just like any other. We can just overload them so they accept the right operands. We have such overloading even in C: adding two floats together is not the same as adding two integers together at all, but the compiler knows which one you want by looking at the type of the operands. (It also has the evil implicit conversions, but that’s different.)

                                So while a language that allows multiplying a duration by a number looks like it is implicitly converting the number to a duration before performing the multiplication, it absolutely does not. That’s just operator overloading: because you really want to multiply durations by ordinary numbers. And since multiplying two durations together makes no sense, you should get an error from your compiler if you try it.
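The strict signatures above (s_add, s_sub, s_mul, s_div) map directly onto operator overloading. A minimal Python sketch with a hypothetical Seconds class (not datetime.timedelta): returning NotImplemented for the wrong operand types makes Python raise TypeError for duration * duration and duration + number, while duration * number works from either side.

```python
from __future__ import annotations

from numbers import Real


class Seconds:
    """A duration supporting only seconds ± seconds, seconds * real, seconds / real."""

    __slots__ = ("value",)

    def __init__(self, value: float) -> None:
        self.value = float(value)

    def __add__(self, other: object) -> Seconds:
        if isinstance(other, Seconds):
            return Seconds(self.value + other.value)
        return NotImplemented  # duration + number: Python raises TypeError

    def __sub__(self, other: object) -> Seconds:
        if isinstance(other, Seconds):
            return Seconds(self.value - other.value)
        return NotImplemented

    def __neg__(self) -> Seconds:
        return Seconds(-self.value)

    def __mul__(self, other: object) -> Seconds:
        # numbers.Real covers int and float but not Seconds,
        # so duration * duration is rejected.
        if isinstance(other, Real):
            return Seconds(self.value * other)
        return NotImplemented

    __rmul__ = __mul__  # number * duration is the same operation

    def __truediv__(self, other: object) -> Seconds:
        if isinstance(other, Real):
            return Seconds(self.value / other)  # dividing by 0 raises ZeroDivisionError
        return NotImplemented

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Seconds):
            return NotImplemented
        return self.value == other.value

    def __repr__(self) -> str:
        return f"Seconds({self.value!r})"


d = Seconds(90)
print(d + Seconds(30))  # Seconds(120.0)
print(2 * d)            # Seconds(180.0)
print(d / 4)            # Seconds(22.5)
```

Nothing here converts the number into a duration: the operators simply accept a (Seconds, Real) pair, which is the overloading argument made above.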

                                1. 7

                                  Again, multiplying a duration by a number is not “loosey-goosey”. Multiplying a duration by a duration is “loosey-goosey”, unless the result is a duration squared, which it isn’t.

                                  1. 1

                                    I think it depends on what you believe types are for—are they exactly units, or are they constraints (or both)?

                                    1. 3

                                      No matter if you treat types as units or constraints, you want to have operations that make sense. Multiplying 3 seconds by 5 hours doesn’t mean anything (except in the context of physics, where it can be an intermediate value).

                                      1. 1

                                        Agreed that you want operations that make sense, but if you think of types as units, then you probably want to be able to multiply ints and other types. If you think of them as constraints (especially for avoiding bugs) you probably don’t want to be able to multiply ints and arbitrary types. Personally, I’m more concerned with avoiding bugs rather than a strict adherence to mathematicians’ semantic preferences. There’s nothing fundamentally wrong with the latter, but it seems likely to produce more bugs.

                                        1. 3

                                          How exactly does allowing durations to be multiplied with each other, while not allowing them to be multiplied by integers, allow you to prevent bugs? If anything, I’d say it can introduce bugs.

                                          you probably don’t want to be able to multiply ints and arbitrary types.

                                          Where did I say anything about multiplying integers with arbitrary types?

                                          1. 1

                                            How exactly does allowing durations to be multiplied with each other, while not allowing them to be multiplied by integers, allow you to prevent bugs?

                                            It means we can’t accidentally multiply a duration field by some integer ID field (or a field of some other integer type). In general it stands to reason that the more precise you are about your types, the less likely you are to have bugs in which you mixed two things that ought not have been mixed, and Duration is a more precise type than int. I’m not familiar with any bugs arising from being too precise with types, and even if they exist I suspect they are rarer than the inverse.

                                            Where did I say anything about multiplying integers with arbitrary types?

                                            Presumably you aren’t advocating a type system that makes a special exception for durations and ints, right? Feel free to elaborate about what exactly you’re advocating rather than making us guess. :)

                                            1. 3

                                              It means we can’t accidentally multiply a duration field by some integer ID field

                                              That’s why you use a different type for the ID.

                                              In general it stands to reason that the more precise you are about your types, the less likely you are to have bugs in which you mixed two things that ought not have been mixed, and Duration is a more precise type than int.

                                              Preciseness is only good when the typing is correct.

                                              Presumably you aren’t advocating a type system that makes a special exception for durations and ints, right?

                                              No, I’m advocating for a system that allows you to define multiplication however it makes sense. Like in Python. Or Nim. Or even C++, though C++ is partially weakly typed because of its C heritage.

                                              1. 1

                                                That’s why you use a different type for the ID.

                                                I agree, I’m advocating for precise types. But in any case you seem to be okay with “untyped” ints for quantities/coefficients so we can use the example of mixing up coefficients of durations with coefficients of some other concept.

                                                Preciseness is only good when the typing is correct.

                                                Agreed, and Go gets the typing correct, because types aren’t units. 👍

                                                No, I’m advocating for a system that allows you to define multiplication however it makes sense. Like in Python. Or Nim. Or even C++, though C++ is partially weakly typed because of its C heritage.

                                                My background is in C++ and Python. Very little good comes out of operator overloading, but it opens the door to all kinds of clever stuff. For example, SQLAlchemy overloads operators (such as ==) to allow for a cutesy DSL, but a bug was introduced when someone tried to use a variable of that type in a conditional. I’ve never heard of bugs resulting from a lack of overloading, and it’s easy to work around by defining a Multiply() function that takes your preferred type. No surprises, precise, and correct. 💯

                                                Moreover, the canonical datetime libraries for C++ and Python don’t give you back “DurationSquared” when you multiply two durations, nor do they allow you to divide a distance by a duration to get a Speed, because types aren’t units–you could overload the multiplication operator to support duration * duration or overload the division operator to support distance / duration, but you have to model that for every combination of types (at least in mainstream languages like C++, Python, Go, etc.) and for no benefit that I’m able to discern (apart from “to allow certain types to behave sort of like units”, which doesn’t seem like a meaningful goal unto itself).
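The conditional pitfall mentioned here can be reproduced in a few lines. Column and Comparison are hypothetical stand-ins, not SQLAlchemy’s actual classes: because __eq__ returns an expression object and ordinary objects are truthy, an if statement on the comparison always takes the branch.

```python
class Comparison:
    """Expression node produced by Column.__eq__ (hypothetical, for illustration)."""

    def __init__(self, left: str, op: str, right: object) -> None:
        self.left, self.op, self.right = left, op, right

    def sql(self) -> str:
        return f"{self.left} {self.op} {self.right!r}"


class Column:
    """Hypothetical query-builder column, not SQLAlchemy's real class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object):  # returns an expression object, not a bool
        return Comparison(self.name, "=", other)


user_id = Column("user_id")
print((user_id == 42).sql())  # user_id = 42

# The pitfall: Comparison objects are always truthy, so this branch runs
# no matter what value is on the right-hand side.
if user_id == "not a real id":
    print("branch taken")
```

One defensive option is to give the expression type a __bool__ method that raises TypeError, so accidental use in a conditional fails loudly instead of silently.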

                                  2. 3

                                    In rust

                                    • The Into trait does not do automatic coercion. The Deref trait does under the right circumstances, but you shouldn’t use it to do so here (it’s really just meant for “smart pointers”, though there is no good definition of what a smart pointer is other than “something that implements Deref”).

                                    • Traits like Add and Mul take both a lhs and a rhs type. For Add those would both be Duration. For Mul I would strongly expect it to take a Duration on one side and an int/float on the other.

                                    Multiplying two durations together makes little sense. What is “2 seconds * 10 seconds”? Units-wise I get “20 seconds^2” (which unfortunately most type systems don’t represent well). Physically, time^2 is mostly just an intermediate value, but you could visualize time as distance (e.g. with light seconds), in which case it would be an area. Or alternatively you might notice that if you divide a distance by it, you get an acceleration (m/s^2). What it definitely isn’t is a duration.

                                    Multiplying a duration by a unit-less quantity (like an integer), on the other hand, makes perfect sense: “2 seconds * 10” is an amount of time 10 times as long. Hence why I would expect Duration to implement Mul with an int as the other operand.
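The unit bookkeeping in this comment (seconds times seconds gives seconds squared, and a distance divided by that gives an acceleration) can be sketched with a small, hypothetical Quantity class that tracks exponents of metres and seconds. It only illustrates dimensional analysis; it is not a feature of Rust’s or Python’s standard library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with the exponents of metres (m) and seconds (s)."""

    value: float
    m: int = 0
    s: int = 0

    def __mul__(self, other: Quantity | float) -> Quantity:
        if isinstance(other, Quantity):
            # Multiplying two quantities adds their unit exponents.
            return Quantity(self.value * other.value, self.m + other.m, self.s + other.s)
        # A plain number scales the value and leaves the units alone.
        return Quantity(self.value * other, self.m, self.s)

    __rmul__ = __mul__

    def __truediv__(self, other: Quantity | float) -> Quantity:
        if isinstance(other, Quantity):
            # Dividing subtracts the exponents.
            return Quantity(self.value / other.value, self.m - other.m, self.s - other.s)
        return Quantity(self.value / other, self.m, self.s)


two_s = Quantity(2, s=1)
ten_s = Quantity(10, s=1)
print(two_s * ten_s)  # Quantity(value=20, m=0, s=2): 20 s^2, not a duration
print(two_s * 10)     # Quantity(value=20, m=0, s=1): still a duration
accel = Quantity(9.8, m=1) / (Quantity(1, s=1) * Quantity(1, s=1))
print(accel)          # Quantity(value=9.8, m=1, s=-2): an acceleration
```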

                                  3. 1

                                    Sorry, my remark wasn’t meant to be provocative. I’ve just spent so much more time in the programming world than the math or physics worlds, hence “it never occurred to me”.

                        2. 2

                          You can find rough edges in every language and every (standard) library. This is unfortunately a fact of developer life.

                          1. 2

                            I wouldn’t call this a rough edge, but a fundamental flaw of the type system.

                          2. 2

                            In retrospect, time.Duration should have been an opaque type and not a named version of int64 (as the person who added Duration.Abs(), I’m well aware of the pitfalls), but there are no mainstream languages with the ability to multiply variables of type feet by feet and get a type square feet result, so I wouldn’t blame Go for that particularly.

                            1. 2

                              there are no mainstream languages with the ability to multiply variables of type feet by feet and get a type square feet result, so I wouldn’t blame Go for that particularly.

                              Well yes, but it should be an error.

                              1. 1

                                There are also no mainstream languages where it’s an error. I agree though that it should have been type Duration struct { int64 } which would have prevented all arithmetic on durations.

                                1. 2

                                  It’s an error in Rust:

                                  fn main() {
                                      let dt = std::time::Duration::from_secs(1);
                                      dt * 2; // ok ("warning: unused arithmetic operation" technically)
                                      dt * dt; // error: mismatched types. Expected `u32` found struct `Duration` (pointing at second operand)
                                  }
                                  

                                  https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bfd82c951af32f237ff3fcd568be5f75

                                  1. 1

                                    In Rust, Duration is a struct, not a bare int. The multiplication works through operator overloading, which allows u32 but not another Duration. I take the point that this is better than Go.

                                    As I said above, it would be better if in Go Duration were type Duration struct { int64 }. Go doesn’t have operator overloading, so you wouldn’t be able to multiply at all, but you’d have to use methods, like d.Mul(3) etc. It would be worth it though because then those could saturate when they overflow instead of wrapping around. It’s a minor language wart.
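A rough model of the opaque, saturating design suggested here, written in Python to match the other snippets rather than in Go. The Duration class and its add/mul methods are hypothetical. It stores int64 nanoseconds like Go’s time.Duration, defines no arithmetic operators, and clamps results to the int64 range instead of wrapping around.

```python
from __future__ import annotations

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def _saturate(n: int) -> int:
    """Clamp an arbitrary-precision Python int into the int64 range."""
    return max(INT64_MIN, min(INT64_MAX, n))


class Duration:
    """Hypothetical opaque duration holding int64 nanoseconds.

    It defines no arithmetic operators, so d * d and d * 3 raise TypeError;
    callers use methods, which clamp to the int64 range instead of wrapping.
    """

    __slots__ = ("_ns",)

    def __init__(self, ns: int) -> None:
        self._ns = _saturate(ns)

    def nanoseconds(self) -> int:
        return self._ns

    def add(self, other: Duration) -> Duration:
        return Duration(self._ns + other._ns)

    def mul(self, factor: int) -> Duration:
        # Python ints never overflow, so compute exactly and then clamp.
        return Duration(self._ns * factor)


HOUR = Duration(3_600 * 10**9)
print(HOUR.mul(3).nanoseconds())                   # 10800000000000
print(HOUR.mul(10**9).nanoseconds() == INT64_MAX)  # True: saturated, not wrapped
```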

                                  2. 2

                                    Python does it correctly.

                                    from datetime import timedelta
                                    hour = timedelta(hours=1)
                                    print(hour)
                                    print(3*hour)
                                    print(hour*hour)
                                    

                                    Output:

                                    1:00:00
                                    3:00:00
                                    TypeError: unsupported operand type(s) for *: 'datetime.timedelta' and 'datetime.timedelta'
                                    


                                    1. 2

                                      Yes, but Python itself does not use timedelta, which sucks.

                                      >>> time.sleep(datetime.timedelta(0,0,1))
                                      TypeError: 'datetime.timedelta' object cannot be interpreted as an integer
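Until the standard library accepts timedelta directly, a thin wrapper can do the conversion. A minimal sketch; sleep_for is a hypothetical helper name, and it relies on timedelta.total_seconds() returning a float, which time.sleep() accepts.

```python
import time
from datetime import timedelta


def sleep_for(delay: timedelta) -> None:
    """Sleep for a timedelta by converting it to float seconds first.

    time.sleep() takes a number of seconds, so the conversion is explicit.
    """
    time.sleep(delay.total_seconds())


sleep_for(timedelta(0, 0, 1))  # the call above, now with an explicit conversion
```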
                                      
                            2. 1

                              I come from the Python world but would love to hear more about what’s so special about the Go standard library.

                              1. 4

                                I would say the quality is more consistent across modules than Python’s, which feels like it evolved more organically. There are also some foundational concepts, like a stream of bytes, that are simpler and more composable than file-like objects in Python.

                                1. 2

                                  Python is 3 times the age of Go. 20 yrs from now I hope we say the Go stdlib is as consistent as it is now. I also hope to be retired by then and hope terminated before then.

                                  1. 3

                                    I remember when people were saying this about Go 5 and 10 years ago. It’s been almost 15 years and Go has done very well at consistency in its stdlib and elsewhere. When Python was nearly 15, its standard library was already a mess; Go had the benefit of hindsight, specifically with respect to how “kitchen sink” languages like C++ turned out.

                                    1. 1

                                      Some of the worst things in Go’s standard library are old RPC formats and container types, but there’s not too much of it.

                                2. 2

                                  It’s better organized and more cohesive. For example, what’s the difference between import os and import sys? I really couldn’t tell you. You just have to memorize it.

                                  The json module has load and loads, but I’m not aware of any other packages that follow that convention for file-like vs string input. Anyway, why not just have one function take both and branch on the type? Go is more consistent about using their file-like types (io.Reader/Writer) everywhere.
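The single-entry-point alternative suggested here is easy to sketch on top of the existing json module. load_json is a hypothetical name, not part of the standard library.

```python
import io
import json
from typing import IO, Any, Union


def load_json(source: Union[str, bytes, IO[str], IO[bytes]]) -> Any:
    """Parse JSON from a str/bytes value or from a file-like object.

    Branches on the argument's type instead of splitting into load()/loads().
    """
    if isinstance(source, (str, bytes, bytearray)):
        return json.loads(source)
    return json.load(source)  # anything else is treated as file-like (has .read())


print(load_json('{"a": 1}'))                # {'a': 1}
print(load_json(io.StringIO("[1, 2, 3]")))  # [1, 2, 3]
```

One trade-off of this design: a str is always treated as JSON text, never as a file path.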

                                  Time in Python is 💩. No one ever wants a time without a time zone! Go libraries all take a time.Duration instead of this one taking seconds and that one taking milliseconds. Python has a timedelta type but no one uses it.

                                  The various urllibs are all strictly worse than their Go equivalents.

                                  Python does have better itertools and collections though. Now that Go has generics, hopefully, those will get ported over too.

                                  1. 2

                                    Honestly I hate the whole datetime.datetime.now() thing. I also don’t love that Python is so inconsistent with its casing conventions (e.g., testing uses assertEqual, datetime uses timedelta, and most other things use UpperCamelCase for types and snake_case for functions and methods). Also I’ve done a lot of subprocessing in Python and I still have to consult the docs every single time for subprocess.run() and friends–the arguments are just dizzying. Also, despite being a very dynamic language, Python doesn’t have anything as convenient as Go’s json.Marshal()–you have to write to_json() methods on every class that return a JSON-like dict structure (and to be quite clear, I have grievances with Go’s JSON package). Similarly, Python’s standard HTTP libraries are more tedious than those in Go–the canonical advice is just to import requests or similar, but this is a pain for simple scripts (e.g., I now have a build step for my AWS Lambda function which pretty much erases the benefit of using Python over Go for Lambda in the first place). These are just a few examples of issues with the Python stdlib off the top of my head, but there are lots more :)

                              1. 4

                                Is it really that hard to not commit secrets? I mean I just don’t ever put them in with code… Never use git add . I guess it just seems like a heavy handed solution to a problem that’s barely there. Unless this happens a ton elsewhere. I am astounded when I hear the statistics about how many secrets are committed in GitHub, but I wonder if it has more to do with a lack of understanding than just a git flub/accident.

                                1. 4

                                  I think it’s a coding practices thing. Like Nathaniel Borenstein said, “No ethically-trained software engineer would ever consent to write a DestroyBaghdad procedure. Basic professional ethics would instead require him to write a DestroyCity procedure, to which Baghdad could be given as a parameter.” I never hard-code secrets because secrets always come in as environment variables or other parameters, but I think for people who just want to get something done quickly, hard-coding seems like the fastest way to do things.
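In code, “secrets come in as environment variables or other parameters” might look like the sketch below. The helper names and the EXAMPLE_API_TOKEN variable are hypothetical; the point is that the secret is a parameter, like the city in DestroyCity, and is never written into the source.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def require_secret(name: str) -> str:
    """Read a secret from the environment instead of hard-coding it in source.

    Failing loudly when the variable is unset avoids falling back to a
    placeholder value that could end up committed.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


def make_client(api_token: str) -> dict:
    """Hypothetical client factory: the secret arrives as a parameter,
    the way DestroyCity takes the city as a parameter."""
    return {"auth_header": f"Bearer {api_token}"}
```

Typical use would be make_client(require_secret("EXAMPLE_API_TOKEN")), with the variable set by the shell, CI, or a secrets manager.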

                                  1. 3

                                    I think in a perfect world you’re right, however a lot of exploits that cause users data to be exposed are caused in part by people checking in secrets into source control. My goal with this project was to create something as lightweight and quiet as possible such that you can mostly forget that it’s installed and get the protection with very little downside. Also if you work at a company on a web service the security team might mandate using a security scanner as part of your pre-commit, in which case it’s nice to have a very fast and lightweight option.

                                    1. 1

                                      i like the concept of having a company where the security team might mandate a tool like this

                                      the point i was making is that if someone is committing secrets, they probably don’t realize what they’re doing, and in those cases, they probably won’t understand the need to add this tool to their git pre-commit hooks

                                      in other words, if they were to fully understand what this tool is used for, then the usefulness is probably greatly reduced

                                      which leads me to feel like ultimately the education is what is important… but the security team mandate thing is good, and having this run in CI before anything lands in a main branch is good too; depending on your repo setup, this may make overwriting the history in the repo easier

                                      but best to have it checked before ever committed, which is the point of the tool

                                    2. 2

                                      It’s easy to not commit secrets! That’s why we should make a machine automate it for us.

                                      1. 1

                                        I’m also not sure what’s hard about it.

                                        Never use git add .

                                        Don’t even need to avoid any commands - instead, just don’t have secrets in the repository directory at all. It’s easy and also completely foolproof.

                                        1. 1

                                          Which breaks down for infrastructure repos that are mostly secrets :P

                                          1. 2

                                            i work every day in infrastructure repos and don’t have any secrets committed, so that just sounds like someone’s doing something wrong

                                            1. 1

                                              You can try to argue with reality or just accept it. There are setups that are a lot older than, e.g., Vault, or anything else based on tokens.

                                              I don’t think this is the place for judgment of practices, and I’m not even involved in this game anymore, so don’t read this as defending myself.

                                              1. 1

                                                oh, not arguing at all, it just seemed like your statement was a “this won’t work for infrastructure repos”… which was, at least I think, the only logical conclusion to take from your statement, because i think it’s pretty obvious this isn’t going to work for repos that are literally designed to store secrets

                                                then again, maybe the :P invalidates any attempt at a logical conclusion

                                          2. 1

                                            “Never use git add . or git commit -a” is good advice for other reasons, though.

                                        1. 5

                                          A while back I accidentally committed (a not particularly important) secret to a personal project and had to clean it up manually, so I will definitely be checking this out.

                                          One suggestion from a quick look at the code: I’d probably try to avoid handling paths as Strings, instead favouring OsString/OsStr and creating Paths/PathBuf from them. File systems often don’t enforce UTF-8, which String requires but OsString does not.
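The same pitfall shows up outside Rust. A short Python illustration with a made-up file name: strict UTF-8 decoding, roughly what storing a path in a Rust String demands, rejects a Latin-1 file name, while the surrogateescape error handler (the one os.fsdecode() uses on POSIX) round-trips the original bytes losslessly, loosely the role OsString plays.

```python
def display_name(raw: bytes) -> str:
    """Decode a raw file name without losing undecodable bytes.

    errors="surrogateescape" maps each invalid byte to a lone surrogate,
    so the original bytes can be recovered exactly when encoding back.
    """
    return raw.decode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"  # Latin-1 'é': allowed by most Unix file systems, invalid UTF-8

try:
    raw.decode("utf-8")  # strict decoding, analogous to requiring a String
except UnicodeDecodeError as exc:
    print("strict UTF-8 decode failed:", exc.reason)

name = display_name(raw)
# Encoding with the same error handler gives back the exact original bytes.
print(name.encode("utf-8", errors="surrogateescape") == raw)
```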

                                          1. 2

                                            Seconded; I once had a corrupted hard drive that turned a bunch of directory names into garbage and it was hard as hell to figure out how to actually get a tool to delete them.

                                            1. 1

                                              That’s good to know, thanks for the tip!

                                            1. 13

                                              I think more programmers should be using secret scanners, but there weren’t any “no-brainer” solutions I could find, so I decided to build a new one. The core of secret scanning is running regexes against a large number of files, and it turns out this is something ripgrep is excellent at. By leveraging the ripgrep library effectively, secrets is able to scan files roughly 100x faster than other solutions I tested. This is my first Rust project and I was impressed with how quickly I was able to put something together that is also really fast. Let me know if you have any feedback!
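The core idea (regular expressions applied line by line across many files) can be sketched in Python. This is not how secrets works internally: it uses the ripgrep library from Rust, which is where its speed comes from. The two patterns are illustrative examples only (an AWS-style access key ID and a PEM private-key header); real scanners ship much larger, tuned rule sets.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterator, NamedTuple

# Illustrative rules only.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


class Finding(NamedTuple):
    path: str
    line: int
    rule: str


def scan_text(path: str, text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) match."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield Finding(path, lineno, rule)


def scan_tree(root: Path) -> list[Finding]:
    """Scan every regular file under root, skipping the .git directory."""
    findings: list[Finding] = []
    for path in root.rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings.extend(scan_text(str(path), text))
    return findings
```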

                                              1. 7

                                                I appreciate that you put links to other similar projects in the README! It’s a small thing but really helps to encourage adoption of the idea, even if the implementation doesn’t meet specific requirements. That being said, this tool looks good for my use case and I’m definitely going to try it.

                                                1. 5

                                                  I like the secretsignore feature. Sometimes you want things that look like secrets in your tests, and not being able to accommodate that has made me avoid similar tools in the past.
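An ignore mechanism like this usually boils down to matching paths against patterns before reporting findings. The file format below (one glob per line, # starts a comment) is a hypothetical stand-in; it is not the documented secretsignore syntax.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore(text: str) -> list[str]:
    """Parse a hypothetical ignore file: one glob per line, '#' starts a comment."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if the path, or any of its parent directories, matches a pattern."""
    p = PurePosixPath(path)
    candidates = [str(p)] + [str(parent) for parent in p.parents if str(parent) != "."]
    return any(fnmatchcase(c, pat) for c in candidates for pat in patterns)


patterns = parse_ignore("# test fixtures contain fake keys\ntests/fixtures/*\n")
print(is_ignored("tests/fixtures/aws.txt", patterns))  # True
print(is_ignored("src/config.py", patterns))           # False
```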

                                                  1. 1

                                                    There’s also the git-secrets project (from AWS, first released in 2015) that’s also designed as a pre-commit hook.

                                                    (I used to work for AWS and used git-secrets, but never worked on git-secrets.)