Threads for jcrabbit

  1. 2

    Things like your webmail, your spam filtering, and almost certainly your general security will not be as good as they would have been.

    This is really the key. It takes a lot to have great security and spam detection. It is almost certain Big Tech Co is able to do this better than anyone else.

    1. 2

      Honestly, in my experience spam filtering with rspamd on my mail server is more effective than gmail’s spam filtering. I don’t know what it’s doing, but rspamd is genuinely magical.

      The point about security is well made of course - it’s very hard to match Google here.

    1. 3

      Cool! I love seeing the Builder pattern here. I remember (maybe incorrectly) it coming up and being criticized in a Go Time podcast. Though we do see similar patterns in other languages, as well as some usages in Go (https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/config#LoadOptionsFunc). I’m not saying this library proves that it’s the ideal pattern, but it does show that it has its place.

      Wish you the best with this library!

      1. 2

        Personally, normal struct literal syntax looks the cleanest and clearest to me:

        req := Request{URL: url, Handler: ToJSON(&t)}
        

        There’s probably a time and a place for the builder pattern, but it seems pretty rare IMO. Would like to hear your opinion.

        1. 3

          A simple struct literal alone doesn’t work well when the default isn’t a natural “empty” value.

          With a builder, you can do validation in the With or Build method to ensure you have a properly constructed object at all times. You can also indicate which related parameters belong together.

          req, err := NewRequestBuilder().
              WithURL(url).
              WithBasicAuth(user, pass).
              WithSerializer(customFn).
              Build()    // sets certain non-nil default values, like a custom DNS resolver
          
          1. 2

            I’m not sure what you mean? How does the builder pattern address the lack of a natural “empty” value (as opposed to, say, a “Do()” or “Run()” method filling in the default values as necessary)? From your example, which field or fields would have an unnatural empty value?

            1. 1

              Assume Serializer must be a non-nil pointer. Perhaps it has to satisfy a function type like

              type Serializer func(t interface{}) error
              

              And you want the built “request” to use some default serializer, but you can configure it with a parameter. The Build method can look through the fields of the RequestBuilder object and choose between the default one or the one passed by WithSerializer.
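
              A minimal sketch of what I mean (names here are hypothetical, not from any particular library):

              package example

              import "errors"

              // Request and defaultSerializer are stand-ins for illustration;
              // Serializer is the function type defined above.
              type Request struct {
                  URL        string
                  Serializer Serializer
              }

              var defaultSerializer Serializer = func(t interface{}) error { return nil }

              type RequestBuilder struct {
                  url        string
                  serializer Serializer // nil until WithSerializer is called
              }

              func NewRequestBuilder() *RequestBuilder { return &RequestBuilder{} }

              func (b *RequestBuilder) WithURL(u string) *RequestBuilder { b.url = u; return b }

              func (b *RequestBuilder) WithSerializer(s Serializer) *RequestBuilder { b.serializer = s; return b }

              // Build validates the configuration and fills in defaults, so a
              // successfully built Request is properly constructed at all times.
              func (b *RequestBuilder) Build() (*Request, error) {
                  if b.url == "" {
                      return nil, errors.New("url is required")
                  }
                  s := b.serializer
                  if s == nil {
                      s = defaultSerializer // nothing passed via WithSerializer
                  }
                  return &Request{URL: b.url, Serializer: s}, nil
              }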

          2. 1

            It can’t be Request{URL: url, Handler: ToJSON(&t), ...} because it needs to be imported and executed, so it ends up as

            err := requests.Do(ctx, requests.Options{
                URL: url, 
                Handler: requests.ToJSON(&t), 
                ...
            })
            

            which is a little bit ugly, IMO. Tastes vary though. I like that for simple requests you end up only having a single package name invocation.

            1. 1

              Yeah, I agree that the package name qualifier degrades the ergonomics. I usually import under a short name (e.g., pz “github.com/weberc2/httpeasy”), which makes things better but not great. It probably wouldn’t be the end of the world to import it as . either. Not a big deal either way.

          1. 1

            I’m curious why you’re asking. Having the project contributors’ buy-in and commitment is really essential to success. For a large project, consider how Dropbox completed type checking across most of its Python code.

            https://dropbox.tech/application/our-journey-to-type-checking-4-million-lines-of-python

            1. 1

              I want to know what decisions were made in the process. For example: how far did you go in converting certain data structures to use types, where did you leave it for the future (or never), and why? How much of the project were you able to convert? Did it result in some tests being removed or added?

              1. 1

                I see. I’ve only converted small projects with fewer than 50 files. I was able to do conversions one package at a time and complete the conversion over a period of multiple commits.

                1. 1

                  Thanks! Do you have these project repositories online somewhere?

            1. 2

              Not sure if it’s just me, but using AWS bills, AZs, and k8s as the example for keeping things simple isn’t a great idea. I don’t understand even one of those things well enough to think any solution involving them is simple.

              1. 5

                Continue with https://interpreterbook.com/ along with my commentary on the book itself.

                1. 21

                  This is a relatively complete overview for cryptographers and low-level programmers.

                  The comparison between cryptographic and “general-purpose” hash functions is missing motivations for why non-cryptographic hash functions are used. The obvious motivation is speed; both the constant factors and the asymptotic runtime of hash tables and other data structures are sensitive to the choice of hash function.
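
                  For a rough sense of that gap, a micro-benchmark sketch using only the standard library (exact numbers vary by CPU and input size):

                  package hashbench

                  import (
                      "crypto/sha256"
                      "hash/fnv"
                      "testing"
                  )

                  var data = make([]byte, 1024)

                  // Non-cryptographic FNV-1a: a short multiply/xor loop per byte.
                  func BenchmarkFNV64a(b *testing.B) {
                      for i := 0; i < b.N; i++ {
                          h := fnv.New64a()
                          h.Write(data)
                          _ = h.Sum64()
                      }
                  }

                  // Cryptographic SHA-256: typically several times slower per byte
                  // when no dedicated hardware instructions are available.
                  func BenchmarkSHA256(b *testing.B) {
                      for i := 0; i < b.N; i++ {
                          _ = sha256.Sum256(data)
                      }
                  }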

                  While I know it would have made the section a little more confusing, I wish there were a tangential mention of the Cryptographic Doom Principle in the “Encrypt and Hash” section. MAC is necessary but can still be misused.

                  Finally, while it’s pedantic, I feel like we could do a better job of not omitting the biggest elephant in the room. Wikipedia gets this wrong too; from the opening of their article on cryptographic hash functions:

                  A cryptographic hash function (CHF) … is a one-way function, that is, a function which is practically infeasible to invert or reverse the computation.

                  However, one click away, on the page on one-way functions:

                  The existence of such one-way functions is still an open conjecture. In fact, their existence would prove that the complexity classes P and NP are not equal, thus resolving the foremost unsolved question of theoretical computer science.

                  We should use the subjunctive mood here, and speak hypothetically, because we have not yet proven the correctness of the portion of cryptographic research which relies on these hash functions. Speaking plainly, cryptographers should not assume that P is not equivalent to NP, even though evidence suggests P != NP. I know it’s silly, but it has serious ramifications.

                  1. 7

                    While I know it would have made the section a little more confusing, I wish there were a tangential mention of the Cryptographic Doom Principle in the “Encrypt and Hash” section. MAC is necessary but can still be misused.

                    I thought this was covered by AEAD, but that’s in the subsequent section. I’ll make an explicit call-out.

                    Finally, while it’s pedantic, I feel like we could do a better job of not omitting the biggest elephant in the room.

                    Sounds like a separate blog post that deserves to be written and shared here, should you wish to do so. I’m not the right person to make that argument.

                    1. 1

                      So I took a quick look at https://people.eecs.berkeley.edu/~sanjamg/classes/cs276-fall14/scribe/lec02.pdf and, well, I can’t understand it. So I’m going to cheat and ask strangers on the internet for help.

                      What is a good definition for a “one-way function”? I think definition 5 from the paper defines it (but I don’t understand the notation). Is f(x) = 1 an acceptable one-way function? I’ve always thought that crypto hashes are one way because, in practice, they always reduce a large number of bits down to a smaller set of possible hashed values, therefore they are “one-way”. For example, reducing 1 megabyte of data into 1024 bits.
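
                      For concreteness, here’s my attempt at reading the definition (assuming it matches the usual textbook one; corrections welcome): a function $f$ is one-way if it is computable in time polynomial in $|x|$, and for every probabilistic polynomial-time algorithm $A$,

                      $$\Pr_{x \leftarrow \{0,1\}^n}\left[ f\big(A(1^n, f(x))\big) = f(x) \right] \le \mathrm{negl}(n)$$

                      If I’m reading that right, $f(x) = 1$ would not qualify: every input is a valid preimage, so inverting it is trivial. Compression alone isn’t what makes a function one-way.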

                  1. 2

                    The technical content is pretty good, but man, describing a woman you (presumably) don’t know as your “next wife” is creepy.

                      1. 1

                        The article is 10 years old, I remember jokes like that being kind of in vogue at the time.

                        1. 3

                          Yeah, very much a product of the time. Fortunately it’s much less acceptable these days.

                        2. 1

                          Agreed.

                        1. 2

                          I might be missing something. I don’t get a good sense of what a perceptual hash is from this post. (Edited: this led me to https://en.wikipedia.org/wiki/Locality-sensitive_hashing)

                          TIL something new!

                          While I’ve read about “Encrypt then MAC” before, it was never intuitive to me why it’s important. I can understand why it 1) can’t hurt; 2) ensures the ciphertext and MAC are consistent; 3) avoids leaking data about the plaintext. I’m not sure if it adds any other value.

                          Wikipedia says “In information security, message authentication or data origin authentication is a property that a message has not been modified while in transit (data integrity) and that the receiving party can verify the source of the message.[1] Message authentication does not necessarily include the property of non-repudiation.[2][3]”

                          But if a man-in-the-middle attack is possible (https://tonyarcieri.com/all-the-crypto-code-youve-ever-written-is-probably-broken), then it seems the MITM could also create a legitimate MAC. So I’m confused.
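
                          Here’s the shape of encrypt-then-MAC as I understand it (a sketch only, not vetted crypto; real code should prefer an AEAD like AES-GCM):

                          package etm

                          import (
                              "crypto/aes"
                              "crypto/cipher"
                              "crypto/hmac"
                              "crypto/rand"
                              "crypto/sha256"
                          )

                          // sealETM encrypts with AES-CTR, then MACs the IV and ciphertext.
                          func sealETM(encKey, macKey, plaintext []byte) ([]byte, error) {
                              block, err := aes.NewCipher(encKey)
                              if err != nil {
                                  return nil, err
                              }
                              iv := make([]byte, aes.BlockSize)
                              if _, err := rand.Read(iv); err != nil {
                                  return nil, err
                              }
                              ct := make([]byte, len(plaintext))
                              cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)

                              // The MAC is keyed: a man-in-the-middle who doesn't hold macKey
                              // can tamper with the ciphertext but can't forge a matching tag.
                              mac := hmac.New(sha256.New, macKey)
                              mac.Write(iv)
                              mac.Write(ct)
                              return append(append(iv, ct...), mac.Sum(nil)...), nil
                          }

                          (Verification would recompute the HMAC over the received IV and ciphertext and compare with hmac.Equal before ever decrypting.)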

                          1. 24

                            Upgrading @golang versions is actually a pleasurable task for me:

                            1. I’m 99% sure nothing will break.
                            2. Speedups of 5-10% are common.
                            3. New compiler or vet warnings tell me how to improve my code.
                            4. Excellent release notes.

                            Does any other language get this as right?

                            1. 7

                              Go’s secret sauce is that they never† break BC. There’s nothing else where you can just throw it into production like that because you don’t need to check for deprecations and warnings first.

                              † That said, 1.17 actually did break BC for security reasons. If you were interpreting URL query parameters so that ?a=1&b=2 and ?a=1;b=2 are the same, that’s broken now because they removed support for semicolons. Seems like the right call, but definitely one of the few times where you could get bitten by Go.
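
                              Concretely, my understanding of the 1.17 behavior (the server-side escape hatch is net/http’s AllowQuerySemicolons):

                              package main

                              import (
                                  "fmt"
                                  "net/http"
                                  "net/url"
                              )

                              func main() {
                                  // Go < 1.17 treated ";" like "&"; since 1.17, ParseQuery
                                  // drops pairs containing a raw semicolon and returns an error.
                                  vals, err := url.ParseQuery("a=1;b=2")
                                  fmt.Println(vals, err)

                                  // Opt back in to the old behavior per server:
                                  mux := http.NewServeMux()
                                  _ = &http.Server{Handler: http.AllowQuerySemicolons(mux)}
                              }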

                              Another issue is that the language and standard library have a compatibility guarantee, but the build tool does not, so e.g. if you didn’t move to modules, that can bite you. Still, compared to Python and Node, it’s a breath of fresh air.

                              1. 2

                                I’ve been upgrading since 1.8 or so. There have been (rarely) upgrades that broke my code, but it was always for a good reason and easy to fix. None in recent memory.

                                1. 1

                                  Are semicolons between query params a common practice? I’ve never heard of this before.

                                  1. 2

                                    No, which is why they removed it. It was in an RFC, which is why it was added in the first place.

                                  2. 1

                                    1.16 or 1.15 also broke backwards compatibility with the TLS ServerName thing.

                                  3. 4

                                    Java is damn good about backward compatibility.

                                    From what I recall, their release notes are pretty good as well.

                                    1. 3

                                      I had a different experience: going from Java 8 to Java 11 broke countless libraries for me. Especially bad is that they often break at run time and not at compile time.

                                      1. 2

                                        As someone with just a little experience with Go, what’s the situation with dependencies? In Java with Maven, it becomes a nightmare of exclusions when one wants to upgrade a dependency, as transitive dependencies might then clash.

                                        1. 3

                                          It’s a bit complicated, but the TL;DR is that Go 1.11 (this is 1.17, recall) introduced “modules”, which is the blessed package management system. It’s based on URLs (although weirdly, it’s github.com, not com.github, hmm…) that tell the system where to download external modules. The modules are versioned by git tags (or the equivalent for non-git SCMs). Your package can list the minimum versions of external packages it wants and also hardcode replacement versions if you need to fork something.

                                          The expectation is that if you need to break BC as a library author, you will publish your package with a new URL, typically by adding v2 or whatever to the end of your existing URL. Package users can import both github.com/user/pkg and github.com/user/pkg/v2 into the same program and it will run both, but if you want e.g. both v1 and v1.5 in the same application, you’re SOL. It’s extremely opinionated in that regard, but I haven’t run into any problems with it.

                                          Part of the backstory is that before Go modules, you were just expected to never break BC as a library author because there was no way to signal it downstream. When they switched to modules, Russ Cox basically tried to preserve that property by requiring URL changes for new versions.
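
                                          For illustration, a minimal go.mod along those lines (module paths and versions made up):

                                          module example.com/myapp

                                          go 1.17

                                          require (
                                              github.com/user/pkg v1.4.2    // minimum version, not an exact pin
                                              github.com/user/pkg/v2 v2.0.1 // the BC-breaking major lives at a new path
                                          )

                                          // Hardcoded replacement, e.g. for a fork:
                                          replace github.com/user/pkg => github.com/you/pkg-fork v1.4.3

                                          Both majors can then coexist in one program:

                                          import (
                                              pkgv1 "github.com/user/pkg"
                                              pkgv2 "github.com/user/pkg/v2"
                                          )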

                                          1. 2

                                            The module name and package ImportPath are not required to be URLs. Treating them as URLs is overloading done by go get. Nothing in the language spec requires them to be URLs.

                                            1. 2

                                              Yes, but I said “TL;DR” so I had to simplify.

                                          2. 2

                                            I also have only a little experience with Go. I have not yet run into frustrations with dependencies via Go modules.

                                            Russ Cox wrote a number of great articles about how Go’s dependency management solves problems with transitive dependencies. I recall this one being very good (https://research.swtch.com/vgo-import). It also calls out a constraint that programmers must follow:

                                            In Go, if an old package and a new package have the same import path, the new package must be backwards compatible with the old package.

                                            Is this constraint realistic and followed by library authors? If not, you’re going to run into problems with Go modules.

                                            I’ve run into dependency hell in Java, JavaScript, Python, and PHP – in every programming language I’ve had to do major development in. It’s a hard problem to solve!

                                            1. 1
                                              1. 1

                                                Is this constraint realistic and followed by library authors? If not, you’re going to run into problems with Go modules.

                                                It is (obviously) not realistic for most software produced in the world.

                                            2. 1

                                              I strongly agree. The first time major stuff broke was Java 9, which is exceedingly recent, and wasn’t an LTS. And that movement has more in common with the Go 2 work than anything else, especially as Java 8 continues to be fully supported.

                                          1. 5

                                            I’m going through Writing an Interpreter In Go (https://interpreterbook.com/). I’m looking forward to learning how the author thinks about and structures an interpreter. Eventually, I’ll get to his next book: Writing a Compiler.

                                            1. 3

                                              I’m 3/4 through and I’ve really really enjoyed it so far.

                                              One thing I found useful, albeit tedious at first, was typing out every file as I read it. I really wanted to internalize the content as best I could. Then as new concepts were introduced I tried to implement them before reading the next code snippet shown in the chapter.

                                              1. 2

                                                Then as new concepts were introduced I tried to implement them before reading the next code snippet shown in the chapter.

                                                Great idea!

                                            1. 40

                                              Graph database author here.

                                              It’s a very interesting question, and you’ve hit the nerve of it. The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries. The question comes to what you optimize for.

                                              And so, unlike a table-centric view, which has benefits that are much better-known, what happens when you optimize for joins? Because that’s the graph game – deep joins. A graph is one huge set of join tables. So it’s really hard to shard a graph – this is connected to why NoSQL folks are always anti-join. It’s a pain in the rear. Similarly, it’s really easy to write a very expensive graph query, where starting from the other side is much cheaper.

                                              So then we get to the bigger point; in a world where joins are the norm, what the heck are your join tables, ie, schema? And it’s super fluid – which has benefits! – but it’s also very particular. That tends to be the first major hurdle against graph databases: defining your schema/predicates/edge-types and what they mean to you. You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time. And $DEITY help you if you want to share a common schema with others! This is what schema.org is chasing, choosing some bare minimum.

                                              This is followed on by the fact that most of the standards in the graph database world are academic in nature. If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. I wanted to standardize. I wanted to do right by the community. But, jeez, it’s just so abstract as to be useless. OWL goes another meta-level and defines properties about properties, and there are simpler versions of OWL, and there’s RDFS/RDF* which is RDF about RDF and on and on…. it’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient or define your schema. Example: there’s a direct connection to the difference between a vector and a linked list – they both represent an ordered sequence. You can’t do a vector in triples, but you can do a linked list.
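
                                              To make that last point concrete, a sketch (the predicate names are made up):

                                              // A triple store is just a set of (subject, predicate, object) facts.
                                              type Triple struct{ S, P, O string }

                                              // A linked list encodes order edge-by-edge, which triples handle naturally:
                                              var list = []Triple{
                                                  {"node1", "value", "a"},
                                                  {"node1", "next", "node2"},
                                                  {"node2", "value", "b"},
                                                  {"node2", "next", "nil"},
                                              }

                                              // A vector's contiguous, index-addressed storage has no such edge-wise
                                              // encoding; the closest you can get is simulating indices ("elem0",
                                              // "elem1", ...), which is really a different data structure.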

                                              I know I’m rambling a little, but now I’ll get to the turn; I still think there’s gold in them hills. The reason it’s not popular is all of the above and more, but it can be really useful! Especially when your problem is graph-shaped! I’ve implemented this a few times, and things like mapping, and social networks, and data networks, and document-origin-tracing – generally anything that would take a lot of joins – turn out swimmingly. Things that look more like tables (my example is always the back of a baseball card) look kind of nuts in the graph world, and things that look like a graph are wild in third normal form.

                                              So I think there’s a time and a place for graph databases. I just think that a combination of the above counter-arguments and the underlying needs are few enough that it’s under-explored and over-politicized. They work great in isolation, ironically enough.

                                              I’m happy to chat more, but that’s my quick take. Right tool, right job. It’s a shame about how that part of database theory has gone.

                                              1. 10

                                                Full disclosure: I work for Neo4j, a graph database vendor.

                                                Very well said.

                                                I’d add that most of the conversation in responses to OP assumes “transactional” workloads. Graph databases for analytic workloads are a whole other topic to explore. Folks should check out Stanford Prof. Jure Leskovec’s research in the space…and a lot of his lectures about graphs for machine learning are online.

                                                1. 2

                                                  The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries.

                                                  When faced with an unknown data problem, I always choose an RDBMS. It is a known quantity. I suspect I’d choose differently if I understood graph DBs better.

                                                  I would love to see more articles here on practical uses for graph DBs. In particular, I’d love to know if they are best deployed as the primary datastore or maybe just for the subset of data that you’re interested in querying (e.g., perhaps just the products table in an ecommerce app).

                                                  this is connected to why NoSQL folks are always anti-join. It’s a pain in the rear.

                                                  Interesting. People use NoSQL a lot. They simply do joins in the application. Maybe that’s the practical solution when it comes to graph dbs? Then again, the point of graph solutions is generally to search for connections (joins). I’d love to hear more on this aspect.

                                                  Thank you and the OP. I wish I could upvote this more. :)

                                                  1. 1

                                                    Yeah, you’re entirely right that the joins happen in the application as a result. The reason they’re a pain is that they represent a coordination point — a sort of for-all versus for-each. Think of how you’d do a join in a traditional MapReduce setting; it requires a shuffle! That’s not a coincidence. A lot of the CALM stuff from Cal in ~2011 is related here and def. worth a read. That’s what I meant by a pain. It’s also why it’s really hard to shard a graph.

                                                    I think there’s something to having a graph be a secondary, problem-space-only engine, at least for OLTP. But again, lack of well-known engines, guides, schema, etc — it’d be lovely to have more resources and folks to explore various architectures further.

                                                  2. 2

                                                    You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time.

                                                    That’s such a great way to put it :)

                                                    Especially when your problem is graph-shaped!

                                                    I think we need collective experience and training in the industry to recognize problem shapes. We’re often barely able to precisely formulate our problems/requirements in the first place.

                                                    Which database have you authored?

                                                    1. 5

                                                      Cayley. Happy to see it already mentioned, though I handed off maintainership a long while ago.

                                                      (Burnout is real, kids)

                                                    2. 2

                                                      Thanks for Cayley! It’s refreshing to have such a direct and clean implementation of the concept. I too think there’s a lot of promise in the area.

                                                      Since you’re here, I was wondering (no obligation!) if you had any ideas around enforcing schemas at the actual database level? As you mentioned, things can grow hairy really quickly, and once they are in such a state, the exploration to figure out what needs to be fixed and the requisite migrations are daunting.

                                                      Lately I’ve been playing with an idea for a graph db that is by default a triplestore under the hood, but with a (required!) schema that would look something like a commutative diagram. This would allow for discipline and validation of data, but also let you recognize multi-edge hops that are always there, so for some things you could move them out of the triplestore into a quad- or 5-store to produce more compact disk representations, yielding faster scans with fewer indexes and giving the query planner a bit of extra choice. I haven’t thought it through too much, so I might be missing something, or it might just not be worth it.

                                                      Anyway, restriction and grokkability of the underlying schema/ontology does seem like the fundamental limiter to me in a lot of cases, and I was curious whether, as someone with a lot of experience in the area, you had thoughts on how to improve the situation?

                                                      1. 1

                                                        If you don’t mind me joining in, have you heard of https://age.incubator.apache.org/ ? I’m curious to hear your opinion about whether it can be an effective solution to this problem.

                                                        1. 1

                                                          If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. […] it’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient

                                                          I’ve been using SPARQL a little recently to get things out of Wikidata, and it definitely seems to have pain points around that. I’m not sure at exactly what level the fault lies (SPARQL as a query language, Wikidata’s engine, etc.), but things that seem to me like they should be particularly easy in a graph DB, like “is there a path from ?x and ?y to a common node, and if yes, give me the path?” end up both hard to write and especially hard to write efficiently.

                                                          1. 2

                                                            This goes a bit to the one reply separating graphs-as-analytics and graphs-as-real-time-query-stores.

                                                            SPARQL is the standard (once again, looking at you, W3C) but it’s trying to be SQL’s kid brother — and SQL has its own baggage IMO — instead of trying to build for the problem space. Say what you will about Cypher, Dude, at least it’s an ethos. Similarly Gremlin, which I liked because it’s easy to embed in other languages. I think there’s something in the spectrum between PathQuery (recently from a Google paper — I remember the early versions of it and some various arguments) and SPARQL that would target writing more functional paths — but that’s another ramble entirely :)

                                                        1. 16

                                                          Compiling a lot of languages is actually (at least) NP-hard, simply because of the type systems involved. I created a little overview a few months back, when it turned out that Swift’s type checking is undecidable.

                                                          1. 5

                                                            That’s a wonderful overview, thank you! It’s easy to read a lot of type theory papers and then say “okay but how does this apply to the real world”, and a reference like that really makes it easier to connect theory to more concrete ideas.

                                                            1. 1

                                                              Lovely overview. Thanks!

                                                            1. 13

                                                              https://www.hillelwayne.com/post/always-more-history/

                                                                If we flip through the ADM manual, we see that the ADM used “backspace” to mean “move the cursor left” without deleting the current character. With ^H and ^J already being used as left and down, it made sense to turn ^K and ^L into up and right.

                                                              1. 2

                                                                My weekends have come down to spending time on LeetCode, because reasons.

                                                                1. 1

                                                                    The second point is actually even more dire. IEEE 754 allows exp(), cos(), and other transcendental functions to be incorrectly rounded, because correctly rounding them can require arbitrarily much intermediate precision; IEEE 754 author Kahan calls this the “table-maker’s dilemma”.

                                                                  1. 1

                                                                    Dumb questions: Why is it called the table maker’s dilemma? What does “table” mean in this context? What does “how much computation it would cost” mean?

                                                                    1. 1

                                                                        “Table” refers to mathematical lookup tables: to publish a correctly rounded entry, you have to compute enough extra digits to decide which way to round, and you can’t know in advance how many digits (i.e., how much computation) that will take. Hence the tradeoff between speed, computation, and memory.
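
                                                                        A sketch of the hard case: suppose at working precision $p$ the true value of, say, $\exp(x)$ has the binary expansion
                                                                        $$\underbrace{1.b_1 b_2 \ldots b_p}_{\text{representable}}\;\underbrace{1\,0\,0\,\ldots\,0}_{k\ \text{bits}}\;\ldots$$
                                                                        Rounding correctly depends on the bits after the run of zeros, so you need about $p + k$ correct bits of the answer, and $k$ isn’t known ahead of time.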

                                                                  1. 3

                                                                    Previously on Lobsters, we explored the idea that causation and correlation are distinct. Here, the distinctness goes in the opposite direction from the typical slogan, but it’s still valid: Causation does not imply correlation!

                                                                    1. 1

                                                                      Oh hey, so glad that you’ve posted this here.

                                                                      First of all, I find the point about the author writing about their own research vs. other people’s research quite interesting.

                                                                      It sounds like you’d actually recommend the book. Unfortunately, this sentiment was buried in the post:

                                                                      The simplest answer to this question is that the book really is wonderful, it just has this one little mistake. Noise is indeed an important subject, and three authors who don’t understand correlation and causation can still write an excellent book on the topic.

                                                                      So… I feel like I don’t understand what is meant by “causation”. I feel like I use it in the sense of “force causes acceleration” or “speed of light causes increase in energy to convert into mass.” Reading your post, I feel like there’s another way of looking at it.

                                                                      For example, “You can also get causation without correlation from a non-monotonic relationship.” - I would love to understand what that means.

                                                                      Thanks, again, for the post and sharing it here.

                                                                      Cheers
                                                                      1. 1

                                                                        I’m hesitant to recommend paying for dead trees, but it doesn’t sound like a bad book to have on a shelf.

                                                                        I think of causality using physics. If we have two events X and Y, and all observers agree that X happened before Y, then X caused Y; X is one of Y’s many causes. The goal of many scientists is to look for statistical evidence which can point them towards possible causes for observed phenomena, but such statistics can only point vaguely.

                                                                        1. 1

                                                                          I like to think of causation as asking: if you forced the cause to happen, would you still get the effect? We might say that a thermometer correlates with the temperature in a room, but you can intervene to make the thermometer show another temperature, and it wouldn’t change the temperature of the room.

                                                                          By contrast, if you intervened and made the room colder, the thermometer would change. In that sense we can say the temperature of the room causes the thermometer to change.
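
                                                                          As for the “causation without correlation from a non-monotonic relationship” question upthread, a standard toy example (a sketch):
                                                                          $$X \sim \mathrm{Uniform}(-1, 1), \qquad Y = X^2$$
                                                                          Here $X$ completely determines $Y$, yet $\mathrm{Cov}(X, Y) = E[X^3] - E[X]\,E[X^2] = 0$, so the Pearson correlation is exactly zero. Correlation only detects linear association; a symmetric, U-shaped cause slips right past it.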

                                                                        1. 2

                                                                          I was hoping to see compilation time for Scala. Oh well ¯\_(ツ)_/¯

                                                                          1. 2

                                                                            Actually informative summary of the current state of AI and deep learning.