Threads for philix

  1. 1

    Define your destructors outside the class declaration so they don’t get inlined by the compiler in both functions

    I don’t believe that trick will work when building with LTO, since then any function is eligible for inlining, whether or not it’s in a header.

    I generally build releases with both LTO and -Os; the latter tames excessive inlining.
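
    For reference, a minimal sketch of the out-of-line destructor pattern being discussed (the Widget class and file names are just illustrative):

    // widget.h
    #pragma once
    #include <vector>

    class Widget {
    public:
      ~Widget();  // declared here, defined in widget.cpp, so the body is not
                  // a candidate for inlining at every call site (absent LTO)
    private:
      std::vector<int> m_data;
    };

    // widget.cpp
    #include "widget.h"

    Widget::~Widget() = default;  // the single out-of-line definition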

    1. 2

      When performing the LTO, the compiler (or linker) has a global view of the code and can decide not inlining a destructor that’s called in many places. As you said, -Os helps with that.

    1. 1

      Like you said, this all leads back to the problem that move constructor implementations can vary in how expensive they are to call. Google protobuf performing a full copy when the other message’s arena ID differs from this one’s is surprising to me, although it does comply with the C++ standard. It would be better if they reassigned ownership in their Arena allocator. Does anyone know of an easy way to instrument C++ move constructors to detect how expensive they are? A static-analysis solution would be even better. It would also be helpful if we could manually annotate an expensive move constructor so that std::optional warns us when it calls it.

      A crude way to do this would be to define an interface with a method bool moveIsExpensive() and a wrapper myproj::optional that logs when it is used with a type whose move constructor is expensive:

      #include <iostream>
      #include <optional>
      #include <type_traits>
      #include <utility>

      namespace myproj {
      // Interface describing whether a type's move constructor is expensive.
      class MoveCost {
      public:
        virtual ~MoveCost() = default;
        virtual bool moveIsExpensive() const = 0;
      };

      class Object : public MoveCost {
      public:
        // ...
        Object(Object&& other) { ExpensiveCopy(other); }  // ExpensiveCopy elided
        bool moveIsExpensive() const override { return true; }
      };

      template <class T>
      class optional {
      private:
        std::optional<T> m_optional;

      public:
        optional(T o) : m_optional(std::move(o)) {
      #ifndef NDEBUG
          // dynamic_cast needs a polymorphic type, hence the constexpr guard.
          if constexpr (std::is_polymorphic_v<T>) {
            auto* cost = dynamic_cast<const MoveCost*>(&*m_optional);
            if (cost && cost->moveIsExpensive()) {
              std::cerr << "warning at " << __FILE__ << ':' << __LINE__
                        << ": myproj::optional calling an expensive move constructor."
                        << std::endl;
            }
          }
      #endif
        }
        // delegate the rest to m_optional
        // ...
      };
      }  // namespace myproj
      

      Then you could make a myproj::optional<Object> and get a warning at runtime.

      EDIT: As an aside, would it be better to prefer std::unique_ptr over std::optional for code that returns Google protobuf messages? That would only ever call a single std::move. Or are the ownership semantics too restrictive for this use case?

      1. 2

        About your EDIT paragraph:

        I think that’s better for binary size, but I still think output parameters are the way to go, because you should leave the decision of where to allocate the protobuf object to the caller.

        If the caller wants to use an Arena (like Google backends do), you make your function impossible to use by allocating the object behind the unique_ptr. Impossible, that is, without a copy, which is undesirable and avoidable.

        What if the caller has that object from a previous iteration in a loop? Reusing protobuf objects reduces the number of memory allocations, as google::protobuf::ParseFromString reuses the already-allocated buffers when populating the message again.
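
        As a rough sketch of what I mean by leaving allocation to the caller (MyMessage, ParseRecord, and Process are hypothetical stand-ins; ParseFromString and Arena::CreateMessage are real protobuf APIs):

        #include <string>
        #include <vector>
        #include <google/protobuf/arena.h>
        // "my_message.pb.h" (defining MyMessage) and Process() are assumed.

        // The function fills a message the caller owns instead of returning one.
        bool ParseRecord(const std::string& bytes, MyMessage* out) {
          // ParseFromString clears the message and reuses already-allocated
          // buffers where it can, so one object can be recycled.
          return out->ParseFromString(bytes);
        }

        void ConsumeAll(const std::vector<std::string>& records) {
          MyMessage msg;  // one message, reused across iterations
          for (const std::string& bytes : records) {
            if (ParseRecord(bytes, &msg)) Process(msg);
          }
        }

        void ConsumeOnArena(const std::string& bytes) {
          google::protobuf::Arena arena;
          MyMessage* msg = google::protobuf::Arena::CreateMessage<MyMessage>(&arena);
          ParseRecord(bytes, msg);  // the allocation policy stayed with the caller
        }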

        1. 2

          You could allow the allocator and/or deleter to be overloaded via template parameters of the function that returns the unique_ptr. But at that point the function signature would be pretty noisy, and the caller would have to provide said allocators and deleters. I agree that output parameters are the way to go.
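
          Roughly what I had in mind, just to show how noisy it gets (MyMessage and MakeMessage are placeholders):

          #include <memory>
          #include <utility>

          // The caller has to supply both how to allocate and how to destroy
          // the message, which is what clutters the signature.
          template <typename Alloc, typename Deleter>
          std::unique_ptr<MyMessage, Deleter> MakeMessage(Alloc alloc, Deleter deleter) {
            return std::unique_ptr<MyMessage, Deleter>(alloc(), std::move(deleter));
          }

          void Example() {
            // Plain heap allocation with the default deleter.
            auto msg = MakeMessage([] { return new MyMessage(); },
                                   std::default_delete<MyMessage>{});
            // ... fill and return *msg ...
          }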

          1. 2

            Noisy, and you would lose the ability to compile them separately — template definitions have to be visible wherever they are instantiated, which in practice means keeping them in headers.

            http://foldoc.org/Separate+compilation

      1. 5

        Define your destructors outside the class declaration so they don’t get inlined by the compiler in both functions — the caller and the callee that returns the optional — to avoid binary size increase.

        As we recently found out, that’s a problem for Rust as well: https://github.com/rust-lang/rust/issues/88438

        1. 3

          Interesting. Both languages have similar challenges.

          One huge advantage of Rust is that the compiler can statically determine a destructor is not necessary after moving a value. That helps with the problem I’ve described in the article.

          1. 1

            AFAIK the code to statically determine whether a destructor needs to be run after a move in Rust is:

            return false;
            
        1. 8

          Single-file version control. I do have something I called fh, but it’s mostly a joke given its design constraints (and the performance characteristics that result from them). The only drive I have is my personal issues with the licensing of the existing solutions: SCCS (CDDL) and RCS (GPL, or OpenBSD’s version, which has some 4-clause BSD files); SRC is just a wrapper around the aforementioned ones. The part that scares me the most is definitely the diffing. I kind of do want to use interleaved deltas on the backend, but I’ve failed multiple times to even retrieve data from an SCCS weave, much less insert a diff.

          Liberally licensed elliptic curve cryptography over a binary field (e.g. NIST B-163). I just kind of feel like I should know this and occasionally I do run into one of those obscure binary field curves. However, I wholly lack the mathematics for it and never felt like actually sitting down for a week to learn discrete math and elliptic curves and whatnot. Library support is actually lacking (does anything other than OpenSSL even support these curves?) but OpenSSL is kind of the be-all-end-all of cryptography libraries to begin with.

          Self-hosted, open source base for DRM. Keygen and Qeys have some very attractive solutions for DRM that are simple to integrate. But I kind of sort of want my own and have it be open source so that people with a lot of paranoia won’t have to rely on third parties. The irony of open source DRM is not lost on me, no.

          Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

          1. 2

            Yet another typesetting system. TeX has great output and is painful to actually work with if you want to customize it; LaTeX exists for a reason after all. Troff has mediocre output (let’s not get into the nitty-gritty tradeoff that is choosing between groff and heirloom-troff here) and is somewhere between amenable and hell to actually work with. Why not both?

            (La)TeX is lovely and Knuth is brilliant, but his lack of PLT expertise shows. I’m exceedingly eager to see what would happen if a PL expert tackled typesetting languages. (Unfortunately, the story of TeX is itself a cautionary tale for any PLT PhD students thinking they might like to tackle the problem and finish their dissertation within a decade.)

            1. 4

              TeX is the way it is not because Knuth couldn’t design a better language, but because it was built at a time when computers couldn’t fit an AST in memory. The TeX compiler has to generate the output in a few sequential passes through the code.

              1. 2

                A lot of people also confuse LaTeX (Lamport) with TeX (Knuth).

                Knuth was not interested in creating a markup system but in producing a professional typesetting tool for the expert user, one that pushed the boundaries of what had been done before.

                It’s unfair to say he didn’t care about abstractions; rather, I think he chose the abstractions that served his goals and did not abstract where doing so was unhelpful or introduced performance penalties.

                For example, people complain about having to rerun LaTeX to get the page numbering right in the table of contents. IIRC, Knuth’s TeX documents just generate the TOC at the end, and he can then reorder the pages in a production step.

                One counterexample to the “no abstractions” argument would be Metafont.

              2. 2

                What is “PLT” and how does it apply here? I think the issue with TeX doesn’t have to do with “theory” – it has more to do with design, i.e. the “soft” parts of programming languages.

                It doesn’t really make sense to say Knuth lacks “PLT” expertise. First, he invented a lot of the theory we use in practice, like LR parsing.

                As another example, he also showed up in the LLVM source code for a minimum spanning tree algorithm:

                https://www.reddit.com/r/ProgrammingLanguages/comments/b22tw6/papers_and_algorithms_in_llvms_source_code/

                And: this might be a nitpick, but I’ve heard at least one programming language researcher say that “PLT” doesn’t mean anything outside of PLT Scheme. It seems to be an “Internet” word.

                So bottom line, I don’t disagree that TeX could be improved upon after several decades, but your analysis doesn’t get to the core of it. People who study theory don’t write the programming languages we use. Probably Haskell is the counterexample, but Haskell is also known for bad tooling like package management. On the other hand, Go is known for good, usable tools but not being “exciting” in terms of theory.

                I don’t think you need a Ph.D. anymore to write TeX. When Knuth wrote it, that may have been true, but the knowledge has been disseminated since then.

                1. 6

                  PLT is the initialism for Programming Language Theory. PLT Scheme got its name because it came out of the Rice PLT research group. PLT is simply the field of exploring the abstractions we use to program (and creating the theoretical tools to reason about those abstractions). While we very rarely directly use systems created by researchers (Haskell being a notable exception), the abstractions they develop absolutely shape the development of the programming languages we do use.

                  I’m not arguing that you need a Ph.D. to write TeX. I’m claiming that the task of developing the right abstractions for a typesetting programming language has largely been ignored by the group of people who study abstractions and programming languages (PLT researchers!).


                  Addenda: There’s a very distinct flair common to Knuth’s programming projects. His language design is primarily influenced by the details of the machines his programs run on and the algorithmics of executing them. For instance, the reason TeX documents may require multiple rounds of computation before they’re “done” is that Knuth baked the requirement that a TeX compiler be single-pass into TeX’s execution model. Conversely, Knuth wasn’t at all concerned about statically reasoning about TeX programs, as evidenced by the fact that merely parsing TeX is itself Turing-complete. (And TeX’s development happened during a really exciting period for PLT research: right around the discovery of Hindley-Milner type inference and hygienic macros!)

                  1. 6

                    Words aside, it’s silly to say that Knuth lacks expertise in “PLT”, since his work was foundational to the field.

                    Secondly, I don’t buy the claim that a lack of abstractions is what’s wrong with TeX – at least not without some examples or justification. Like anything that’s 30 years old, I think you could simply take the knowledge we have today and make a nicer version (assuming you have 5-10 years free :-/ ).

                    TeX is constrained by compatibility and a large user base, which I imagine explains most of its warts – much like the Unix shell, and Unix in general. And there’s also the problem that it’s a huge amount of work that you likely won’t get paid for.

                  2. 0

                    LR parsing is completely irrelevant to what PLT research is about.

                  3. 2

                    You should have a look at Scribble.

                    1. 1

                      Also SILE, which takes the good parts of TeX, rips them out with no remorse, and glues them together using Lua for ease of hacking. One important con is that it doesn’t have math typesetting yet… which is, fittingly for the theme of the thread, why I tried to add that support ;)

                  4. 2

                    I’ve always thought about open sourcing Keygen. But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment. However, I do think open source tooling for things like this is incredibly valuable. (Disclosure: I’m the founder of Keygen.)

                    1. 3

                      But I’m trying to grow it into a viable business, and I don’t think that would be wise at the moment.

                      I can’t help but be curious about the magic sauce (and run my own—despite not even selling any software), but oh well. Maybe someday.

                      Open sourcing seems unwise to me, too. Small-scale deployments would just grab the open source edition and run with it. Plus I don’t think it’s entirely unrealistic to expect the big cloud providers to take it and make it theirs.

                      (Possibly there’s a market for obfuscation that you could also work with, but requires very high expertise in very low level development on multiple platforms; Denuvo had decent success there until they didn’t.)

                      While I have your attention, maybe you’ll find these points interesting; they’re the things I would be doing differently:

                      • Likely a simple pure binary protocol (possibly with a much more lightweight cryptographic protocol than TLS) for the endpoints that clients require. That might be easier to handle in C/embedded platforms.
                      • Possibly drop the notion of a “user” and leave that to the API consumers’ discretion entirely. People need to stop rolling more authentication schemes out.
                      • Built-in license key checksum. I know your license schemes allow for extension with cryptography, but a minor amount of typo correction before making a possibly expensive network request could be helpful, depending on how things are set up (a sketch follows this list).
                      • Elliptic curves (Ed25519 or P-521, specifically) as alternatives or possibly the only signing option. Outright drop any padding scheme for RSA that isn’t PSS.
                      • Always sign and encrypt everything on top of TLS so that TLS isn’t your only line of defense against a cracker.
                      • Possibly consider encrypted certificates (X.509 or something of your own) as bundled information about a license. This could save some database lookups based on license ID and allow a “stateless” CDN for delivery—it only has to verify the certificate’s signature and its expiry time. They could also optionally embed (wrapped) encryption keys to wrap contents in, alleviating the catch-22 mentioned in the dist documentation: you could have regular licenses for a stub application that is used to request a certificate and download the real application with the encryption key contained therein.
                      • SHA256 or higher for any sort of file checksum.
                      • Expose a “simple” API that skips the notion of policy, which is instead managed on the backend server. This can be useful for very small deployments.
                      • Flexible elevated access tokens (deleting a product is such a dissimilarly dangerous operation in comparison to creating one or even just issuing new licenses).
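
                      A minimal sketch of the checksum idea from the license-key point above (the key layout and the choice of CRC-32 are just assumptions on my part):

                      #include <cstdint>
                      #include <cstdio>
                      #include <string>

                      // Plain CRC-32 (reflected, polynomial 0xEDB88320), computed bit by bit.
                      uint32_t Crc32(const std::string& data) {
                        uint32_t crc = 0xFFFFFFFFu;
                        for (unsigned char byte : data) {
                          crc ^= byte;
                          for (int i = 0; i < 8; ++i)
                            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
                        }
                        return ~crc;
                      }

                      // Assumed key layout: "<body>-<8 hex digits of CRC32(body)>".
                      bool LooksWellFormed(const std::string& key) {
                        const auto dash = key.rfind('-');
                        if (dash == std::string::npos || key.size() - dash - 1 != 8) return false;
                        char expected[9];
                        std::snprintf(expected, sizeof expected, "%08X",
                                      static_cast<unsigned>(Crc32(key.substr(0, dash))));
                        return key.compare(dash + 1, 8, expected) == 0;
                      }

                      If that check fails locally, the client can reject the key as a typo without ever making the network request.
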
                      1. 2

                        Thanks for the great feedback! I’m actually working on a lot of those, especially introducing ECC into the stack and calculating better file checksums (I’m essentially proxying S3 at the moment, which uses MD5 checksums, much to my dismay). And FWIW, user management is an optional feature of the API, so if it’s not needed, one can simply create user-less licenses and be done with it. Better access token privileges have also been on my list since day 1 — just haven’t gotten around to that one yet. But hopefully soon.

                  1. 1

                    Can anyone point out a good intro resource on query planners? The style of this was good (demystifying is an apt title), and something similar laying out how a DB engine would choose how to actually get here would be rad.

                    1. 2

                      Query planning is where there is more variability among databases. The Further Reading section has links to a survey by Graefe on query planning. The most commonly implemented query planners are variations of two frameworks called Volcano and Cascades. Both by Graefe. A basic introduction to Volcano will be very enlightening. I might write about it in the future when I learn more.

                      1. 1

                        Thanks for the search terms!

                    1. 2

                      First of all, great work. But I want people to keep in mind that the whole point of Yoga being written in C is for it to be used across many languages (Java, C#, Objective-C, Swift…) through an FFI. It’s a single codebase that’s fast and well maintained.

                      1. 7

                        Yes! This is a common way of working in high-performance systems (e.g. graphics programming). The plural case (many vertices, many entities…) is more common than the singular case.