Threads for haxor

  1. 1

    This is really cool! Sorry if this is an obvious question, but how would I start using this for an existing project that already has a config format? Would I use Hay to be my templating system for some other string config language like TOML? Can I generate a JSON config using it? How? Thanks!

    1. 1

      The idea I have is that the app would only have a JSON parser, and it wouldn’t need a config parser like TOML.

      By using Hay, you’re providing the user with some “programmability”, e.g. expressing config variants naturally. For another example, I tried to factor logic out of my GitHub Actions config, though it’s still quite repetitive as you can see:

      https://github.com/oilshell/oil/blob/master/.github/workflows/all-builds.yml

      So if GitHub Actions had used Hay, then my config wouldn’t be so repetitive and ugly.

      I also think this simplifies the app: a lot of platforms have to provide a lot of bells and whistles, when the programmability could just be done on the user config side.


      I should probably draw a diagram, it would look like

      [ Oil Config Process with Hay Evaluation ] –> JSON –> [ Application Process with JSON parser ]

      A nice thing is that TOML also serializes to JSON, so in theory you could provide the user with the option of both. (Though you would need to massage the JSON in the app – they wouldn’t output identical JSON.)


      Hay is “against” templating :-) You’re not supposed to use it with something like Go templates. Instead the “variation” is configured with Python- and JS-like types, not with text.

      I linked this post in another comment here: Why are we templating YAML, not generating JSON?

      So Hay follows the same philosophy. It is closer to Starlark/Bazel and BCL and further from YAML/Go templates.

      I wrote this page this morning: https://github.com/oilshell/oil/wiki/Survey-of-Config-Languages

      and updated the doc a bit.


      Let me know if you have questions and feel free to post to https://oilshell.zulipchat.com/ (e.g. #oil-help or elsewhere)

      What kind of app are you considering it for? As mentioned it’s still early and it needs feedback, but I will fix bugs. One thing Hay users may miss is that Oil still needs Python-like functions: https://github.com/oilshell/oil/issues/1112

      Also thanks for sponsoring! I’m interested in what sort of PL project you have brewing :-)

      1. 1

        Thanks for all the detail! I think what I’m trying to figure out is the migration path for an existing project or system to move to Oil/Hay from what they’re doing today with YAML, TOML, etc. Many systems may use JSON as an internal/low-level representation, but still present external interfaces through another level of abstraction (such as these config formats). For example, how often does the documentation only mention YAML and nothing else? So it seems like having support for a “wrong way” of templating could help people use Hay now instead of having to wait for their next greenfield project. I hope that makes sense!

        1. 1

          Hm I think if I saw how the existing app works I’d probably have a more concrete answer. What does the config look like?

          Concretely both sourcehut and Github Actions only use YAML as their user interface, and I can imagine them both using Hay. Whether they will do that is a different story of course :-P

          The way I would see a migration going is two different paths:

          1. Templates/YAML -> Expanded YAML -> App
          2. Hay/Oil -> JSON result of Hay Evaluation -> App

          So if people really want to use templates, they can, in a separate code path. But I would not use templates with Hay and Oil! It would get ugly and kind of defeat the purpose. That would be like autoconf generating shell or Python.

          It’s also true that Hay is not a superset of JSON; rather it evaluates to JSON. (YAML almost has that property; I read UCL does as well.)

          So Hay/Oil are really so you can use one language instead of two. It takes the place of both a data-only language like YAML/TOML/JSON, and a template language with logic (to express variants).

          Feel free to send me details here or on any other channel … it’s OK if it’s still early because Hay is still early too :)

      2. 1

        Also here is the more “executable” answer … But note this is the “Inline Hay” with no restrictions. I think platforms would want to use the “Separate file” with some restrictions, but I’m open to feedback on that.

        The difference is whether you can put shell commands in the middle of the config, which could be useful, but also breaks the “hermetic” property of configs.

        $ bin/oil -c 'hay define Rule; Rule foo { version = 42 }; json write (_hay())'
        {
          "source": null,
          "children": [
            {
              "type": "Rule",
              "args": [
                "foo"
              ],
              "children": [
        
              ],
              "attrs": {
                "version": 42
              }
            }
          ]
        }
        

        (this is just like the docs but I wrote it all on one line in the shell)

        And note you can’t generate arbitrary JSON; the output conforms to the schema in the doc. I guess this is because, say, a Go app will want to do that processing in Go, not necessarily in Oil. There could be ways to extend / relax this though, so again open to feedback.
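
        A minimal sketch of the application side of that pipeline, in Python rather than Go purely for brevity. It assumes only the JSON shape shown in the output above (`children`, `type`, `args`, `attrs`); the `collect_rules` helper and the choice to key rules by their first argument are hypothetical, not part of Hay:

```python
import json

# JSON text copied from the `json write (_hay())` output shown above.
HAY_JSON = """
{
  "source": null,
  "children": [
    {
      "type": "Rule",
      "args": ["foo"],
      "children": [],
      "attrs": {"version": 42}
    }
  ]
}
"""


def collect_rules(node):
    """Walk the tree and return {first arg: attrs} for every Rule node.

    Keying on the first argument is a choice made for this sketch only.
    """
    rules = {}
    for child in node.get("children", []):
        if child.get("type") == "Rule":
            rules[child["args"][0]] = child.get("attrs", {})
        # Recurse so that nested blocks are picked up as well.
        rules.update(collect_rules(child))
    return rules


tree = json.loads(HAY_JSON)
rules = collect_rules(tree)
print(rules)  # {'foo': {'version': 42}}
```

        In a real app the JSON text would come from a file or a pipe instead of a string literal; the point is that the app needs nothing beyond a JSON parser.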

      1. 4

        How fast or slow was the execution time of training etc? I’m curious!

        1. 5

          Thanks! The execution time for the prediction with the Lisp program was 4 seconds on QEMU and 2 minutes on the i8086 emulator Blinkenlights (https://justine.lol/blinkenlights/) on a 2.8 GHz Intel i7 CPU. On a 4.77 MHz IBM PC, I believe it should run about 590 times slower than QEMU, which works out to roughly 40 minutes. The training time on TensorFlow for 1000 epochs was 6.5 seconds on a 6GB GTX 1060 GPU. The memory usage for the Lisp program fits into 64 KiB, including the SectorLISP binary, the S-Expression stack for the entire Lisp program, and the additional stack used for evaluating the program, meaning that it should run in the boot process of the original hardware.
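
          The estimate can be checked with a few lines of arithmetic. The numbers are the ones quoted above; treating the 590x factor as the ratio of the two clock speeds is an assumption on my part, since the comment does not say where the factor comes from:

```python
# Figures quoted in the comment above.
qemu_seconds = 4        # prediction time under QEMU
slowdown_factor = 590   # estimated slowdown on a 4.77 MHz IBM PC

ibm_pc_seconds = qemu_seconds * slowdown_factor
ibm_pc_minutes = ibm_pc_seconds / 60
print(f"{ibm_pc_seconds} s = {ibm_pc_minutes:.1f} min")  # 2360 s = 39.3 min

# Assumption: the factor may simply be the clock ratio, 2.8 GHz / 4.77 MHz.
clock_ratio = 2.8e9 / 4.77e6
print(f"clock ratio = {clock_ratio:.0f}")  # clock ratio = 587
```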

        1. 3

          To make this point they would need to increase the table sizes enough so they would no longer fit within memory or disk on a single machine. I guess scale means different things to different people, but this is the definition I’d expect someone to mean in this context.

          1. 7

            I think this is a bad idea because it’s worse for beginners. It’s also self-serving to the Twisted community that has felt spurned for a long time. We’re optimizing for the future Python programmers who will far out number us, and “batteries included” is the way to do that.

            1. 3

              Why is it self serving to the Twisted community?

              1. 6

                Because they want asyncio removed from the standard library because it competes with Twisted. Some more context here: http://pyfound.blogspot.com/2019/05/amber-brown-batteries-included-but.html

                1. 3

                  Because they want asyncio removed from the standard library because it competes with Twisted.

                  I think that’s a very uncharitable reading of the situation. The suggestion is that much of the standard library become third-party on PyPI, not that the modules be eliminated altogether. And the reasoning is pretty sound: it’s an absolute pain to know that something was just added to the Python standard library but you can’t use it because it’ll be at least half a decade until enough users have migrated to a version of Python that has it. If the modules were on PyPI, you’d just tweak the version in your own dependency declaration and be able to rely on it same-day.

                  1. 1

                    Your comment doesn’t make sense to me. Can you clarify? Is “half a decade” a reference to Python 3? There were backports to 2 (https://pypi.org/project/trollius/). Yield from wasn’t in Python until 3.3, and there was a separate package for that version (https://pypi.org/project/asyncio/). What makes asyncio feel good is async and await which didn’t arrive until 3.5, and that’s syntax that you can’t install as a module.

                    We’re trying to optimize for new programmers, and the last thing they should be learning about is package management. It’s hard enough to just learn the language and become productive.

                    1. 1

                      A given Python release series (like 3.7.x) tends to have a support lifetime of around five years from the Python core team. And your installed base as an application developer is going to include a lot of corporate users who standardize on a version and ride it until upstream support ends (or even longer). Since you can’t count on people doing timely upgrades of their Python interpreter + stdlib, you can’t count on them being able to use anything that depends on more recent Python features.

                      We’re trying to optimize for new programmers, and the last thing they should be learning about is package management.

                      Unless you’re giving them a custom distribution of Python that also includes Flask, Django, the full numpy/scipy stack, Jupyter, ML/AI libraries and game toolkits in a single install, they’re going to have to learn how to type pip install at some point in order to be productive beyond the intro to the language.

                      And being a consumer of Python packages is not particularly difficult these days, because it really is just pip install. The variation in toolchains and techniques is almost all on the producer side of packaging, and is mostly invisible to people who just want to install and use unless we scare them off by inaccurately telling them there are dozens of competing options.

                      1. 2

                        I think you’re conflating corporate users of Python with beginners. The point here is that we’re trying to delay the need to install any extra Python packages for as long as possible to maximize proficiency before users have to learn a second tool (pip in this case). If the standard library is empty then beginners have two problems to confront in order to learn the language. It’s a really simple concept.

                        1. 1

                          Nobody’s saying there should be no standard library. Several people are suggesting a much smaller standard library that contains only the really essential things, and move the rest to PyPI.

                          You can still teach beginners how to write functions and classes and do things with just the language itself and zero import statements. And like I said, at some point if they want to write anything non-trivial either you need to be providing a bundled installer that has extra stuff in it, or they need to learn to run pip install.

                          1. 1

                            os, io, pdb, unittest, functools.wraps, collections, socket, … – all of these should be in PyPI? My advice would be to switch to JavaScript and npm if that’s what you want. With Python: “There should be one– and preferably only one –obvious way to do it.”

                            1. 2

                              I keep telling you what people are actually proposing. You keep refusing to engage with that and instead setting up straw men you can knock down. So I’m not going to spend any more time on this.

              2. 1

                Are there links on how you are going about this?

                1. 4

                  This post from one of the core devs covers it pretty well I think:

                  https://snarky.ca/why-python-3-exists/

                  1. 2

                    Thank you for the link. This did not cover anything about the standard library, but it was helpful to get the perspective on Python 3.

                    So expect Python 4 to not do anything more drastic than to remove maybe deprecated modules from the standard library.

                    When I got to this line it became clear to me that the Twisted community and Python core devs are talking past each other.

                    1. 2

                      There is now a draft PEP suggesting that Python deprecate and remove some of the “dead batteries” in the standard library. There’s no reason, in 2019, for every Python user on earth to devote bandwidth to download, and disk space to storing, copies of a module for working with the audio files from 1990s Sun workstations (to take one egregious example of something that’s in the standard library with no real right to be).

              1. 1

                I have the opposite advice:

                https://effectivepython.com/2015/02/12/accept-functions-for-simple-interfaces-instead-of-classes/

                It comes down to what best expresses the intent of the code. For stateful closures, __call__ is often the clearest approach.
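
                A small sketch of the two styles side by side, using a `defaultdict` hook that counts missing keys. The names `CountMissing` and `make_counting_default` are made up for illustration:

```python
from collections import defaultdict


class CountMissing:
    """Callable object: the state lives in a plainly named attribute."""

    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1
        return 0


def make_counting_default():
    """Closure version: the same state is hidden in a nonlocal variable."""
    added = 0

    def missing():
        nonlocal added
        added += 1
        return 0

    def get_added():
        return added

    return missing, get_added


increments = [("red", 5), ("blue", 17), ("orange", 9)]

# Class with __call__: the instance is passed wherever a function is expected.
counter = CountMissing()
totals = defaultdict(counter, {"green": 12, "blue": 3})
for key, amount in increments:
    totals[key] += amount

# Closure: works the same, but needs a second function to read the count.
missing, get_added = make_counting_default()
totals_closure = defaultdict(missing, {"green": 12, "blue": 3})
for key, amount in increments:
    totals_closure[key] += amount

print(counter.added, get_added())  # 2 2
```

                Both work; the class makes it obvious where the count lives and how to read it, which is the “intent” argument above.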

                1. 4

                  I’ve always called these Red Herrings:

                  https://en.m.wikipedia.org/wiki/Red_herring

                  1. 2

                    These are red herrings which, in addition to being wrong and a distraction, happen to lead you to the correct conclusion.

                  1. 3

                    How is this different than https://upspin.io/ and many similar things that have come before?

                    1. 15

                      “Please stop doing X” posts are a dime a dozen and rarely offer actionable advice more prescriptive than the simple “stop doing this thing.”

                      Let’s get more posts that say, “Before creating something new, here’s how to evaluate what exists for your use case.”

                      1. 2

                        Agreed. There’s no concrete information in this post, and a lot of uncited or made up narrative. And it’s condescending. I expect more from posts on Lobsters.

                        1. 2

                          Please stop writing new serialization protocols

                          also, why does the author feel the need to call people monkeys?

                      1. 3

                        I did a bit of work on the OStatus stack that Mastodon currently uses. There’s definitely room for improvement, but I think it’s better to get there through incremental changes to functionality and composing protocols. Having one all-encompassing spec locks you into a single set of use-cases, which hinders growth and adoption long-term.

                        1. 3

                          ActivityPub’s main design, as you know I think, was done by Evan Prodromou who did most of the design on OStatus. ActivityPub was written, with the initial design also by Evan, to try to overcome some of those limitations.

                          Meanwhile Mastodon did try to incrementally improve OStatus by adding extensions, but that upset people as well because they were deemed as incompatible with the rest of the fediverse (privacy isn’t easy to add-on after the fact in OStatus for one). Now that Mastodon is moving from OStatus to ActivityPub there’s complaints from much of that same group (not saying that encompasses you)… catch-22…

                          BTW, heya Brett! Remember a very naive young programmer helping with a command line frontend in Python briefly to one of your projects at the Goog back in the day with bgoudie and friends for like… a month? That was me. :) I’ve meant to catch up with you for unrelated reasons, mainly because of some exploration of actor model stuff since then… watch the video on: https://www.gnu.org/software/8sync/

                          1. 3

                            Meanwhile Mastodon did try to incrementally improve OStatus by adding extensions, but that upset people as well because they were deemed as incompatible with the rest of the fediverse (privacy isn’t easy to add-on after the fact in OStatus for one). Now that Mastodon is moving from OStatus to ActivityPub there’s complaints from much of that same group (not saying that encompasses you)… catch-22…

                            This is not true. Privacy on the level of AP would have been very easy to add, by just using a different salmon endpoint for private messages. This was discussed at length back then, but Mastodon still chose to implement the leaky-by-default changes. The complaints about the move to AP are because Mastodon breaks old OStatus functionality while doing it, but that’s a whole different topic.

                            1. 1

                              Maybe this is true, though I never saw a concrete proposal of how to do it or implementation efforts to show how it could be done? So it still seems theoretical to me. Do you have a link to where the proposed approach was laid out / outlined?

                            2. 3

                              Oh additionally, if you want a more minimal system that isn’t as “all in one” as ActivityPub is, Linked Data Notifications uses the same inbox endpoint and basic delivery mechanism that ActivityPub does, with a lot less of the social networking structure.

                              1. 2

                                Hey good to hear from you! Do you have a link to the part about “upset people as well because they were deemed as incompatible with the rest of the fediverse”? I’ve been out of the loop for a while but I’d be curious to see that.

                                1. 1

                                  It’s kind of hard to find a good summary, but this blogpost talks about it. Basically since there was no nice way to add privacy features to the existing distribution mechanisms, Mastodon kind of tacked it on and would advise the next server as to its privacy level. This led to complaints that Mastodon was implementing “advisory privacy” since you’d send what was theoretically a private post from a Mastodon server, but everyone on a GNU Social (that’s the new name for StatusNet) server would see it. It could be that there was a way to do it in OStatus, but it wasn’t really worked out.

                                  One major thing that ActivityPub added is email-style addressing… every post is delivered to an individual’s inbox. Of course, like in email, you’re trusting the receiving server to actually do the right thing (and thus you could accuse this of being “advisory privacy” as well, but anything that isn’t end to end encryption can be accused of that), but I don’t get other people’s emails in my inbox because the addressing is baked in to the standard so it’s expected that all servers implement that.
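
                                  A rough sketch of what that email-style addressing looks like on the sending side. The activity uses the ActivityStreams `to`/`cc` fields; the actor URLs, the `INBOXES` table, and both helper functions are hypothetical, and a real server would find each inbox by fetching the recipient’s actor document and would then POST the activity there:

```python
# Hypothetical actor -> inbox table standing in for actor-document lookups.
INBOXES = {
    "https://social.example/users/alice": "https://social.example/users/alice/inbox",
    "https://other.example/users/bob": "https://other.example/users/bob/inbox",
}


def make_note_activity(actor, recipients, content):
    """Build a Create activity wrapping a Note, addressed like an email."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "to": list(recipients),
        "object": {
            "type": "Note",
            "attributedTo": actor,
            "to": list(recipients),
            "content": content,
        },
    }


def delivery_targets(activity, inboxes):
    """Return the inboxes of the addressed recipients, without duplicates."""
    targets = []
    for field in ("to", "cc"):
        for recipient in activity.get(field, []):
            inbox = inboxes.get(recipient)
            if inbox and inbox not in targets:
                targets.append(inbox)
    return targets


activity = make_note_activity(
    "https://social.example/users/alice",
    ["https://other.example/users/bob"],
    "A post only Bob is addressed on",
)
targets = delivery_targets(activity, INBOXES)
print(targets)  # ['https://other.example/users/bob/inbox']
```

                                  Only the addressed inbox receives the post; whether the receiving server then honors that is the trust question discussed above.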

                              2. 0

                                Yeah. OStatus is very well done, a nice unity of existing technologies that have been proven to actually work.

                              1. 3

                                Link to the actual product: http://origami.design/

                                1. 1

                                  I can suggest a new title, but I can’t suggest the better link, unfortunately. :/

                                1. 2

                                  Welcome to Lobsters, Alex!

                                  1. 1

                                    @haxor surely you can find a newer than 2012 MacBook that meets your requirements?

                                    1. 1

                                      Oh yeah! I will probably get a new computer at some point. It’s still working well and has all the security updates so I’m not in a rush. I figure I’ll wait until WWDC in June to see if they refresh the MacBooks then.

                                      1. 3

                                        You’re probably aware, but Macrumors has a buyer’s guide that’s always seemed pretty accurate to me. I used it when I purchased an iMac as our family computer (but otherwise, I don’t personally buy Macs).

                                        1. 1

                                          Did you check the 12" macbook? It is really good.

                                        1. 2

                                          Given that it’s an extension of asm.js and JavaScript, that seems like the right tag for now? Or maybe just the web tag?

                                          1. 3

                                            The part that struck me most about this post was how questionable it was for it to be written in the first place. The author publishes details of Parse that were seemingly never made public before, without explicit permission, in hopes of showing how his company does it better, and doesn’t make the same mistakes.

                                            1. 1

                                              Yeah the information is dumped in a messy way onto us, then at the end the author’s care went into an ad for their own thing. This reads exactly like a list of “stuff they messed up but we do right”. Which I guess isn’t a bad thing in an objective sense. Still pissed me off that it’s an ad. But I commend the effort to get attention for your own business.

                                              1. 1

                                                Indeed. It makes me question the business model more than anything.

                                              1. 1

                                                My favorite version of this type of post is “Stop Writing Classes” from PyCon 2012.

                                                1. 2

                                                  This was first released in 2013

                                                  1. 2

                                                    Does anyone know if they tried PyPy? Is cache performance or total memory size their real constraint?

                                                    1. 2

                                                      PyPy needs to use the same memory structures as CPython. I would expect that to mean that it also needs to touch memory during collection, but I didn’t quite understand from the article why CPython did that. Because it added the objects to Python lists and had to increase their refcount? Sounds like something both Pythons should be able to avoid.

                                                      1. 9

                                                        CPython’s garbage-collector is reference-count based, which means that when you free a container (like a list), you then have to go through and decrement the reference count of everything that was contained (and if it hits zero, free it and decrement the reference counts of everything it held onto). This is particularly egregious in Python because the language is built on hashmaps - every object, every class, every module has (or sometimes is) a hashmap, so there’s a lot of ref-counting that needs to be done.

                                                        This is also why CPython has a GIL: when any part of the program can reach into any other part at any time and increment or decrement its reference count, then either you need a lock around absolutely everything, or just say “to hell with it” and make one big lock for the whole thing.

                                                        PyPy does not have a GIL, and it does not use reference-count-based garbage collection, so on the face of it, it seems like it would be a lot more CoW-friendly. The docs also say that PyPy does not do a gc.collect() at shutdown, which is another change mentioned in the article.
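
                                                        The reference-counting behavior described above for CPython can be observed directly. This is a CPython-specific sketch (`sys.getrefcount` and immediate `__del__` calls are not guaranteed on other implementations), and the `Tracked` class is made up for the demonstration:

```python
import sys

freed = []


class Tracked:
    """Records its name when the object is finally freed."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        freed.append(self.name)


shared = Tracked("shared")
before = sys.getrefcount(shared)

# The list holds one reference to `shared` and the only references to a and b.
container = [shared, Tracked("a"), Tracked("b")]
inside = sys.getrefcount(shared)

# Freeing the list decrements the count of every element it held.
del container
after = sys.getrefcount(shared)

print(inside - before)  # 1: the list added one reference to `shared`
print(after == before)  # True: freeing the list removed it again
print(sorted(freed))    # ['a', 'b']: their counts hit zero, so they were freed
```

                                                        Every one of those decrements writes to the object’s header, which is why freeing a large container touches a lot of memory.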

                                                        1. 4

                                                          PyPy does not have a GIL

                                                          According to their FAQ, they still have a GIL. They were looking into replacing the GIL with STM a while ago, but I don’t recall that ever landing.

                                                          1. 1

                                                            Ah, my mistake. I knew they did not use reference-count-based garbage collection, and I knew that was the big sticking point for the CPython GIL, so I leapt to a conclusion.

                                                            1. 1

                                                              The GIL is necessary for compatibility with CPython extensions. Other implementations have removed their GIL because they live in other contexts, but PyPy wants to retain that compatibility and be a CPython drop-in replacement.

                                                              I’m surprised they were able to remove the reference-counting, but I don’t know enough about Python extensions to know why, maybe they indicate ownership of objects in some other way? Or does PyPy not use refcounting itself, but supports it alongside gc, to support extensions? Or do some extensions just not work with PyPy even though most do?

                                                    1. 1

                                                      Why doesn’t he mention Golang in comparison? He mentions C# but stops there. Go is a similar vintage to Rust and has similar goals, but it takes a different approach to safety. Meanwhile, Graydon’s post (linked in another Lobsters thread) acknowledges that there are parallel attempts to do this:

                                                      A few valiant attempts at bringing GC into systems programming – Modula-3, Eiffel, Sather, D, Go – have typically cut themselves off from too many tasks due to tracing GC overhead and runtime-system incompatibility, and still failed to provide a safe concurrency model.

                                                      I think you can read into this statement a lot, especially with all of the improvements to Go’s GC that are in version 1.8. It’s great to have two unique approaches to safe concurrency in systems languages (Rust and Go). It’s too early to tell which one will achieve their goals the best.

                                                      1. 2

                                                        To be clear, Rust claims to eliminate data races in safe code at compile time. Go does not. The specific guarantees provided by both memory safe languages differ when it comes to the standard implementations. You can read more details about Go specifically from rsc.

                                                        With that said, in practice, Go provides some nice runtime protections against data races. Its race detector for example is awesome, and since Go 1.6, I believe data races on hash maps will always panic.

                                                        1. 1

                                                          Understood that there are different tradeoffs (zero overhead, etc). I’m talking about “what problems is this language good at solving”, and I think they’re very close given the goal of safety in concurrency.

                                                      1. 14

                                                        I committed myself to daily practice a couple years back. I decided to not break my github streak. Every day I do at least 20 minutes of katas, books, or building something outside my comfort zone. It’s been revolutionary. I’ve worked through SICP, Programming Languages, Let Over Lambda, Land of Lisp, Algorithm Design Manual, Clojure Programming, a good chunk of Data Scientist’s Toolbox, Learn You A Haskell, F# 3.0, Real World Haskell, most of @haskellbook, and I’m currently working through Software Foundations in Coq. I built, deployed, and support a school attendance suite in Clojure. I’ve done countless katas, TDD practice sessions, and little helper programs.

                                                        All in all, deciding to put in 20 minutes a day and tracking it on GH has completely and drastically changed my skill in just four years. I believe if you set yourself the goal of writing something, anything for 20 minutes a day, you will find plenty to keep you busy and interested. You will see your skill rapidly improve. You will get bored of things in your current comfort zone, and so you’ll have to learn new things to stay focused. I can’t recommend it enough. I just crossed 1355 days of streak, and I intend to do it as long as I write software for money.

                                                        1. 7

                                                          I agree with the “20” minutes. But I disagree with covering such broad subject matter, and so shallowly (perhaps he meant something different).

                                                          I think you’ll learn a lot more by doing things end to end, and start to finish. Figure out a small idea for a program you want to create. It should be highly relevant to you or someone you care about. Build the whole thing, including the UI, the backend, the database, tests, deployment, monitoring, etc etc. Use it. Polish it and make it robust. Don’t stop putting time into it until it’s actually done.

                                                          I think perseverance is what makes expert programmers so unique. They can hit problems they’ve never seen and push through without getting discouraged. The only way to learn that is by finishing whole projects, not starting new ones.

                                                          1. 7

                                                            I’d say we are in agreement then! I finished almost every book and course, and the attendance site I built from scratch is still in use to this day. However, I disagree that fully completing SICP and Let Over Lambda counts as shallowly covering Lisp and FP. Completing all 350+ exercises in SICP was one of the hardest things I’ve done and stretched my skill immensely.

                                                            I’m actually curious, purely because I want inspiration, if what I do is a shallow study, what does your personal study of it look like? I left out a lot of the books I’ve read that don’t have homework (so I don’t count them in my 20 minutes a day) like Domain Driven Design, Clean Code, Code Complete, Pragmatic Programmer, Working Effectively with Legacy Code, Growing Object-Oriented Software Guided by Tests, Death March, Art of Agile Development, Planning Extreme Programming, Extreme Programming Explained, Design Patterns, Implementing Domain Driven Design, Patterns of Enterprise Architecture, Refactoring, Peopleware, Managing Humans, and Becoming A Technical Leader. Those are all fine, but can’t really be practiced so I often don’t recommend them unless the person has mastered more tactical skills.

                                                            I do agree that perseverance is the biggest thing. When stuck and hopeless, I am at my worst when I give up, and at my best when I take a deep breath and look for alternate solutions.

                                                            1. 2

                                                              I want inspiration, if what I do is a shallow study, what does your personal study of it look like?

                                                              What I would suggest is trying to find people who share your passion but have non overlapping skills to review your code; and do the same for them. You want to try to constantly refine and improve the code so it is better than what any of you could do alone.

                                                          2. 3

                                                            How do you balance reading a chunk of e.g. SICP and then coding, presumably something relevant, in 20 minutes? How much would you typically read? Does this only work because you do it every day? Or is it because the texts come with relevant exercises, so you don’t have to come up with your own projects?

                                                            1. 4

                                                              Those books all have exercises along with the text. I usually will read for a few minutes, then write some samples to see what’s up with what I just read, then start on the exercises. If the exercise is hard, I’ll have a few sessions of writing some unit test cases, playing with the api, and then finishing the exercise. At first each time I sat down I’d spend 10 minutes just remembering where I was, but after a year or so I got pretty good at just “jumping in” and can sit down and be typing/reading in under 30 seconds.

                                                            2. 2

                                                              What did you think of Data Scientist’s Toolbox? I’m curious how resources there feel. Did you have an applied math background going in?

                                                              1. 2

                                                                I had effectively zero math going in, but a lot of various programming languages. It was fine, I enjoyed it. I think I’d want to do it a second time without the deadlines, those really bum me out and demotivate me.

                                                              2. 2

                                                                The book Software Foundations came up when people were talking about learning to do stuff like seL4. I noticed it’s Coq where seL4 and many good projects are Isabelle/HOL. Curious if you or anyone else can tell me if Software Foundations' material also covers most of the knowledge or skill needed for HOL, too, perhaps combined with a HOL-specific tutorial or articles. Or is it and Coq’s approach different enough to need a whole different book?