Threads for nathants

  1. 2

    Similar in concept, but with zero dependencies and designed for confidentiality: https://github.com/richfelker/bakelite

    1. 1

      this is very cool! rss subscribed.

      what url source did inlined crypto dependencies come from?

    1. 1

      This smells similar to bup, but a bit more custom?

      1. 1

        there are a lot of good backup solutions. while doing prior art research, i disliked the broad scope and complex implementations of existing solutions.

        i wanted total confidence in my ability to reason about the data structures and machinery of backup.

        as long as one has confidence that their backups exist and are recoverable, any solution is likely fine.

      1. 9

        Balancing simplicity vs features is quite a challenge - in many ways I like spartan simplicity, but I think pruning is a useful feature. If you want something that supports pruning, access controls, and asymmetric keys, see my similar tool bupstash:

        https://github.com/andrewchambers/bupstash

        It is very close to 1.0 and has been quite stable for the past year or so.

        1. 2

          this is very cool! i agree that balance is hard. bugs in a backup system are typically discovered when data is lost.

        1. 1

          Cool! I want to know more about the tarball chunks; as it is, I can’t gauge how hard it is to prune old backups.

          1. 1

            thanks! after a data loss event, i swore: never again.

            backups are meant to be immutable. the hash of the tarball is committed to the index.

            not sure what your pruning use case is, but i’m sure the design could be changed to accommodate it.

            you could edit existing tarballs, deleting content you don’t want to keep, then mutate it in object storage and update the index.

            however, it is not a goal of this at present.
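
            a sketch of that rewrite-and-reindex idea, assuming plain tar files and blake2b as the index hash (hypothetical helper, not this tool's actual code):

```python
import hashlib
import io
import tarfile

def prune_tarball(tar_bytes, drop):
    """Rewrite a tarball, omitting members named in `drop`; return the
    new bytes plus their blake2b hex digest to commit to the index."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as src, \
         tarfile.open(fileobj=out, mode="w") as dst:
        for member in src.getmembers():
            if member.name in drop:
                continue  # pruned content is simply not copied forward
            data = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, data)
    new_bytes = out.getvalue()
    return new_bytes, hashlib.blake2b(new_bytes).hexdigest()
```

            the caller would then overwrite the object in storage and commit the new hash to the index, which is exactly the mutation the current design avoids.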

            1. 2

              Think about the space consumed; will it grow without bound? Will restoring a system get slower as more backups are referenced?

              You could follow something like Time Machine defaults, or keep N hourly backups, X daily backups, Y weekly backups, Z monthly backups, and M annual backups. A common name for this is grandfather-father-son.

              Do you want to hold hourly or daily backups for, say, five years? (I don’t.) Tarsnap is notably faster when working with less data, so I don’t keep daily backups forever.
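
              A grandfather-father-son selection can be sketched like this (a rough illustration; the bucket sizes are arbitrary defaults, not Time Machine's or tarsnap's):

```python
def gfs_keep(timestamps, daily=7, weekly=4, monthly=12):
    """Select which backup timestamps to keep: the newest backup per day
    for the last `daily` days, per ISO week for `weekly` weeks, and per
    month for `monthly` months. Everything else is a pruning candidate."""
    keep = set()
    for key, limit in (
        (lambda t: t.date(), daily),               # one bucket per day
        (lambda t: t.isocalendar()[:2], weekly),   # one bucket per ISO week
        (lambda t: (t.year, t.month), monthly),    # one bucket per month
    ):
        newest_in_bucket = {}
        for t in sorted(timestamps, reverse=True):      # newest first
            newest_in_bucket.setdefault(key(t), t)      # keep newest per bucket
        for t in list(newest_in_bucket.values())[:limit]:
            keep.add(t)
    return keep
```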

              1. 2

                Think about the space consumed; will it grow without bound? Will restoring a system get slower as more backups are referenced?

                yes. storage will grow without bound. one easy strategy would be to periodically start a new backup, and destroy all storage associated with an old backup. this would destroy all history and reset state to the current local state.

                You could follow something like Time Machine defaults, or keep N hourly backups, X daily backups, Y weekly backups, Z monthly backups, and M annual backups. A common name for this is grandfather-father-son.

                true, and that’s a good model. what i wanted was something more like git. i don’t expect git to randomly prune old commits or destroy random blobs from history. i expect git to preserve exactly what i committed.

                Do you want to hold hourly or daily backups for, say, five years? (I don’t.) Tarsnap is notably faster when working with less data, so I don’t keep daily backups forever.

                i am not familiar with the internals of tarsnap. here i am creating a single text file containing the metadata of the filesystem. this file is versioned in git. a new backup does a linear scan over the complete modification history of this file via git log -p index. like storage used, this file grows without bound. this linear scan will eventually become annoyingly slow. i assume tarsnap slowdowns have a similar cause.

                when that happens, i will create a new backup, but not destroy the old one. this new backup will not be able to deduplicate against the old backup, and so it will copy to storage any files in current local state that already exist in the previous backup. some storage is wasted, but the index history is truncated. nothing is lost, since the old backup is still accessible.

                my backup index is currently:

                >> wc -l index
                
                101965 index
                

                my backup index revision history is currently:

                >> time git log -p index | wc -l
                
                518572
                
                real    0m12.136s
                user    0m11.745s
                sys     0m0.465s
                

                backup-add currently looks like this:

                >> time backup-add
                
                scanned: 101971 files
                new data size: 0.89 mb
                
                real    1m9.056s
                user    0m57.498s
                sys     0m11.347s
                
                >> df -h
                Filesystem           Size  Used Avail Use% Mounted on
                zroot3/data/home     194G   74G  121G  38% /home
                

                once scanning index history is slower than blake2b scanning the filesystem, i will probably start a new backup.
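
                for reference, a blake2b scan of the filesystem is a few lines of stdlib python (a sketch; the chunk size is arbitrary and error handling is omitted):

```python
import hashlib
import os

def scan(root):
    """Walk `root` and yield (path, blake2b hex digest) for every regular
    file: the linear pass a backup tool does to detect new data."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h = hashlib.blake2b()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            yield path, h.hexdigest()
```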

                remote storage for the 96 backups i’ve made in the current backup looks like this:

                >> aws s3 ls $bucket/$backup/git/ | py 'sum([int(x.split()[2]) for x in i.splitlines()])' | humanize
                68 MB
                
                >> aws s3 ls $bucket/$backup/tar/ | py 'sum([int(x.split()[2]) for x in i.splitlines()])' | humanize
                4.3 GB
                
                >> aws s3 ls $bucket/$backup/tar/ | wc -l
                96
                
                
          1. 5

            It’s also worth noting that VPC Endpoints to AWS Services keep routing internal to AWS, providing latency improvements (especially for p99 latency).

            1. 1

              last time i tried i saw worse performance with vpc endpoints. haven’t tried recently.

            1. 9

              or switch to internet gateways and go zero trust.

              1. 3

                That’s probably the best course of action. You’ll want identity based authentication/authorization soon enough anyways.

                1. 1

                  Wouldn’t that increase bandwidth charges, among other things?

                  1. 1

                    no. https://aws.amazon.com/vpc/faqs/#Billing

                    Data transfer charges are not incurred when accessing Amazon Web Services, such as Amazon S3, via your VPC’s Internet gateway.

                1. 3

                  deno and fly are fantastic, but lambda can also be exceptionally easy. the ideal lambda zip contains two files:

                  • main (go binary, or other pl single binary)
                  • index.html.gz (webapp with inlined js)
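
                  packing that two-file zip needs nothing beyond the stdlib. a sketch (hypothetical helper, not libaws's actual packaging code; the executable bit on main matters for lambda):

```python
import gzip
import zipfile

def build_lambda_zip(zip_path, binary_bytes, html_bytes):
    """Write a minimal lambda deployment zip: a single executable `main`
    plus a pre-gzipped `index.html.gz` the handler can serve directly."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as z:
        info = zipfile.ZipInfo("main")
        info.external_attr = 0o755 << 16  # mark the binary executable
        z.writestr(info, binary_bytes)
        z.writestr("index.html.gz", gzip.compress(html_bytes))
```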

                  example: https://github.com/nathants/aws-gocljs

                  automation: https://github.com/nathants/libaws

                  container lambdas, or containers on fly, are also fine, just make them small. cold starts are way worse with massive containers.

                  fly and deno have some very cool advantages being at edge. their egress pricing is identical to aws though.

                  cloudflare workers + r2 is an interesting target, free egress!

                  the nice thing about using aws is you get the rest of it: ec2, s3, sqs. moving to edge is a fair trade to give up those things, depending.

                  personally i’m sticking with aws and moving egress heavy components to cloudflare. keeping a close eye on deno and fly, they are just so cool!

                  1. 1

                    Inlined JS probably means you need a lax CSP which isn’t a great idea

                    1. 2

                      thanks for bringing up csp!

                      i’ve updated the build so inlined js gets hashed and added to csp. i should have been doing this all along. smh.

                      https://github.com/nathants/aws-gocljs/commit/125196626dbb237c180d4d587bf51261d9a75746

                      https://gocljs.nathants.com/
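
                      for reference, the csp source expression for an inline script is just base64(sha256(script body)). a sketch of how a build step might compute it (illustrative, not the actual code from the commit above):

```python
import base64
import hashlib

def csp_script_hash(script_source):
    """Return the CSP source expression for one inline <script> body:
    base64(sha256(script text)), the same value the browser computes."""
    digest = hashlib.sha256(script_source.encode()).digest()
    return "'sha256-" + base64.b64encode(digest).decode() + "'"

# a page with that inline script can then be served with a header like:
#   Content-Security-Policy: script-src 'sha256-...'
```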

                      1. 1

                        it probably depends on the threat model.

                        the model i’m most interested in is where the infrastructure is untrusted. the client manages all secrets. this means a different provider needs to serve the html with inlined js, as it would be trivial for the first provider to change it and own the client.

                        then we need to hash the html with inlined js and verify that easily somehow, ideally saving the hash offline and updating it purposefully.

                        would csp be of benefit in this model? i’m honestly not sure.

                      2. 1

                        When Lambda was new I shied away due to the pricing. We had a workload that was not so easily predictable, and it wasn’t some core thing that directly benefited from lambda, so in the end we ran with a normal VM and 4 processes on it. Quick enough, and the 10 or 20 bucks per month were negligible (it was a workload that was potentially running 24/7 with breaks).

                        Ever since it’s been a bit of a solution in search of a problem - especially with state. (Do I really want to connect to a DB from lambda?) But yeah, these things I mentioned would probably run just as well on Lambda - the IP tool more so. Rare, quick requests. On the other hand if it’s on the internet someone might find it’s a cool idea to hit it once per second on hundreds of machines and then I have my weird AWS bill.

                        1. 1

                          my favorite use cases for lambda are:

                          • scale to zero for low/sporadic traffic services. their cost approaches zero.

                          • cron scheduled tasks managing ec2 and other aws infra. lambda will always be the most reliable part of your stack. things like autoscaling groups and cloudformation pale in comparison to lambda with an aws sdk.

                      1. 2

                        tests should be named fast or slow, with maybe a bit of context.

                        1. 1

                          I happen to work on a codebase that uses pytest marks for this, because slow tests get run less often and pytest marks are how the system knows which tests count as slow. In a different codebase this might not be relevant at all, or perhaps some other categorization scheme might make sense. This doesn’t affect the names of test functions at all.

                          1. 1

                            sounds like a good setup!

                        1. 1

                          validating yaml, or any input data, is likely a good idea. does the validation have to be based on a human readable schema though?

                          i’ve started manually validating yaml the hard way, and writing human readable but otherwise useless schemas. it feels like a reasonable approach.

                          schema:

                          https://github.com/nathants/libaws#infrayaml

                          manual validation the hard way:

                          https://github.com/nathants/libaws/blob/26fe86c65bbb0ea63ca7d6728836a146964f7c10/lib/infra.go#L2492
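
                          “the hard way” here means walking the parsed structure with explicit checks and explicit error messages. a toy version of the style (hypothetical keys, not libaws's real infra.yaml schema):

```python
def validate(conf):
    """Validate a parsed config dict the hard way: explicit checks,
    explicit errors, no schema engine. Returns a list of error strings."""
    errors = []
    if not isinstance(conf, dict):
        return ["config: expected a mapping"]
    for key in conf:
        if key not in ("name", "timeout", "env"):
            errors.append("config: unknown key: %s" % key)
    if not isinstance(conf.get("name"), str):
        errors.append("config: name: expected a string")
    if "timeout" in conf and not isinstance(conf["timeout"], int):
        errors.append("config: timeout: expected an integer")
    env = conf.get("env", {})
    if not isinstance(env, dict):
        errors.append("config: env: expected a mapping")
    else:
        for k, v in env.items():
            if not isinstance(v, str):
                errors.append("config: env: %s: expected a string" % k)
    return errors
```

                          every error names the exact path that failed, which is the payoff of doing it manually.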

                          1. 3

                            Welcome to lobsters! Generally we request that authors also contribute articles from other people and contribute to the community in general, in addition to their own work.

                            Also, I think 7 days is too short for reposting. In the past there have been major software projects posted at a monthly or every-other-month rate, and that was also considered too much for people here.

                            1. 3

                              good to know! my apologies. it won’t happen again.

                            1. 2

                              attempting to restore sleep schedule sanity. sleep metrics for this week were not great.

                              1. 2

                                imagine life without emacs buffers?

                                1. 3

                                  No thanks, I live it daily when using VSCode instead of emacs. Most on my team are sufficiently comfortable with VSCode, so I use it professionally, but I always keep emacs around for “serious work”, whatever that means.

                                1. 1

                                  the people have spoken. this project is now known as aws-exec.

                                  1. 2

                                    some call it RPC, some RCE

                                    1. 1

                                      true. in this case the name of the procedure call is bash.

                                    1. 22

                                      can confirm the grinding frustration will continue. the best way to mitigate it is to iterate as fast as you can. take a zero tolerance policy to slow modify-compile-execute-reflect loops.

                                      it doesn’t matter if it’s a binary on your desktop or some rube goldbergian enterprise fizzbuzz on cloud.

                                      if you can iterate in 1s, it’s less frustrating. way less. it can even become fun.

                                      all of my github projects are about faster iteration, so i can be less frustrated. the loop shrinking will continue until morale improves. this is the way.

                                      faster loop on aws: https://github.com/nathants/libaws

                                      faster loop on webdev: https://github.com/nathants/new-gocljs

                                      faster loop on services: https://github.com/nathants/aws-rce

                                      faster loop on browser testing: https://github.com/nathants/py-webengine

                                      faster loop on react: https://github.com/nathants/runclj

                                      faster loop on docker containers: https://github.com/nathants/docker-trace

                                      faster loop on distributed data processing: https://github.com/nathants/s4

                                      1. 1

                                        This is a great philosophy, thanks for sharing your projects

                                        1. 2

                                          of course! life is too short for needless frustration.

                                      1. 1

                                        “It should be easy”, and then sweeps it under the rug! Nice try but I believe it doesn’t address the core problem 🙂

                                        1. 1

                                          you didn’t enjoy the gif then?

                                          1. 1

                                            I did! But no project will be able to address the core problem…! Fullstack web is inherently complex today and for the foreseeable future. It has nothing to do with your skills or your project. I think you’d get better reception if you worded this article as “Opinionated webdev tool to ease devX with Clojure, Go and AWS”.

                                            1. 1

                                              i’m glad! you’re not wrong. the complexity isn’t going anywhere, but the user doesn’t have to care about it or be an expert in it. that should be an option, not a requirement.

                                              by doing simpler things that are harder to screw up, it can be made easier.

                                              i am confident the typical developer can accomplish the following:

                                              • git clone
                                              • bash bin/ensure.sh
                                              • bash bin/dev.sh # leave this running

                                              i have a related project, libaws. it is used here to handle aws infrastructure.

                                              its original title was:

                                              • opinionated tooling, with a minimal interface, targeting a subset of aws.

                                              its second title was:

                                              • a simpler infrastructure as code specification.

                                              the final title is:

                                              • aws should be easy
                                        1. 3

                                          As a fullstack web developer, this looks like a nightmare to deal with. Go for the back end, Cljs and React for the frontend, and bash for logging/devops? That’s an absurd amount of context switching, when you could pretty much do everything in JavaScript.

                                          1. 2

                                            @vonadz i see from some of your other comments that you are using clojure already. you should try reagent! it is so simple, hasn’t changed in a decade, and is THE reason to use clojurescript. shadow-cljs is relatively recent, and makes interop with the npm ecosystem easy.

                                            1. https://reagent-project.github.io/

                                            2. https://github.com/thheller/shadow-cljs

                                            1. 1

                                              Yeah, I am! I have tried it, but found it hard to justify investing a lot of time into it since I’m already proficient in Svelte and found it to be a better dev experience overall. I’ve built a couple of personal web apps that use Svelte as the front end with a Clojure API as the back end.

                                              1. 1

                                                svelte plus clojure sounds like a good time!

                                                 what clicked for me with reagent was the state model. a single, global map holds all state in an atom. any component that derefs the atom gets updated when the atom changes. events mutate the atom with swap. everything is stateless, except the atom.
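
                                                 the pattern is language-agnostic. a tiny python analogue of the single-atom model (illustrative only, nothing like reagent's actual implementation):

```python
class Atom:
    """A single global state holder: components read with deref, events
    write with swap, and every watcher reruns when the value changes."""
    def __init__(self, value):
        self._value = value
        self._watchers = []

    def deref(self):
        return self._value

    def add_watch(self, fn):
        self._watchers.append(fn)

    def swap(self, fn, *args):
        self._value = fn(self._value, *args)
        for watch in self._watchers:
            watch(self._value)  # "rerender" anything that derefs the atom

# one global map holds all state
state = Atom({"count": 0})
```

                                                 a component would deref inside its render function; an event handler would swap.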

                                                1. 1

                                                  I felt like reagent and react just have too much abstraction. It’s too far away from vanilla javascript. That’s why I liked Svelte, because it felt closer to what was actually happening.

                                            2. 1

                                              you definitely could do everything in javascript, and as long as it’s easy and fun, you should! for me, it isn’t, and the first problem i run into with fullstack js is client side state management.

                                               go on the backend is just so nice, it’s hard not to. if you’re not gonna use one lang top to bottom, you have some freedom to choose tools that fit the problem space.

                                              this is the setup i’ve landed on. it works for me. a lot of this is subjective, and in the end tooling choice isn’t important.

                                              fun, fast, easy sdlc is important. it’s the most important thing. it will either amplify or hinder your efforts.

                                              1. 1

                                                Yeah, sorry. I didn’t mean to yuck your yum. I was just giving my perspective on it as small startup founder, albeit in a fairly nonconstructive manner.

                                                1. 1

                                                  all good all good, was happy to see your comment! sounds like you already have fast/easy/fun sdlc. this is good.

                                                  1. 1

                                                    <3 keep rocking on

                                              2. 1

                                                Four languages in one framework is too much? This sounds like a very small stack to me.

                                                1. 2

                                                  let’s not start dropping f-bombs, we only use libraries here friend ;-)

                                                  1. 1

                                                    Was replying to the commenter who mentioned React

                                                    1. 2

                                                      sorry if the joke didn’t come through. library and framework are synonymous in my mind. apparently i need more coffee before the humor centers of my brain come alive.

                                                  2. 1

                                                    Yeah should’ve clarified that my experience is limited to starting my own companies and doing most of the stuff myself or in small teams of no more than 5.

                                                  3. 1

                                                    From a paper at ISCA a few years ago, both Chrome and Firefox are written in about 30 different languages. Different languages are tuned for solving different categories of problem. Limiting yourself to a single language for a project means that you’re making a compromise somewhere.

                                                    1. 1

                                                      it would be nice though. if only go could do frontend. if only reagent existed for not clojurescript.

                                                      1. 1

                                                        Engineering is all about compromising. In my case, the perspective is from a small startup founder who does most things themselves and prioritizes getting things done in many different domains quickly.

                                                    1. 9

                                                      there is only 1 hard problem in cs: state.

                                                      naming things and cache invalidation are state problems.

                                                      1. 1

                                                        Haha, I love this. You simplified one of my favorite quotes.