1. 4

    For a personal project (a desktop GUI app), I’ve actually started in C++ and I regret it. (Re)compilation times are very poor; sometimes adding a comment to one header file forces recompilation of the whole project. Management of third-party libraries is in the stone age era compared to other languages like Java, Go or Rust. Interoperability with different build systems is a nightmare (especially custom build systems written by “clever” programmers). Writing multi-platform code, even when using Qt, is pretty difficult: it requires continuous integration on the different systems you’re targeting, which for a one-person project is pretty expensive in terms of time to set up. C and C++ also make it pretty easy to overwrite memory by accident, and then it’s not rare for the bug to occur only on Release builds, while on Debug builds everything works like a charm. C++ as a language also lags behind other, younger languages, and tries to keep up by introducing lots of stuff that will never be used by an average programmer. C++ now needs lots of quality-of-life frameworks and libraries, which is something I wouldn’t expect from a language as “established” as C++. The only real advantage of C++ is that it’s fast and it has RAII (and RAII makes it better than C in my opinion).

    I’m evaluating rewriting my project on a different technology stack, this time Java (well, more precisely Kotlin or Scala, but it’s on the JVM). Writing multi-platform code in Java is orders of magnitude easier, because it’s compile-once-run-everywhere (well, JavaFX requires different bundling for different systems, but one can create bundle templates once and just reuse them later without modification): I can use my IDE on Linux and just test the binary on a Windows VM, without recompiling, and test the same thing on macOS. The lack of a JVM runtime on the target system is not an issue, because it’s possible to bundle a stripped-down JRE along with the application itself, and in the Electron era I don’t feel this is something most people would find unacceptable. The JVM is also very fast, faster than most people think!

    1. 1

      In my case, a huge advantage over everything else is native bindings to LLVM.

    1. 4

      If the problem is async, needs concurrency and/or is IO-related, I’d reach for Go. If the problem is simple and doesn’t touch network IO, C. If it’s glue and doesn’t need to be deployed, bash/python.

      1. 1

        Have you tried Flow that was used in FoundationDB?

      1. 6

        AWK can be good for prototyping an idea, but you (very) quickly run into its limitations. No typing, no array literals, functions aren’t first-class citizens, you can’t pass arrays by value, no imports. It’s even missing basic functions like array length.

        But the biggest negative is the myriad implementations: NAWK, MAWK (2 versions), GAWK. It makes it very difficult to write portable code.

        1. 6

          AWK can be good for prototyping an idea, but you (very) quickly run into its limitations.

          If I consider when AWK was created (1977), I must say that it is an incredibly well-designed and successful piece of software. Yes, it is sometimes ugly, sometimes limited… but it is still in use after 44 years! God bless Alfred, Peter and Brian.

          (Even though we usually use the GNU implementation, it is still based on the original idea and language.)

          AWK and the classic Unix approach are quite limited when it comes to structured data. But we can push them a bit further and improve things by borrowing ideas from the relational data model – and still use classic tools like AWK.

          1. 3

            funny enough, gawk’s --lint option will let you know which constructs are gawk-specific (not POSIX awk), which helps with your biggest negative. if you use vim, ALE for (g)awk scripts will highlight them inline.

            1. 3

              AWK can be good for prototyping an idea, but you (very) quickly run into its limitations.

              A good programmer can work around these limitations. Just look at dwatch(8) on FreeBSD. Heavy use of awk.

              https://svnweb.freebsd.org/base/head/cddl/usr.sbin/dwatch/

              Or how about an HTTP caching proxy in gawk?

              https://pastebin.com/raw/Fmf1Fu4b

              1. 7

                A good programmer can work around these limitations.

                “Should they?” is a better question. They’re better off using a powerful tool that doesn’t limit them. Then, limit their use of it to what they need for maintainability. Subsets, DSL’s, and so on.

                1. 2

                  Shell script with embedded awk in functions paired with fd redirection AND eval’ed sudo. That looks like a maintenance nightmare for anyone who’s not the original author.

                  1. 1

                    It was reviewed and signed off by three other core developers, so I don’t think that’s going to be a problem.

                    https://reviews.freebsd.org/D10006

                2. 2

                  I don’t write awk for work, more so for pleasure, and its limitations can make it fun to use. It clearly was influential on the languages we use today and it would be interesting to see a programming historian trace that lineage.

                1. 1

                  I’m one of the contributors to Arrow; I will gladly answer questions on Arrow and Gandiva.

                  1. 3
                    • Enjoy relaxed family life.
                    • Finish Christmas shopping.
                    • Working on my toy jit expression language for bitmaps.
                    1. 5

                      It doesn’t matter at the scale of 2 cores/8GB RAM/100GB disk, but if you start wanting more resources, colocating can be significantly cheaper than a VPS or cloud provider, and provides more stability (no noisy neighbors or strange networking problems). It would be interesting to see that cost comparison as well (amortized over some number of years).

                      1. 2

                        For static workloads, yes this is absolutely true. But if scale starts moving up or down, you’re stuck with the hardware you bought until you buy (and deliver and rack and configure) more.

                        1. 4

                          People tell me this all the time, but when I ask people what their ratio of peak load to typical load is, it’s usually less than the premium one pays for running in the cloud, unless you’re large enough to negotiate a contract where you get much better than the public rates (which many people do, but that makes it pretty hard to price-compare). Maybe there are workloads where you’re only using your hardware a very small percentage of the time (data analysis, etc.) where it makes sense, but for a lot of companies with very seasonal traffic (ecommerce sites, etc.), I’ve asked people about their peak load and it usually doesn’t seem to justify being in the cloud :/

                          1. 1

                            I both agree and disagree with you. People I know at this scale (big enough to own hardware, but not one of the super-players) usually have a relatively steady load, and so keep resources in reserve so that they can provision extra/new apps when the need arises and still have time to get new hardware. On the other hand, if they actually want to colo (because they need to be out there on the internet), their load is much more volatile and harder to predict, so they overprovision a lot.

                            Granted, this is based on personal experience, but I just wanted to bring it up.

                            1. 1

                              How much cheaper is it than (eg) 3 year non-convertible reserved instances?

                              1. 1

                                And people always undervalue time to delivery. I was in a position where we had to wait month(s) to get newer nodes into a cluster because the rack was full.

                                1. 1

                                  And having people around who can deal with both admin and hardware stuff.

                              2. 1

                                Both are completely valid and it varies organization to organization. I’m an IT consultant and it’s always hit or miss on projects whether the company will have the spare capacity in their vSphere cluster to provision servers when the project demands it. One product I deploy needs four cores and 16GB of RAM dedicated, and you might be surprised at the number of times we put the entire project on hold for months because they need to purchase and install the extra capacity in their VM cluster. If they had a project that was a surprise smash hit and they hadn’t provisioned resources for it weeks in advance, they’re throwing money down the drain.

                                Some companies have the capacity to scale already (which may or may not be a waste of money like you said), while some can scale but it would take weeks or months. There are pros and cons to both approaches and it really depends on the company and product you’re talking to.

                          1. 3

                            The asciinema is next to useless; you spend half the time booting kubectl. I still don’t know what this software is doing.

                            1. 1

                              Hey, sorry for the useless asciinema.

                              I actually created pathivu in the cluster and used katchi to see all the logs from the pathivu.

                              The end result will look like this https://github.com/pathivu/pathivu#use-katchi-to-see-logs

                              I’ll update the asciinema soon.

                            1. 8

                              Launch Zig tag! For great justice!

                              (sorry, couldn’t resist)

                              1. 1

                                What about zag?

                                1. 4

                                  Zag set us up the bomb. I hate that dude.

                              1. 3

                                 I’d say memset every struct you allocate, and always initialise all stack primitive variables to 0. So many times I’ve seen failures due to a refactor that changes the control flow and is slightly broken because a variable is uninitialised in one of the flow paths (even with warnings).

                                On the subject, I like the pattern of having:

                                 • void my_type_init(my_type*, args...) and void my_type_deinit(my_type*) initialize/deinitialize internal state but make no assumption about whether the pointer is on the stack or heap.
                                 • my_type* my_type_create(args...) and my_type_destroy(my_type*) do the allocation on the heap and internally call init/deinit.
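
                                 A minimal sketch of that pattern in C; `my_type` and its fields are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type, just to show the shape of the pattern. */
typedef struct {
    char *name;
    size_t len;
} my_type;

/* Initialize internal state; works whether *t lives on the stack or heap. */
void my_type_init(my_type *t, const char *name)
{
    memset(t, 0, sizeof *t);            /* zero everything first */
    t->len = strlen(name);
    t->name = malloc(t->len + 1);
    if (t->name)
        memcpy(t->name, name, t->len + 1);
}

/* Release internal state only; does not free t itself. */
void my_type_deinit(my_type *t)
{
    free(t->name);
    memset(t, 0, sizeof *t);            /* leave no dangling pointers behind */
}

/* Heap variants, built on top of init/deinit. */
my_type *my_type_create(const char *name)
{
    my_type *t = malloc(sizeof *t);
    if (t)
        my_type_init(t, name);
    return t;
}

void my_type_destroy(my_type *t)
{
    if (t) {
        my_type_deinit(t);
        free(t);
    }
}
```

                                 The nice property is that a stack value and a heap value share the exact same init/deinit path; only the allocation strategy differs.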
                                1. 4

                                  From here:

                                   The worst part of all this is that required initializers prevent compilers and static-analysis tools from finding real uninitialized-variable errors for you. As far as they’re concerned it was initialized; they don’t know that the initial value, if left alone, will cause other parts of your program to blow up. If you need a real value, what you really want to do is leave the variable uninitialized at declaration time, and let compilers etc. do what they’re good at to find any cases where it’s used without being set to a real value first. If your coding standard precludes this, your coding standard is hurting code quality.
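
                                   A tiny example of what the quote is getting at (the `pick`/`compute` names are made up): left uninitialized, `gcc -Wall` can flag the broken path, while a blanket `= 0` initializer would silence the diagnostic and turn the same bug into a silent wrong value.

```c
/* compute() stands in for "producing a real value". */
static int compute(void)
{
    return 42;
}

int pick(int cond)
{
    int value;              /* deliberately no initializer */
    if (cond)
        value = compute();
    /* If cond == 0, value is read uninitialized here, and gcc -Wall
     * warns "'value' may be used uninitialized". With `int value = 0;`
     * the compiler stays quiet and the caller just gets a bogus 0. */
    return value;
}
```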

                                  1. 3

                                     Another thing to keep in mind is that C99 allows one to declare variables anywhere [1] in a block of code, not just at the top of the block, which can help with the “not initialized” error. And as a C programmer, your pattern is a nice one. I might have to use that.

                                    [1] Almost. You can declare variables of the same type in a for()

                                    for (size_t cnt = 0 ; cnt < max ; cnt++) ...
                                    

                                     but not in a while() statement:

                                    while(int x … ) // illegal

                                  1. 5

                                    Trying to restart Couch to 5k. Getting ready to send a kid off to Grade 1. Cleaning the garage and maybe the cars too. Catching up on leisure reading. Staying away from the internet / videogames.

                                    1. 3

                                      You can do it! After a one-year break (winter, a knee injury, and the cat ruined my shoes by peeing in them), I’m very very close to doing 5k in 35 minutes.

                                    1. 1
                                      01:arrow/ (pr/4574) $ echo hi >> README.md 
                                      01:arrow/ (pr/4574✗) $ false
                                      01:arrow/ (pr/4574✗) ! 
                                      

                                      The git branch in green, with a red dirty marker if the workspace is dirty. The $ transforms into a red ! if the previous command did not exit with 0. The path is truncated to the tip of the tree (topmost directory).

                                      1. 6
                                        char buf[src_size];
                                        

                                        won’t this fail if the source file is larger than RAM? I’m guessing more robust solutions (cp, rsync) don’t have this issue.

                                        1. 3

                                          It should fail if it’s larger than the available stack size, I think.

                                          1. 1

                                            How about char* buf?

                                            1. 3

                                                I don’t see that as helping the problem. You’d need a sliding window of a set size, say 1 GB, that is emptied after that portion is copied – or you could just ignore the problem if it’s for personal use.

                                              1. 1

                                                What do you mean a sliding window?

                                                1. 3
                                                  1. 3

                                                      It’s a buffer implementation – you would need to use something like this for a robust copy solution. If you don’t care about supporting larger files you can ignore this.

                                                      If you do care about supporting larger files: create a buffer of, say, 1GB – load the first 1GB of the source file and copy it to the destination – rinse and repeat until the file is copied. You might need to seek as well, but I think not, as I believe C’s read moves the cursor as well.
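
                                                      A sketch of that loop with stdio (using a 64 KiB buffer rather than 1 GB, to keep the stack happy); as noted, `fread` advances the file position, so no explicit seek is needed:

```c
#include <stdio.h>

/* Copy src to dst through a fixed-size buffer, so memory use stays
 * constant no matter how large the source file is.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src_path, const char *dst_path)
{
    enum { BUF_SIZE = 64 * 1024 };      /* 64 KiB window; tune as needed */
    char buf[BUF_SIZE];
    FILE *src = fopen(src_path, "rb");
    if (!src)
        return -1;
    FILE *dst = fopen(dst_path, "wb");
    if (!dst) {
        fclose(src);
        return -1;
    }
    int rc = 0;
    size_t n;
    /* fread advances the file position, so each iteration picks up
     * where the last one left off. */
    while ((n = fread(buf, 1, BUF_SIZE, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(src))
        rc = -1;
    fclose(src);
    if (fclose(dst) != 0)
        rc = -1;
    return rc;
}
```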

                                                2. 2

                                                  You’ve changed the code now to just do:

                                                    char* buf;
                                                    fread(buf, 1, src_size, src);
                                                  

                                                  Won’t that just fail since buf is uninitialized?

                                                  1. 0

                                                    I tested it, it didn’t

                                                    1. 4

                                                      You’re relying on undefined behavior then, which is inadvisable.

                                                      1. 1

                                                            Are you joking? Even the most cursory check triggers the warning:

                                                        $ x86_64-w64-mingw32-gcc  -Wall   copy.c
                                                        copy.c: In function ‘copy’:
                                                        copy.c:14:3: warning: ‘buf’ is used uninitialized in this function
                                                        [-Wuninitialized]
                                                        
                                                        1. 1

                                                          I think OP is learning C.

                                                1. 2

                                                  If this is for Linux, then I think sendfile() would be fastest, as it happens entirely within the kernel. If you can’t use sendfile() (old Linux kernel, non-Linux Unix system), then I think calling mmap() on the file being copied, then madvise(MADV_SEQUENTIAL) followed by write() would be a good thing.
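
                                                    A rough Linux-only sketch of the `sendfile()` approach (since Linux 2.6.33 the destination can be a regular file, not just a socket):

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a file entirely in the kernel with sendfile(2).
 * Returns 0 on success, -1 on error. Linux-specific. */
int copy_sendfile(const char *src_path, const char *dst_path)
{
    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }
    off_t remaining = st.st_size;
    while (remaining > 0) {
        /* NULL offset: sendfile uses and advances in's file offset. */
        ssize_t sent = sendfile(out, in, NULL, (size_t)remaining);
        if (sent <= 0) {        /* error, or unexpected EOF */
            close(in);
            close(out);
            return -1;
        }
        remaining -= sent;
    }
    close(in);
    return close(out) == 0 ? 0 : -1;
}
```

                                                    The data never crosses into userspace, which is why this tends to beat a read/write loop.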

                                                  1. 1

                                                    no need to write(), just mmap() the output file too and even use MADV_DONTNEED.

                                                    1. 0

                                                      Not Linux-specific

                                                    1. 3

                                                      A small and tractable project like you just listed seems a perfect way to do it (if you have free time). I would advise using a modern language like Go or Rust, since the ecosystem and tooling make it easier to get started.

                                                      Some other learning projects: https://cstack.github.io/db_tutorial/

                                                      1. 0

                                                        This db tutorial looks amazing. I was thinking about using C, but perhaps Rust or Go is the right way forward; I’ve been getting similar feedback, especially around Go since it’s used in production quite a bit. On the other hand, Rust is genuinely a step forward in terms of PL tech.

                                                        1. 1

                                                          What fsaint said. I’ve done a small CI system in Go, and CLI tools like ps and a pseudo-MIPS assembler in Rust. Great languages. If you want to do PL-type stuff, the Writing an Interpreter/Compiler in Go books are good.

                                                      1. 10

                                                        If you find this interesting, you absolutely should read The Third Manifesto (PDF Warning).

                                                        It is an attempt to put relational database languages on a firm, type-safe basis while still leveraging the power of the relational algebra. It’s also just a really fun read.

                                                        1. 3

                                                          EdgeDB/EdgeQL conforms to many (but not all) of the D proscriptions.

                                                          1. 6

                                                            1st1 says in their profile that they’re an EdgeDB employee. Are you?

                                                            1. 3

                                                              Yes, he’s EdgeDB co-founder and the author of the linked blog post.

                                                              1. 12

                                                                Ok, thanks. Just like to be sure about that when I’m reading articles about products and related tech. This is a good one. Welcome to Lobsters to the both of you. :)

                                                                1. 4

                                                                  Thank you! :)

                                                            2. 3

                                                              Python is an unusual choice for a database. Do you not expect latency/performance to be an issue?

                                                              1. 4

                                                                We’ve managed to squeeze a lot of performance out of Python and there are ways to do way more. Python has been amazing at allowing us to iterate quickly. That said, the current plan is to slowly start rewriting critical parts in Rust.

                                                                1. 2

                                                                  :D

                                                                  Any plans for a Rust language interface? I have a medium sized database-oriented project in Rust and it would be cool to try rewriting part of it for EdgeDB.

                                                                  1. 2

                                                                    Not yet. Maybe you want to help us implement it? :)

                                                                    1. 3

                                                                      I would love to, in my Copious Free Time. …of which I have none. It’s very tempting though.

                                                                  2. 1

                                                                    Sounds good, thanks for the insight :)

                                                                  3. 2

                                                                    We’ve posted some initial benchmarks here: https://edgedb.com/blog/edgedb-1-0-alpha-1/

                                                                  4. 2

                                                                    Could you elaborate?

                                                                1. 4

                                                                  This is why I recommend everyone use an ad blocker. Not privacy, not a categorical opposition to advertising, but because the ad networks have terrible quality control. They let nonsense like JavaScript injection, coin miners, and simple fraud through all the time.

                                                                  Television advertisements have legal and network-wide controls in place to ensure that the ads don’t lie. This isn’t from a place of the goodness of their hearts, but rather because if people don’t trust ads to live up to even a minimal amount of truth (“puffery” is okay, but claiming to relieve headaches without clinical trials, or simply eating your money and riding off into the sunset, are not), then they won’t actually buy anything.

                                                                  Internet ads, on the other hand, are as likely to deliver fake antivirus products and phishing attacks as non-advertised sites are, and they get mixed in with sites that are otherwise good. I’m never going to intentionally click a banner ad, because I don’t trust them, and they’re not as relevant as the organic results are. So why would you want to see them?

                                                                  1. 4

                                                                    It’s also an amazing distribution channel for malware. You can programmatically select by multiple criteria like geography, browser and platform type.

                                                                    Say for example you target only devices which you’ve seen on the weekend at Mar-a-Lago and during the week in Washington, and selectively distribute your JavaScript 0-day there. All you need is an Apple IFA and the hope that someone plays any game with a viewport embedded.

                                                                    It enables any actor to run Javascript on virtually any browser.

                                                                  1. 2

                                                                    There’s a poor man’s technique to avoid auditing. The third-party ad tracker (which Google doesn’t control) is configured with dynamic dispatching based on geolocation. If the request comes from California/New York (or from a Google network block), the tracker forwards to the legitimate site. Otherwise you get sent to the bad actor’s site.

                                                                    1. 16

                                                                      Checklist to use mmap on files:

                                                                      • No portability (Linux/BSD only). On the surface it’s portable, but the failures aren’t.
                                                                      • Local files on friendly file systems only; reading the (fs) source is the only way to know the failure modes, the mmap manpage is useless.
                                                                      • File size known, pre-allocated and static for the duration of the mmap. Size can be increased, but beware of mremap (see a later point).
                                                                      • Known and controlled bounds on the number of mmap invocations. mmap is a limited resource, like file descriptors; see sysctl vm.max_map_count.
                                                                      • Lifetime management is of utmost priority. Calls to munmap are carefully tracked. Make sure that every pointer into the unmapped regions is reclaimed/invalidated. This is probably the most difficult part. If your pointers are not wrapped with a handle on the mmap resource, you likely have a use-after-free bug lying somewhere.
                                                                      • Always assume mremap will give you a new address, or fail under MREMAP_FIXED. This is especially relevant to the previous point.
                                                                      • Using a signal handler to recover from mmap failures is pure madness and a daunting task to make re-entrant. The rule of thumb is that the only thing you can do in a non-crashing signal handler is flip an atomic.

                                                                      If you meet all the previous points, go ahead and use mmap. Otherwise trust the kernel and use pread/pwrite. mmap is amazing, but it’s a double-edged nuclear foot-gun.

                                                                      1. 3

                                                                        mmap() is POSIX, so any Unix system should support it (for instance, Solaris does). I agree, but would also add:

                                                                        • Make sure the file DOES NOT CHANGE from outside your program. So don’t go overwriting the file with a script while a program has mmap()ed it. Bad things happen.
                                                                        • It can be used to map memory without a file, using MAP_ANONYMOUS. I would wager this is probably the most common use of mmap().
                                                                        • Once you mmap() a file, you can close the file descriptor. At least, this works on Linux, Mac OS X and Solaris in my experience (of mmap()ing a read-only file).
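
                                                                        A sketch of that read-only map-then-close pattern (helper name made up); the mapping keeps its own reference to the file, so the descriptor can go away immediately:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and return a pointer to its contents, storing
 * the length in *out_len. Returns NULL on failure. Caller munmap()s. */
const char *map_readonly(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* safe: the mapping holds its own reference */
    if (p == MAP_FAILED)
        return NULL;
    *out_len = (size_t)st.st_size;
    return p;
}
```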
                                                                        1. 5

                                                                          Note that MAP_ANONYMOUS is not in POSIX. Actually the spec is quite lengthy, and if you read it carefully there are a lot of caveats and wiggle room left for implementations to provide only a minimal version.

                                                                          https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

                                                                          1. 2

                                                                            That’s surprising (but it shouldn’t be—I mean, the mem*() functions from Standard C were only marked async safe a few years ago in POSIX). On the plus side, it appears that Linux, BSD and Solaris all support MAP_ANONYMOUS.

                                                                          2. 5

                                                                            I meant that the ways things go wrong with mmap are not portable. If you’re building a library or a long-running daemon, this is critical. Other points I forgot in my list:

                                                                            • mmap/munmap are costly syscalls. A corollary of this and the third point in the original list is that you should not mmap many small files; a few large blocks is the optimal case.
                                                                            • You can only sync on page boundaries, so if you write, it needs to be on aligned blocks of 4k, otherwise you’ll trigger some serious write amplification on SSDs.
                                                                            • msync is tricky: MS_ASYNC is a no-op on Linux and doesn’t report failures, so you might never know if a write really succeeded. You also have to know about the vm.dirty* sysctls; they’re directly correlated with async writeback.
                                                                            • Huge pages don’t work with file-backed mmap (and wouldn’t make sense: there’s only a single dirty bit for the whole page!).
                                                                            • If you have many virtual memory mappings, the chance of a TLB miss increases and your performance can decrease depending on your access patterns.

                                                                            I’m starting to realize that the mmap man page should be rewritten with all these pitfalls :)

                                                                            1. 2

                                                                              At work, we have a near-perfect use case for mmap()—we map a large file shared read-only (several instances of the program share the same data) on the local filesystem (not over NFS) that contains data to be searched through, and it’s updated [1] periodically. That’s the only case I’ve seen mmap() used [3].

                                                                              [1] By deleting the underlying file [2], then moving in the new version. Our program will then pick up the change (I don’t think the system supports file notification events, so it periodically polls the timestamp of the file—this is fine for our use case), mmap() the updated file, and when that succeeds, munmap() the old one.

                                                                              [2] Unix file semantics to the rescue! Since there’s still a reference to the actual data, the file is still around, but the new copy won’t overwrite the old copy.

                                                                              [3] Although it’s possible the underlying runtime system uses it for memory allocation.

                                                                              1. 1

                                                                                When working with IO devices whose bandwidth is equal to or greater than the memory bandwidth (on my desktop, I’m capped at 10-12G/sec on a single core, or 48-50G/sec for all cores), you’re eliding one copy of the data, effectively reaching the upper bound (instead of half).

                                                                        1. 3

                                                                          Thanks for sharing your learnings! This couldn’t have come at a better time, since I’m considering learning Rust by implementing a TUI as well. Don’t ask me what for, I’m embarrassed to say, but maybe I’ll share it as well once it reaches a satisfactory state.

                                                                          1. 4

                                                                            This morning it crossed my mind that I would like a gmail TUI client.

                                                                            1. 4

                                                                              Lynx used to work before they killed noscript support. :(

                                                                              But some other terminal browser may work. browsh comes to mind, although I personally think it’s pretty heavyweight.

                                                                              1. 1

                                                                                It appears you can still browse gmail without js. I have to use gmail at work, and after disabling js, I can load a 2005-era gmail webpage.

                                                                                “You are currently viewing [Company] Mail in basic HTML. Switch to standard view | Set basic HTML as default view”

                                                                                1. 3

                                                                                  Per the blog post, they disabled JavaScript-free logins. So you’d have to log in with one browser and somehow copy the cookies into lynx and hope it works.

                                                                                  https://security.googleblog.com/2018/10/announcing-some-security-treats-to.html

                                                                                  1. 1

                                                                                    Hmm…It’s still been working for me. I’ve been using the HTML GMail interface through www/links+ (links2 to the linux folk) for a long time. And I can log in with 2FA just fine.

                                                                              2. 2

                                                                                Oh, I fully agree. I can’t be bothered to configure mutt and the other five tools you need. Also, I’d likely be happy with just the basics, but in TUI form.

                                                                                (My project is not about that)

                                                                                1. 1

                                                                                  Well, depending on your editor preference, there’s a vim plugin, or you can use emacs’ included email client.

                                                                                  Alternatively, there was a project several years back called sup that aimed to put a Gmail-like interface in the terminal, for any mail server. Sup is written in Ruby.

                                                                                  1. 5

                                                                                    I’m writing a TUI MUA in Rust; I haven’t released it yet because it turns out I don’t have that much free time. There are some early screenshots on its mockup site: https://meli.delivery

                                                                                    1. 2

                                                                                      That does look really promising!

                                                                                    2. 1

                                                                                      sup is indeed really good. I can’t remember why exactly I stopped using it.

                                                                                      EDIT: now I remember: sup is no longer supported (see for instance https://github.com/sup-heliotrope/sup/issues/546). There are instructions around for getting it to work on an older Ruby version (https://wiki.archlinux.org/index.php/sup#Configuration), but sup-config still crashes. Plus it needs offlineimap + mailfilter + msmtpd … that’s a bit too much for my taste.

                                                                                      1. 1

                                                                                        It’s a turnoff; I want to edit a config with my credentials and be done with it.

                                                                                        1. 1

                                                                                          Absolutely. If you find that or write that, let me know, I want to try it out.

                                                                                      2. 1

                                                                                        interesting.