1. 52
  1. 22

    No surprise Rust has fallen flat here. Rust isn’t a language you can pick up in an hour or two. Also when your program is small enough to fit in your head, you don’t need any language-provided guarantees.

    1. 23

      Even for a simple program like this, I’d still probably reach for Rust, just because it makes it so easy to pull in dependencies: structopt for parsing command-line arguments, walkdir for filesystem traversal, and regex for pattern matching — not to mention the standard library for its awesome iterator adaptors.

      So, out of curiosity, I gave it a shot! It took me <30min, which included checking Serge’s source code to make sure I was matching their functionality.

      What struck me about Serge’s implementation is that it doesn’t rely on any dependencies, or even Cargo. That’s definitely diving into Rust in hard-mode, and I’m very curious about what challenges they encountered. The Rust implementation of glob is very similar to the Zig implementation, so I’d guess most of the time was spent on walk. In contrast, I spent virtually no time at all writing that function, since I simply pulled in walkdir as a dependency.
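
      For illustration, a rough sketch of what that walk step can look like once walkdir is pulled in (not the code I actually wrote; the matches closure below is just a stand-in for the glob matching):

      use std::fs;
      use walkdir::WalkDir;

      // Walk `dir` recursively and print matching lines, grep-style.
      // Iterative traversal, symlink handling and error reporting come from walkdir.
      fn walk(pattern: &str, dir: &str, matches: impl Fn(&str, &str) -> bool) {
          for entry in WalkDir::new(dir).into_iter().filter_map(Result::ok) {
              if !entry.file_type().is_file() {
                  continue;
              }
              if let Ok(contents) = fs::read_to_string(entry.path()) {
                  for (i, line) in contents.lines().enumerate() {
                      if matches(pattern, line) {
                          println!("{}:{}\t{}", entry.path().display(), i + 1, line);
                      }
                  }
              }
          }
      }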

      (To be clear: I agree completely that Rust isn’t a language you can pick up in an hour or two!)

      1. 12

        And walkdir won’t use a call stack whose size is proportional to the input.

        1. 2

          True. But in fairness to a recursive implementation, the stack depth is proportional to the depth of the input directory tree, which is almost certainly <100. Not always, I’m sure there’s some fringe case, but the max depth on my entire Mac filesystem is 27.

          1. 4

            I wonder how many of these impls fail when the filesystem isn’t necessarily real, or when there’s a circular symlink

            1. 3

              Or a circular filesystem. I don’t recall, can you create loops with mount --bind?

            2. 1

              I recall in the past using OSes with 1) maximum path length 256 characters, and 2) maximum of 8 directories in a path.

          2. 8

            One of the things that always keeps me away from Rust is Cargo and the culture of external dependencies. Compared to other modern languages like Go, there are far more restrictions on what a project should look like. Ever since working with Java and Android, there has always been something inherently suspicious to me about those kinds of languages.

            1. 12

              I hear this a lot, but what strikes me is why this is an issue when there is nothing inherent about Rust that keeps one from vendoring dependencies (which even has dedicated tooling), or copy/pasting inline, or just writing everything from scratch. Sure, it’s not super common (although some do it), and probably not the best option; but it’s still an option. If you don’t want any external dependencies, don’t use them.

              1. 7

                One of the things that always keeps me away from Rust is Cargo and the culture of external dependencies.

                Would you call this the Cargo cult-ure? /s

                1. 1

                  Dang, beat me to it ;)

          3. 13

            For what it’s worth, we do understand that the lack of stdlib documentation is a problem. Before I expand on that, here are the current “workarounds” to be immediately productive:

            https://ziglearn.org is a structured introduction to Zig that is very friendly to newcomers. Chapter 2 in particular helps with solving common problems and showcases some parts of the standard library while doing so.

            On the Zig SHOWTIME YouTube channel there are a few videos that can help. One video in particular is about how to approach the Zig standard library today, while we wait for better docs.

            Asking for help in a Zig community is so easy it’s like cheating, and it’s also a door to knowing when new resources pop up.

            So, what’s up with the stdlib docs? We have an experimental build and, as the name implies, they’re incomplete and not considered good. Why are we not working on them? Because the docs are a JS interface to a bunch of metadata generated by the compiler during the build phase. We are currently transitioning to a self-hosted implementation of the compiler and, until it’s complete enough, we are stuck with the incomplete and buggy metadata that the current C++ compiler generates. I personally plan to get involved in the docs effort once the situation gets unblocked.

            I wish Zig had better documentation to gain popularity before it becomes too niche and obscure.

            Oh, don’t worry about that :)

            1. 4

              The author gives an example of one of my discomforts with Zig:

              The lack of string handling routines in the stdlib was unexpected: to concatenate strings one has to do everything manually - allocate the buffer, put the strings there. Or use a formatter and an allocator to print both strings side by side and free the buffer afterwards. It’s still very different from s1+s2.

              Overall, the core language is simple and I enjoyed it, but the stdlib is even more limited than libc. I hope that this is just a sign of an early age of the language.

              Zig is a fantastic language, but it lacks higher level APIs that make things convenient. Some examples being string handling, and no option parser in the stdlib (I was pleased to learn recently that Crystal includes a capable one). Recently I wanted to convert an integer to a string and that took some figuring out… there’s a 5-argument function for this in the stdlib, or you can bufPrint.

              The nice thing about a language like Zig is there’s “no limit”, you can go as fast as you want. But it’ll always be helpful to be able to start with higher-level code and optimize where necessary. I’d love to use Zig for all the benefits it has, but for many things it feels like “hard mode” due to lack of higher-level APIs.

              Any thoughts in this area?

              1. 10

                Any thoughts in this area?

                It’s a problem of managing expectations. We are currently working on the compiler, then it will be the turn of the official package manager and then, eventually, a polish pass on the stdlib based on a clear vision of what should or should not be there and for what reason.

                I’d love to use Zig for all the benefits it has, but for many things it feels like “hard mode” due to lack of higher-level APIs.

                Managing resources efficiently is inherently a bit “hard mode”, but I get what you mean. Going back to the managing expectations idea, you will have to wait a bit for the community to build an ecosystem of “multilevel” APIs. For what it’s worth, that’s the design philosophy behind my Redis client. I don’t know how much the stdlib will be willing to play ball, but if you look at the code in question, the OP had access to std.fs.walkPath which doesn’t seem barebones at all to me.

                no option parser in the stdlib

                True, but check out zig-args and zig-clap, both are very nice to use.

              2. 1

                Asking for help in a Zig community is so easy it’s like cheating, and it’s also a door to knowing when new resources pop up.

                Yet for better or for worse, the developer who needs to get something done in tens of minutes is often a rather antisocial developer.

                Not a counterargument of course—I’m in awe of your community-building and think that’s an absolute positive—just trying to indicate how people trying to form quick impressions will often overlook gems like a supportive community.

              3. 12

                Heck, if people can bring up Ada (which until I joined lobste.rs I thought was a historical footnote), I can bring up Nim :) It even compiles straight to C. Strings are pretty ergonomic, cleanup is automatic thanks to ref-counted GC, the language has an excellent tutorial, good reference docs, mediocre stdlib docs (fairly complete but hard to navigate.)

                1. 5

                  For RosettaCode-like edification purposes/maybe give more detailed color on @snej’s comment, this is what it looks like in Nim. With the Nim tcc backend, it compiles in 475 milliseconds for me (from scratch). “UX Benchmark”-wise, it took me about 6 minutes to just port from his C++, mostly deleting chatter/noise to get this (and another 90 seconds more to fix up his glob_test).

                  import os
                  
                  # Iterative glob matcher: '*' matches any run of characters, '?' matches exactly one.
                  proc glob*(pattern, text: string): bool =
                    var p, t, np, nt: int
                    while p < pattern.len or t < text.len:
                      if p < pattern.len:
                        case pattern[p]
                        of '*':
                          np = p
                          nt = t + 1
                          p.inc
                          continue
                        of '?':
                          if t < text.len:  # '?' must consume exactly one character of the text
                            p.inc
                            t.inc
                            continue
                        else:
                          if t < text.len and text[t] == pattern[p]:
                            p.inc
                            t.inc
                            continue
                      if nt > 0 and nt <= text.len:
                        p = np
                        t = nt
                        continue
                      return false
                    return true
                  
                  proc walk*(pattern: string, dir=".") =
                    for path in walkDirRec(dir, relative=true):
                      var file = open(dir / path)  # walkDirRec with relative=true yields paths relative to dir
                      var lineNo = 0
                      for line in file.lines:
                        lineNo.inc
                        if glob(pattern, line):
                          echo path, ":", lineNo, "\t", line
                      file.close
                  
                  when isMainModule:
                    proc main =
                      if paramCount() != 1:
                        echo "USAGE: ", paramStr(0), " <pattern>"
                        quit 1
                      walk paramStr(1)
                    main()
                  

                  As is, it runs as fast as the C++ (with both compiled with optimizations turned on). It could be optimized in a few obvious ways, of course, but run-time speed was also explicitly not the point of this “benchmark”.

                  Note that this “benchmark” is probably even more dependent upon developer-language familiarity than the usual fare (and even dependent upon text editor search-replace-delete-fu/typing speed).

                  1. 2

                    Cool, thanks!

                    As for tinycc — the speed sounds great, but is its optimizer competitive with GCC or Clang? I could see using it in debug/development builds.

                    1. 2

                      TinyCC barely has an optimizer, so it’s absolutely not competitive. And yes, the idea is to use it for debug/rapid dev cycles, not release builds (as I think I alluded to in my first reply to @akavel). That said, unoptimized code tends to be “only” 2.5-10x slower than optimized, though YMMV (a lot). It’s effective for me for rapid edit-compile-test on small data/whatever/edit again cycles, but, as always, all are encouraged to do their own experimentation. :-)

                    2. 1

                      How do I set up Nim to work with tcc? Also, do you maybe know if this would work on Windows? Including the multithreaded features?

                      1. 2

                        Well, I just install the mob branch of tcc, and then you can say nim c --cc:tcc foo.nim. For extra credit you can edit $HOME/.config/nim/nim.cfg to default to tcc and switch back to gcc/etc. when you define r, so that nim c foo.nim gives you a rapid devel cycle and nim c -d:r foo.nim gives you an optimized build. You can also just say nim r foo.nim to run it right away, of course.

                        I’ve used tcc on Windows, but it was like 7 years ago and not with Nim. There are a few adjustments like --tlsEmulation:on for multi-threaded which seems to work ok on Linux. Both Windows & threaded & tcc…you may be off the map. :)

                        There is rapid progress/hard work being done on incremental compilation going on with Nim that should make having a lightning fast C compiler less important (famous last words…).

                        1. 2

                          Thanks for the reply! For my hobby projects I seem to be having fast enough compile times that I usually don’t think about them. The tcc idea interested me more from a minimalism point of view - GCC is not exactly tiny, and I assume tcc is much smaller. But if you say there are some extra adjustments that need to be discovered, I’ll probably pass for the time being; I’m having a hard enough time with multithreading in the default Nim setup that I couldn’t stomach an extra challenge this time. But I’ll keep the idea in mind, maybe one day - it definitely sounds alluring, thanks!

                          edit: oh, the mob branch idea is crazy fun! I’m going to submit it as a separate post :)

                  2. 13

                    C’mon, where is the real Better C?

                    1. 9

                      I was surprised it wasn’t included, given the title, but given that Go is in there, you could even just straight up use regular D.

                    2. 6

                      I feel it’s always unfair to compare C, Rust and Zig with Go simply because of the GC. Of course having no-GC can be useful.

                      Just try writing a simple recursive quicksort in all 4 languages; no surprise it’s simpler to write in Go (like it would be between C and Python!). Not because of the standard library, or commas at the end of each line… just because you don’t care about allocations. Because you don’t need to “care” about allocations, ownership, and so on, you free not just your mind from one big thing, but also the code: you don’t need to write types, boxes, borrows… It obviously comes at a cost, but that’s another topic.
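
                      To make that concrete, here is a rough sketch of what an in-place recursive quicksort looks like in Rust (just an illustration; nothing allocates, but note the split_at_mut step the borrow checker pushes you into, which has no counterpart in a Go version):

                      // In-place recursive quicksort (Lomuto partition) over a mutable slice.
                      fn quicksort<T: Ord>(v: &mut [T]) {
                          if v.len() <= 1 {
                              return;
                          }
                          let last = v.len() - 1;
                          v.swap(v.len() / 2, last); // move the pivot to the end
                          let mut store = 0;
                          for i in 0..last {
                              if v[i] < v[last] {
                                  v.swap(i, store);
                                  store += 1;
                              }
                          }
                          v.swap(store, last); // pivot into its final position
                          // Recurse on disjoint halves; the borrow checker requires split_at_mut here.
                          let (lo, hi) = v.split_at_mut(store);
                          quicksort(lo);
                          quicksort(&mut hi[1..]);
                      }

                      fn main() {
                          let mut xs = vec![5, 3, 8, 1, 9, 2];
                          quicksort(&mut xs);
                          println!("{:?}", xs); // [1, 2, 3, 5, 8, 9]
                      }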

                      1. 6

                        In my experience Go code definitely can experience heap bloat due to GC. I’m more fond of automatic ref-counting as found in Obj-C, Swift and Nim. In my experience it gets you 90% of the benefit of full GC, at the cost of a few annoyances about managing cycles. Nim even takes care of those with its cycle detector.

                      2. 6

                        I find it slightly amusing that you cannot implement this program in pure ANSI C because it has no concept of folders or recursing them, so you need a platform library like POSIX or WinAPI.

                        1. 6

                          That’s either really bad or really good, depending on how you look at it…

                          On the one hand, it means you have to bring your own abstraction, which sucks.

                          On the other hand, it means you don’t have to bolt new platforms on top of existing abstractions, which also sucks. For example, Common Lisp had an extraordinarily powerful and flexible path system that consistently blew anything from the ’90s and ‘00s out of the water – which, of course, also meant that there was a great deal of impedance mismatch between that and whatever filesystem abstraction the underlying operating system had. Thinking in terms of the CL abstraction layer was great, but ultimately difficult, because application users thought in terms of their platforms, not in terms of whatever the standard committee had in mind back in the eighties. Also, an embarrassing amount of CL code ended up calling native file manipulation functions via a FFI because reliably mapping each platform’s abstractions to CL’s was a somewhat unpleasant exercise.

                          I suspect C’s longevity is partly due to the fact that it’s small enough that it did not pose significant obstacles to getting it to run on platforms developed long after the PDP-11, with (or without) all sorts of peculiar extensions. I’m not saying it’s a good thing, it’s just a thing.

                        2. 5

                          People who read the resulting C++ code actually have read C or C++ in the past, at least as part of their university classes. Many complained about the use of ::, so I should properly use the namespaces, I guess.

                          Pulling in whole namespaces with using namespace is heavily frowned upon. If you want to get rid of scope resolution operators, you usually do using mynamespace::type;, or alias the namespace to something shorter, e.g. namespace fs = std::filesystem, in the namespace of the functions you’re working in.

                          You should try Ada 2012 as well. I finally downloaded GNAT last night and tried it. I was blown away by how beginner friendly the IDE and the new docs are. I went from “Ada looks super weird” to “Huh, this looks pretty nice, I’m going to look into doing more with this.”

                          1. 9

                            I couldn’t agree more regarding Ada 2012 (and SPARK, a high-reliability subset; see this). I am consistently blown away reading Barnes’ “Programming in Ada 2012” (still on it), because the language has an ownership model for pointers (which are called “access types”) and has built-in concurrency. Also, the object orientation is very clean and doesn’t go overboard, thanks to Ada’s concept of class-wide types and the fact that there is a strict distinction between a type and a class, where a class is merely a set of types. And I say that as someone who hates OOP in C++ and Java.

                            You know, I’m a C guy, and I love simplicity, but the more I work with Ada, the more I realize that it solves so many problems of software engineering that the complexity of implementing an Ada compiler is justified. This is especially true given that we as a developer community have chosen compiler monoculture anyway, and Ada is an ISO standard. You just know, learning Ada, that it has been designed for this purpose.

                            In a direct comparison, Ada is even stricter than Rust, but much more readable and much more tailored towards large codebases with many people working on them. I love tinkering with C, but even though I’ve been working with it for a decade, I just make mistakes due to my human nature. And contrary to Rust, the Ada consortium is not scared of adding things to the specification, which yields a much more refined and streamlined product in the end.

                            What really sells Ada for me is their focus on types (i.e. data structures), because I am a firm believer that good data structures are the reason a program is good, and Ada forces you to really think about your data structures.

                            Ada may not be as “cool” as Rust, but when you look at it, Ada has been calmly in use and development for 40 years only to be discovered by those fed up with the heap of toy languages that have been coming and going over the years.

                            1. 4

                              Ada has a bit of an image problem from Unix/C programmers in the 80s shitting on it for being “bloated” or being exclusively for defense contractors.

                              1. 2

                                That sums it up very succinctly, thank you! It is true that Ada is more complex than C or comparable languages, but if you consider what it can do and what problems it solves, that complexity is put into perspective. I see it this way: if you compare the compilers alone in terms of simplicity, Ada loses. However, if you take into account the static analyzers, linters, etc. that we use to fix problems introduced by non-strict languages, we might reach a balance or even come out ahead with the Ada tooling.

                                I never really understood why the notion that you should not use Ada because it was created by the DoD was given so much weight. Shall we also not use the Internet because it was invented by DARPA, a DoD agency? Many good things came from military research that would’ve otherwise probably not seen the light of day. One good example is the intramedullary rod technique invented and applied by Gerhard Küntscher during WWII, which was previously heavily rejected by the academic community because they assumed bone marrow should remain untouched to allow best bone healing.

                                1. 2

                                  I never really understood why the notion that you should not use Ada because it was created by the DoD was given so much weight. Shall we also not use the Internet because it was invented by DARPA, a DoD agency? Many good things came from military research that would’ve otherwise probably not seen the light of day. One good example is the intramedullary rod technique invented and applied by Gerhard Küntscher during WWII, which was previously heavily rejected by the academic community because they assumed bone marrow should remain untouched to allow best bone healing.

                                  I think it’s less DoD, and more seen as DoD grift for consultants to write something incomprehensible/useless to “real programmers”.

                          2. 8

                            But of course, C is here to stay despite its age. There are still too many areas where C is the only real choice. And I’m glad that C exists.

                            As long as people keep ignoring Ada, it’s indeed here to stay. But we can change it.

                            1. 4

                              I suspect (and this is totally speculation) that Ada is ignored primarily for two reasons:

                              1. The best compiler for it is non-free/non-open source.
                              2. The place where Ada would be best used is embedded, and most embedded manufacturers provide C compilers, not Ada compilers.

                              Again, totally speculation; I have no data to back that up.

                              1. 2

                                Back when I was first exposed to Ada (early 90s), it was seen as a “bondage-and-discipline” type language, with a lot of boilerplate code (oh, also a verbose language) required to even start (the joke was, “if the code got past the compiler, it would run without crashing”). And that view stuck. But C++ and Java, over the years, slowly accumulated much of what Ada had, only it wasn’t “up front” so to speak.

                                Also, given the number of developers today who have issues with law enforcement using their code, using a language from the DoD is probably out of the question.

                                1. 2

                                  it was seen as a “bondage-and-discipline” type language

                                  I really hate that term. (1) It’s a way to turn your brain off without seriously evaluating the language. (2) It assumes that flexibility is the most important feature of a language. (Are you flexible enough to shoot yourself in the foot? Great!)

                                  1. 1

                                    Are there other types of, ahem, “bondage” type languages?

                                    1. 1

                                      OCaml has been called one, as has Eiffel, and apparently Prolog. Rust definitely qualifies. Cobol and Pascal have also been mentioned. The original definition from the Jargon File implies it’s a designation assigned to anything that is very militant about having a particular paradigm.

                              2. 4

                                Having &str, String and [u8] is obviously necessary, but surprises a newcomer.

                                Aye. Rust has &str, String, [u8] (and &[u8]), OsStr, OsString, CStr, CString, 7 different types of string. It’s not nearly as bad as Win32 development, where there are literally dozens of string types, but it still confused the heck outta me when I first tried Rust.

                                1. 3

                                  I found the variety of Rust string types fairly straightforward. They’re for different things. Easy.

                                  However, &str itself did confuse me quite a bit. I kept thinking: if you always use &str, what is plain str anyway? It took some significant pondering to wrap my head around the whole idea of a magical type of undefined size—essentially [u8; ?] where the compiler knows the length ? for static strings. And I’m not even sure that’s correct. As my mental model it has held up so far, but I still don’t really know.
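
                                  One small sanity check that helped me (sizes assume a 64-bit target): a &str really is a fat reference that carries the length, which is why it is twice the size of an ordinary reference:

                                  fn main() {
                                      // &u8 is a plain pointer; &str also stores the length of the str it points to.
                                      println!("{}", std::mem::size_of::<&u8>());  // 8
                                      println!("{}", std::mem::size_of::<&str>()); // 16
                                  }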

                                  1. 4

                                    Yeah, str is ‘unsized’. The Rust Language Cheat Sheet might help provide a visual idea of how these are laid out.

                                    1. 1

                                      Wow this is a great resource, thank you!

                                    2. 2

                                      I wrote a little about this a while back on reddit: https://old.reddit.com/r/rust/comments/gnd4bd/things_i_hate_about_rust/fr9179w/

                                      Apologies if you aren’t quite the target audience. It’s likely I said a lot of things that you already know. :-)

                                      1. 1

                                        This is great! It confirms my mental model about str and &str.

                                        Although I don’t understand this part:

                                        AsRef::as_ref is &T, so calling as_ref on something that implements AsRef<str> gives you a &str

                                        How does as_ref get ptr+len from a pointer to str? The source for impl AsRef<str> for str isn’t exactly illuminating. I think this is the same question as my question here.

                                        My best guess is Box<T> is a special compiler-implemented type that has an in-memory representation of &T, and thus actually does include ptr+len for Box<str>. This seems to indicate so:

                                        fn main() {
                                            println!("{}", std::mem::size_of::<Box<u8>>());  // 8  (thin pointer, on a 64-bit target)
                                            println!("{}", std::mem::size_of::<Box<str>>()); // 16 (pointer + length)
                                        }
                                        
                                        1. 2

                                          How does as_ref get ptr+len from a pointer to str? The source for impl AsRef<str> for str isn’t exactly illuminating. I think this is the same question as my question here.

                                          Hah. Yeah, in the source for the AsRef<str> for str impl, self has type &str. That’s kind of the trick that dynamically sized types enable. That is, that you can impl traits on a T that does not have a size known at compile time. In cases like that, you wouldn’t be able to write fn foo(self) since self wouldn’t have a known size. But in fn foo(&self), since self has type &T, it is guaranteed to have a known size.
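
                                          A tiny sketch of that point, with a made-up Describe trait: you can implement it directly on the unsized str as long as the methods take &self:

                                          trait Describe {
                                              fn describe(&self) -> String;
                                          }

                                          // `str` is unsized, but `&self` here is a &str, which has a known (two word) size.
                                          impl Describe for str {
                                              fn describe(&self) -> String {
                                                  format!("str of {} bytes", self.len())
                                              }
                                          }

                                          fn main() {
                                              println!("{}", "héllo".describe()); // str of 6 bytes
                                          }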

                                          My best guess is Box is a special compiler-implemented type that has an in-memory representation of &T

                                          I think there are special things about Box, but this actually isn’t one of them. The same is true for Arc and Rc for example. Both of those types can be implemented in pure library code. I don’t believe there are any special things in the language that make it work. The key is that Rc (and similar types) are defined with a T: ?Sized bound, which means T doesn’t have to be sized.

                                          Following the breadcrumbs here a bit might help. An Rc’s internal implementation is actually an RcBox. Its definition is this:

                                          struct RcBox<T: ?Sized> {
                                              strong: Cell<usize>,
                                              weak: Cell<usize>,
                                              value: T,
                                          }
                                          

                                          That’s something anyone can define: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8ad24efeaaa015958abaf338f5100663

                                          The key here is that T does not have a known size at compile time. This actually imposes restrictions on the definition of the struct itself. For example, if you changed its definition to

                                          struct RcBox<T: ?Sized> {
                                              strong: Cell<usize>,
                                              value: T,
                                              weak: Cell<usize>,
                                          }
                                          

                                          then it would not compile because the compiler doesn’t know the size of T and thus the offset of weak is unknown. AIUI, this is all pretty similar to C’s “flexible array member” feature, and it too has to be the last field in a struct.

                                          Stepping up a bit, it might be instructive to follow the trait for how a String gets converted to a &str:

                                          • You might start at String::as_str, but it just returns self, where the type of self is a &String. The trick here is knowing that String impls Deref, so a &String automatically coerces to a &str. But… how is Deref itself implemented?
                                          • The Deref impl just calls std::str::from_utf8_unchecked with &self.vec, where the type of &self.vec is &Vec<u8>. But the type of the parameter for from_utf8_unchecked is &[u8]. So it looks like we need to go find the Deref impl for &Vec<u8>.
                                          • Briefly, we note that the impl of from_utf8_unchecked is a transmute from &[u8] to &str. This is safe because they have the exact same layout in memory. Thus, if we learn how to get a &[u8], then we will have solved this riddle.
                                          • The Deref impl for Vec defers to slice::from_raw_parts. Crucially, this is the point at which the size information is made explicit. Namely, from_raw_parts is called with two arguments: the pointer to the underlying memory and the length of that memory.
                                          • The slice::from_raw_parts impl defers to ptr::slice_from_raw_parts.
                                          • And finally, the impl of ptr::slice_from_raw_parts shows the explicit representation of a &[u8] in memory:
                                          pub const fn slice_from_raw_parts<T>(data: *const T, len: usize) -> *const [T] {
                                              // SAFETY: Accessing the value from the `Repr` union is safe since *const [T]
                                              // and FatPtr have the same memory layouts. Only std can make this
                                              // guarantee.
                                              unsafe { Repr { raw: FatPtr { data, len } }.rust }
                                          }
                                          

                                          And how is Repr defined?

                                          #[repr(C)]
                                          pub(crate) union Repr<T> {
                                              pub(crate) rust: *const [T],
                                              rust_mut: *mut [T],
                                              pub(crate) raw: FatPtr<T>,
                                          }
                                          
                                          #[repr(C)]
                                          pub(crate) struct FatPtr<T> {
                                              data: *const T,
                                              pub(crate) len: usize,
                                          }
                                          

                                          The main compiler magic here I think is just that the representation of &[u8] is defined to be Repr as above. And from there, you can type pun to any one of the equivalent representations.

                                          The other piece of compiler magic here is dynamically sized types as well. The same thing that makes Box<str> work is the same thing that makes Rc<str> or Arc<str> or even your own type. So it’s not specific to Box<str>. And yup, Rc<str> also has a size of two words just like Box<str>.
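
                                          You can check that last bit directly (sizes are for a 64-bit target):

                                          use std::rc::Rc;

                                          fn main() {
                                              // Both are fat pointers: a data pointer plus a length.
                                              println!("{}", std::mem::size_of::<Box<str>>()); // 16
                                              println!("{}", std::mem::size_of::<Rc<str>>());  // 16
                                          }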

                                      2. 2

                                        Plain str is useful as there are also things like Box<str>, Rc<str> and Arc<str> (which are heap-allocated but non-growable strings; the first is uniquely owned and the latter two are immutably shared).

                                        1. 1

                                          I actually didn’t know that, as I’ve always just used String. Neat!

                                          In that case, how does this example work?

                                          fn main() {
                                              let boxed: Box<str> = Box::from("hello");
                                              println!("{}", boxed);
                                          }
                                          

                                          Shouldn’t the length of the str “hello” in boxed be unknown?

                                          1. 2

                                            In this case, the type of "hello" is actually &'static str. It’s a constant, so the compiler can construct it as a string literal with its size. Calling Box::from("hello") will actually copy the data into a fresh allocation created by the Box. Indeed, it will do this for all &str values because its from impl is only defined for &str.

                                            A more interesting case is getting from a String to a Box<str> without allocating or copying. In that case, the underlying allocation is simply reused.
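
                                            One way to do that (just a sketch) is String::into_boxed_str, which reuses the String’s buffer (it may shrink excess capacity first) instead of copying from a borrowed &str:

                                            fn main() {
                                                let s = String::from("hello");
                                                // Reuses the String's heap allocation rather than copying the bytes.
                                                let boxed: Box<str> = s.into_boxed_str();
                                                println!("{}", boxed);
                                            }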

                                        2. 1

                                          Yeah, I’d say that they’re fundamentally different objects. In particular, I wouldn’t say that [u8] is a string, it’s a contiguous region of bytes. Calling that a string is like calling [u8; 8] a double.

                                          1. 1

                                            Right, but str doesn’t have a + operator like a double. It has a [] slice operator like an array. And it slices bytewise, not by character. While str and [u8] are conceptually and semantically different—str must be UTF-8—they’re still alike in more ways than [u8; 8] and double.

                                            1. 1

                                              str does not have a slice operator, because indexing into UTF-8 is very ambiguous.

                                              You can go from str to [u8] though, but not the other way around.

                                              1. 2

                                                str does not have a slice operator

                                                That might be confusing for folks I think. &s[start..end] is a “slicing operation” in my head at least. I guess I would say, “str cannot be indexed by a single offset, but a substring can be extracted by slicing at offsets at valid UTF-8 boundaries.” A bit longer winded I guess.

                                                As to this:

                                                While str and [u8] are conceptually and semantically different—str must be UTF-8—they’re still alike in more ways than [u8; 8] and double.

                                                I’d say… probably that’s true. But it’s kind of hand wavy and it depends on how you look at it.

                                                1. 1

                                                  For sure hand wavy. I guess what I meant was:

                                                  let d = 1.0;
                                                  let a: [u8; 8] = [0; 8];
                                                  let s = "string";
                                                  

                                                  The operations you can perform on a and s are more similar than the operations you can perform on a and d.

                                                  str does not have a slice operator

                                                  @skade is this not slicing? impl<I> Index<I> for str, impl SliceIndex<str> for Range<usize>.

                                                  because indexing into UTF-8 is very ambiguous

                                                  Slicing—or whatever it is—into UTF-8 is possible, but doing so incorrectly will panic:

                                                  fn main() {
                                                      // utf-8 char at s[1..3]
                                                      let s = "héllo";
                                                      
                                                      // "hé"
                                                      println!("slicing utf-8 at valid boundary: {}", &s[..3]);
                                                      
                                                      // panics: byte index 2 is not a char boundary
                                                      println!("slicing utf-8 at invalid boundary: {}", &s[..2])
                                                  }
                                                  

                                                  Rust playground link for above code.

                                                  1. 3

                                                    I think what @skade meant is that you can’t do s[i]. The wording looks confusing to me though.

                                      3. 3

                                        So I’d just throw walkdir and ignore at it and be done.

                                        Yeah it’s more a puzzle, but I’ll offer a counterargument: let’s say I’ve got a fairly simple service with exactly 3 API requests. It talks to a 3rd service, let’s call it SNAFU because it’s the reason we have to create this MitM service, and a DB. Its job is like this: if I call FOO, it goes to SNAFU with the provided bearer token, checks for the returned user, calls SNAFU again with the actual action (FOO), and stores the data in the DB. BAR does the same, except it removes the data from the DB. And FOOBAR lists all stored data in the DB. And we want to crawl SNAFU regularly to verify that our assumptions about the state of data in SNAFU vs our DB are correct, because people could just use the provided Web-UI, which bypasses the API.

                                        Just set up a basic Python web service and be done with it: that would be the fast answer according to this post. Well, I’ll tell you why not: this service has to run in tandem with SNAFU; it’s more or less an API extension for a missing index of SNAFU. It has to run 24/7, and it has to cope with SNAFU changing its API or failing. It’ll handle the same authentication as SNAFU. There are so many places where it could fail, and I don’t want to find out about all the “null” and “exception” places at runtime. Or places where Python will just accept any type, because it’s JSON - so no one cares. Possibly while crawling, so it’ll more or less start to fail silently. So I’ll make this a Rust service and handle all of these cases right upfront.
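
                                        As a sketch of what handling it upfront buys you (hypothetical User shape, and assuming serde and serde_json as dependencies): a missing or mistyped field becomes an explicit error at the boundary instead of a surprise null deep inside the crawler:

                                        use serde::Deserialize;

                                        // Hypothetical shape of the SNAFU user-lookup response.
                                        #[derive(Debug, Deserialize)]
                                        struct User {
                                            id: u64,
                                            name: String,
                                        }

                                        fn main() {
                                            let ok = serde_json::from_str::<User>(r#"{"id": 1, "name": "alice"}"#);
                                            let bad = serde_json::from_str::<User>(r#"{"id": "oops"}"#);
                                            println!("{:?}", ok);  // Ok(User { id: 1, name: "alice" })
                                            println!("{:?}", bad); // Err(...): invalid type, expected u64
                                        }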

                                        1. 2

                                          Corresponding Rosetta page. Zig is still missing there.

                                          1. 2

                                            On C++: “I enjoyed the docs, with lots of examples and good readability.” Please, what C++ docs has the author been reading, and where do I find them? https://en.cppreference.com/ is great but it assumes you already know how most things work, which for me is quite problematic.

                                            1. 2

                                              i feel like if the benchmark is for a “better c” the task should be to write your code in a library that can be linked in and called from a C main program. that would rule out go, but the others should still be fine. (also D, ada and probably nim mentioned elsewhere in the thread. not to mention ats.)

                                              1. 1

                                                Perhaps I’m missing something obvious, but is the code mentioned in the article available anywhere? It would be nice to see the same program in the four different languages side by side.

                                                  1. 2

                                                    Yup, it’s available here: https://github.com/zserge/glob-grep (it was linked in the middle of the article, easy to miss for skimmers :))

                                                    1. 1

                                                      Yep, I just skipped right to the first language header. My bad. Thank you.