1. 23

    I don’t love being negative, but this article really rubbed me the wrong way. I’m frustrated because Ably seems like a useful product that I might want to use, but this feels like either a dishonest ad to score points, or technically questionable in a way that undermines their credibility to me.

    I don’t think we can know if Google’s choice was efficient, because we don’t have any information about basic questions like “how do people use this feature?” Every number in the article assumes that visitors will stay on the page for five minutes, but they don’t say where that assumption comes from.

    If the duration of page views tends to be shorter (this is on a search results page after all), everything quickly falls apart.

    Also (and as mentioned in other comments here), we don’t know the resources it would take for Google to keep persistent connections open. Since the scores are not personalized information, responses to polls could be cached more efficiently on Google’s side.

    The Websocket calculations are based on a raw Websocket streaming connection, something Ably does not officially support in production.

    The author’s client-side real-time subscription library weighs in at 169KB minified (ignoring compression, which the author argues doesn’t matter).

    Any company not operating at such scale would be forced to design and implement a more efficient method simply due to bandwidth costs.

    This short, mostly-text article from Ably, quibbling over <100KB wasted over a five-minute visit, itself weighs in at a 7-second, 5.2MB initial load.

    I just…

    1. 4

      I appreciated the writing for the technical aspects, but it’s also rubbed me off the wrong way.

      I particularly dislike the condescending tone that the author chose to use. Sure, the different approaches he proposed can be better than Google’s strategy, but the way the article is written seems very childish to me.

      1. 11

        Unrelated, the phrase is “rubbed the wrong way.” “Rubbed me off” means something, er, entirely different.

        1. 1

          thx

      2. 1

        Hey Phil. Thanks for your reply.

        If the duration of page views tends to be shorter (this is on a search results page after all), everything quickly falls apart.

        Sure, it becomes less impactful, but how does it fall apart? The same underlying principle remains. Polling solutions increase latency and overhead, streaming solutions don’t.

        Also (and as mentioned in other comments here), we don’t know the resources it would take for Google to keep persistent connections open. Since the scores are not personalized information, responses to polls could be cached more efficiently on Google’s side.

        Sure, but browsers will keep the underlying TCP connections for all HTTP(S) requests open (connection pooling), which means Google still have to terminate these connections for the duration of a visit. I don’t know why those persistent connections would necessarily be any more expensive than an upgraded WebSocket connection. Caching can occur at the edge with socket connections; it’s what we do at Ably, and we’re not the only ones (PubNub, etc.).

        The author’s client-side real-time subscription library weighs in at 169KB minified (ignoring compression, which the author argues doesn’t matter).

        Well, in my article I did try to clearly convey that this was not about using Ably, and focussed on raw transports. In the example I provided for SSE, there is no Ably library, and I stated that for WebSockets, we don’t currently support WebSocket connections without an Ably lib, but that is something we are going to release. Currently, using Ably without any SDK is possible with XHR Streaming and SSE - see https://www.ably.io/documentation/sse. I appreciate the overhead issues, and it’s why we support open protocols. Sorry if that was not clear in the article; that was not the intention.
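        To illustrate the “no SDK” point: consuming an SSE stream is just reading a long-lived HTTP response. Here’s a rough sketch in Go (the URL is a placeholder, not an actual Ably endpoint):

        package main

        import (
            "bufio"
            "fmt"
            "net/http"
            "strings"
        )

        func main() {
            // Placeholder endpoint; any server speaking text/event-stream will do.
            resp, err := http.Get("https://example.com/stream")
            if err != nil {
                panic(err)
            }
            defer resp.Body.Close()

            // SSE events arrive as "data: ..." lines separated by blank lines.
            scanner := bufio.NewScanner(resp.Body)
            for scanner.Scan() {
                if line := scanner.Text(); strings.HasPrefix(line, "data:") {
                    fmt.Println(strings.TrimSpace(strings.TrimPrefix(line, "data:")))
                }
            }
        }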

        This short, mostly-text article from Ably, quibbling over <100KB wasted over a five-minute visit, itself weighs in at a 7-second, 5.2MB initial load.

        This is certainly not something we’re proud of. We’re continuing to optimize things where we can with the resources we have. Given our size (we’re a small company), we are spending our engineering efforts on our product where we can bring our optimization work on streaming to our customers. Sadly, as a result, our blog (largely Ghost wrapped in our existing site) has plenty of room for improvement. As we grow, I hope the existing optimization tasks in our backlog are prioritized. But we’re not at Google scale, and sadly have to focus on optimization in areas that our customers benefit from. Our blog readers sadly, for now, have to come second.

        I appreciate this may come across as hypocritical, and you’re more than welcome to think that. But I don’t think that changes the analysis in my article, which is that Google are telling everyone else to optimize their sites, because they’re trying to make the web better, or be penalized (https://www.sitecenter.com/insights/141-google-introduces-pe…). And then, on the other hand, they have over 100k staff and haven’t optimized their own results.

        I hope what our technical readers take from this article is tips on optimizations they can apply themselves by using streaming transports. Google just happened to be in the firing line because of their scale and ability to do this right. I quote: “At Google’s scale, I expected to see the use of common shared primitives such as an efficient streaming pub/sub API, or dogfooding of their own products,” and that is what I was surprised about.

        1. 1

          Just want to chime in in favour of those posts.

          They are actually quite informative!

        1. 2

          Might poke a bit more at bpftrace’s issues this weekend.

          Other than that, I found out about a few cafés in Dublin that I wanna try. Will probably rent a bike with a friend and go around trying coffee tomorrow :P

          1. 4

            Will spend some time poking at bpftrace bugs.

            Just submitted my first PR there: https://github.com/iovisor/bpftrace/pull/700

            Seems like there are plenty of low-hanging fruits :)

            1. 3

              Came to Amsterdam on Friday for my onsite interviews, but they ended up cancelling half of them for reasons.

              So I decided to get a hotel for the weekend and a new return flight.

              Will spend the time touring the city and enjoying the lovely weather. Desperately need my dose of vitamin D!

              1. 2

                As a programmer, you spend a lot of time editing and navigating code.

                In my experience, not as much as all the editor-focused write-ups want you to think. Presumably, you spend much more time thinking about what it is that you actually want to do. (Disclosure: my perception is informed by me spending most of my time in Python, as opposed to more text- and repetition-heavy languages like Java.)

                1. 1

                  Is it? I’m trying to find breakdowns of how programmers spend their time and it’s tricky. I’d suspect it would be something like half our technical time is spent debugging.

                  1. 1

                    I second this. Also, for more verbose languages like Java there are usually very good supporting tools, and you usually don’t use a simple editor but a semantically aware IDE for the task.

                    1. 1

                      Agreed.

                      I don’t get why people have this need to optimise their text editing so much. They spend so much time trying to learn all these handy keyboard shortcuts and tricks that the actual time they save when editing text is negligible.

                      1. 3

                        For me it’s absolutely nothing to do with saving time and everything to do with comfort. It didn’t take long to get over the initial hurdle of just running vimtutor and learning the basic keys to move around and edit. From there it was just editing and occasionally thinking, ‘I do this [thing] a lot, is there a way to optimise it?’ Same for my shell setup. It’s not really a deliberate decision to spend ages customising everything; it’s a gradual process that occurs naturally over years out of curiosity, and it pays off.

                        Edit: the immediate benefit post-vimtutor was being able to keep my hands off the mouse, and that was enough to make it more comfortable than what I had been doing before. I don’t think I would have bothered if it had been that stressful and time consuming just to get started.

                  1. 3

                    I just got my copy of “Data Structures + Algorithms = Programs” by Niklaus Wirth.

                    I’ll head to a nice coffee shop and spend some time going through it.

                    Also just got Android: Netrunner (board game). Will defo spend some time playing it with a friend.

                    1. 4

                      Working on a library for doing deep copying in Go. We don’t need to do it often at $work, but when we do, it’s super inconvenient and error prone to write it out by hand. (Or, if that’s too annoying, we have hacks that just gob encode/decode to get a deep copy, when it works.)

                      The best library out there for this, as far as I can see, is copystructure. But it’s not particularly configurable and doesn’t help with unexported fields.

                      1. 1

                        I don’t program in Go. Can you please explain what an unexported field is? Also, just curious: what problem space has you deep-copying data?

                        1. 2

                          It doesn’t really have anything to do with the problem space. You just occasionally want to be able to copy stuff, for example, when you want to ensure nothing shares any memory. Kubernetes has its own bespoke code generation tooling for this, and HashiCorp uses copystructure in a lot of their stuff AIUI.

                          But yeah, unexported fields just refer to the visibility of members in a struct. One can only access unexported fields within the same package in which the struct is defined. Reflection-based tools normally respect this, but deep copying is often orthogonal to the visibility of struct fields, particularly since you might want to deep copy something that has unexported fields. Since copystructure doesn’t support this, you wind up having to annoyingly roll your own manual, error-prone deep copy implementation for structs with unexported fields, or you wind up just exporting stuff so that it plays nice, which obviously isn’t ideal.
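                          A tiny illustration of what I mean (hypothetical type, just showing what the reflect package reports):

                          package main

                          import (
                              "fmt"
                              "reflect"
                          )

                          type thing struct {
                              Exported   int
                              unexported int // visible only inside this package
                          }

                          func main() {
                              src := thing{Exported: 1, unexported: 2}
                              dst := reflect.New(reflect.TypeOf(src)).Elem()

                              // Reflection can enumerate both fields, but it refuses to set the
                              // unexported one, so a naive reflection-based deep copy loses it.
                              for i := 0; i < dst.NumField(); i++ {
                                  name := reflect.TypeOf(src).Field(i).Name
                                  fmt.Printf("%s settable via reflection: %v\n", name, dst.Field(i).CanSet())
                              }
                              // Prints: Exported settable via reflection: true
                              //         unexported settable via reflection: false
                          }

                          That refusal is exactly the wall copystructure (and most reflection-based tools) runs into.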

                          1. 1

                            I wish there was an interface and/or de facto standard for this. Then you can just implement what it means for a thing to copy itself at that thing’s level, and anything wrapping that can just internally call the “Copy” of what it wraps.

                            1. 1

                              It’s tricky to just use interfaces to solve this problem. The first issue is that the natural type signature of a copy method is something like this:

                              func (t T) Copy() T
                              

                              But you can’t express that in Go in an interface, so you wind up with this:

                              type Copier interface {
                                  Copy() interface{}
                              }
                              

                              … which is obviously less than ideal, and makes it a bit annoying to use.

                              Second issue is that even if you accept the interface{} here, it’s still a major pain to actually write out the implementation of it. It’s a place in the code that’s just waiting for hard to diagnose bugs and it’s hard to unit test effectively. The simple case where this falls over is when you add a new field to a struct. There’s nothing that will actually tell you that the new field also needs to be added to the Copy implementation for that type. You’ll only find out about it when some other piece of code uses the Copy method and a subtle bug crops up because your copy doesn’t include everything.

                              You can devise unit tests that will fail, but you need another piece of machinery. That is, you need something that says, “fail this test if this value has any zero values in it.” That way, you can write a test that asserts your Copy routine roundtrips correctly and will also fail if you add a new field to a struct but forget to update both the test and the Copy implementation.
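                              A rough sketch of that helper (hypothetical names, and simplified to a few kinds):

                              package copytest

                              import (
                                  "reflect"
                                  "testing"
                              )

                              // RequireNoZeroFields fails the test if v, or any struct field reachable
                              // from v, still holds the zero value for its type.
                              func RequireNoZeroFields(t *testing.T, v interface{}) {
                                  t.Helper()
                                  check(t, reflect.ValueOf(v), "value")
                              }

                              func check(t *testing.T, rv reflect.Value, path string) {
                                  switch rv.Kind() {
                                  case reflect.Ptr, reflect.Interface:
                                      if rv.IsNil() {
                                          t.Fatalf("%s is nil", path)
                                      }
                                      check(t, rv.Elem(), path)
                                  case reflect.Struct:
                                      for i := 0; i < rv.NumField(); i++ {
                                          check(t, rv.Field(i), path+"."+rv.Type().Field(i).Name)
                                      }
                                  default:
                                      if rv.IsZero() {
                                          t.Fatalf("%s is the zero value", path)
                                      }
                                  }
                              }

                              Run that against the fully populated input of the Copy round-trip test, and a newly added field can’t silently slip through as a zero value.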

                              My idea here is to take a page out of go-cmp’s book and look for methods with the type signature, func (t T) Copy() T and use that before falling back on to automatic reflection based machinery to do the copy. That way, types can precisely control how they are copied when necessary, but otherwise, a Copy will “just work” in the common case.
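                              In sketch form, the dispatch might look something like this (illustrative only; a real version has to recurse into maps, slices and pointers, and deal with the unexported field problem discussed above):

                              package deepcopy

                              import "reflect"

                              // Copy prefers a method with the exact shape func (t T) Copy() T and only
                              // falls back to reflection when no such method exists.
                              func Copy(v interface{}) interface{} {
                                  rv := reflect.ValueOf(v)
                                  if m := rv.MethodByName("Copy"); m.IsValid() {
                                      mt := m.Type()
                                      if mt.NumIn() == 0 && mt.NumOut() == 1 && mt.Out(0) == rv.Type() {
                                          return m.Call(nil)[0].Interface()
                                      }
                                  }
                                  // Reflection fallback; a real implementation would do a deep walk here
                                  // instead of this shallow placeholder.
                                  out := reflect.New(rv.Type()).Elem()
                                  out.Set(rv)
                                  return out.Interface()
                              }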

                            2. 1

                              Cool, thanks. Ah, it sounds like the problem space is distributed computing, or short-lived tasks or containers in this case, which Go is suited for.

                              I do a lot of web development, and outside of maybe cloning an online Order or some basic strings, I don’t do this much.

                              Recently the only copying I’ve done programmatically is to “clone” tables from DB server 1 to server 2 as part of an ad hoc task. Interestingly enough, I usually do this via INSERT INTO ... SELECT FROM statements, but I recently learned of federated tables in MySQL, which make the cloning task between distributed MySQL instances even easier. Again, in this case, as you’ve noted, cloning sometimes has shortfalls. In MySQL, a federated table has various limitations, such as restricted DDL statements on the table or certain password character limitations.

                            3. 1

                              An unexported field is basically a private field in a struct. Unexported fields are only visible within the package they are defined in.

                              1. 1

                                Thanks

                            4. 1

                              Oh, cool, so you’re still writing Go! Good to know. I somehow thought you were a Rust-only celebrity now ;D Good luck regardless, anywhere the Road takes you! :)

                              1. 1

                                Yeah I never stopped. I just don’t typically do it in my free time any more.

                                1. 1

                                  Given your experience in both, I admit I’m now quite curious what your thoughts are on the pros & cons of each. But no pressure or whatever; I suppose people often ask you about this ;)

                                  1. 13

                                    That’s kind of a loaded question. :-) People don’t actually often ask me that question. A real answer is probably pretty involved. Personally, I think the most important benefit of Go is its “simplicity.”[1] And by that, I mean that it is very easy for people to hit the ground running with Go without too much fuss at all. There are a limited number of constructs available in the language, so it’s in general pretty rare to stumble into something that’s difficult to understand because of its abstraction. (My own experience supports the pithy phrase, “abstraction solves all problems, except for the problem of abstraction.”) If code is hard to understand, it is usually, at minimum, concrete, such that it doesn’t require thinking carefully through sophisticated type system shenanigans. The only truly complex Go code I run into is either deeply reflection based (which is pretty rare outside highly reusable shared libraries) or a mess of goroutines using a non-standard concurrency pattern. But I’ll take that over the pthreads code I’ve seen in C any day.

                                    Of course, like any good trade off, its strength is also its weakness. When you bump into Go’s abstraction limits, it can sometimes be really annoying. For example, I often really want a generic Option type that encodes the possibility of absence—even if it’s only enforced at runtime—since it isn’t always the case that the zero value is a useful indicator of absence. e.g., Dave Cheney’s famous “functional options” article uses the fact that the default value of 0 can also have meaning in a particular domain, so it isn’t sufficient to treat 0 as “missing” (and thereby resort to some other default). This in turn serves as part of a motivation for avoiding configuration structs entirely. Of course, you wind up with more machinery, but this can be worth it for a highly reusable library.

                                    Rust also has abstraction limits, but I very very rarely run into them in the work I do. Of course, those limits are much much higher than Go, and as a result, using the language is itself much more complex. I personally think it’s still comfortably less complex than using Haskell, and definitely less complex than using C++, but it’s pretty clear that it takes a lot more ramp up time to get started with Rust than it does Go. This has not only been echoed by many other people that have tried Rust (“I gave up a couple times before succeeding, and now writing Rust is not that hard” is a common story), but is also consistent with my own personal experience teaching both Go and Rust. I’ve helped people learn both, and the number of times I needed to go to a whiteboard and carefully explain a somewhat dense snippet of code is markedly different between the languages.

                                    Obviously, this trade off is intentional. Rust isn’t complex for no good reason. There are super good reasons for almost all of its complexity, and they generally boil down to some combination of safety and performance. Go also values safety and performance, but not nearly to the degree that Rust does. In Rust, I can push a lot more invariants into the type system quite naturally and idiomatically, whereas with Go, the language actively fights against encapsulation (in the “data hiding” sense) in a lot of circumstances. Again, this comes down to complexity. For example, it’s totally reasonable to want to write your own type that exposes a map-like interface (and maybe uses a map internally, or maybe not), but this is just fundamentally impossible because the language blesses the map type with special syntax. You can see this clearly even when the standard library struggles with it, for example, the sync.Map type and its Range method. Moreover, if you’re hiding things, this completely destroys the utility of most of the ecosystem’s reflection-based encoding/decoding infrastructure. You wind up having to write error-prone serialization goop if you want to hide your internals. (Error-prone because if the internals change, e.g., with an additional field, then you have to remember to update the serialization goop. It’s possible to write unit tests for this, and we have them at work, but that in turn requires knowing to add the field to the test to serialize in the first place. So we have test helpers for that which can check that all values in a particular type do NOT have the zero value. So if you add a new field, that test automatically fails.) So… encapsulation in Go is possible, you just wind up getting punished for it, so sometimes you invariably choose “encapsulation isn’t worth it” just because of how the language is designed, instead of what makes sense for your specific circumstances. The frequency with which this happens is annoying in my experience. In Rust, I almost never have to make this sacrifice; encapsulation is well supported and nearly effortless compared to Go. (I could articulate some cases where Rust punishes you too, but that gets into the weeds and they are fairly rare.)
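                                    To make the map point concrete (purely illustrative):

                                    package main

                                    import (
                                        "fmt"
                                        "sync"
                                    )

                                    func main() {
                                        // The built-in map gets range syntax from the language itself.
                                        plain := map[string]int{"a": 1, "b": 2}
                                        for k, v := range plain {
                                            fmt.Println(k, v)
                                        }

                                        // A map-like type such as sync.Map can't participate in that syntax,
                                        // so it exposes iteration as a callback-taking Range method instead.
                                        var m sync.Map
                                        m.Store("a", 1)
                                        m.Store("b", 2)
                                        m.Range(func(k, v interface{}) bool {
                                            fmt.Println(k, v)
                                            return true // keep iterating
                                        })
                                    }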

                                    [1] - I purposely use the word “simple” in quotes, because people love to imbue their own definition of what “simplicity” means. e.g., You’ll see plenty of people arguing that Haskell is actually “simpler” than Go just by comparing their core language specifications. e.g., The full generality of Haskell makes it “simple.” I don’t think this perspective is 100% wrong, but it just completely misses the very obvious point I’m trying to make and completely derails the conversation into a brain-dead definitional turf war. That is, I think it’s completely uncontroversial to state that, as an observation, it is much easier, in general, to read, write, learn and productively use Go than it is Haskell. I don’t give two poops why that is (“it’s because we don’t teach functional programming as a first language” is a common refrain) and I don’t care if it isn’t true in some cases, because, obviously, individuals vary.

                                    1. 8

                                      This comment is far too good to be buried in a weekend update thread. :)

                                      1. 2

                                        Thanks a lot for the response! And for your time taken to write it. Sure, I know it’s kinda loaded ;) but of all people, who could be better placed to try and answer it in a balanced way than you! ;) As such, it carries a lot of value and authority to me.

                                        Personally, I’m currently kinda licking my scars after a first stroll into the Rust land. What surprised me most is, I think, that it felt more complex than C++ to me; but I’m kinda coming to terms with the fact that it may just be because of how far I got in C++, and how used to it I’ve become. Also, I was taken by surprise by the fact that my difficulties weren’t with the borrow checker per se, but rather with secondary and tertiary consequences of the BC on the type system coming at me from the least expected angles. With all that said, and with your report as an important point, I’m kinda starting to think I may just need to come back to trying a few more times in the future, until I maybe grok it at some point. But also my appreciation for the simplicity and readability of Go is reinforced by your explanations.

                                        One more question, if I may: would you risk comparing the languages on a scale of “malleability” (ease of change), when some new features require deep refactoring of an existing codebase? Do you find Go easier, Rust more supportive (b/c invariants in types), or do those kinda balance in your mind? You can answer by PM, or plain refuse to, if you’re afraid of being quoted on this in flame wars ;)

                                        1. 2

                                          Personally, I’m currently kinda licking my scars after a first stroll into the Rust land.

                                          I think that’s okay! I’m aware of a lot of people bouncing off Rust, at least initially. But I’ve heard a lot of success stories where people come back to it and figure it out.

                                          What surprised me most is, I think, that it felt more complex than C++ to me; but I’m kinda coming to terms with the fact that it may just be because of how far I got in C++, and how used to it I’ve become.

                                          Possibly. I am not a C++ practitioner, so it’s hard for me to say too much. I would hazard a guess that the presence of ubiquitous UB would bring its complexity above Rust almost on its own. In most Rust code that I write, I don’t need to think about UB at all, and I probably write more unsafe than most (since a lot of my work is in core libraries). With that said, I think this is a big rabbit hole, because C also has lots of UB, and I could see an argument to be made that C is simpler than Rust in the same sense that Go is simpler than Rust. It’s a pretty hand wavy thing in general, and I think “unknown unknowns” probably plays into this quite a bit. That is, I think a lot of people who write C or C++ are probably not intimately familiar with the language lawyering necessary to think carefully about whether some piece of code is actually UB or not, so the presence of UB might not contribute to the perceived complexity of the language.

                                          C++ also has a lot of other crud in it, and its template system is, I think, at least as complex as Rust’s parametric polymorphism facilities. But alas, we could circle the drain on this one forever. :-)

                                          One more question, if I may: would you risk comparing the languages on a scale of “malleability” (ease of change), when some new features require deep refactoring of an existing codebase? Do you find Go easier, Rust more supportive (b/c invariants in types), or do those kinda balance in your mind? You can answer by PM, or plain refuse to, if you’re afraid of being quoted on this in flame wars ;)

                                          Hmm, good question. I haven’t thought about this one as carefully as I’ve thought about my encapsulation issues detailed in the previous comment. I think both languages ultimately make refactoring substantially easier than languages without a static type system, at least in my experience. I realize reasonable people might disagree on this, but it’s been reinforced for me personally many times throughout the years.

                                          In terms of Rust vs Go… I guess I honestly feel like refactoring is generally pleasant in both languages. I suppose, as you’ve guessed, I have found Go a bit harder to refactor in some cases, but this mostly ties back into my encapsulation argument in the previous comment. That is, sometimes I’ll just give up on encapsulation completely in Go for a particular type, and just expose its internals. Even if I think it’s the “wrong” thing to do all else being equal, you really just do not want to fight the language too much, so you have to pick your battles. In some cases, an exposed representation can make it awfully difficult to rigorously enforce an invariant. Even something as simple as imposing a “fail fast” change where you check an invariant on some piece of data on access is hard when everything is exposed, because you can’t control the access in the first place.

                                          Anyway, that’s all I’ve got for now. Please accept my thoughts with an appropriate grain of salt. I’m waving my hands a fair amount here. :-)

                                          1. 2

                                            Thanks. As to UB, sure, that’s what finally scared me away from C++, but I started finding out about it only once I was quite deep in, unfortunately :( And I too fully believe that far too many people still don’t grasp well enough what it means. In part it feels to me like many people don’t want to believe it; I assume partly from being subconsciously afraid of losing so many years of experience when switching tech stacks. Also, before Rust, there was no serious alternative. So, personally, I too hope Rust will eventually replace C++.

                              1. 5

                                Interview prep, got my coding phone screen next Thursday.

                                Will spend a good amount of time cranking through LeetCode problems.

                                1. 3

                                  mind sharing what company are you interviewing for?

                                  1. 3

                                    Uber, Amsterdam

                                  2. 1

                                    Luck! I don’t know where you are in your career path but something I wish I’d learned sooner: Don’t forget that you’re evaluating them as much as they’re evaluating you :)

                                  1. 1

                                    There’s no point in trying to hold more and more information in your head. You can’t.

                                    I completely agree with andyc’s points. Write stuff down.

                                    Sit down, list the things that need to get done for your projects, ruthlessly prioritise them, and track things as you go.

                                    What I like to do (both for personal projects as well as my day job) is to keep a work log. It’s just a simple text file where I list the stuff I worked on, any blockers, etc.

                                    When you have a work log, you can quickly scan over the most recent entries to recall what you were working on.

                                    1. 8

                                      Pretty neat!

                                      Took me a while to understand how v5 worked. In case anyone else has a hard time understanding:

                                        When they say that each entry pointer is offset by 3 bytes (N * 8 + 3), this is due to the dirent64_t type:

                                        struct dirent64_t {
                                          ino64_t d_ino;            // 8 bytes
                                          off64_t d_off;            // 8 bytes
                                          unsigned short d_reclen;  // 2 bytes
                                          unsigned char d_type;     // 1 byte
                                          char d_name[];            // flexible array member, not aligned
                                        };

                                        Those 3 bytes (d_reclen + d_type) and the first 5 bytes of d_name together fill out a full 64-bit word. After skipping the first 5 bytes, the entry name pointers (e.g., a + 5) are 8-byte aligned.

                                      1. 7

                                          That’s right. alignof(dirent64_t) is 8 and offsetof(dirent64_t, d_name) is 19, so d_name is aligned at the N * 8 + 3 mark. If we add 5 to such a pointer, it’ll be aligned to 8. memcmp really likes pointers aligned to 8, so we really want to give it our pointers plus 5. But this means we have to handle the first 5 characters manually, and we should do it faster than memcmp. If we just go over the first 5 bytes one by one, we might as well pass the pointer to memcmp unmodified, because it can also go one by one.

                                        The insight is that we can compare the first 5 characters faster than memcmp if we mutate the strings as a preparation step. In fact, we’ll compare not 5 but 8 first characters. Before we even start sorting, we reverse the order of the first 8 characters in every string. If the string is shorter than 8, we’ll scoop some past-the-end garbage, but that’s fine – we’ve arranged things so that past-the-end memory is also within our own buffer.

                                        Then, to compare two strings, we read the first 8 bytes as uint64_t via an unaligned read (on Skylake such reads aren’t penalized) and compare these two 64-bit integers. If one is less than the other, it means the string is less than the other (we reversed bytes specifically to achieve this property on Little Endian architecture). If the numbers are equal, we ask memcmp to compare everything after the 5th character, effectively comparing characters 6-to-8 twice.
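                                          If a sketch helps, here’s the same idea in Go rather than the original C (invented names; the tie-break is simplified to a plain full comparison instead of a memcmp starting at byte 5):

                                          package main

                                          import (
                                              "encoding/binary"
                                              "fmt"
                                              "sort"
                                          )

                                          // key reverses the first 8 bytes of s (zero-padded if shorter) and reads
                                          // them as a little-endian uint64. Comparing keys gives the same order as
                                          // comparing the first 8 bytes lexicographically.
                                          func key(s string) uint64 {
                                              var buf [8]byte
                                              copy(buf[:], s)
                                              for i, j := 0, 7; i < j; i, j = i+1, j-1 {
                                                  buf[i], buf[j] = buf[j], buf[i]
                                              }
                                              return binary.LittleEndian.Uint64(buf[:])
                                          }

                                          func main() {
                                              names := []string{"zsh", "awk", "awkward", "ls", "lsblk"}
                                              sort.Slice(names, func(i, j int) bool {
                                                  ki, kj := key(names[i]), key(names[j])
                                                  if ki != kj {
                                                      return ki < kj // one integer comparison covers 8 characters
                                                  }
                                                  return names[i] < names[j] // rare tie: fall back to a full compare
                                              })
                                              fmt.Println(names)
                                          }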

                                        Hope that helps.

                                      1. 1

                                        I’m confused about Jai. I’m sure I’ve seen streams where Jonathan Blow demoed their LLVM backend.

                                        Did that change at some point?

                                        1. 1

                                          I believe they have more than one backend (the first one was even slower as it compiled to C which was then compiled and linked again). Their fast compilation is meant for development and debugging. It produces reasonable code but then release builds can take much longer to regain the extra performance left on the table.

                                          1. 1

                                            Yeah I remember the C backend, but haven’t seen any other besides the LLVM one.

                                            Just curious if there’s more info on that

                                        1. 4

                                          At $dayjob we changed our python codebases from 80 chars to 88 chars (90 chars, but lint tools configured to avoid the gutters), and most devs have changed their default terminal window sizes to at least 90 chars. Our codebase /feels/ like it has improved – awkward arbitrary line breaks have been reduced considerably. I don’t find 90 chars to have negatively impacted horizontal legibility either. Overall, a great improvement.

                                          1. 4

                                            I agree 100% with this. I’ve also found that in the codebases I’ve worked in (Java, Javascript, C#, Rust), 80 characters is just too few columns in many cases, but it’s extremely important to still have a conservative limit (100 characters at most).

                                            1. 4

                                              You guys should totally look into Black [1] and not bother with formatting at all. Set whatever line width you prefer and let it take care of the rest.

                                                This was one of the best things that happened where I work. Every Python project has Black run on it by default.

                                              [1] https://github.com/ambv/black

                                              1. 2

                                                Yes, we do use black on all our python projects. We just have it configured for the line length I noted above.
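                                                  For reference, that’s just a couple of lines in pyproject.toml (a sketch; 88 happens to be Black’s default, so adjust to taste):

                                                  # Black reads its settings from pyproject.toml.
                                                  [tool.black]
                                                  line-length = 88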

                                            1. 2

                                              Currently going through Linkers & Loaders by John R. Levine.

                                                  Really enjoying it; the explanations are pretty straightforward.

                                              1. 3

                                                I’m working on a C compiler (in C).

                                                    The lexer is almost finished, and I have some pieces of the parser and codegen.

                                                But now I want to get the preprocessor done before I keep working on the parser. Still need to figure out how to go about it. I’ll probably implement it first as a separate tool and then integrate it into the compiler.

                                                1. 3

                                                      Funnily enough, I’m working on a C-like language in Go right now. Just finishing my parser before I move on to code gen.

                                                  1. 2

                                                    Neat!

                                                        How are you planning to go about code generation? I started out experimenting with destination-driven code generation [1] because it seems simple enough. But I might first use LLVM to make sure the front-end is working fine and then focus on the back-end.

                                                    [1] https://www.cs.indiana.edu/~dyb/pubs/ddcg.pdf

                                                  2. 3

                                                    I brainstormed a project for a better preprocessor, but never ended up doing it. These docs/projects might be useful?

                                                    https://www.spinellis.gr/blog/20060626/index.html

                                                    https://github.com/alexfru/SmallerC

                                                    http://recc.robertelder.org/

                                                    http://blog.robertelder.org/7-weird-old-things-about-the-c-preprocessor/

                                                    1. 1

                                                      Definitely useful!

                                                      This will come in handy soon, thanks for pointing out those resources. So far I’ve only been reading through Fabrice Bellard’s TCC [1] to get a feel for what I need to do.

                                                      [1] https://bellard.org/tcc/

                                                      1. 1

                                                          The TCC code is pretty gross.

                                                        1. 1

                                                            I’ll warn you that TCC might not be the easiest to learn from, given it was a submission to the Obfuscated C contest. The only one I know designed specifically for education, with an accompanying book, is LCC. It’s not OSS/FOSS but was free for personal use.

                                                          1. 1

                                                              I didn’t find it that bad, but it’s definitely not straightforward.

                                                            I skimmed through LCC’s accompanying book but didn’t look at the source code. Have you gone through LCC’s material? How do you find it?

                                                            1. 1

                                                              Oh, I didn’t study it or even use it. I have no idea.

                                                                I collected it as part of research into diverse and verified compilation of C programs. I needed a bunch of compilers. At least one needed to be easy to understand. At least one had to be formally verified. They would all compile the same code, run it with the same input, and hopefully get the same output. Once I had the list, I was done; that was what the exploration would need should I ever have to use that approach.

                                                      2. 1

                                                        Btw: I collected a bunch of tests here you might be able to use:

                                                        https://github.com/c-testsuite/c-testsuite

                                                      1. 3

                                                        Very cool.

                                                        In practical terms, I wonder when one would want to use Haskell in C versus using e.g., Inline-C in Haskell.

                                                        I have only tried this once, when trying to write a special purpose static site generator; I had to build a couple million pages as quickly as possible, and Inline-C gave me a significant boost.

                                                        1. 5

                                                           Probably, if you want a little bit of C in a primarily Haskell project, then inline-c would be the most convenient way to go, whereas boosting a C program with some Haskell parsing or whatever would probably be easiest with this makefile+ghc approach. It also appears that inline-c depends on Template Haskell, so it might be a little slower to build.

                                                          1. 4

                                                            Take myself as an example:

                                                            At my job, we have a data processing pipeline that I developed in Haskell last year. Unfortunately, the “data stream provider” that I used to feed the pipeline is no longer supported inside the company. The currently supported way of doing this sort of stuff is to use an in-house C++ framework.

                                                            The problem is that rewriting all of that Haskell in C++ is error prone and doesn’t really move the project forward.

                                                             I think what this post offers will be useful to me. I’ll try to use the new C++ framework just to handle the IO part and keep all the business logic in the current Haskell code.

                                                            1. 3

                                                              I think the use cases are similar to how we do it in CHICKEN. If you want to use Haskell as a scripting/extension library for a mostly C-based project, I’d use the approach from this link. If you just want to speed up something or do an off-the-cuff C API call, inline C would be the way to go.

                                                              So basically, it depends if you’re thinking Haskell-first or C-first.

                                                            1. 2

                                                              That looks pretty good! There’s also a course on software rasterization by the same author: https://github.com/ssloy/tinyrenderer

                                                              1. 2

                                                                 I guess half of the #includes in my header files would be unnecessary if forward declarations of STL types were universally supported by the C++ standard. I have no idea how hard that would be to implement, though.

                                                                1. 1

                                                                   I don’t think this is a “hard to implement” thing. My guess is that it is still done for backwards-compatibility reasons.

                                                                1. 4

                                                                  I’m working on Synacor’s challenge.

                                                                  The idea there is that they give you the spec for a fictitious machine and a binary to be executed. There are hidden secrets in the binary that you gotta uncover to progress through the challenge.

                                                                  So far I’ve only managed to get the first secret and now I’m stuck with the program just printing some output with missing pieces and asking for the user’s input. Not sure how to proceed now.

                                                                  What I’ll try to do is write a disassembler so I can read the program properly and figure it out from there.

                                                                   What I might also do is write a visual emulator/debugger so I can step through the instructions manually and see what the VM’s memory state is. It may be totally overkill, but it sounds fun!

                                                                  1. 4

                                                                    Things I don’t know:

                                                                    • Compilers: I understand how individual parts of a compiler work but I have no idea how to put them together effectively. I’ve done my share of toy interpreters but never completed a compiler for a legit (i.e., not a toy) language.
                                                                    • Virtual Machines: More in the sense of programming languages (tied to compilers ^). I get the general idea of how they work but I’ve got no clue on how to implement one.
                                                                    • Cache friendly programming: I have a general intuition but I don’t have a method for this. How do I figure out if my code is cache friendly or not? What are good strategies to keep code cache-friendly?
                                                                    • Graphics programming: I sort of have some idea of how it works, but I’m mostly clueless. I also don’t have a good understanding of computer graphics concepts. I’ve been planning to get started with this for years but never actually did.
                                                                    • Network protocols: My understanding is very shallow. I get the general idea of say TCP, but I don’t understand it enough to reason about traffic behaviour under some situations (e.g., congestion). I’m also clueless about routing protocols: BGP, MPLS, IS-IS, etc.
                                                                    • Assembly: I’ve done some programming in ARM assembly many years ago, but I can barely remember anything at this point. I think knowing some x64 asm could be handy, but I have no idea how to even read it. All of those different addressing modes just put me off.
                                                                    • Some language other than Haskell/Python: I’ve been coding in Haskell and Python for a few years now and that’s all I really know at this point. I’d really like to learn some other more mainstream language that I could use for lower level stuff. I can get around Go and C++, but that’s about it. Can’t call myself proficient in any of them.