1. 21

It’s Monday again, so let’s share the things we’ve been working on. Feel free to ask for help - many members of the community are experts and more than willing to share advice or expertise.

This week I am trying to ship support for retries in urllib3. I wrote a blog draft about the new feature - it offers Requests-like levels of code savings and simplicity for making resilient HTTP requests.

https://gist.github.com/kevinburke/31d006a5f205392ff57a

Andrey likes it, but since this code affects so many people, we’d like to get it looked at by at least one more reviewer. I am happy to trade code reviews with you, if you’d like to review this branch.
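
For anyone curious what the feature looks like in spirit: the core is retrying failed requests with exponential backoff. Here's a rough stdlib-only sketch of that idea - this is not urllib3's actual API (the real thing tracks connect vs. read errors, Retry-After headers, per-method rules, and much more), just the general shape:

```python
import time

def request_with_retries(do_request, total=3, backoff_factor=0.5,
                         retry_statuses=(500, 502, 503, 504), sleep=time.sleep):
    """Retry a request up to `total` times, backing off exponentially.

    `do_request` is any zero-argument callable returning an object with a
    `.status` attribute. Purely illustrative -- see the gist above for the
    real design.
    """
    for attempt in range(total + 1):
        response = do_request()
        if response.status not in retry_statuses:
            return response
        if attempt < total:
            # 0.5s, 1s, 2s, ... between consecutive retries
            sleep(backoff_factor * (2 ** attempt))
    return response
```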

I’m also getting ready for a 3 month trip, and trying to set up archiving for a bunch of documents I have saved over the years.

  1.  

  2. 6

    This is long because this is an ambitious week …

    This week, I’m going to take a bit of a break from the Objective C / Dylan bridge that I’ve been working on. It is at the stage where I need to generate bindings for the various frameworks on OS X. (Help in this area is welcome. It would involve writing some Python code on OS X and is pretty straightforward.)

    Instead, I’ll be working on generating Qt bindings for Dylan by continuing my changes to the stand-alone qt-generator that I’ve been extracting from Qt Jambi. This will be useful for any language that wants Qt bindings.

    I’ll also be working a bit more on modifying the Dylan compiler to know how to emit dtrace probes on Mac OS X.

    I’ve also been playing with the Atom editor some. Last week, I wrote a Dylan mode for it. I also wrote about the Deuce editor architecture, hoping to convince the Atom team to implement some of the features that we have in Deuce. My favorite Deuce feature is the ability to search for something across files, with the results put into a single buffer that pulls from each of the files. So you can search for “Callers of X”, easily edit all callers, and save the search results buffer, resulting in the changes getting saved to all of the underlying files. This allows for an editing experience similar to Smalltalk environments, but still using files underneath.

    We’re also in the process of starting a new build tool for Dylan that will be plugin-based, more like lein from Clojure. This will also improve our ability to do editor integration, as it simplifies the procedure for performing a build, running tests, etc. It will also be useful with the Objective C / Dylan bridge, as it will allow us to provide a plugin for creating Mac OS X bundles and so on.

    Finally, I’m working on submitting some fixes upstream to emscripten that have resulted from my contract work.

    As always, if anyone out there is interested in trying out Dylan and helping with any of the above, I’m more than happy to help.

    1. 5

      My book chapter on a tiny search engine has a first draft done, which benefited greatly from the first technical reviewer, but it still needs a lot of work. I’m hoping to get incremental indexing, pluggable index storage backends, parallelism, and automatic index discovery in, but they may not all make it in; what’s there needs some cleanup, too.
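
      (For anyone who hasn’t seen one: stripped of every feature above, the kernel of a tiny search engine is just an inverted index. A toy sketch, not the chapter’s code:)

```python
import re
from collections import defaultdict

class TinyIndex:
    """A toy inverted index: term -> set of document ids.

    Purely illustrative; a real engine adds persistence, incremental
    indexing, ranking, and so on.
    """
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND query: documents containing every query term
        sets = [self.postings.get(t, set())
                for t in re.findall(r"\w+", query.lower())]
        return set.intersection(*sets) if sets else set()
```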

      I’m also trying out more time-series database stuff and implementing a succinct query interface for our time-series data for $work.
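
      (Nothing to do with the actual $work system, but for flavor, the basic shape of a time-range query over an in-memory series might look like this:)

```python
import bisect

class TinySeries:
    """Toy in-memory time series; samples must be appended in timestamp order."""
    def __init__(self):
        self.times = []
        self.values = []

    def append(self, t, value):
        assert not self.times or t >= self.times[-1], "out-of-order sample"
        self.times.append(t)
        self.values.append(value)

    def range(self, t0, t1):
        """All (t, value) pairs with t0 <= t < t1, located by binary search."""
        lo = bisect.bisect_left(self.times, t0)
        hi = bisect.bisect_left(self.times, t1)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))
```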

      1. 1

        I’m also trying out more time-series database stuff and implementing a succinct query interface for our time-series data for $work.

        Since it’s for work, I’m not sure if you’ll be able to report on it/release it, but consider this a declaration of interest. I’ve been thinking of adding some sort of temporal data to my IMDb database (e.g., votes/rankings over time for movies), and I’ve been struggling a bit to think of a good way to interface with that data. (I don’t really have any experience dealing with temporal data.)

        Your search engine work is phenomenal too. :-)

        1. 1

          I’m reasonably happy with it but I think “phenomenal” is a vast overstatement! But I’m happy you’ve enjoyed it.

          I still don’t feel like I have a good handle on time-series databases.

      2. 4

        I built a lightweight promise library for javascript inspired by jQuery.Deferred. Yeah, there are a few of those already but I enjoyed making it.
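
        (The core of a Deferred is surprisingly small. A toy sketch of the idea in Python rather than JavaScript - callbacks queue until resolution, and late callbacks fire immediately:)

```python
class Deferred:
    """Toy deferred: callbacks registered before or after resolution both fire."""
    _PENDING = object()

    def __init__(self):
        self._value = self._PENDING
        self._callbacks = []

    def done(self, callback):
        if self._value is self._PENDING:
            self._callbacks.append(callback)  # queue until resolved
        else:
            callback(self._value)  # already resolved: fire immediately
        return self

    def resolve(self, value):
        if self._value is not self._PENDING:
            raise RuntimeError("already resolved")
        self._value = value
        for cb in self._callbacks:
            cb(value)
        self._callbacks = []
```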

        1. 3

          The ongoing, damn, eventual consistency and distributed systems is hard Storm topology and now that the tuning and metrics chapter of Storm Applied is done, on to the basic troubleshooting chapter.

          I guess it’s just Storm, storm, storm.

          1. 2

            The ongoing, damn, eventual consistency and distributed systems is hard Storm topology and now that the tuning and metrics chapter of Storm Applied is done, on to the basic troubleshooting chapter.

            If we wait long enough, will that sentence eventually become consistent? :P

            Looking forward to learning Storm, someday. Can you compare Storm and Spark? Or is that the wrong question?

            1. 3

              Probably won’t ever become consistent.

              You can compare Storm and Spark.

              Spark looks very interesting. I intentionally dodged comparing Spark to Storm in the current iteration of Chapter 1 of Storm Applied because, while I know Storm well, I only know Spark from the outside. I’m hoping that we get a chance to investigate Spark in 2015. Right now, we can make Storm do everything we want.

              If we had a Hadoop cluster or were using HDFS standalone, I’d be more interested in Spark now. Without having gained experience with those (or Mesos and YARN), adding Spark would require us to learn a lot of new bits of infrastructure at once.

              For people who are interested in using Storm’s Trident abstraction and have operational experience with Mesos/YARN/Hadoop, I think Spark would be worth serious consideration.

          2. 3

            I gruelingly planned out a strategy for moving my “side” project, which is starting to make money, to a more HA-like infrastructure. The goal is to keep the one server and DB running for the entire upgrade. This week I am going to slowly stage the plan out and see what I run into. This is particularly tricky for me because my development chops are quite a bit stronger than my systems-engineering chops.

            1. 3

              I am working on making the linking of TokuMX slightly more sane.

              The MongoDB code has some “magical” base classes whose constructors register themselves in a global map of instances. This is done so that if you want to make a subclass that does something interesting, you just declare it somewhere, override a virtual function or two, make sure the base constructor is invoked with a unique string name, and then declare a global instance of the derived type. So something like

              class FooRunnable : public Runnable {
                public:
                  FooRunnable() : Runnable("foo") {}  // unique name, registered by the base constructor
                protected:
                  void run() override { std::cout << "magic!" << std::endl; }
              } fooRunnableInstance;
              

              Since it’s a global instance, its constructor (which registers it) gets run before main(), as long as the compilation unit is included in the executable.
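
              (The same trick translates to most languages. A hypothetical Python rendering of the pattern, where importing the module does the registering instead of the linker:)

```python
class Runnable:
    """Base class whose subclass instances self-register under a unique name."""
    registry = {}

    def __init__(self, name):
        if name in Runnable.registry:
            raise ValueError("duplicate runnable name: %r" % name)
        Runnable.registry[name] = self

    def run(self):
        raise NotImplementedError

class FooRunnable(Runnable):
    def __init__(self):
        Runnable.__init__(self, "foo")

    def run(self):
        return "magic!"

# Module-level instance, analogous to the C++ global:
# merely importing the module registers it.
foo_runnable_instance = FooRunnable()
```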

              The linker is insane, which makes ensuring that all such compilation units necessarily make it into the right executables a monstrosity of a task (because of how static library linking actually works). MongoDB does it with some kind of terrifying SCons magic. I’m trying to do it with some less insane CMake magic, but I think until OBJECT libraries can have dependencies I’m going to have to do it in a pretty hacky way.

              1. 1

                If anyone cares, I have indeed solved this in a pretty hacky way.

                Instead of using OBJECT libraries, everything is still a STATIC library in CMake, so we get the benefit of CMake’s recursive dependency tracking. I wrote some CMake script to walk the dependency tree, collect an exhaustive de-duplicated list of the libraries needed, and then merge all of those libraries into a single archive using a technique adapted from MySQL. I then linked with just that massive library, using -Wl,--whole-archive, and it seems to work.
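
                (The dependency walk is the easy half. A sketch of the collect-and-dedup step over a made-up dependency graph - the real thing operates on CMake targets, of course:)

```python
def transitive_libs(target, deps):
    """Post-order walk of `deps` (name -> list of direct deps), returning
    each library exactly once, dependencies before dependents -- the order
    you'd feed an archive-merging step."""
    seen = set()
    order = []

    def visit(lib):
        if lib in seen:
            return
        seen.add(lib)
        for d in deps.get(lib, []):
            visit(d)
        order.append(lib)

    visit(target)
    return order
```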

                One nasty thing I found: while it’s safe to have multiple object files with the same name in a static library (ar doesn’t represent directories, so ar rcs lib.a dir1/foo.o dir2/foo.o results in an archive containing two different foo.o members), extracting such an archive is very nasty: you need to list the contents and then use the -N argument to ar to get the different instances of foo.o out.

              2. 3

                I’m coauthoring a book titled Mastering The Internet of Things. This week I’m working on the chapter about protocols and network architectures. It features a comparison between CoAP and MQTT/MQTT-SN. It’s tedious but interesting stuff. The book should be published in June.

                1. 2

                  Last week I implemented support for indirect addressing in EZ8. I also added comments to the RTL source so that it’s easier to understand. This week I’ll be working on a more detailed blog post about the project, with block diagrams explaining the microarchitecture. I’ll probably also look into writing an LLVM backend for it so that you can program the processor in C instead of assembly.

                  1. 1

                    Cool! One of my long-time pet projects is a simple top-to-bottom hardware/software stack, like NAND-to-Tetris; I want to remove the air of mystery around the lower layers of computation. EZ8 looks like you could use it as the bottom level of such a thing.

                    1. 2

                      I’ve been working on that, as well. First step has been working to implement each chapter of NAND-to-Tetris in hardware (but I’m only on chapter 2 right now, courtesy of the real world intruding). I have physical test boards for everything in chapter one, though. I built the first run based on using transistors, but the next run will use all 7400s in the interest of space.

                      1. 1

                        Oh my god, and I thought what I was doing was baroquely hardcore. I couldn’t even fathom implementing EZ8 in SSI chips.

                        1. 1

                          My estimate was that Calculus Vaporis should be a bit under 1000 NAND gates, or maybe 600 if you build it bit-serial, so it’s pretty easy for me to fathom implementing it in SSI chips; but you also need RAM! Like probably around 32768 bits of RAM at least. And this is why many computers of the 1940s and 1950s used weird things like magnetic drum memory or acoustic delay line memory. I have no idea how I would go about making acoustic delay line memory.

                          1. 3

                            Building one bit of memory takes four transistors with RTL or one half of a 7400 (based on an /SR latch); the board I laid out for a byte was about two inches long (with headers) and a half inch wide. It takes up space fast. I quickly caught on to why early systems used “weird” alternatives; it’s one thing I’m looking into but we’ll see how that goes.
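
                            (For anyone who hasn’t breadboarded one: the /SR latch in half a 7400 is just two cross-coupled NAND gates. A little simulation, iterated until the feedback loop settles:)

```python
def nand(a, b):
    return 0 if (a and b) else 1

def sr_latch(s_n, r_n, q=0, q_n=1):
    """Active-low /SR latch from two cross-coupled NAND gates.

    s_n=0 sets Q, r_n=0 resets Q, both high holds the previous state.
    Starts from the given (q, q_n) and iterates the feedback loop
    until it reaches a fixed point.
    """
    for _ in range(4):  # a few rounds are enough for this tiny circuit
        q, q_n = nand(s_n, q_n), nand(r_n, q)
    return q, q_n
```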

                            1. 2

                              If you’re talking about cross-coupled inverter latches, SRAMs today generally use 6 transistors per bit. Two each for the inverters, and then transmission gates on each end. If you don’t have the latter, you won’t be able to individually address a word.

                              You can of course create DRAM using just two transistors (one used as a cap, the other as a transmission gate), but then you have a lot more complexity in the controller, so you probably won’t save much overall unless you have a very large memory.

                      2. 2

                        Well, this isn’t really the bottom-most level of the stack, since it’s only at the RTL level. To get really top-to-bottom, you’d have to design the layout and then design your own transistor process. Anybody have a license for Cadence Virtuoso and a spare fab?

                      3. 1

                        After looking more closely at the LLVM documentation for writing a backend, I’ve decided it’s probably a bit too complicated for me to tackle right now. I think I’ll instead go the route of implementing a lisp compiler for the EZ8.

                        1. 3

                          Maybe see about porting a small Forth to it instead. There are some notes in tokthr recommending the eForth Model as one of the simplest Forths to port: about 30 machine-code primitives (which are documented) and Bob’s your uncle.
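
                          (Part of the appeal is how little machinery a Forth needs. A toy sketch of the data-stack-plus-primitives core in Python - nothing eForth-specific, and no return stack or colon definitions:)

```python
def forth(source):
    """Toy Forth evaluator: a data stack and a handful of primitives.
    Anything that isn't a known word is treated as a number literal.
    Returns the final data stack."""
    stack = []
    prims = {
        "+":    lambda: stack.append(stack.pop() + stack.pop()),
        "*":    lambda: stack.append(stack.pop() * stack.pop()),
        "dup":  lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        "drop": lambda: stack.pop(),
    }
    for token in source.split():
        if token in prims:
            prims[token]()
        else:
            stack.append(int(token))
    return stack
```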

                          1. 2

                            Oh yeah, I forgot about Forth. That might actually be a better idea, since EZ8 has a limited hardware call stack, so I can’t reasonably support any form of recursion except tail recursion.

                            1. 1

                              Hmm, does that mean you can’t do indirect jumps? I guess I should look at the instruction set.

                      4. 2

                        I’m going to design a presentation for the Haskell DC Meetup, probably talking about the abstract definition of datatypes and how that leads to both natural folds and things like free monads. I’m picking this topic because it (a) mixes theory and practice which was explicitly something the meetup group seemed interested in (b) introduces free monads which are a compelling way to understand monads generally, and (c) gives a lead-in to talk about co-data and encodings of OO in Haskell—another topic that’s usually quite interesting.
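
                        (In Python rather than Haskell, but the core point - each constructor of a datatype becomes one argument of its natural fold - can be made anywhere:)

```python
# A binary-tree datatype with two constructors, Leaf and Node ...
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right

# ... so its natural fold (catamorphism) takes exactly two functions,
# one per constructor. Every structural traversal is an instance of it.
def fold_tree(tree, on_leaf, on_node):
    if isinstance(tree, Leaf):
        return on_leaf(tree.value)
    return on_node(fold_tree(tree.left, on_leaf, on_node),
                   fold_tree(tree.right, on_leaf, on_node))
```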

                        1. 2

                          Did more user testing on my project Fire★ (http://www.firestr.com) and am going to make UI changes to clear up some points of user confusion. For example, in the alert tab I have a button to start a new session when someone comes online; some people thought they were “accepting” the connection. It also wasn’t clear which users are in which sessions, which users are online, and how to add them to a session. So I will be rejiggering the UI.

                          1. 2

                            I’m trying to strong-arm SQLAlchemy into reflecting all available information from an existing database. It’s usably close; I just need to make sure our geo types are preserved correctly, and then I’ll be able to start making progress on a new project. PostGIS is basically magic.
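
                            (For flavor, the same kind of reflection is available at the stdlib level for SQLite - though this of course knows nothing about geo types:)

```python
import sqlite3

def reflect_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for a
    SQLite connection, read from the catalog."""
    tables = {}
    names = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (name,) in names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute("PRAGMA table_info(%s)" % name).fetchall()
        tables[name] = [(c[1], c[2]) for c in cols]
    return tables
```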

                            1. 1

                              I hope to find some time to work on my Doxygen → MediaWiki project. (Screenshot: http://i.imgur.com/W2AaDXy.png ) There are a lot of decisions to be made regarding multi-projects, custom pages, edit rights, and so on, but I’ll probably just spend some time trying to figure out how to make nicer graphviz outputs. They really don’t fit into the look and feel of this project and have been bothering me for some time.

                              I’m also thinking a lot about synthetic image generation for testing image processing algorithms, and hope to spend the weekend with a raytracer and the LabVIEW 2013 3D-reconstruction algorithm (http://zone.ni.com/reference/en-XX/help/370281U-01/imaqvision/stereo_pal/), figuring out its strengths and limitations.

                              1. 1

                                Right now I’m writing a re-implementation of the Arrowsmith System using Grails and Groovy, as a learning project to experiment with Literature Based Discovery. Depending on how this goes, we may consider trying to introduce some LBD elements into our product suite here at Fogbeam.

                                1. 1

                                  And in a late entry, here’s jfred —

                                  This week I’ve worked on a couple really interesting things.

                                  On the paid-for side, I’ve been researching how to implement strong chains-of-trust for code. Our product at work has the potential to be audited at any point by various Alphabet Soup agencies (mostly the FDA), so we have a very interesting constraint to try to fit into a rapid-pace, agile workflow. At any point, we have to be able to demonstrate that the code running in production is the same as the code we claim runs in production. In particular, this means ensuring that access to production servers is strictly controlled, that builds are archived and never changed, and generally that we are exceptionally anal-retentive when it comes to futzing about with what we deploy.

                                  On the not-paid-for side, I’ve been working on a few different little projects. Setting up a packer definition for an Arch Vagrant machine (the ones on Vagrantbox.es are almost 2 years old), working a bit with Python (because NLTK looks pretty neat), and playing around with BitTorrent Sync as a replacement for Dropbox. I like Dropbox a lot, but I want to sync some more sensitive materials (GPG keys, in particular), and don’t want to have private keys living on a remote machine. BTSync, coupled with some automatic encryption of the private signing keys I have set up for various things, makes it easy to move keys around safely.