Threads for sirpengi

  1. 7

    This week I’m working on the sharing side of our end-to-end encrypted file storage and sharing network, Peergos.

    I ported our erasure codes from Java to Javascript, but for a 5 MiB file the encode took 60s and the decode 360s, whereas the Java version took 3s in either direction. After removing some unnecessary copying and object creation I got the encode down to 3s and the decode down to 240s, but I’m struggling to find the missing 10X in the decode. It seems like both Chrome and Firefox are constantly trying to JIT the hot function, which finds the syndromes of the Galois polynomial, and never settle on a jitted version. No idea why. If anyone wants to try their hand, I’ve set up a minimal JSFiddle (WARNING: the fiddle takes about 4 minutes on a good machine and it runs on the event thread, so the browser might think it’s frozen; it displays the time the two phases took upon completion).
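
    For anyone who wants to poke at the hot loop without loading the fiddle, here’s a minimal, unoptimised sketch of what table-based syndrome computation over GF(256) looks like (using the common 0x11d reduction polynomial; the names and structure are illustrative, not Peergos’s actual code):

```javascript
// Build exp/log tables for GF(2^8) with reduction polynomial 0x11d.
var EXP = new Uint8Array(512);
var LOG = new Uint8Array(256);
(function () {
    var x = 1;
    for (var i = 0; i < 255; i++) {
        EXP[i] = x;
        LOG[x] = i;
        x <<= 1;
        if (x & 0x100) x ^= 0x11d;
    }
    // Duplicate the table so gfMul can skip a modulo.
    for (var j = 255; j < 512; j++) EXP[j] = EXP[j - 255];
})();

function gfMul(a, b) {
    if (a === 0 || b === 0) return 0;
    return EXP[LOG[a] + LOG[b]];
}

// Horner evaluation of a polynomial (highest-degree coefficient first).
// Addition in GF(2^8) is XOR.
function polyEval(poly, x) {
    var y = poly[0];
    for (var i = 1; i < poly.length; i++) {
        y = gfMul(y, x) ^ poly[i];
    }
    return y;
}

// The i-th syndrome is the received codeword evaluated at alpha^i;
// all syndromes are zero iff the codeword is uncorrupted.
function syndromes(codeword, nsym) {
    var out = new Uint8Array(nsym);
    for (var i = 0; i < nsym; i++) {
        out[i] = polyEval(codeword, EXP[i]);
    }
    return out;
}
```

    Keeping this loop monomorphic (plain typed arrays, integer-only values, no per-call allocation) is usually the first thing to check when a JIT keeps bailing out and recompiling.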

    Also last week, I got public links to files working (essentially, the key is encoded in the URL). There’s a demo which shows the current state.

    1. 2

      Your JSFiddle seems a bit buggy. The GF.exp function tries to index into the nonexistent this.exp[y] (I believe this should be this.expa[y]), and after fixing that there’s an exception in GaloisPolynomial.findErrors.

      1. 1

        Thank you, sirpengi. I fixed that and the following errors, but then the syndromes are totally wrong. Looks like some of my “optimisations” have introduced errors. Back to the original JS for me. I’m too used to type safety.

        1. 1

          For what it’s worth, here’s a correct but not “optimised” version that takes ~400s: JsFiddle

          1. 2

            Sadly I don’t think this version is correct either. I noticed again that all calls to GaloisPolynomial.eval got an undefined x value (turns out, Galois.exp was defined as a Uint8Array but a few lines later overwritten with a function). In any case, I started to poke around again (https://jsfiddle.net/41tuqcsL/1/) by adding a “use strict” directive and there are other issues with the code.

            You should probably continue working with the “use strict” directive, as not having that on has been silencing some errors, and having it on enables some optimizations that aren’t otherwise possible.
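
            To see the kind of error that sloppy mode silences, here’s a tiny, contrived example (not code from the fiddle): a typo’d assignment that sloppy mode turns into an accidental global, and that strict mode turns into a thrown ReferenceError.

```javascript
function typoedAssignment() {
    'use strict';
    // Meant "var count = 1;". In sloppy mode this typo would silently
    // create a global named "countr"; under strict mode it throws.
    countr = 1;
}

function strictModeCatchesTypo() {
    try {
        typoedAssignment();
        return false;
    } catch (e) {
        return e instanceof ReferenceError;
    }
}
```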

            I’d also recommend trying a type-checked flavor of JavaScript (say, http://www.typescriptlang.org/ or http://flowtype.org/).

            1. 1

              Thank you very much, Sirpengi! I’ve fixed the correctness bugs, and the resulting decode ran in 3s, although the encode is now 40s. Once that’s optimised, we’ll probably spin this out into a separate library in case anyone else needs javascript erasure codes optimised for large byte arrays.

    1. 1

      Just wrapped up at my job on Monday and I’ll be starting at a new company next week. The new position is more heavily front-end, so I’ll be spending my time off doing discovery into fancy frontend things I haven’t had a chance to touch. To that end, I’m working on a JS framework that uses idempotent views (virtual-dom), immutable data (mori) and channels (jschannels). I’m taking this framework seriously in that I’m thinking hard about what features are necessary, but not so seriously as to think anybody will put it into production. It’s been interesting seeing the benefits/limitations of using channels to manage UI state.
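
      For the curious, the core channel primitive is small enough to sketch. This is an illustrative toy (not jschannels’ actual API): an unbounded channel whose put hands values to waiting takers, and whose take returns a promise.

```javascript
// Minimal channel sketch: puts queue up until a taker arrives,
// takes queue up until a value arrives.
function makeChannel() {
    var values = [];  // buffered puts
    var takers = [];  // pending take() resolvers
    return {
        put: function (v) {
            if (takers.length > 0) takers.shift()(v);
            else values.push(v);
        },
        take: function () {
            if (values.length > 0) return Promise.resolve(values.shift());
            return new Promise(function (resolve) { takers.push(resolve); });
        }
    };
}
```

      UI state management then becomes a loop: take an event off the channel, compute the next immutable state, re-render the idempotent view, repeat.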

      1. 5

        The problem with less is that you can’t pipe it to anything. Sure, it’s got the ability to tail a file and also to filter that file (common things you need when you’re tailing a file), but beyond that you’d better hope any utility you need is implemented in less.

        1. 1

          Exactly. Almost always, I pipe tail -f to grep to monitor errors, exceptions, or expected request hits.

        1. 3

          Been thinking about finite state machines lately, and wanting to start using them in code involving protocol handling and UI states. To that effect, I’ve started working on a library that implements a FSM over backbone’s model. I’ve got some working code, but I’m not entirely happy with the api I’ve exposed, so comments or suggestions are welcomed: https://gist.github.com/sirpengi/e862b9e9c0af70fe42c2
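
          The backbone wiring aside, the core of such a FSM is just a transition table plus a guarded transition method. A framework-free sketch (illustrative API, not the gist’s actual one):

```javascript
// Minimal finite state machine: a current state, a table of legal
// transitions, and a transition() that rejects anything not in the table.
function makeFSM(initial, transitions) {
    var state = initial;
    return {
        state: function () { return state; },
        can: function (to) {
            return (transitions[state] || []).indexOf(to) !== -1;
        },
        transition: function (to) {
            if (!this.can(to)) {
                throw new Error('illegal transition: ' + state + ' -> ' + to);
            }
            state = to;
            return state;
        }
    };
}

// Example: a connection-ish protocol.
var conn = makeFSM('closed', {
    closed: ['connecting'],
    connecting: ['open', 'closed'],
    open: ['closed']
});
```

          The table-driven shape makes the protocol states explicit, and illegal UI or protocol moves fail loudly instead of leaving the model in a half-updated state.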

          1. 1

            Continuing from last week, added a few more things to http://graphcake.com/ (my pastebin service for throwaway time series graphs). Added ability to export chart data into various formats, as well as embed charts on other services.

            1. 4

              Outside of day to day work, I’m tinkering on a paste-bin style service for time series graphs. It’s up and running on http://graphcake.com/ and I’m slowly adding new features as I find time.

              1. 1

                Hmm, cool. Any way to dump a bunch of data in as csv/etc? I know a bunch of people that track weight/lifting/etc as xls/csv. It’d be extra nice of you to allow exporting data as csv as well.

                1. 1

                  I added bulk import and exporting data to the todo list. Data import kind of falls into the realm of the “time travel” feature (since it involves adding data points from times other than right now). I’ll also need to figure out a good UI solution for the two input features.

              1. 2

                What are you trying to create another lobster clone for?

                1. 11

                  I’d assume they want to use it for something other than tech news.

                  1. 4

                    I, for example, launched a Lobsters clone for Russian developers: https://develop.re/

                    1. 2

                      that’s a great domain name

                    2. 3

                      We have a small dev community in Hawaii, and we deployed our own lobster clone because we wanted a place to share and discuss links/events. We also wanted something we could customize a bit (so that ruled out reddit, also reddit is generally too public for the type of discussion we wanted), so rather than NIH another link-sharing site we started with the lobsters codebase.

                    1. 4

                      Why a new implementation instead of PyPy:

                      • method-at-a-time JIT (like V8) instead of tracing JIT
                      • conservative garbage collector
                      • LLVM backend instead of RPython
                      • no GIL

                      FYI, GvR is currently working at Dropbox. GitHub repo.

                      I wish it targeted v3.x and not v2.7 though.

                      1. 1

                        For me the biggest draw is their intention of making C extension modules just work. PyPy has slowly started getting C extensions working, but even when it works it’s slower than CPython (due to the compatibility layer involved). Having a drop-in JITing replacement that still works with the likes of numpy/scipy is very exciting to me.

                        I know people like to rag on the GIL and python, but I honestly don’t think it’s that big of a deal, and they don’t really have any concrete plan on how to get rid of the GIL (the README at least outlines a plan of approach for C extensions, it says nothing about how they plan on getting rid of the GIL).

                        1. 1

                          Obviously it’s liable to change, but listing GIL removal as a primary goal is a good sign. There are plenty of CPU-bound Python programs, both at work and in my personal projects, that can’t be parallelised with multiprocessing, and Gearman or Celery is overkill for them.

                      1. 2

                        Aren’t Promises just implemented using javascript callbacks? While I’m no big fan of Node (I’d much rather use Tornado), you can still build your application with Promises everywhere and still interact with APIs that require callbacks. All the callback needs to do is fulfill a promise. It’s also not hard to make a thin, Promise-speaking wrapper that handles that for you.
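
                        The wrapper really is only a few lines. A sketch of promisifying a node-style, error-first callback API (the helper name is made up):

```javascript
// Wrap a function whose last argument is an error-first callback
// so that it returns a Promise instead.
function promisify(fn) {
    return function () {
        var args = Array.prototype.slice.call(arguments);
        return new Promise(function (resolve, reject) {
            args.push(function (err, result) {
                if (err) reject(err);
                else resolve(result);
            });
            fn.apply(null, args);
        });
    };
}
```

                        Any callback-taking API (fs.readFile, a database driver, and so on) can then be lifted into promise-land once, at the edge, and the rest of the application never sees a raw callback.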

                        1. 2

                          In what seems like a neverending saga, more Jenkins configuration, more Ansible, more Devops, and a bit more social engineering in the IT department.

                          I’ve blown past the honeymoon phase with Ansible; I’m starting to feel like the pieces it lacks compared to Chef are bigger than I first assessed. Chef’s ability to let you create what Ansible would term a ‘module’ rapidly and easily is pretty killer. That said, I haven’t dug into the details of building Ansible modules, so my judgement will remain reserved.

                          On the math front, I’m still feeling less than confident; it’s stunning how much information leaks out of your brain over the course of only a few years. That said, I’m scheduling my GRE2’s this week whether I feel prepped or not, and may the FSM have mercy on me when the time comes. I think that, even if I get into some schools, I won’t be able to start for a while (finances are, I am increasingly convinced, the most difficult branch of mathematics), but it’ll be good to know whether folks’ll want me.

                          1. 2

                            I’ve blown past the honeymoon phase with ansible, I’m starting to feel like the pieces it lacks compared to Chef are bigger than I first assessed.

                            This is also the conclusion I’ve been coming to recently, except my previous experience is with Puppet, not Chef. Roles were a great step toward reusability, but I find myself wanting to call parts of a role during a playbook rather than the whole thing.

                            Take the example of Apache/nginx: you want the core configs and packages in place, and then you want per-site configurations copied into place. Puppet handled this by having ‘methods’ on a module, which you call when you want to install a site, so it can be done from multiple locations. Ansible, however, requires you to pass in a list to do things multiple times, which means knowing all your sites at the time you make the call - and this leaves a nasty taste in my mouth as it sometimes ruins separation of concerns. You can include part of a role by simply including the task file, and you can pass parameters to it too - but then you lose all the advantages of using a role, such as auto-loading variables & handlers from the correct files.

                            If what I’ve explained can be done in a nicer way, then pray tell…

                            1. 3

                              Exactly. With Chef, you define little re-usable resources, which I suspect have some parity with Puppet’s “methods”. In particular, for an nginx recipe, I might have the default recipe install a blank nginx, then have it ship with a few different LWRPs (“Lightweight Resource Providers”) which give you new, custom resources for creating ‘sites’, so it might look like:

                              site 'my-cool-site' do
                                enabled true
                                other_properties blah: 'foo', port: '8080'
                                # etc
                              end
                              

                              In particular, it makes each recipe not only function as a way to install some software, but also give you tools/abstractions for configuring it in other recipes. With Ansible, the closest thing seems to be to write custom modules in some python-based framework. This in-and-of-itself isn’t terrible, but I gather that it essentially separates what I’d consider a single responsibility. That is to say, Chef and Puppet treat a recipe/module as a sort of object with an external API (via LWRPs or puppet-methods, resp.) and an initialization (chef’s default recipe, puppet’s equivalent), whereas Ansible seems to treat these ideas as wholly separate. Initializing this ‘object’ is the job of the role; interacting with it is the job of a separate ‘custom module.’ This idea is emphasized by the fact that the role is written as a collection of yaml, and the module is written as python.

                              This becomes exceptionally frustrating when you want to add some abstraction to otherwise basic shell commands. In Chef, for instance, you can wrap the jenkins-cli tool really quite easily, giving a pleasant API for managing most of a jenkins install without having to remember the exact invocations of the tool. With Ansible, I’m stuck having to either write the shell, or write some generic python which ultimately executes shell.

                              What would be ideal, instead, is to have some way of using ansible to define ansible modules. Most of these tools are essentially little declarative, mostly-functional languages anyway, so it’d be brilliant to be able to treat the low-level ansible tools as combinators like you can in Chef.

                              I will say this: I really like the ansible and ansible-playbook tools; I found them quite simple to use compared to knife. I haven’t dug into puppet very much (I probably should), but I have watched a few videos about the “Marionette Collective” tool, which is puppet-based and looks pretty neat. I feel like that would give me at least that part in a perhaps-better setup. But like I said, I have to reserve most of my judgement of Ansible’s ability till after I dig into the module definition APIs.

                              1. 2

                                Thanks, it’s interesting to get the perspective on Chef having never used it.

                                I think my next foray into the world of provisioning tools will be with Salt, if anyone has anything to say about that.

                              2. 2

                                I separate those concerns by placing the common config in a webserver role, and creating another role for each site (e.g., foo.com role). The webserver role makes sure all the necessary base bits are included, and the site role handles the site code and site configuration.

                                Otherwise, even if you have a consolidated role, you can use ansible tags to specify/exclude certain tasks. In this method, you’d tag your common tasks with a webserver_common tag, and your site specific tasks with foo.com.

                                1. 1

                                  That’s not quite the same use case, near as I can tell (though I’ve not used tags very heavily). In particular, it solves half the problem, but (perhaps) the unimportant half.

                                  Indeed, I’ve separated several of my roles into more granular roles (some which might even be called “abstract”, in the sense that they aren’t really configuring a single piece of software, but several to fulfill some purpose). The real thing that I find myself missing is not the ability to do host-specific configuration (though I suspect I will need that in the relatively near future), it’s the extensibility.

                                  When I have an LWRP that ships with a recipe in chef, it gives me an inverted control system for configuring that software later, and extensibly. For instance, I use chef to manage my personal server, which I use for a few different things, including a minecraft world, a mumble server, and some personal webservices I wrote to manage various batch jobs I need to run. I serve these on the same VM, and use nginx to correctly proxy between them. Whenever I need to add a new service, it’s easy to write a recipe which depends on the main nginx recipe to get those LWRPs, configure the appropriate sites-available/sites-enabled files, and otherwise forget about the existence of nginx. Indeed, most of these services run as little Grape or Sinatra apps, so I have a recipe with an LWRP for setting up the standard configuration of those services in an abstract way, and my individual recipes just use the LWRP it provides and install whatever packages are needed separately.

                                  With ansible (and it sounds like you know, so if you could direct me towards a better nirvana, I’d appreciate it), it seems that the equivalent would be to build an nginx module, with a site action which gets filled in, then to fully mimic the setup, have another ‘ruby_service’ (or otherwise better named) module which relies somehow on the former (I don’t know if there is a way to enforce dependencies between custom modules) to create the abstract service.

                                  I should reiterate, I don’t know that the latter methodology is better or worse, only that it is presently unfamiliar; and seems kind of clunky. One of the nice things about Chef (for all its faults) is that it’s easy to extend the language (in a sense) to suit your particular needs. In the case of my little service cluster, I’m generally not looking for efficiency of running: most of the services remain off (indeed, I use knife to toggle them on, and they all default off after some period of time). Rather, it was easy to build the system to make it easy to add new services rapidly, and with minimal boilerplate.

                                  I should also say (since this could be construed as me being pretty down on ansible) that I really like a lot of what Ansible does. I mentioned before (I think) that it’s significantly nicer to use than knife for interactive use (e.g., for actually doing things to machines), and certainly a lot more straightforward in terms of features. It’s also nice that the primary language of configuration is exactly the same as the primary language of manipulation (that is, ansible the command responds to all the same actions as ansible the scripting tool; knife technically does too, but only as a sort of afterthought). I guess my ideal tool would be one which keeps the wonderful simplicity of ansible the tool but preserves chef’s capacity for DSL creation. I’m mixed as to whether I’d prefer a DSL like chef’s for the scripting over the simple YML of ansible. I will say that I like the former for making more complicated LWRPs, but the latter makes me think of ways not to need them, so it’s probably a wash.

                            1. 8

                              I’m wrapping up my LMDB port to Go this week. I just need to get deletion, tree rebalancing and page reclamation working. And a whole bunch of documentation. The library is pre-alpha but here’s the repo for anyone interested in poking around:

                              https://github.com/boltdb/bolt

                              I started using testing/quick from the standard library, which is a QuickCheck-style package for black-box testing. It’s a cool library but I never hear anyone talking about it.

                              1. 2

                                Very cool. I haven’t used LMDB before, how would you compare it in practice with LevelDB?

                                1. 3

                                  I used LevelDB as a backing store in the past, but I had some issues with it when using it from multiple threads via the C API. It worked pretty well, but its approach is somewhat complicated: it has multiple levels of storage files that have to be compacted periodically, so performance can be variable.

                                  LMDB uses a mmap’d B+tree that is updated in-place so there’s no compaction required. The mmap is read-only so data structures can be mapped directly to the underlying data so there’s no memcpy() required. (Writes occur using vectorized I/O on a regular file descriptor)

                                  Ultimately I like the LMDB approach because it’s simple and grokable. My implementation omits the niche features in LMDB and it’s currently 1500 LOC.

                                  1. 2

                                    Thanks, that’s helpful. I’m using LevelDB in a project (on the topic of things we’re working on this week), but I’ve only read about LMDB. In the reading it’s hard to sort out its real merits from the author’s ranting, but being able to use values directly from mapped memory sounds useful. One caveat to the LMDB benchmarks, for anyone who’s following this and skims them, is that they disable compression in LevelDB. This probably speeds it up for in-memory workloads but will hurt if you’re using slower storage like AWS EBS volumes.

                                    1. 3

                                      Also, take a look at BangDB.

                                      http://highscalability.com/blog/2012/11/29/performance-data-for-leveldb-berkley-db-and-bangdb-for-rando.html

                                      I’m currently trying to fight through integrating it into a C project (with an autotools flag to switch between LevelDB, BDB, and BangDB).

                                      1. 2

                                        Thanks for that! BangDB looks pretty awesome, and I’ll probably give it a try later. I’ve been looking for a cross-platform kv store as a persistence layer for a lua project. My initial thought was leveldb, and while it’s supposed to support windows I couldn’t get it to compile at all. I ended up using UnQLite (and wrote luajit bindings for it), but I’m still open to anything with a decent license.

                                  2. 2

                                    https://symas.com/is-lmdb-a-leveldb-killer

                                    We state quite clearly that LMDB is read-optimized, not write-optimized. I wrote this for the OpenLDAP Project; LDAP workloads are traditionally 80-90% reads. Write performance was not the goal of this design, read performance is. We make no claims that LMDB is a silver bullet, good for every situation. It’s not meant to be – but it is still far better at many things than all of the other DBs out there that do claim to be good for everything.

                                    Disclaimer: I have no experience with LMDB. I maintain one of the ruby LevelDB wrappers: https://github.com/vjoel/ruby-leveldb-native.

                                1. 5

                                  Likely going to pivot on the game I was working on last week. Finally put some time in on figuring out the game design and mechanics, and will be changing up a few things.

                                  Also, I’d like to add more features to the unofficial lobste.rs chrome extension (https://bitbucket.org/sirpengi/lobx) I whipped up in response to https://lobste.rs/s/ulyq3l/feature_request_ability_to_hide_posts

                                  1. 1

                                    Well there is a “request” to hide users https://lobste.rs/s/87phed/proposal_a_filter_for_users

                                  1. 3

                                    Drowning in callback hell trying to use IndexedDB.

                                    1. 1

                                      (assuming you’re using javascript) I’ve found Promises greatly alleviate the problem of callback-hell
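
                                      The usual escape hatch is a tiny adapter that turns an IDBRequest’s onsuccess/onerror pair into a Promise. A sketch (requestToPromise is a made-up helper name, and this covers plain requests, not transaction lifetimes):

```javascript
// Adapt an IDBRequest-like object (onsuccess/onerror handlers,
// .result / .error fields) into a Promise.
function requestToPromise(request) {
    return new Promise(function (resolve, reject) {
        request.onsuccess = function () { resolve(request.result); };
        request.onerror = function () { reject(request.error); };
    });
}
```

                                      Then requestToPromise(store.get(key)).then(…) chains flatten out instead of nesting, though you still have to be careful that IndexedDB transactions auto-commit once control returns to the event loop.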

                                      1. 1

                                        Wow, that was fast!

                                        1. 1

                                          I’ve done chrome extension work before, and with backbone + backbone.LocalStorage doing all the heavy lifting it was pretty easy to get it out. Please give it a try and send any issues/requests to the bitbucket repo!

                                          1. 1

                                            All the same, I’m impressed. And so far, so good. Works like a charm.

                                      1. 1

                                        Last week I reported on two projects (recommendation as a service, lua-powered 2d game), and it’s pretty much the same, although I’ve only made progress on the game and haven’t touched the recommendation service at all (though I really should; I want something half-baked before heading into the upcoming startup-weekend).

                                        On the game side of things, I’ve implemented some gameloop hotspots (the lighting in particular) in C, and made a subreddit to post WIP videos.

                                        1. 2

                                          Just found out about stackmachine from this. It looks awesome! I’m probably more excited than most because I’m working on an indie game using love2d, so this fits right up my alley as something I’d probably want to use for distribution.

                                          Do you have any docs about how to prepare binary elements for distribution? (My game requires a few compiled shared libraries: I’ve got a few .so files for linux and .dlls for windows.) I can see there being issues depending on whether you’re providing 32- or 64-bit versions.

                                          Also, no linux download? Or is that in the works?

                                          (Just tried a signup, and after submission I’m led to a 404 page? But it appears my account was created just fine, and I’ve made my way to my dashboard now?)

                                          1. 4

                                            My company is investigating Clojure as a data science tool (we’re heavily invested in the Java ecosystem, but top data scientists generally dislike Java) and I’m working on a library that will improve Clojure’s debugging story, including that pertaining to some classes of performance bugs and numerical issues. I hope to have it into the open-source world in the next month or two.

                                            1. 1

                                              You should look into Python. Between scipy and pandas, you have everything you need for data-sciencey things. And if you’ve heavily invested in a Hadoop infrastructure (I assume that’s what you mean by Java ecosystem), https://github.com/Yelp/mrjob is totally awesome.

                                            1. 4

                                              I always have a few projects that I jump between. Currently the list is:

                                              2d platforming game in Lua (using love2d) - https://bitbucket.org/sirpengi/lovestory/ My latest success is porting my lighting code to C. LuaJIT’s ffi makes that super easy to do, and it gained me back a lot of legroom to implement more game logic in Lua.

                                              Recommendation engine as a service: http://savant-api.com/ Site is just a placeholder. Still building out the backend.

                                              1. 2

                                                A “2d world-building game for couples” sounds intriguing. Could you give a more detailed explanation of what you envision the game being?

                                                1. 1

                                                  If you’ve ever played Terraria (or the newer, still-in-beta StarBound), you’ll already know what I mean by 2d world builder. MineCraft is another (more famous) world builder, but 3d. I enjoy all of these games, and I always play with groups of friends, but one thing I find lacking is any co-op dynamic. Nothing in their game designs rewards team play over solo play. If you play together you just become effectively twice as powerful, with twice the inventory.

                                                  I don’t have a clear vision of what I want yet, but that’s the direction I’m leaning in.

                                              1. 1

                                                The thing stopping me from using Go is that you can’t write shared libraries with it: https://code.google.com/p/go/issues/detail?id=256 I generally work in python (although a lot of lua lately) and I’d like the option to rewrite CPU-intensive bits in Go rather than C. However, that currently isn’t possible.

                                                1. 1

                                                  Looks exciting, and glad to see makeplaylive actually ship a piece of hardware. I remember they had a kde tablet (called Vivaldi now?) that’s been on the cusp of release for ages (took preorders and everything a long while ago). It’s been a long time since any encouraging news came out on that tablet.