1. 3

    It’s probably way out of the intended scope, but could Mitogen be used for basic or throwaway parallel programming or analytics? I’m imagining a scenario where a data scientist has a dataset that’s too big for their local machine to process in a reasonable time. They’re working in a Jupyter notebook, using Python already. They spin up some Amazon boxes, each of which pulls the data down from S3. Then, using Mitogen, they’re able to push out a Python function to all these boxes and gather the results back (or perhaps have each box upload its results to S3 when the function finishes).

    1. 3

      It’s not /that/ far removed. Some current choices would make processing a little more restrictive than usual, and the library core can’t manage much more than 80MB/sec throughput just now, limiting its usefulness for data-heavy IO, such as large result aggregation.

      I imagine a tool like the one you’re describing, with a nice interface, could easily be built on top, or maybe as a higher-level module as part of the library. But I suspect right now the internal APIs are just a little too hairy and/or restrictive to plug into something like Jupyter – for example, it would have to implement its own serialization for NumPy arrays, and for very large arrays, there is no primitive in the library (yet, but soon!) to allow easy streaming of serialized chunks – either write your own streaming code or double your RAM usage, etc.

      Interesting idea, and definitely not lost on me! The “infrastructure” label was primarily there to allow me to get the library up to a useful point – i.e. permits me to say “no” to myself a lot when I spot some itch I’d like to scratch :)
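
      To make the "streaming of serialized chunks" point concrete, here is a minimal sketch that pickles a large sequence one slice at a time, so only a single chunk is held in serialized form at any moment. It uses only the standard library and is not Mitogen's API; `iter_chunks`, `serialize_chunks`, `reassemble` and the chunk size are hypothetical names chosen for illustration.

```python
import pickle
from typing import Iterable, Iterator, List, Sequence


def iter_chunks(data: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `data`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def serialize_chunks(data: Sequence, chunk_size: int) -> Iterator[bytes]:
    """Pickle one chunk at a time instead of the whole sequence at once."""
    for chunk in iter_chunks(data, chunk_size):
        yield pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)


def reassemble(blobs: Iterable[bytes]) -> List:
    """Receiver side: unpickle each chunk and append it to the result."""
    result: List = []
    for blob in blobs:
        result.extend(pickle.loads(blob))
    return result


if __name__ == "__main__":
    data = list(range(10))
    blobs = list(serialize_chunks(data, chunk_size=4))
    print(len(blobs), reassemble(blobs) == data)
```

      Because `serialize_chunks` is a generator, a sender could write each blob to a socket or file as it is produced. With a NumPy array the same slicing would yield views rather than copies, which is what avoids the doubled RAM usage mentioned above.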

      1. 3

        This might work, though I think you’d be limited to pure Python code. From the initial post describing it:

        Mitogen’s goal is straightforward: make it childsplay to run Python code on remote machines, eventually regardless of connection method, without being forced to leave the rich and error-resistant joy that is a pure-Python environment.

        1. 1

          If it’s just simple functions you’re running, you could probably use PySpark in a straightforward way to go distributed (although Spark can handle much more complicated use cases as well).

          1. 2

            That’s an interesting option, but presumably requires you to have Spark set up first. I’m thinking of something a bit more ad-hoc and throwaway than that :)

            1. 1

              I was thinking that if you’re spinning up AWS instances automatically, you could probably also configure a Spark cluster to be set up along with them. That way you don’t have to worry much about memory management and function parallelization, or about recovery in case of instance failure. The performance of PySpark (mainly Python object serialization and memory management) is also being actively worked on, indirectly through pandas/PyArrow.

              1. 2

                Yeah that’s a fair point. In fact there’s probably an AMI pre-built for this already, and a decent number of data-science people would probably be working with Spark to begin with.

        1. 3

          What did your query plan look like? Did you consider a materialized view or any other technique that’d let you solve this in the database?

          1. 2

            I was also curious whether they’d exhausted the SQL options. It appears from the brief Group Builder example that they could’ve generated raw SQL queries that hit all the right indexes, without doing the unions and group by.

            As a query builder, this Group Builder looks pretty slick. The technical discussion about their history and solutions is also interesting and instructive. The Go language propaganda is kind of distracting from an otherwise interesting article. Not saying that Go didn’t provide the benefits described, it just seems a little irrelevant.

            1. 2

              It looks to me as if they are just constructing boolean expressions in the web interface, e.g., the first screenshot corresponding to (as far as I understand it):

              ...
              WHERE gender != 'male' AND ( age > 50 OR censor_rating_r18 < 0.2 )
              

              (it’s not clear to me how they actually store/process the censor rating, so I’m just guessing here)

              So it’s a matter of translating from the interface to a boolean expression with “Subgroup”s mapped to parentheses and comparisons based on exclude/include specification.

              I don’t really see the necessity for unions, or am I missing something?
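
              The translation described above can be sketched as a small recursive function. This is only an illustration of the idea, not the article's Group Builder: `Condition`, `Group` and `to_sql` are hypothetical names, the column names are taken from the guessed example above, and column names are assumed to come from a trusted whitelist (only values are passed as placeholders).

```python
from dataclasses import dataclass
from typing import List, Union

ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass
class Condition:
    column: str   # assumed to be validated against a whitelist elsewhere
    op: str
    value: object


@dataclass
class Group:
    joiner: str   # "AND" or "OR"
    children: List[Union["Group", Condition]]


def to_sql(node: Union[Group, Condition], params: list, top_level: bool = True) -> str:
    """Render a condition tree as a WHERE expression, collecting values in `params`."""
    if isinstance(node, Condition):
        if node.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {node.op!r}")
        params.append(node.value)
        return f"{node.column} {node.op} ?"
    if node.joiner not in ("AND", "OR"):
        raise ValueError(f"unsupported joiner: {node.joiner!r}")
    if not node.children:
        raise ValueError("a group needs at least one child")
    parts = [to_sql(child, params, top_level=False) for child in node.children]
    joined = f" {node.joiner} ".join(parts)
    # Subgroups map to parentheses; the outermost group needs none.
    return joined if top_level else f"( {joined} )"


if __name__ == "__main__":
    tree = Group("AND", [
        Condition("gender", "!=", "male"),
        Group("OR", [
            Condition("age", ">", 50),
            Condition("censor_rating_r18", "<", 0.2),
        ]),
    ])
    params: list = []
    print("WHERE " + to_sql(tree, params), params)
```

              For the tree in the usage example this yields `gender != ? AND ( age > ? OR censor_rating_r18 < ? )` with the parameters `['male', 50, 0.2]`, with no unions involved.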

          1. 2

            The graphs are especially interesting if one uses the Random Article feature Wikipedia offers, for example:

            but it’s not always that fascinating, since sometimes there just aren’t enough links:

            My only worry is whether this spams the Wiki servers too much. But from what I can read out of the network trace, they’re only downloading the pictures, and have the article links cached somewhere. It’s certainly better than one of the first Perl scripts I wrote to check the “Clicks to Hitler” Game, which will probably be unplayable from now on.

            Edit: I missed the GitHub link, and it says there that they use a database dump that Wikipedia updates each month.

            Edit 2: Also interesting: if one takes the last link from the first list and reverses the two, one gets a much more boring graph: 4 degrees of separation from Antichloris eriphia to COMIT.
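
            The asymmetry noted in Edit 2 follows from article links being directed. A minimal sketch, assuming the links are available as an in-memory dictionary: a plain breadth-first search finds a shortest click path, and swapping start and goal can give a different length. This is not necessarily how the project itself searches; `shortest_path` and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(links: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over directed links; returns one shortest path or None."""
    if start == goal:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in parents:
                continue  # already reached by a path that is at least as short
            parents[target] = page
            if target == goal:
                # Walk the parent pointers back to the start.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(target)
    return None


if __name__ == "__main__":
    # Toy directed graph: A links to B, B to C, C back to A.
    links = {"A": ["B"], "B": ["C"], "C": ["A"]}
    print(shortest_path(links, "A", "B"))
    print(shortest_path(links, "B", "A"))
```

            In the toy graph, A reaches B in one click, while B needs two clicks (via C) to get back to A.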

            1. 6

              I find relations with many links boring: if you have enough links, it’s obvious you can relate anything to anything else. What I find very funny, though, are short links:

              Anime to obesity in two steps.

              Philosophy to unemployment in two steps.

              Mathematics to virginity in two steps.

                1. 1

                  Beautiful.

                2. 3

                  The GitHub repository itself has a section with interesting combinations.

                1. 1

                  After browsing the wiki for a bit and discovering Jd (a J columnar RDBMS), I remembered K (and kdb+), as both the syntax and the immediate mention of a columnar RDBMS reminded me of it. So it seems, both are APL-inspired.

                  1. 1

                    You can also locate military bases in sparsely populated areas: https://twitter.com/tobiaschneider/status/957317886112124928

                    1. 1

                      A couple of German universities use it to give researchers easy access to cloud computing: https://www.bw-cloud.org/en/project

                      1. 12

                        I never understood the popularity of solarized. It lacks contrast and makes my eyes hurt.

                        1. 12

                          There was a blog post which said it was made with science or whatever. Science can’t be wrong.

                          1. 4
                            1. 3

                              The implication that the goodness of something so subjective can be quantified really irks me. However, I think a lot of people ate this up, as I’ve seen people non-ironically citing this as a reason it is good.

                              1. 2

                                I hear it’s Cave Johnson’s favorite IDE color scheme.

                              2. 5

                                I’m more and more in favour of highlighting comments more than the individual parts in the code (variables, strings, …) – and I find that comments often have the least contrast :(

                                1. 3

                                  In Visual Studio Code you can quite easily try this out since you can add your own customizations to the highlighting in the settings. For instance, you could add

                                      "editor.tokenColorCustomizations": {
                                          "comments": "#e1a866"
                                      }
                                  

                                  to change the color of all comments.

                                2. 1

                                  I think it depends a lot on lighting. I use the dark theme in the evening and at night, and don’t have a lot of light in the room. More contrast rich themes like Monokai hurt my eyes in that setting.

                                  The Solarized theme that comes with Visual Studio Code actually uses a base color with more contrast than the original design. But I find that rather annoying in the light theme, especially since they also use bold.

                                  1. 2

                                    More contrast rich themes like Monokai hurt my eyes in that setting.

                                    That makes sense. It’s funny, at night I will continue using typical white-on-black high-contrast color schemes but just drop the monitor brightness a lot if I happen to be hacking away in the dark. Usually I just turn the lights on, though.

                                    1. 1

                                      For me both variants of Solarized are difficult to read in the daytime on a nice display and borderline unusable at any time of day on a low-end display. On the other hand, I find high-contrast dark themes too harsh, so I tend to use dark themes that are somewhere in the middle (~#999 on ~#222) and higher-contrast light themes (~#222 on ~#f5f5f5).
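
                                      One way to put numbers on colour pairs like these is the contrast ratio defined in WCAG 2.x. Nobody in the thread cites that definition, so treat this as one possible measure rather than what the commenters had in mind; the function names are hypothetical.

```python
from typing import Tuple


def parse_hex(color: str) -> Tuple[int, int, int]:
    """Parse '#rgb' or '#rrggbb' into three 0-255 integers."""
    digits = color.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"not a hex colour: {color!r}")
    return int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16)


def _linearize(channel: int) -> float:
    """Convert an sRGB channel to linear light, as in the WCAG 2.x formula."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(color: str) -> float:
    r, g, b = (_linearize(v) for v in parse_hex(color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Ratio between 1 (identical colours) and 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for fg, bg in [("#999", "#222"), ("#222", "#f5f5f5")]:
        print(fg, "on", bg, round(contrast_ratio(fg, bg), 2))
```

                                      The ratio is symmetric in its two arguments, so it says nothing about whether a dark or a light theme is more comfortable, only how far apart the two colours are in luminance.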

                                      1. 1

                                        gruvbox dark works well for me :)

                                        1. 2

                                          I think the red is perhaps, well, a bit too red in gruvbox. The (over-)use of red/orange/pink in many Solarized themes was part of the reason I made this variant.

                                          Darktooth is another interesting gruvbox-like theme.

                                    2. 1

                                      Agree on the importance of contrast. Lots of color themes are happy to use tons of different colors on things that aren’t completely semantically different (a numeric literal doesn’t always need to stand out a lot) while ignoring the more subtle details such as contrast.

                                      I want the attention to detail Solarized has, but with more contrast, and something besides an ocean or a piece of parchment as the background. I’ve been using a version of GitHub’s color scheme in my editor for a while, but have yet to find a color theme that I really like.

                                    1. 1

                                      When my students draw diagrams (E/R, Venn, …) using Preview.app’s markup tools I cannot really blame them, as it essentially gets the job done. However, using 100 circles to fill another circle with a color (I’m not making this up) does not quite seem right to me. Although I would have preferred a proper drawing tool (actually LaTeX/TikZ), which would look much more professional in my opinion, they solved the exercise. I still find it difficult to argue why I think the presentation contributes as much to solving the exercise as the content does. After all, we are communicating thoughts/information, I guess?

                                      1. 4

                                        I end up using ncdu a lot in practice, and it solves my problems.

                                        I used to use a TreeMap program (on Windows) called SpaceMonger, which was both pretty and useful. I always wanted something like that on Linux. There are dozens of similar programs, but somehow the visualization was not as good.

                                        ncdu doesn’t really visualize much, but the key is that it’s fast enough and shows enough to get the job done quickly (identify the very few big files you should delete).


                                        Somewhat of a tangent, if I ever get around to writing a visual version of ncdu, I would now use flame graphs and not treemaps:

                                        http://www.brendangregg.com/blog/2017-02-06/flamegraphs-vs-treemaps-vs-sunburst.html

                                        I mentioned this in another comment: treemaps are really hard to get right from the user perspective. The preponderance of programs that are “not as good as SpaceMonger” shows this. I think that flame graphs will work perfectly for this. But admittedly I haven’t tried it yet.
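
                                        The "identify the very few big files" job can be sketched in a few lines of Python. This is not how ncdu works internally; `largest_files` is a hypothetical helper that reports apparent file sizes (not allocated disk blocks) and skips symlinks and unreadable entries.

```python
import heapq
import os
from typing import Iterator, List, Tuple


def _file_sizes(root: str) -> Iterator[Tuple[int, str]]:
    """Yield (size_in_bytes, path) for every regular file below `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                yield os.path.getsize(path), path
            except OSError:
                continue  # file vanished or is not readable


def largest_files(root: str, count: int = 10) -> List[Tuple[int, str]]:
    """Return the `count` largest files below `root`, biggest first."""
    return heapq.nlargest(count, _file_sizes(root))


if __name__ == "__main__":
    for size, path in largest_files(".", count=5):
        print(f"{size:>12}  {path}")
```

                                        `heapq.nlargest` keeps only `count` candidates in memory, so the scan stays cheap even on large trees.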

                                        1. 1

                                          I’ve thought about using flamegraphs for it before too. Brendan has an interesting post where he tried it, and it seems to be a decent way to visualise it - http://www.brendangregg.com/blog/2017-02-05/file-system-flame-graph.html

                                          1. 2

                                            I find duviz (which was linked to in a comment on this blog post) really helpful as well!

                                        1. 1

                                          Serious question: isn’t it super scary to run “real” things on SQLite? Or are things like backup tools also well developed for SQLite at a “reasonable load”?

                                          1. 5

                                            Depends on how real you want to get. As a backup strategy, copying a file works pretty reliably and isn’t subject to bugs in the replication log or whatever. Easy to verify the copy is correct, too.
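
                                            A plain file copy is fine while nothing is writing to the database. As a variant that also works while the database is in use, here is a minimal sketch using the online backup API in Python's standard `sqlite3` module (`Connection.backup`, available since Python 3.7), followed by an integrity check on the copy. `backup_database` and the file paths are hypothetical names for illustration.

```python
import sqlite3


def backup_database(source_path: str, dest_path: str) -> bool:
    """Copy a SQLite database and report whether the copy passes an integrity check."""
    source = sqlite3.connect(source_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Online backup: yields a consistent snapshot even if other
        # connections are using the source database.
        source.backup(dest)
        row = dest.execute("PRAGMA integrity_check").fetchone()
        return row[0] == "ok"
    finally:
        dest.close()
        source.close()


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own database.
    print(backup_database("app.db", "app-backup.db"))
```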

                                            1. 3

                                              I don’t know why it would be.

                                              It’s unusual for web applications, since they have to set up servers and all of that anyway, but SQLite is used by a ton of standalone applications.

                                              1. 2

                                                It is the most used SQL database on the planet; why would that be scary?

                                                1. 2

                                                  The test suite is quite substantial as well.

                                              1. 4

                                                […] Stash is extremely useful when someone randomly asks you to check out another branch, but you’re right in the middle of something.[…]

                                                I like to use git worktree for that. It lets me quickly check out any other branch (or create new ones with small hotfixes) without disturbing what I am currently working on and without having to manage k other clones of the same repository.

                                                1. 32

                                                  As a person with super shitty eyesight I’ve noticed that Solarized is probably the worst color scheme I’ve ever used.

                                                  Designers - don’t forget about those of us who need a little extra contrast :)

                                                  1. 13

                                                    I have good eyesight and I still find it a terrible color scheme. It’s just trendy, it’s not objectively “good” on any level.

                                                    1. 11

                                                      Try gruvbox, the most important color scheme in computing history.

                                                      1. 2

                                                        Gruvbox does look a ton better. If I wasn’t so lazy I might switch from my default Emacs scheme to this!

                                                        1. 4

                                                          For Emacs, I find darktooth to be a nice gruvbox-based color scheme with enough contrast (ymmv). Plus it’s a few keystrokes away on MELPA.

                                                          1. 2

                                                            Sweet! I’ve grabbed it and am using it at the moment. So far, I’m annoyed by the lack of contrast in org-mode, but otherwise seems good.

                                                            clarification: the dates in org-mode. Other links are fine. So, maybe this is ok!

                                                      2. 8

                                                        I never understand why most themes make comments barely visible (even for me, still having relatively good eyesight). I mean, they’re like the most important part of code, for me anyway. Dozens of high-contrast colors for every keyword, then 0 contrast for comments against the background…

                                                        1. 2

                                                          It tells a lot about the person who made the color scheme; the only comments in their code are probably disabled debug code.

                                                        2. 4

                                                          Same, it’s far too low-contrast for me.

                                                          EDIT: and it gets substantially worse if, for some reason, you’re stuck with a less-than-amazing display.

                                                        1. 6

                                                          Another downside they forgot to mention is that vector images are harder to downsize. With bitmaps, you know exactly what value each pixel will have, whereas with vectors you have to lay them out so the image will still be crisp at small sizes after antialiasing. This is what makes good font design more constrained on low-resolution screens, and why most fonts contained a bitmap for each character before screen resolutions got bumped up.

                                                          1. 2

                                                            That’s what caught my eye in the second image. The 16x16 bitmap version looks much better, imho.

                                                            1. 1

                                                              I know hinting is a thing for fonts, but can it also be included in SVGs? I feel like that’s exactly what is needed here.

                                                          1. 6

                                                            The cert is invalid at this site (according to my browser)

                                                            1. 1

                                                              Indeed. The cert was issued to GitHub but is being used by tensorflow…

                                                              1. 2

                                                                Because they redirect transparently to the GitHub Pages site of their repository, which then, of course, is not included in GitHub’s certificate for its Pages.

                                                            1. 1

                                                              That is a really nice writeup of the project!

                                                              1. 1

                                                                Nice and interesting to read, even for someone who has no experience in embedded programming or electrical engineering like me!

                                                              1. 2

                                                                The cool thing about storage in DNA is the built-in redundancy. I once heard a very interesting (although very theoretical) talk about databases in DNA. One hurdle, still, seems to be the slow sequencing speed.

                                                                1. 7

                                                                  This is a very thin wrapper around the psycopg2 library. Not sure why it’s necessary vs. using psycopg2 directly.

                                                                  1. 1

                                                                    It seems nice for a Python beginner who is used to SQL and has their data sitting in a DB. I can see myself using it for small in-class projects at uni.

                                                                  1. 2

                                                                    Did this seriously just start downloading more than a hundred megabytes as soon as I opened it? That seems pretty user-hostile.

                                                                    1. 1

                                                                      In the first couple of seconds, it downloaded around 4 MB for me. Which is kind of the size of a big website now, anyway.

                                                                      Interestingly, the linked-to page (instant.io) complains about my browser not being supported, although the webtorrent one did download right away.

                                                                      1. 1

                                                                        O____o weird. I had opened the tab in the background, so when I switched to it, it’d already downloaded 35 MB or so.