1. 2

    Looking at how the cockroachdb folks implemented their distributed datastore and the years it took, it is extremely hard and takes a lot of validated tests (like how foundationdb did them) to ship a production-grade fault-tolerant product.

    @aphyr, hopefully the arangodb folks contact you to test/validate their raft-based datastore. Both arangodb and cockroachdb have friendly apache2 licenses, so that’s always a good sign!

    1. 14

      And also how booking.com deceives clients

      Such a terrible company.

      1. 2

        I’m trying to understand what you are after with the “single executable” part?

        1. 2

          Self-contained. Mostly for the controversy, I guess? :-)

          1. 5

            Right. That controversy you may have. I guess we have rather differing interpretations of self-contained.

            $ file start.sh

            start.sh: POSIX shell script, ASCII text executable
            

            $ file target/hprotostuffdb-rjre

            target/hprotostuffdb-rjre: ELF 64-bit LSB executable
            

            $ grep JAR start.sh

            JAR=comments-all/target/comments-all-jarjar.jar
            $BIN $PORT comments-ts/g/user/UserServices.json $ARGS\
              $PUBSUB $ASSETS -Djava.class.path=$JAR comments.all.Main
            

            $ objdump -p target/hprotostuffdb-rjre |grep RPATH

            RPATH                $ORIGIN:$ORIGIN/jre/lib/amd64/server
            

            $ objdump -p target/hprotostuffdb-rjre |grep NEEDED

            NEEDED               libpthread.so.0
            NEEDED               libjvm.so
            NEEDED               libcrypto.so.1.0.0
            NEEDED               libssl.so.1.0.0
            NEEDED               libz.so.1
            NEEDED               libstdc++.so.6
            NEEDED               libgcc_s.so.1
            NEEDED               libc.so.6
            

            $ find . -name '*so'

            ./target/jre/lib/amd64/server/libjsig.so
            ./target/jre/lib/amd64/server/libjvm.so
            ./target/jre/lib/amd64/libzip.so
            ./target/jre/lib/amd64/libnet.so
            ./target/jre/lib/amd64/libjava.so
            ./target/jre/lib/amd64/libnio.so
            ./target/jre/lib/amd64/libverify.so
            

            I’m not even going into the rest of the jre scaffolding. I guess you could argue the stuff under comments-ts is not part of the “comment-engine”, but it’s there, and it (or something equivalent) is needed anyway. Admittedly, only two of the files in the entire package have the ‘executable’ flag set, so you can have half your cake if that’s the criterion for being self-contained :-)

            1. 4

              Thanks for the detailed response.
              It was my way of showing people that jvm apps can have “golang-style” deployments, where you ship a single binary and run it, at only 12MB (my production nginx binary is 14MB).

              But realistically, if you have the jvm installed, the jar is only 455kb and that is only the one that needs to be shipped along with the 92kb js and 7.1kb css. That is how I deploy internally.

              With golang, you do not have this choice.

              1. 4

                Ah, so now I am starting to see the points that you are really trying to make.

                1. Bundling of dependencies. I don’t think there’s much novelty to it; proprietary and floss applications alike have distributed binaries with bundled dependencies for a long long time. This includes many applications that bundle a jvm.

                2. A jvm bundle can be reasonably small. Admittedly I haven’t paid attention to it, but I’ve had jvm bundles before, and I don’t recall them being outrageously large.

                Calling it a “single executable” or self-contained might not be the best terminology to get the point across. Even more so when you consider that the executable also depends on many libraries that are not bundled; see objdump output above and compare to the list of bundled shared objects. Any one of these system libraries could go through an ABI change (in worst case one where existing symbols are removed or modified to work in an incompatible way, without symbol versioning…), and when that happens, your uhm.. self-contained single executable thingy won’t run right, or at all. It’s not just a theoretical concern, it’s something people using binary distributions of proprietary software (e.g. games) may have to feck with rather often.

                I can’t comment on how this compares to golang deployments, which I’ve never used.

                1. 1
                  1. Pretty much agree.
                  2. A lot of ppl dismiss the jvm as bloated (in terms of size and runtime memory). I guess it all depends how one uses it (along with the knobs to tune). I run mine at 128mb memory max, and that could handle 130k req/s. My usage of the jvm is like a stored-procedure language though. All the heavy lifting is on the C++ libs that I’m depending on.

                  I understand your points and appreciate your comments. Cheers

                2. 2

                  Recent versions of go have additional build modes: https://golang.org/cmd/go/#hdr-Description_of_build_modes

                  Theoretically you could deploy your code as a plug-in.

          1. 1

            Coming back to it a day later, listByPostId takes 200msec to load! Maybe a progressive download system would be better? I can only see the first three comments without scrolling on my screen.

            1. 1

              I’ve added that as an enhancement on gh.

              Once implemented, the config would look like:

              // the root comments to initially fetch
              window.comments_initial_limit = 3
              
              // collapse all replies by default (this is already available)
              window.comments_collapse_depth = 1
              
            1. 1

              Neat!

              You could get it to load a little faster if the js file had the CSS embedded in it (i.e. by creating tags dynamically)

              You could get it to load a lot faster if you split the code into multiple modules: Most people don’t make a comment, so no need to bring the editor, etc along just to render some things.

              Have you seen how Yahoo! does their JS ads? They create an iframe, and then have some JS code synchronize the width/height with the parent. This makes it possible for them to have a lot of control over the rendering/layout, but it also makes it easier to improve without introducing bugs on your users’ sites.

              1. 1

                Hey @geocar, I’ve come across you before from your “fast-servers” article. Great read!

                “You could get it to load a little faster if the js file had the CSS embedded in it (i.e. by creating tags dynamically)”

                Linking to my reply about customization

                “You could get it to load a lot faster if you split the code into multiple modules: Most people don’t make a comment, so no need to bring the editor, etc along just to render some things.”

                There’s not much overhead really since the editor is simply a textarea and a callback function to directly post the comments to the server :-)

                “Have you seen how Yahoo! does their JS ads? They create an iframe, and then have some JS code synchronize the width/height with the parent. This makes it possible for them to have a lot of control over the rendering/layout, but it also makes it easier to improve without introducing bugs on your users’ sites.”

                That’s pretty cool. I would say that you can also just use an iframe and embed the script from there, in order to limit it from affecting the host site.

                The css is designed to fill whatever height/width its parent has.

                1. 1

                  “The css is designed to fill whatever height/width its parent has.”

                  An example might be more clear. Try: http://ptest.ysm.yahoo.com/SampleYPASearch/?p=cars and see how the iframe and the script communicate to fill the space.

              1. 1

                Another question: How does your project compare to isso? They seem to be pretty similar. Maybe more general, what is the motivation for your comment engine?

                1. 2

                  From looking at isso’s demo, its advantages are:

                  1. smaller js file
                  2. no css to load (maybe it creates the tags dynamically, as posted by @geocar)
                  3. has upvote/downvote support

                  For #1, showdown (renderer) + dompurify (sanitizer) alone exceed the 40kb js file they have.

                  Kudos to them for creating such a small js.

                  For #2, it can be done but separating the css actually gives the user flexibility.

                  One can easily customize by not including the link tag and instead replace it with an inline style like:

                  <style>
                  #comments button { /* button style here */ }
                  /* etc */
                  </style>
                  

                  #3 is relatively easy to implement, so it can be added if requested

                  The motivation to be honest is to demonstrate how to rapidly create real-world apps with protostuffdb, which is essentially leveldb + uWebSockets

                  Once the latter is fully integrated (next release), realtime comments are effortlessly doable.

                1. 3

                  What I would personally like is a system that works without Javascript but I am not sure if that is possible at all in a static blog setting.

                  (I posted this under your comment in another thread but I think it fits better here.)

                  1. 4

                    It would not be possible I’m afraid. For one, you need to sanitize the comments posted by users. Without js, that cannot be done.

                    (On second thought, putting it in an iframe and doing everything server-side could work?)

                    1. 11

                      You can’t rely on client side sanitation.

                      1. 5

                        I don’t sanitize on user input/submit, but on render, which addresses xss

                        1. 2

                          Thanks for clarifying.

                      2. 5

                        “(On second thought, putting it in an iframe and doing everything server-side could work?)”

                        That’s the approach I was thinking of, and then just have the engine recompile the comments thread whenever a new comment is accepted.

                        At some point though one wonders if it makes sense.

                        1. 2

                          “At some point though one wonders if it makes sense.”

                          Word.

                          The demo right now is served by a $2.5 vultr instance and for self-hosters, it makes perfect sense to simply respond with json, and let the client-side do all the expensive sanitization and rendering.

                          1. 6

                            If sanitization is done client-side, can’t it be bypassed with javascript hackery?

                            1. 3

                              The comments are sanitized on render, not on user input/submit. So a user’s xss scripts could actually be stored in the server’s database but would not be executed on client-side render.
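
                              A minimal sketch of the idea, assuming plain HTML escaping in place of the showdown + dompurify pipeline mentioned elsewhere in this thread; `escapeHtml` and `renderComment` are hypothetical names, not the engine's actual functions:

                              ```javascript
                              // Hypothetical helper: escapes the five HTML-significant characters.
                              function escapeHtml(text) {
                                return String(text)
                                  .replace(/&/g, '&amp;')
                                  .replace(/</g, '&lt;')
                                  .replace(/>/g, '&gt;')
                                  .replace(/"/g, '&quot;')
                                  .replace(/'/g, '&#39;');
                              }

                              // The stored comment is left untouched; only the markup
                              // produced at render time is neutralized.
                              function renderComment(comment) {
                                return '<div class="comment"><b>' + escapeHtml(comment.author) +
                                  '</b><p>' + escapeHtml(comment.body) + '</p></div>';
                              }

                              // What the database might hold after a hostile submission.
                              const stored = { author: 'mallory', body: '<script>alert(1)</script>' };
                              const html = renderComment(stored);
                              ```

                              The script tag stays in `stored.body` as submitted, but `html` only contains its escaped form, so the browser shows it as text instead of running it.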

                              1. 2

                                Why not just sanitize on upload? Seems simpler.

                                1. 6

                                  When (not if) you find new issues in your sanitizer you’ll have to reprocess all your existing entries. It’s easier to run it on the fly when needed, with the latest version of your cleanup code.

                                  1. 2

                                    Great point! This extra benefit did not cross my mind.

                                  2. 2

                                    It is indeed simpler, but based on the limited research I did on xss, sanitizing on render is the correct way to do it.

                                    1. 4

                                      Validate on input. Sanitise on render.
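
                                      The "validate on input" half could look roughly like this. It is only a sketch: `validateComment`, the field names, and the length limit are assumptions for illustration, not anything the engine defines:

                                      ```javascript
                                      // Hypothetical limit, chosen only for the example.
                                      const MAX_BODY_LENGTH = 4000;

                                      // Returns a list of problems; an empty list means the comment may be stored.
                                      // It checks shape and size only. Markup in the body is NOT rejected here,
                                      // because neutralizing it is the job of the render step.
                                      function validateComment(input) {
                                        if (input === null || typeof input !== 'object') {
                                          return ['comment must be an object'];
                                        }
                                        const errors = [];
                                        if (typeof input.author !== 'string' || input.author.trim() === '') {
                                          errors.push('author is required');
                                        }
                                        if (typeof input.body !== 'string' || input.body.trim() === '') {
                                          errors.push('body is required');
                                        } else if (input.body.length > MAX_BODY_LENGTH) {
                                          errors.push('body is too long');
                                        }
                                        return errors;
                                      }
                                      ```

                                      This has to run on the server to mean anything, since a client can skip any check that lives in the browser.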

                                2. 2

                                  Security-related stuff can never be done on the client when you don’t trust the client. It’s the whole reason for the investment in Trusted Computing.

                                  1. 1

                                    Man in the middle, or if you’ve exploited whoever is serving the JS, it can certainly be bypassed. But in that case, why bother making a comment when you can just insert JavaScript wherever you want in the page itself.

                                    JavaScript hackery in the comment? Only if you wish to attack yourself. All other clients/users will still sanitize it away, because they don’t have your manipulated variant that does not have sanitization. That said, improper sanitization or broken sanitization would be vulnerable, but that’s merely because it’s, well, broken.

                                    If I’m wrong (I have this vague feeling that I am), please, do tell me.

                                    1. 2

                                      I interpreted sanitization to mean “this comment is approved for submission into the database”. If you can override that sanitization function to always return true, then you can submit whatever comments you like.

                                      It would be something of a waste to sanitize after-the-fact. Imagine watching the page load hundreds of spammy comments, then hide them. Not only would it look odd, but it would slow down the page load.

                                      Even worse, a comment could attempt SQL injection to attack the server. SQL sanitation should always be done on the server.

                                      1. 1

                                        Ah, yes, of course. SQL sanitation, and anything of a similar nature, should be done server-side. I was thinking more along the lines of unexpected HTML appearing in comments as opposed to an attack on the server.

                                        Additionally, comments could be sanitized on an as-needed basis, and be hidden by default.

                                        1. 1

                                          “I interpreted sanitization to mean ‘this comment is approved for submission into the database’.”

                                          That is how it is usually interpreted. But in this case, the sanitization is done on render.

                                          “Even worse, a comment could attempt SQL injection to attack the server. SQL sanitation should always be done on the server.”

                                          It’s a good thing that the backend (leveldb) is immune to sql injection :-)

                                          1. 1

                                            “It would be something of a waste to sanitize after-the-fact. Imagine watching the page load hundreds of spammy comments, then hide them. Not only would it look odd, but it would slow down the page load.”

                                            Spammy comments (xss or not) are handled by a moderator, who strips the content and marks it “flagged as spam” or “flagged as inappropriate”, like other news sites do.

                                            I’ll be adding comment updates (for moderators only?) in the next release.

                                            P.S. Might add realtime support after that since the current http backend is powered by uWebSockets

                                          2. 1

                                            “Man in the middle, or if you’ve exploited whoever is serving the JS, it can certainly be bypassed. But in that case, why bother making a comment when you can just insert JavaScript wherever you want in the page itself.”

                                            Right. Barring any openssl vulnerabilities, serving the comments/assets over https should prevent that?

                                            1. 1

                                              It should, but an attacker could, through various means, convince the target to trust their certificate, perhaps by asking them to “click some buttons” to enable access to a different website. Also, the vast majority of people would probably ignore warnings that are given by browsers about certificates, depending on the warning. Or, perhaps the attacker has crafted their own certificates. Or some of your HTTPS-ified code happens to request something over HTTP for some reason.

                                              The world is fraught with peril! But yes, it should prevent that.

                                      2. 1

                                        “That’s the approach I was thinking of, and then just have the engine recompile the comments thread whenever a new comment is accepted.”

                                        This was what I arrived at in a high-level design, too. It’s also simple enough to generate that the generator could also be highly optimized and safe with current tooling. The analysis part is overhead that would already happen in a dynamic design so is immaterial.

                                      3. 2

                                        Yeah but that would kind of defeat the purpose of a static blog. An option would be to let a commenter commit a file to the blog (not sure how) and then somehow include the file in the blog post. Recompiling could be done with git hooks but the other stuff I am not sure about.

                                        1. 5

                                          “Yeah but that would kind of defeat the purpose of a static blog”

                                          Blogs are fairly static, updates are very seldom. Commenting however, is dynamic and would be better implemented with a database imho, especially if you consider the very-nested nature of comments. How would you model that efficiently from static files?

                                          For me it boils down to efficiency. Blogs are most efficiently served by static files, and comments by a database. (edited for formatting)

                                          1. 3

                                            “How would you model that efficiently from static files?”

                                            Pick a data structure that’s good for working with comments, read/dump that in efficient format (eg LISP expressions), and cache popular files in memory.
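
                                            A rough sketch of what I mean, assuming each comment is stored as a flat record with an `id` and a `parentId` (both hypothetical field names), and using JSON as a stand-in for whatever dump format you prefer:

                                            ```javascript
                                            // Turns a flat list of comments into a nested thread.
                                            // A comment whose parent is missing is treated as a root.
                                            function buildThread(flat) {
                                              const byId = new Map();
                                              for (const c of flat) {
                                                byId.set(c.id, { ...c, replies: [] });
                                              }
                                              const roots = [];
                                              for (const node of byId.values()) {
                                                const parent = node.parentId == null ? undefined : byId.get(node.parentId);
                                                if (parent) {
                                                  parent.replies.push(node);
                                                } else {
                                                  roots.push(node);
                                                }
                                              }
                                              return roots;
                                            }

                                            const flat = [
                                              { id: 1, parentId: null, body: 'first' },
                                              { id: 2, parentId: 1, body: 'reply to first' },
                                              { id: 3, parentId: 2, body: 'reply to the reply' },
                                              { id: 4, parentId: null, body: 'second' },
                                            ];
                                            const thread = buildThread(flat);

                                            // The nested form can be dumped to a file and cached in memory as-is.
                                            const dumped = JSON.stringify(thread);
                                            ```

                                            Building the tree takes two passes over the list, and the dumped file only needs regenerating when a new comment is accepted.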

                                            1. 2

                                              Hmm. Doing that in c++ with a serialization library like capnproto could definitely work. I believe this is the perfect usecase for the said library.

                                              Using flatbuffers would not be efficient since you cannot modify the buffers after building them (for persistence). Although for messaging, it is great (I’ve deployed it in my projects).

                                          2. 1

                                            ikiwiki is an (old and mature) static blog (and wiki and whatever) engine. It has plugins for comments. It uses a CGI script which takes the form inputs and regenerates the pages.

                                      1. 2

                                        (local Markdown dweeb steps into scene)

                                        Would you consider using something like markdown-it for CommonMark compliance?

                                        1. 1

                                          Sure. A PR would be very much appreciated :-)

                                          1. 1

                                            I was halfway there to a dev environment, but unfortunately protostuffdb doesn’t include a macOS binary!

                                            1. 1

                                              It is very trivial to support since it’s a unix. But I don’t have a macos machine to test against :-/

                                              As a temporary workaround, you can still develop the client-side part.

                                              Just edit index.html and change:

                                              window.rpc_host = 'http://127.0.0.1:5020'
                                              

                                              to

                                              window.rpc_host = 'https://rpc.dyuproject.com'
                                              

                                              And comment out:

                                              window.comments_post_id = 1
                                              
                                              1. 2

                                                Thanks! I’ve made the PR. I certainly understand the concern about code size, so feel free to ignore or otherwise hack apart the PR as you see fit.

                                                1. 1

                                                  Thanks for the PR!

                                                  I’m thinking this could be the default impl and have showdown as an extra dist/build.

                                                  Anyone else think this is a good idea?

                                                  I could see some (most?) js devs out there think like:

                                                  150kb minified for a comment engine? No thanks.

                                                  Especially when it is hosted on their site/blog

                                                2. 1

                                                  Thanks! I’ve made the PR. I certainly understand the concern about code size, so feel free to ignore or otherwise hack apart the PR as you see fit.

                                            2. 1

                                              Looking at the minified size, markdown-it seems to be almost 3x larger than showdown. We could definitely support 2 js distributions, giving the user the option to choose.

                                            1. 2

                                              This is the last week of my “sabbatical” and, knowing that it’s coming to an end, I’ve been a lot better about spending my time on projects.

                                              Right now I’m working on writing a note-taking app. It’s mostly an excuse to learn how to use GTK+ with rust. It’s my first non-completely-trivial thing I’ve done with GTK, so it’s been quite a learning experience. However, I’m hoping to also end up with a useful app at the end.

                                              Screenshot of what I’ve got so far: http://tinyimg.io/i/pvIpNm9.png and the super messy code: https://gitlab.com/azdle/onefold Still have a long way to go.

                                              1. 2
                                                1. 1

                                                  merp. Was set to private, public now.

                                                  1. 2

                                                    Don’t worry. I can relate. I’ve a lot of projects on gitlab but only a few are public :-)

                                              1. 4

                                                I just finished my 1-week sprint of a plug-n-play comment engine for websites and especially static blogs.

                                                Now time for some rest :-)

                                                1. 1

                                                  Hey I thought about something similar, sounds pretty cool :) what I would personally like is a system that works without Javascript but I am not sure if that is possible at all in a static blog setting.

                                                  1. 1

                                                    What’s cool? The sprint part or the commenting part? Answering you on your other post.

                                                1. 2

                                                  Working on getting my first private beta release ready for my Electron-based remote Linux server admin tool (http://www.serverwrangler.com/).

                                                  1. 1

                                                    Interesting. I subscribed even though I manage all my servers via ssh cli