1. 4

    has anyone gotten a blue lobster yet?

    1. 2

      It seems certain given our volume of traffic, but I don’t check the logs for it and they get logrotate’d out after a week or two. Might be fun to add a cron job to the Lobsters repo to grep for them.

      1. 4

        you should make it so the person who gets the blue lobster gets a blue lobster hat

        1. 2

          While I love the idea of a blue lobster hat, or even a listing of who saw the blue lobster and when, this could lead to some people trying to abuse the system to get the achievement, and by the nature of how that would go, the process would generate arbitrary load on lobsters.

      2. 1

        rain1’s profile pic was a blue lobster.

      1. 2

        Add dynamic typing, which has to do all type checks at runtime, and compile-time optimizations become less trivial. So number crunching is best left to the JVM, Go, or other languages that compile to native code. It is no surprise that floating-point operations on Erjang, the version of Erlang running on the JVM, were 5000% faster than on the BEAM.

        You could use a Rust/C++/C NIF for the number-crunching part and still benefit from Erlang’s concurrency. Isn’t this the case?

        1. 2

          You could use a Rust/C++/C NIF for the number-crunching part and still benefit from Erlang’s concurrency. Isn’t this the case?

          Regarding number-crunching performance across languages: the most intensive computations frequently performed now are orchestrated by Python scripts using numpy and various machine learning libraries. It’s not Python doing the calculations, it’s heavily optimized C++ and Fortran - I mention that to put pure number-crunching power in perspective against the ability to interoperate with a language that has it.

          Erlang is reduction based, and writing a NIF is potentially problematic for long-running computations, as that would block your process from generating reductions and yielding control back. You would write a NIF for quick ops, but not for something that takes a long time to compute. Erlang also has another mechanism called ports, where you communicate with an external program over a byte-oriented interface. That way you can have a dedicated component perform the heavy computations and have Erlang feed data to it and consume the results. It will likely not be great for all purposes, but it should be more than enough for a lot of use cases.

        1. 5

          To be honest, I’d choose Go, C# ASP.net MVC or Node.js. Node.js would probably be the easiest to start with, but any tricky numeric logic would be better served in Go or C#.

          It really depends on the project, but those three languages stand out in my head for not being crazy hard to learn and having decent websocket support.

          Elixir also has strong support for websockets and concurrency, but it can be a bit much to learn.

          1. 5

            What about learning Elixir is difficult? Understanding BEAM? I was considering Phoenix.

            1. 7

              I am currently learning Erlang (properly this time around) for fun, intend to dig into Elixir and Phoenix after that.

              I don’t think you need to know Erlang or BEAM internals to start with Elixir. I think the parent might be referring to the scope of the ecosystem - there is a lot you can learn, like OTP, distributed computing, and fault tolerance. If you want to start from Erlang, I went with Joe Armstrong’s thesis and am now following up with Learn You Some Erlang.

              I had past Rails experience, and Phoenix seems very easy to jump into from a Rails perspective; it’s also pretty impressive. Just a few days ago I stumbled on this video, which shows how to build a real-time Twitter clone in 15 minutes. If you have a Rails background you’ve probably seen a very similar video for Rails about building a blog. Give it a try and see if that’s what you are looking for in a framework.

              Erlang and Elixir have both been heavily used in the gaming industry. For example, the Call of Duty backends use(d?) it, and Discord is also using it. I once found a slide deck showing off Elixir for a game backend but can’t locate it atm.

              It would need to support secure websockets and have decent concurrency (think a game).

              It does seem to fit what you want to build, though of course that’s judging only from the quote above. It might be perfect or completely unacceptable based on what exactly you want to do.

              1. 4

                Elixir on its own isn’t too bad. I’ve tried to learn Phoenix a couple of times for side projects. At least for me, learning Phoenix will have to be a project unto itself. Not that it’s crazy; there are just enough moving parts exposed at once by a default Phoenix application that feeling the need to understand them all gets a bit overwhelming for a small project.

                There’s also the fact that Elixir is functional and immutable, which can be an adjustment as well.

                If you’re familiar with Rails, Phoenix might be easier to understand, but I don’t have that background.

                That being said, if it’s highly interesting to you, it’s not a crazy amount to learn.

            1. 3
              jamie=# select null = null;
               ?column?
              ----------
               f
              (1 row)
              

              What version of Postgres behaves like this?

              fudo=# select version();
                                                                version                                                  
              -----------------------------------------------------------------------------------------------------------
               PostgreSQL 9.2.10 on x86_64-unknown-freebsd10.4, compiled by gcc (FreeBSD Ports Collection) 4.8.5, 64-bit
              (1 row)
              
              fudo=# select null = null;
               ?column? 
              ----------
               
              (1 row)
              
              fudo=# 
              
              fudo=# select version();
              ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
              │                                                                       version                                                                       │
              ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
              │ PostgreSQL 11.5 on amd64-portbld-freebsd12.0, compiled by FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1), 64-bit │
              └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
              (1 row)
              
              Time: 0.581 ms
              fudo=# select null = null;
              ┌──────────┐
              │ ?column? │
              ├──────────┤
              │          │
              └──────────┘
              (1 row)
              
              Time: 0.275 ms
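
              For the record, the empty cell is how psql prints NULL by default: null = null evaluates to NULL under SQL’s three-valued logic, rather than to false. If you want a comparison that treats two NULLs as equal, IS NOT DISTINCT FROM does that:

              fudo=# select null = null as equals, null is not distinct from null as not_distinct;
               equals | not_distinct
              --------+--------------
                      | t
              (1 row)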
              
              1. 1

                Oh oops, I must have screwed up while pasting stuff. Thanks.

              1. 14

                So looking at their pricing page, it looks like a number of features will only be available on public repos, and you still need to get a plan to get those features on your private repos (draft PRs, code owners, protected branches; scroll down a bit on that page for a table).

                Also looks like GitHub actions and the package storage is a big part of their pricing strategy now, whereas before (up to a year ago or so) it was mostly about charging for private repos.

                1. 2

                  It’s interesting to me that GitHub seems to be aligning with Bitbucket on this one (https://bitbucket.org/product/pricing) rather than Bitbucket playing catch up.

                  I have to admit this makes me less likely to want to work on my personal open source git hosting project, because moves like this make users less likely to move over… a small open source solution will never have the same feature set that something like GitHub does.

                  1. 20

                    I have to admit this makes me less likely to want to work on my personal open source git hosting project

                    So I started working on my own analytics project last year, with the plan to make this a sustainable source of income, and every time someone announces a “we’ve built a new analytics product” here or on HN I’m a little bit discouraged. When I started last year, there were very few alternatives; now: many more.

                    Then I look at the new products, and many are neat and seem well-built but … they’re also different than what I’m building in various ways (technical, UI, business), so then I’m like “this is okay, no problem!”

                    This is also the value stuff like Sourcehut gives us. For me, personally, I think sourcehut is a terrible product, I would not enjoy using it at all because the workflow just doesn’t jibe with me. But clearly it is useful for many people, and it’s certainly offering something different than the GitHub and clones of the world, so I think it’s a cool project just for that, regardless of my personal opinion of it.

                    If you’re working on a “GitHub clone”, then yeah, it may not be worth it to continue (which was already the case before, since there are a whole bunch of them already). But if you’re working on something that solves the same problem in a different way: then there’s certainly value in that, and if you do it well enough you should get users/customers regardless of what GitHub is doing.

                    That’s just my 2c on that anyway; focus on your own story and don’t be too distracted by other people’s stories.

                    1. 5

                      So I started working on my own analytics project last year, with the plan to make this a sustainable source of income, and every time someone announces a “we’ve built a new analytics product” here or on HN I’m a little bit discouraged. When I started last year, there were very few alternatives; now: many more.

                      Usually you should be happy when this happens. Competition showing up is validation that the area you are investing in or the project you are building actually has a market. Don’t get discouraged, the pie is often big enough to feed everyone at the table and it’s better to eat a pie in company than stare at an empty plate alone.

                      1. 2

                        Yeah, agreed, and I’m happy there are alternatives people can choose from. Also helps keep me sharp.

                      2. 3

                        Wow, thanks for the quality response. I’ve been feeling a bit down lately and I think a bit of it just came out in my original post… I wasn’t expecting such a well thought out response.

                        At least as far as I can tell, there seem to be a bunch of missing tools that stick to simplicity. Goatcounter does a fantastic job of this in the analytics space. That was one of the original goals of my project as well - a simple way to securely deploy git repos, definitely not a GitHub clone. The original plan was mostly for personal use, but after taking a closer look at Goatcounter, maybe I’ll have to see if I can come up with some small enterprise-focused features as well.

                        In any case, thanks for the encouragement, I really do appreciate it.

                      3. 2

                        this makes me less likely to want to work on my personal open source git hosting project

                        I get what you are saying, but on the other hand this is the best time to push for such solutions. I think there is room for more of these types of projects.

                        1. 1

                          I don’t know, GitHub should catch up to the others at this point, imo. Paid private repos were always a negative thing to me, and when they sold to Microsoft, a lot of people’s trust in GitHub was betrayed. It’s a good move for them right now and should have been done sooner, in my opinion. I do find myself using alternatives for private projects, and GitHub is sort of a portfolio for me. I don’t really see myself migrating all my private repos to GH in the future, though.

                      1. 11

                        Every time an on-disk database page (4KB) needs to be modified by a write operation, even just a single byte, a copy of the entire page, edited with the requested changes, is written to the write-ahead log (WAL). Physical streaming replication leverages this existing WAL infrastructure as a log of changes it streams to replicas.

                        First, the PostgreSQL page size is 8KB and has been that since the beginning.

                        The remaining part: according to the PostgreSQL documentation[1] (on full_page_writes, which decides whether those copies are made), a copy of the entire page is only written to the WAL on the first modification of that page since the last checkpoint. Subsequent modifications will not result in full-page writes to the WAL. So if you update a counter 3 times in sequence you won’t get 3*8KB written to the WAL; instead you get a single page dump, and the remaining two modifications only log the row-level change, which is much smaller[2]. This is further reduced by WAL compression[3] (reducing segment usage) and by increasing the checkpointing interval, which reduces how often those copies happen[4].

                        This irked me because it sounded like whatever you touch produces an 8KB copy of data and it seems to not be the case.

                        [1] - https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-FULL-PAGE-WRITES

                        [2] - http://www.interdb.jp/pg/pgsql09.html

                        [3] - https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-WAL-COMPRESSION

                        [4] - https://www.postgresql.org/docs/11/runtime-config-wal.html#GUC-CHECKPOINT-TIMEOUT
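
                        If you want to check how a given server is configured, all of the settings involved are visible from psql; the defaults noted in the comments are the stock ones, assuming your packaging hasn’t changed them:

                        show block_size;          -- 8192: the 8KB page size discussed above
                        show full_page_writes;    -- on: copy a page to the WAL on its first write after a checkpoint
                        show wal_compression;     -- off by default; compresses those full-page images [3]
                        show checkpoint_timeout;  -- 5min by default; raising it means fewer full-page copies [4]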

                        1. 8

                          Under BeOS, to provide some compatibility with Unix, files had an owner and a group. The owner of all files was user ID 0, whose name was “Baron”

                          1. 5

                            In OpenSolaris, the default username was “jack”, for whatever reason. A never-ending source of confusion for all people trying OpenSolaris for the first time.

                            1. 3

                              Similarly, Oracle has a default user of Scott with password Tiger

                              In the online Select Star mailing list, the question was asked: why was Scott’s password “tiger”? Nearly every response got it right.

                              It was the name of Scott’s pet cat, Tiger.

                              And who was Scott? His first name was actually Bruce, not Scott. Bruce Scott was employee number 4 at the then Software Development Laboratories, which eventually became Oracle. He co-authored and co-architected Oracle V1, V2 & V3.

                              For those that do not know, Bruce Scott went on to co-found Gupta Technology (now Centura Software) with Umang Gupta in 1984. Where is he today? He’s the founder and CEO of PointBase, Inc.

                              http://www.dba-oracle.com/t_scott_tiger.htm

                          1. 67

                            What should people use instead?

                            Real secure messaging software. The standard and best answer here is Signal,

                              Oh please. They aren’t even close to sharing the same level of functionality. If I want to use Signal, I have to commit to depending on essentially one person (moxie), who is hostile towards anyone who wants to fork his project and who completely controls the server/infrastructure. And I’d have to severely limit the options I have for interfacing with the service (1 Android app, 1 iOS app, 1 Electron [lol!] desktop app). None of those are problems/restrictions with email.

                            I don’t know what the federated, encrypted ‘new’ email thing looks like, but it’s definitely not Signal. Signal is more a replacement for XMPP, if perhaps you wanted to restrict your freedom, give away a phone number, and rely on moxie.

                            1. 14

                              I think Matrix is getting closer to being a technically plausible email and IM replacement.

                                The clients don’t do anything like HTML mail, but I don’t think I’d miss that much, and the message format doesn’t forbid it either.

                              1. 28

                                If you can’t send patches to mailing lists with them then they’re not alternatives to email. Email isn’t just IM-with-lag.

                                1. 5

                                  Email can be exported as text and re-parsed by Perl or a different email client.

                                  Until that functionality is available, I won’t consider something a replacement for email.

                                  1. 4

                                    In all fairness: cmcaine says “Matrix is getting closer”.

                                    1. 3

                                        Matrix is a federated messaging platform, like XMPP or email. You could definitely support email-style use of the system; it’s just that the current clients don’t support that. The protocol itself would be fine for email, mailing lists, and git-send-email.

                                      The protocol also gives you the benefits of good end-to-end encryption support without faff, which is exactly what general email use and PGP don’t give you.

                                      1. 2

                                          Adding a patch workflow to Matrix is no different from adding it to XMPP or any other messaging solution. Yes, it is possible, but why?

                                          I can understand that you like Matrix, but it’s not clear how Matrix is getting closer to an e-mail replacement with just one almost-stable server implementation and a spec that’s not an IETF standard. I’d say Matrix is more similar to an “open Signal” than to e-mail.

                                        1. 2

                                          “Getting closer” is a statement towards the future, yet all of your counter arguments are about the current state.

                                          1. 3

                                              If I knew the future I’d counter-argue with that, but given that the future is unknown, I can only extrapolate from the present and the past. Otherwise Matrix may be “getting closer” to anything.

                                            Do you have any signs that Matrix is getting e-mail patch workflow?

                                      2. 2

                                            Mailing lists could move to federated chatrooms. They moved from Usenet before, and some communities moved to forums before the now-common use of Slack.

                                            I’m not saying it would be the best solution, but it’s our most likely trajectory.

                                        1. 6

                                          Mailing lists existed in parallel with Usenet.

                                          1. 5

                                            Both still exist :)

                                            I do think, actually, that converting most public mailing lists to newsgroups would have a few benefits:

                                            1. It’d make their nature explicit.
                                            2. It’d let us stop derailing designs for end-to-end encryption with concerns that really apply only to public mailing lists.
                                            3. I could go back to reading them using tin.

                                                Snark aside, I do think the newsgroup model is a better fit for most asynchronous group messaging than email is, and dramatically better than chat apps, whether you read that to mean Slack or any of the myriad superior alternatives to Slack. But that ship sailed a long time ago.

                                            1. 4

                                              Mailing lists are more useful than Usenet. If nothing else, you have access control to the list.

                                              1. 2

                                                Correct, and the younger generation unfamiliar with Usenet gravitated towards mailing lists. The cycle repeats.

                                              2. 4

                                                    Mailing lists didn’t move to Slack, and Slack isn’t a mailing list. Slack is an instant messaging service. It has almost nothing in common with mailing lists.

                                                It’s really important to drive this point home. People critical of email have a lot of good points. Anyone that has set up a mail server in the last few years knows what a pain it is. But you will not succeed in replacing something you don’t understand.

                                                1. 4

                                                  The world has moved on from asynchronous communication for organizing around free software projects. It sucks, I know.

                                                  1. 3

                                                    Yeah. Not everyone, though.

                                                    Personally I think that GitHub’s culture is incredibly toxic. Only recently have there been tools added to allow repository owners to control discussions in their own issues and pull requests. Before that, if your issue got deep linked from Reddit you’d get hundreds of drive by comments saying all sorts of horrible and misinformed things.

                                                    I think we’re starting to see a push back from this GitHub/Slack culture at last back to open, federated protocols like SMTP and plain git. Time will tell. Certainly there’s nothing stopping a project from moving to {git,lists}.sr.ht, mirroring their repo on GitHub, and accepting patches via mailing list. Eventually people will realise that this means a lower volume of contributions but with a much higher signal to noise ratio, which is a trade-off some will be happy to make.

                                                    1. 2

                                                      Only recently have there been tools added to allow repository owners to control discussions in their own issues and pull requests. Before that, if your issue got deep linked from Reddit you’d get hundreds of drive by comments saying all sorts of horrible and misinformed things.

                                                      It’s not like you used to have levers for mailing lists, though, that would stop marc.org from archiving them or stop people from linking those marc.org (or kernel.org) threads. And drive-bys happened from that, too. I don’t think I’m disputing your larger point. Just saying that it’s really not related to the message transfer medium, at least as regards toxicity.

                                                      1. 3

                                                        Sure, I totally agree with you! Drive-bys happen on any platform. The difference is that (at least until recently) on GitHub you had basically zero control. Most people aren’t going to sign up to a mailing list to send an email. The barrier to sending an email to a mailing list is higher than the barrier to leaving a comment on GitHub. That has advantages and disadvantages. Drive-by contributions and drive-by toxicity are both lessened. It’s a trade-off I think.

                                                        1. 3

                                                          I guess I wasn’t considering a mailing list subscription as being meaningfully different than registering for a github account. But if you’ve already got a github account, that makes sense as a lower barrier.

                                            2. 5

                                              Matrix allows sending in the clear, so I suppose this has the “eventually it will leak” property that the OP discussed?

                                              (A separate issue: I gave up on Matrix because its e2e functionality was too hard to use with multiple clients)

                                              1. 5

                                                (A separate issue: I gave up on Matrix because its e2e functionality was too hard to use with multiple clients)

                                                    and across UA versions. When I still used it, I got bitten when I realized it derived the key from the browser user agent, so when OpenBSD changed how the browser presented itself I was suddenly unable to read old conversations :)

                                                1. 2

                                                  Oh! I didn’t know that!

                                            3. 5

                                              Functionality is literally irrelevant, because the premise is that we’re talking about secure communications, in cases where the secrecy actually matters.

                                              Of course if security doesn’t matter then Signal is a limited tool, you can communicate in Slack/a shared google doc or in a public Markdown document hosted on Cloudflare at that point.

                                                  Signal is the state of the art in secure communications because, even though the project is heavily driven by Moxie, you don’t actually need to trust him. The Signal protocol is open, and it’s basically the only one on the planet that goes out of its way to minimize server-side information storage and metadata. The phone number requirement is also explicitly a good design choice in this case: as a consequence, Signal does not store your contact graph - that is kept on your phone, in your contact store. The alternative would be that either users can’t find each other (defeating the point of a secure messaging tool) or that Signal would have to store the contact graph of every user - which is a far more invasive step than learning your phone number.

                                              1. 9

                                                even though the project is heavily driven by Moxie, you don’t actually need to trust him

                                                    Of course you must trust Moxie. A lot of Signal’s privacy rests on trusting them not to store certain data that they have access to. The protocol allows for the data not to be stored, but it gives no guarantees. Moxie also makes the only client you can use to communicate with his servers, and you can’t build it yourself, at least not without jumping through hoops.

                                                    The phone number issue is what’s keeping me away from Signal. It’s viral, in that everyone who has Signal will start using Signal to communicate with me, since the app indicates that they can. That makes it difficult to get out of Signal once it becomes too popular. I know many people who cannot get rid of WhatsApp anymore: they still need it for a small group, but cannot leave the larger network because their phone number is their ID, and you’re either on WhatsApp completely or you’re not. Signal is no different.

                                                And how can you see that a phone number is able to receive your Signal messages? You have to ask the Signal server somehow, which means that Signal then is able to make the contact graph you’re telling me Signal doesn’t have. They can also add your non-Signal friends to the graph, since you ask about their numbers too. Maybe you’re right and Moxie does indeed not store this information, but you cannot know for sure.

                                                What happens when Moxie ends up under a bus, and Signal is bought by Facebook/Google/Microsoft/Apple and they suddenly start storing all this metadata?

                                                1. 5

                                                      Signal is a 501(c)(3) non-profit foundation in the US; Moxie does not control it, nor is he able to sell it. In theory every organization can turn evil, but there is still a big difference between non-profits, which are legally not allowed to do certain things, and corporations, which are legally required to serve their shareholders, mostly by seeking to turn a profit.

                                                  And how can you see that a phone number is able to receive your Signal messages? You have to ask the Signal server somehow, which means that Signal then is able to make the contact graph you’re telling me Signal doesn’t have.

                                                      There are two points here that I’d like to make, one broader and one specific. In a general sense, Signal does not implement a feature until they can figure out how to do it securely, leaking as little information as possible. This has been the pattern for basically every feature Signal has. Specifically, phone numbers are the same: the Signal app just sends a cryptographically hashed, truncated version of the phone numbers in your address book to the server, and the server responds with the list of hashes that belong to Signal users. This means that the Signal server side knows whether any one person is a Signal user, but not their contact graph.

                                                  1. 3

                                                    In theory every organization can turn evil

                                                    Every organization can also be bought by an evil one. Facebook bought WhatsApp, remember?

                                                    The Signal app just sends a cryptographically hashed, truncated version of phone numbers in your address book

                                                        These truncated hashes can still be stored server-side and used to build graphs. With enough collected data, a lot of these truncated hashes can be reversed. Now I don’t think Signal currently stores this data, let alone does data analysis on it. But Facebook probably would, given the chance.

                                                    1. 6

                                                      Every organization can also be bought by an evil one. Facebook bought WhatsApp, remember?

                                                          WhatsApp was a for-profit company; 501(c)(3)s operate under quite different conditions. Not saying they can’t be taken over, but this argument doesn’t cut it.

                                                2. 3

                                                  The phone number requirement is also explicitly a good design choice

                                                      No, it’s an absolutely terrible choice, just like it is a terrible choice for “two-factor authentication”.

                                                      Oh, but Signal users can always meet in person to re-verify keys, which would prevent any SIM swap attack from working? No, this (overwhelmingly) doesn’t happen. In an era where lots of people change phones every 1-2 years, it’s super easy to ignore the warning, because 99% of the time it’s a false positive.

                                                  The alternative would be that either users can’t find each other (defeating the point of a secure messaging tool)

                                                      This is a solved problem. I mean, how do you think you got the phone numbers for your contacts in the first place? You probably asked them, and they probably gave them to you. Done.

                                                3. -8

                                                  Careful there… you can’t say bad things about electron in here….

                                                1. 2

                                                  Fuzziness warning:

                                                    I remember using PostgreSQL years ago, and I had trouble finding a canonical way of escaping user terms, e.g. including a quoted user-provided term in a more complex full text search query. Maybe there is an easy answer, but I did not find it.

                                                  1. 3

                                                      Not sure I understand your requirements, but were you looking for something like websearch_to_tsquery, which translates a Google-like query, i.e.

                                                        signal -"segmentation fault"

                                                      into

                                                        'signal' & !( 'segment' <-> 'fault' )

                                                      which means: find documents with signal present, and segmentation followed by fault not present.

                                                    https://www.postgresql.org/docs/11/textsearch-controls.html

                                                      It is quite new (it appeared in PostgreSQL 11), so it might not have existed when you last checked.

                                                    The <-> (followed by) operator appeared in PostgreSQL 9.6
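
                                                      A quick sketch of how the translated query behaves against documents (the strings here are hypothetical; websearch_to_tsquery needs PostgreSQL 11+):

                                                        select to_tsvector('english', 'signal handling in Erlang')
                                                               @@ websearch_to_tsquery('english', 'signal -"segmentation fault"');  -- t

                                                        select to_tsvector('english', 'a signal after a segmentation fault')
                                                               @@ websearch_to_tsquery('english', 'signal -"segmentation fault"');  -- f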

                                                    1. 1

                                                      Probably shouldn’t have commented. It was around 9 years ago.

                                                        I wanted to insert a user-provided term into a premade query string, fully quoted so that it means “exact match”. As far as I remember, I did not find documentation on how to quote safely in the query language, nor a function to do so.

                                                        Edit: I mean quoting inside a query string, not on the SQL level.

                                                      1. 2

                                                        I have run into this fairly recently.

                                                          There’s a builtin function for ‘simple’ search (which is almost what you’ve asked for), but there’s still no ‘just find documents that match this string even if it contains query directives’.

                                                        1. 1

                                                            A bit weird, isn’t it?

                                                            It assumes that if we want to provide more complex queries to the user, they have to use one of the syntaxes provided at the database level. If you want to translate from a different query language, you really need quoting.

                                                            It would probably be easy to provide inside PostgreSQL as a function.

                                                          1. 1

                                                              It’s not weird. Full text search works because it tokenizes and stems complex documents, then performs a search on that.

                                                              This way the words jump, jumping, and jumped all get indexed as ‘jump’, and a search for any of those forms will match. Searching for text verbatim as it stands, including all the stop words that get evicted during parsing/stemming, would undo all the performance you gain from full text search, and at that point you’re just doing regular string matching.

                                                            1. 1

                                                                That’s not what pkolloch or I are talking about.

                                                              to_tsquery parses its inputs, which are interpreted in the postgres tsquery language. There is no builtin input-escaping function for that language. As far as I know, the characters :|*'&() have special meaning, but there may be others.

                                                              You can use plainto_tsquery if you only have user input, but if you want to combine user input with non-user input for powerful searches, you are going to run into trouble quite rapidly.
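
                                                                For what it’s worth, the workaround I know of: parse the untrusted text with plainto_tsquery (or phraseto_tsquery, since 9.6, when you want a quoted exact-phrase match) and combine the result with the trusted part using the && operator on tsquery values. A sketch:

                                                                  select to_tsquery('english', 'postgres & replication')
                                                                         && phraseto_tsquery('english', $$user text with :|*'&() inside$$);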

                                                          2. 1

                                                            You want prepared statements

                                                            https://www.postgresql.org/docs/11/sql-prepare.html

                                                              which can safely interpolate and quote user-provided inputs. Slightly less safe (easier to misuse) are the string manipulation functions from the quote_ family, and format (https://www.postgresql.org/docs/11/functions-string.html).

                                                              Additionally, most good database drivers handle proper quoting themselves (like psycopg2 for Python) - there are a lot of gotchas, but in all those cases it is a solved problem.
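
                                                              A minimal sketch of the prepared-statement route, with hypothetical table and column names (docs, title, tsv):

                                                                prepare search_docs(text) as
                                                                  select title
                                                                  from docs
                                                                  where tsv @@ plainto_tsquery('english', $1);

                                                                execute search_docs($$anything the user typed, 'quotes' & all$$);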

                                                            1. 1

                                                              Yes, I used prepared statements or some quoting framework support.

                                                              Not talking about quoting on the SQL level, that’s clear but inside of a query string.

                                                      1. 0

                                                        by replicating only files we can easily achieve a simple synchronization solution without tempering with databases itself

                                                          This makes it immediately not useful with most databases (MySQL, PostgreSQL, SQLite, Oracle), as a snapshot of disk state is not a recoverable consistent point.

                                                        They seem to be aware of that:

                                                        Although the base idea is very simple, it can only be adapted with databases that meet certain criteria. Firstly data updates must be operated atomically on a single file, so that the peer-to-peer system can propagate individual updates. Secondly most database systems use memory mapped files and mutate data in unpredictable positions all over the files of its transaction log and mutable indices.

                                                        They open up with

                                                        For simplicity we leave the conflict resolution for another time.

                                                          That is one of the main hard problems to solve in database replication. If they don’t even approach it, then what does this have to offer?

                                                        1. 1

                                                          It’s just a proof of concept used to explain the general idea of replication. The point is to explain the idea with a simple example.

                                                        1. 7

                                                            The big bad boy servers like Postgres have quite a setup cost…

                                                            i.e. it will take you a fair amount of time and effort before you can even say select 'hello world';

                                                          Can I recommend sqlite…?

                                                            Incredibly small, simple, easy, fast, and rock solid.

                                                            It’s utterly ubiquitous (odds are it is in your browser AND your phone!). And not just one instance, either.

                                                          The documentation is excellent https://sqlite.org/lang.html

                                                          And the gui tools are lovely. https://sqlitebrowser.org/dl/

                                                            Or you can get a Firefox addon that allows you to go poking about all the sqlite DBs in your browser (or elsewhere): http://add0n.com/sqlite-manager.html

                                                          Lastly I cannot help but have my favourite whinge.

                                                          Database professionals have lost their way hopelessly.

                                                          Their core ideas come from relational algebra…. and they have forgotten them or never quite understood them.

                                                          Once you have understood the basics of SQL and want to level up…..

                                                          Read “SQL and Relational Theory: How to Write Accurate SQL Code” by C.J. Date and take him deadly seriously.

                                                          Suddenly your SQL becomes much much smaller and more reliable.

                                                          Of course, I suggested you move away from Postgres unless you need it……

                                                          …but when you do need a serious SQL server, I strongly recommend it.

                                                          1. 2

                                                              The big bad boy servers like Postgres have quite a setup cost…

                                                              i.e. it will take you a fair amount of time and effort before you can even say select 'hello world';

                                                            in what context? $package_manager install, add a user/db, then open the command line client and start writing SQL? roughly the same for MariaDB and postgres

                                                            1. 1

                                                                Modern package systems have made it hugely easier than it was… but still, it’s a fair bit to read and understand (what’s a user, what’s a db, what’s a role, oh dear, the DB notion of users and grants is utterly unrelated to the Unix notion… but I need to have both), and then there’s some stuff to do before you’re ready to go, compared to sqlite.

                                                              If the aim is just learning sql, sqlite is just so easy.

                                                              1. 1

                                                                  I thought that too. The main counterpoint I can think of is the backup strategy of “copy the file” vs pg_dump, but tbh even that isn’t a huge amount.

                                                                1. 1

                                                                  Main counterpoint I can think of is the backup strategy of “copy the file” vs pg_dump

                                                                  “copy the file” is not a secure backup strategy for sqlite ;) Regardless of the tech you choose you will have some homework to do: https://www.sqlite.org/backup.html

                                                                  1. 2

                                                                    “Copy the file” will lose your most recent writes, but I thought it always yields a transactionally-consistent view of the database (it has to, as a power failure could stop writes at any time).

                                                                    I take daily backups, and I don’t care if they are actually backing up data from 10 minutes earlier than the scheduled backup time.

                                                                    1. 1

                                                                      “Copy the file” will lose your most recent writes, but I thought it always yields a transactionally-consistent view of the database (it has to, as a power failure could stop writes at any time).

                                                                      You are mixing up two failure cases. The sqlite DB itself is transactionally-consistent and will recover on power loss during writes.

                                                                        A ‘dumb copy’ of a consistent sqlite file may end up corrupt if something fails mid-copy. You’re at the mercy of the filesystem to recover that, and even then, the resulting file may not be usable by sqlite itself, as there is no write-ahead log for the copy operation that the db could recover from.

                                                                      The docs I linked above mention exactly that scenario:

                                                                      if a power failure or operating system failure occurs while copying the database file the backup database may be corrupted following system recovery.

                                                                      1. 1

                                                                        I’d be astonished if a power failure mid-backup ever resulted in a working backup.

                                                                          That said, it looks like VACUUM INTO is probably what I actually want.
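
                                                                          For reference, since SQLite 3.27 that is a single statement (destination path hypothetical); the sqlite3 shell’s .backup command, which uses the online backup API from the page linked above, is the other safe option:

                                                                            VACUUM INTO '/backups/app-copy.db';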

                                                              2. 2

                                                                Database professionals have lost their way hopelessly.

                                                                Their core ideas come from relational algebra…. and they have forgotten them or never quite understood them.

                                                                How so? What do you base this assertion on? I will agree that many people who work with databases lack basic relational theory knowledge, and that this is a problem. But what is specific about “database professionals”? How does it differ from [insert your programming language] professionals?

                                                                Once you have understood the basics of SQL and want to level up….. Read “SQL and Relational Theory: How to Write Accurate SQL Code” by C.J. Date and take him deadly seriously.

                                                                This is certainly good advice. Note the word basics: if one doesn’t dig into relational theory fundamentals, then only the very basics of SQL can really be understood.

                                                                I would also recommend the original paper by Codd, where it all started (so to speak): https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

                                                                1. 2

                                                                  Reliance on autoincrement keys is a clue. It immediately tells you they don’t actually know what their primary key is.

                                                                2. 2

                                                                  I should definitely have mentioned SQLite. Even if, as some sibling comments mention, installing MySQL or Postgres is pretty easy with a package manager, new developers are far less likely to be stymied by bizarre admin or install errors with SQLite.

                                                                1. 3

                                                                  w3schools uuughhhhh

                                                                  https://meta.stackoverflow.com/questions/280478/why-not-w3schools-com

                                                                  I have reached a point with them that I literally filter them out in google searches.

                                                                  1. 2

                                                                    It’s funny to see a discussion on Stack Overflow bashing the quality of another resource when Stack Overflow itself is known for highly misleading and often damaging answers.

                                                                    1. 1

                                                                      Pot calling the kettle black, of course, but w3schools is historically pretty rough. They are simply saying to avoid using it as a citation because it is often incorrect. Now I hear they’ve tried to clean up their act and all that, but there are so many better resources. Even just getting a decent book on the subject would do you far better than trying to read w3schools.

                                                                  1. 1

                                                                    So what can you do? It turns out that in some databases like PostgreSQL, there is an easy solution for that:

                                                                    create index concurrently index_name on table(column);

                                                                    Yes, that’s it. This will switch the default locking behavior of the table and will create the index in parallel with the standard operations that your app is doing on the DB.

                                                                    This is dangerous advice. Before blindly following it, do read the documentation and understand the trade-offs. It’s not the default behavior for a reason:

                                                                    There are several caveats to be aware of when using this option — see Building Indexes Concurrently.

                                                                    source: https://www.postgresql.org/docs/11/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

                                                                    1. 1

                                                                      Good point. It made sense in our project, but as with everything there are drawbacks, and it is a bad idea to copy-paste from a blog into a project without doing your own research.

                                                                    1. 2

                                                                      Zero Zero Zero by Roberto Saviano, describing the global cocaine market. Famously, the book was found in one of El Chapo’s hideouts.

                                                                      Had they asked, authorities might have learned that Guzmán, leader of the notorious Sinaloa Cartel, would have preferred Zero Zero Zero, the 2013 book by Roberto Saviano that explored the world of drugs and dirty money. That was the book he left on his bedside table last October when he was forced to flee his hideout in Las Piedrosas, in the heart of Sinaloa Cartel country in the Golden Triangle.

                                                                      source: https://mexiconewsdaily.com/news/a-depressed-el-chapo-gets-to-enjoy-a-book/

                                                                      I also started reading on the bus to work. I should have done that ages ago. I didn’t before, as the trip is very short (roughly 8 minutes), but it’s still enough to make progress and spend the time better, and with increased traffic it is sometimes a quality of life change :)

                                                                      1. 2

                                                                        Slow count(*) in Postgres is actually a bit of a FAQ

                                                                        1. 7

                                                                          This is unrelated to the actual article and what it explores.

                                                                          The blog talks about the difference between count(1) and count(*). Many have historically considered the former faster, on the assumption that the database optimizes it as a constant not-null literal that doesn’t require re-evaluation, whereas count(*) would require evaluating the whole row.

                                                                          The blog debunks this and shows that what is considered faster is actually slower in PostgreSQL. What you are pointing out is actual counting performance, which is not measured in this post.
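
                                                                          Both forms count rows (count(1) counts rows where the constant 1 is non-null, which is always), so they always agree on the result; a quick sanity check:

                                                                            select count(*) as star, count(1) as one from generate_series(1, 10000);
                                                                             star  |  one
                                                                            -------+-------
                                                                             10000 | 10000
                                                                            (1 row)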

                                                                          1. 1

                                                                            Right, sorry for the confusion. It has not been my day!

                                                                        1. 1

                                                                          Nice article and a good general overview.

                                                                          The first linked youtube video is dead, removed. There is no way to check what it was linking to.

                                                                          There is no mention of jsonpath: https://www.youtube.com/watch?v=70dBszaO67A (PDF: https://www.pgcon.org/2019/schedule/attachments/539_jsonpath-pgcon-2019.pdf)

                                                                          Which is very relevant, considering that a large portion of the article compares query syntax and indexing. It is even relevant right now if you happen to use jsquery.

                                                                          edit: strikeout for the jsonpath nit, I see it is indeed mentioned in the conclusion.

                                                                          1. 9

                                                                            I recently finished Gomorra by Roberto Saviano, a very emotional, engaging, and enraging testimonial of devastation and corruption happening right under our noses. Read it if you think organized crime is distant, with no direct impact on your life.

                                                                            Right now I’m going through The Art of Strategy.

                                                                            1. 4

                                                                              I read Gomorra a couple of years ago. It’s indeed quite terrifying.

                                                                            1. 1

                                                                              Erm… I didn’t read the entire article, but all I do is createdb (and createuser) for each project. Is there anything wrong or worse with this approach vs. what is described in the OP?
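                                                                              
                                                                              (For reference, that pair of commands is roughly the SQL below, with a hypothetical project name:)

                                                                                create user myproject;
                                                                                create database myproject owner myproject;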

                                                                              1. 2

                                                                                Nothing wrong. This, however, is worthwhile because it makes it easier to kick-start people not used to Postgres. I recall an episode of @garbage.fm where our own @jcs, when asked why lobste.rs was using MySQL, said something to the effect that it was the database he was already used to, and that he never remembered how to quickly set up Postgres DB access for local app development, while MySQL just worked out of the box.

                                                                                1. 2

                                                                                  Not at all, same as what I do. If one of the main issues is getting new developers up and running, I’ve found https://postgresapp.com/ super useful, as you can have multiple Postgres versions running and it handles updates and all that for you. It’s Mac only, though; I don’t think it would be too hard to make something similar for other platforms as well.

                                                                                  One of the first things I do in a db-backed project is set up db migration and db instantiation code, so that I can get the project up and running with minimal fuss going forward. It’s too easy to get into a kludge of SQL and other scripts peppered all over, with lost one-off statements modifying your database schema/data while you’re in early/heavy development. Getting db migrations working right away means new developers on the project don’t have to worry about that stuff too much if it isn’t in their domain.

                                                                                  1. 2

                                                                                    The difference is that your approach changes the global state of the system, including for example having a daemon running and potentially listening on the network, as mentioned in the article.

                                                                                    With the article’s approach, you don’t have to worry about name collisions between database names, or about leaving behind junk in the system after getting rid of a project.