Threads for bruth

  1. 3

    I understand the edge case and solution, but I personally have never run into this. I suppose when I put a uniqueness constraint on a column, I always expect it to have a value (and thus I also make it NOT NULL). I have a hard time thinking of a real use case where you only want one null value in a column.

    1. 1

      I’ve often relied on NULL not being unique in unique constraints on multiple columns, some of which are optional. For example, if you’re using a kind of polymorphism, you might have a set of foreign key columns, which are supposed to be unique combined with some other value. But yeah, just for a single column it’s a bit weird.
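
      A minimal sketch of that behaviour, using SQLite through Python's bundled sqlite3 module rather than Postgres. The attachment table and its columns are hypothetical names made up for illustration. Rows that contain a NULL in any constrained column never collide, while fully populated duplicates do.

```python
import sqlite3


def try_insert(conn, row):
    """Return True if the row was accepted, False on a uniqueness violation."""
    try:
        conn.execute("INSERT INTO attachment VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical polymorphic table: a row points at a post, a photo, or both.
conn.execute(
    """
    CREATE TABLE attachment (
        post_id  INTEGER,
        photo_id INTEGER,
        label    TEXT NOT NULL,
        UNIQUE (post_id, photo_id, label)
    )
    """
)

results = [
    try_insert(conn, (1, None, "cover")),  # accepted
    try_insert(conn, (1, None, "cover")),  # accepted again: NULLs count as distinct
    try_insert(conn, (1, 2, "cover")),     # accepted
    try_insert(conn, (1, 2, "cover")),     # rejected: no NULL, exact duplicate
]
print(results)
```

      Other engines may treat NULLs in unique constraints differently, so this is worth checking against the database you actually use.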

      1. 1

        Perhaps if you have a table storing the results of something like

        SELECT w, x, y, sum(z)
        FROM foo
        GROUP BY CUBE (w, x, y)
        

        so you want to ensure that there is (e.g.) only one row with all NULLs for w, x, and y. You could do this with a materialized view, but unfortunately those don’t support partial updates.
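
        One way to sketch that guarantee, assuming SQLite (via Python's sqlite3 module) and a unique index over COALESCE expressions. The cube_totals table and the '<all>' sentinel are assumptions made for illustration, and the sentinel has to be a value that can never occur as real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the rolled-up rows; NULL means "all values".
conn.execute("CREATE TABLE cube_totals (w TEXT, x TEXT, y TEXT, total INTEGER)")
# Map NULL to a sentinel inside the index so that NULLs compare as equal.
conn.execute(
    """
    CREATE UNIQUE INDEX cube_totals_key ON cube_totals (
        COALESCE(w, '<all>'), COALESCE(x, '<all>'), COALESCE(y, '<all>')
    )
    """
)

conn.execute("INSERT INTO cube_totals VALUES (NULL, NULL, NULL, 10)")
try:
    conn.execute("INSERT INTO cube_totals VALUES (NULL, NULL, NULL, 11)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The second grand-total row collides with the first in the index.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM cube_totals").fetchone()[0]
print(duplicate_rejected, row_count)
```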

        1. 1

          but unfortunately those don’t support partial updates

          Here’s hoping incremental view management makes it to Postgres 15!

          1. 1

            It looks like this just creates appropriate triggers automatically, so this is something you could do manually right now. I believe triggers are generally not as efficient when doing bulk inserts since they operate on each row individually. So if you are updating a bunch of rows you may get a better plan by writing out the query manually.

            1. 2

              so this is something you could do manually right now

              Yes, but imagine how unwieldy and fragile it would get when there are multiple joins and aggregates involved.

              since they operate on each row individually

              Since Postgres 10 (look for “AFTER trigger transition tables”), AFTER triggers can operate on all the changed rows in a statement in bulk. Here’s an example. I don’t know where I read it (Edit: here it is), but AFAIK, this feature was actually added in preparation for the upcoming incremental view maintenance feature.

              1. 1

                Very neat. I had held off on using triggers because of this issue. I’ll definitely look into this.

      1. 6

        Company: The Children’s Hospital of Philadelphia - Department of Biomedical and Health Informatics

        Company site: https://dbhi.chop.edu

        Position(s): Bioinformatics Scientist

        Location: Remote, optionally onsite in Philadelphia

        Description: This role will work with bioinformatics scientists and software engineers to operationalize a set of pipelines used to align and annotate whole exome and genome sequencing data. This role will also collaborate with data archivists and metadata librarians to advise as a subject matter expert for genomic research data contributions into Arcus.

        Tech stack: Standard BFX tools, GATK-based pipelines using WDL, managed Cromwell on Terra and via Amazon Genomics CLI. We encourage pushing the tech forward in this space, so we welcome suggestions if you have experience with other BFX tools to expedite researchers’ workloads.

        Compensation: Competitive salary, vacation, retirement plan with matching, medical/dental/vision/disability/life (single or family) insurance

        Contact: Use the above job post to apply, or email me at ruthb@chop.edu to set up a video call to talk about the position

        1. 1

          I have a friend that works there and enjoys their job! CHOP is on the top of my list for places to apply if/when I leave grad school.

          1. 2

            I appreciate hearing that! I am sure I can find some paid student/intern positions if you have time in the future.

        1. 40

          Looks like the employee is based in the UK. As you might expect, most of the responses to his announcement are Bad Legal Advice. This comment is also going to be Bad Legal Advice (IANAL!) but I have some experience and a little background knowledge so I hope I can comment more wisely…

          The way FOSS (and indeed all private-time) software development works here for employees is that according to your contract your employer will own everything you create, even in your private time. Opinions I’ve heard from solicitors and employment law experts suggest that this practice might constitute an over-broad, “unfair”, contract term under UK law. That means you might be able to get it overturned if you really tried, but you’d have to litigate to resolve it. At any rate the de facto status is: they own it by default.

          What employees typically do is seek an IP waiver from their employer where the employer disclaims ownership of the side-project. The employer can refuse. If you’ve already started they could take ownership, as apparently is happening in this case. Probably in that scenario what you should not do is try to pre-emptively fork under some idea that your project is FOSS and that you have that right. The employer will likely take the view that because you aren’t the legal holder of the IP, you aren’t entitled to release either the original or the fork as FOSS - so you’re improperly releasing corporate source code. Pushing that subject is a speedy route to dismissal for “gross misconduct” - which is a sufficient reason for summary dismissal, with no process except appeal to tribunal after the fact.

          My personal experience seeking IP waivers, before I turned contractor (after which none of the above applies), was mixed. One startup refused it and even reprimanded me for asking - the management took the view that any side project was a “distraction from the main goal”. Conversely ThoughtWorks granted IP waivers pretty much blanket - you entered your project name and description in a shared spreadsheet and they sent you a notice when the solicitor saw the new entry. They took professional pride in never refusing unless it conflicted with the client you were currently working with.

          My guess is that legal rules and practices on this are similar in most common law countries (UK, Australia, Canada, America, NZ).

          1. 27

            The way FOSS (and indeed all private-time) software development works here for employees is that according to your contract your employer will own everything you create, even in your private time.

            This seems absurd. If I’m a chef, do things I cook in my kitchen at home belong to my employer? If I’m a writer do my kids’ book reports that I help with become privileged? If I’m a mechanic can I no longer change my in-laws’ oil?

            Why is software singled out like this and, moreover, why do people think it’s okay?

            1. 10

              There have been cases of employees claiming to have written some essential piece of software their employer relied on in their spare time. Sometimes that was even plausible, but still it’s essentially taking your employer hostage. There have been cases of people starting competitors to their employer in their spare time; what is or is not competition is often subject to differences of opinion and is often a matter of degree. These are shadow areas that are threatening to business owners, which they want to blanket prevent by such contractual stipulations.

              Software isn’t singled out. It’s exactly the same in all kinds of research, design and other creative activities.

              1. 12

                There have been cases of people starting competitors to their employer in their spare time;

                Sounds fine to me, what’s the problem? Should it be illegal for an employer to look for a way to lay off employees or otherwise reduce its workforce?

                1. 4

                  what’s the problem?

                  I think it’s a pretty large problem if someone can become a colleague, quickly hoover up all the hard won knowledge we’ve together accumulated over the past decade, then start a direct competitor to my employer, possibly putting me out of work.

                  You’re thinking of large faceless companies that you have no allegiance to. I’m thinking of the two founders of the company that employs me and my two dozen colleagues, whom I feel loyal towards.

                  This kind of thing protects smaller companies more than larger ones.

                  1. 2

                    …start a direct competitor to my employer, possibly putting me out of work.

                    Go work for the competitor! Also, people can already do pretty much what you describe in much of the US where non-competes are unenforceable. To be clear, I think this kind of hyper competitiveness is gross, and I would much rather collaborate with people to solve problems than stab them in the back (I’m a terrible capitalist). But I’m absolutely opposed to giving companies this kind of legal control over (and “protection” from) their employees.

                    1. 3

                      Go work for the competitor!

                      Who says they want me? Also I care for my colleagues: who says they want them as well?

                      where non-competes are unenforceable

                      Overly broad non-competes are unenforceable when used to attempt to enforce against something not clearly competition. They are perfectly enforceable if you start working for, or start, a direct competitor, profiting from very specific relevant knowledge.

                      opposed to giving companies this kind of legal control

                      As I see it we don’t give “the company” legal control: we effectively give humans, me and my colleagues, legal control over what new colleagues are allowed to do, in the short run, with the knowledge and experience they gain from working with us. We’re not protecting some nameless company: we’re protecting our livelihood.

                      And please note that my employer does waive rights to unrelated side projects if you ask them, waives rights to contributions to OSS, etc. Also note that non-compete restrictions are only for a year anyway.

                      1. 1

                        Who says they want me? Also I care for my colleagues: who says they want them as well?

                        Well then get a different job, get over it, someone produced a better product than your company, that’s the whole point of capitalism!

                        They are perfectly enforceable if you start working for, or start, a direct competitor, profiting from very specific relevant knowledge.

                        Not in California, at least, it’s trivially easy to Google this.

                        As I see it we don’t give “the company” legal control: we effectively give humans, me and my colleagues, legal control over what new colleagues are allowed to do, in the short run, with the knowledge and experience they gain from working with us.

                        Are you a legal party to the contract? If not, then no, it’s a contract with your employer and if it suits your employer to use it to screw you over, they probably will.

                        I truly hope that you work for amazing people, but you need to recognize that almost no one else does.

                        Even small startups routinely screw over their employees, so unless I’ve got a crazy amount of vested equity, I have literally zero loyalty, and that’s exactly how capitalism is supposed to work: the company doesn’t have to care about me, and I don’t have to care about the company, we help each other out only as long as it benefits us.

                      2. 1

                        Go work for the competitor?

                        Why would the competitor want/need the person they formerly worked with/for?

                        1. 1

                          Why did the original company need the person who started the competitor? Companies need workers and if the competitor puts the original company out of business (I was responding to the “putting me out of work” bit) then presumably it has taken on the original company’s customers and will need more workers, and who better than people already familiar with the industry!

                    2. 1

                      Laying off and reducing the workforce can be regulated (and is in my non-US country). The issue with having employees starting competitor products is that they benefit from an unfair advantage and create a huge conflict of interest.

                      1. 2

                        Modern Silicon Valley began with employees starting competitor products: https://en.wikipedia.org/wiki/Traitorous_eight

                        If California enforced non-compete agreements, Silicon Valley might well not have ended up existing. Non-enforcement of noncompetes is believed to be one of the major factors that resulted in Silicon Valley overtaking Boston’s Route 128 corridor, formerly a competitive center of technology development: https://hbr.org/2016/11/the-reason-silicon-valley-beat-out-boston-for-vc-dominance

                        1. 1

                          I don’t think we are talking about the same thing. While I agree that any restriction on post-employment should be banned, I don’t think it is unfair for an organization to ask their employees to not work on competing products while on their payroll. These are two very different situations.

                        2. 2

                          If the employee uses company IP in their product then sure, sue them, that’s totally fair. But if the employee wants to use their deep knowledge of an industry to build a better product in their free time, then it sucks for their employer, but that’s capitalism. Maybe the employer should have made a better product so it would be harder for the employee to build something to compete with it. In fact, it seems like encouraging employees to compete with their employers would actually be good for consumers and the economy / society at large.

                          1. 1

                            An employee working on competing products in their free time has an unfair advantage, because the employee has access to the organization’s IP when building the new product while the organization does not have access to the competing product’s IP. So what’s the difference between industrial espionage and employees working on competing products in their free time?

                            1. 1

                              If the employee uses company IP in their product then sure, sue them, that’s totally fair.

                              That was literally in the comment you responded to.

                    3. 4

                      Joel Spolsky wrote a piece that frames it well, I think. I don’t personally find it especially persuasive, but I think it does answer the question of why software falls into a different bucket than cooking at home or working on a car under your shade tree, and why many people think it’s OK.

                      1. 3

                        Does this article suggest the employers view contracts as paying for an employee’s time, rather than just paying for their work?

                        Could a contract just be “in exchange for this salary, we’d like $some_metric of work”, with working hours just being something to help with management? It seems irrelevant when you came up with something, as long as you ultimately give your employer the amount of work they paid you for.

                        Why should an employer care about extra work being released as FOSS if they’ve already received the amount they paid an employee for?

                        EDIT: I realise now that $some_metric is probably very hard to define in terms of anything except number of hours worked, which ends up being the same problem

                        1. 2

                          Does this article suggest the employers view contracts as paying for an employee’s time, rather than just paying for their work?

                          I didn’t read it that way. It’s short, though. I’d suggest reading it and forming your own impression.

                          Could a contract just be “in exchange for this salary, we’d like $some_metric of work”, with working hours just being something to help with management? It seems irrelevant when you came up with something, as long as you ultimately give your employer the amount of work they paid you for.

                          I’d certainly think that one of many possible reasonable work arrangements. I didn’t link the article intending to advocate for any particular one, and I don’t think its author intended to with this piece, either.

                          I only linked it as an answer to the question that I read in /u/lorddimwit’s comment as “why is this even a thing?” because I think it’s a plausible and cogent explanation of how these agreements might come to be as widespread as they are.

                          Why should an employer care about extra work being released as FOSS if they’ve already received the amount they paid an employee for?

                          As a general matter, I don’t believe they should. One reason I’ve heard given for why they might is that they’re afraid it will help their competition. I, once again, do not find that persuasive personally. But it is one perceived interest in the matter that might lead an employer to negotiate an agreement that precludes releasing side work without concurrence from management.

                          1. 1

                            I only linked it as an answer to the question that I read in /u/lorddimwit’s comment as “why is this even a thing?” because I think it’s a plausible and cogent explanation of how these agreements might come to be as widespread as they are.

                            I think so too, and hope I didn’t come across as assuming you (or the article) were advocating anything that needs to be argued!

                            I didn’t read it that way. It’s short, though. I’d suggest reading it and forming your own impression.

                            I’d definitely gotten confused because I completely ignored that the author is saying that the thinking can become “I don’t just want to buy your 9:00-5:00 inventions. I want them all, and I’m going to pay you a nice salary to get them all”. Sorry!

                      2. 3

                        There is a huge difference: We’re talking about creativity and invention. The company isn’t hiring you for changing some oil or swapping some server hardware. They’re hiring you to solve their problems, to be creative and think of solutions. (Which is also why I don’t think it’s relevant how many hours you actually coded; the result and the time you thought about it matter.) Your company doesn’t exist because it’s changing oil, the value is in the code (hopefully) and thus their IP.

                        So yes, that’s why this stuff is actually different. Obviously you want to have exemptions from this kind of stuff when you do FOSS things.

                        1. 2

                          I think the chef and mechanic examples are a bit different since they’re not creating intellectual property, and a book report is probably not interesting to an employer.

                          Maybe a closer example would be a chef employed to write recipes for a book/site. Their employer might have a problem with them creating and publishing their own recipes for free in their own time. Similarly, maybe a writer could get in trouble for independently publishing things written in their own time while employed to write for a company. I can see it happening for other IP that isn’t software, although I don’t know if it happens in reality.

                          1. 3

                            I think the “not interesting” bit is a key point here. I have no idea what Bumble is or the scope of the company, and I speak out of frustration with these overarching “legal” restrictions, but it sounds like they are an immature organization trying to hold on to anything interesting their employees do, core to the current business or not, in case they need to pivot or find a new revenue stream.

                            Frankly if a company is so fearful that a couple of technologies will make or break their company, their business model sucks. Technology != product.

                            1. 2

                              Similarly, maybe a writer could get in trouble for independently publishing things written in their own time while employed to write for a company

                              I know of at least one online magazine’s contracts which forbid exactly this. If you write for them, you publicly only write for them.

                          2. 10

                            This is pretty much my (non-lawyer) understanding and a good summary, thanks.

                            If you find yourself in this situation, talk to a lawyer. However I suspect that unless you have deep pockets and a willingness to litigate “is this clause enforceable” through several courts, your best chance is likely to be reaching some agreement with the company that gives them what they want whilst letting you retain control of the project or at least a fork.

                            One startup refused it and even reprimanded me for asking - the management took the view that any side project was a “distraction from the main goal”

                            I think the legal term for this is “bunch of arsehats”. I’m curious to know whether you worked for them after they started out like this?

                            1. 6

                              I think the legal term for this is “bunch of arsehats”.

                              https://www.youtube.com/watch?v=Oz8RjPAD2Jk

                              I’m curious to know whether you worked for them after they started out like this?

                              I left shortly after for other reasons

                            2. 2

                              The way FOSS (and indeed all private-time) software development works here for employees is that according to your contract your employer will own everything you create, even in your private time

                              Is it really that widespread? It’s a question that we get asked by candidates but our contract is pretty clear that personal-time open source comes under the moonlighting clause (i.e. don’t directly compete with your employer). If it is, we should make a bigger deal about it in recruiting.

                              1. 1

                                I would think the solution is to quit, then start a new project without re-using any line of code of the old project - but I guess the lawyers thought of this too and added clauses giving them ownership of the new project too…

                              1. 3

                                Company: The Children’s Hospital of Philadelphia - Department of Biomedical and Health Informatics

                                Company site: https://dbhi.chop.edu

                                Positions: Software Engineer

                                Location: Philadelphia, PA, however currently remote

                                Description: This role will work with a team of software engineers and bioinformatics scientists focused on architecting and implementing optimized data services, interfaces, and integrations for ‘omics data (genomics, proteomics, etc).

                                We are looking for highly creative people who share our mission to advance child health and who will thrive in a continuous learning environment, acquiring and applying both new technical skills and biomedical domain knowledge. Existing bioinformatics knowledge is not required as long as you are willing to learn!

                                Tech stack: Go, Python, Kubernetes, GCP, AWS (however, we encourage other tech where it makes sense)

                                Contact: Apply through the SWE link above, or message me directly to ask questions

                                1. 4

                                  One question I have about this pattern is how to handle the data. If I keep it in SQLite, then it’s not in the right format to check into Git. If I keep it as CSVs or JSON, then SQLite just becomes a boring implementation detail that doesn’t add much versus other ways of searching and indexing.

                                  1. 4

                                    Yeah I’m not sure what the point of the SQLite is if your data is read-only. You’ll need it in separate files for version control, so any indexes or denormalisation, or views you need could all be done at build time instead of compiling all your data into SQLite and doing those things at runtime using SQLite.

                                    1. 5

                                      SQLite gives you the ability to easily run server-side SQL queries against it. That’s useful even against small amounts of data, and super-useful once your data grows over 100MB or so.

                                      I often use this for search (since SQLite has great FTS built in) - eg https://datasette.io/-/beta?q=fts which currently searches over 1500 items.

                                      I also use it for things like the “Use location” button on https://www.niche-museums.com/ - only just over 100 items at the moment but I hope to continue growing that for years to come.
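
                                      A rough sketch of the idea, assuming the records start life as JSON kept in version control and are loaded into SQLite at build time. The museum names and coordinates are made up, the bake and nearest helpers are hypothetical, and this is not how either site above is actually implemented.

```python
import json
import sqlite3

# Hypothetical source data, as it might sit in a JSON file in the git repo.
RAW = json.loads(
    """
    [
        {"name": "Museum A", "lat": 0.0, "lon": 0.0},
        {"name": "Museum B", "lat": 10.0, "lon": 10.0},
        {"name": "Museum C", "lat": 1.0, "lon": 1.0}
    ]
    """
)


def bake(records):
    """Build step: load the version-controlled records into a SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE museums (name TEXT, lat REAL, lon REAL)")
    conn.executemany("INSERT INTO museums VALUES (:name, :lat, :lon)", records)
    return conn


def nearest(conn, lat, lon, limit=2):
    """Server-side query: order by squared distance in degrees (a crude metric)."""
    rows = conn.execute(
        """
        SELECT name FROM museums
        ORDER BY (lat - :lat) * (lat - :lat) + (lon - :lon) * (lon - :lon)
        LIMIT :limit
        """,
        {"lat": lat, "lon": lon, "limit": limit},
    )
    return [name for (name,) in rows]


conn = bake(RAW)
closest = nearest(conn, 0.9, 0.9)
print(closest)
```

                                      The source of truth stays in plain files, and the database is a disposable build artifact that the server only reads.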

                                      1. 2

                                        I suppose if you still need or want to rely on SQL itself as the language for expressing queries then you would want this. Otherwise even if you have an in-memory data structure, you would need to implement the lookups and what not yourself.

                                        1. 1

                                          Yeah, I see the value of using Datasette: hey, someone built a whole database viewer website for me, so I don’t have to! I see less value in using SQLite for some arbitrary other baked data site, as opposed to “the data are a bunch of CSVs/JSON files that get loaded into memory as needed”. It would have to be something where, e.g., I want SQLite to be my full text search engine specifically.

                                      2. 1

                                        So, I think I ended up doing something similar with https://bible.junglecoder.com (see https://github.com/yumaikas/rumination/tree/main/json_bible). I didn’t have a name for it at the time, but yeah.

                                        1. 1

                                          That’s a similar pattern but not quite the same because it looks like you don’t have any server-side code running against the data.

                                          1. 1

                                            I do have a tiny amount of logic in https://github.com/yumaikas/rumination/blob/main/serve.janet, but none of it is transforming the data as it stands.

                                            1. 1

                                              My apologies, yeah that’s totally the baked data pattern.

                                        2. 1

                                          I like keeping the data in git - most of my sites using this pattern have content stored in a git repo as YAML or as a bunch of Markdown files. I also often pull the data from CSV files stored in git (since a bunch of places publish data as CSV on GitHub these days - eg the data I pull into https://covid-19.datasettes.com )

                                        1. 1

                                          Thank you for doing this research and writing it up! I am looking forward to reading. There may be other terms, but did you happen to evaluate the term “architect?” I feel like it is in a similar boat of being analogous to “engineer.”

                                          My personal view is that words do have meaning, but their meaning differs based on context. Likewise, words may be useful to different audiences of people. What is generally always true is needing to explain what I do beyond just the term “engineer” or “developer” or “architect” or “programmer.”

                                          This may be an odd analogy, but I think about the term “cook” vs “chef.” Many of the best chefs refer to themselves as cooks, so maybe words also attempt to reflect status or humbleness.

                                          1. 1

                                            “Software architect” is already a title and it’s gotten a bit of a negative connotation.

                                            https://www.joelonsoftware.com/2008/05/01/architecture-astronauts-take-over/

                                          1. 8

                                            First off, congrats on putting this out there. There have been many attempts to replace or abstract over SQL with few successes. In the README you note the target audience:

                                            It is designed for use by data engineers, analysts and data scientists.

                                            Based on my experience these folks have different degrees of experience with SQL (ranging from 0-100) and likewise the comfort/willingness to learn alternate languages.

                                            Can you speak more towards who the actual audience is? Likewise, what are the actual advantages of Preql over well-written, properly managed SQL code? It’s all code at the end of the day, so what advantages would inspire me to use this?

                                            1. 8

                                              Thanks!

                                              There have been many attempts to replace or abstract over SQL with few successes

                                              My approach is to use the existing SQL databases, and always allow a fallback to SQL if necessary. I’m not aware of other attempts that took this approach (there is one library for R that does it halfway)

                                              what are the actual advantages of Preql over well-written, properly managed SQL code?

                                              I’d say there are many advantages. For example, it is very hard to re-use code in SQL, because there is very little support for functions. Even Postgres, that supports defining them, won’t let you pass them as arguments. In Preql functions are first-class, and act just like functions do in Python or Javascript. Also, since the functions (and the rest of the code) are part of the client, and not the server, you can use version control, such as git, to keep history, code review, accept pull-requests, and so on.

                                              Another one is type-safety. SQL’s approach to types causes a lot of trouble, for example SELECT "a" + 1; returns 1, which makes no sense, and is hard to debug in a long query. In Preql, "a"+1 throws an exception. In addition, SQL’s handling of NULLs is a source of many issues.

                                              I don’t want to get too long-winded. A few more points are mentioned in this page: https://preql.readthedocs.io/en/latest/comparison_sql.html

                                              who the actual audience is

                                              I think there are several audiences -

                                              • Anyone who has to deal with complex business logic using databases. I’ve seen SQL queries that were hundreds of lines long, and they were very hard to understand and maintain.

                                              • Data Engineers, who need to pipeline structured data, change its configuration or structure. That’s very common for ML projects, but not only.

                                              • Analysts who work with databases, and need to interact with them on a daily basis. The short syntax, REPL autocompletion, and the ability to create your own library of functions (e.g. for research), can be very useful in the daily work.

                                              I hope that clears things up a little!

                                              1. 5

                                                SQL’s approach to types causes a lot of trouble, for example SELECT “a” + 1; returns 1, which makes no sense, and is hard to debug in a long query. In Preql, “a”+1 throws an exception. In addition, SQL’s handling of NULLs is a source of many issues.

                                                This depends on the dialect, right? SQLite is very dynamic, but PostgreSQL will error out on this unless you explicitly cast things (and 'a'::int will error out as well). In MySQL/MariaDB it’s dynamic by default (or used to be, not sure what the current defaults are), but can be configured to be more strict. I’m not sure about other engines as I never worked with them.

                                                At any rate, the biggest question I have for something like this is one of performance; sometimes even quite simple changes to a query can cause huge performance differences. How “smart” is Preql in generating reasonably optimised queries?

                                                1. 4

                                                  This depends on the dialect, right?

                                                  Absolutely right. Which is actually part of the problem. Every dialect has different restrictions, semantics, and syntax, and it’s hard to remember all the little gotchas.

                                                  How “smart” is Preql in generating reasonably optimised queries?

                                                  Right now Preql mostly relies on the database engine to optimize the query, but doing more optimization in Preql itself is definitely in my future plans. The SQL passes through an AST just before the final compilation, which makes at least some classes of optimization relatively easy.

                                                  But I think that well-written and performance-aware Preql code currently already generates reasonably efficient SQL statements.

                                                2. 3

                                                  I’m not aware of other attempts that took this approach (there is one library for R that does it halfway)

                                                  Wanted to ask if you could take a look at this Haskell package [1] OpalEye and see if it does something similar (albeit for Postgres only).

                                                  With regards to the License. Yes, I think in any corp with staff lawyers, this will not be allowed (because it is different, and it has lots of ambiguity in the intent: “as long as your product doesn’t base its value on exposing the Preql language itself to your users.”) But smaller shops and devs would still look at it, I think.

                                                  I was trying to think if I could come up with a suggestion for your License (I am not a lawyer).

                                                  But this was not easy for me (although I like challenges fitting stuff into legal frameworks). A couple of obstacles:

                                                  • (a) First I am not sure that a programming language definition (or interface as you imply) – can be copyrighted in all the jurisdictions (EU, USA, etc).

                                                  Probably library part (if it exists) could be copyrighted, but I think language is more problematic. see [2], and see [3]

                                                  So you can certainly copyright your work doing something with language definition, but not the language definition. Then, the issue with your license is that it relies on a notion of an ‘interface’, which is not copyrightable. And I have rarely seen a license where ‘interface’ was a well-defined term.

                                                  • (b) Again challenging your inclusion of word ‘Interface’.

                                                  If I create a language just like yours, by simply prefixing every keyword in your language (so not parenthesis, commas, semicolons) with ‘vL-’. Then I would expose that to my users, and then strip the ‘vL-’ before passing it to your interpreter. Did I just ‘bypass’ the requirement for a commercial license?

                                                  • (c) I think your intention is to prevent a Compute resource provider, to integrate your language into their offer (without obtaining commercial license).

                                                  For example say, a well known corp A offers a paid ‘lambda function’ service where developers get charged by how often their program is switched from dormant to active state, and that program can be written by a developer in Python or Preql or Python+Preql.

                                                  So, you do not want them to offer PreQL without getting a commercial license from you, (let me know if I am interpreting the intent correctly)?

                                                  If yes, then maybe you can recast your definitions: do not use ‘interface’, and see if you can fit into software-as-service or language-interpreter-as-service.
                                                  Then you can look at the CockroachDB license and see if there is something you can leverage. That would also solve (b).

                                                  Only in that case, remember that people will not be able to build PreQL playground online services (paid or unpaid or ad based) without getting your commercial license…

                                                  [1] https://github.com/tomjaguarpaw/haskell-opaleye/blob/master/Doc/Tutorial/TutorialBasic.lhs

                                                  [2] https://phys.org/news/2011-11-language-copyrighted-eu-court.html

                                                  [3] https://cdt.org/insights/copyright-week-software-interfaces-shouldnt-be-copyrighted/

                                                  1. 3

                                                    Thank you for your thoughtful response, and sorry for taking a while to reply.

                                                    I do see some similarity between OpalEye and Preql. I would say that OpalEye is somewhere on the continuum between ORM and Preql, and I can also see some advantage to the Haskell integration. But personally, I prefer the clean syntax of Preql, and the integration with Python is especially useful for data-science. Also, it seems to me that my implementation is much more advanced.

                                                    With regards to the License

                                                    Thanks for the warning. To be honest, I’m not too concerned with corporations. They very rarely, if ever, contribute back to open-source, and my impression is that their developers are usually those who follow trends, rather than set them.

                                                    I’m much more interested in attracting use by open-source projects and small startups.

                                                    As for your points:

                                                    a) I am not copyrighting the language, but the implementation. If anyone wants to write an entirely new implementation of Preql, they can do that. But if they want to use mine, I should be able to determine how it’s used.

                                                    b) You are describing an absurd product that no one will ever make or use. I believe that it’s possible to make the wording less ambiguous, if it comes to that.

                                                    c) You got the gist of my intention. I also want to prevent database and analytics software from offering up Preql as an interface.

                                                    I think that Cockroach’s license doesn’t fit Preql, and they are very different kinds of software, despite both working with databases. But I appreciate your advice.

                                                    So far, the main thing that might persuade me to reconsider my license, is just how much attention it gets. I’d much rather that people focus on Preql itself, rather than the license (no matter if negatively or positively), but for some reason it attracts a lot of fascination.

                                                    1. 2

                                                      thank you for the follow up. Hope to see PreQL grow into a standard tool for data exploration!

                                                  2. 1

                                                    Another one is type-safety. SQL’s approach to types causes a lot of trouble, for example SELECT “a” + 1; returns 1, which makes no sense, and is hard to debug in a long query.

                                                    Definitely depends on the dialect, the statement above will fail with a type mismatch on SQL Server. SQLite does return 1, but it’s typeless by design: https://www.sqlite.org/datatypes.html
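
                                                    A minimal sketch of the SQLite behaviour mentioned above, using Python’s standard sqlite3 module against an in-memory database. It uses single-quoted string literals, since double quotes normally denote identifiers in SQL. It only demonstrates SQLite; the other dialects named in this thread are not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite converts the non-numeric text 'a' to 0 before adding,
# so the query succeeds instead of raising a type error.
loose = conn.execute("SELECT 'a' + 1").fetchone()[0]
print(loose)

# Text that looks like a number is converted to that number instead.
numeric = conn.execute("SELECT '41' + 1").fetchone()[0]
print(numeric)

conn.close()
```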

                                                1. 1

                                                  I have done this for years, but not because white boarding, READMEs, formal programs, or any other means is less useful. They each serve a purpose for different contexts and fidelities of the system. Coding just happens to be the highest fidelity of expression which has its trade-offs if you jump right into it for a complex system.

                                                  1. 4

                                                    I don’t see event-based and poll (request-reply) being mutually exclusive. Having a broker to distribute the published event is necessary and independent subscriptions (and progress) are necessary on the consumer-side. Assuming the event delivery eventually occurs, there is no real difference between that and a consumer performing a request to fetch the event.

                                                    Request-reply can be useful for bootstrapping clients or (as the article suggests) initiating a sync with the authoritative source of the information. Whether this initiation simply involves a transfer of a single state object or a stream of events to replay, that is necessary to get in sync (up to some moment). An event-based pub/sub model for general distribution is still superior to a server distribution or client poll for when online changes occur.

                                                    In other words, request-reply is useful during startup and if a timeout/fault is detected.

                                                    1. 2

                                                      Lots of financial exchange market data feeds work this way. They have a “snapshot channel” (which usually pulses on a timer) and an “incremental channel”. At startup (or restart after a crash), you can bootstrap state from the snapshot channel, then consume small deltas from the update channel. Since everything is produced by a single producer, you can use the timestamps in messages to know where you are in the stream.
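
                                                      A rough sketch of that bootstrap-then-deltas pattern, not taken from any real feed. The names `Snapshot`, `Delta`, and `Book` are made up for illustration, and an integer `seq` stands in for the message timestamp. The gap check is an added assumption about how a fault might be detected.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    seq: int      # position of the last update folded into this snapshot
    prices: dict  # full state at that position


@dataclass
class Delta:
    seq: int
    symbol: str
    price: float


@dataclass
class Book:
    seq: int = -1
    prices: dict = field(default_factory=dict)

    def bootstrap(self, snap: Snapshot) -> None:
        # Snapshot channel: replace local state wholesale.
        self.seq = snap.seq
        self.prices = dict(snap.prices)

    def apply(self, delta: Delta) -> bool:
        if delta.seq <= self.seq:
            return False  # already reflected in the snapshot; skip it
        if delta.seq != self.seq + 1:
            # Assumed fault handling: a gap means we must bootstrap again.
            raise RuntimeError("gap in update stream")
        self.prices[delta.symbol] = delta.price
        self.seq = delta.seq
        return True


book = Book()
book.bootstrap(Snapshot(seq=2, prices={"ABC": 10.0}))
stream = [Delta(1, "ABC", 9.5), Delta(2, "ABC", 10.0),
          Delta(3, "ABC", 10.5), Delta(4, "XYZ", 7.0)]
applied = [book.apply(d) for d in stream]
print(applied, book.prices)
```

                                                      The first two deltas are skipped because the snapshot already covers them; only the later ones change state.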

                                                    1. 2

                                                      Why would I use this instead of kafka? Presumably this either has a different sweet spot or is an attempt to be better in a way that is concretely articulable?

                                                      1. 3

                                                        Other than the ecosystem (which shouldn’t be discounted), Pulsar is quite a bit ahead of Kafka in terms of architecture and API decisions. Arguably, they had the second mover advantage.

                                                        One key feature I particularly like is that storage is offloaded to Bookkeeper rather than storing it locally on broker nodes. Thus storage is decoupled from the brokers (whereas Kafka stores data on broker nodes). This makes scaling the storage and compute (brokers) trivially easy. The trade-off is that you need to operate a Bookkeeper cluster (in addition to the brokers and Zookeeper).

                                                        In addition, storage can be “tiered” which means, old messages (by some definition of old) can be offloaded transparently to object storage such as S3. This is often cheaper storage than what Bookkeeper uses at the expense of latency when requesting it. It may not be necessary, but it’s a nice option for keeping all the data and archiving it.

                                                        Personally, I am keeping an eye on the NATS project, specifically the new development around Jetstream (https://github.com/nats-io/jetstream#readme). The simplicity of the API and operability of the cluster (no external dependencies) is very attractive. As the README notes, it’s still in development, but the creator Derek Collison is actively working on clustering and remaining features over the next few months.

                                                        1. 2

                                                          Don’t know if it is ok to link to the angry orange site, but here’s a number of comments directly addressing your question:

                                                          https://news.ycombinator.com/item?id=21912855

                                                        1. 3

                                                          I listen to quite a few podcasts, most of which are interviews with creators, founders, and developers of various tech. Not only is it useful for learning about new things, but the interview-style dialogue provides a lot more foundation and motivation behind the tech. Contrast this with most product or tech websites, where it can be difficult to assess the utility.

                                                          Addressing the “shallow learning” question, if something seems relevant to any of my work I tend to build small prototypes in a domain I am working on to assess its utility.

                                                          1. 1

                                                            Any podcasts you’d recommend to someone interested in compilers / embedded?

                                                            1. 2

                                                              A recent compiler episode which I enjoyed was from the CoRecursive podcast: https://corecursive.com/037-thorsten-ball-compilers/ (there are many good episodes on that podcast).

                                                              A recent embedded episode I recall was on GoTime called “Hardware hacking with TinyGo and Gopherbot” https://changelog.com/gotime/84. I am not in this space so I am not aware of any podcasts dedicated to this topic (although I am sure there are a couple at least).

                                                          1. 1

                                                            Continuing to learn the trials and tribulations of managing people. I am happy and proud of building a team, but the manager role as a bridge to the organization is hard. Balancing this with being an IC among the team is challenging as well.

                                                            Also, I just got Uncle Bob’s Clean Agile book in the mail. I have never really read anything of his, but his historical context on Agile and the true intent behind it was convincing enough that I bought his book to learn about the roots and how it applies today.

                                                            1. 4

                                                              I saw the Strange Loop demo, and the biggest unanswered question was “how do you refactor an existing function?” This wasn’t answered and the edit docs aren’t covering that case. I think the answer is supposed to be “the tooling takes care of it”, but that sounds risky to me.

                                                              1. 2

                                                                My understanding from the demo is that the refactored function would get a new hash and the name associated with the previous function’s hash would get updated. The speaker noted that the name association is stored separately, so a dependency/refactor update is trivial for this reason. Dependent functions can update that reference and that is it (assuming a true refactor of maintaining the same type signature). Although I may not be understanding some subtlety of your question.

                                                                1. 1

                                                                  It’s the “Dependent functions can update that reference and that is it” that I’m hung up on. One of the selling points is that you can have two versions of the same function, which eliminates dependency conflicts. Consider the following case:

                                                                  A -> B -> C
                                                                  A' -> B -> C
                                                                  

                                                                  I discover a bug in C’s implementation and refactor it to C'. The tooling automatically updates B, which calls C, to B', which calls C'. Do we transitively update A? What about A'? What happens when the call chain is now 20 functions deep? Case two:

                                                                  A'' -> B' -> C'
                                                                  A' -> B -> C
                                                                  

                                                                  Turns out there was a second bug in C, and I have not yet pulled C'. I release C'' off C. How do we merge the change with C'? What if there are merge conflicts? Do we end up with a fragmented ecosystem? Case three:

                                                                  A -> B -> C -> D -> E -> F
                                                                  

                                                                  C and F are in separate libraries. I see a bug in C and make C', somebody else at the same time sees a bug in F and pushes F'. What happens to A?

                                                                  1. 1

                                                                    Here is the applicable part of the StrangeLoop talk: https://youtu.be/gCWtkvDQ2ZI?t=1395

                                                                    The speaker’s example relies on/assumes different namespaces (at 26:16), but maybe the suggestion is that if you want to maintain two different versions, then they must ultimately be named differently. So a refactor of an existing type would not actually differentiate itself as a separate version unless you name it something different.

                                                                    That said, since all types are content addressable, you can still give each type a different name. It may be a matter of whether you choose to do that in your source, or you simply keep the one name and therefore the new version implicitly replaces the previous versions (similar to git, but at type level rather than file).

                                                                    Do we transitively update A?

                                                                    Correct, this is not answered in the talk. I can only speculate that the IR of hashes is updated to reflect the change unless you give it a new name in the textual/source representation. My guess is that if a fix to C or F is pushed, references will be implicitly updated (from the name C or F to the new hash). The Merkle tree will update accordingly. Of course if the names of C’ or F’ are changed and pushed, then the existing types will not implicitly update.

                                                                    Again this is speculation, but I am enjoying the conversation.

                                                                    1. 3

                                                                      Some details about propagation: https://twitter.com/unisonweb/status/1173942969726054401

                                                                      The way update propagation works is this: first we visit immediate dependents of the old and update them to point to the new hash. This alters their hash. We repeat for their dependents, and so on…

                                                                      …if the update isn’t type preserving, the ‘todo’ command walks you through a structured refactoring process where the programmer specifies how to update dependents of old hashes.

                                                                      Dependency chains in codebases written by humans tend to be pretty small. If it were even 100 that would be a lot.

                                                                      Once this manual process of updating reaches a “type preserving frontier”, the rest of the way can be propagated automatically.

                                                                      Also, these mappings from old hash to new hash are recorded in a Unison “patch”. A patch can be applied to any Unison namespace

                                                                      Important asterisk: for changes to type declarations, right now all we support is manual propagation. Which can be quite tedious. We are working on fixing this!
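
                                                                      To make the type-preserving case concrete, here is a toy sketch of hash propagation over a content-addressed store. It is not Unison’s implementation: `Codebase`, `digest`, and the string bodies are hypothetical, and instead of visiting dependents of the old hash outward it rehashes from each name down through its dependencies, which should reach the same set of new hashes. It uses the A -> B -> C and A' -> B -> C chains from earlier in the thread (with A' spelled `A2`).

```python
import hashlib


def digest(body, deps):
    """Hash a definition from its body plus the hashes it depends on."""
    payload = body + "|" + ",".join(deps)
    return hashlib.sha256(payload.encode()).hexdigest()[:8]


class Codebase:
    def __init__(self):
        self.defs = {}   # hash -> (body, tuple of dependency hashes)
        self.names = {}  # name -> hash, stored separately from definitions

    def add(self, name, body, dep_names=()):
        deps = tuple(self.names[d] for d in dep_names)
        h = digest(body, deps)
        self.defs[h] = (body, deps)
        self.names[name] = h
        return h

    def update(self, name, new_body):
        """Replace one definition and return the old-hash -> new-hash patch."""
        old = self.names[name]
        _, deps = self.defs[old]
        new = digest(new_body, deps)
        self.defs[new] = (new_body, deps)
        cache = {old: new}

        def rewrite(h):
            # Rehash a definition if anything beneath it changed.
            if h in cache:
                return cache[h]
            body, deps = self.defs[h]
            new_deps = tuple(rewrite(d) for d in deps)
            result = h if new_deps == deps else digest(body, new_deps)
            self.defs[result] = (body, new_deps)
            cache[h] = result
            return result

        self.names = {n: rewrite(h) for n, h in self.names.items()}
        return {k: v for k, v in cache.items() if k != v}


cb = Codebase()
cb.add("C", "c body")
cb.add("B", "b body", ["C"])
cb.add("A", "a body", ["B"])
cb.add("A2", "a2 body", ["B"])
cb.add("D", "unrelated body")
before = dict(cb.names)
patch = cb.update("C", "c fixed")
print(patch)
```

                                                                      In this sketch the old definitions stay in the store under their old hashes; only the name table moves, and the returned mapping plays the role of the “patch” described above.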

                                                              1. 4

                                                                Company: The Children’s Hospital of Philadelphia - Department of Biomedical and Health Informatics

                                                                Company site: https://dbhi.chop.edu

                                                                Position(s): Software Engineer

                                                                Location: Philadelphia, PA

                                                                Description: The Children’s Hospital of Philadelphia (CHOP) Research Institute and its Dept of Biomedical and Health Informatics (DBHi) are seeking a software engineer to help build an enterprise-level data and informatics platform called “Arcus”. The Arcus team integrates with major scientific initiatives in the Research Institute strategic plan, high-impact research areas such as lifespan, rare diseases, novel devices and therapeutics, and precision health.

                                                                This role will work on a small team focused on architecting and implementing a cloud-native platform that supports the goals of the Arcus program. We are looking for highly creative people who share our mission to advance child health and who will thrive in a continuous learning environment, acquiring and applying both new technical skills and biomedical domain knowledge.

                                                                Contact: Above position link or ruthb@chop.edu

                                                                1. 12

                                                                  The article sounds like a good explanation of why communities shouldn’t always have final say in language features.

                                                                  1. 12

                                                                    I think a better way to phrase this is that programmers generally see the language as their adversary. They want to enlarge the space of possible programs to include the one they want to write.

                                                                    In contrast, for language designers programs are their main adversary. They want to maintain or sometimes shrink the space of possible programs so that bad/invalid programs can’t be expressed.

                                                                    Inevitably, these groups will clash over this as many proposed features while making some good programs possible also make some bad programs possible as well. The best way forward is to lobby for language features which make both groups happy by showing that a feature doesn’t make it possible to write bad programs. It’s not enough to say it’s unlikely the feature will be abused, you have to show that it’s impossible, since once a feature is added it is very hard to take away without breaking backwards compatibility.

                                                                    Worse, the more features you add, the harder it is to prove that a new one won’t have some weird interaction with some other part of the language. All of this is largely invisible to users of languages and not said upfront by language designers often enough.

                                                                    1. 5

                                                                      Sounds taboo to even think this, but I fully agree. The community is not always right.

                                                                      1. 1

                                                                        I agree. Something the community does not have is the comprehensive vision its creators have. This isn’t to say the community opinion is not valid, but rather a solution to an immediate problem simply may not align with the long-term goals. Whether or not this case was a “false flag” (I don’t believe this was the case), it’s something community members should keep in mind and not get bent out of shape when it feels like the creators are making an executive decision.

                                                                      1. 4

                                                                        Hi folks. I am hiring someone for my team at The Children’s Hospital of Philadelphia in the Department of Biomedical and Health Informatics (DBHi). My team is building a “data platform” to improve how pediatric research is done at CHOP. This is an institute-wide program and we have a lot of internal support and funding being contributed. We put an emphasis on learning and applying methods and tech from both academia and industry since healthcare is like 10 years behind everything else.

                                                                        For those curious about the tech we are currently using.. we are leveraging GCP managed services (although we will be going multi-cloud eventually) and standardizing on Kubernetes for deploying applications and other workloads. We are primarily a Go shop for backend services. Python and R are also very popular for more of the analytical workloads. For Web frontends we are using React and other familiar libraries in the ecosystem.

                                                                        We are interested in people who can solve problems and learn to communicate effectively with nurses, physicians, and researchers. The tech is less important, but we try to standardize a bit to reduce the number of decisions we need to make. The domain is way more challenging than the tech at this point (healthcare is complex).

                                                                        Here are a few links about CHOP, DBHi, and our department’s work on Github. The data platform called “Arcus” has a separate org on Github, but most of the repos are private right now. We open source nearly everything since most of our grant funding comes from taxpayer dollars.

                                                                        Here is the formal job posting but if you have questions please just ask me.

                                                                        1. 1

                                                                          What does Arcus do?

                                                                          1. 2

                                                                            Arcus is being built to support both retrospective and prospective research. The former is research on existing data while prospective involves collection of new data from subjects (and thus requires recruitment and consent, etc.) In both cases, the most interesting and novel research involves questions that span multiple modalities like clinical, genetics, imaging, wearables, etc. However this is particularly hard to support without deliberate effort of creating effectively a graph of linked data for CHOP patients (and eventually broadening the scope to other external subjects).

                                                                            So concretely, the first piece of functionality we are building is a “cohort discovery” interface that enables researchers to interactively build a coarse-grained research cohort very easily, with a fallback to a data request team to handle edge cases. The second major bit is, given this cohort of subjects, what data are available about them? This is where the subject-level data index comes in, spanning both datasets we extract (from our medical record) as well as research datasets that have been indexed. We will be showing the overlap between their cohort of interest and the datasets.

                                                                            There is a whole host of interesting challenges in doing this, many of which are around privacy, security, and compliance. Various authorization requirements, etc. The data management and indexing is another fun one. I can describe anything in more detail if you want.

                                                                            1. 1

                                                                              Here is a public, but not advertised, education landing page for the program. Very much a work in progress, but our education lead for the project has done a great job putting a lot of content together to help educate researchers on how to leverage the capabilities of Arcus (once developed). https://education.arcus.chop.edu

                                                                        1. 3

                                                                          I am implementing a basic event store on top of Postgres with some kind of log per table partitioning scheme. Ideally, it would leverage the Timescale postgres extension, but cloud providers don’t yet support the extension natively, so I won’t get that optimization yet. This also includes a gRPC frontend and a client API that will be designed and optimized to simplify building read models. It is only a couple of days in and does not handle pushed changes yet.

                                                                          There are like a million ways to implement these stores (and I believe I evaluated most of the OSS solutions out there), but I wanted to try out something simple and learn what I need to optimize for.

                                                                          1. 1

                                                                            I would love to hear of anyone else’s experience with event stores, either usage of existing ones or creating their own.

                                                                          1. 7

                                                                            Beautiful. I’m a little sad that this is only a spec and not an implementation.

                                                                            1. 5

                                                                              Looks like there is one in the works: https://github.com/wolfgang42/rockstar-js

                                                                            1. 4

                                                                              This is my favorite.

                                                                              The single quote character in Rockstar is treated as a letter of the alphabet. This seems unusual until you remember that I ain’t talkin’ ‘bout love is a perfectly valid rock’n’roll sentence.

                                                                              1. 2

                                                                                Interesting read for me: I loved Racket because of how accessible it is and because of DrRacket, but I ran into some speed and memory issues. I moved on to Common Lisp and liked it too, but ran into some garbage collection weirdness that bothered me (I was using SBCL). The cool link in this article was for Gerbil Scheme.

                                                                                At heart I’m on the lookout for a practical functional language that enforces immutability, makes it natural to prototype single-threaded code and then make it multi-threaded, and has efficient data structures for large arrays. I’m possibly looking for a lazily evaluated Python with a compiler that takes it down to the metal. It’s possible I’m looking for Haskell, but the proliferation of libraries in Haskell confuses the heck out of me.

                                                                                1. 5

                                                                                  If you’re already familiar with Lisp, I’d recommend looking at Clojure. It focuses on using functional style and immutability, while also being a small and pragmatic language.

                                                                                  1. 4

                                                                                    Haskell sounds like a logical next step. Rather than worry about all the libraries, just dip into them when necessary. To get started, I’d suggest looking at the vector package for efficient arrays.

                                                                                    1. 2

                                                                                      I recently learned about Coconut, which is a functional language that compiles to Python. http://coconut-lang.org/. Here is a podcast interview with the author: https://www.functionalgeekery.com/episode-94-evan-hubinger/