1. 7

    meta: misuse of the ‘video’ tag, which is used to note whether the linked page is video content (this is an article)

    1. 7

      Ah good ole FOaaS! This brings back memories, as one of my greatest useless internet accomplishments was getting the Shakespeare route added. https://news.ycombinator.com/item?id=6069514

      Nice little project, good example of a Go CLI. Thanks for sharing!

      1. 2

        Oh, so that was you. The Shakespeare one is among my top 10s

        Thank you for your words. I wasn’t sure whether I should share this project or not, but I have gotten really good feedback for it on Reddit and now here. It means a lot, and it has made learning Go more fun.

      1. 2

        Starting a text classifier and named entity recognition

        1. 2

          What tech stack?

          I’ve used StanfordNLP’s NER on a project previously (we literally just needed NER and some date recognition, no sentiment/etc.), and while we got it to work, the amount of work required to get it to a usable stage felt like overkill - it didn’t help that I had to delve back into Java to get a usable HTTP interface for it.

          1. 2

            If you’re looking for something better and non-Java (with a more permissive license), I recommend checking out spaCy - https://spacy.io

            The API is a pleasure to work with, and lots of really good NER comes with the pretrained models.

            1. 1

              It was more of a general curiosity than a current requirement, but thanks for the reference.

            2. 1

              I am doing NER with spaCy and classification with TensorFlow. I am also experimenting with prodi.gy, a tool developed by the same people as spaCy that offers an easy interface to work with. For now I still have some issues with my own word vectors (4M words): I get buffer overflows that I do not yet understand.

          1. 12

            I would actually wait for GDPR to kick in before deleting Facebook, or any other online account for that matter, so that keeping user information even after a user has requested deletion is simply against the law.

            1. 2

              I don’t think the fines for violating GDPR are large enough to make Facebook think twice about ignoring it. Short of dissolving Facebook and seizing its assets under civil forfeiture, no civil or criminal penalty seems severe enough to force it to consider the public good.

              1. 17

                don’t think the fines for violating GDPR are large enough

                Actually, they are very large:

                Up to €20 million, or 4% of the worldwide annual revenue of the prior financial year, whichever is higher [0]

                Based on 2017 revenue [1] of $40B, that’s $1.6 billion.

                But it’s not just the fines. The blowback from the stock hit and shareholder loss, as well as cascading PR impact, is a high motivator too.

                [0] https://www.gdpreu.org/compliance/fines-and-penalties/ [1] https://www.statista.com/statistics/277229/facebooks-annual-revenue-and-net-income/

                1. 3

                  0.04 << 1 until you can quantify the cascading PR impact. It will not affect their day-to-day operations from an economic standpoint.

                  I would be curious to know how many people have actually taken action on their FB usage based on the recent CA news outbreak. I am willing to bet it’s minuscule.

                  1. 1

                    1.6 billion dollars vs deleting the data of one user who wants to leave?

                    1. 1

                      The fines are per distinct issue (not number of people affected). If Facebook breaches GDPR with multiple issues, then Facebook could get hit by a large percentage of their annual revenues.

              1. 1

                If you’re in England go to the Boring conference this weekend! https://boringconference.com/

                1. 3

                  The offhand ‘even perl’ in there struck me as unfair. It reminds me that perl is actually pretty fast (specifically at startup, but my recollection was also that it runs quickly):

                  $ time for i in `seq 1 1000`; do perl < /dev/null; done
                  
                  real    0m2.786s
                  user    0m1.337s
                  sys     0m0.686s
                  
                  $ time for i in `seq 1 1000`; do python < /dev/null; done
                  
                  real    0m19.245s
                  user    0m9.329s
                  sys     0m4.860s
                  
                  $ time for i in `seq 1 1000`; do python3 < /dev/null; done
                  
                  real    0m48.840s
                  user    0m30.672s
                  sys     0m7.130s
                  
                  
                  1. 1

                    I can’t comment on how fast Perl is, but you are measuring the time taken to tear down here too.

                    The correct way would be to take the raw monotonic time immediately before invoking the VM, then inside the guest language immediately print it again and take the difference.

                    P.S. Wow Python3 is slower.
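
                    A minimal sketch of that approach in Python (a hypothetical harness, not from the thread): the parent records the monotonic clock just before spawning the interpreter, and the guest prints its own monotonic reading as its first act, so the difference captures startup without teardown. It assumes both processes read the same system-wide monotonic clock, which holds on Linux and macOS.

```python
import statistics
import subprocess
import sys
import time

def startup_time(interpreter_cmd, snippet, runs=10):
    # Parent records the monotonic clock just before spawning; the guest
    # prints its own monotonic reading immediately. The difference is
    # startup only -- teardown happens after the timestamp is taken.
    samples = []
    for _ in range(runs):
        t0 = time.monotonic()
        out = subprocess.run(interpreter_cmd + [snippet],
                             capture_output=True, text=True).stdout
        samples.append(float(out) - t0)
    return statistics.median(samples)

# Measure this Python; swap in another interpreter command and an
# equivalent clock-printing snippet to compare startup times.
print(startup_time([sys.executable, "-c"],
                   "import time; print(time.monotonic())"))
```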

                    1. 2

                      but you are measuring the time taken to tear down here too.

                      I guess so? I’m not sure that’s a useful distinction.

                      The people wanting “faster startup” are also wanting “fast teardown”, because otherwise you’re running in some kind of daemon-mode and both times are moot.

                      1. 1

                        The people wanting “faster startup” are also wanting “fast teardown”

                        Yeah, I guess I agree that they should both be fast, but if we were measuring for real, I’d measure them separately.

                        1. 1

                          I’m not sure that’s a useful distinction.

                          If latency matters then it could be. If you’re spawning a process to handle network requests for example then the startup time affects latency but the teardown time doesn’t, unless the load gets too high.

                      2. 1

                        Hah before I read the comments I did the same thing! My results on a 2015 MBP - with only startup and teardown on an empty script, and I included node and ruby also:

                        ~/temp:$ time python2 empty.txt 
                        real    0m0.028s
                        user    0m0.016s
                        sys     0m0.008s
                        
                        ~/temp:$ time python3 empty.txt 
                        real    0m0.042s
                        user    0m0.030s
                        sys     0m0.009s
                        
                        ~/temp:$ time node empty.txt 
                        real    0m0.079s
                        user    0m0.059s
                        sys     0m0.018s
                        
                        ~/temp:$ time perl empty.txt 
                        real    0m0.011s
                        user    0m0.004s
                        sys     0m0.002s
                        
                        ~/temp:$ time ruby empty.txt 
                        real    0m0.096s
                        user    0m0.027s
                        sys     0m0.044s
                        
                        1. 2

                          Ruby can do a bit better if you don’t need gems (and it’s Python 3 here):

                          $ time for i in $(seq 1 1000); do ruby </dev/null; done
                          
                          real	0m31.612s
                          user	0m27.910s
                          sys	0m3.622s
                          
                          $ time for i in $(seq 1 1000); do ruby --disable-gems </dev/null; done
                          
                          real	0m4.117s
                          user	0m2.848s
                          sys	0m1.271s
                          
                          $ time for i in $(seq 1 1000); do perl </dev/null; done
                          
                          real	0m1.225s
                          user	0m0.920s
                          sys	0m0.294s
                          
                          $ time for i in $(seq 1 1000); do python </dev/null; done
                          
                          real	0m13.216s
                          user	0m10.916s
                          sys	0m2.275s
                          
                          1. 1

                            So as long as python3 is faster than ruby/node, we are ok…?

                        1. 25

                          9PM Friday night for a freely available community service. If you don’t hear it enough from us, your humble users, we once again thank you for maintaining Lobsters!

                          1. 13

                            I appreciate your kind words. We schedule as many of our maintenance windows in the evenings and through the weekend as we can. Compared to the weekdays and daylight hours, more folks have signed off or are otherwise away from their keyboards, which lowers the impact for most of our users.

                            We do have customers from all over the world–in addition to some keeping odd hours for reasons other than being in a different timezone–but working late or on the weekend when we need to bring machines down is the best option for the largest number of our users.

                          1. 1

                            Why oh why did they have to call it “bionic”…were they trying to increase general nomenclature confusion?

                            https://en.wikipedia.org/wiki/Bionic_%28software%29

                            1. 4

                              All the names that are actual words are already taken by someone somewhere. Naming something these days with a word is a guaranteed collision.

                              1. 1

                                See also Apple’s A11 Bionic processor (the one that’s in the iPhone X)

                                1. -1

                                  Increasing general nomenclature confusion is about all the Ubuntu release code-names are good for, yes.

                                1. 9

                                  I’m buying a fucking house. Today. Then I’m going to be cleaning it up. Moving in the next couple weeks.

                                  UPDATE - just came from the closing meeting: https://youtu.be/4-0utDrWa5w

                                  1. 3

                                    Congrats!

                                    1. 2

                                      Thanks!

                                  1. 6

                                    Putting the finishing touches on my talk for Haystack tomorrow. I will be enjoying that conference and also the Tom Tom Machine Learning conference day later in the week.

                                    Hoping the weather in Charlottesville is nice!

                                    1. 5

                                      I feel like this was written by my future self. I’ve been desperately clinging to my 2013 MBP, and dread upgrading because of the touchbar. Very nice article and if (when) I need to upgrade I will definitely be using the setup he has here!

                                      1. 1

                                        This was inevitable, no? When technology like this is researched and then developed for use by privileged parties, it is only a matter of time before it is leaked to or mimicked by other parties…especially given how widespread the deployment and use was by law enforcement agencies.

                                        1. 3

                                          Having done significant work developing and maintaining first on-prem and then SaaS products, I will never go back to on-prem. NEVER! It’s an overhead nightmare. You will always have stubborn customers that refuse to upgrade and demand support for a 3-year-old system getting older by the day. And you’ll need to keep copies of old configurations lying around, ready in a heartbeat to debug some problem that inevitably impacts your SLA.

                                          Maybe there are some cases where you may consider on-prem. But if you have a sufficiently complicated system and deployment, with a decent amount of customers, no amount of container magic sauce will fix the fact that you effectively give up control of the environment and upgrade path. Go on-prem, and enjoy the inescapable pain and suffering you are guaranteed to face.

                                          EDIT - the above is a bit ranty and not in the context of the article, so I should note that for GDPR, the reasons given are IMO not good enough, and just offloads the problems to your customers, requiring more hassle for you in the long run. GDPR compliance isn’t something that is solved with a technical and operational punt. And making the improvements in your own SaaS will be easier technically, operationally, and compliance-wise. We are neck deep in GDPR refactoring (organization and system wise), and if you make good informed design decisions the regulation will end up improving your application and your processes. An on-prem system still needs to do this. You would still need to be able to give end users their data inventory and right to be forgotten. But with on-prem you now need to train all your IT customers how to use it, and deal with all the overhead of things inevitably going wrong or being misunderstood.

                                          Saying GDPR is a good reason to go on-prem is like saying fire-safety is a good reason to have lots more houses.

                                          1. 4

                                            We are neck deep in GDPR refactoring (organization and system wise), and if you make good informed design decisions the regulation will end up improving your application and your processes.

                                            Well said.

                                            The issue I see at times is people trying to workaround GDPR instead of embracing it.

                                            And if you embrace it, you just see how much more control you get on your system, for example:

                                            • it forces you to understand the components that just work but that nobody wants to touch
                                            • it forces management to ponder the legal risks of bad engineering practices, and whether “move fast and break things” is a good idea after all
                                            • it forces you to add tons of logs and updated documentation, to be able to explain automated decisions in court
                                            • it forces you to adopt basic engineering practices like automated build and test processes, to be able to reproduce everything in a digital forensic environment

                                            All in all, GDPR is one of the best laws I’ve read in ages.
                                            It’s designed to protect people by improving software quality.

                                            1. 1

                                              Related, but when the company I worked for previously was purchased by a publicly listed US company, the SarbOx regulations forced us to implement much better processes for development. For example, a QA department was mandated (we didn’t have one before).

                                              Regulations can be onerous, but the best way to handle them is to try to work them to your favor.

                                            2. 2

                                              You will always have stubborn customers that refuse to upgrade and demand support for a 3 year old system getting older by the day.

                                              Sounds like you can just force the upgrade or deny service, like the cloud players do. You have to be willing to lose some customers. Some subset of these will upgrade, because the reason they don’t is that the behavior is tolerated.

                                            1. 4

                                              If I had a nickel for every time an unrealistic simulator gif made me laugh…

                                              1. 2

                                                I don’t understand: isn’t Telegram meant to be end-to-end encrypted? Why would keys that Telegram keeps allow snooping? If Telegram has a way to read user messages then it is NOT secure.

                                                1. 3

                                                  It’s not encrypted except for the optional secret chats. Client-server communication is of course encrypted, and data at rest is encrypted by Telegram’s keys.

                                                  I’m not sure why people seem to think that Telegram is secure.

                                                  1. 1

                                                    Telegram is completely encrypted, but only secret chats are end-to-end encrypted.

                                                    I’m not sure why people seem to think that Telegram is secure.

                                                    Good snake oil marketing. Do not trust Telegram for anything sensitive.

                                                1. 5

                                                  Plattsburgh has an allotment of 104 megawatt-hours of electricity per month. When the city goes over this amount, it has to buy electricity on the open market for far higher prices - cost can be seven times higher. When this happens, the residents must share the expense.

                                                  This is a strange setup. I can see why Bitcoin mining is causing problems in that town.

                                                  1. 3

                                                    The facility that serves Plattsburgh is Lower Saranac Hydroelectric. Some output data can be found here: http://globalenergyobservatory.org/geoid/1042

                                                    I am not sure if the above is accurate, but it quotes an average output of about 30 GWh per year (roughly 2.5 GWh per month), much higher than the allotted 104 MWh per month.

                                                    Even getting that data took some digging, and I wasn’t able to find which other townships the facility supports. But it is possible that everyone in a certain area gets a slice, and when the total reaches a threshold they need to get power elsewhere.

                                                    EDIT – more clicking and I found this nice output chart with more recent data: https://www.quandl.com/data/EIA/ELEC_PLANT_GEN_10214_WAT_ALL_M-Net-generation-Lower-Saranac-Hydroelectric-Facility-10214-conventional-hydroelectric-all-primemovers-monthly

                                                  1. 8

                                                    Two other things that help, unmentioned in the article: linting and static code analysis

                                                    Installing and using them to fail builds means having your very own pedantic and uncompromising code reviewer.

                                                    They can’t do everything and they are not for everyone, but if you don’t trust your own code (or the code you copy and paste from the web), then they are good tools to have in your workflow.
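
                                                    To make the “pedantic reviewer” concrete, here is a toy static check in Python (illustrative only, standard library `ast` module); real linters such as flake8 or RuboCop apply hundreds of rules like this and can be wired into CI to fail the build:

```python
import ast

# Source under review: a bare `except:` that swallows every error,
# the kind of thing a linter flags before a human ever sees the diff.
SOURCE = """
try:
    risky()
except:
    pass
"""

def find_bare_excepts(source):
    # Walk the syntax tree and report line numbers of `except:` handlers
    # that name no exception type.
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]

print(find_bare_excepts(SOURCE))  # -> [4]
```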

                                                    1. 5

                                                      I’ll corroborate your comment by adding that case studies on formal specification, proof, and automated analyzers often showed that just making the code (esp control flow) simple enough for those tools to handle caught errors by itself. Then, use over time reinforces that in coding style in a way that prevents and catches more.

                                                      1. 1

                                                        I’m conflicted about them.

                                                        When they are good, they’re great.

                                                        But I have seen ridiculous code that people have written just to shut the linter up. Grr. That’s worse than useless.

                                                        On balance I think they are worthwhile; some of Rubocop’s “cops” are way over the top, though.

                                                      1. 1

                                                        My field is ‘search’ (not SEO, but real low level search tech).

                                                        If I had to make a big “what’s next” prediction related to my field: we’re almost on the cusp of something passing a domain-specific Turing test (chatbots are really a search problem). I say domain-specific because we’re not even close to a general chatbot Turing test pass yet. But when restricted to a narrow field of content and context, it’s going to happen soon.

                                                        1. 2

                                                          (chatbots are really a search problem)

                                                          Could you explain what you mean here?

                                                          1. 3

                                                            For the purpose of my comment, let’s say search is more or less synonymous with Information Retrieval. IR is the science of matching a query against data in a repository to return the correct response. A response is one or more pieces of data that are most relevant given the query’s context, intent, and substance. Chatbots extend this to a dialog of multiple query-response pairs, and typically limit the response to either a single relevant answer or a counter-interrogative to gain more context, intent, or substance.

                                                            Many times the context and intent are unknown (the substance is part of the query), so the search is two part: finding the context and intent during query analysis, and then using that along with the substance to find the best response.

                                                            So take for example, the interrogative in an ecommerce chatbot: “what is the status of my order?” The context can be derived from the logged in user (the person asking the question), their data in the system, and the area of the site in which they are asking the question. The intent can be derived from the domain (this is important to my prediction) and query structure as ‘Order Status’. The substance is contributing to intent derivation in this query, as ‘status of my order’ can be an altLabel of the concept ‘order status’ (an important distinction for the prediction, because you have an ontology specific to the domain). In this example, when all three pieces are available, then you can structure the search to return an order (or list of orders) in your repository that are most relevant (relevant is likely recent unfulfilled orders). If you prefer you can have the bot ask a question to narrow the result set like ‘I see these three orders…which one are you interested in?’ and then fulfilling with a search based on the response to the follow up question.

                                                            The context and intent are the difficult part to solve. Restricting to a domain narrows the possibilities for both.
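
                                                            A sketch of that order-status flow in Python (every name and data record here is invented for illustration): derive the intent by matching altLabels from a tiny domain ontology, then search the repository using the logged-in user as context:

```python
# Hypothetical domain ontology: concept -> altLabels that map surface
# text onto the concept. Restricting it to one domain is what makes the
# intent step tractable.
ONTOLOGY = {
    "order_status": {"order status", "status of my order", "where is my order"},
}

# Toy order repository standing in for the real datastore.
ORDERS = [
    {"user": "alice", "id": 101, "fulfilled": False},
    {"user": "alice", "id": 99, "fulfilled": True},
]

def derive_intent(query):
    # Query analysis: find the concept whose altLabel appears in the text.
    q = query.lower()
    for concept, labels in ONTOLOGY.items():
        if any(label in q for label in labels):
            return concept
    return None

def answer(query, user):
    intent = derive_intent(query)
    if intent == "order_status":
        # Context is the logged-in user; relevance favors unfulfilled orders.
        hits = [o for o in ORDERS if o["user"] == user and not o["fulfilled"]]
        if len(hits) == 1:
            return "Order %d is being processed." % hits[0]["id"]
        # Counter-interrogative to narrow the result set.
        return "I see several orders - which one are you interested in?"
    return "Could you tell me a bit more?"

print(answer("What is the status of my order?", "alice"))  # -> Order 101 is being processed.
```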

                                                        1. 3

                                                          Really nice article and domain specific application of compression.

                                                            Interestingly, I had no idea that lichess used MongoDB. I run (well, I built and let it run without maintenance) a chess-related game site. I used RethinkDB since it seemed like a perfectly logical use of a document DB to keep PGNs. I sorely regret that decision, and have a half-done PostgreSQL implementation in the works. The performance of the NoSQL DB is abysmal when you want to do anything more than create and read a single document (for example, listing and aggregation performance are terrible even with indexing).

                                                          My tiny little toy site will never get to the amazing scale of lichess, and props to them to keeping things fast. I am very happy to see this writeup and technical detail.

                                                          1. [Comment from banned user removed]

                                                            1. 8

                                                              A great overview of how good it is to finally have laws seeking to protect my data and my privacy. Yes, it is a pain to implement for businesses and developers, but that is a good thing - because it finally forces us to think about private data management and the implications of how it is used. Before GDPR, the default was take as much data as possible, keep it forever, and don’t care about how it is used or if it is leaked.

                                                              If only there was something like this in the USA…

                                                              1. [Comment from banned user removed]

                                                                1. 3

                                                                  I don’t disagree that government surveillance is a bad thing, but that doesn’t make private surveillance magically ok.

                                                                  1. [Comment from banned user removed]

                                                                    1. 2

                                                                      If two people have cameras in my room, of course I’m going to be glad to get rid of one - that lets me focus my attention on removing the other. Progress is welcome, even when it’s not the end of the fight.

                                                                      1. [Comment from banned user removed]

                                                                        1. 2

                                                                          The bottom line is that GDPR doesn’t help at all.

                                                                            Personally, I’d rather not be spied on by private, unaccountable organizations than by the government, which at least in the West can be reformed to a certain degree. Unless one has government paranoia (which in my eyes is more of an American phenomenon), I believe people see this as an improvement, if only a marginal one.

                                                                          And ultimately, if it doesn’t “help at all”, as you argue, what’s the problem then?