1. 4

    While it is usually safe to assume that sensible values have been set for CC and LDD, it does not harm to set them if and only if they are not already set in the environment, using the operator ?=.

    I don’t think that ?= accomplishes that, at least with GNU make. I don’t have CC set in my environment, but GNU make uses a default of CC = cc anyway. So, as far as I know, CC ?= gcc will never be helpful in GNU make, will it?

    There’s a list of variables set by GNU make here, but you can check your own setup with make -p.

    Question: I do a lot of development with MinGW and MSYS. This environment does not use cc by default. So how can I write a Makefile that will work with MSYS and a normal Linux/macOS setup without forcing CC=gcc? Do I set CC only if ifeq (default,$(origin CC))?

    1. 1

      I think you’re right; this would be superfluous in GNU make (and also bmake – I’m not sure about other makes). AFAICT ?= is not POSIX, so doing this would make the makefile less portable with no obvious advantage.

    1. 2

      Interesting article. I’ve written a lot of C macros, and it’s nice to see so much info in one place.

      One thing not mentioned in the article, that I always thought was weird, is the defined operator. It’s useful because you can use it with other operators (e.g. || or &&), but I just find the syntax a weird special case.
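
      For anyone who hasn’t run into it, here is a minimal sketch of why defined composes where #ifdef does not. FEATURE_A, FEATURE_B, BUILD_MODE and build_mode are made-up names for illustration only; they are not from the article.

      ```c
      /* FEATURE_A and FEATURE_B are hypothetical feature macros,
         the kind normally passed on the command line with -D. */
      #define FEATURE_A 1

      /* #ifdef can test only one macro at a time. defined is an operator
         inside #if, so it combines with &&, || and !. */
      #if defined(FEATURE_A) && !defined(FEATURE_B)
      #define BUILD_MODE "A only"
      #elif defined(FEATURE_A) || defined FEATURE_B /* parentheses are optional */
      #define BUILD_MODE "B, possibly with A"
      #else
      #define BUILD_MODE "neither"
      #endif

      /* Returns the configuration that the preprocessor selected. */
      const char *build_mode(void)
      {
          return BUILD_MODE;
      }
      ```

      With only FEATURE_A defined, as above, the first branch is taken. The second form, defined FEATURE_B without parentheses, is the part of the syntax that looks most like a special case.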

      1. 15

        I have a story about this.

        A while ago I was interested in getting the statistical medcouple function into Python’s statsmodels. The problem is that this function is computed via a nontrivial but clever algorithm. It was described in an obscure paper from the 1970s that was really hard to read. The implementation in statsmodels is using a slow O(n^2) algorithm, whereas better O(n log n) implementations exist.

        So I find such an implementation in R, written by the authors of the medcouple paper. Now, R is GPLed. Statsmodels is GPL-phobic. I could have just translated the R implementation into Python, but it didn’t seem fair to me, because I really did not understand the medcouple implementation until I read and translated the R code. Since statsmodels won’t accept the GPL, they shouldn’t accept the code I wrote.

        My solution was to write the medcouple Wikipedia article in generic pseudocode (that looks suspiciously like Python). This is now the spec part of the clean-room reverse engineering process. I’m glad to see that some people have stumbled onto that page and used it to create new implementations of the algorithm. Now I’m just waiting for someone to use this page to fix statsmodels’ implementation.

        1. 1

          Hold on - have you just told on yourself?

          I really didn’t understand the medcouple implementation until I read […] the R code.

          Isn’t this effectively creating a derived work in another language based upon the original GPLed code? Shouldn’t your derived work also be GPLed?

          1. 4

            It should be and it is:

            http://inversethought.com/hg/medcouple/file/default/medcouple.py

            But I also wrote a spec, the Wikipedia article. I described the algorithm in as much detail as I could. The spec should be enough for someone else to reimplement this.

            1. 3

              I don’t understand your reasoning. Why do you consider your python code to be a derivative work, but you don’t consider the Wikipedia pseudo-code you wrote to be a derivative work (and therefore GPL and not Creative Commons)? If your python code is a derivative work, why does the copyright notice only have your name?

              1. 3

                The Wikipedia article is a description of the algorithm that I cobbled together from various sources, which I amply cited. At no point did I just grab the R code and translate it for Wikipedia. I wrote the pseudocode based on my understanding of the algorithm as described by the papers I read and cited. I did do separate “literal” translations into Python and C++, and those I do consider derivative works of the original, which is why I GPLed them.

                As to why my copyright notices don’t mention the original copyright holders, I’m not sure if that’s necessary. Am I required to keep their names in order to satisfy my GPL obligations?

        1. 2

          I’ve implemented Aho-Corasick a few times in my career. It’s one of my favorite algorithms. A visualization like this would have been helpful. Nice work on that! I remember having to work it out on paper each time.

          I posted one of my implementations here: https://github.com/codeplea/ahocorasickphp

          1. 15

            Really interesting article! I’m considering switching for the same reasons - although I really wish there was a viable third option.

            I noticed that your site uses https://fonts.googleapis.com. You might want to consider self-hosting your fonts. Google’s CDN seems like a pretty obvious scheme to get tracking on the few remaining sites that don’t use Google Analytics.

            1. 5

              Try sailfish. I use it as my daily driver.

              1. 4

                It is. The site is using a fairly new theme and I haven’t got around to it yet. It’s on my list though.

                1. 3

                  Awesome, I’m glad it’s on your radar!

                  Honestly, I wonder why browsers still send the referrer header by default. It appears to have no advantage to the user - only benefits trackers.

                2. 2

                  Not sure what kind of tracking you’re concerned about, but here’s a pretty explicit privacy outline: https://developers.google.com/fonts/faq#what_does_using_the_google_fonts_api_mean_for_the_privacy_of_my_users

                  1. 5

                    From your link:

                    Google Fonts logs records of the CSS and the font file requests

                    It seems reasonable to assume that they are logging IP, User-Agent, and Referrer. If you don’t care about their tracking, then it’s nothing to worry about. However, if you’re avoiding Google Analytics specifically because you don’t want Google seeing your site’s traffic, then it seems that using their CDN is pretty counterproductive.

                    1. 5

                      I worked on the Google Analytics team. For what it’s worth, we didn’t do anything interesting with your site’s traffic; in fact, the data wasn’t allowed to be touched by any other org. But I totally respect the idea of privacy for privacy’s sake (you shouldn’t need a reason to value privacy).

                      1. 6

                        And I worked on the Google Fonts team. There are many concerning things that Google is doing regarding privacy. Tracking users through fonts is not one of them.

                        1. 6

                          Not yet, perhaps.

                          1. 3

                            Ok, I’m not going to argue with that. Just giving information so that people don’t make this decision on unfounded fear.

                          2. 3

                            Everybody says “Google is doing concerning things” but we never hear from the people who do it. Just every department saying “not us!”

                            1. 3

                              Exactly. They could also be lying. I’m not saying I believe any are. Just that they:

                              1. Went to work for a surveillance company helping it achieve its goals in some way that might boost its numbers. They accepted doing that for money and other benefits.

                              2. Said they didn’t or wouldn’t do some privacy-invading thing.

                              The contradiction there can indicate anything from a personal line they didn’t cross to deception. So, I just can’t rely on any of those claims. Instead, I look at past behavior and where incentives push a company. Google’s indicate they aren’t trustworthy.

                          3. 3

                            When? Couldn’t this change in the meantime? I remember some Internet moment when it was discovered that Google changed some public policy docs and dropped words claiming they don’t mix data from separate services.

                    1. 4

                      It’s interesting that they didn’t explicitly prohibit ICE itself, only collaborators.

                      Also, this change definitely infringes on the other lerna contributors’ copyrights, despite the explanations given by the original author. They should have used a contributor license agreement. I wish GitHub had better tools/policies regarding licensing and CLAs.

                      1. 8

                        They do ban “Microsoft Corporation” and its subsidiaries. Doesn’t that include GitHub!?

                        1. 4

                          It does, so Lerna is not available to GitHub under MIT license. GitHub is still okay, because GitHub is granted a license to publish under GitHub Terms of Service D.4. As I understand, GitHub can publish Lerna, but can’t use it.

                          1. 1

                            IANAL, but the purchase is not finalized yet.

                            (I work for Microsoft)

                            1. 1

                              The state of the purchase doesn’t really change anything here.

                          2. 7

                            this change definitely infringes on the other lerna contributors’ copyrights

                            Everyone’s contributions (and the whole project right before the license switch) are still available under MIT. MIT permits sublicensing. I guess they should’ve kept the old license in the repo and mentioned what it applies to… but there’s no actual requirement that “old git revisions don’t count as included with the Software” :)

                            CLAs are terrible and unnecessary.

                          1. 4

                            I write at https://codeplea.com sometimes.

                            1. 3

                              This is awesome! Signed up. Are you planning on open sourcing it? I’m sure I would self host something like this.

                              1. 4

                                Thanks! I’d love to open-source it. However, I can’t justify the time commitment it would take yet. I’ve open-sourced many smaller projects, and I always feel compelled to answer every email I get. I wish I could post it and then just ignore it, but I really can’t.

                                I did make a deal with myself a long time ago. If I can get enough supporters on Patreon, I will open-source it. The code is already pretty cleaned-up and ready to go. It’s pure PHP, no dependencies. No frameworks or anything.

                                1. 2

                                  Post code with no email address to contact you? ;)

                                  1. 1

                                    I actually did that with one project. People find a way.

                                    Anyway, my name is all over this by now.

                                  2. 1

                                    Sweet project, man! I’ve already set it up to email me whenever my name is mentioned on reddit. I’m just a poor student so I can’t justify a patreon (or I could, if it were just 1 project, but there are so many projects I’d love to support) so I’ll have to content myself with just saying thanks, it’s a great idea and a great post explaining it.

                                    1. 2

                                      I’m glad you like it! I put it online in the hope that others would find it useful, so I appreciate your comment!

                                1. 6

                                  I just finished up .NET bindings for my technical analysis library. It’s something I’d been meaning to do for a while.

                                  1. 7

                                    Not really a fan of this idea. This article isn’t so much about “defending” your website as it is about attacking anyone who scans it. Vulnerability scanners are often run from servers that are themselves compromised, so retaliatory attacks like this can further victimize people who have already been owned :(

                                    Still pretty neat on a technical level though.

                                    1. 11

                                      Many people are not even aware that they’ve been compromised… At least that helps in a way!

                                      1. 2

                                        Just because you’re being attacked from a compromised server doesn’t mean that you’re not being attacked.

                                      1. 2

                                        I’ve been running SSH on multiple servers with non-standard ports for years. Yet, I rarely, if ever, get failed login attempts. Is this really a thing?

                                        1. 1

                                          Worked for me for a long time, but they’ve found me now :’(

                                        1. 2

                                          This is neat. I really enjoy seeing how short the experts can make one-liners.

                                          When I made F5Bot public, the Reddit scraper was only a handful of lines. It worked for a while, but today it’s thousands of lines. It’s amazing how many edge cases come to the surface. Also, the Reddit API is really quirky.

                                          1. 3

                                            I’m trying to finalize the scripting interface for Tulip Charts. I really want to release a public alpha soon. I’m trying to find the right balance between brevity, simplicity, and elegance for the API. In the end I guess I’ll just need to pick something and go with it.

                                            1. 1

                                              Why wait to open source it if that was your intention from the start?

                                              1. 1

                                                No real reason, other than it’s less work to publish it later. I’ll probably put it up on GitHub soon anyway.

                                            1. 3

                                              I’m still plugging along on the business end of Turnkey Telemetry.

                                              In my spare time, I’m trying to make F5Bot monitor all of Reddit, instead of only specific subreddits. Reddit’s API and rate limiting don’t make this very easy. I think I’ll need to take an approach that makes multiple simultaneous requests.

                                              1. 5

                                                I’m trying to find better sales channels for my startup, Turnkey Telemetry. It’s not as fun as writing code or building hardware. I could use some advice if anyone has experience with the sales end of a similar product line.

                                                1. 7

                                                  It looks like https://barnacl.es/ is the place for you.

                                                  1. 1

                                                    Hadn’t seen that. Thanks!

                                                1. 1

                                                  Looks like a fairly complete list of examples.

                                                  In practice, the problem I see most often is the implicit pointer cast. Implicit casts are an inherent part of C programming, so it actually shows up everywhere. However, C++ culture holds that casts are generally evil, and the language makes it difficult (while preferring C++’s flavor of OO instead). Most of these other issues you don’t often see in real code.
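
                                                  As a minimal sketch of the kind of code in question, assuming the common case of a void * result being assigned to a typed pointer (make_squares is a made-up helper, not taken from the article):

                                                  ```c
                                                  #include <stdlib.h>

                                                  /* make_squares is a hypothetical helper used only for illustration.
                                                     It returns a heap array holding 0, 1, 4, 9, ... or NULL on failure. */
                                                  int *make_squares(size_t n)
                                                  {
                                                      /* malloc returns void *. C converts it to int * implicitly;
                                                         C++ rejects this line unless a cast is added. */
                                                      int *squares = malloc(n * sizeof *squares);
                                                      if (squares == NULL)
                                                          return NULL;

                                                      for (size_t i = 0; i < n; i++)
                                                          squares[i] = (int)(i * i);

                                                      return squares; /* the caller is responsible for free() */
                                                  }
                                                  ```

                                                  This compiles as C without complaint. A C++ compiler should refuse the malloc line until it is written with an explicit cast such as static_cast<int *>(...), or replaced with new or a container.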

                                                  Also, I find it a bit odd to phrase it as C being a subset of C++, instead of saying that C++ is a superset of C. I guess they technically mean the same thing, but saying C is (not) a subset almost implies that C came after C++.

                                                  1. 1

                                                    I got these from a couple sources, then someone pointed out this Wikipedia entry which looks more complete.

                                                    Yeah, the implicit cast is part of the C philosophy that the programmer knows what they’re doing.

                                                    I phrased it that way because that’s how I’ve been hearing the claim, C is a subset.

                                                  1. 4

                                                    This is cool. What algorithm are you using?

                                                    1. 3

                                                      On another discussion their response to a similar question was something like this:

                                                      I can share that our platform is built on top of ARIMA models, but with a lot of pre-processing work done previously to try and figure out automatically the best parameters to use, as well as a lot of previous hand-tweaking done by ourselves in-house using different datasets (we started out tuning it for forecasting energy consumption, but figured that the resulting models were performing well enough to warrant testing in other domains).

                                                    1. 5

                                                      Trying to get the last few bugs out of F5Bot, my free social network monitoring service. I think I’m going to rewrite a lot of the parser code today.

                                                      1. 2
                                                        1. 1

                                                          Yeah. I’m working on that. I’ve had a couple users email me, and I’ve manually reset their passwords. If you need me to manually reset yours, just let me know.