The article mentions how content farms need to get successful people to link to them but doesn't go into much detail. This is an important part of SEO - PageRank was famously innovative because it took the linking relationship into account.
I agree that detecting if the content itself is AI-generated or in general low-effort content is hard or impossible, but I wonder if detecting “unnatural” linking and back-linking patterns might still hold some promise.
That feels like it would just turn into an arms race, though. Search engines start looking for a pattern, content farms figure out how to create that pattern artificially, rinse and repeat.
I actually wonder if we’re going to end up back where Yahoo! was at in the late 90s and early 2000s with a directory. Maybe you have to pay for access, but verified human moderators would select content and, presumably, if the content changed substantially the link would be automatically flagged for re-review. Something like that.
I think analyzing the network of links is a bad idea since that’s what the bots are already optimizing to stymie.
I wonder how far we are from a model that can reliably read a blogpost and answer “Did this webpage actually explain anything and answer the user’s query?” Even if it doesn’t fact check and made up answers slip through, that would get rid of most of today’s content farm posts and make it more expensive to fake. Although then you have an incentive to make your content mills persuasive liars, but if they really give harmful advice that can hurt people or libel famous brands and celebrities, then maybe the scummy SEO types won’t want the liability.
We have succeeded in making Matrix wildly successful, but Element is losing its ability to compete in the very ecosystem it has created. It is hard for Element to innovate and adapt as quickly as companies whose business model is developing proprietary Matrix-based products and services without the responsibility and costs of maintaining the bulk of Matrix.
Obviously they’re not naming any names, but now I’m curious who those companies are.
I don’t think there are any specifics. Word on the street is that they were heavily underbid in some public contracts in their target market, the public sector. And the problem here is pretty obvious: Element offers Matrix hosting, but also covers the development. Competitors host Matrix/Element but don’t bring the product forward; they rely on Element’s output.
I can also see why they bring the products into Element: the Foundation has not managed to attract that many collaborators beyond Element, so Element can just make clear it’s their products from now on.
I’ve long felt like Element has had a hand in that as well though, with them seemingly doing a lot of planning and decision making on their own rather than in a public setting, so it’s always been pretty hard to know where they’re even planning on taking the project until it’s already there.
For a bit of context, there is also a Go2 draft proposal that introduces a check keyword. It behaves similarly to Rust’s try! macro (now the ? suffix operator), and also comes with a handle block, which behaves like defer but just for checked errors.
And as a tongue-in-cheek aside, you could always write exceptional Go.
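For a rough illustration, the canonical example from that draft looks something like this (paraphrased from memory, so treat it as a sketch; it’s proposal syntax and doesn’t compile with any released Go version):

func CopyFile(src, dst string) error {
    // handle declares what to do with an error produced by any later check;
    // it acts like a defer that runs only for checked errors.
    handle err {
        return fmt.Errorf("copy %s %s: %v", src, dst, err)
    }

    // check unwraps the error result and invokes the handler on failure,
    // much like Rust's ? operator.
    r := check os.Open(src)
    defer r.Close()

    w := check os.Create(dst)
    defer w.Close()

    check io.Copy(w, r)
    return nil
}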
Even more syntax, more “advanced” features like Generics. And the next thing we will see is another language popping up and “replacing” golang, because Go is now to new developers what all the other languages were to the ones that picked up golang. And the cycle repeats.
Go hasn’t really taken on any significant language change other than generics. Moreover, the Go team was always open about adding generics at some point, and what they did was essentially make good on their promise.
I wish people would stop crapping on Han unification if they can’t read Hanzi/Kanji. It is totally appropriate that those characters in the article are the same. If they were different, it would be like 4 joined up and 4 in two strokes being separated, or 7 with a slash in the middle being separated. They’re different ways to write the same thing, and have all been used in Japan within the last 100 years.
There were serious issues. Unicode eventually added dedicated code points undoing the worst cases of the unification.
It’s not just about readability, but a cultural issue. People can read a foreign way of writing a character, but it’s not how they write it.
China and Japan culturally care a lot about calligraphy. To them it’s not just a font. Number of strokes does matter, even the direction of strokes is significant. Japan even has a set of traditional variants of characters that are used only in people’s names.
As a Chinese person, my cultural feeling is that I would identify Kanji characters with Hanzi characters. Many characters do look the same and are written the same. Differences in stroke order or the direction of certain strokes feel more like “inessential” font differences. Visibly different forms between Kanji and Hanzi are similar to simplified vs traditional: more elaborate font differences, but still essentially the same characters.
One interesting angle is how Kanji characters are read in Chinese: they’re just read like native Chinese characters, completely ignoring the Japanese pronunciation and any shape differences. For example, the protagonist of Slam Dunk, 桜木花道 is always read as yīng mù huā dào in Mandarin Chinese, despite that (1) 桜 is written 樱 in Chinese (2) 花 is written slightly differently and (3) the Japanese pronunciation, Sakuragi Hanamichi, being kunyomi, bears no resemblance to yīng mù huā dào.
On a meta level, there’s no distinction between Hanzi and Kanji in Chinese: they’re both 汉字, pronounced hàn zì. I don’t know for sure whether Japanese people have this distinction, but it’s probably illuminating to see that the Japanese wiki page 漢字 encompasses the Chinese character system in all its regional variants.
Thanks for your input. There are 新字体 and 国字 (和製漢字), but as far as I know this distinction is only academic. Kunyomi/onyomi/etc. distinction is impossible to ignore, but that’s more related to words’ etymology than writing.
Visibly different forms between Kanji and Hanzi are similar to simplified vs traditional: more elaborate font differences, but still essentially the same characters.
Normally, I’m the first to say that the differences between simplified and traditional are overblown; however, I think it’s also eliding a bit to claim they’re essentially the same.
My mental model is that simplified originally was a surjective function. (That’s not true anymore.) But, while characters like 電/电 are onto and 只/隻 are grammatically awkward, characters like 复 can be downright misleading.
n.b. these differences matter less to Mandarin speakers, since simplified was made for it. (e.g. characters homophonous in Mandarin were merged) But the Japanese (and Korean, but that’s a different story) simplification projects came to different conclusions because they’re for different cultures and languages.
There were a few bugs and ghost characters in the process, which is to be expected when you’re digitizing tens of thousands of characters, but the basic idea of unification is sound. I had a friend who wrote her family name 櫻木 instead of the usual 桜木. Well, sure enough, that comes through on Lobsters because both variants are encoded. So too is the common 高 vs 髙 variation. The point is to be able to encode both variants where you would need them both in a single text without having to duplicate everything or worse (do we need variants for each of Japanese, Korean, and Vietnamese? for pre-War and post-War Japanese? for various levels of cursive? etc.). It was a success.
Calligraphy can’t be represented in text at all. For that you need a vector image format, not a text encoding.
You’re saying unification is sound, but you can spell your friend’s name correctly only because these characters weren’t unified.
do we need variants for each of Japanese, Korean, and Vietnamese? for pre-War and post-War Japanese
Yes! Unicode has Middle English, Old Church Slavonic with its one-off ꙮ, even Phoenician and Hieroglyphs. There’s old Kana in there already. East Asia should be able to encode their historical texts too.
UCS-2 was meant to be limited to contemporary characters due to its 16-bit limit, but Unicode changed course to include everything.
CJK having a font dependency is reminiscent of legacy code pages.
I’ve worked on a website developed in Hong Kong, and it did track locale and set lang and language-specific font stacks to distinguish between the zh-xx and jp variants. They do care.
I’ve mentioned calligraphy not in a technical sense, but a cultural one. The characters and their strokes are a valued tradition.
People may disagree about strokes of some complex characters, or there may be older and newer ways to draw a character, but that doesn’t mean the differences don’t matter.
I think the technical solution of mapping “characters” to code points to glyphs, applied to both alphabets and logograms suggests a perspective/commonality that isn’t the best way to view the issue.
You could also think of the character variants as differences in spelling. In English you have US and GB spellings, as well as historical spellings, and also some words with multiple spellings co-existing. These are the same mutually intelligible words, but if you did “English word unification”, it’d annoy some people.
I’m not familiar with this issue, but why not just add markers for similar characters to distinguish the cultural variant to be used where it is relevant?
Re Han unification - what do native speakers think? I assume there’s a diversity of opinion.
I’ve also thought for a while that “Greco” unification would be good - we would lose the attacks where words usually written in one script are written in the identical looking letter from another script.
Last I looked into the discussion about Han unification, I got the feeling that people in China (and maybe Japan) were annoyed that their input was not specifically requested before and during the discussion to proceed with Han unification. But I really don’t know enough about these scripts to have an opinion.
Regarding Greek letters, is this a common attack vector? What characters are most often used? From the Cyrillic set?
Painting “East Asian engineers” as a unitary body here is doing a lot of lifting.
The vast majority of pre-Unicode encoding schemes were both under unique resource constraints (ASCII compat? Fixed / variable length encoding? National language policy?) and designed for specific domains.
But to wit: Big5 unified some characters, HKSCS then deunified them because they weren’t the same in Cantonese.
Painting “East Asian engineers” as a unitary body here is doing a lot of lifting.
Apologies, that was not my intent. I meant that a lot of unification decisions have been made by engineers who are familiar with the issues, rather than, say, American engineers with little knowledge of Han characters.
I think Han unification was basically a good idea, but the problem is that it’s unclear where to draw the line. In fact, many Chinese characters that seem like they should be unified are separate in Unicode just because they are separate in JIS X 0208/0212/0213. Hooray for round-trip convertibility (sarcasm intended)!
From what I understand, Asian people get it much worse: many Chinese, Japanese, and Korean logograms that are written very differently get assigned the same code point
The logograms are not “very different”. Any educated person would see them as the same.
Complaints about the screen resolution are a matter of aesthetics, unless you work on visual digital media. In practice, a low resolution is often easier to use because it doesn’t require you to adjust the scaling, which often doesn’t work for all programs.
That said, the X220 screen is pathetically short. The 4:3 ThinkPads are much more ergonomic, and the keyboards are better than the **20 models (even if they look similar). Unfortunately the earlier CPU can be limiting due to resource waste on modern websites, but it’s workable.
The ergonomics of modern thin computers are worse still than the X220. A thin laptop has a shorter base to begin with, and the thinness requires the hinges to pull the base of the top down when it’s opened, lowering the screen further. The result is that the bottom of the screen is a good inch lower than on a thick ThinkPad, inducing that much more forward bending in the user’s upper spine.
The top of the screen of my 15” T601 frankenpad is 10” above my table and 9.75” above the keyboard. Be jealous.
Complaints about the screen resolution are a matter of aesthetics, unless you work on visual digital media.
A matter of aesthetics if the script your language uses has a small number of easily distinguished glyphs.
As someone who frequently reads Chinese characters on a screen, smaller fonts on pre-Retina screens strain my eyes. The more complex characters (as well as moderately complex ones in bold) are literally just blobs of black pixels and you have to guess from the general shape and context :)
Complaints about the screen resolution are a matter of aesthetics, unless you work on visual digital media.
I strongly disagree here. I don’t notice much difference with images, but the difference in text rendering is huge. Not needing sub-pixel AA (with its associated blurriness) to avoid jagged text is a huge win and improves readability.
Good for you. Your eyesight is much, much better than mine
I’d be pretty surprised by that, my eyesight is pretty terrible. That’s part of why high resolution monitors make such a difference. Blurry text from antialiasing is much harder for me to read and causes eye strain quite quickly. Even if I can’t see the pixels on lower resolution displays, I can’t focus as clearly on the outlines of characters and that makes reading harder.
As an X220 owner, while I concede someone may like the aesthetics of a low-resolution screen, the screen is quite bad in almost all other ways too. But you’re definitely right about aspect ratio. For terminal use a portrait 9:16 screen would be much better than 16:9. Of course external displays are better for ergonomics and nowadays large enough to work in landscape, too.
I took a look at CharaChorder; it’s also a chording keyboard, but it’s in a very different subcategory.
Twiddler is for one-hand use, has few keys (12), and you use chording to access the full range of characters.
CharaChorder is for two-hand use, has a normal number of keys (for CharaChorder One, each of the 4 directions on a key counts as a distinct key), and you use chording to type whole words faster.
I also found this review of CharaChorder on YouTube and it seems to have had some major bugs, although the manufacturer says that they’ve fixed most of those.
I love this device, and the device makes me sad. It also makes me sad to realize that there is just something wrong with me, because I love it anyway.
I knew from way back when that the Tek Gear folks would ship me a unit that was not glued together if I asked them to. Indeed, they did.
Glue and the silicone holding strap-case are the only things holding it together. Without the glue.. it creaks when I use it. The case separates easily into two pieces.
Inside, a big green PCB with little .. snappy momentary switches, of the normal through-hole design.. Hold on, time out. This needs to be a blog post with pictures.
Ugh, IOU a post, xiaq.. I’ll do it this weekend.
The little HAT switch (thumb stick) is profoundly useless.
Getting good muscle memory takes a long time; I wouldn’t say I ever got to that point. The experience is also much slower: the fastest I’ve ever seen someone claim to get with a twiddler is 30 wpm. Far better than nothing, but it takes some adaptation for it to feel comfortable to use a computer at that speed, if you’re used to 100+ wpm.
I also am still not sure if I’m holding the twiddler correctly. It takes a couple of tries to get the hand strap just right. In my specific case it was better to find a two handed keyboard that works for my specific needs.
I never managed to get used to my twiddler. I have a Twiddler 2, I believe, and found the keys to be very low in tactile feedback, and the shape didn’t fit my particular hand well, but others’ mileage may vary.
Twiddler 3 has a curved shape and comes with a wrap so I’m hoping it will be reasonably comfortable to hold.
I am concerned about the tactile feedback though. Twiddler 3 (from its look) seems to have soft rubber keys like a remote control, which is probably not great. It’s also hard to tell how much travel the keys have from watching videos.
I’ve been looking into keyboards that I can use on the go and came across what seems to be the most popular portable chording keyboard these days. This page is a somewhat more interesting starting place as it teaches you the layout.
Other terminal file managers include nnn and ranger. Their user interfaces are more complex, but they have more features. They, too, can be configured to cd on exit (nnn’s docs, ranger examples 1 and 2).
The author is confusing type definitions and type aliases.
The type Color int syntax in Go does not make Color an alias to int. It defines Color as a new type with int as the underlying type. The syntax for aliasing is type Color = int. (In Haskell terms, Go’s type Color int is roughly newtype Color = Color Int, and Go’s type Color = int is type Color = Int.)
This distinction is important because you can define methods on Color because it’s a new type; if it were an alias of int that wouldn’t be allowed. In fact, this is exactly the same mechanism as type Foo struct { ... }, which defines Foo as a new type with an anonymous struct as its underlying type.
The fact that you can define type Color string and then assign a string literal to a Color-typed variable is not because Color is an alias of string, it’s because string literals are untyped and can be assigned to any type whose underlying type is string (written as ~string in a constraint). You can’t do this:
var s string = "foo"
var c Color = s // compiler error
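To make the distinction concrete, here’s a small self-contained sketch (the Describe method and the Name alias are invented for illustration):

package main

import "fmt"

// Type definition: Color is a new, distinct type whose underlying type is string.
type Color string

// Methods may be declared on a defined type.
func (c Color) Describe() string { return "color " + string(c) }

// Type alias: Name is just another name for string; you can't declare methods on it.
type Name = string

func main() {
    var c Color = "red" // OK: untyped string constant
    var s string = "red"
    // c = s            // compile error: cannot use s (type string) as type Color
    c = Color(s) // OK: explicit conversion
    fmt.Println(c.Describe())

    var n Name = s // OK: alias, ordinary assignment, no conversion needed
    fmt.Println(n)
}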
To get back to the original point - I’ve always suspected that the omission of enums (and sum types to some extent) has to do with Go’s (original?) focus on developing networked services. In networked services you have to assume that every enum is open, and closed enums are generally an anti-pattern.
It’s a bit of an extreme position to force this on the whole language though, closed enums that don’t cross network boundaries are useful and convenient.
What I’ve always suspected is that like everything else in Go it was started from C. C’s enums are useless, so they removed it from Go, which I think was a good idea (better not have enums at all than have garbage ones). One spanner in that wheel is that Limbo had sum types via pick (I think they were pretty shit tho, at least in terms of syntax / UX).
I don’t believe closed enums to be an anti-pattern in networked services. They’re a great thing when generating as they ensure you can’t generate nonsense content, and on intake you validate them at the boundaries, just like you have to validate an “open enum” because the software does not handle every possible value of the underlying type, and you don’t want to end up in nonsense land.
I speculate the motivation is exactly that Go doesn’t want to get in the business of doing validations - if something physically fits in a uint32, it can be put in a type with uint32 as the underlying type with an explicit conversion. It’s up to the programmer to decide how to do the validation by writing code.
This makes a lot of sense in networked services. If a protocol defines a flag that is supposed to have one of 6 values, and the server now sees a 7th value it doesn’t know, failing isn’t always the correct thing to do. You may want to just fall back to some default behavior, log it, and so on.
Sure, you can do that if the language provides validation for you, but then the language also needs to provide facilities for handling failed validations. And that sounds like something the designers of Go didn’t want in the language. Go would rather give you low-level tools than leaky high-level ones - it’s the same mentality that has led to the error handling situation.
Exhaustive switches on enums are also a double-edged sword across API boundaries. Once you expose a closed enum as part of your API, there will be consumers who try to do exhaustive switches on them. This means that adding a new possible value is a breaking change, so all closed enums get frozen the moment you publish your API.
Again, there is still a niche for closed enums - when it’s internal to a package, or when it’s part of a protocol or API that will literally never change, and I feel Go designers probably underestimated the size of that niche. I’m just speculating on why they decided to not have it after all - it probably came from a mentality that focuses a lot on network and API boundaries.
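For what it’s worth, the “open enum” handling described above is straightforward to write by hand in today’s Go; a minimal sketch (type and values are hypothetical):

package wire

import "log"

// Flag is a hypothetical protocol field; the underlying uint32 carries
// whatever arrives off the wire, whether we recognize it or not.
type Flag uint32

const (
    FlagRead Flag = iota
    FlagWrite
)

// apply treats Flag as an open enum: unknown values are logged and handled
// with a fallback instead of failing outright.
func apply(f Flag) {
    switch f {
    case FlagRead:
        // known behavior
    case FlagWrite:
        // known behavior
    default:
        log.Printf("unknown flag %d, falling back to default behavior", f)
    }
}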
This makes a lot of sense in networked services. If a protocol defines a flag that is supposed to have one of 6 values, and the server now sees a 7th value it doesn’t know, failing isn’t always the correct thing to do. You may want to just fall back to some default behavior, log it, and so on.
Sure, you can do that if the language provides validation for you, but then the language also needs to provide facilities for handling failed validations.
What sort of weird-ass scenario did you cook up here? There isn’t any of that, or any need for the language to provide validation. You do whatever validation you want when you convert from the protocol to the internal types, exactly as you handle invalid structs for instance, or just out of range values for whatever types.
Go would rather give you low-level tools than leaky high-level ones
That is an assertion with no basis in reality.
it’s the same mentality that has led to the error handling situation.
Love for being awful?
Exhaustive switches on enums are also a double-edged sword across API boundaries. Once you expose a closed enum as part of your API, there will be consumers who try to do exhaustive switches on them. This means that adding a new possible value is a breaking change, so all closed enums get frozen the moment you publish your API.
Which is what you want 99 times out of 100: most of the time the set of possible values is fixed (in the same way nobody’s extending the integer), and most of the rest you want to be a breaking change in the same way switching from 1 byte to 2, or signed to unsigned, or unsigned to signed, is a breaking change.
In the few cases where you foresee new values being routinely added and that not being a breaking change, don’t use sum type. Or do that anyway if the language supports non-exhaustive sum types.
While you’re correct, the distinction doesn’t really matter to the complaint, even if the parameter is typed it’s just a conversion away:
var s string = "foo"
var c Color = Color(s) // now compiles
so the point remains that:
if the type is exported, any caller can subvert your carefully enumerated set of values
because there’s no actual safety, the language can’t rightly check for exhaustive switches, so you’re left with a default and an assertion on every switch
It is relevant because even though it doesn’t prevent deliberate conversions, it prevents accidental uses.
For better or for worse, Go programmers don’t tend to demand absolute guarantees from the language when it comes to the type system. This is very different from the community of other statically typed languages, probably because a lot of Go programmers have a background from dynamically typed languages like Python.
Speaking about language, it should be “Language Zoo”, not “Languages Zoo”. Because it is a compound word (not a compounds word). Sorry, but I see this too often.
If all the author needed was a blog, maybe the problem is that his tech stack is way too big for his needs? A bunch of generated HTML files behind an Nginx server would not have required this amount of maintenance work.
Is the caching of images at the edge really necessary? So what if it takes a little while to load them. Just by not having to load a front-end framework and make 10 API calls before anything is displayed, the site will already load faster than many popular sites.
If the whole point is to have fun and learn stuff, the busy work is the very point of course. Yet all this seems to be the very definition of non value added work.
I know that I could put this burden down. I have a mentor making excellent and sober use of Squarespace for his professional domain - and the results look great. I read his blog posts myself and think that they look good! It doesn’t have to be like this. […]
And that’s exactly why I do it. It’s one of the best projects I’ve ever created.
So I think the whole point is to have fun and learn stuff.
Inventing your own static site generator is also a lot of fun. And because all the hard work is done outside the serving path, there’s much less need for production maintenance.
I’ve been seriously considering dropping Markdown and just transforming HTML into HTML by defining custom tags. Or finally learning XSLT and using that, and exposing stuff like transforming LaTeX math into MathML via custom functions.
Node.js or package.json or Vue.js or Nuxt.js issues or Ubuntu C library issues
CVEs that force me to bump some obscure dependency past the last version that works in my current setup
Debugging and customizing pre-built CSS frameworks
All of these can be done away with.
I understand that the point may be to explore new tech with a purposefully over-engineered solution, but if the point is learning, surely the “lesson learned” should be that this kind of tech has real downsides, for the reasons the author points out and more. Dependencies, especially in the web ecosystem, are often expensive, much more so than you would think. Don’t use them unless you have to.
Static html and simple CSS are not just the preference of grumpy devs set in their ways. They really are easier to maintain.
There’s several schools of thought with regards to website optimization. One of them is that if images load quickly, you have a much lower bounce-rate (or people that run away screaming), meaning that you get more readers. Based on the stack the article describes, it does seem a little much, but he’s able to justify it. A lot of personal sites are really passion projects that won’t really work when scaled to normal production workloads, but that’s fine.
I kinda treat my website and its supporting infrastructure the same way, a lot of it is really there to help me explore the problem spaces involved. I chose to use Rust for my website, and that seems to have a lot less ecosystem churn/toil than the frontend ecosystem does. I only really have to fix things when bumping packages about once per quarter, and that’s usually about when I’m going to be improving the site anyways.
There is a happy medium to be found, but if they wanna do some dumb shit to see how things work in practice, more power to them.
A bunch of generated HTML files behind an Nginx server would not have required this amount of maintenance work.
Sometimes we need a tiny bit more flexibility than that. To this day I don’t know how to enable content negotiation with Nginx like I used to do with Apache. Say I have two files, my_article.fr.html, and my_article.en.html. I want to serve them under https://example.com/my_article, English by default, French if the user’s browser prefers it over English. How do I do that? Right now short of falling back to Apache I’m genuinely considering writing my own web server (though I don’t really want to, because of TLS).
This is the only complication I would like to address; it seems pretty basic (surely there are lots of multilingual web sites out there), and I would have guessed the original dev, not being American, would have thought of linguistic issues. Haven’t they, or did I miss something?
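For what it’s worth, the negotiation logic itself is only a few lines if you do end up writing your own server. A minimal Go sketch under the file layout described above (deliberately crude: it ignores q-values, only knows about en/fr, and doesn’t address the TLS concern):

package main

import (
    "log"
    "net/http"
    "strings"
)

// article serves my_article.en.html by default, or my_article.fr.html when
// the browser's Accept-Language header puts French ahead of English.
func article(w http.ResponseWriter, r *http.Request) {
    lang := "en"
    accept := strings.ToLower(r.Header.Get("Accept-Language"))
    fr := strings.Index(accept, "fr")
    en := strings.Index(accept, "en")
    if fr >= 0 && (en == -1 || fr < en) {
        lang = "fr"
    }
    http.ServeFile(w, r, "my_article."+lang+".html")
}

func main() {
    http.HandleFunc("/my_article", article)
    log.Fatal(http.ListenAndServe(":8080", nil))
}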
Automatic content negotiation sucks though? It’s fine as a default first run behavior, but as someone who lived in Japan and often used the school computers, you really, really need there to also be a button on the site to explicitly pick your language instead of just assuming that the browser already knows your preference. At that point, you can probably just put some JS on a static page and have it store the language preference in localStorage or something.
And generate a bit of HTML boilerplate to let the user access the one they want. And perhaps remember their last choice in a cookie. (I would like to avoid JavaScript as much as possible.)
If JS isn’t a deal breaker, you can make my_article a blank page that JS redirects to a language specific page. You can use <noscript> to have it reveal links to those pages for people with JS turned off.
Browsers have had multiple user profiles with different settings available, for more than a decade now (in the case of Firefox I distinctly remember there being a profile chooser box on startup in 2001–2).
Which is fine if you can actually make a profile to suit your needs. If you cannot make a profile, you are stuck with whatever settings the browser has, and you get gibberish in response as you might not understand the local language.
Look, the browser is a user agent. It’s supposed to work for the user and be adaptable to their needs. If there are that many restrictions on it, then you don’t have a viable user agent in the first place and there’s nothing that web standards can do about that.
I’ve read that, and I try to follow idioms as much as possible, but I’m having a really hard time with this one because I really can’t find any justification for it (i.e. what makes Go so special that decades old advice is suddenly irrelevant?). I can see it if “small scope” means 1 or 2 lines, but beyond this I’m having a really hard time keeping track of them.
But even if one accepts the idiom, how do you solve my issue when the scope is not small?
Ah, variable names. Length is not a virtue in a name; clarity of expression is. A global variable rarely used may deserve a long name, maxphysaddr say. An array index used on every line of a loop needn’t be named any more elaborately than i. Saying index or elementnumber is more to type (or calls upon your text editor) and obscures the details of the computation. When the variable names are huge, it’s harder to see what’s going on. This is partly a typographic issue; consider
This is such a dishonest argument, though, because it presents a false dilemma between very short names and “huge” names.
In Python I routinely write loops like:
for entry in BlogEntry.query():
# do thing with entry...
This is more readable than a single-letter name would be, and doesn’t fall into any “huge name” trap I’m aware of.
It’s not the 1970s anymore. Our computers have the disk and memory space to let us use names like “format” instead of “fmt”. And so we should, and should leave the 1970s conventions in the dustbin of history where they belong.
Your Python code is equally well expressed if entry is named e, as far as I can see. It is not obvious that entry “is more readable than a single-letter name” of e, at least without more context.
Preferring fmt over format is not a decision made from 1970s-era technical constraints. It’s possible to prefer the shorter form over the longer form, even if our servers have bajillions more available resource-bytes.
Because, as I said, it presents a false dilemma. If the only possible choices were single-character names, or “huge” unwieldy names, there might be a point. But there are in fact other options available which aid in readability by providing context without being “huge”.
It is not obvious that entry “is more readable than a single-letter name” of e, at least without more context.
Single-letter names rarely provide context. If a file contains multiple functions, each of which contain at least one loop, single-character names fail to differentiate which loop (or which function) one is looking at.
Also, I find it somewhat amusing that single-character loop variables are extremely likely to lose context or even collide during refactorings, yet as far as I’m aware the reason why Go stylistically discourages some other naming conventions (such as a “this” or “self” parameter name for functions which receive an instance of a struct defined in the same file) is that refactoring might cause loss of context.
But mostly, the single-character thing just feels to me like another instance of Go’s designers attempting to stand athwart the history of programming language design and practices, and yell “Stop!”
The normal guidance in Go is to name variables with expressivity (length) proportional to their lifetime, and the constraints of their type.
The name of a variable that’s only in scope in a single line block needs to provide far less context compared to the name of a variable that’s in scope for an entire e.g. 100 line function.
A variable of type SpecificParameter can be named sp because that meaning is unambiguous. A variable of type string that represents an e.g. request ID should probably not be called simply id, better e.g. reqID, especially if there are other string parameters that exist with similar lifecycles.
If a file contains multiple functions, each of which contain at least one loop, single-character names fail to differentiate which loop (or which function) one is looking at.
It’s not the job of a variable name to disambiguate at the file, or really even function, level. An index variable for a one-line for loop is probably appropriately named i no matter how many times this situation is repeated in a function or file.
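As a toy illustration of that guidance (the names here are invented):

package example

type file struct{ size int64 }

// The accumulator lives for the whole function, so it gets a descriptive
// name; f exists only inside the one-line loop body, so a short name is fine.
func totalSize(files []file) int64 {
    var totalSizeInBytes int64
    for _, f := range files {
        totalSizeInBytes += f.size
    }
    return totalSizeInBytes
}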
A variable of type SpecificParameter can be named sp because that meaning is unambiguous. A variable of type string that represents an e.g. request ID should probably not be called simply id, better e.g. reqID, especially if there are other string parameters that exist with similar lifecycles.
Or name them SpecificParameter and requestID. Pointlessly shortening variable names far too often leads to names that confuse rather than communicate. Is reqID a request ID? A requirement ID? A requisition ID? If I see that name out of context – say, in a debug log – how am I supposed to know which one it is?
And even in context it’s not always clear. Going back to the “self” param debate, as I understand it the standard Go approach if I want to write, say, a function that takes and operates on a Server struct, is to declare the parameter as s. Some obvious and more communicative names are ruled out by other aspects of Go’s naming rules/conventions, but how exactly is that informative? If I’m not directly in that file, and just see a call to the function, I’m going to need to either go look up the file or rely on IDE lookup showing me the type of the parameter to figure out what it’s supposed to be, since outside of the original context it might be s for Server, or might be s for Socket or for SSLContext or SortedHeaders or…
(and the loop-index case does still lose context, and also needing to name loop index variables is an indication of a language with poorly-thought-out iteration patterns)
But really it just comes down to a question of why? Why does requestID need to be shortened to reqID? Is the target machine going to run out of disk space or memory due to those four extra characters in the name? No. So there’s no concrete resource constraint requiring an abbreviated name. Is there any additional clarity provided by naming it only reqID? No, in fact clarity is lost by shortening it. What is the alleged gain that is so overwhelmingly important that it requires all variable names to be shortened like this? As far as I can tell, it really is just “that’s the way we did it with C in the 1970s”.
And it’s not even consistent – someone linked below a buffered I/O module that’s shortened to bufio, but the Reader and Writer types it works with are not shortened to Rdr and Wrtr. Why does it have a function Discard and not a function Dscrd? Why NewReaderSize and not NwRdrSz? There’s no principle here I can detect.
Once I’m familiar with the convention, it’s a lot easier for me to recognize the shape of reqID than it is to parse the text “requestID”.
Right now I’m working in a codebase where we have to write sending_node_component all over the place and it’s so miserable to both type and parse that the team unanimously agreed to just alias it as snc.
Why are you abbreviating ‘identifier’ throughout your post? What about SSL? Communication is full of jargon and abbreviations that make sense in context. They’re used to make communication smoother. Code does the same. Variable names are jargon.
But really it just comes down to a question of why?
Because to many people, it’s simply more readable.
So there’s an agreed-upon set of standard names which all programmers everywhere are trained on, aware of, and understand? Cool, so why does this thread – asking about how to name variables – exist in the first place?
They’re used to make communication smoother.
And I gave examples of how abbreviated variable names fail at this, where slightly-longer versions do not have the same issues. I note that you haven’t actually engaged with that.
Do you honestly believe that all of them are basically equally useful and informative, even when viewed out of context (say, in a log or error message when trying to debug)?
I never see variable names in log files, let alone without additional information.
Do you typically dump raw variable names into logs, with no additional context? Don’t you think that indicates a problem with the quality of your logs?
It seems pathological to focus on readability of variables in the context of poorly written log messages.
Log messages are one example of when a name might appear out of context.
Do you believe all of the above names are basically equally useful and informative, even when viewed out of context? Because if you work someplace where you never have to debug based on small amounts of context-missing information, perhaps you are lucky or perhaps you are THE ONE, but the rest of us are not. So I’d appreciate your input on the hypothetical missing-context situation.
I think the basic disagreement is whether the names need to make sense without context because I don’t think they have to. The context is the surrounding code.
The claim is that the verbosity of a variable name ought to be a function of the lifetime of that variable, and the ambiguity of its name versus other in-scope variables.
Variable names always exist in a context defined by source code.
If you have a for loop over SpecificParameter values which is 3 lines of code long, there is no purpose served by naming the variable in that loop body as specificParameter, in general. It’s laborious for no reason.
Variable names always exist in the context of the source code in which they are written. There is no realistic way that the literal name of a variable as defined in source code could end up in a debug log.
Single-letter names rarely provide context. If a file contains multiple functions, each of which contain at least one loop, single-character names fail to differentiate which loop (or which function) one is looking at.
I would say if the name of the function is not visible on screen at the same time, your function is probably too long. But what’s your point here? That local variables should have names that are unique in a source file, although they are not in global scope?
But mostly, the single-character thing just feels to me like another instance of Go’s designers attempting to stand athwart the history of programming language design and practices, and yell “Stop!”
This one I don’t understand. Newer languages do not have/allow longer variable names than older languages (for the most part). So I don’t see how the length of variable names has anything to do with the history of language design.
I would say if the name of the function is not visible on screen at the same time, your function is probably too long.
I’d argue there’s no such thing as a function that’s too long, only too complex. Do you think this function would be worthy of splitting, being over a hundred lines long?
In my opinion it’s not, as it’s just a linear list of tasks to be done in order to generate a chunk of bytecode for a function. It’s like a cake recipe. It’s a list of steps telling you how to mix and bake all the different ingredients to produce a cake.
Now obviously, some cake recipes are fairly complicated, and they can be split into sub-recipes - but that is a result of deep nesting being hard to keep track of, not length. In Unreal Engine for instance there are lots of functions which go incredibly deep in terms of nesting, and those could probably be factored out for readability, but I still wouldn’t be so sure about that, as moving out to smaller functions tends to hide complexity. It’s how accidental O(n²) happens.
Which is why I tend to factor code out to separate functions if I actually need to reuse it, or to make a “bookmark” showing a future extension point.
If it was for e in BlogEntry.entries(), I’d agree (and I’d say it should be renamed to this because query is too vague) because then you know you’re getting an entry back from that method. But a generic BlogEntry.query() doesn’t give you any hint as to what’s coming back - which is where for entry in ... is helpful because it signals “hey, I’m getting an entry object back from this”. Means you don’t have to hunt down BlogEntry.query and find out what it returns.
I’d argue if that’s not clear then it’s not e that is the problem, but .query that is. Also if you use some form of IDE the “hunting” down is usually okay. You anyways want to know whether it returns an error. And sometimes you want to know whether it’s some response type rather than the actual data or something. But in this example I think e would be as clear as entry - at least it is for the scenarios I can think of.
what makes Go so special that decades old advice is suddenly irrelevant?
Go is a reaction against earlier mainstream languages, in particular C++ and Java, and part of the reaction is to “reset” what are acceptable and what are not.
The most straightforward definition of a transition is “a process during which a system starts at state X and ends at state Y”. Forbidding the system from passing through any of a set of states can be interesting, but that’d be an additional requirement. (Maybe there’s some way to define a transition that incorporates this requirement as an inherent property, but I can’t think of it.)
Also, the state of the system includes the velocity and the acceleration of each node [1]. The transition from 2 to 6 goes through a state where the relative positions of the nodes are the same as (or very close to) those in state 5, but it does not go through state 5 itself because the system isn’t stationary at that moment.
[1] Otherwise you can just swing the pendulum randomly at different speeds; the arms will accidentally line up from time to time, and you can claim to have achieved the transitions.
As an Old Fart who’s been fascinated by type and typography since my teens, it’s been amazing to see the progress of computer text. My first computer didn’t even have lower case letters!
Coming from English, you might think ligatures are just fancy fluff. I mean, who really cares if “æ” is written as “ae”? Well, as it turns out, some languages are basically entirely ligatures. For instance “ड्ड بسم” has individual characters of “ड् ड ب س م”.
My favorite story about this (which I may have regaled you with before, being an Old Fart) is about a meeting between Apple and Sun about Java2D, circa 1998. The type expert from Apple is describing all the tables inside TrueType fonts for things like ligatures, and how supporting them is not optional. One of the Sun guys scoffs, “yeah, but how many people really care about fancy typography like ligatures?” The Apple person gives him a Look and says “about two billion people in India, the Middle East and Asia.” [Don’t take the number literally, it’s just my recollection.]
Every time I see beautifully rendered Arabic or Devanagari or Chinese or…. in my browser it still amazes me.
I’m too young to have actually experienced it first hand, but according to the Chinese textbooks and magazines I’ve read, in the 70s a lot of people genuinely thought the Han script wouldn’t survive because it was “incompatible” with computers.
To clarify, Chinese typography is relatively straightforward - Chinese characters are fixed width, and there are no ligatures. I believe Japanese is in a similar position: hiragana and katakana are usually the same width as kanji. There are of course a lot of corner cases and advanced techniques, but putting characters on a grid probably already gets you 80% of the way there.
The challenging part back then was how to store the font, which consists of at least several thousand everyday characters, each of which needs to be at least 32px x 32px to be legible.
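To put rough, back-of-the-envelope numbers on it (my own arithmetic, not from those textbooks): a 32x32 monochrome bitmap is 1,024 bits, or 128 bytes per character, so even a modest set of around 7,000 everyday characters comes to roughly 900 KB of font data for a single size and weight - an enormous amount for the machines of that era.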
To be pedantic, Hangul is ligatures, but Korean has other writing systems too, for example Hanja, which are descended from Chinese characters (hence why the Han character space is referred to as CJK, for Chinese/Japanese/Korean), which actually gets into a fun gotcha about text rendering that this article didn’t cover: unified Han characters in the CJK space are rendered dependent on the font, which means that:
They will be rendered as their Simplified/Traditional Chinese/Hanja/Kanji representations depending on what font is used to render them, meaning external hinting needs to be used if these languages are mixed in one place
Hanja especially, but also Kanji, are used situationally and a large number are not commonly used, hence these characters may not be present in common Japanese/Korean fonts. In those cases, the cascade rules apply and a Chinese equivalent may be shown, potentially in an entirely different style.
They will be rendered as their Simplified/Traditional/Hanja/Kanji representations
Simplified and traditional characters are usually encoded separately because they are considered to be “different characters”. Only “the same character” with “regional stylistic differences” are unified. There is a lot of nuance, corner cases and mistakes.
Still inaccurate though, because “Chinese” is not one regional variant :)
Let’s say that the same character is shared by simplified Chinese, traditional Chinese, Korean Hanja and Japanese Kanji. There’s usually only one stylistic variant in simplified Chinese [1], but up to 3 different stylistic variants in traditional Chinese: mainland China [2], Hong Kong / Macau, and Taiwan.
[1] Because it’s standardized by mainland China’s government, and is followed by Singaporean Chinese and Malaysian Chinese
[2] Traditional Chinese is still used in mainland China, mostly for academic publications that need to cite a lot of old text
I don’t remember hearing about Chinese computers in the old days — China’s tech wasn’t as advanced and there wasn’t as much trade. Old Japanese computers used only Hiragana and/or Katakana, which were fortunately limited enough in number to work in 8-bit character sets. In the mid 80s I used some HP workstations that had Katakana in the upper half of the encoding after ASCII; I remember those characters were useful in games to represent monsters or other squiggly things :)
Computers were pretty late to the party, the problems with Chinese text started with movable type. Until a few hundred years ago it was (apparently, from reading about the history of typography a long time ago) fairly common for neologisms in Chinese to introduce new ideographs. Often these would be tweaks of old ones (or compositions of parts of existing ones) to convey meaning. Movable type meant that you needed to get a new stamp made for every new character. Computers made this even worse because you needed to assign a new code point in your character set and get fonts to add glyphs. This combination effectively froze a previously dynamic character set and meant that now neologisms have to be made of multiple glyphs, allowing the script to combine the disadvantages of an ideographic writing system with the disadvantages of a phonographic one.
This combination effectively froze a previously dynamic character set and meant that now neologisms have to be made of multiple glyphs, allowing the script to combine the disadvantages of an ideographic writing system with the disadvantages of a phonographic one.
Well, this is a bit of an exaggeration.
Modern Chinese words are already predominately two or more characters, and neologisms are typically formed by combining existing characters rather than inventing new characters. Modern Chinese is full of homophones, and if you invent a new character today and say it out aloud it will almost certainly be mistaken for another existing character. (In case it isn’t clear, unlike Japanese, which has a kunyomi system, characters in Chinese have monosyllabic pronunciations as a rule; there are a few exceptions that are not very important.)
Most of the single-character neologisms are academic, where characters are written and read much more frequently than spoken and heard. All the chemical elements have single-character names, and a bunch of chemical and physical terms got their own new characters in the early 20th century. The former still happens today when new elements are named, but the latter process has already slowed down greatly before the advent of computers.
There are some non-academic, single-character neologisms that are actually spoken out. I don’t have time to fully delve into that here, but one interesting thing is that there are a lot of rarely used Chinese characters encoded in Unicode, and reusing them is a very common alternative to inventing brand new characters.
In the future, people will be amazed at how primitive our merely-32-bit Unicode was. By then every character will be a URI, allowing new glyphs and emoji to be created at whim. (This development will be closely followed by the first plain-text malware.)
As a lisper, I would disagree with the basic premise that “shells” or programming environments cannot have a good REPL and be a good programming language.
The solution is that we have one tool, but there are two things, and so there should be two tools.
At risk of being obvious; “Things are the way they are for reasons.” These two tools will exist on day 1 and on day 2 somebody will make sure both of them are scriptable, because being scriptable is the killer feature of shells. It’s what makes shells useful. Even DOS had a scriptable shell, it’s that great a feature.
As a lisper, I would disagree with the basic premise that “shells” or programming environments cannot have a good REPL and be a good programming language.
The original article says that (good?) programming languages require “readable and maintainable syntax, static types, modules, visibility, declarations, explicit configuration rather than implicit conventions.”
As a non-lisper, I’ve never found Lisp syntax readable or maintainable. The books I’ve read, and the Lispers I’ve known, all swore that it would all click for me at some point, but nope. I get lost in all the parentheses and identical looking code.
As a random example, here’s a function I found online that merges two lists:
(defun myappend (L1 L2)
(cond
; error checking
((not (listp L1)) (format t "cannot append, first argument is not a list~%" L1))
((not (listp L2)) (format t "cannot append, second argument is not a list~%" L2))
; base cases
((null L1) L2)
((null L2) L1)
; general case, neither list is empty
(t (cons (car L1) (myappend (cdr L1) L2)))))
I would put out my eyes if I had to read code like that for more than a day. The last line with its crystal clear ))))) is the kicker. I know that this may be (largely? entirely?) subjective, but I think it’s a relatively common point of view.
My problem with that code isn’t the parens per se. It’s that you as the reader have to already know or be able to infer what is evaluated and what is not. defun is a primitive function and it’s really being called, but myappend and L1 and L2 are just symbols being defined. cond is a function and is really called, but the arguments to cond are not evaluated. However, cond does execute the first item of each list it is passed and evaluate that to figure out which branch is true. Presumably cond is lazy and only evaluates until it reaches its first true condition. format I guess is a function, but I have no idea what t is or where it comes from. (Maybe it’s just true?) null is a primitive symbol, but in this case, maybe it’s not a value, it’s a function that tests for equality? Or maybe cond handles elements with two items differently than elements with one item? cons, car, and cdr are functions and they are really evaluated when their case is true…
Anyhow, you can work it out with some guesswork and domain knowledge, but having syntax that distinguishes these things would be much more clear:
defun :myappend [:L1 :L2] {
cond [
[{not (listp L1)} {format t "cannot append, first argument is not a list~%" L1}]
[{isNull L1} {L2}]
[{isNull L2} {L1}]
[{true} {cons (car L1) (myappend (cdr L1) L2)}]
]
}
I don’t love the parentheses, but they’re not my biggest stumbling block with Lisp either. I find it hard to read Lisp because everything seems the same—flat somehow. Maybe that’s because there’s so little visual, explicit syntax. In that case, (maybe?) we’re back to parentheses. I’m not sure.
It’s because you’re reading the abstract syntax tree almost literally. In most languages, syntactic patterns create structure, emphasis, guidance. What that structure costs is that it can be a wall in your way when you’d prefer to go against the grain. In Lisp, there is almost no grain or limitation, so you have surprising power. In exchange, structure is your job.
Thanks: you’ve described the flatness (and its cause) much better than I could have. (I also appreciate your adding a benefit that you get from the lack of given or required structure.)
def myappend(l1, l2):
    if not isinstance(l1, list):
        pass
    elif …
nothing surprising with “myappend”, “l1” and “l2” and no need for a different syntax^^
cond is a macro. Rightly, it’s better to know that, but we can infer it given the syntax. So, indeed, its arguments are not evaluated. They are processed in order as you described.
(format t "~a" arg) (t is for true indeed) prints arg to standard output. (format nil …) creates a string.
So, I don’t know that Lisp would necessarily click for any given person, but I did find that Janet clicked things in my brain in a way few languages have since.
I built a lot of fluency with it pretty quick, in part because it was rather small, but had most of what I wanted in a scripting language.
That doesn’t apply to Common Lisp, though, which is both large and with more than a few archaic practices.
That being said, a lot of folks bounce off, and a lot of the things that used to be near exclusive to Lisp can be found elsewhere. For me, the simplicity of a smaller, postmodern parens language is the appeal at this point (I happen to be making my own, heh)
subjective. BTW Lisp ticks the other boxes. Personal focus on “maintainable”.
To merge two lists: use append.
Here’s the code formatted with more indentation (the right way):
(defun myappend (L1 L2)
(cond
;; error checking
((not (listp L1))
(format t "cannot append, first argument is not a list~%" L1))
((not (listp L2))
(format t "cannot append, second argument is not a list~%" L2))
;; base cases
((null L1)
L2)
((null L2)
L1)
;; general case, neither list is empty
(t
(cons (car L1) (myappend (cdr L1) L2)))))
Did you learn the cond macro? Minimal knowledge is required to read Lisp as with any other language.
Sure, but what I posted is (pretty obviously) teaching code. The function is called myappend because (presumably) the teacher has told students, “Here is how we might write append if it didn’t exist.”
Here’s the code formatted with more indentation (the right way)
Thank you, but that did nothing to make the code more readable to me. (See below on “to me.”)
subjective…Did you learn the cond macro? Minimal knowledge is required to read Lisp as with any other language.
I’m not sure, but you seem to be using “subjective” as a way of saying “wrong.” Or, to put this in another way, you seem to want to correct my (and Carl’s) subjective views. As a general rule, I don’t recommend that, but it’s your choice.
I’m glad you enjoy Lisp and find it productive. But you may have missed my point. My point was that I—and many people—do not find Lisp readable or maintainable. (It’s hard to maintain what you find unpleasant and difficult to read, after all.) I wasn’t saying you or anyone else should change your subjective view; I was just stating mine. To quote jyx, “Yes, feeling is subjective and is fine.” I didn’t mean to pick on something you love. I was questioning how widely Lisp can play the role of great REPL plus readable, maintainable programming language.
hey, right, I now find my wording a bit rigid. My comment was more for other readers. We read a lot of Lisp FUD, so sometimes I try to show that the Lisp world is… normal, once you know a few rules (rules which some busy people expect to already know when they know a C-like language).
To merge two lists: use append.
rewording: “dear newcomer, be aware that Lisp also has a built-in for this”. I really mean to say it, because too often we see weird, teaching code, that does basic things. Before, these examples always bugged me. Now I give hints to my past self.
My point was that I—and many people—do not find Lisp readable
OK, no pb! However I want to encourage newcomers to learn and practice a little before judging or dismissing the language. For me too it was weird to see lisp code at the beginning. But with a little practice the syntax goes away. It’s only syntax, there is so much more to judge a language. I wish we talked less about parens, but this holds for any other language when we stop at the superficial syntax.
or maintainable
but truly, despite one’s tastes, Lisp is maintainable! The language and the ecosystem are stable, some language features and tooling explicitly help, etc.
I think saying that every paradigm is about state is taking things a bit too far. Sure state management is a cross-cutting concern that can interact with a lot of things, but that doesn’t mean these things are defined purely in their interaction with states.
OO and FP are indeed mostly about states, but OO is also about inheritance, which in turn can be about computation rather than state. FP is also about higher-order functions, which again show up in computations more frequently than in state management.
Declarative vs imperative can also be about computation.
Services can also be about computations. In fact, in a lot of microservice-based architectures, you will have a large number of almost stateless services [1] that talk to each other, with one or a few databases at the very bottom that manages all the states. In other words microservices are more often about distributing computation rather than distributing states.
[1] Cache is a notable exception, but cache is arguably a performance optimization and does not change the stateless nature
but OO is also about inheritance, which in turn can be about computation rather than state
Alan Kay, who coined the term, would disagree. More modern OO languages are increasingly moving away from the idea that subtyping should be coupled to implementation inheritance, because it turned out to be a terrible idea in practice. Even in OO languages that support it, the mantra of ‘prefer composition over inheritance’ is common.
I think more important than Kay, is what the inventors of object-orientation would say - Nygaard and Dahl. They have sadly passed, but I imagine what they would say (particularly Nygaard) is that there is a huge problem space to which some aspects of a language might be applied, and others not. To lump everything together as the same for all is problematic. Not all software is the same. For example, implementation inheritance has been very successful for a type of problem - the production of development and runtime frameworks that provide default behavior that can be overridden or modified.
Alan Kay doesn’t get to decide what words mean for everybody else, especially when he’s changed his mind in the 40+ years since he originally coined the term.
Moreover I feel the description of OO vs FP and declarative vs imperative seem to imply that the amount of states to manage is given, and the different paradigms are just different ways to manage them. While it’s true that every system has inherent states, there can also be a lot of accidental states and different paradigms can have very different amount of accidental states.
there can also be a lot of accidental states and different paradigms can have very different amount of accidental states
Completely agree! This was really just a short article showing a different way you can view programming philosophies, and maybe evaluate them based on the trade offs they make about state management. I didn’t really dive into how certain approaches may introduce additional state, and whether that tradeoff is worth it, but it’s an important thing to consider when designing systems.
It’s more of a conceptual model that can be applied to the various programming philosophies, and shows how they differ but also how they are similar. It’s focusing on the trade offs which is important, rather than a “one true way.”
I’m slightly confused by what you mean by “computation.” Inheritance and higher-order functions both are ways of structuring code with the goal to get it more correct.
I did leave out the topic of performance from this article, and I think that is a very important axis in which the philosophies have varying takes as well. It felt like it would have made the article too unwieldy if I tried to tackle that as well.
Storing your Dropbox folder on an external drive is no longer supported by macOS.
As someone who has used Windows, macOS, Linux, and FreeBSD extensively as professional desktop OSs I still don’t understand the love so many hackers have for Apple kit.
Because not everyone needs the same features as you. I like that MacOS behaves close enough to a Linux shell, but with a quality GUI. I particularly like emacs bindings in all GUI text fields, and the ctrl/cmd key separation that makes terminals so much nicer to use. I like the out-of-the-box working drivers, without having to consult Wikis about which brands have working Linux drivers. I like the hardware, which is best in class by all metrics that matter to me, especially with Apple Silicon. I like the iPhone integration, because I use my iPhone a lot. I like AppleScript, and never bothered to learn AutoHotKey. I like that my MacBook wakes up from sleep before I can even see the display. I like the massive trackpad, which gives me plenty of space to move around the mouse. I like Apple Music and Apple Photos and Apple TV, which work flawlessly, and stream to my sound system running shairport-sync. I like Dash for docs, which has an okay-ish Linux port but definitely not the first class experience you get on MacOS. I like working gestures and consistent hotkeys, tightly controlled by Apple’s app design guidelines. I like that I can configure caps lock -> escape in the native keyboard settings, without remembering the X command or figuring out Wayland or installing some Windows thing that deeply penetrates my kernel.
I use Linux for servers. I have a Windows gaming PC that hangs on restart or shut down indefinitely until you cut power, and I don’t care enough to fix it because it technically still functions as a gaming PC. But for everything else I use MacOS, and I flat out refuse to do anything else.
As someone who ran various Linux distros as my main desktop OS for many years, I understand exactly why so many developers use Apple products: the quality of life improvement is staggeringly huge.
And to be honest, the longer I work as a programmer the more I find myself not caring about this stuff. Apple has demonstrated, in my opinion, pretty good judgment for what really matters and what’s an ignorable edge case, and for walking back when they make a mistake (like fixing the MBP keyboards and bringing back some of the removed ports).
You can still boot Linux or FreeBSD or whatever you want and spend your life customizing everything down to the tiniest detail. I don’t want to do that anymore, and Apple is a vendor which caters to my use case.
I am a macOS desktop user and I like this change. Sure, it comes with more limitations, but I think it is a large improvement over having companies like Dropbox and Microsoft (Onedrive) running code in kernel-land to support on-demand access.
That said, I use Maestral; the official Dropbox client has become too bloated, shipping a browser engine and so on.
I’m not a Dropbox user so I’ve never bothered to analyse how their kext worked. But based on my own development experience in this area, I assume it probably used the kauth kernel API (or perhaps the never-officially-public-in-the-first-place MAC framework API) to hook file accesses before they happened, download file contents in the background, then allow the file operation to proceed. I expect OneDrive and Dropbox got special permission to use those APIs for longer than the rest of us.
As I understand it, Apple’s issue with such APIs is twofold:
1. Flaws in kernel code generally tend to lead to higher-severity security vulnerabilities, so they don’t want 3rd party developers introducing them. (They keep adding plenty of their own kernel space drivers though, presumably because of the limitations of the user space APIs they’ve provided. And because Apple’s own developers can of course be trusted to write flawless kernel code.)
2. (Ab)uses of kernel APIs like Kauth for round-tripping kernel hooks to user space lead to priority inversion, which in turn can lead to poor performance or hangs.
These aren’t unreasonable concerns, although the fact they’re still writing large amounts of kernel code (read: vulnerabilities and panic bugs) themselves somewhat weakens their argument.
So far, they’ve been deprecating (and shortly after, hard-disabling) kernel APIs and replacing them with user-space based APIs which only implement a small subset of what’s possible with the kernel API. To an extent, that’s to be expected. Unrestricted kernel code is always going to be more powerful than a user space API. However, one gets the impression the kernel API deprecations happen at a faster pace than the user space replacements have time to mature for.
In this specific case, NSFileProvider has a long and chequered history. Kauth was one of the very first kernel APIs Apple deprecated, back on macOS 10.15. It became entirely unavailable for us plebs on macOS 11, the very next major release. Kauth was never designed to be a virtual file system API, but rather an authorisation API: kexts could determine if a process should be allowed to perform certain actions, mainly file operations. This happened in the form of callback functions into the kext, in the kernel context of the thread of the user process performing the operation.
Unfortunately it wasn’t very good at being an authorisation system, as it was (a) not very granular and (b) leaving a few gaping holes because certain accesses simply didn’t trigger a kauth callback. (Many years ago, around the 10.7-10.9 days, I was hired to work on some security software that transparently spawned sandboxed micro VMs for opening potentially-suspect files, and denied access to such files to regular host processes; for this, we actually tried to use kauth for its intended purpose, but it just wasn’t a very well thought-out API. I don’t think any of Apple’s own software uses it, which really is all you need to know - all of that, sandboxing, AMFI (code signing entitlements), file quarantine, etc. uses the MAC framework, which we eventually ended up using too, although the Mac version of the product was eventually discontinued.)
Kauth also isn’t a good virtual file system API (lazily providing file content on access atop the regular file system) but it was the only public API that could be (ab)used for this purpose. So as long as the callback into the kext didn’t return, the user process did not make progress. During this time, the kext (or more commonly a helper process in user space) could do other things, such as filling the placeholder file with its true content, thus implementing a virtual file system. The vfs kernel API on the other hand, at least its publicly exported subset, is only suitable for implementing pure “classic” file systems atop block devices or network-like mounts. NSFileProvider was around for a few years on iOS before macOS and used for the Usual File Cloud Suspects. Reports of problems with Google Drive or MS OneDrive on iOS continue to this day. With the 10.15 beta SDK, at the same time as deprecating kauth, everyone was supposed to switch over to EndpointSecurity or NSFileProvider on macOS too. NSFileProvider dropped out of the public release of macOS 10.15 because it was so shoddy, though. Apple still went ahead and disabled kauth-based kexts on macOS 11 anyway. (EndpointSecurity was also not exactly a smooth transition: you have to ask Apple for special code signing entitlements to use the framework, and they basically ignored a load of developers who did apply for them. Some persevered and eventually got access to the entitlement after more than a year. I assume many just didn’t bother. I assume this is Apple’s idea of driving innovation on their platforms.)
Anyway, NSFileProvider did eventually ship on macOS too (in a slightly different form than during the 10.15 betas) but it works very differently than kauth did. It is an approximation of an actual virtual file system API. Because it originally came from iOS, where the UNIXy file system is not user-visible, it doesn’t really match the way power users use the file system on macOS: all of its “mount points” are squirrelled away somewhere in a hidden directory. At least back on the 10.15 betas it had massive performance problems. (Around the 10.14 timeframe I was hired to help out with a Mac port of VFSforGit, which originally, and successfully, used kauth. With that API being deprecated, we investigated using NSFileProvider, but aside from the mount point location issue, it couldn’t get anywhere near the performance the kauth approach delivered for VFSforGit’s intended purpose: lazily cloning git repos with hundreds of thousands of files. The Mac port of VFSforGit was subsequently cancelled, as there was no reasonable forward-looking API with which to implement it.)
So to come back to your point: these limitations aren’t in any way a technical necessity. Apple’s culture of how they build their platforms has become a very two-tier affair: Apple’s internal developers get the shiny high performance, powerful APIs. 3rd party developers get access to some afterthought bolt-on chicken feed that’s not been dogfooded and that you’re somehow supposed to plan and implement a product around during a 3-4 month beta phase, the first 1-2 months of which are the only window in which you stand any sort of slim chance of getting huge problems with these APIs fixed. Even tech behemoths like Microsoft don’t seem to be able to influence public APIs much via Apple’s Developer Relations.
At least on the file system front, an improvement might be on the horizon. As of macOS 13, Apple has implemented some file systems (FAT, ExFAT and NTFS I think) in user space, via a new user space file system mechanism. That mechanism is not a public API at this time. Perhaps it one day will be. If it does, the questions will of course be whether:
1. 3rd party developers actually get to use it without jumping through opaque hoops.
2. The public API actually is what Apple’s own file system implementations get to use, or whether it’s once again a poor quality knock-off.
3. It can actually get close to competing in terms of features and performance compared to in-kernel vfs.
I’ve not looked for comparative benchmarks (or performed any of my own) on how the new user space file systems compare to their previous kernel-based implementations, or the (more or less) highly-optimised first-tier file systems APFS and HFS+. (FAT and ExFAT are hardly first-tier in macOS, and NTFS support is even read-only.)
(The vfs subsystem could be used for implementing a virtual file system if you had access to some private APIs - indeed, macOS contains a union file system which is used in the recovery environment/OS installer - so there’s no reason Apple couldn’t export features for implementing a virtual file system to user space, even if they don’t do so in the current kernel vfs API.)
Part of that may also be the hardware, not the software.
My datapoint: I’ve never really liked macOS, and tried to upgrade away from a MacBook to a “PC” laptop (to run KDE on Linux) two years ago. But after some research, I concluded that - I still can’t believe I’m saying this - the M1 MacBook Air had the best value for money. All “PC” laptops at the same price are inferior in terms of both performance and battery life, and usually build quality too (but that’s somewhat subjective).
I believe the hardware situation is largely the same today, and will remain the same before “PC” laptops are able to move to ARM.
macOS itself is… tolerable. It has exactly one clear advantage over Linux desktop environments, which is that it has working fonts and HiDPI everywhere - you may think these are just niceties, but they are quite important for me as a Chinese speaker as Chinese on a pre-hiDPI screen is either ugly or entirely unreadable. My pet peeve is that the dock doesn’t allow you to easily switch between windows [1] but I fixed that with Contexts. There are more solutions today since 2 years ago.
[1] macOS’s Dock only switches between apps, so if you have multiple windows of the same app you have to click multiple times. It also shows all docked apps, so you have to carefully find open apps among them. I know there’s Expose, but dancing with the Trackpad to just switch to a window gets old really fast.
[macOS] has exactly one clear advantage over Linux desktop environments, which is that it has working fonts and HiDPI everywhere
Exactly one? I count a bunch, including (but not limited to) better power management, better support for external displays, better support for Bluetooth accessories, better and more user-friendly wifi/network setup… is your experience with Linux better in these areas?
My pet peeve is that the dock doesn’t allow you to easily switch between windows
Command+~ switches between windows of the active application.
This is one area that I’d concede macOS is better for most people, but not for me. I’m familiar enough with how to configure power management on Linux, and it offers many more options (sometimes depending on driver support). Mac does have good power management out of the box, but it requires third party tools to do what I consider basic functions like limiting the maximum charge.
The M1 MBA I have now has superior battery life but that comes from the hardware.
better support for external displays
I’ve not had issues with support for external monitors using KDE on my work laptop.
The MBA supports exactly one external display, and Apple removed font anti-aliasing so I have to live with super big fonts on external displays. I know the Apple solution is to buy a more expensive laptop and a more expensive monitor, so it’s my problem.
better support for Bluetooth accessories
Bluetooth seems to suck the same everywhere; I haven’t noticed any difference on the Mac - ages to connect, random dropping of inputs. Maybe it works better with Apple’s accessories, which I don’t have any of, so it’s probably also my problem.
better and more user-friendly wifi/network setup
KDE’s wifi and network management is as intuitive. GNOME’s NetworkManager GUI used to suck, but even that has got better these days.
Command+~ switches between windows of the active application.
I know, but
Having to think about whether I’m switching to a different app or in the same app is really not how my brain works.
It’s still tedious when you have 3 or more windows that look similar (think terminals) and you have to pause after switching to ensure you’re in the correct window.
I really just want to use my mouse (or trackpad) to do this kind of GUI task.
I’ve used a Mac for 6 years as my personal laptop and have been using Linux on my work laptop.
Back then I would agree that macOS (still OS X then) was much nicer than any DE on Linux. But Linux DEs have caught up (I’ve mainly used KDE but even GNOME is decent today), while to an end user like me, all macOS seems to have done is (1) look more like iOS (nice in some cases, terrible in others) and (2) get really buggy every few releases and return to an acceptable level over the next few versions. I only chose to stay on a Mac because of the hardware; their OS has lost its appeal to me except for font rendering and HiDPI.
A Bluetooth headset can be connected in two modes (or more): one is A2DP, with high-quality stereo audio but no microphone channel, and the other is headset mode, which has low-quality audio but a microphone channel. On macOS the mode will be switched automatically whenever I join or leave a meeting; on Linux this was always a manual task that most of the time didn’t even work, e.g. because the headset was stuck in one of the modes. I can’t remember having a single issue with a bluetooth headset on macOS, but I can remember many hours of debugging pulseaudio or pipewire just to get some sound over bluetooth.
My pet peeve is that the dock doesn’t allow you to easily switch between windows
It sounds like you found a third-party app you like, but for anyone else who’s annoyed by this, you may find this keyboard shortcut helpful: when you’re in the Command-Tab app switcher, you can type Command-Down Arrow to see the individual windows of the selected app. Then you can use the Left and Right Arrow keys to select a window, and press Return to switch to that window.
This is a little fiddly mechanically, so here’s a more detailed explanation:
1. Press the Command key and keep it held down for the rest of this process.
2. Type Tab to enter the app switcher.
3. Use the Tab and Backtick keys to select an app. (Tab moves the selection to the right; Backtick moves it to the left.)
4. Press the Down Arrow key to enter “app windows” mode.
5. Press any of the arrow keys to activate the focus ring. I’m not really sure why this step is necessary.
6. Use Left and Right Arrow keys to select the window you’re interested in.
7. Press Return to open the window.
8. Now you can release the Command key. (You can actually do this any time after step 4.)
(On my U.S. keyboard, the Backtick key is directly above Tab. I’m not sure how and whether these shortcuts are different on different keyboard layouts.)
This seems ridiculous when I write it all out like this, but once you get it in your muscle memory it’s pretty quick, and it definitely feels faster than moving your hand to your mouse or trackpad to switch windows. (Who knows whether it’s actually faster.)
Thanks, I’ve tried this before but my issue is that this process involves a lot of hand-eye loop (look at something, decide what to do, do it). On the other hand, if I have a list of open windows, there is exactly one loop - find my window, move the mouse and click.
I hope people whose brain doesn’t work like mine find this useful though :)
Thanks for posting this! I have an old macbook that I never really used OSX on because I hated the window management. I gave it another serious try after seeing your post and I’m finding it much easier this time around.
I ended up using https://alt-tab-macos.netlify.app over your alt tab, but I am using Hammerspoon for other stuff. In particular hs.application.launchOrFocus is pretty much the win+1 etc hotkeys on Windows.
Once you factor in the longevity of mac laptops vs pcs, the value proposition becomes even more striking. I think this is particularly true at the pro level.
I use both. But to be honest, on the Linux side I use KDE plasma and disable everything and use a thin taskbar at the top and drop all the other stuff out of it and use mostly the same tools I use on macOS (neovim, IntelliJ, Firefox, etc…).
Which is two extremes. I’m willing to use Linux so stripped down in terms of GUI that I don’t have to deal with most GUIs at all other than ones that are consistent because they’re not using the OS GUI framework or macOS.
There’s no in-between. I don’t like the Ubuntu desktop, or GNOME, or any of the other systems. On macOS I am happy to use the GUIs: they’re consistent and, for the most part, just work. And I’ve been using Linux since they started mailing it out on discs.
I can’t tell you exactly why I’m happy to use macOS GUIs but not Linux based GUIs, but there is something clearly not right (specifically for me to be clear; everyone’s different) that causes me to tend to shun Linux GUIs altogether.
If I cared about hacking around with the OS (at any level up to the desktop) or the hardware, I wouldn’t do it on Apple kit, but I also wouldn’t do it on what I use every day to enable me to get stuff done, so I’d still have the Apple kit for that.
What would you, @FrostKiwi, suggest as an alternative to the Han Unification? Let’s imagine, for a moment, that Unicode doesn’t need to be backwards compliant.
I’m not OP but I’ve been driven to distraction by the same issue. It’s frustrating that Unicode has 「A」「А」「Α」「𝝖」「𝞐」 but they forced unification of far more visually distinct CJK characters.
IMO the Unicode consortium ought to introduce unambiguous variants of all CJK hanzi/kanji that got over-unified, and deprecate the ambiguously unified codepoints. Clients can continue to guess when they need to render an ambiguous codepoint (no better option), and new inputs wouldn’t have the same problem.
Aside: My solution for Anki was to add a Japanese font (EPSON-KYOKASHO) to the media catalog and reference it in CSS. This works cross-platform, so I get consistent rendering on Android, macOS, and Ubuntu. The relevant card style is:
I disagree. You can go into any used bookstore in Japan, and pull a book off the shelf with “Chinese” style pre-war kanji in it. It would be silly to say that those books should be encoded with completely different codepoints than post-war books. Yes, there were a few overly aggressive unifications in the first round, but those have all been disambiguated by now, and what’s left really are just font difference. If you were to approve splitting out all those font differences into new characters, then you’re on the road to separating out the various levels of cursive handwriting, which multiplies the number of characters by at least 4x, maybe more. The point of Unicode is not to preserve a single way that characters look. For that, we have PDF, SVG, etc. It’s just trying to preserve the semantics and handle pre-existing encodings. It’s a blurry line, for sure, but I think they did as well as they could have under the circumstances.
Now emoji, on the other hand, that’s an ongoing disaster. :-)
Splitting all the font differences would already cause an up to 4x multiplication just for characters used in mainland China, which has two font standards that apply to both traditional and simplified characters: https://en.wikipedia.org/wiki/Xin_zixing
The size of the Unicode code space is 1,114,112 [1], and Han characters occupy 98,408 codepoints [2] right now (8.8%). A lot of those characters are rarely used, but unfortunately that also means they tend to be quite complex, made up of a lot of components, which increases the chances of them having (at least theoretical) regional variants. If Unicode tries to encode every regional variant of every character as separate codepoints, there might be a real danger of Han characters exhausting the Unicode code space.
Correction: it would cause a 2x multiplication, not 4x, for characters used in mainland China since simplified and traditional characters are already encoded separately.
You can go into any used bookstore in Japan, and pull a book off the shelf with “Chinese” style pre-war kanji in it.
Readers with a western background might be tempted to reason by analogy to 「a」 having two forms, or how 「S」 looks very different in print vs handwriting, and there are characters (e.g.「令」) for which there are multiple valid shapes, but CJK unification is more like combining Greek, Latin, and Cyrillic.
I can walk into a bookstore in the USA and find a book written in Greek, but that doesn’t mean 「m」 and 「μ」should be the same codepoint and treated like a font issue. Iμaginε ρεaδing τεxτ wρiττεn likε τhiς! Sure it’s readable with a bit of effort, but it’s clearly not correct orthography.
It would be silly to say that those books should be encoded with completely different codepoints than post-war books. […] If you were to approve splitting out all those font differences into new characters, then you’re on the road to separating out the various levels of cursive handwriting,
I disagree with your position that character variants between Chinese and Japanese can be universally categorized as “font differences”.
There are some characters like 「骨」 or 「薩」 for which that’s true, but once the stroke count or radicals become different then I think it’s reasonable to separate the characters into distinct codepoints. It’s especially important if you ever want to have both characters in the same sentence, since per-word language tagging is impractical in most authoring tools.
Consider the case of mainland China’s post-war simplification. The simplified characters were assigned separate codepoints, even though following the “CJK unification logic” would have 「魚」 and 「鱼」 be assigned the same codepoint. The fact that I’m able to write that sentence in text (instead of linking pictures) is IMO evidence that separate codepoints are useful.
Additionally, I think it’s worth pointing out that the Unicode approach to CJK has never been shy about redundant codepoints to represent semantic differences. If there’s separate codepoints for 「骨」「⾻」「⻣」, why is it that 「餅」 doesn’t have variants for both 「𩙿」 and 「飠」?
I don’t know the people who decided on Han unification, but I imagine its starting point is perception of the users of Han characters, rather than abstract principles. Han characters are perceived as the same writing system with some regional differences by its users: indeed, Hanzi, Kanji, Hanja and Chữ Hán all mean “Chinese characters”.
At the same time, Greek and Latin alphabets are perceived by their users to be different writing systems, and regional variants of the Latin alphabets are again perceived to be essentially the same writing system.
You can argue how rational those perceptions are and where the line really “should” be drawn, but that doesn’t change the fact that there are almost universal perceptions in these cases.
Consider the case of mainland China’s post-war simplification. The simplified characters were assigned separate codepoints, even though following the “CJK unification logic” would have 「魚」 and 「鱼」 be assigned the same codepoint. The fact that I’m able to write that sentence in text (instead of linking pictures) is IMO evidence that separate codepoints are useful.
I am again going to point to perceptions, and in this case also China’s post-WW2 simplification. The character reform effort produced not just a traditional vs simplified table, but also an “old form” vs “new form” table to cover stylistic differences [1]. So again a perception exists that the simplified/traditional axis and the “old forms”/“new form” axis are distinct. In fact if you look at the “old form”/“new form” table, most of those variants are indeed unified in Unicode.
If there’s separate codepoints for 「骨」「⾻」「⻣」
One of the concessions of Han unification was “round trip integrity”; converting from a pre-existing Han encoding must not lose information. So if a source character set encoded two variants separately, Unicode also had to. This may be one of those cases.
Or it was just a mistake. There are a lot of mistakes when it comes to encoding Han characters in Unicode.
[1] See https://zh.wikipedia.org/wiki/新字形; I linked to the English version of the article in a sibling comment, but unfortunately the English page leaves out the most interesting part, the table itself.
Yeah. A thought experiment is, should we undo “Roman unification” and have separate codepoints for English, French, and German? English orthography and French orthography are different! The French have ç and « and different spacing around punctuation. Uppercase and lowercase Roman letters are separate, but shouldn’t we also always use different codepoints for italics, since historically they evolved separately and only got added in later? A complication is that in practice, because Unicode incorporates different encodings, it ends up having a bunch of unsystematic repeats of the Roman alphabet, including italics, but they’re only used on places like Twitter because you can’t use rich text. Users prefer to think about these things as “the same letter” presented different ways.
Specifically on the issue of stroke counts, stroke counts are not stable. People can count the same character differently, even when it is visually identical. There are lots of posts about this on Language Log. Here is a typical one: https://languagelog.ldc.upenn.edu/nll/?p=39875
Thank you for your elaborate response! Why don’t you write a Unicode proposal? It’s not as unreachable as it might sound and they are usually open for suggestions. If you take your time, you might be able to “rise” in the ranks and make such a difference. These things can change the world.
Thank you for your elaborate response! Why don’t you write a Unicode proposal?
More knowledgable and motivated people than I have spent person-centuries trying to convince the Unicode consortium to change their approach to CJK, so far largely without success.
If you take your time, you might be able to “rise” in the ranks and make such a difference.
It’s difficult to express how little interest I have in arguing with a stifling bureaucracy to solve a problem caused by its own stubbornness. If I ever find myself with that much free time I’ll sign up for an MMORPG.
For me it’s rather straightforward. Unicode even defines blocks of characters that were found on an obscure ancient rock [1], and some codepoint inclusions are so rare as to exist only once in writing, and even then the character was written wrong and had to be corrected [2].
And yet it can’t properly capture a language, making Unicode essentially “implementation defined”. Not much of a standard then. If I look into the source code of some webapp repos I have access to, there are over 100 MB of just font files going: Japanese - Bold, Japanese - Normal, Japanese - Italics, Japanese - Italics Bold, Traditional Chinese - Bold, Traditional Chinese - Normal … you get the idea.
We have space for obscure stuff, so we have the space to properly define a language and throw in the important sinographs whose unification is what causes multiple font file copies to be distributed just to support multiple Asian languages. And all of that is just one new codepoint block away, which would even be backwards compatible. (Of course there are more modern open font formats which include all variants, saving just the regionally different ones as a difference and zip-compressing it all, but fixing this at the Unicode level is still in the realm of possibility, I think.)
Also, if you open the floodgate of encoding regional variants, you probably also want to encode all historical variants, and that I’m sure will use up the Unicode code space…
Unicode even defines blocks of characters that were found on an obscure ancient rock.
The point is that they are distinct characters, not glyphs of known characters. Unicode won’t add codepoints when a new medieval manuscript has a new way to write the Latin letter a, for example.
The point is that they are distinct characters, not glyphs of known characters.
Ohh, now I get your point. There indeed is a difference.
I guess this boils down to certain sets of sinographs teetering on the edge of being a distinct character vs a different glyph of the same one. One of the more egregious examples changes the number of strokes in the radicals 𩙿 vs 飠, as mentioned above by @jmillikin. My personal frustration is the radical 令 (order) being flipped in Chinese vs Japanese to what looks almost like a handwritten version of 今 (now), like in the word 冷房. *valid, but bad example by me, as explained here
The job of the consortium is indeed not easy here. I’m just sad to see there being not enough exceptions, even though things like 𠮟る vs 叱る received new code points, thus amending Unicode after-the-fact, for crossing that line from glyph to distinct meaning, yet the more deceptively different looking examples did not.
I guess this boils down to certain sets of sinographs teetering on the edge of being a distinct character vs a different glyph of the same one.
It’s true that a lot of cases straddle the line between different fonts and different characters, but for most of font variations (like the 冷房 example you gave) there is no question that they are the same characters.
Cross-language font differences really only appear significant when you consider the different standard typeset fonts. They are usually smaller than the difference between typeset and handwritten fonts, and often smaller than the difference between fonts from different eras in the same language.
With my personal frustration being the radical 令 (order) being flipped in Chinese vs Japanese to what looks almost like a hand written version of 今 (now) like in the word 冷房.
If it’s any consolation, the right-hand version of 「冷」 is totally normal in typeset Japanese. I live in Tokyo and see it all the time (on advertisements, etc). You may want to practice recognizing both typeset and handwritten forms of some kanji, though when you practice writing it’s best to practice only the handwritten form.
If you’re learning Japanese and live overseas, Google Images can be a useful resource to find examples of kanji used in context:
You can also look for words containing similar kanji. In this case, there’s a popular genre of young-adult fiction involving “villainess” (悪役令嬢, “antagonist young lady”) characters, so you can see a lot of examples of book covers:
Ohh, that’s a misunderstanding, should have worded my comment better. I am fully aware of that and have no issue reading that. It’s just that the Chinese version of code point has a glyph, that has a totally different appearance: https://raw.githubusercontent.com/FrostKiwi/Nuklear-wiki/main/sc-vs-jp.png
I’m sorry, I’m having trouble understanding what you mean. In your linked image, both versions of 「冷」 look normal to me. The left hand side is handwriting, the right hand is typeset. It’s similar to how “a” or “4” have two commonly used forms.
There’s some Japanese-language coverage of the topic at the following links, if you’re interested:
I know much less about simplified Chinese, so I might be completely off here, but I don’t think 「冷」 has a simplified version.
If you mean that they didn’t recognize it in general, then I don’t know what to tell you. Take a look at a site that offers kanji practice sheets for children (example: https://happylilac.net/p-sy-kanzi.html) and you’ll find that form in the kanji learned in 4th grade (example: https://happylilac.net/k4-50.pdf).
Oooooooooooh.
Today I learned^^ When I hand write I am fully aware of radicals like 言、心、必 and friends being different when compared to the printed version. I was not aware the same is true for 令. Many thanks for explaining! The misunderstanding was on my part, thx for your patience.
The article mentions how content farms need to get successful people to link to them but didn’t go into much detail, but this is an important part of SEO - PageRank was famously innovative because it took the linking relationship into account.
I agree that detecting if the content itself is AI-generated or in general low-effort content is hard or impossible, but I wonder if detecting “unnatural” linking and back-linking patterns might still hold some promise.
That feels like it would just turn into an arms race, though. Search engines start looking for a pattern, content farms figure out how to create that pattern artificially, rinse and repeat.
I actually wonder if we’re going to end up back where Yahoo! was at in the late 90s and early 2000s with a directory. Maybe you have to pay for access, but verified human moderators would select content and, presumably, if the content changed substantially the link would be automatically flagged for re-review. Something like that.
I think analyzing the network of links is a bad idea since that’s what the bots are already optimizing to stymie.
I wonder how far we are from a model that can reliably read a blogpost and answer “Did this webpage actually explain anything and answer the user’s query?” Even if it doesn’t fact check and made up answers slip through, that would get rid of most of today’s content farm posts and make it more expensive to fake. Although then you have an incentive to make your content mills persuasive liars, but if they really give harmful advice that can hurt people or libel famous brands and celebrities, then maybe the scummy SEO types won’t want the liability.
You assume that they’re all in a jurisdiction where you can prosecute.
The interesting bit of why this is happening is in Element’s blog post:
Obviously they’re not naming any names, but now I’m curious who those companies are.
I don’t think there’s any specifics. Word on the street is that they were heavily underbid in some public contracts in their target market, the public sector. And the problem here is pretty obvious: Element offers Matrix hosting, but also covers the development. Competitors host Matrix/Element, but don’t bring the product forward, but rely on Elements output.
I can also see why they bring the products into Element: the Foundation has not managed to attract that many collaborators beyond Element, so Element can just make clear it’s their products from now on.
I’ve long felt like Element has had a hand in that as well though, with them seemingly doing a lot of planning and decision making on their own rather than in a public setting, so it’s always been pretty hard to know where they’re even planning on taking the project until it’s already there.
I’m certain this is also a valid perspective on the issue! TBH, I mostly speak to Element employees, rarely to Matrix contributors on the outside.
For a bit of context, there is also a Go2 draft proposal that introduces a check keyword. It behaves similarly to Rust’s try! macro (now the ? suffix operator), and also comes with a handle block which behaves like defer but just for checked errors.
And as a tongue-in-cheek aside, you could always write exceptional Go.
Even more syntax, more “advanced” features like Generics. And the next thing we will see is another language popping up and “replacing” golang, because Go is now to new developers what all the other languages were to the ones that picked up golang. And the cycle repeats.
I’d rather see golang stay as it is to be honest.
Go hasn’t really taken on any significant language change other than generics. Moreover, the Go team was always open about adding generics at some point, and what they did was essentially making good on that promise.
I wish people would stop crapping on Han unification if they can’t read Hanzi/Kanji. It is totally appropriate that those characters in the article are the same. If they were different, it would be like 4 joined up and 4 in two strokes being separated, or 7 with a slash in the middle being separated. They’re different ways to write the same thing, and have all been used in Japan within the last 100 years.
There were serious issues. Unicode eventually added dedicated code points undoing the worst cases of the unification.
It’s not just about readability, but a cultural issue. People can read a foreign way of writing a character, but it’s not how they write it.
China and Japan culturally care a lot about calligraphy. To them it’s not just a font. Number of strokes does matter, even the direction of strokes is significant. Japan even has a set of traditional variants of characters that are used only in people’s names.
(I can read Kanji)
As a Chinese person, my cultural feeling is that I would identify Kanji characters with Hanzi characters. Many characters do look the same and are written the same. Differences in stroke order or the direction of certain strokes feel more like “inessential” font differences. Visibly different forms between Kanji and Hanzi are similar to simplified vs traditional: more elaborate font differences, but still essentially the same characters.
One interesting angle is how Kanji characters are read in Chinese: they’re just read like native Chinese characters, completely ignoring the Japanese pronunciation and any shape differences. For example, the protagonist of Slam Dunk, 桜木花道 is always read as yīng mù huā dào in Mandarin Chinese, despite that (1) 桜 is written 樱 in Chinese (2) 花 is written slightly differently and (3) the Japanese pronunciation, Sakuragi Hanamichi, being kunyomi, bears no resemblance to yīng mù huā dào.
On a meta level, there’s no distinction between Hanzi and Kanji in Chinese: they’re both 汉字, pronounced hàn zì. I don’t know for sure whether Japanese people have this distinction, but it’s probably illuminating to see that the Japanese wiki page 漢字 encompasses the Chinese character system in all its regional variants.
Thanks for your input. There are 新字体 and 国字 (和製漢字), but as far as I know this distinction is only academic. Kunyomi/onyomi/etc. distinction is impossible to ignore, but that’s more related to words’ etymology than writing.
Normally, I’m the first to say that the differences between simplified and traditional are overblown; however, I think it’s also eliding a bit to claim they’re essentially the same.
My mental model is that simplified originally was a surjective function. (That’s not true anymore.) But, while characters like 電/电 are onto and 只/隻 are grammatically awkward, characters like 复 can be downright misleading.
n.b. these differences matter less to Mandarin speakers, since simplified was made for it. (e.g. characters homophonous in Mandarin were merged) But the Japanese (and Korean, but that’s a different story) simplification projects came to different conclusions because they’re for different cultures and languages.
There were a few bugs and ghost characters in the process, which is to be expected when you’re digitizing tens of thousands of characters, but the basic idea of unification is sound. I had a friend who wrote her family name 櫻木 instead of the usual 桜木. Well, sure enough, that comes through on Lobsters because both variants are encoded. So too is the common 高 vs 髙 variation. The point is to be able to encode both variants where you would need them both in a single text without having to duplicate everything or worse (do we need variants each for Japanese, Korean, and Vietnamese? for pre-War and post-War Japanese? for various levels of cursive? etc.). It was a success.
Calligraphy can’t be represented in text at all. For that you need a vector image format, not a text encoding.
As for numbers of strokes, read https://languagelog.ldc.upenn.edu/nll/?p=40492 People don’t always agree how many strokes a character has.
You’re saying unification is sound, but you can spell your friend’s name correctly only because these characters weren’t unified.
Yes! Unicode has Middle English, Old Church Slavonic with its one-off ꙮ, even Phoenician and Hieroglyphs. There’s old Kana in there already. East Asia should be able to encode their historical texts too.
UCS-2 was meant to be limited to contemporary characters due to 16-bit limit, but Unicode changed course to include everything.
CJK having a font dependency is reminiscent of legacy code pages.
I’ve worked on a website developed in Hong Kong, and it did track locale and set lang and language-specific font stacks to distinguish between the zh-xx and jp variants. They do care.
I’ve mentioned calligraphy not in a technical sense, but a cultural one. The characters and their strokes are a valued tradition.
People may disagree about strokes of some complex characters, or there may be older and newer ways to draw a character, but that doesn’t mean the differences don’t matter.
I think the technical solution of mapping “characters” to code points to glyphs, applied to both alphabets and logograms suggests a perspective/commonality that isn’t the best way to view the issue.
You could also think of the character variants as differences in spelling. In English you have US and GB spellings, as well as historical spellings, and also some words with multiple spellings co-existing. These are the same mutually intelligible words, but if you did “English word unification”, it’d annoy some people.
Huh, my iPad does not have the fixed glyph for the multiocular O: it still has 7 eyes instead of 10 https://en.m.wikipedia.org/wiki/Multiocular_O
I’m not familiar with this issue, but why not just add markers for similar characters to distinguish the cultural variant to be used where it is relevant?
That is, in fact, what Unicode does https://en.m.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
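For the curious, a small Go sketch of how that looks at the codepoint level. I’m using 葛 (U+845B) plus VARIATION SELECTOR-17 (U+E0100) as the example sequence; whether a font actually renders a distinct glyph for it depends on the font and the registered variation data, so treat the specific pairing as an assumption.

package main

import "fmt"

func main() {
	// 葛 (U+845B) on its own, and followed by VARIATION SELECTOR-17 (U+E0100).
	// A font that knows the variation sequence can pick a specific glyph;
	// fonts that don't simply ignore the selector.
	base := "\u845B"
	withVS := "\u845B\U000E0100"

	for _, s := range []string{base, withVS} {
		fmt.Printf("%q ->", s)
		for _, r := range s {
			fmt.Printf(" U+%04X", r)
		}
		fmt.Println()
	}
}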
Thanks
Re Han unification - what do native speakers think? I assume there’s a diversity of opinion.
I’ve also thought for a while that “Greco” unification would be good - we would lose the attacks where words usually written in one script are written in the identical looking letter from another script.
Last I looked into the discussion about Han unification, I got the feeling that people in China (and maybe Japan) were annoyed that their input was not specifically requested before and during the discussion to proceed with Han unification. But I really don’t know enough about these scripts to have an opinion.
Regarding Greek letters, is this a common attack vector? What characters are most often used? From the Cyrillic set?
Every east Asian text encoding scheme does unification. The decision to unify was made by east Asian engineers.
Painting “East Asian engineers” as a unitary body here is doing a lot of lifting.
The vast majority of pre-Unicode encoding schemes were both under unique resource constraints (ASCII compat? Fixed / variable length encoding? National language policy?) and designed for specific domains.
But to wit: Big5 unified some characters, HKSCS then deunified them because they weren’t the same in Cantonese.
Apologies, that was not my intent. I meant that a lot of unification decisions have been made by engineers who are familiar with the issues, rather than, say, American engineers with little knowledge of Han characters.
Yep usually Cyrillic, but Greek’s two Os can work nicely.
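To make the attack concrete, a minimal Go sketch (the spoofed string is just an illustration): visually near-identical letters from Latin, Greek, and Cyrillic are distinct codepoints, so the strings compare unequal and can slip past naive allow-lists.

package main

import "fmt"

func main() {
	latin := "o"         // U+006F LATIN SMALL LETTER O
	greek := "\u03BF"    // U+03BF GREEK SMALL LETTER OMICRON
	cyrillic := "\u043E" // U+043E CYRILLIC SMALL LETTER O

	fmt.Println(latin, greek, cyrillic)            // render (near) identically
	fmt.Println(latin == greek, latin == cyrillic) // false false

	// Looks like "google", but two of the letters are Cyrillic.
	spoofed := "g\u043E\u043Egle"
	fmt.Println(spoofed == "google") // false
}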
I think Han unification was basically a good idea, but the problem is that it’s unclear where to draw the line. In fact, many Chinese characters that seem like they should be unified are separate in Unicode just because they are separate in JIS X 0208/0212/0213. Hooray for round-trip convertibility (sarcasm intended)!
Is this mentioned in the linked article? Or did you just get reminded of it because Unicode?
The logograms are not “very different”. Any educated person would see them as the same.
Thanks. I searched the page for “Han” and “unification” and got no hits.
Complaints about the screen resolution are a matter of aesthetics, unless you work on visual digital media. In practice, a low resolution is often easier to use because it doesn’t require you to adjust the scaling, which often doesn’t work for all programs.
That said, the X220 screen is pathetically short. The 4:3 ThinkPads are much more ergonomic, and the keyboards are better than the **20 models (even if they look similar). Unfortunately the earlier CPU can be limiting due to resource waste on modern websites, but it’s workable.
The ergonomics of modern thin computers are worse still than the X220. A thin laptop has a shorter base to begin with, and the thinness requires the hinges to pull the base of the top down when it’s opened, lowering the screen further. The result is that the bottom of the screen is a good inch lower than on a thick ThinkPad, inducing that much more forward bending in the user’s upper spine.
The top of the screen of my 15” T601 frankenpad is 10” above my table and 9.75” above the keyboard. Be jealous.
A matter of aesthetics if the script your language uses has a small number of easily distinguished glyphs.
As someone who frequently reads Chinese characters on a screen, smaller fonts on pre-Retina screens strain my eyes. The more complex characters (as well as moderately complex ones in bold) are literally just blobs of black pixels and you have to guess from the general shape and context :)
I strongly disagree here. I don’t notice much difference with images, but the difference in text rendering is huge. Not needing sub-pixel AA (with its associated blurriness) to avoid jagged text is a huge win and improves readability.
Good for you. Your eyesight is much, much better than mine.
I am typing on my right hand monitor right now. It is a 27-inch (1440 × 2560) Apple Thunderbolt Display.
My left screen is a Built-in Retina Display and it is 27-inch (5120 × 2880).
Left has 4x the pixel density of right.
At 30-40cm away, I can’t see any difference between them.
If I peer with my nose 2cm from the screen the left is marginally sharper but I would hate to have to distinguish them under duress.
I’d be pretty surprised by that, my eyesight is pretty terrible. That’s part of why high resolution monitors make such a difference. Blurry text from antialiasing is much harder for me to read and causes eye strain quite quickly. Even if I can’t see the pixels on lower resolution displays, I can’t focus as clearly on the outlines of characters and that makes reading harder.
I think you would be hard pressed to demonstrate a measurable difference in readability, accounting for what people are used to.
You accidentally double posted this comment.
thanks
As an X220 owner, while I concede someone may like the aesthetics of a low-resolution screen, the screen is quite bad in almost all other ways too. But you’re definitely right about aspect ratio. For terminal use a portrait 9:16 screen would be much better than 16:9. Of course external displays are better for ergonomics and nowadays large enough to work in landscape, too.
Does anyone know how Twiddler compares to CharaChorder?
I took a look at CharaChorder, it’s also a chording keyboard but it’s in a very different subcategory.
Twiddler is for one-hand use, has few keys (12), and you use chording to access the full range of characters.
CharaChorder is for two-hand use, has a normal number of keys (for CharaChorder One, each of the 4 directions on a key counts as a distinct key), and you use chording to type whole words faster.
I also found this review of CharaChorder on YouTube and it seems to have had some major bugs, although the manufacturer says that they’ve fixed most of those.
Hi. I just bought a pair of Twiddler3 units. AMA!
I’ll start.
I love this device and the device makes me sad and it also makes me sad to realize that there is just something wrong with me, because I love it anyway.
I knew from way back when that the Tek Gear folks would ship me a unit that was not glued together if I asked them to. Indeed, they did.
Glue and the silicone holding strap-case are the only things holding it together. Without the glue.. it creaks when I use it. The case separates easily into two pieces.
Inside, a big green PCB with little .. snappy momentary switches, of the normal through-hole design.. Hold on, time out. This needs to be a blog post with pictures.
Ugh, IOU a post, xiaq.. I’ll do it this weekend.
The little HAT switch (thumb stick) is profoundly useless.
Looking forward to your post!
I suppose you’ll cover it in your post, but why did you specifically ask for a unit that was not glued together?
I have a twiddler, and I really want to get back into using it. It is probably the perfect one handed keyboard, but one handed keyboards are hard.
Asking as someone who’s never used a one-hand keyboard before but is considering buying a Twiddler, what did you find hard?
getting good muscle memory takes a long time, I wouldn’t say I ever got to that point. The experience is also much slower, the fastest I’ve ever seen someone claim to get with a twiddler is 30 wpm. Far better than nothing, but it takes some adaptation for it to feel comfortable to use a computer at that speed, if you’re used to 100+ wpm.
I also am still not sure if I’m holding the twiddler correctly. It takes a couple of tries to get the hand strap just right. In my specific case it was better to find a two handed keyboard that works for my specific needs.
I never managed to get used to my twiddler. I have a twiddler 2, I believe, and found the keys to be very low in tactile feedback, and the shape didn’t fit my particular hand well, but others’ mileage may vary.
Twiddler 3 has a curved shape and comes with a wrap so I’m hoping it will be reasonably comfortable to hold.
I am concerned about the tactile feedback though. Twiddler 3 (from its look) seems to have soft rubber keys like a remote control, which is probably not great. It’s also hard to tell how much travel the keys have from watching videos.
I’ve been looking into keyboards that I can use on the go and came across what seems to be the most popular portable chording keyboard these days. This page is a somewhat more interesting starting place as it teaches you the layout.
Other terminal file managers include nnn and ranger. Their user interfaces are more complex, but they have more features. They, too, can be configured to cd on exit (nnn’s docs, ranger examples 1 and 2).
Two more for the list: broot, xplr
Shameless plug for Elvish (https://elv.sh), a shell with a built in file system navigation UI (it’s demo 5 on the homepage).
The author is confusing type definitions and type aliases.
The type Color int syntax in Go does not make Color an alias to int. It defines Color as a new type with int as the underlying type. The syntax for aliasing is type Color = int. (In Haskell terms, Go’s type Color int is newtype Color = int, and Go’s type Color = int is type Color = int.)
This distinction is important because you can define methods on Color precisely because it’s a new type; if it were an alias of int that wouldn’t be allowed. In fact, this is exactly the same mechanism as type Foo struct { ... }, which defines Foo as a new type with an anonymous struct as its underlying type.
The fact that you can define type Color string and then assign a string literal to a Color-typed variable is not because Color is an alias of string, it’s because string literals are untyped and can be assigned to any type whose underlying type is string (written as ~string in a constraint). What you can’t do is assign a value of type string to a Color-typed variable without an explicit conversion (there’s a quick sketch of all this at the end of this comment).
To get back to the original point - I’ve always suspected that the omission of enums (and sum types to some extent) has to do with Go’s (original?) focus on developing networked services. In networked services you have to assume that every enum is open, and closed enums are generally an anti-pattern.
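To make the defined-type/alias distinction above concrete, here’s a minimal sketch (Color, Shade, and Name are illustrative names, not taken from the article):

```go
package main

// Color is a defined type: distinct from int, with int as its underlying type.
type Color int

// Methods may be declared on a defined type.
func (c Color) Valid() bool { return c >= 0 }

// Shade is an alias: just another name for int, so no methods may be declared on it.
type Shade = int

// Name is a defined type over string.
type Name string

func main() {
	var c Color = 3    // untyped constant: assignable to Color
	var n Name = "red" // untyped string literal: assignable to Name
	var s string = "blue"
	// var bad Name = s // compile error: cannot use s (type string) as type Name
	n2 := Name(s) // a typed string value needs an explicit conversion
	_, _, _, _ = c, n, n2, s
}
```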
It’s a bit of an extreme position to force this on the whole language though, closed enums that don’t cross network boundaries are useful and convenient.
What I’ve always suspected is that, like everything else in Go, it was started from C. C’s enums are useless, so they removed it from Go, which I think was a good idea (better not have enums at all than have garbage ones). One spanner in that wheel is that Limbo had sum types via pick (I think they were pretty shit tho, at least in terms of syntax / UX).
I don’t believe closed enums to be an anti-pattern in networked services. They’re a great thing when generating as they ensure you can’t generate nonsense content, and on intake you validate them at the boundaries, just like you have to validate an “open enum” because the software does not handle every possible value of the underlying type, and you don’t want to end up in nonsense land.
I speculate the motivation is exactly that Go doesn’t want to get in the business of doing validations - if something physically fits in a uint32, it can be put in a type with uint32 as the underlying type with an explicit conversion. It’s up to the programmer to decide how to do the validation by writing code.
This makes a lot of sense in networked services. If a protocol defines a flag that is supposed to have one of 6 values, and the server now sees a 7th value it doesn’t know, failing isn’t always the correct thing to do. You may want to just fall back to some default behavior, log it, and so on.
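For instance, a rough sketch of what that hand-rolled handling might look like (Mode and its constants are invented for the example):

```go
package main

import "log"

// Mode has uint32 as its underlying type; the wire can carry values we don't know about.
type Mode uint32

const (
	ModeRead  Mode = 1
	ModeWrite Mode = 2
)

// modeFromWire never fails: unknown values are logged and mapped to a default.
func modeFromWire(raw uint32) Mode {
	m := Mode(raw) // any uint32 physically fits, so the conversion always succeeds
	switch m {
	case ModeRead, ModeWrite:
		return m
	default:
		log.Printf("unknown mode %d, falling back to ModeRead", raw)
		return ModeRead
	}
}

func main() {
	log.Println(modeFromWire(2)) // known value
	log.Println(modeFromWire(7)) // the hypothetical 7th value -> default
}
```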
Sure, you can do that if the language provides validation for you, but then the language also needs to provide facilities for handling failed validations. And that sounds like something the designers of Go didn’t want in the language. Go would rather give you low-level tools than leaky high-level ones - it’s the same mentality that has led to the error handling situation.
Exhaustive switches on enums are also a double-edged sword across API boundaries. Once you expose a closed enum as part of your API, there will be consumers who try to do exhaustive switches on them. This means that adding a new possible value is a breaking change, so all closed enums get frozen the moment you publish your API.
Again, there is still a niche for closed enums - when it’s internal to a package, or when it’s part of a protocol or API that will literally never change, and I feel Go designers probably underestimated the size of that niche. I’m just speculating on why they decided to not have it after all - it probably came from a mentality that focuses a lot on network and API boundaries.
What sort of weird-ass scenario did you cook up here? There isn’t any of that, or any need for the language to provide validation. You do whatever validation you want when you convert from the protocol to the internal types, exactly as you handle invalid structs for instance, or just out of range values for whatever types.
That is an assertion with no basis in reality.
Love for being awful?
Which is what you want 99 times out of 100: most of the time the set of possible values is fixed (in the same way nobody’s extending the integer), and most of the rest you want to be a breaking change in the same way switching from 1 byte to 2, or signed to unsigned, or unsigned to signed, is a breaking change.
In the few cases where you foresee new values being routinely added and that not being a breaking change, don’t use sum type. Or do that anyway if the language supports non-exhaustive sum types.
Grug brain programmer not so smart, as it says on the tin…
While you’re correct, the distinction doesn’t really matter to the complaint - even if the parameter is typed, it’s just a conversion away:
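Roughly this sort of thing, reconstructing the gist (takesColor stands in for whatever function expects the typed parameter):

```go
package main

import "fmt"

type Color int

const Red Color = 1

func takesColor(c Color) { fmt.Println("got color", c) }

func main() {
	takesColor(Color(42)) // compiles fine, even though 42 matches no declared Color constant
}
```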
so the point remains that:
It is relevant because even though it doesn’t prevent deliberate conversions, it prevents accidental uses.
For better or for worse, Go programmers don’t tend to demand absolute guarantees from the language when it comes to the type system. This is very different from the community of other statically typed languages, probably because a lot of Go programmers have a background from dynamically typed languages like Python.
Speaking about language, it should be “Language Zoo”, not “Languages Zoo”. Because it is a compound word (not a compounds word). Sorry, but I see this too often.
Traditionally yes, but there’s now a trend to pluralize the attributive noun. https://english.stackexchange.com/a/474505 has a summary of the phenomenon.
If all the author needed was a blog, maybe the problem is that his tech stack is way too big for his needs? A bunch of generated HTML files behind an Nginx server would not have required this amount of maintenance work.
Is the caching of images at the edge really necessary? So what if they take a little while to load? Just by not having to load a front-end framework and make 10 API calls before anything is displayed, the site will already load faster than many popular sites.
If the whole point is to have fun and learn stuff, the busy work is the very point, of course. Yet all this seems to be the very definition of non-value-added work.
At the end he says
So I think the whole point is to have fun and learn stuff.
Inventing your own static site generator is also a lot of fun. And because all the hard work is done outside the serving path, there’s much less production maintenance needs.
Different people find different things fun
IMO if you do it right, inventing your own static site generator is only fun for about half a day tops. Because it only takes a couple hours. :)
Not if you decide to write your own CommonMark compliant Markdown parser :]
Pandoc is right there.
I’ve been seriously considering dropping Markdown and just transforming HTML into HTML by defining custom tags. Or finally learning XSLT and using that, and exposing stuff like transforming LaTeX math into MathML via custom functions.
All of these can be done away with.
I understand that the point may be to explore new tech with a purposefully over-engineered solution, but if the point is learning, surely the “lesson learned” should be that this kind of tech has real downsides, for the reasons the author points out and more. Dependencies, especially in the web ecosystem, are often expensive, much more so than you would think. Don’t use them unless you have to.
Static html and simple CSS are not just the preference of grumpy devs set in their ways. They really are easier to maintain.
There’s several schools of thought with regards to website optimization. One of them is that if images load quickly, you have a much lower bounce-rate (or people that run away screaming), meaning that you get more readers. Based on the stack the article describes, it does seem a little much, but he’s able to justify it. A lot of personal sites are really passion projects that won’t really work when scaled to normal production workloads, but that’s fine.
I kinda treat my website and its supporting infrastructure the same way, a lot of it is really there to help me explore the problem spaces involved. I chose to use Rust for my website, and that seems to have a lot less ecosystem churn/toil than the frontend ecosystem does. I only really have to fix things when bumping packages about once per quarter, and that’s usually about when I’m going to be improving the site anyways.
There is a happy medium to be found, but if they wanna do some dumb shit to see how things work in practice, more power to them.
Sometimes we need a tiny bit more flexibility than that. To this day I don’t know how to enable content negotiation with Nginx like I used to do with Apache. Say I have two files, my_article.fr.html and my_article.en.html. I want to serve them under https://example.com/my_article, English by default, French if the user’s browser prefers it over English. How do I do that? Right now, short of falling back to Apache, I’m genuinely considering writing my own web server (though I don’t really want to, because of TLS).
This is the only complication I would like to address. It seems pretty basic (surely there are lots of multilingual web sites out there), and I would have guessed the original dev, not being American, would have thought of linguistic issues. Haven’t they, or did I miss something?
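For what it’s worth, the negotiation logic itself is tiny if one does end up writing a little server. A rough Go sketch, which ignores Accept-Language quality values and hard-codes the two files:

```go
package main

import (
	"net/http"
	"strings"
)

// preferFrench is a crude Accept-Language check: French wins only if "fr"
// appears before "en" in the header. A real version would parse q-values.
func preferFrench(acceptLanguage string) bool {
	fr := strings.Index(acceptLanguage, "fr")
	en := strings.Index(acceptLanguage, "en")
	return fr != -1 && (en == -1 || fr < en)
}

func main() {
	http.HandleFunc("/my_article", func(w http.ResponseWriter, r *http.Request) {
		file := "my_article.en.html" // English by default
		if preferFrench(r.Header.Get("Accept-Language")) {
			file = "my_article.fr.html"
		}
		http.ServeFile(w, r, file)
	})
	http.ListenAndServe(":8080", nil)
}
```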
Automatic content negotiation sucks though? It’s fine as a default first run behavior, but as someone who lived in Japan and often used the school computers, you really, really need there to also be a button on the site to explicitly pick your language instead of just assuming that the browser already knows your preference. At that point, you can probably just put some JS on a static page and have it store the language preference in localStorage or something.
There’s a way to bypass it: in addition to the content-negotiated URL, also serve each language version at its own explicit URL. And generate a bit of HTML boilerplate to let the user access the one they want. And perhaps remember their last choice in a cookie. (I would like to avoid JavaScript as much as possible.)
If JS isn’t a deal breaker, you can make my_article a blank page that JS redirects to a language specific page. You can use <noscript> to have it reveal links to those pages for people with JS turned off.
Browsers have had multiple user profiles with different settings available, for more than a decade now (in the case of Firefox I distinctly remember there being a profile chooser box on startup in 2001–2).
Which is fine if you can actually make a profile to suit your needs. If you cannot make a profile, you are stuck with whatever settings the browser has, and you get gibberish in response as you might not understand the local language.
Look, the browser is a user agent. It’s supposed to work for the user and be adaptable to their needs. If there are that many restrictions on it, then you don’t have a viable user agent in the first place and there’s nothing that web standards can do about that.
The initial release of Firefox was 2004. Did you typo 2011 or mean one of its predecessor browsers?
Yeah I’m probably thinking of Phoenix.
There’s no easy way, AFAIK - you either run a Perl server to get redirects or add an extra module (although if you were doing that, I’d add the Lua module which gives you much more freedom to do these kinds of shenanigans.)
Caddy allows you to match HTTP headers, and you can probably achieve what you want with a bunch of horrible rewrite rules.
You can always roll your own HTTP server and put it behind Caddy or whatever TLS-capable HTTP server.
You could put Apache behind Nginx; I’ve done that before, and I might do it again.
It’s been quite a while since I delved into these.
In go, single character variable names are idiomatic, especially when the scope is small.
I’ve read that, and I try to follow idioms as much as possible, but I’m having a really hard time with this one because I really can’t find any justification for it (i.e. what makes Go so special that decades old advice is suddenly irrelevant?). I can see it if “small scope” means 1 or 2 lines, but beyond this I’m having a really hard time keeping track of them.
But even if one accepts the idiom, how do you solve my issue when the scope is not small?
http://doc.cat-v.org/bell_labs/pikestyle
This is such a dishonest argument, though, because it presents a false dilemma between very short names and “huge” names.
In Python I routinely write loops like for entry in BlogEntry.query() rather than for e in .... This is more readable than a single-letter name would be, and doesn’t fall into any “huge name” trap I’m aware of.
It’s not the 1970s anymore. Our computers have the disk and memory space to let us use names like “format” instead of “fmt”. And so we should, and should leave the 1970s conventions in the dustbin of history where they belong.
I’m not sure how this is a dishonest argument.
Your Python code is equally well expressed if entry is named e, as far as I can see. It is not obvious that entry “is more readable than a single-letter name” of e, at least without more context.
Preferring fmt over format is not a decision made from 1970s-era technical constraints. It’s possible to prefer the shorter form over the longer form, even if our servers have bajillions more available resource-bytes.
Single-letter names rarely provide context. If a file contains multiple functions, each of which contain at least one loop, single-character names fail to differentiate which loop (or which function) one is looking at.
Also, I find it somewhat amusing that single-character loop variables are extremely likely to lose context or even collide during refactorings, yet as far as I’m aware the reason why Go stylistically discourages some other naming conventions (such as a “this” or “self” parameter name for functions which receive an instance of a struct defined in the same file) is that refactoring might cause loss of context.
But mostly, the single-character thing just feels to me like another instance of Go’s designers attempting to stand athwart the history of programming language design and practices, and yell “Stop!”
The normal guidance in Go is to name variables with expressivity (length) proportional to their lifetime, and the constraints of their type.
The name of a variable that’s only in scope in a single line block needs to provide far less context compared to the name of a variable that’s in scope for an entire e.g. 100 line function.
A variable of type SpecificParameter can be named sp because that meaning is unambiguous. A variable of type string that represents an e.g. request ID should probably not be called simply id, better e.g. reqID, especially if there are other string parameters that exist with similar lifecycles.
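A small, contrived illustration of that guidance:

```go
package main

import "fmt"

func main() {
	records := []string{"a", "b", "c"}
	// i and r only live inside the one-line loop body, so terse names read fine.
	for i, r := range records {
		fmt.Println(i, r)
	}
	// A value that sticks around for the rest of the function earns a fuller name,
	// especially when other strings with similar lifetimes are in scope.
	requestID := "req-123"
	sessionID := "sess-456"
	fmt.Println(requestID, sessionID)
}
```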
It’s not the job of a variable name to disambiguate at the file, or really even function, level. An index variable for a one-line for loop is probably appropriately named i no matter how many times this situation is repeated in a function or file.
Or name them SpecificParameter and requestID. Pointlessly shortening variable names far too often leads to names that confuse rather than communicate. Is reqID a request ID? A requirement ID? A requisition ID? If I see that name out of context – say, in a debug log – how am I supposed to know which one it is?
And even in context it’s not always clear. Going back to the “self” param debate, as I understand it the standard Go approach if I want to write, say, a function that takes and operates on a Server struct, is to declare the parameter as s. Some obvious and more communicative names are ruled out by other aspects of Go’s naming rules/conventions, but how exactly is that informative? If I’m not directly in that file, and just see a call to the function, I’m going to need to either go look up the file or rely on IDE lookup showing me the type of the parameter to figure out what it’s supposed to be, since outside of the original context it might be s for Server, or might be s for Socket or for SSLContext or SortedHeaders or…
(and the loop-index case does still lose context, and also needing to name loop index variables is an indication of a language with poorly-thought-out iteration patterns)
But really it just comes down to a question of why? Why does requestID need to be shortened to reqID? Is the target machine going to run out of disk space or memory due to those four extra characters in the name? No. So there’s no concrete resource constraint requiring an abbreviated name. Is there any additional clarity provided by naming it only reqID? No, in fact clarity is lost by shortening it. What is the alleged gain that is so overwhelmingly important that it requires all variable names to be shortened like this? As far as I can tell, it really is just “that’s the way we did it with C in the 1970s”.
And it’s not even consistent – someone linked below a buffered I/O module that’s shortened to bufio, but the Reader and Writer types it works with are not shortened to Rdr and Wrtr. Why does it have a function Discard and not a function Dscrd? Why NewReaderSize and not NwRdrSz? There’s no principle here I can detect.
Once I’m familiar with the convention, it’s a lot easier for me to recognize the shape of reqID than it is to parse the text “requestID”.
Right now I’m working in a codebase where we have to write sending_node_component all over the place and it’s so miserable to both type and parse that the team unanimously agreed to just alias it as snc.
You put my feelings into words, thank you
Why are you abbreviating ‘identifier’ throughout your post? What about SSL? Communication is full of jargon and abbreviations that make sense in context. They’re used to make communication smoother. Code does the same. Variable names are jargon.
Because to many people, it’s simply more readable.
“SSL” is an initialism, not an abbreviation.
So there’s an agreed-upon set of standard names which all programmers everywhere are trained on, aware of, and understand? Cool, so why does this thread – asking about how to name variables – exist in the first place?
And I gave examples of how abbreviated variable names fail at this, where slightly-longer versions do not have the same issues. I note that you haven’t actually engaged with that.
I simply don’t find your examples compelling. I simply don’t find your supposed failures harder to understand.
And if you insist on being pointlessly pedantic, single letter variable names are also initialisms of single words.
So, suppose we look at a set of possibilities:
requestIdentifier
requestIdent
reqIdentifier
requestID
reqIdent
reqID
reqI
rqID
rqI
rI
r
Do you honestly believe that all of them are basically equally useful and informative, even when viewed out of context (say, in a log or error message when trying to debug)?
I never see variable names in log files, let alone without additional information.
Do you typically dump raw variable names into logs, with no additional context? Don’t you think that indicates a problem with the quality of your logs?
It seems pathological to focus on readability of variables in the context of poorly written log messages.
Log messages are one example of when a name might appear out of context.
Do you believe all of the above names are basically equally useful and informative, even when viewed out of context? Because if you work someplace where you never have to debug based on small amounts of context-missing information, perhaps you are lucky or perhaps you are THE ONE, but the rest of us are not. So I’d appreciate your input on the hypothetical missing-context situation.
I think the basic disagreement is whether the names need to make sense without context because I don’t think they have to. The context is the surrounding code.
Or maybe the names of local variables matter a lot less, and the way the code scans when skimming matters more than you acknowledge.
It’s easier for me to read code with shorter lines. The shape of the code matters about as much as the names, maybe more.
This is a non-consideration. I don’t view them out of context. Short names tend to read better in context.
Nobody is suggesting this.
The claim is that the verbosity of a variable name ought to be a function of the lifetime of that variable, and the ambiguity of its name versus other in-scope variables.
Variable names always exist in a context defined by source code.
If you have a for loop over SpecificParameter values which is 3 lines of code long, there is no purpose served by naming the variable in that loop body as specificParameter, in general. It’s laborious for no reason.
Variable names always exist in the context of the source code in which they are written. There is no realistic way that the literal name of a variable as defined in source code could end up in a debug log.
So why did you use the abbreviation IDE instead of spelling it out?
“IDE” is not an abbreviation, it’s an initialism. If you’re going to be pedantic like this you have to be correct about it.
According to Wikipedia, an initialism is a type of abbreviation.
“IDE” is a lexicalized abbreviation. “req” is not: you wouldn’t say “I sent you a req”, would you?
Sure I would. Especially in the context of hiring.
I would say if the name of the function is not visible on screen at the same time, your function is probably too long. But what’s your point here? That local variables should have names that are unique in a source file, although they are not in global scope?
This one I don’t understand. Newer languages do not have/allow longer variable names than older languages (for the most part). So I don’t see how the length of variable names has anything to do with the history of language design.
I’d argue there’s no such thing as a function that’s too long, only too complex. Do you think this function would be worthy of splitting, being over a hundred lines long?
In my opinion it’s not, as it’s just a linear list of tasks to be done in order to generate a chunk of bytecode for a function. It’s like a cake recipe. It’s a list of steps telling you how to mix and bake all the different ingredients to produce a cake.
Now obviously, some cake recipes are fairly complicated, and they can be split into sub-recipes - but that is a result of deep nesting being hard to keep track of, not length. In Unreal Engine for instance there are lots of functions which go incredibly deep in terms of nesting, and those could probably be factored out for readability, but I still wouldn’t be so sure about that, as moving out to smaller functions tends to hide complexity. It’s how accidental O(n²) happens.
Which is why I tend to factor code out to separate functions if I actually need to reuse it, or to make a “bookmark” showing a future extension point.
If it was for e in BlogEntry.entries(), I’d agree (and I’d say it should be renamed to this because query is too vague) because then you know you’re getting an entry back from that method. But a generic BlogEntry.query() doesn’t give you any hint as to what’s coming back - which is where for entry in ... is helpful because it signals “hey, I’m getting an entry object back from this”. Means you don’t have to hunt down BlogEntry.query and find out what it returns.
I’d argue if that’s not clear then it’s not e that is the problem, but .query that is. Also if you use some form of IDE the “hunting down” is usually okay. You want to know whether it returns an error anyway. And sometimes you want to know whether it’s some response type rather than the actual data or something. But in this example I think e would be as clear as entry - at least it is for the scenarios I can think of.
100% with you on this, but if this thread shows something it’s that this is way more subjective than I thought.
TBH I don’t mind it as much when it’s in the standard library; a few well-known exceptions to the rule are not too bad.
The quote compares i to index. You are misrepresenting the argument.
Go is a reaction against earlier mainstream languages, in particular C++ and Java, and part of the reaction is to “reset” what are acceptable and what are not.
Even older advice and experience, from the people that wrote Go. They (and I) find it more readable to use short names, so they did.
For example: https://github.com/golang/go/blob/master/src/bufio/bufio.go
Very cool. However, while I’m not sure how exactly they define a transition, it would seem that their claimed transition from 2 to 6 goes via 5.
The most straightforward definition of a transition is “a process during which a system starts at state X and ends at state Y”. Forbidding the system from passing through any of a set of states can be interesting, but that’d be an additional requirement. (Maybe there’s some way to define a transition that incorporates this requirement as an inherent property, but I can’t think of it.)
Also, the state of the system includes the velocity and the acceleration of each node [1]. The transition from 2 to 6 goes through a state where the relative position of the nodes are the same (or very close to) state 5, but it does not go through state 5 itself because the system isn’t stationary at that moment.
[1] Otherwise you can just swing the pendulum randomly at different speeds; the arms will accidentally line up from time to time, and you can claim to have achieved the transitions.
In this case, I am somewhat confused.
The first minute shows transition between each state. For example, 5 -> 0 -> 3, if intermediate states are allowed.
What am I missing?
That it does them in one single fluid motion.
Hmm I suppose you mean because all the transitions between 0 and X are shown, the transition from X to Y can simply be made by X to 0 and 0 to Y?
That is correct - but the other transitions shown are shorter. I imagine those are the shortest the creator found.
As an Old Fart who’s been fascinated by type and typography since my teens, it’s been amazing to see the progress of computer text. My first computer didn’t even have lower case letters!
My favorite story about this (which I may have regaled you with before, being an Old Fart) is about a meeting between Apple and Sun about Java2D, circa 1998. The type expert from Apple is describing all the tables inside TrueType fonts for things like ligatures, and how supporting them is not optional. One of the Sun guys scoffs, “yeah, but how many people really care about fancy typography like ligatures?” The Apple person gives him a Look and says “about two billion people in India, the Middle East and Asia.” [Don’t take the number literally, it’s just my recollection.]
Every time I see beautifully rendered Arabic or Devanagari or Chinese or…. in my browser it still amazes me.
I’m too young to have actually experienced it first hand, but according to the Chinese textbooks and magazines I’ve read, in the 70s a lot of people genuinely thought the Han script wouldn’t survive because it was “incompatible” with computers.
To clarify, Chinese typography is relatively straightforward - Chinese characters are fixed width, and there are no ligatures. I believe Japanese is in a similar position; hiragana and katakana are usually the same width as kanji. There are of course a lot of corner cases and advanced techniques, but putting characters on a grid probably already gets you 80% of the way there.
The challenging part back then was how to store the font, which consists of at least several thousand everyday characters, each of which needs to be at least 32 × 32 pixels to be legible.
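To put rough numbers on that (my own back-of-the-envelope figures, not from the thread): a 32 × 32 monochrome bitmap is 1024 bits, or 128 bytes per glyph, so a character set on the order of GB2312’s 6,763 Han characters already needs roughly 850 KB for a single bitmap font at a single size - a lot for the memory and storage of those machines.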
Korean is nothing but ligatures.
To be pedantic, Hangul is ligatures, but Korean has other writing systems too, for example Hanja, which are descended from Chinese characters (hence why the Han character space is referred to as CJK, for Chinese/Japanese/Korean), which actually gets into a fun gotcha about text rendering that this article didn’t cover: unified Han characters in the CJK space are rendered dependent on the font, which means that:
They will be rendered as their Simplified/Traditional Chinese/Hanja/Kanji representations depending on what font is used to render them, meaning external hinting needs to be used if these languages are mixed in one place.
Hanja especially, but also Kanji, are used situationally and a large number are not commonly used, hence these characters may not be present in common Japanese/Korean fonts. In those cases, the cascade rules apply and a Chinese equivalent may be shown, potentially in an entirely different style.
Simplified and traditional characters are usually encoded separately because they are considered to be “different characters”. Only “the same character” with “regional stylistic differences” are unified. There is a lot of nuance, corner cases and mistakes.
There’s a lot of discussion in https://lobste.rs/s/krune2/font_regional_variants_are_hard
Updated, thanks for the correction.
Still inaccurate though, because “Chinese” is not one regional variant :)
Let’s say that the same character is shared by simplified Chinese, traditional Chinese, Korean Hanja and Japanese Kanji. There’s usually only one stylistic variant in simplified Chinese [1], but up to 3 different stylistic variants in traditional Chinese: mainland China [2], Hong Kong / Macau, and Taiwan.
[1] Because it’s standardized by mainland China’s government, and is followed by Singaporean Chinese and Malaysian Chinese
[2] Traditional Chinese is still used in mainland China, mostly for academic publications that need to cite a lot of old text
I don’t remember hearing about Chinese computers in the old days — China’s tech wasn’t as advanced and there wasn’t as much trade. Old Japanese computers used only Hiragana and/or Katakana, which were fortunately limited enough in number to work in 8-bit character sets. In the mid 80s I used some HP workstations that had Katakana in the upper half of the encoding after ASCII; I remember those characters were useful in games to represent monsters or other squiggly things :)
Computers were pretty late to the party, the problems with Chinese text started with movable type. Until a few hundred years ago it was (apparently, from reading about the history of typography a long time ago) fairly common for neologisms in Chinese to introduce new ideographs. Often these would be tweaks of old ones (or compositions of parts of existing ones) to convey meaning. Movable type meant that you needed to get a new stamp made for every new character. Computers made this even worse because you needed to assign a new code point in your character set and get fonts to add glyphs. This combination effectively froze a previously dynamic character set and meant that now neologisms have to be made of multiple glyphs, allowing the script to combine the disadvantages of an ideographic writing system with the disadvantages of a phonographic one.
Well, this is a bit of an exaggeration.
Modern Chinese words are already predominantly two or more characters, and neologisms are typically formed by combining existing characters rather than inventing new characters. Modern Chinese is full of homophones, and if you invent a new character today and say it out loud it will almost certainly be mistaken for another existing character. (In case it isn’t clear, unlike Japanese, which has a kunyomi system, characters in Chinese have monosyllabic pronunciations as a rule; there are a few exceptions that are not very important.)
Most of the single-character neologisms are academic, where characters are written and read much more frequently than spoken and heard. All the chemical elements have single-character names, and a bunch of chemical and physical terms got their own new characters in the early 20th century. The former still happens today when new elements are named, but the latter process has already slowed down greatly before the advent of computers.
There are some non-academic, single-character neologisms that are actually spoken out. I don’t have time to fully delve into that here, but one interesting thing is that there are a lot of rarely used Chinese characters encoded in Unicode, and reusing them is a very common alternative to inventing brand new characters.
In the future, people will be amazed at how primitive our merely-32-bit Unicode was. By then every character will be a URI, allowing new glyphs and emoji to be created at whim. (This development will be closely followed by the first plain-text malware.)
Surely every new character will be a SHA512 of the result of adding the corresponding glyph data on a blockchain?
As a lisper, I would disagree with the basic premise that “shells” or programming environments cannot have a good REPL and be a good programming language.
At risk of being obvious; “Things are the way they are for reasons.” These two tools will exist on day 1 and on day 2 somebody will make sure both of them are scriptable, because being scriptable is the killer feature of shells. It’s what makes shells useful. Even DOS had a scriptable shell, it’s that great a feature.
The original article says that (good?) programming languages require “readable and maintainable syntax, static types, modules, visibility, declarations, explicit configuration rather than implicit conventions.”
As a non-lisper, I’ve never found Lisp syntax readable or maintainable. The books I’ve read, and the Lispers I’ve known, all swore that it would all click for me at some point, but nope. I get lost in all the parentheses and identical looking code.
As a random example, here’s a function I found online that merges two lists:
I would put out my eyes if I had to read code like that for more than a day. The last line with its crystal clear ))))) is the kicker. I know that this may be (largely? entirely?) subjective, but I think it’s a relatively common point of view.
and feel everything is fine.
Yes, feeling is subjective and is fine.
you are kind. It is often:
You have an extra ; in there! I’d not let that pass in a code review 😜
Completely subjective and entirely valid.
My problem with that code isn’t the parens per se. It’s that you as the reader have to already know or be able to infer what is evaluated and what is not.
defun is a primitive function and it’s really being called, but myappend and L1 and L2 are just symbols being defined. cond is a function and is really called, but the arguments to cond are not evaluated. However, cond does execute the first item of each list it is passed and evaluate that to figure out which branch is true. Presumably cond is lazy and only evaluates until it reaches its first true condition. format I guess is a function, but I have no idea what t is or where it comes from. (Maybe it’s just true?) null is a primitive symbol, but in this case, maybe it’s not a value, it’s a function that tests for equality? Or maybe cond handles elements with two items differently than elements with one item? cons, car, and cdr are functions and they are really evaluated when their case is true…
Anyhow, you can work it out with some guesswork and domain knowledge, but having syntax that distinguishes these things would be much more clear:
This is something Clojure improves somewhat upon traditional Lisps: it uses [] rather than () for grouping (see defn).
I don’t love the parentheses, but they’re not my biggest stumbling block with Lisp either. I find it hard to read Lisp because everything seems the same—flat somehow. Maybe that’s because there’s so little visual, explicit syntax. In that case, (maybe?) we’re back to parentheses. I’m not sure.
It’s because you’re reading the abstract syntax tree almost literally. In most languages, syntactic patterns create structure, emphasis, guidance. What that structure costs is that it can be a wall in your way when you’d prefer to go against the grain. In Lisp, there is almost no grain or limitation, so you have surprising power. In exchange, structure is your job.
Thanks: you’ve described the flatness (and its cause) much better than I could have. (I also appreciate your adding a benefit that you get from the lack of given or required structure.)
nothing surprising with “myappend”, “l1” and “l2” and no need for a different syntax^^
cond is a macro. Rightly, it’s better to know that, but we can infer it given the syntax. So, indeed, its arguments are not evaluated. They are processed in order as you described.
(format t "~a" arg) (t for true indeed) prints arg to standard output. (format nil …) creates a string.
We can replace car with first and cdr with rest.
So, I don’t know that Lisp would necessarily click for any given person, but I did find that Janet clicked things in my brain in a way few languages have since.
.So, I don’t know that Lisp would necessarily click for any given person, but I did find that Janet clicked things in my brain in a way few languages have since.
I built a lot of fluency with it pretty quickly, in part because it was rather small, but it had most of what I wanted in a scripting language.
That doesn’t apply to Common Lisp, though, which is both large and with more than a few archaic practices.
That being said, a lot of folks bounce off, and a lot of the things that used to be near exclusive to Lisp can be found elsewhere. For me, the simplicity of a smaller, postmodern parens language is the appeal at this point (I happen to be making my own, heh)
some things yes, but never all together, let alone the interactivity of the image-based development!
I mean, I think Factor is an example of all of that coming together in something distinctly not a Lisp.
subjective. BTW Lisp ticks the other boxes. Personal focus on “maintainable”.
To merge two lists: use append.
Here’s the code formatted with more indentation (the right way):
Did you learn the cond macro? Minimal knowledge is required to read Lisp as with any other language.
Sure, but what I posted is (pretty obviously) teaching code. The function is called myappend because (presumably) the teacher has told students, “Here is how we might write append if it didn’t exist.”
Thank you, but that did nothing to make the code more readable to me. (See below on “to me.”)
I’m not sure, but you seem to be using “subjective” as a way of saying “wrong.” Or, to put this in another way, you seem to want to correct my (and Carl’s) subjective views. As a general rule, I don’t recommend that, but it’s your choice.
I’m glad you enjoy Lisp and find it productive. But you may have missed my point. My point was that I—and many people—do not find List readable or maintainable. (It’s hard to maintain what you find unpleasant and difficult to read, after all.) I wasn’t saying you or anyone else should change your subjective view; I was just stating mine. To quote jyx, “Yes, feeling is subjective and is fine.” I didn’t mean to pick on something you love. I was questioning how widely Lisp can play the role of great REPL plus readable maintainable programming language.
hey, right, I now find my wording a bit rigid. My comment was more for other readers. We read a lot of lisp FUD, so sometimes I try to show that the Lisp world is… normal, once you know a few rules (which some busy people expect to know when they know a C-like language).
rewording: “dear newcomer, be aware that Lisp also has a built-in for this”. I really mean to say it, because too often we see weird, teaching code, that does basic things. Before, these examples always bugged me. Now I give hints to my past self.
OK, no pb! However I want to encourage newcomers to learn and practice a little before judging or dismissing the language. For me too it was weird to see lisp code at the beginning. But with a little practice the syntax goes away. It’s only syntax, there is so much more to judge a language. I wish we talked less about parens, but this holds for any other language when we stop at the superficial syntax.
but truly, despite one’s tastes, Lisp is maintainable! The language and the ecosystem are stable, some language features and tooling explicitly help, etc.
I think saying that every paradigm is about state is taking things a bit too far. Sure state management is a cross-cutting concern that can interact with a lot of things, but that doesn’t mean these things are defined purely in their interaction with states.
OO and FP are indeed mostly about states, but OO is also about inheritance, which in turn can be about computation rather than state. FP is also about higher-order functions, which again show up in computations more frequently than in state management.
Declarative vs imperative can also be about computation.
Services can also be about computations. In fact, in a lot of microservice-based architectures, you will have a large number of almost stateless services [1] that talk to each other, with one or a few databases at the very bottom that manages all the states. In other words microservices are more often about distributing computation rather than distributing states.
[1] Cache is a notable exception, but cache is arguably a performance optimization and doesn’t change the stateless nature
Alan Kay, who coined the term, would disagree. More modern OO languages are increasingly moving away from the idea that subtyping should be coupled to implementation inheritance, because it turned out to be a terrible idea in practice. Even in OO languages that support it, the mantra of ‘prefer composition over inheritance’ is common.
I think more important than Kay, is what the inventors of object-orientation would say - Nygaard and Dahl. They have sadly passed, but I imagine what they would say (particularly Nygaard) is that there is a huge problem space to which some aspects of a language might be applied, and others not. To lump everything together as the same for all is problematic. Not all software is the same. For example, implementation inheritance has been very successful for a type of problem - the production of development and runtime frameworks that provide default behavior that can be overridden or modified.
Alan Kay doesn’t get to decide what words mean for everybody else, especially when he’s changed his mind in the 40+ years since he originally coined the term.
Since Hillel is being too circumspect, I’ll cite the piece: https://lobste.rs/s/0zg1cd/alan_kay_did_not_invent_objects.
Moreover I feel the description of OO vs FP and declarative vs imperative seem to imply that the amount of states to manage is given, and the different paradigms are just different ways to manage them. While it’s true that every system has inherent states, there can also be a lot of accidental states and different paradigms can have very different amount of accidental states.
Completely agree! This was really just a short article showing a different way you can view programming philosophies, and maybe evaluate them based on the trade offs they make about state management. I didn’t really dive into how certain approaches may introduce additional state, and whether that tradeoff is worth it, but it’s an important thing to consider when designing systems.
It’s more of a conceptual model that can be applied to the various programming philosophies, and shows how they differ but also how they are similar. It’s focusing on the trade offs which is important, rather than a “one true way.”
I’m slightly confused by what you mean by “computation.” Inheritance and higher-order functions both are ways of structuring code with the goal to get it more correct.
I did leave out the topic of performance from this article, and I think that is a very important axis in which the philosophies have varying takes as well. It felt like it would have made the article too unwieldy if I tried to tackle that as well.
As someone who has used Windows, macOS, Linux, and FreeBSD extensively as professional desktop OSs I still don’t understand the love so many hackers have for Apple kit.
Because not everyone needs the same features as you. I like that MacOS behaves close enough to a Linux shell, but with a quality GUI. I particularly like emacs bindings in all GUI text fields, and the ctrl/cmd key separation that makes terminals so much nicer to use. I like the out-of-the-box working drivers, without having to consult Wikis about which brands have working Linux drivers. I like the hardware, which is best in class by all metrics that matter to me, especially with Apple Silicon. I like the iPhone integration, because I use my iPhone a lot. I like AppleScript, and never bothered to learn AutoHotKey. I like that my MacBook wakes up from sleep before I can even see the display. I like the massive trackpad, which gives me plenty of space to move around the mouse. I like Apple Music and Apple Photos and Apple TV, which work flawlessly, and stream to my sound system running shairport-sync. I like Dash for docs, which has an okay-ish Linux port but definitely not the first class experience you get on MacOS. I like working gestures and consistent hotkeys, tightly controlled by Apple’s app design guidelines. I like that I can configure caps lock -> escape in the native keyboard settings, without remembering the X command or figuring out Wayland or installing some Windows thing that deeply penetrates my kernel.
I use Linux for servers. I have a Windows gaming PC that hangs on restart or shut down indefinitely until you cut power, and I don’t care enough to fix it because it technically still functions as a gaming PC. But for everything else I use MacOS, and I flat out refuse to do anything else.
As someone who ran various Linux distros as my main desktop OS for many years, I understand exactly why so many developers use Apple products: the quality of life improvement is staggeringly huge.
And to be honest, the longer I work as a programmer the more I find myself not caring about this stuff. Apple has demonstrated, in my opinion, pretty good judgment for what really matters and what’s an ignorable edge case, and for walking back when they make a mistake (like fixing the MBP keyboards and bringing back some of the removed ports).
You can still boot Linux or FreeBSD or whatever you want and spend your life customizing everything down to the tiniest detail. I don’t want to do that anymore, and Apple is a vendor which caters to my use case.
I am a macOS desktop user and I like this change. Sure, it comes with more limitations, but I think it is a large improvement over having companies like Dropbox and Microsoft (Onedrive) running code in kernel-land to support on-demand access.
That said, I use Maestral, the Dropbox client has become too bloated, shipping a browser engine, etc.
I don’t follow why the - good! - move to eliminate the need for kernel extensions necessitates the deprecation of external drives though.
I’m not a Dropbox user so I’ve never bothered to analyse how their kext worked. But based on my own development experience in this area, I assume it probably used the kauth kernel API (or perhaps the never-officially-public-in-the-first-place MAC framework API) to hook file accesses before they happened, download file contents in the background, then allow the file operation to proceed. I expect OneDrive and Dropbox got special permission to use those APIs for longer than the rest of us.
As I understand it, Apple’s issue with such APIs is twofold: third-party code running in the kernel is a security liability, and bugs in it take down the whole system with kernel panics.
These aren’t unreasonable concerns, although the fact they’re still writing large amounts of kernel code (and kernel vulnerabilities, and kernel panic bugs) themselves somewhat weakens their argument.
In this specific case, NSFileProvider has a long and chequered history. Kauth was one of the very first kernel APIs Apple deprecated, back on macOS 10.15. It became entirely unavailable for us plebs on macOS 11, the very next major release. Kauth was never designed to be a virtual file system API, but rather an authorisation API: kexts could determine if a process should be allowed to perform certain actions, mainly file operations. This happened in the form of callback functions into the kext, in the kernel context of the thread of the user process performing the operation.
Kauth also isn’t a good virtual file system API (lazily providing file content on access atop the regular file system) but it was the only public API that could be (ab)used for this purpose. So as long as the callback into the kext didn’t return, the user process did not make progress. During this time, the kext (or more commonly a helper process in user space) could do other things, such as filling the placeholder file with its true content, thus implementing a virtual file system. The vfs kernel API on the other hand, at least its publicly exported subset, is only suitable for implementing pure “classic” file systems atop block devices or network-like mounts.
NSFileProvider was around for a few years on iOS before macOS and was used for the Usual File Cloud Suspects. Reports of problems with Google Drive or MS OneDrive on iOS continue to this day. With the 10.15 beta SDK, at the same time as deprecating kauth, everyone was supposed to switch over to EndpointSecurity or NSFileProvider on macOS too. NSFileProvider dropped out of the public release of macOS 10.15 because it was so shoddy though. Apple still went ahead and disabled kauth based kexts on macOS 11 though. (EndpointSecurity was also not exactly a smooth transition: you have to ask Apple for special code signing entitlements to use the framework, and they basically ignored a load of developers who did apply for them. Some persevered and eventually got access to the entitlement after more than a year. I assume many just didn’t bother. I assume this is Apple’s idea of driving innovation on their platforms.)
Anyway, NSFileProvider did eventually ship on macOS too (in a slightly different form than during the 10.15 betas) but it works very differently than kauth did. It is an approximation of an actual virtual file system API. Because it originally came from iOS, where the UNIXy file system is not user-visible, it doesn’t really match the way power users use the file system on macOS: all of its “mount points” are squirrelled away somewhere in a hidden directory. At least back on the 10.15 betas it had massive performance problems. (Around the 10.14 timeframe I was hired to help out with a Mac port of VFSforGit, which originally used kauth successfully. With that API being deprecated, we investigated using NSFileProvider, but aside from the mount point location issue, it couldn’t get anywhere near the performance required for VFSforGit’s intended purpose: lazily cloning git repos with hundreds of thousands of files, unlike the kauth API. The Mac port of VFSforGit was subsequently cancelled, as there was no reasonable forward-looking API with which to implement it.)
So to come back to your point: these limitations aren’t in any way a technical necessity. Apple’s culture of how they build their platforms has become a very two-tier affair: Apple’s internal developers get the shiny high performance, powerful APIs. 3rd party developers get access to some afterthought bolt-on chicken feed that’s not been dogfooded and that you’re somehow supposed to plan and implement a product around during a 3-4 month beta phase, the first 1-2 months of which are the only window in which you stand any sort of slim chance of getting huge problems with these APIs fixed. Even tech behemoths like Microsoft don’t seem to be able to influence public APIs much via Apple’s Developer Relations.
At least on the file system front, an improvement might be on the horizon. As of macOS 13, Apple has implemented some file systems (FAT, ExFAT and NTFS I think) in user space, via a new user space file system mechanism. That mechanism is not a public API at this time. Perhaps it one day will be. If it does, the questions will of course be whether
(The vfs subsystem could be used for implementing a virtual file system if you had access to some private APIs - indeed, macOS contains a union file system which is used in the recovery environment/OS installer - so there’s no reason Apple couldn’t export features for implementing a virtual file system to user space, even if they don’t do so in the current kernel vfs API.)
Part of that may also be the hardware, not the software.
My datapoint: I’ve never really liked macOS, and tried to upgrade away from a MacBook to a “PC” laptop (to run KDE on Linux) two years ago. But after some research, I concluded that - I still can’t believe I’m saying this - the M1 MacBook Air had the best value for money. All “PC” laptops at the same price are inferior in terms of both performance and battery life, and usually build quality too (but that’s somewhat subjective).
I believe the hardware situation is largely the same today, and will remain the same before “PC” laptops are able to move to ARM.
macOS itself is… tolerable. It has exactly one clear advantage over Linux desktop environments, which is that it has working fonts and HiDPI everywhere - you may think these are just niceties, but they are quite important for me as a Chinese speaker as Chinese on a pre-hiDPI screen is either ugly or entirely unreadable. My pet peeve is that the dock doesn’t allow you to easily switch between windows [1] but I fixed that with Contexts. There are more solutions today since 2 years ago.
[1] macOS’s Dock only switches between apps, so if you have multiple windows of the same app you have to click multiple times. It also shows all docked apps, so you have to carefully find open apps among them. I know there’s Expose, but dancing with the Trackpad to just switch to a window gets old really fast.
Exactly one? I count a bunch, including (but not limited to) better power management, better support for external displays, better support for Bluetooth accessories, better and more user-friendly wifi/network setup… is your experience with Linux better in these areas?
Command+~ switches between windows of the active application.
This is one area that I’d concede macOS is better for most people, but not for me. I’m familiar enough with how to configure power management on Linux, and it offers much more options (sometimes depending on driver support). Mac does have good power management out of the box, but it requires third party tools to do what I consider basic functions like limiting the maximal charge.
The M1 MBA I have now has superior battery life but that comes from the hardware.
I’ve not had issues with support for external monitors using KDE on my work laptop.
The MBA supports exactly one external display, and Apple removed subpixel font anti-aliasing, so I have to live with super big fonts on external displays. I know the Apple solution is to buy a more expensive laptop and a more expensive monitor, so it’s my problem.
Bluetooth seems to suck the same everywhere; I haven’t noticed any difference on the Mac - ages to connect, random dropping of inputs. Maybe it works better with Apple’s own accessories, none of which I have, so that’s probably also my problem.
KDE’s wifi and network management is just as intuitive. GNOME’s NetworkManager GUI used to suck, but even that has gotten better these days.
I know, but
I’ve used a Mac for 6 years as my personal laptop and have been using Linux on my work laptop.
Back then I would agree that macOS (still OS X then) was much nicer than any DE on Linux. But Linux DEs have caught up (I’ve mainly used KDE, but even GNOME is decent today), while to an end user like me, all macOS seems to have done is (1) look more like iOS (nice in some cases, terrible in others) and (2) get really buggy every few releases and then return to an acceptable level over the next few versions. I only chose to stay on a Mac because of the hardware; the OS has lost its appeal to me except for font rendering and HiDPI.
A Bluetooth headset can be connected in two modes (or more): A2DP, which gives high-quality stereo audio but no microphone channel, and headset mode (HSP/HFP), which has low-quality audio but a microphone channel. On macOS the mode is switched automatically whenever I join or leave a meeting; on Linux this was always a manual task that most of the time didn’t even work, e.g. because the headset was stuck in one of the modes. I can’t remember having a single issue with a Bluetooth headset on macOS, but I can remember many hours of debugging PulseAudio or PipeWire just to get some sound over Bluetooth.
It sounds like you found a third-party app you like, but for anyone else who’s annoyed by this, you may find this keyboard shortcut helpful: when you’re in the Command-Tab app switcher, you can type Command-Down Arrow to see the individual windows of the selected app. Then you can use the Left and Right Arrow keys to select a window, and press Return to switch to that window.
This is a little fiddly mechanically, so here’s a more detailed explanation: hold Command and press Tab to bring up the app switcher; while still holding Command, keep pressing Tab (or Backtick to move backwards) until the app you want is highlighted; press the Down Arrow to expand that app’s windows; use the Left and Right Arrow keys to pick the window you want; then press Return to switch to it.
(On my U.S. keyboard, the Backtick key is directly above Tab. I’m not sure how and whether these shortcuts are different on different keyboard layouts.)
This seems ridiculous when I write it all out like this, but once you get it in your muscle memory it’s pretty quick, and it definitely feels faster than moving your hand to your mouse or trackpad to switch windows. (Who knows whether it’s actually faster.)
Thanks, I’ve tried this before, but my issue is that this process involves a lot of hand-eye loops (look at something, decide what to do, do it). On the other hand, if I have a list of open windows, there is exactly one loop: find my window, move the mouse, and click.
I hope people whose brain doesn’t work like mine find this useful though :)
It’s kind of nutty, but my fix for window switching has been to set up a window switcher in Hammerspoon: https://gist.github.com/jyc/fdf5962977943ccc69e44f8ddc00a168
I press alt-tab to get a list of windows by name, and can switch to window #n in the list using cmd-n. Looks like this: https://jyc-static.com/9526b5866bb195e636061ffd625b4be4093a929115c2a0b6ed3125eebe00ef20
Thanks for posting this! I have an old macbook that I never really used OSX on because I hated the window management. I gave it another serious try after seeing your post and I’m finding it much easier this time around.
I ended up using https://alt-tab-macos.netlify.app instead of your alt-tab, but I am using Hammerspoon for other stuff. In particular, hs.application.launchOrFocus is pretty much the Win+1 etc. hotkeys from Windows.
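A minimal sketch of that pattern, in case it helps anyone (the key combo and app name are arbitrary examples, not anything from the posts above):

```lua
-- ~/.hammerspoon/init.lua
-- Bind Cmd+Alt+1 to focus Firefox, launching it first if it isn't running.
-- Roughly the Win+1 behaviour from Windows.
hs.hotkey.bind({ "cmd", "alt" }, "1", function()
  hs.application.launchOrFocus("Firefox")
end)
```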
Once you factor in the longevity of mac laptops vs pcs, the value proposition becomes even more striking. I think this is particularly true at the pro level.
I use both. But to be honest, on the Linux side I use KDE Plasma, disable everything, run a thin taskbar at the top with all the other stuff dropped out of it, and use mostly the same tools I use on macOS (neovim, IntelliJ, Firefox, etc.).
Which is two extremes: either Linux so stripped down in terms of GUI that I don’t have to deal with most GUIs at all, apart from ones that stay consistent because they don’t use the OS GUI framework, or macOS.
There’s no in-between. I don’t like the Ubuntu desktop, or GNOME, or any of the other systems. On macOS I am happy to use the GUIs: they’re consistent and, for the most part, just work. And I’ve been using Linux since they started mailing it out on discs.
I can’t tell you exactly why I’m happy to use macOS GUIs but not Linux-based GUIs, but there is something clearly not right (for me specifically, to be clear; everyone’s different) that causes me to shun Linux GUIs altogether.
If I cared about hacking around with the OS (at any level up to the desktop) or the hardware, I wouldn’t do it on Apple kit, but I also wouldn’t do it on what I use every day to enable me to get stuff done, so I’d still have the Apple kit for that.
Go ahead. Kick the hornets’ nest.
What would you, @FrostKiwi, suggest as an alternative to Han Unification? Let’s imagine, for a moment, that Unicode doesn’t need to be backwards compatible.
I’m not OP but I’ve been driven to distraction by the same issue. It’s frustrating that Unicode has 「A」「А」「Α」「𝝖」「𝞐」 but they forced unification of far more visually distinct CJK characters.
IMO the Unicode consortium ought to introduce unambiguous variants of all CJK hanzi/kanji that got over-unified, and deprecate the ambiguously unified codepoints. Clients can continue to guess when they need to render an ambiguous codepoint (no better option), and new inputs wouldn’t have the same problem.
Aside: my solution for Anki was to add a Japanese font (EPSON-KYOKASHO) to the media folder and reference it in the card CSS. This works cross-platform, so I get consistent rendering on Android, macOS, and Ubuntu. The relevant card style is along these lines.
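A minimal sketch (the font file name below is a placeholder; the actual file just needs to sit in Anki’s media folder, ideally named with a leading underscore so Anki’s media check doesn’t delete it):

```css
/* Card styling (the "Styling" section of the note type). */
@font-face {
  font-family: "EPSON-KYOKASHO";
  src: url("_epson-kyokasho.ttf"); /* placeholder file name in collection.media */
}

.jp {
  font-family: "EPSON-KYOKASHO", sans-serif;
}
```

The card template then just wraps the Japanese field in something like `<span class="jp">…</span>`.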
I disagree. You can go into any used bookstore in Japan and pull a book off the shelf with “Chinese”-style pre-war kanji in it. It would be silly to say that those books should be encoded with completely different codepoints than post-war books. Yes, there were a few overly aggressive unifications in the first round, but those have all been disambiguated by now, and what’s left really are just font differences. If you were to approve splitting out all those font differences into new characters, then you’re on the road to separating out the various levels of cursive handwriting, which multiplies the number of characters by at least 4x, maybe more. The point of Unicode is not to preserve a single way that characters look - for that, we have PDF, SVG, etc. It’s just trying to preserve the semantics and handle pre-existing encodings. It’s a blurry line, for sure, but I think they did as well as they could have under the circumstances.
Now emoji, on the other hand, that’s an ongoing disaster. :-)
Splitting all the font differences would already cause an up to 4x multiplication just for characters used in mainland China, which has two font standards that apply to both traditional and simplified characters: https://en.wikipedia.org/wiki/Xin_zixing
The Unicode code space has room for 1,114,112 codepoints [1], and Han characters currently occupy 98,408 of them [2] (8.8%). A lot of those characters are rarely used, but unfortunately that also means they tend to be quite complex, made up of many components, which increases the chance of them having (at least theoretical) regional variants. If Unicode tried to encode every regional variant of every character as a separate codepoint, there might be a real danger of Han characters exhausting the code space (rough arithmetic below).
[1] https://en.wikipedia.org/wiki/Code_point [2] https://en.wikipedia.org/wiki/Script_(Unicode)
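A back-of-the-envelope check using the two numbers above (my own rough arithmetic, so treat it as an illustration only): if each of the 98,408 encoded Han characters gained an average of v separately encoded variants, you would need 98,408 × (1 + v) ≤ 1,114,112, which caps v at roughly 10, and that is before reserving any room for every other script in the code space.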
Correction: it would cause a 2x multiplication, not 4x, for characters used in mainland China since simplified and traditional characters are already encoded separately.
Readers with a western background might be tempted to reason by analogy to 「a」 having two forms, or how 「S」 looks very different in print vs handwriting, and there are characters (e.g.「令」) for which there are multiple valid shapes, but CJK unification is more like combining Greek, Latin, and Cyrillic.
I can walk into a bookstore in the USA and find a book written in Greek, but that doesn’t mean 「m」 and 「μ」should be the same codepoint and treated like a font issue. Iμaginε ρεaδing τεxτ wρiττεn likε τhiς! Sure it’s readable with a bit of effort, but it’s clearly not correct orthography.
I disagree with your position that character variants between Chinese and Japanese can be universally categorized as “font differences”.
There are some characters like 「骨」 or 「薩」 for which that’s true, but once the stroke count or radicals become different then I think it’s reasonable to separate the characters into distinct codepoints. It’s especially important if you ever want to have both characters in the same sentence, since per-word language tagging is impractical in most authoring tools.
Consider the case of mainland China’s post-war simplification. The simplified characters were assigned separate codepoints, even though following the “CJK unification logic” would have 「魚」 and 「鱼」 be assigned the same codepoint. The fact that I’m able to write that sentence in text (instead of linking pictures) is IMO evidence that separate codepoints are useful.
Additionally, I think it’s worth pointing out that the Unicode approach to CJK has never been shy about redundant codepoints to represent semantic differences. If there are separate codepoints for 「骨」「⾻」「⻣」, why doesn’t 「餅」 have variants for both 「𩙿」 and 「飠」?
I don’t know the people who decided on Han unification, but I imagine its starting point was the perception of the users of Han characters, rather than abstract principles. Han characters are perceived by their users as the same writing system with some regional differences: indeed, Hanzi, Kanji, Hanja and Chữ Hán all mean “Chinese characters”.
At the same time, the Greek and Latin alphabets are perceived by their users to be different writing systems, while regional variants of the Latin alphabet are again perceived to be essentially the same writing system.
You can argue how rational those perceptions are and where the line really “should” be drawn, but that doesn’t change the fact that there are almost universal perceptions in these cases.
I am again going to point to perceptions, and in this case also to China’s post-WW2 simplification. The character reform effort produced not just a traditional-vs-simplified table, but also an “old form” vs “new form” table to cover stylistic differences [1]. So again, a perception exists that the simplified/traditional axis and the “old form”/“new form” axis are distinct. In fact, if you look at the “old form”/“new form” table, most of those variants are indeed unified in Unicode.
One of the concessions of Han unification was “round trip integrity”; converting from a pre-existing Han encoding must not lose information. So if a source character set encoded two variants separately, Unicode also had to. This may be one of those cases.
Or it was just a mistake. There are a lot of mistakes when it comes to encoding Han characters in Unicode.
[1] See https://zh.wikipedia.org/wiki/新字形; I linked to the English version of the article in a sibling comment, but unfortunately the English page leaves out the most interesting part, the table itself.
Yeah. A thought experiment: should we undo “Roman unification” and have separate codepoints for English, French, and German? English orthography and French orthography are different! The French have ç and « and different spacing around punctuation. Uppercase and lowercase Roman letters are separate, but shouldn’t we also always use different codepoints for italics, since historically they evolved separately and only got added later? A complication is that in practice, because Unicode incorporates pre-existing encodings, it ends up having a bunch of unsystematic repeats of the Roman alphabet, including italics, but they’re only used in places like Twitter because you can’t use rich text there. Users prefer to think of these things as “the same letter” presented in different ways.
Specifically on the issue of stroke counts, stroke counts are not stable. People can count the same character differently, even when it is visually identical. There are lots of posts about this on Language Log. Here is a typical one: https://languagelog.ldc.upenn.edu/nll/?p=39875
Thank you for your elaborate response! Why don’t you write a Unicode proposal? It’s not as out of reach as it might sound, and they are usually open to suggestions. If you take your time, you might be able to “rise” through the ranks and make such a difference. These things can change the world.
More knowledgeable and motivated people than I am have spent person-centuries trying to convince the Unicode consortium to change its approach to CJK, so far largely without success.
It’s difficult to express how little interest I have in arguing with a stifling bureaucracy to solve a problem caused by its own stubbornness. If I ever find myself with that much free time I’ll sign up for an MMORPG.
This is very sad indeed…
For me it’s rather straightforward. Unicode even defines blocks for characters that were found on an obscure ancient rock [1], and some codepoint inclusions are so rare as to exist only a single time in writing, and even then it was written wrong and had to be corrected [2].
And yet it can’t properly capture a language, making Unicode essentially “implementation defined”. Not much of a standard, then. If I look into the source code of some webapp repos I have access to, there are over 100 MB of nothing but font files going: Japanese - Bold, Japanese - Normal, Japanese - Italic, Japanese - Italic Bold, Traditional Chinese - Bold, Traditional Chinese - Normal… you get the idea.
If we have space for obscure stuff like that, then we also have the space to properly define a language and throw in the important sinographs - the ones whose absence forces multiple copies of font files to be shipped just to support multiple Asian languages. And all of that is just one new codepoint block away, which would even be backwards compatible. (Of course there are more modern open font formats which include all variants, storing just the regionally different glyphs as a delta and compressing the whole thing, but fixing this at the Unicode level is still in the realm of possibility, I think.)
[ 1 - YouTube] ᚛ᚈᚑᚋ ᚄᚉᚑᚈᚈ᚜ and ᚛ᚑᚌᚐᚋ᚜
[ 2 - Wikipedia ] ꙮ
What an unsatisfying situation we are in! Thanks for your elaborate response; this gave me great context!
I’m not super confident about this, but the Unicode code space might not be big enough to contain all regional variants of all Han characters (https://lobste.rs/s/krune2/font_regional_variants_are_hard#c_1ccl2m).
Also, if you open the floodgate of encoding regional variants, you probably also want to encode all historical variants, and that, I’m sure, would use up the Unicode code space…
The point is that they are distinct characters, not glyphs of known characters. Unicode won’t add codepoints when a new medieval manuscript has a new way to write the Latin letter a, for example.
Ohh, now I get your point. There indeed is a difference.
I guess this boils down to certain sets of sinographs teetering on the edge of being a distinct character vs a different glyph of the same one, with one of the more egregious examples changing the number of strokes in the radicals 𩙿 vs 飠, as mentioned above by @jmillikin.
My personal frustration is the radical 令 (order) being flipped in Chinese vs Japanese into what looks almost like a handwritten version of 今 (now), as in the word 冷房. (*A valid but bad example on my part, as explained further down the thread.) The job of the consortium is indeed not easy here. I’m just sad to see there not being enough exceptions, even though things like 𠮟る vs 叱る received new codepoints, amending Unicode after the fact for crossing that line from glyph to distinct meaning, while the more deceptively different-looking examples did not.
It’s true that a lot of cases straddle the line between different fonts and different characters, but for most font variations (like the 冷房 example you gave) there is no question that they are the same characters.
Cross-language font differences really only appear significant when you consider the different standard typeset fonts. They are usually smaller than the difference between typeset and handwritten fonts, and often smaller than the difference between fonts from different eras in the same language.
If it’s any consolation, the right-hand version of 「冷」 is totally normal in typeset Japanese. I live in Tokyo and see it all the time (on advertisements, etc). You may want to practice recognizing both typeset and handwritten forms of some kanji, though when you practice writing it’s best to practice only the handwritten form.
If you’re learning Japanese and live overseas, Google Images can be a useful resource to find examples of kanji used in context:
https://www.google.com/search?q=冷房&source=lnms&tbm=isch
Example search result: https://item.rakuten.co.jp/arimas/518125/
You can also look for words containing similar kanji. In this case, there’s a popular genre of young-adult fiction involving “villainess” (悪役令嬢, “antagonist young lady”) characters, so you can see a lot of examples of book covers:
https://www.google.com/search?q=悪役令嬢&source=lnms&tbm=isch
Ohh, that’s a misunderstanding; I should have worded my comment better. I am fully aware of that and have no issue reading it. It’s just that the Chinese version of the codepoint has a glyph with a totally different appearance: https://raw.githubusercontent.com/FrostKiwi/Nuklear-wiki/main/sc-vs-jp.png

I’m sorry, I’m having trouble understanding what you mean. In your linked image, both versions of 「冷」 look normal to me. The left-hand side is handwriting, the right-hand side is typeset. It’s similar to how “a” or “4” have two commonly used forms.
There’s some Japanese-language coverage of the topic at the following links, if you’re interested:
The important thing is to not accidentally use the typeset version when you are practicing handwriting.
Interestingly enough, in the https://raw.githubusercontent.com/FrostKiwi/Nuklear-wiki/main/sc-vs-jp.png comparison, my native Japanese colleagues don’t recognize the left 冷 (the one written in the simplified Chinese style) at all. So maybe the example I chose wasn’t that bad after all…
I know much less about simplified Chinese, so I might be completely off here, but I don’t think 「冷」 has a simplified version.
If you mean that they didn’t recognize it in general, then I don’t know what to tell you. Take a look at a site that offers kanji practice sheets for children (example: https://happylilac.net/p-sy-kanzi.html) and you’ll find that form in the kanji learned in 4th grade (example: https://happylilac.net/k4-50.pdf).
No, I thought you were suggesting that both the SC and JP forms of 令 are equivalent…
That’s why I thought my example ( https://raw.githubusercontent.com/FrostKiwi/Nuklear-wiki/main/sc-vs-jp.png ) was bad. My point is that people do not recognize the SC version, so perhaps the example is good after all?
You were right all along: https://jisho.org/search/%E5%86%B7%20%23kanji I’m rewriting that section. Sry for spamming your inbox and many thx for your corrections!
Oooooooooooh. Today I learned ^^ When I handwrite, I am fully aware of radicals like 言、心、必 and friends being different from the printed versions. I was not aware the same is true for 令. Many thanks for explaining! The misunderstanding was on my part; thanks for your patience.