1. 43
  1.  

    1. 31

      A good “falsehoods” list needs to include specific examples of every falsehood.

      1. 28

        Yours doesn’t! And I maintain that it’s still a good boy list.

        1. 2

          That doesn’t look to me like it’s meant to be an example of a good falsehoods list.

          1. 3

            My point stands. Dogs.

            1. 3

              In addition, it’s worth knowing that dogs up to 2 years of age exhibit the halting problem.

        2. 27

          I’ll make an attempt, with the caveat that this list seems so obvious to me that I’m worried I might be missing some nuance (imagine a similar list about cooking utensils with “people think knives can only be used for butter, but in reality they can also be used to cut bread, meat, and even vegetables!!!”).

          Sentences in all languages can be templated as easily as in English: {user} is in {location} etc.

          Both the substitutions and the surrounding text can depend on each other. The obvious example is languages where nouns have gender, but you might also have cases like Japanese where “in” might be へ, で, or に to indicate relative precision of the location.

          Words that are short in English are short in other languages too.

          German is the classic example of using lengthy compound words where English would use a shorter single-purpose word, “Rindfleisch” vs “beef” or “Lebensmittel” vs “food” (why yes I haven’t had lunch yet, why do you ask…?).

          For any text in any language, its translation into any other language is approximately as long as the original.

          See above – English -> German tends to become longer, English -> Chinese tends to become shorter.

          For every lower-case character, there is exactly one (language-independent) upper-case character, and vice versa.

          Turkish and German are famous counter-examples, with Turkish 'i' / 'I' being different letters, or German ß capitalizing to "SS" (though I think this is now considered somewhat old-fashioned?).
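A quick way to see both cases is to poke at a language's built-in case mapping. The sketch below uses only Python's `str` methods, which are locale-independent and therefore cannot apply Turkish rules at all; `round_trips` is a helper name made up for this illustration.

```python
def round_trips(text):
    # True when upper-casing and then lower-casing gives the original back.
    return text.upper().lower() == text

print("straße".upper())          # ß expands to two letters here
print(round_trips("straße"))     # so the round trip loses the ß
print("ẞ".lower())               # capital ẞ (U+1E9E) lowercases to ß
print("I".lower(), "i".upper())  # Turkish would want ı and İ instead
print(len("İ".lower()))          # dotted capital İ lowercases to two code points
```

Getting Turkish right needs a locale-aware library; plain string methods have no way to know which language the text is in.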

          The lower-case/upper-case distinction exists in all languages.

          Not true in Chinese, Japanese, Korean.

          All languages have words for exactly the same things as English.

          Every language has words that don’t exist in any other language. Sometimes because the concept is alien (English has no native word for 寿司), sometimes because a general concept has been subdivided in a different way (English has many words for overcast misty weather that don’t translate easily into languages from drier climates).

          Every expression in English, however vague and out-of-context, always has exactly one translation in every other language.

          I’m not sure what this means because many expressions in English don’t even have a single explanation in English, but in any case, idioms and double entendres often can’t be translated directly.

          All languages follow the subject-verb-object word order.

          If one’s English to SVO order is limited, limited too must their knowledge of literature be.

          When words are to be converted into Title Case, it is always the first character of the word that needs to be capitalized, in all languages.

          Even English doesn’t follow a rule of capitalizing the first character of every word. Title Casing The First Letter Of Every Word Is Bad Style.

          Every language has words for yes and no.

          One well-known counter-example being languages where agreement is by repeating a verb:

          A: “Do you want to eat lunch together?” B: “Eat.”

          In each language, the words for yes and no never change, regardless of which question they are answering.

          See above.

          There is always only one correct way to spell anything.

          Color / colour, aluminum / aluminium

          Each language is written in exactly one alphabet.

          Not sure exactly what this means – upper-case vs lower-case? Latin vs Cyrillic? 漢字 vs ひらがな カタカナ ? 简化字 vs 繁体字 ? Lots of counter-examples to choose from, Kazakh probably being a good one.

          All languages (that use the Latin alphabet) have the same alphabetical sorting order.

          Lithuanian sorts 'y' between 'i' and 'j': https://stackoverflow.com/questions/14458314/letter-y-comes-after-i-when-sorting-alphabetically

          Some languages special-case ordering of letter combinations, such as ij in Dutch.

          And then there’s the dozens of European languages that have their own letters outside the standard 26. Or diacritics.
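As a rough illustration of the Lithuanian case, here is a sketch that compares code-point sorting with sorting by an explicit alphabet. It is a simplification, not a real collation algorithm: `lithuanian_key` is a hypothetical helper, and production code would normally lean on a CLDR-based collator instead.

```python
# Lithuanian alphabet order; note that "y" sits between "į" and "j".
LITHUANIAN_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
RANK = {letter: index for index, letter in enumerate(LITHUANIAN_ALPHABET)}

def lithuanian_key(word):
    # Letters outside the alphabet sort last; a real collator handles far more.
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]

words = ["jau", "yra", "ilgas"]
print(sorted(words))                      # code-point order puts "yra" last
print(sorted(words, key=lithuanian_key))  # alphabet order puts "yra" before "jau"
```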

          All languages are written from left to right.

          Arabic, Hebrew.

          Even in languages written from right to left, the user interface still “flows” from left to right.

          Not sure what “flows” means here, but applications with good RtL support usually flip the entire UI – for example a navigational menu that’s on the right in English would be on the left in Arabic.

          Every language puts spaces between words.

          Segmenting a sentence into words is as easy as splitting on whitespace (and maybe punctuation).

          Chinese, Japanese.

          Segmenting a text into sentences is as easy as splitting on end-of-sentence punctuation.

          English: "Dear Mr. Smith".
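Both segmentation falsehoods are easy to reproduce. The sketch below shows the naive approaches failing; `naive_sentences` and `naive_words` are names invented for this example, and the sample strings are my own.

```python
import re

def naive_sentences(text):
    # Split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def naive_words(text):
    # Split on runs of whitespace.
    return text.split()

# The abbreviation "Mr." is mistaken for the end of a sentence.
print(naive_sentences("Dear Mr. Smith, thank you. See you soon."))

# Chinese text has no spaces, so the whole sentence comes back as one "word".
print(naive_words("我今天去了东京"))
```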

          No language puts spaces before question marks and exclamation marks at the end of a sentence.

          No language puts spaces after opening quotes and before closing quotes.

          French famously has rules that differ from English regarding spacing around punctuation.

          All languages use the same characters for opening quotes and closing quotes.

          “ ” in English,「 」in Japanese, « » in French.

          Numbers, when written out in digits, are formatted and punctuated the same way in all languages.

          European languages that use '.' for thousands separator and ',' for the fractional separator, or languages that group by different sizes (like lakh/crore in Indian languages).
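To make the grouping difference concrete, here is a hand-rolled sketch. `format_number` is a hypothetical helper that only handles non-negative values with two decimals; real applications would take separators and group sizes from locale data rather than hard-coding them.

```python
def format_number(value, group_sep, decimal_sep, indian=False):
    integer, _, fraction = f"{value:.2f}".partition(".")
    groups = []
    if indian and len(integer) > 3:
        # Last three digits form one group, the rest are grouped in twos.
        head, tail = integer[:-3], integer[-3:]
        while head:
            groups.insert(0, head[-2:])
            head = head[:-2]
        groups.append(tail)
    else:
        # Plain groups of three, counted from the right.
        while integer:
            groups.insert(0, integer[-3:])
            integer = integer[:-3]
    return group_sep.join(groups) + decimal_sep + fraction

print(format_number(1234567.89, ",", "."))               # English-style
print(format_number(1234567.89, ".", ","))               # German-style
print(format_number(1234567.89, ",", ".", indian=True))  # lakh/crore grouping
```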

          No two languages are so similar that it would ever be difficult to tell them apart.

          Many languages are considered distinct for political reasons, even if a purely linguistic analysis would consider them the same language.

          Languages that have similar names are similar.

          English (as spoken in Pittsburgh), English (as spoken in Melbourne), and English (as spoken in Glasgow).

          More seriously, Japanese and Javanese.

          Icons that are based on English puns and wordplay are easily understood by speakers of other languages.

          Often they’re difficult to understand even for English speakers (I once saw a literal hamburger used to signify a collapsible sidebar).

          Geolocation is an accurate way to predict the user’s language.

          Nobody who has ever travelled would think this. And yet. AND YET!

          C’mon Google, I know that my IP is an airport in Warsaw but I really don’t want the Maps UI to switch to Polish when I’m trying to find a route to my hotel.

          Country flags are accurate and appropriate symbols for languages.

          You can roughly gauge where you are in the world by whether the local ATMs offer “🇬🇧 English”, “🇺🇸 English”, or “🇦🇺 English”.

          Every country has exactly one “national” language.

          Belgium, Luxembourg, Switzerland.

          Every language is the “national” language of exactly one country.

          English, again.

          1. 14

            Turkish and German are famous counter-examples, with Turkish ‘i’ / ‘I’ being different letters, or German ß capitalizing to “SS” (though I think this is now considered somewhat old-fashioned?).

            The German ß has history.

            The old rule is that ß simply has no uppercase. Capitalizing it as “SS” was the default fallback rule if you had to absolutely capitalize everything and the ß would look bad (such as writing “STRAßE” => “STRASSE”). Using “SZ” was also allowed in some cases.

            The new rule is to use the uppercase ß: ẞ. So instead of “STRASSE” you now write “STRAẞE”.

            The usage of “SZ” was disallowed in 2006. The East Germans had an uppercase ß since 1957; the West German rules basically said “Uppercase ß is in development”, and that was dropped in 1984 in favor of the rule to use SS or SZ as the uppercase variant. The new uppercase ß has been in the rules since 2017, and since 2024 the uppercase ß is preferred over SS.

            The ISO DIN 5008 was updated in 2020.

            This means that, depending on what document you’re processing, when it was created, and WHERE it was created, its spelling of the uppercase ß may be radically different.

            It should also be noted that if you’re in Switzerland, ß is not used at all; there the SS substitute is used even in lower case.

            Family names may also have custom capitalization rules, where ß can be replaced by SS, SZ, ẞ or even HS, so “Großmann” can become “GROHSMANN”. Note that this depends on the person: Brother Großmann may write “GROHSMANN”, Sister Großmann may write “GROSSMANN”, and their mother may use “GROẞMANN”, and these are all valid and equivalent.

            Umlauts may also be uppercased without the diacritic umlaut and with an E suffix; ä becomes “AE”. In some cases even lowercase input does the translation because older systems can’t handle special characters, though this is not GDPR compliant.

            No two languages are so similar that it would ever be difficult to tell them apart.

            Many languages are considered distinct for political reasons, even if a purely linguistic analysis would consider them the same language.

            If you ever want to have fun, the politics and regionality of German dialects could be enough to drive some linguists up the wall.

            Bavarian is recognized as a language and a dialect at the same time. It can be subdivided into dozens and dozens of subdialects, which are all similar but whose speakers may struggle to understand each other.

            As someone who grew up in Swabian Bavaria, my dialect is a mix of both Swabian and Bavarian. I struggle to understand Northern Bavaria, but I struggle much less with Basel Swiss German (which is distinct from other Swiss German in that it descends from Low Alemannic instead of High Alemannic), which is quite close in a lot of ways.

            And the Swiss then double down on making things confusing by sometimes using French language constructs in German words, or straight up importing French or Italian words.

            1. 2

              East Germans had an uppercase ß since 1957

              What should I read to learn more about this? Why wasn’t the character in Unicode 1.0, then?

              1. 5

                East Germany added the uppercase ß in 1957 and removed it in 1984. The spelling rules weren’t updated, so despite the presence of an uppercase ß, it would have been wrong to use it in any circumstances. Since Unicode 1.0 is somewhere around 1992, with some early drafts in 1988, it basically missed the uppercase ß being in the dictionary.

                The uppercase ß itself has been around since 1905 and we’ve tried to get it into Unicode since roughly 2004.

                1. 1

                  Is this more like there being an attested occurrence in a particular dictionary in East Germany in 1957 rather than common usage in East Germany?

              2. 7

                Every expression in English, however vague and out-of-context, always has exactly one translation in every other language.

                I’m not sure what this means because many expressions in English don’t even have a single explanation in English, but in any case, idioms and double entendres often can’t be translated directly.

                A good example of this is a CMS I used to work on. The way it implemented translation was to define everything using English[0], then write translations as a mapping from those English snippets to the intended language. This is fundamentally flawed, e.g. by homonyms:

                Subject            From       Flags              Actions
                ----------------------------------------------------------------
                Project update     Alice      Unread, Important  [Read] [Delete]
                Welcome            HR         Read               [Read] [Delete]
                

                Here, the “Read” flag means “this has been read”, whilst the “Read” button means “I want to read this”. Using the English as a key forces the same translation on both.

                [0] We used British English, except for the word “color”; since we felt it was better to match the CSS keywords (e.g. when defining themes, etc.).
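A minimal sketch of the difference, assuming a hypothetical lookup table rather than the CMS's actual code: keying on the English text alone leaves one slot for both meanings, while keying on a (context, text) pair gives each its own. The Portuguese strings are only illustrative. Gettext's message contexts (exposed in Python as `gettext.pgettext`) are built around the same idea.

```python
# Keyed by English text alone: both uses of "Read" share one slot.
BY_TEXT = {"Read": "Lido"}

# Keyed by (context, text): each use gets its own translation.
BY_CONTEXT = {
    ("flag", "Read"): "Lido",   # "this has been read"
    ("button", "Read"): "Ler",  # "I want to read this"
}

def translate(text, context=None):
    # Prefer a context-specific entry, then fall back to the text-only table.
    if context is not None and (context, text) in BY_CONTEXT:
        return BY_CONTEXT[(context, text)]
    return BY_TEXT.get(text, text)

print(translate("Read"))                    # the button ends up labelled like the flag
print(translate("Read", context="flag"))
print(translate("Read", context="button"))
```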

                1. 4

                  One trick is to use a different word on the asset: Reviewed(adj) and Review(v) don’t have the same problem that Read(adj) and Read(v) do. Seen(adj) and See(v); Viewed(adj) and View(v). And so on. Then you can “translate” to English to actually use Unread/Read/[Read] if you still like it, without confusing the translators, who need to know you want something more like e.g. Lido/Ler or 阅读/显示 and so on.

                2. 3

                  Much better than the original article. Also love how many of the counter examples come from English.

                3. 16

                  My bar for these lists is https://yourcalendricalfallacyis.com/ and most “falsehoods programmers believe” lists don’t meet it.

                  1. 6

                    The number of exceptions caused by the Hebrew calendar makes me shed a tear of joy.

                    Here’s one falsehood they missed: the length of a year varies by at most one day. True in Gregorian calendar, apparently true in the Islamic calendar, but not true in the Hebrew calendar: leap years are 30 days longer than regular years.
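For the curious, the leap-year pattern itself is compact. The Hebrew calendar adds its extra month in 7 years out of every 19, and the sketch below encodes that cycle with a commonly used arithmetic test; `is_hebrew_leap_year` is a name chosen for this example, and exact year lengths in days need more rules than are shown here.

```python
def is_hebrew_leap_year(year):
    # Years 3, 6, 8, 11, 14, 17 and 19 of each 19-year cycle get an extra month.
    return (7 * year + 1) % 19 < 7

for year in (5782, 5783, 5784, 5785):
    print(year, "leap" if is_hebrew_leap_year(year) else "regular")
```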

                    1. 2

                      They sorta cover it on the “days” section, by way of mentioning that the Hebrew calendar has leap months.

                      They also miss Byzantine calendars which are still used by many churches, related to the Jewish Greek calendar from the Septuagint. It’s of course complicated by the fact that many churches & groups do not agree on what year was the start, so it’s complex to use (but still in somewhat fairly common liturgical use).

                      1. 1

                        Wow, 30? I need to read more about this.

                    2. 10

                      Here’s a fun (counter)example of (something like) this one from my heritage language:

                      In each language, the words for yes and no never change, regardless of which question they are answering.

                      (Context: the word for enjoy/like is the same in the language, so when translating to English, I choose whichever sounds most natural in each given example sentence.)

                      When someone says, “do you (enjoy/)like it?”, if you want to say “yes, I like it”, that’s fine, but if you want to say you don’t like it, you would say, “I don’t want it”; if you were to say, “I don’t like it” in that situation, it would mean, “I don’t want it”. The same reversal happens if they ask, “do you want it?”, and you want to respond in the negative.

                      So someone would say, “do you want a chocolate bar?”, and you’d say, “no, I don’t want it”, and that would mean, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”, whereas, “no, I don’t enjoy it” would just straightforwardly mean, “I don’t want it”.

                      (You can also respond with just, “no!” instead of using a verb in the answer.)

                      This only happens in the habitual present form. Someone might ask, “do you like chocolate?” before they offer you some, and you can say, “no, I don’t want it”, but if they already gave you a chocolate bar to try, they may ask, “did you like it?” in the past tense, and you’d have to respond with, “I didn’t like it” instead of, “I didn’t want it”. And, “do you want chocolate?” would be met with, “no, I don’t like it”, but “did you want chocolate?” would be met with, “no, I didn’t want it”, and that second one would just mean what it straightforwardly sounds like in English.

                      (Strictly speaking, it doesn’t need to be a response to a question, I’m just putting it into a context to show that the verb used in the answer isn’t just a negative form of the same verb used in the question.)

                      (It’s hard to explain because if you were to translate this literalistically to English, it wouldn’t even be noticed, since saying, “no, I don’t like it” in response to, “do you want it?” is quite natural, but does literally just mean, “I don’t like it”, in the sense of, “no, (I already know) I don’t (usually/habitually) enjoy it (when I have it), (therefore I don’t want it)”. Even, “no, I don’t want it“ in response to, “do you like it?” is fairly natural in English, if a little presumptive-sounding.)

                      1. 4

                        In Polish, when someone asks you “Czy chcesz cukru do kawy?” (“Do you want coffee with sugar?”), you can respond with “Dziękuję”, which can mean two opposite things: “Yes, please” or “No, thank you”.

                      2. 6

                        The original ones, like “…Names”, don’t; part of what I find fun about them is trying to think of counterexamples.

                        1. 6

                          I think if you want them to be useful they need to include counterexamples. If it’s just a vent post then it’s fine to leave them.

                          1. 4

                            The first one gets a pass because it was the first one, and even then, I think it’s better to link people to one of the many explainers people wrote about it.

                        2. 9

                          No two languages are so similar that it would ever be difficult to tell them apart.

                          I once had someone tell me that the lang attribute in HTML was pointless because language detection is “trivial”. I had no response to that.

                          1. 5

                            Explaining why it’s non-trivial is unfortunately also non-trivial.

                          2. 6
                            • A polite sentence in one language will not sound inappropriate or rude when translated to another language

                            See also Brand blunders

                            1. 5
                              • For any text in any language, its translation into any other language is approximately as long as the original.

                              I recently saw the Spanish name of a card in Slay the Spire run under the energy cost, and my Spanish is bad enough that I couldn’t infer the start of the name and thus had to suggest the card by describing it instead of reading its name.

                              • Country flags are accurate and appropriate symbols for languages.
                              • Every country has exactly one “national” language.
                              • Every language is the “national” language of exactly one country.

                              These fall prey not only to countries with multiple languages and languages without a country, but also to the assumption that country lists are universal, to dialects (Minecraft has some amazing dialects in its language picker), and to languages with multiple countries (English is at least a well-known example of this).

                              1. 4

                                Geolocation bit me personally when moving from Toronto to Montreal. I didn’t become a native French speaker in the move, but many devices and services have trouble getting this right for me. Muddying the waters for programmers are laws that do vary by geography.

                                1. 4

                                  many of these problems can be effectively avoided in (web) applications by using tools like icu, unicode cldr, Intl browser apis, and mozilla’s project fluent, ‘a localization system for natural-sounding translations’.

                                  1. 3

                                    Sentences in all languages can be templated as easily as in English: {user} is in {location} etc.

                                    I don’t doubt the, uh, falsehood of this one, but it surprised me a bit. I don’t know any other language well enough to think of a counterexample.

                                    1. 13

                                      It’s, uh, it sucks.

                                      As @xigoi mentioned here, in languages with grammatical cases, the form of {location} may vary.

                                      It’s not very straightforward. Best-case scenario is actually that a noun that indicates a location is always the same case (e.g. accusative), and you just have some declension rules + a series of exceptions to remember because there’s always a few of those.

                                      But some languages retain a locative case – i.e. they indicate a location by a specific form of the noun, rather than with a preposition like “in”. To make matters worse, the locative form may be gendered. E.g. in Polish it’s w domu, but w Austrii. Hopefully. Polish is special in every way there is :-).

                                      To make matters worse, some languages retain the locative, or variants of a locative, but it’s no longer productive. I think the only European language here is Hungarian, which no longer uses the locative, but retains a locative form for some places.

                                      And to make them even worse, some languages actually have different cases for relative locations. Some Baltic languages are like that. E.g. the “house” in “out of the house” and “into the house” would be in different cases (elative vs. illative) and thus different forms (Finnish, which hopefully someone can correct me about if I got it wrong, would be talosta in the elative but taloon in the illative).

                                      I know that there are many other constructions in some African and Asian languages, but I don’t know enough about those. I know, for example, that there are languages where:

                                      • Locatives agree with verbs and other predicates, so the form depends on what’s between {user} and {location}.
                                      • The form of the verb depends on word order/agreement (I think some Bantu languages are like that: “to be” would have different forms when saying “the birds are in the tree” vs. “in the tree are the birds”).
                                      • Some locative forms can drop certain verbs (so the “is in” is implicit if {location} is in a locative form).
                                      1. 8

                                        It’s not so easy in English either. For example, “Please select an {object}”. If object is “pen”, then you get a grammatically incorrect English sentence. The use of “a” vs. “an” depends on the initial syllable of the following word.
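A tiny sketch shows why this is harder than it looks. The obvious spelling-based fix, written here as a made-up helper `naive_article`, handles “pen” and “object” but gets words like “hour” and “university” wrong, because the rule follows pronunciation rather than letters.

```python
def naive_article(noun):
    # Spelling-based guess: "an" before a vowel letter, otherwise "a".
    return "an" if noun[0].lower() in "aeiou" else "a"

for noun in ["pen", "object", "hour", "university"]:
    # The last two come out as "a hour" and "an university".
    print(f"Please select {naive_article(noun)} {noun}")
```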

                                        1. 1

                                          Or, notably, “You selected {number} items”.

                                          1. 4

                                            item(s) ;)

                                            Not to take away the point, but this one is at least so commonly “fixed” that most people stopped caring.

                                            1. 1

                                              And then you have an issue with “items”, because there are languages which don’t have an equally generic word that you can use. It would be either a different one depending on what you’re selecting, or would sound really clumsy.

                                              1. 1

                                                Some developers really are lazy… Like, how difficult is it to write "{number} item{number == 1 ? "" : "s"}"?

                                                1. 1

                                                  In my experience, at least 50% of the time that becomes a lot uglier, depending on the template language. Yes, if it works like this, fine.

                                                  1. 1

                                                    That would encode English grammar into program logic, though. Pragmatic me wants to avoid it because it adds guilt from this list of falsehoods.

                                                    If that “s” string can be tied to this context (because gender affects pluralization) before being given to translators, it can work beautifully for many other languages, but I heard at least Slavic languages follow a different pattern.

                                                    1. 5

                                                      Even in English it’s not complete: plural for potato is potatoes, not potatos.

                                                      I heard at least slavic follows a different pattern.

                                                      Yeah, Slavic numerals are hard. In Polish: ziemniak - potato

                                                      • 1 ziemniak
                                                      • 2|3|4 ziemniaki
                                                      • 5-21 ziemniaków
                                                      • 22|23|24 ziemniaki
                                                      • 25-31 ziemniaków …

                                                      Sometimes the case is genitive, sometimes nominative 🤷. As a native speaker, I can become puzzled given an awkward and unnatural enough sentence. I was once shown an example from a C1 language exam: I danced with 21 girls. All variants of the numeral sound a little awkward in my head.
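The counting pattern in the list above can be written down as a small rule. This sketch only covers the plain counted form shown in that list and ignores grammatical case entirely (which is exactly what makes the “21 girls” sentence hard); `polish_potatoes` is a name invented for this example.

```python
def polish_potatoes(n):
    if n == 1:
        form = "ziemniak"
    elif n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        # 2-4, 22-24, 32-34, ... but not 12-14
        form = "ziemniaki"
    else:
        form = "ziemniaków"
    return f"{n} {form}"

for n in (1, 3, 5, 12, 21, 22, 25):
    print(polish_potatoes(n))
```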

                                                      1. 3

                                                        from a C1 language exam: I danced with 21 girls

                                                        This broke my brain a bit. There’s no way to phrase it well. At that point your only good way out is to dance with one more.

                                                      2. 3

                                                        The big problem, if you care about translating your app down the road, isn’t even the “s” suffix. It’s the assumption that you use a particular word if there is exactly 1 of something, and a different word if there is any quantity other than 1 of the thing. That’s true in some languages, but it’s not universal.

                                                        In some languages, there are different forms of nouns based on quantity, but they don’t follow the “if 1, then X, else Y” pattern. In Russian, for example, it’s based on the last digit of the quantity. Mostly. If the quantity ends in 2, 3, or 4, you use one word form, but not if the quantity is 12, 13, or 14. If the quantity ends in 5, 6, 7, 8, 9, or 0, or it’s 11 through 14, you use a different form. And if the quantity ends in 1, but isn’t 11, the form varies depending on the grammatical context. (Disclaimer: I don’t speak Russian, but I remember Russian being a fun stress test of some pluralization code I worked on when I was doing i18n stuff.)

                                                        And it’s not always more complicated than English. In some languages (Chinese, for example) there isn’t a plural form at all. The concept of pluralization just flat-out doesn’t exist, aside from a small handful of plural pronouns like the words for “them” and “us.”
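One way to avoid baking the “1 versus everything else” assumption into code is to let each language map a number to a named category and have translators supply one string per category. The sketch below is hand-written and covers whole numbers only; the category names follow the CLDR convention, the function and table names are hypothetical, and the Russian forms are the plain nominative ones, so they sidestep the grammatical-context issue mentioned above.

```python
def english_category(n):
    return "one" if n == 1 else "other"

def russian_category(n):
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"

def chinese_category(n):
    # No plural distinction: every count uses the same form.
    return "other"

MESSAGES = {
    "en": (english_category, {"one": "{n} file", "other": "{n} files"}),
    "ru": (russian_category, {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"}),
    "zh": (chinese_category, {"other": "{n} 个文件"}),
}

def count_files(lang, n):
    # Pick the category for this language, then the string for that category.
    category, forms = MESSAGES[lang]
    return forms[category(n)].format(n=n)

for lang in ("en", "ru", "zh"):
    print([count_files(lang, n) for n in (1, 3, 11, 21)])
```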

                                                        1. 1

                                                          You need to do that anyway, because some languages do not have a straightforward equivalent of “item(s)”.

                                                  2. 1

                                                    Good point!

                                                  3. 6

                                                    Rankings being like, 1st 2nd 3rd 4th … 11th … 21st … 111th, is an example of how much of a pain at least one language can be, let alone every other language’s approach. (I think some actually use completely different words between counting and ranking too, fun!)

                                                    Aside: There was a table I found one day that listed “all” the languages and their handling of counting, ranking, etc, and how to program around it. Forgot to bookmark it however.
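For English alone, the ranking suffixes already need a special case for the teens. A small sketch, with `english_ordinal` as a made-up helper name:

```python
def english_ordinal(n):
    # 11, 12 and 13 (also 111, 212, ...) take "th" despite their last digit.
    if n % 100 in (11, 12, 13):
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

print([english_ordinal(n) for n in (1, 2, 3, 4, 11, 21, 111)])
```

None of this carries over to other languages, which is why it is usually better to lean on locale data than to hand-roll it per language.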

                                                    1. 4

                                                      the canonical data source for things like that is the unicode common locale data repository (cldr), specifically the ‘ordinal numbers’ in the language plural rules charts. this data is used, for example, by web browsers to power Intl.PluralRules and other Intl apis.

                                                    2. 5

                                                      For example, in any language that has grammatical cases, you would need to know the proper declension of {location}.

                                                      1. 4

                                                        In French:

                                                        • Je suis à la maison (I am at home)
                                                        • Je suis chez le médecin (I am at the doctor)
                                                        • Je suis au marché (I am at the market)
                                                        • Je suis aux Halles (I am at les Halles – a Paris metro station)
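In code, the consequence is that a single template with one fixed preposition cannot produce these sentences. One possible workaround, sketched here with a hypothetical catalogue, is to store the whole phrase for each place instead of gluing a preposition onto a noun.

```python
# Hypothetical catalogue: each place carries its full French locative phrase.
AT_PLACE_FR = {
    "home": "à la maison",
    "doctor": "chez le médecin",
    "market": "au marché",
    "les_halles": "aux Halles",
}

def i_am_at(place_key):
    return f"Je suis {AT_PLACE_FR[place_key]}"

def naive_i_am_at(noun_phrase):
    # What a single "I am at {location}" template amounts to.
    return f"Je suis à {noun_phrase}"

print(i_am_at("market"))           # the contracted form "au marché"
print(naive_i_am_at("le marché"))  # "à le marché" is not valid French
```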
                                                        1. 6

                                                          Similarly in Polish

                                                          • Jestem w domu (I am at home)
                                                          • Jestem u lekarza (I am at the doctor)
                                                          • Jestem na targu (I am at the market)
                                                          • Jestem nad jeziorem (I am at the lake)

                                                          A lot of those issues seem to be due to English speakers not knowing any other languages.

                                                        2. 3

                                                          I’ve been learning some Korean lately and picked up that what we’d say as “outside of the apartment” in English is more like “apartment outside” in Korean.

                                                          1. 1

                                                            Prepositions are untranslatable because they don’t map one-to-one: I am in the toilet.

                                                            1. 2

                                                              And the idioms are different. One interesting example from Dutch: “Hij lijkt op zijn vader” is “he looks like his father” in English but it’s ~literally “He looks on (top of) his father” which of course doesn’t make much sense in English.

                                                              1. 1

                                                                LOL, despite what you say the verb means, that looks so much like Norwegian: “Han likner på sin far”, where “likne” is the verb for to “look like”. But yes, “på” means “on (top of)” here as well.

                                                          2. 3

                                                            No two languages are so similar that it would ever be difficult to tell them apart.

                                                            What is the situation in which this matters? If two languages are so similar that they are difficult to tell apart, is there anyone who needs to know that?

                                                            1. 9

                                                              The differences might not be obvious to outsiders, but they can be very important to those that use the languages. I’ve definitely heard of mistakes like someone presenting Chinese text in a font meant for Japanese. It might be somewhat readable, but it’ll definitely be weird. Additionally, I doubt I could reliably distinguish written Danish and Norwegian, but I’m sure it makes a difference to the people who speak those languages.

                                                              1. 3

                                                                Right, but it doesn’t make a difference to you.

                                                                It does, however, make a difference to the people who can tell them apart easily. If it’s easy for someone to tell that you got it wrong, then I would argue that the languages are not difficult to tell apart.

                                                                If nobody can tell them apart easily, then I don’t think it affects anyone.

                                                                1. 2

                                                                  The last time I asked a similar question, I was given the example:

                                                                  Which language are these sentences in? “My hand is in warm water. My pen is in my hand.” If you said English then you’re wrong, they’re in Afrikaans.

                                                                  And that may be true, but I fail to see how it matters to literally anyone.

                                                                  1. 12

                                                                    Language detection is a pretty common feature, to tweak the UI, offer spellchecking, choose a locale, guess the charset…

                                                              2. 5

                                                                Well, if you were trying to identify the difference between American, Canadian, and UK English in a small sample of text, you might guess the wrong one, and then end up formatting a date or currency incorrectly.
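
To make the date point concrete, here is a small sketch of how the same date renders under three English locales. The format table is an assumption written by hand for illustration; a real application would read these patterns from locale data rather than hard-coding them.

```python
from datetime import date

# Assumed numeric date conventions, for illustration only. Real software
# should take these from locale data instead of a hand-written table.
DATE_FORMATS = {
    "en-US": "%m/%d/%Y",  # month first
    "en-GB": "%d/%m/%Y",  # day first
    "en-CA": "%Y-%m-%d",  # year first
}


def format_date(d: date, locale_tag: str) -> str:
    """Format a date using the pattern assumed for the given locale tag."""
    return d.strftime(DATE_FORMATS[locale_tag])


d = date(2025, 3, 4)
for tag in DATE_FORMATS:
    print(tag, format_date(d, tag))
```

Guessing en-US for a British reader turns the 4th of March into the 3rd of April, and nothing in the text itself looks wrong.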

                                                                More generally, I think this rule applies less to overall similarity between languages, and more to indistinguishable subareas within languages.

                                                                E.g., for Japanese kanji and Chinese scripts, if you were just presented a small snippet of kanji, you might confuse it for Chinese.

                                                                The Lao and Thai languages are mutually intelligible when spoken, to the point that speakers of each can understand the other, but the written scripts aren’t as similar. If you did voice recognition/transcription, it would be very easy to confuse one for the other iiuc without a sufficiently large corpus to pick up on regionally-specific words.

                                                                1. 5

                                                                  Could still be a falsehood programmers believe even if we can’t find a situation (yet) where it matters in practice :-)

                                                                  1. 4

                                                                    If I remember correctly, some languages have the two-negatives-are-a-positive and two-negatives-are-a-negative difference in regional dialects.
                                                                    One thing can be said but the exact opposite meaning can be received as a result.
                                                                    I think keeping this in mind should lead people to try and speak clearly, so translations will pick up on the right meaning, or so regional differences will be less of a problem. “don’t not avoid double negatives”

                                                                    1. 2

                                                                      The language you’re speaking right now has that!

                                                                      British RP accent: “I did not do nothing” vs American hillbilly: “I ain’t done nothin”. But nobody gets confused.

                                                                    2. 1

                                                                      It depends, for example, on the amount of text you are trying to guess upon, and how it affects the future interaction.

                                                                      Say, Russian and Ukrainian are different enough, but on a small phrase fragment it might be hard to tell, and a lot of software defaults to spellchecking as Russian because it was one of the “bigger markets”. Infamously, the Edge browser, in this year 2025, starts to spellcheck any Cyrillic text as Russian once you tell it “not to translate” that language on some site (which, I guess, adds it to the internal list of languages the user understands, so it is then considered more probable for any Cyrillic text).

                                                                      1. 3

                                                                        Here is a somewhat artificial set of examples:

                                                                        • “Я говорю правду” (I tell the truth) is the same in Ukrainian and Russian
                                                                        • “Я брешу” is grammatically correct in both and has a close meaning, but in Ukrainian, it is a neutral phrase meaning “I am lying,” while in Russian, it is closer to “I am bullshitting/barking like a dog.” So the spellchecker would be happy, but the style checker might not be.
                                                                        • “Я збрехав” is correct in Ukrainian (“I lied”) but isn’t grammatically correct/meaningful Russian.
                                                                        • “Олег говорить правду” is grammatically correct Ukrainian, but in Russian, it should be a bit different, “Олег говорит правду”, so is it Ukrainian or Russian with a typo?

                                                                        The languages share something like 60% of their word roots (not always with the same meaning, though), and a lot, but not all, of the grammar/syntax.

                                                                        So, if your software uses statistical language guessing to tweak some features like a spellchecker or speech recognition (and some software is so proud of itself it doesn’t even allow changing the guessed language manually), it is better to know that your guess might be wrong!
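
A rough sketch of that advice, using the phrases above. The vocabularies are tiny hand-made sets standing in for a real statistical model, and `guess_language` / `spellcheck_language` are hypothetical names. The two things it shows are keeping ties instead of silently picking one language, and letting an explicit user choice win over the guess.

```python
from typing import Optional

# Tiny hand-made vocabularies built from the example phrases; a real detector
# would use much larger statistical models. "говорить" is listed for both
# because it is also a valid Russian word (the infinitive).
VOCABULARY = {
    "uk": {"я", "говорю", "правду", "брешу", "збрехав", "олег", "говорить"},
    "ru": {"я", "говорю", "правду", "брешу", "олег", "говорит", "говорить"},
}


def guess_language(text: str) -> list[str]:
    """Return every language tied for the most known words, not just one."""
    words = text.lower().split()
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in VOCABULARY.items()
    }
    best = max(scores.values())
    return sorted(lang for lang, score in scores.items() if score == best)


def spellcheck_language(text: str, user_choice: Optional[str] = None) -> Optional[str]:
    """Prefer the user's explicit choice; return None when the guess is ambiguous."""
    if user_choice is not None:
        return user_choice
    candidates = guess_language(text)
    return candidates[0] if len(candidates) == 1 else None


print(guess_language("Я говорю правду"))
print(guess_language("Я збрехав"))
print(spellcheck_language("Я брешу", user_choice="uk"))
```

Returning `None` for an ambiguous guess pushes the decision to the caller, which can then ask the user or fall back to a configured default instead of assuming the bigger market.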

                                                                    3. 3

                                                                      Country flags are accurate and appropriate symbols for languages

                                                                      I mean this doesn’t stand for the English language let alone any other.

                                                                      1. 9

                                                                        And yet, language selection/display using flags is a common trope.

                                                                        1. 7

                                                                          I suspect there’s two reasons for this:

                                                                          1. The use and understanding of a universal “language” symbol isn’t widespread enough yet. I see 🌐 occasionally as a representation of a globe, but that’s abstract and not universally understood. A much better symbol is the sort I see here, but I don’t believe it’s in Unicode yet, and I’m not sure how well it’s known either.
                                                                          2. A symbol is a must. If you tried a text-only dropdown, you run into a chicken-and-egg problem, where someone who can’t read the language can’t locate the change-language selection to begin with!

                                                                          Flags are clearly suboptimal, but they’re widely recognized, and work well enough when supplemented with text to distinguish between languages sharing a flag.

                                                                          1. 1

                                                                            Yes, many such cases. On DuckDuckGo, search results are very bad if I search without specifying the language. But for specifying the language, it offers only a country selector. There are multiple languages in this country.

                                                                          2. 2

                                                                            I definitely agree that flags are not accurate, but I also think that it’s not always a huge deal. For example, I’m a Canadian who speaks both English and French and I don’t find it too inappropriate when English is represented with a US flag or when French is represented with a French flag. C’est la vie.

                                                                          3. 3

                                                                            Icons that are based on English puns and wordplay are easily understood by speakers of other languages.

                                                                            I’m reminded of when JUCE named their code generation file a “pip” which is apparently very funny in British English, and confused the hell out of the Americans. But maybe if they had an icon of a straw the pun would have translated.

                                                                            1. 6

                                                                              More like American programmers

                                                                              1. 22

                                                                                You’d think so! I did too.

                                                                                I was the tech lead of the internationalization effort for a popular website a number of years back. This was in the US. The site was English-only and we wanted to make it available in a wide variety of languages. We wanted to make it feel as native to each language as we could, rather than feeling like a translation of a foreign site.

                                                                                My team and I came up with a bunch of internal tools and a flexible library we could use to make our code work in multiple languages. When I say “flexible” I mean it went way beyond simple token replacement; it could do things like look up different variants of sentences depending on whether a caller-supplied place name referred to a city or a country and whether that distinction mattered in the target language, could use different pluralization rules for different languages, took gender into account if a sentence mentioned a person whose gender was known, and so on. We had people with linguistics backgrounds making sure we didn’t fall into any obvious traps.
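
For readers who haven't seen what "different pluralization rules for different languages" looks like in code, here is a minimal sketch. It is not the library described above; the function names and message table are hypothetical, and the rules cover whole numbers only (English has two plural categories, Polish three).

```python
def plural_category_en(n: int) -> str:
    """English: singular for exactly 1, plural for everything else."""
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    """Polish (integers only): 1 is singular; numbers ending in 2-4 take the
    "few" form, except those ending in 12-14; everything else takes "many"."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_category_en, "pl": plural_category_pl}

# Hypothetical message catalogue: one translated string per plural category.
MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "pl": {"one": "{n} plik", "few": "{n} pliki", "many": "{n} plików"},
}


def file_count(lang: str, n: int) -> str:
    """Pick the message variant matching the language's plural category for n."""
    category = PLURAL_RULES[lang](n)
    return MESSAGES[lang][category].format(n=n)


for n in (1, 2, 5, 12, 22):
    print(file_count("en", n), "|", file_count("pl", n))
```

String concatenation along the lines of `str(n) + " file" + ("s" if n != 1 else "")` has no place to put the third Polish form, which is the kind of trap the annotated examples were meant to head off.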

                                                                                The code base was far too big for my little team to update on our own, so an early goal was to give the rest of the engineering team all the resources they needed to do a really good job of updating their own corners of the code. In addition to thoroughly documenting our tools and libraries, we wrote up a set of annotated examples of how to change existing English-only code to be translation-friendly, and we made sure it covered all the common patterns in the code base (including visual design things like assuming a button only needed to be exactly big enough to hold an English label) and included examples of what could go wrong in different languages if people decided to just do string concatenation instead.

                                                                                Then we started rolling it out. My expectation going into it was like yours: that the monoglot American devs would struggle to embrace all the techniques because English-specific assumptions would be too deeply ingrained.

                                                                                But once I started doing code reviews of people’s changes, the reality was different. It turned out there was no measurable relationship between how good someone was at making their part of the site translatable into a wide variety of languages and which language(s) they spoke. Americans who’d never spoken anything but English were just as good at it, on average, as trilingual Europeans or people whose native languages were very different from English.

                                                                                The thing that floored me was seeing people from other countries repeatedly make mistakes that would have made it impossible to correctly translate part of the site into their own native languages. This happened a lot, and it happened across multiple native languages. It seemed to me like some people were able to put their brains in “human language is highly variable and the code needs to act accordingly” mode, and some people were stuck in “I am working in English right now, so everything is English” mode, and it barely mattered if they happened to speak some other language or not.

                                                                                Maybe the situation would have been different if the site had been in more than one language from the get-go; I don’t know. But that experience really shattered some of my preconceptions about the advantages of speaking multiple languages. (For the record: I still think it’s worthwhile to be multilingual!)

                                                                                1. 5

                                                                                  The first job I had in Germany, having moved here as a fresh-faced monolingual foreigner, was kind of like this. We were building an internal tool that was used by warehouse workers in various European countries. We knew we had a lot of fairly monolingual users in a variety of different languages, so when we decided to redesign the tool, I pushed really hard for making sure that every part of the UI was fully translated into all the relevant languages. I was amazed by how much my German colleagues pushed back against this, saying it would be a lot of effort, and people could just learn what the different English-language messages meant over time.

                                                                                2. 7

                                                                                  A decade ago, we did a big project targeting 10 languages on release day and 22 within four weeks in a patch release. There were things my coworkers who weren’t native speakers of American English (Romanian, Croatian, Indian [Hindi & Marathi]) learned alongside me, even though I’d been doing localization for a while. Back then, I was a native American English speaker with six school years of Latin, 10+ years of self-taught Esperanto, and a smattering of (Mexican) Spanish and (Canadian) French. I’ve picked up Dutch and some Korean since then, and I’m constantly learning new things about language, having gotten into the linguistics corners of Instagram, Threads, Bluesky, and the fediverse.

                                                                                  Unless you’re a linguist, you’re always learning surprising new things about language, discovering new tools in the toolbox, so to speak. If you’re a linguist, you’re learning which language has or does what, because you’re more familiar with what’s in the toolbox.

                                                                                  1. 1

                                                                                    Cool you’ve learned so many languages. I’m currently trying to learn a new language and struggling a bit so maybe you can help. What are your preferred ways to learn a new language and make it stick?

                                                                                    1. 4

                                                                                      Consistent practice. Try to experience all modes - reading, writing, listening, speaking, and conversing.

                                                                                      Find media you like. You don’t have to be even at 50% comprehension to listen to some audio, but obviously you’re only going to get little bits.

                                                                                      Adverts are much simpler than anything except children’s media.

                                                                                      Experiencing the language is generally the best way to internalize the rules, but reading up on complex rules to help you practice them is also important.

                                                                                      1. 1

                                                                                        Regular practice. Duolingo is fine if all you can put into it is 10-15 minutes per day. That’s better than 0, though 5 minutes isn’t doing much. Most of my focus was on reading and writing until I started learning Dutch in 2023. You have to read a ton and listen a lot. I don’t listen as much as I should, but there are plenty of Dutch teachers and comedians on social media that I’ve come to enjoy.

                                                                                        I think it’s important to remember your purpose. I like learning languages because I like linguistics and language, not because I have an acute need to interact in another language. Honestly, some trips to Belgium and The Netherlands in the last few years have been the most immersive foreign language environment I’ve been in… and any Belgian or Dutch person can tell you that you can get along just fine in most of both of those countries speaking just English. I was able to use nothing but Dutch to get lunch in my great grandmother’s small hometown, though!

                                                                                        1. 2

                                                                                          Thanks, I’ll keep practicing! :)

                                                                                    2. 3

                                                                                      Absolutely not, I’ve seen a couple of these with German software, with English/French just an afterthought. And that’s already two languages where half of the stuff doesn’t even apply because they are both LTR languages in Latin script. I will admit that it’s of course more likely to be a US/UK dev team.

                                                                                    3. 2

                                                                                      Languages that have similar names are similar.

                                                                                      Which ones are meant by this?

                                                                                      1. 12

                                                                                        Romanian and Romani

                                                                                        1. 5

                                                                                          and Romansh! And Romang!

                                                                                          It seems that Wikipedia has an editor in the know on this disambiguation, see the See Also on “Roman Language”.

                                                                                          1. 2

                                                                                            That mix-up sure caused some embarrassment to the producers of Peaky Blinders.

                                                                                            1. 1

                                                                                              On the other hand, the writers of Utopia leaned into it and had a whole plot point where the main characters mix up Romanian and Romani.

                                                                                            2. 1

                                                                                              Both Indo-European though.

                                                                                            3. 4

                                                                                              Maybe Javanese and Japanese?

                                                                                            4. 2

                                                                                              Icons that are based on English puns and wordplay are easily understood by speakers of other languages.

                                                                                              While it’s not exactly a pun, I often see “CC” icons in video players on the web. This isn’t a very good icon since “closed captions” isn’t a universal term.

                                                                                              1. 1

                                                                                                Nice. But it would be better if an example for each case were given.

                                                                                                1. 1

                                                                                                  ¿I was hoping for a discussion of Smalltalk vs C++, but instead …?