1. 8

  2. 32

    It’s not possible to define a function that converts strings into the correct capitalization for names, because the rules for capitalization vary between cultures, locations, and people.

    You’d need to satisfy the following test cases:

    • (from the article) Names such as JJ and KJ should be entirely capitalized.
    • Names such as Ai, Cy, Jo, and Ng should have only their first letter capitalized.
    • The token “De” should be capitalized in names from some countries/regions, and not capitalized for others.
    • Some names may have mixed capitalization, such as “McDonnell”.
    • People may have names that don’t follow any standard capitalization rules, either because they changed them as adults or as specified on their birth certificate.
      • For example, I have a relative whose last name is entirely lower-case letters.
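    As a rough illustration of the cases above, this is what Python’s built-in str.capitalize() and str.title() return for a few of those names. The sample list is just the names mentioned in this comment, and the variable names are made up for the example.

```python
# Names from the test cases above, written the way their owners would write them.
names = ["JJ", "KJ", "Ai", "Ng", "McDonnell"]

for name in names:
    # capitalize() upper-cases the first character and lower-cases the rest;
    # title() does the same for every word in the string.
    print(f"{name!r}: capitalize() -> {name.capitalize()!r}, title() -> {name.title()!r}")

# Names that do not survive a round trip through capitalize().
broken = [n for n in names if n.capitalize() != n]
print(broken)
```

    Neither built-in keeps “JJ”, “KJ”, or “McDonnell” intact, and no rule based on the string alone can tell “JJ” apart from “Jo”.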

    The best way to handle names is to let people type them in themselves, and store them as-is with minimal processing. If you need to distinguish family and given names, have separate fields for that.

    • But be prepared to handle the case of people with only a single-word name.
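    A minimal sketch of that storage approach, assuming a hypothetical PersonName record and from_form helper (neither comes from any library): trim surrounding whitespace, leave the case alone, and make the family name optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record type; the field names are illustrative."""
    given: str                    # stored as typed, apart from outer whitespace
    family: Optional[str] = None  # None for people with a single-word name


def from_form(given: str, family: str = "") -> PersonName:
    # Trim surrounding whitespace only; never change the case.
    family = family.strip()
    return PersonName(given=given.strip(), family=family or None)


print(from_form(" McKayla ", "de la Cruz"))
print(from_form("Sukarno"))
```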
    1. 25

      There’s nothing wrong or odd here. What you are expecting capitalize() to be is the function that would be properly described as this:

      Returns a copy of the string capitalized as if it were a name.

      But this isn’t what capitalize() does. What capitalize() does is described in the Python 3.11 documentation:

      Return a copy of the string with its first character capitalized and the rest lowercased.

      This is exactly what it does in your examples. This is not a “problem” and there is nothing Python “doesn’t understand”. You have confused these two function descriptions because, in the majority of cases, they have the same effect: the proper name capitalization is usually the same thing as the string with a capital first letter and the rest lowercase. But they aren’t the same thing, semantically, and you’re mistaken in thinking that a function that does one should do the other.

      If your application needs to treat two-letter names like this, just do it yourself:

      def name_capitalize(s):
          # Two-letter strings are upper-cased entirely; everything else gets str.capitalize().
          return s.capitalize() if len(s) != 2 else s.upper()
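      A quick check of what this helper returns for a few short inputs (the sample inputs are only illustrative). Note that it upper-cases every two-letter string, whether or not the name is a pair of initials.

```python
def name_capitalize(s):
    return s.capitalize() if len(s) != 2 else s.upper()


for raw in ["jj", "kj", "alice", "ed", "jo", "ng"]:
    print(raw, "->", name_capitalize(raw))
```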
      1. 1

        Oh wow, that’s a straightforward fix for this. I’ll be updating my post with this. :)

        1. 10

          as a person named Ed I’d really like you to consider not doing this

          1. 7

            Stop complaining ED

          2. 7

            I know people named Jo who would very much dislike being called JO though, so be careful about using it for your proposed purpose.

        2. 12

          Other posters have covered some relevant points, but I always love an opportunity to link a classic: patio11’s “Falsehoods Programmers Believe About Names”

          1. 6

            As the other replies point out, this is what Python documents the method as doing, and there is no generic “do the right thing” method possible that accounts for all the possible “right things”.

            Which comes down to the fact that case and case transformations are actually pretty complex. Unicode has multiple ways to define case and casedness! Unicode alone cannot provide universal comprehensive case mappings! It’s not an easy thing at all, and any “fix” for the situation you’re complaining about will break others.
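            Two small examples of that complexity, using only built-in str methods. The variable names are made up for the illustration.

```python
# One character can upper-case to two: German sharp s.
sharp_s = "ß"
upper_sharp_s = sharp_s.upper()
round_trip = upper_sharp_s.lower()  # does not get the original character back

# Some characters have three cases: lower, title, and upper.
dz_lower = "\u01c6"          # ǆ
dz_upper = dz_lower.upper()  # Ǆ (U+01C4)
dz_title = dz_lower.title()  # ǅ (U+01C5)

print(upper_sharp_s, round_trip, dz_lower, dz_title, dz_upper)
```

            So “upper-case the first character” and “title-case the first character” are already different operations, and case conversion is not always reversible.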

            1. 5

              I think the real solution is to stop using things like this on people’s names. My name is McKayla, with a capitalized K, but so many systems insist that the k is lowercase, especially banks for some reason. I always type it with the correct capitalization, but they don’t listen.

              I don’t think this means Python should change the capitalization. It wouldn’t make sense to change the behavior to suit my needs, especially when it would currently make my entire name lowercase if it were in the middle of a sentence. I just think this function is so useless and broken as to be a blight on the language. I wish they’d just remove it, so that people couldn’t so easily wave it off as “doing the right thing, it’s built in!”, and so that developers would have to think about what they’re really after, which is probably just capitalizing the first letter and leaving the rest alone.
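              A sketch of that last idea, with a hypothetical helper name (upper_first is not a built-in): upper-case the first character and keep everything after it exactly as typed.

```python
def upper_first(s: str) -> str:
    """Hypothetical helper: upper-case the first character, keep the rest as typed."""
    # Slicing with s[:1] keeps this safe for the empty string.
    return s[:1].upper() + s[1:]


print(upper_first("mcKayla"))                       # interior capital is preserved
print("McKayla".capitalize())                       # built-in lower-cases the K
print("a note about McKayla".capitalize())          # built-in lower-cases the whole name
```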

              1. 2

                Maybe I’m overly pessimistic about this, but I don’t think people are going to start doing the right thing just because a function is removed that does the wrong thing. Yeah, maybe str.capitalize is too visible and that causes people to do things wrong who wouldn’t otherwise. But your bank was probably doing this wrong long before Python was even thought of. And more generally by the time you start looking for a function to capitalize a person’s name it’s likely that you have already locked in some bad design decisions that led you to need it.

                Other commenters have already dropped some annoying cases, and I’ll contribute one more: you need to know if the string is in Turkish, which capitalizes i to İ, or Dutch, which capitalizes ij to IJ. ISTM there’s no reasonable way to keep track of all the context you might possibly need in order to capitalize some text correctly, especially if that text contains a person’s name. But people are going to keep doing it anyway and I think only social pressure will stop them.
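                To make the context problem concrete, here is a rough sketch that covers only those two languages. The function name and the lang parameter (assumed to be a language code supplied by the caller) are hypothetical, and real text would need far more rules than this.

```python
def capitalize_for_language(word: str, lang: str) -> str:
    """Hypothetical helper; `lang` is an assumed language code such as "tr" or "nl"."""
    if lang == "tr":
        # Turkish pairs dotted i with İ and dotless ı with I.
        first = {"i": "İ", "ı": "I"}.get(word[:1], word[:1].upper())
        return first + word[1:]
    if lang == "nl" and word[:2] == "ij":
        # Dutch capitalizes the digraph ij as a unit.
        return "IJ" + word[2:]
    # Fallback: upper-case the first character, leave the rest alone.
    return word[:1].upper() + word[1:]


print(capitalize_for_language("istanbul", "tr"))
print(capitalize_for_language("ijsselmeer", "nl"))
print("istanbul".capitalize(), "ijsselmeer".capitalize())  # built-in ignores language
```

                The built-in methods have no language parameter, so the caller has to carry that context, and for a person’s name it often isn’t known at all.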

                1. 2

                  Honestly, I’m surprised that the bank in question isn’t using EBCDIC for names.

              2. 4

                I might get convinced otherwise, but I suggested the satire tag :) Isn’t it obvious that this function applies to strings (like all string functions do), not names of people?

                1. 5

                  I don’t think this is a satire tag. The person is genuine, even if they’re wrong.

                2. 4

                  Okay, we all know that every attempt to change how people spell out their names will end up like this, right?


                  1. 2

                    That one’s a clbuttic.

                    Related: the Scunthorpe problem
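                    For anyone who hasn’t met it, a tiny sketch of how that kind of output comes about. Both function names are invented for the example; the second one only avoids this particular failure and is not a general fix.

```python
import re


def naive_filter(text: str) -> str:
    # Replaces the substring anywhere, even inside unrelated words.
    return text.replace("ass", "butt")


def word_filter(text: str) -> str:
    # Replaces whole words only, so "classic" is left alone.
    return re.sub(r"\bass\b", "butt", text)


print(naive_filter("classic"))
print(word_filter("classic"))
```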

                    1. 1

                      Hehehe took me a while

                  2. 4

                    Is this an issue with Python, or an issue with your parents having strayed from the style guide?

                    JJ, meet my wife Jo, and understand why you don’t both get a function that cases your name correctly 100% of the time.

                    1. 1

                      This is harder, but IMHO the better way is to just lowercase everything and stop using uppercase.