Prism uses a language-<x> class for <code> tags, which I think is reasonable, though a data-attribute also makes sense to me. I am curious how many places you’d find code snippets without at least some surrounding context on which language it is - specifically thinking about screen readers. A lot of blog posts do end up having un-introduced-language snippets from the author’s favourite language. But I also think in those cases, the language itself doesn’t matter - it’s just there to demonstrate a point. And in those cases, a reader without a screen reader is also left guessing the language. So I suppose that doesn’t matter so much.

This is also the suggestion made in the HTML spec itself.
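As a concrete sketch of that convention, with C standing in for the snippet’s language:

```html
<!-- The language lives in a language-x class on the <code> element;
     Prism reads this class to pick a highlighting grammar. -->
<pre><code class="language-c">printf("hello, world\n");</code></pre>
```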
I also use a class of language-<X> for my blog. I’ll also put the language (like “C” or “Lua”) in the title attribute, as I’ve found that most GUI-based browsers will show it as a tooltip when you hover over the element.

The lang tag is the most uselessly dumb thing in all of HTML.
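For illustration, that approach might look like this (a sketch - assuming the title attribute sits on the <code> element itself):

```html
<!-- The class drives the highlighter; the title gives GUI browsers
     a hover tooltip naming the language. -->
<pre><code class="language-lua" title="Lua">print("hello")</code></pre>
```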
There isn’t a <lang> tag, but lang is a global attribute available on all HTML elements. Is that what you meant?

And what’s wrong with it? It allows you to specify that a portion of the document (or the whole document) is in a different language. That’s very useful information. I don’t see any downsides to it, and the information it encodes is meaningful.
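A minimal sketch of that use: lang on the root element declares the document’s language, and a nested lang overrides it for just one span.

```html
<html lang="en">
  <body>
    <!-- The page is English, but this one word is marked as French. -->
    <p>She greeted me with <span lang="fr">bonjour</span>.</p>
  </body>
</html>
```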
It’s also actually used - for example, the user agent uses it to pick hyphenation breakpoints, etc.
To whom is that information meaningful? It’s trivial to determine computationally, and no humans care.
It’s not only not trivial, but sometimes not possible to determine computationally.
It’s important for documents that contain multiple languages, because without lang= there’s no way to determine which font and/or text-to-speech voice to use.

Fonts are important in East Asian contexts (e.g. Chinese and Japanese use different glyphs for the same Unicode code points), and text-to-speech for pretty much anything (ever hear a French TTS try to pronounce English? It’s hilarious, but also incomprehensible).
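The Han-unification point can be sketched like this: both paragraphs below contain the same code point, U+76F4, yet a user agent with Chinese and Japanese fonts installed may render them with visibly different glyph shapes.

```html
<!-- Same Unicode code point (U+76F4), different expected glyph shapes;
     lang tells the user agent which typographic tradition's font to use. -->
<p lang="zh-Hans">直</p>
<p lang="ja">直</p>
```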
Why do we have to figure out the text-to-speech voice? There are two options here:

The user speaks the language. In that case, the text-to-speech software can easily determine which of the ~4 languages the user speaks the text is in, because it’s trivial to determine what language text is in. Alternatively, the text isn’t in any language (it’s just a person’s name, or it’s ambiguous), in which case a sighted reader can’t determine the language either, and it likely doesn’t matter.

The user doesn’t speak the language. In that case, it doesn’t matter which voice reads it, because it is meaningless text, just as it is for a sighted reader.
It’s difficult to imagine someone being so stubbornly obtuse by accident, so I’m starting to agree with other commenters that you’re not participating in good faith.
Because it’s trivial to determine what language text is in,

Does your hypothetical language detection consider 「北京」 and 「東京」 to be the same language? If so, which?
alternatively, it isn’t in a language (it’s just a person’s name, or is ambiguous, in which case a sighted reader also can’t determine the language, and it also likely doesn’t matter)

You may be surprised at how many people care about having their name pronounced correctly.
The person doesn’t speak the language. In that case, it doesn’t matter what voice reads it, because it is meaningless text,

I can’t read (or speak) Polish, so when I visited Warsaw recently it was useful to have my phone be able to pronounce Polish words embedded in an English-language webpage (e.g. Wikipedia).
The questions you’re asking can be avoided with a quick Google search about the lang attribute. Because your opinions are so forcefully asserted yet so easily disputed, I think your comments come across as rude and ignorant (not only of technology but of other languages and cultures as well), and IMO reflect badly on this community.

Which language are these sentences in? “My hand is in warm water. My pen is in my hand.” If you said English then you’re wrong: they’re in Afrikaans.
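For illustration, the only way to make those sentences unambiguous to software is to say so in the markup:

```html
<!-- Valid text in both English and Afrikaans; lang= is what lets a
     screen reader pick the Afrikaans voice. -->
<p lang="af">My hand is in warm water. My pen is in my hand.</p>
```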
Right, but so what? I don’t know what language that is in. Why does my screen reader have to? Why does a screen-reader user need more information than a non-screen-reader user? If the rest of the page is in Afrikaans, the screen reader can use the Afrikaans voice to read that line. If the rest of the page is in English, it can use the English voice. If the page contains no other text, then the text is in BOTH languages, and it can use whichever the user prefers, because the text means the same thing in either, and the user likely prefers one language over the other. There’s no need to disambiguate here, as evidenced by the fact that in a book this is not disambiguated, and sighted readers have never, ever had a problem reading it.
It’s important for screen readers, which are often used by people with vision deficiencies. (Braille terminals are expensive, and some people are not visually impaired enough to have learned Braille.)
The screen reader can use the lang= attribute to determine which pronunciation and which accent to use when transforming the text into sounds. “Schadenfreude” has a totally different pronunciation in German and in English, where it was borrowed, and most likely native speakers of either language would be unable to understand the other pronunciation. So you cannot imagine how difficult it would be for a German person to understand if the text-to-speech software started to read German text as if it were English.
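A sketch of how a page can mark the loanword so a screen reader switches pronunciation for just that word:

```html
<!-- The sentence is English, but the borrowed word is tagged as German,
     so a screen reader can use German pronunciation for it alone. -->
<p lang="en">He felt a pang of <i lang="de">Schadenfreude</i>.</p>
```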