I posted this partially because of the new Unicode version, and partially as an answer to people who ask why Zig doesn’t have a built-in Unicode string type.
My argument is that if you want to support Unicode, you have to do so knowingly.
No built-in type can exempt you from that.
My argument is that if you want to support Unicode, you have to do so knowingly.
No built-in type can exempt you from that.
From this comment, it sounds like Swift successfully exempts developers from thinking about Unicode – if they work on non-performance-sensitive programs. Swift’s abstraction over Unicode strings could lead to unexpectedly slow operations on certain strings, so I understand why Zig wouldn’t want that.
To avoid the impression that Zig doesn’t support Unicode at all, I’ll note that though the Zig language doesn’t have a Unicode type, the Zig standard library has a std.unicode struct with functions that perform Unicode operations on arrays of bytes.
Do you know if there are any plans to update std.unicode given the issues raised by the author of Ziglyph in this comment – that graphemes would be a better base unit than codepoints? I only just started trying to write Unicode-aware code in Zig, but after reading about the available libraries, I wish for Zigstr or something like it to replace std.unicode in the standard library. Otherwise, I worry about developers finding std.unicode in the standard library, using it to read strings a codepoint at a time, and thinking they’ve handled everything they need to.
The comments I linked to were left on an issue that was closed because no Zig language changes were needed. Would it be well-received if I opened a new issue about updating the Zig standard library as I described above?
I think that followup comment isn’t quite correct. Swifts strings cannot be indexed using a plain numeric index as in other languages. Instead, they are indexed using the String.Index type which must be constructed using the String instance in question, advanced and manipulated using String index methods. All this ceremony makes it rather obvious that it’s not an O(1) operation.
I added a comment to one of the threads you linked just now, and I will reproduce it here:
@jecolon thank you for your comments. Before tagging 1.0, I will be personally auditing std.unicode (and the rest of std) while inspecting ziglyph carefully for inspiration. If you’re available during that release cycle I would love to get you involved and work with you an achieving a reasonable std lib API.
In fact, if you wanted to make some sweeping, breaking changes to std.unicode right now, upstream, I would be amenable to that. The only limitation is that we won’t have access to the Unicode data for the std lib. If you want to make a case that we should add that as a dependency of zig std lib, I’m willing to hear that out, but for status quo, that is a limitation because of not wanting to take on that dependency.
In summary, std.unicode as it exists today is mainly used to serve other APIs such as the file system on Windows. It is one of the APIs that I think is far from its final form when 1.0 is tagged, and someone who has put in the work to make ziglyph is welcome to go in and make some breaking changes in the meantime.
I posted this partially because of the new Unicode version, and partially as an answer to people who ask why Zig doesn’t have a built-in Unicode string type.
My argument is that if you want to support Unicode, you have to do so knowingly.
No built-in type can exempt you from that.
From this comment, it sounds like Swift successfully exempts developers from thinking about Unicode – if they work on non-performance-sensitive programs. Swift’s abstraction over Unicode strings could lead to unexpectedly slow operations on certain strings, so I understand why Zig wouldn’t want that.
To avoid the impression that Zig doesn’t support Unicode at all, I’ll note that though the Zig language doesn’t have a Unicode type, the Zig standard library has a
std.unicode
struct with functions that perform Unicode operations on arrays of bytes.Do you know if there are any plans to update
std.unicode
given the issues raised by the author of Ziglyph in this comment – that graphemes would be a better base unit than codepoints? I only just started trying to write Unicode-aware code in Zig, but after reading about the available libraries, I wish for Zigstr or something like it to replacestd.unicode
in the standard library. Otherwise, I worry about developers findingstd.unicode
in the standard library, using it to read strings a codepoint at a time, and thinking they’ve handled everything they need to.The comments I linked to were left on an issue that was closed because no Zig language changes were needed. Would it be well-received if I opened a new issue about updating the Zig standard library as I described above?
I think that followup comment isn’t quite correct. Swifts strings cannot be indexed using a plain numeric index as in other languages. Instead, they are indexed using the String.Index type which must be constructed using the String instance in question, advanced and manipulated using String index methods. All this ceremony makes it rather obvious that it’s not an O(1) operation.
I added a comment to one of the threads you linked just now, and I will reproduce it here:
In summary, std.unicode as it exists today is mainly used to serve other APIs such as the file system on Windows. It is one of the APIs that I think is far from its final form when 1.0 is tagged, and someone who has put in the work to make ziglyph is welcome to go in and make some breaking changes in the meantime.