From following StackOverflow’s Go tag for a while, I can say beginners do get confused about this. I think it’s not a bad rule of thumb to leave bytes as bytes until it becomes inconvenient or (due to mutability) confusing/bug-prone, rather than converting because a string seems inherently right for some kind of content. Some context:
In some languages, like Rust or Python 3, the string type is for Unicode text and bytes is a different thing meant for binary content. This isn’t enforced in Go: you can convert []byte{0xFF}, which isn’t valid UTF-8, to a string. Guidance from the designers also doesn’t seem to discourage using whatever type for whatever content: “A string is, in effect, a read-only slice of bytes”.
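To make that concrete, here is a minimal sketch of converting invalid UTF-8 to a string. The helper name `describe` is made up for illustration; `utf8.ValidString` and the `range` behaviour are standard Go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) reports the byte length of s and
// whether its contents are valid UTF-8.
func describe(s string) (n int, valid bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	raw := []byte{0xFF} // not valid UTF-8
	s := string(raw)    // the conversion is allowed; the byte is kept as-is

	n, valid := describe(s)
	fmt.Println(n, valid) // 1 false

	// Ranging over a string decodes UTF-8; an invalid byte yields U+FFFD.
	for i, r := range s {
		fmt.Printf("%d: %U\n", i, r)
	}
}
```

So the string type carries the bytes untouched, and validity only comes up when something actually decodes them.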
Go strings are immutable. That has some consequences:
Converting mutable byte slices to and from immutable strings must copy the content. (Or you’d have two pointers to the same place, one typed as immutable and the other as mutable!) So read bytes -> convert to string -> convert to bytes for output makes 3x the garbage of read -> possibly tweak in-place -> output.
Most operations that modify strings (most everything but substring) need to copy the content, where on a byte slice they might be able to modify in-place. This can lead to O(n^2) perf for repeated concatenations (for that, strings.Builder can help).
The immutability can be handy for preventing some kinds of errors of course: you tend to want to reuse a read buffer for the next read, and retained slices of the original buffer are going to point to new content then. (Sometimes, “I need to make a copy now anyway, might as well make it a string” might apply.) Immutability is also necessary for some things like hashtable keys.
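A rough sketch of those consequences, with made-up helper names (`firstWordAlias`, `firstWordCopy`, `joinAll`): a subslice of a reused buffer sees the new content, a string conversion keeps its own copy, and `strings.Builder` avoids the repeated copying of `s += p` in a loop.

```go
package main

import (
	"fmt"
	"strings"
)

// firstWordAlias returns a subslice of buf; it shares buf's memory.
func firstWordAlias(buf []byte, n int) []byte { return buf[:n] }

// firstWordCopy converts to a string, which copies the bytes.
func firstWordCopy(buf []byte, n int) string { return string(buf[:n]) }

// joinAll concatenates parts with a strings.Builder instead of
// building a new string on every += in the loop.
func joinAll(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	buf := []byte("hello world")
	alias := firstWordAlias(buf, 5)
	copied := firstWordCopy(buf, 5)

	// Simulate reusing the read buffer for the next read.
	copy(buf, "HOWDY")

	fmt.Printf("%s %s\n", alias, copied)          // HOWDY hello
	fmt.Println(joinAll([]string{"a", "b", "c"})) // abc
}
```

The string version is the "I need to make a copy now anyway" case: it costs an allocation, but nothing can change it afterwards.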
Most facilities for strings are available for byte slices too, e.g. there’s a bytes package to match strings, the regexp package can accept either type, unicode/utf8 decodes bytes similar to how range decodes strings.
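For instance, something like the following stays in `[]byte` the whole way. The helper names `firstNumber` and `countRunes` are invented for this sketch; the `bytes`, `regexp`, and `unicode/utf8` calls are standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"unicode/utf8"
)

var digits = regexp.MustCompile(`[0-9]+`)

// firstNumber returns the first run of ASCII digits in b, or nil if none.
func firstNumber(b []byte) []byte { return digits.Find(b) }

// countRunes decodes b as UTF-8 without converting it to a string.
func countRunes(b []byte) int {
	n := 0
	for len(b) > 0 {
		_, size := utf8.DecodeRune(b) // size is 1 for an invalid byte
		b = b[size:]
		n++
	}
	return n
}

func main() {
	line := []byte("  id=42 name=gopher  ")
	line = bytes.TrimSpace(line)                      // counterpart of strings.TrimSpace
	fmt.Println(bytes.HasPrefix(line, []byte("id="))) // true
	fmt.Printf("%s\n", firstNumber(line))             // 42
	fmt.Println(countRunes([]byte("Zo\u00eb")))       // 3 (the slice is 4 bytes long)
}
```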
Slightly different but related idea: stream when you can. If you don’t do anything tricky to save allocs, building up a big HTTP response in a string/[]byte likely makes more garbage than outputting little bits at a time. So Go’s html/template, for example, takes a Writer instead of returning the whole page as a string or byte slice.
Also if you start with code to do little writes then someday want a big string/[]byte, no problem: pass in bytes.Buffer as the writer. Harder to go the other direction.
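A small sketch of that pattern, assuming a toy template and made-up function names (`render`, `renderString`): the core function writes to an `io.Writer`, and collecting a whole string is just a thin wrapper around a `bytes.Buffer`.

```go
package main

import (
	"bytes"
	"html/template"
	"io"
	"os"
)

// page is a toy template used only for this illustration.
var page = template.Must(template.New("page").Parse("<p>Hello, {{.}}!</p>\n"))

// render streams the page to any io.Writer.
func render(w io.Writer, name string) error {
	return page.Execute(w, name)
}

// renderString collects the same output in memory for callers that
// really do need the whole page as a string.
func renderString(name string) (string, error) {
	var buf bytes.Buffer
	if err := render(&buf, name); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Streaming: in a server, the writer would typically be the http.ResponseWriter.
	if err := render(os.Stdout, "gopher"); err != nil {
		panic(err)
	}

	s, err := renderString("<b>") // html/template escapes the value
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(s)
}
```

Going the other way (starting from a function that returns a string and later wanting it to stream) usually means rewriting the function.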
The benefits of streaming can also apply in other places in your pipeline, not just request/response/file I/O, e.g. you can read a row from your DB, do something with it, read another row… instead of building up a big table structure in memory. If you’re lucky, you might lazily write an app that can deal with heavy requests without exploding, or at least can keep a lot of simultaneous connections in flight gracefully.
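As a stand-in for the row-at-a-time idea (no real database here; lines from an `io.Reader` play the role of rows, and `sumLengths` / `sumLengthsOf` are hypothetical names), a sketch might look like:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// sumLengths handles one line at a time and keeps only running totals,
// so memory use does not grow with the size of the input.
func sumLengths(r io.Reader) (lines, total int, err error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		lines++
		total += len(sc.Bytes()) // Bytes avoids allocating a string per line
	}
	return lines, total, sc.Err()
}

// sumLengthsOf is a convenience wrapper for in-memory input.
func sumLengthsOf(s string) (lines, total int, err error) {
	return sumLengths(strings.NewReader(s))
}

func main() {
	lines, total, err := sumLengthsOf("alpha\nbeta\ngamma\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(lines, total) // 3 14
}
```

With `database/sql` the shape is similar: loop over `rows.Next()`, `rows.Scan(...)` into a few variables, handle the row, move on.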
Totally correct. I think I even mention it somewhere in the post. The thing is, I wanted to make the post applicable to a broader audience coming to Go for the first time. Since a byte slice and a byte array are pretty much interchangeable concepts in this context, I decided to stick to using byte array everywhere. It’s just what most programmers would know as a term.
Do byte arrays allocate in-place though? That would have a bearing on cache locality for performance or unsafe access. When glossing over details I like to at least know that it’s being done to follow up on later.
I like this. Using bytes everywhere can be better for performance. In, e.g., Spring / Java you may not always be able to get to the byte array without going through a String (which can be wasteful).
This article is nearly devoid of any actual content.
Like most other content these days. Three statements, wrapped with tons of fluff. It’s hard to conform to the rules of the SEO machine otherwise.
Thanks for the feedback anyway ;)
It’s about “byte slices” ([]byte), not “byte arrays” ([42]byte). But to be fair, the distinction doesn’t matter much for the gist of the post.
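For anyone new to Go wondering what the difference looks like in practice, a minimal sketch (function names are made up): an array is a value whose length is part of its type and which is copied when passed, while a slice refers to a backing array that the callee can modify.

```go
package main

import "fmt"

// setFirstArray receives a copy of the array; the caller's array is unchanged.
func setFirstArray(a [4]byte) { a[0] = 'X' }

// setFirstSlice receives a slice that points at the caller's bytes.
func setFirstSlice(s []byte) { s[0] = 'X' }

// demo returns the array and slice contents after both calls.
func demo() (string, string) {
	arr := [4]byte{'a', 'b', 'c', 'd'}
	sl := []byte("abcd")
	setFirstArray(arr)
	setFirstSlice(sl)
	return string(arr[:]), string(sl)
}

func main() {
	a, s := demo()
	fmt.Println(a, s) // abcd Xbcd
}
```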