You can achieve something similar using -fuzz in ImageMagick. convert -delay 10 input*.png -fuzz 5% +map -layers Optimize output.gif will produce a 10 fps GIF where a pixel is only updated between frames if its color changes by more than 5% (the fuzz factor), with +map building a single globally optimized palette for all frames.
That’s a great trick. Does it work with other animated formats, especially ones that can handle more than 256 colors? I wish there were a codec optimized for screencasts (meaning mostly static content such as coding sessions, not game footage), or a codec that let you declare regions. Then you could have a mask for the face of the speaker in a tiny box and encode the rest with a different algorithm.
You’d be better off sending the screen and the face as two separate video streams and letting the client render them together; writing an encoder that tries to do that kind of segmentation automagically would be a huge pain. I believe H.264 has room in the standard for something like that, however. Make sure you use the YUV 4:4:4 colorspace, not 4:2:0 or 4:2:2, since those subsample the chroma channels and turn colored text into a blurry mess.
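If you're using ffmpeg with x264, the usual incantation looks something like this (a sketch; the filenames are placeholders, tune the quality to taste):

  # keep chroma at full resolution (needs the High 4:4:4 Predictive profile)
  ffmpeg -i screencast.mkv -c:v libx264 -pix_fmt yuv444p -crf 18 out.mkv

Just be aware that not every decoder supports 4:4:4 H.264, so check your playback targets first.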
As for better screen codecs, there are tons to choose from. ZMBV, DOSBox's capture codec, is one of the more interesting ones; I’ve worked on FFmpeg’s encoder for it. You also have VNC’s set of codecs, which are quite simple. APNG might also be an option.
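If you want to play with those, FFmpeg can encode both; roughly (a sketch, file names are placeholders and the exact pixel-format support depends on your build):

  # ZMBV in an AVI container, like DOSBox produces
  ffmpeg -i screencast.mkv -c:v zmbv capture.avi
  # looping APNG (-plays 0 means loop forever)
  ffmpeg -i screencast.mkv -plays 0 capture.apng

ZMBV is essentially zlib-compressed block motion compensation, which suits mostly-static desktop content well.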
Thanks a lot for the explanation of the YUV color spaces; now I understand why some MKVs of mine didn’t look nice with text.
Assuming you’re doing the face as a picture-in-picture for streaming, try aligning the face to the encoder’s block grid (e.g. make it 128x128 and place it exactly in the bottom-left corner).
h.265 divides the image into coding tree blocks of up to 64x64 pixels. If you align the sub-stream to that grid boundary, you should get fewer encoding artifacts.
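If you're compositing the two streams yourself, an ffmpeg filter graph for that could look roughly like this (a sketch; the filenames and the 128x128 size are placeholders):

  # scale the face to 128x128 and pin it to the bottom-left corner
  ffmpeg -i screen.mkv -i face.mkv \
    -filter_complex "[1:v]scale=128:128[pip];[0:v][pip]overlay=x=0:y=main_h-overlay_h" \
    -c:v libx265 -crf 22 out.mkv

Since 128 is a multiple of 64, the inset starts on a block boundary, provided the capture height is itself a multiple of 64.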