1. 60
  1.  

  2. 33

    While AVIF has decent compression due to using AV1, its container format (HEIF) is unfortunately a bloated designed-by-committee mess. We looked at implementing it in libavformat in FFmpeg during GSoC 2019, and the conclusion in short was “no”. I might write a blog post on its many failings if there’s interest, but the gist is this: it is not just a still image format. It also supports albums, compositing and a whole slew of other junk. This makes implementing a decoder for it in a project like FFmpeg a monumental task.

    In my opinion it would be vastly superior to just define a FourCC for AV1 and stick it in a BMP file. BMP parsers are common, and the format already supports compression. There’s no need to come up with anything new. A similar argument can be made for audio formats, which can just be stuck inside WAV files with an appropriate TwoCC.
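    A minimal sketch of that idea using Python's `struct` module: a 14-byte BITMAPFILEHEADER plus a 40-byte BITMAPINFOHEADER whose `biCompression` field holds a FourCC, with the encoded bitstream as the pixel data. The `AV01` FourCC and the helper names are assumptions for illustration; this is a hypothetical layout, not something existing BMP readers will decode.

    ```python
    import struct


    def fourcc(code: str) -> int:
        # A FourCC is four ASCII bytes read as a little-endian 32-bit integer.
        return struct.unpack("<I", code.encode("ascii"))[0]


    def wrap_in_bmp(payload: bytes, width: int, height: int, codec: str = "AV01") -> bytes:
        """Wrap an already-encoded bitstream in BMP headers (hypothetical layout)."""
        info = struct.pack(
            "<IiiHHIIiiII",
            40,             # biSize: BITMAPINFOHEADER is 40 bytes
            width,          # biWidth
            height,         # biHeight
            1,              # biPlanes: always 1
            0,              # biBitCount: 0, as with BI_JPEG/BI_PNG, since the codec defines it
            fourcc(codec),  # biCompression: tells the reader which decoder to use
            len(payload),   # biSizeImage: size of the compressed data
            0, 0,           # biXPelsPerMeter, biYPelsPerMeter: unspecified
            0, 0,           # biClrUsed, biClrImportant: no palette
        )
        offset = 14 + len(info)  # the bitstream starts right after both headers
        file_header = struct.pack("<2sIHHI", b"BM", offset + len(payload), 0, 0, offset)
        return file_header + info + payload
    ```

    The same shape would work for audio: the `fmt ` chunk of a WAV file carries a 16-bit format tag (the TwoCC) that names the codec.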

    1. 14

      I’d love a blog post that explains it in great detail (the format, the existing FFmpeg software architecture, the assumptions, the goals, the conflicts). I’d also like to hear about the non-technical side of this; I think there’s lots of value to be derived from talking about projects that didn’t succeed.

      1. 12

        Out of curiosity, is any container format NOT a mess? I’ve heard people complain about Ogg, MP4 and a few others, but nobody seems to dish out much praise anywhere. Container formats in general seem to be a somewhat obscure topic; nobody seems to say much about their tradeoffs or what makes a good vs. bad design.

        1. 4

          Well, BMP and WAV are fairly simple and find wide use. They have their quirks though, like uncompressed BMPs being stored upside-down and each row having to be padded to a multiple of four bytes. WAV only supports constant bitrate. AVI worked well enough before B-frames started being used. Ogg is an absolute joke. ISOBMFF (derived from MOV) has tons of derivatives including MP4, 3GP and HEIF. It suffers from requiring a complete header to decode. Fragmented MP4 fixes that, but of course that’s only MP4. MXF is widely used in the broadcast world and is a huge mess, both design-wise and for being split over oodles of SMPTE specs. It also happens to be the format I maintain in libavformat.
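          For reference, a small Python sketch of the BMP row layout (the helper names are made up for illustration): each row of pixel data is zero-padded to a multiple of 4 bytes, and a positive height means the rows are stored bottom-up.

          ```python
          def bmp_row_stride(width: int, bits_per_pixel: int) -> int:
              # Every scan line is padded up to a multiple of 4 bytes (32 bits).
              return ((width * bits_per_pixel + 31) // 32) * 4


          def row_offset(y: int, height: int, stride: int) -> int:
              # With a positive biHeight the rows are stored bottom-up,
              # so the top image row (y == 0) is the last one in the pixel data.
              return (height - 1 - y) * stride
          ```

          For example, a 3-pixel-wide 24-bit image has 9 bytes of pixels per row but a stride of 12 bytes.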

          1. 3

            is any container format NOT a mess

            This is a very good observation.

            Container formats that implement a ‘database-in-a-file’ with a bunch of tables, ‘foreign-key conventions’, and so on are really, really difficult to use (and I cannot even imagine what it is like for implementers, or for folks who write conversion utilities).

            I do not know what a proper solution/architecture approach for these is, though. It seems that this model is needed.

            PEM ( https://serverfault.com/questions/9708/what-is-a-pem-file-and-how-does-it-differ-from-other-openssl-generated-key-file )

            PDF

            HDF5

            come to mind.

            1. 3

              The ISO container (MPEG-4 part 14, QuickTime, &c.) is at least a sensible model for a time-synced multi-stream container. It has a lot of cruft in it, though.

              1. 3

                Bink

                From what I’ve heard, a significant portion of its value is that you don’t have to deal with any open formats/libraries, all of which are garbage.

              2. 6

                You sure make it sound like an overengineered piece of shit, and if it is, then your blog post (please write it!) would help expose it and limit the damage it can do.

                1. 6

                  HEIF, and thus AVIF, is an unfortunate pile of ISO specs. Each spec in itself isn’t unreasonable, but the sum of them adds up to ridiculous bloat. You need 300 bytes of MPEG metadata to convey the single bit of information that the AVIF has an alpha channel.
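                  To give a feel for where the bytes go, here is a Python sketch of ISOBMFF box serialization (the helper names are hypothetical). In AVIF, alpha is stored as a separate auxiliary image item tagged with an `auxC` property whose payload is the URN `urn:mpeg:mpegB:cicp:systems:auxiliary:alpha`. That property alone comes to 56 bytes; the alpha item also needs an `auxl` item reference and its own entries in the item info, location and property-association boxes, none of which this sketch builds.

                  ```python
                  import struct

                  ALPHA_URN = "urn:mpeg:mpegB:cicp:systems:auxiliary:alpha"


                  def box(box_type: bytes, payload: bytes) -> bytes:
                      # Plain box: 32-bit big-endian size (header included) + 4-byte type.
                      return struct.pack(">I4s", 8 + len(payload), box_type) + payload


                  def full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
                      # FullBox: the payload starts with an 8-bit version and 24-bit flags.
                      return box(box_type, struct.pack(">I", (version << 24) | flags) + payload)


                  def auxc_alpha() -> bytes:
                      # 'auxC' property: a null-terminated URN naming the auxiliary image type.
                      return full_box(b"auxC", 0, 0, ALPHA_URN.encode("ascii") + b"\x00")
                  ```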

                  However, it’s most likely that nobody will implement the full feature set, and we’ll end up with a de-facto AVIF minimal profile that’s just for still images.

                  AVIF-sequence is another bizarre development. It’s a video format turned into an image format turned back into a worse, less efficient, more complex video format. And Chrome insists on requiring AVIF-sequence over a real AV1 video in <img>.

                  1. 3

                    However, it’s most likely that nobody will implement the full feature set, and we’ll end up with a de-facto AVIF minimal profile that’s just for still images.

                    This is the issue, though: we can’t claim to have implemented AVIF, because someone is going to come along with a composite AVIF some day and go “guise ffmpeg is broken it can’t decode this”.
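                    As an illustration of the extra work, here is a rough Python sketch (hypothetical helper names) of one composite case, HEIF's `grid` derived image: a small payload gives the tile grid dimensions and the output size, and the decoder must place equally sized tiles in raster order and crop to the output size. Decoding the tiles and looking up their dimensions in the container are left out.

                    ```python
                    import struct


                    def parse_image_grid(data: bytes):
                        # ImageGrid payload: version, flags, rows - 1, columns - 1, output size.
                        version, flags, rows_m1, cols_m1 = struct.unpack_from(">BBBB", data, 0)
                        if version != 0:
                            raise ValueError("unknown ImageGrid version")
                        fmt = ">II" if flags & 1 else ">HH"  # flag bit 0 selects 32-bit dimensions
                        out_w, out_h = struct.unpack_from(fmt, data, 4)
                        return rows_m1 + 1, cols_m1 + 1, out_w, out_h


                    def tile_positions(rows, cols, tile_w, tile_h, out_w, out_h):
                        # Tiles are laid out in raster order; whatever lies past the output size is cropped.
                        for r in range(rows):
                            for c in range(cols):
                                x, y = c * tile_w, r * tile_h
                                yield x, y, min(tile_w, out_w - x), min(tile_h, out_h - y)
                    ```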

                    I looked at AVIF-sequence just now; it just sounds like AV1 in MP4 with “avis” in the ftyp atom. Nothing too strange about that.
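                    Telling a sequence apart from a still image only needs the leading `ftyp` box. A minimal Python sketch (hypothetical helper names; it ignores the 64-bit `largesize` case and assumes the file starts with `ftyp`, which is the normal case):

                    ```python
                    import struct


                    def read_ftyp(data: bytes):
                        """Return (major_brand, minor_version, compatible_brands) from a leading ftyp box."""
                        size, box_type = struct.unpack_from(">I4s", data, 0)
                        if box_type != b"ftyp":
                            raise ValueError("file does not start with an ftyp box")
                        major, minor = struct.unpack_from(">4sI", data, 8)
                        # The rest of the box is a list of 4-byte compatible brands.
                        brands = [data[i:i + 4] for i in range(16, size, 4)]
                        return major, minor, brands


                    def looks_like_avif_sequence(data: bytes) -> bool:
                        major, _, brands = read_ftyp(data)
                        return major == b"avis" or b"avis" in brands
                    ```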

                  2. 2

                    There’s no need to come up with anything new

                    Are there other patent-free high-compression formats that support, for example, PPTX-> conversion (into a single file)?

                    For my needs, being able to stick a slide show into one file (and then being able to reference ‘a page’ within the file, on a client), solves some technical complexities.

                    I might write a blog post on its many failings if there’s interest

                    Oh, and I also join the folks who would love to see you write a blog post on this. An implementer’s analysis of AVIF, its pain points, shortcomings, etc., would be very valuable in shaping the community’s understanding of it.

                    1. 1

                      I think the complexity is being used, though. For example, iPhones take burst images combined into a single HEIF, IIRC.

                      1. 1

                        Is “burst images” the correct term for this? From what I can see, iPhones just take an actual video recording. “Burst images” makes me think of cameras which actually move the shutter, but I’m not sure it makes any difference on a phone camera where there are no moving parts.

                    2. 8

                      It’s interesting that AVIF already has multiple independent implementations. AFAIK, after 10 years WebP still has only libwebp.

                      There’s C libavif + libaom, and I’ve made a pure Rust encoder based on rav1e and my own AVIF serializer.

                      1. 4

                        Do you think that’s related to the standardization process vs. the VP9 code dump approach (IIRC WebP is derived from VP9)?

                        1. 1

                          WebP is derived from the older VP8. That may be partly the cause, because the world has quickly moved on to VP9, but I’m not really sure.

                          1. 1

                            Wasn’t it patent-encumbered?

                            1. 1

                              In the same way as AV1 is: the inventors say no but third parties make vague threats to seed FUD and make sure companies go the safe way and just license MPEG.

                        2. 8

                          I wish they’d included an AVIF at a similar size to the JPEG used in the first F1 comparisons.

                          The ‘26’ in front of the car is barely visible in anything other than the JPEG, but the JPEG is ~70 KB, so would an AVIF at ~70 KB match the JPEG on that? Do better? Worse? I can’t tell.

                          1. 6

                            What really stands out to me is how some details are totally unaffected. The Red Bull sticker looks almost identical in the AVIF, but the 26, which is almost as big, becomes a complete smudge.

                            1. 4

                              AVIF has a novel technique of predicting color from brightness. This makes encoding of color cheaper overall, and helps it have super sharp edges of colored areas without any fringing.

                              However, the red-blue “26” text is only a difference in hue, but not brightness, so the luma channel doesn’t help it get predicted and sharpened. The encoder should have been smarter about this and compensated for it.
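                              A toy version of chroma-from-luma prediction (CfL, as AV1 calls it) makes this failure mode concrete. This Python sketch fits a single least-squares scale factor `alpha` for one block. Real AV1 signals a quantized alpha, subsamples luma to chroma resolution, and takes the DC term from neighbouring pixels rather than from the block's own mean, so treat it as an illustration only.

                              ```python
                              def cfl_predict(luma, chroma):
                                  """Predict one chroma block from the co-located luma block (toy CfL)."""
                                  n = len(luma)
                                  luma_mean = sum(luma) / n
                                  ac = [y - luma_mean for y in luma]  # luma with its average (DC) removed
                                  dc = sum(chroma) / n                 # stand-in for AV1's neighbour-based DC prediction
                                  denom = sum(a * a for a in ac)
                                  # Least-squares scale: how strongly chroma follows luma inside this block.
                                  alpha = 0.0 if denom == 0 else sum(a * (c - dc) for a, c in zip(ac, chroma)) / denom
                                  return alpha, [dc + alpha * a for a in ac]


                              def sse(pred, actual):
                                  # Sum of squared errors: what is left for the residual to code.
                                  return sum((p - x) ** 2 for p, x in zip(pred, actual))
                              ```

                              When chroma changes together with luma, the fit is nearly exact. On a hue-only edge the luma AC term is zero everywhere, so the prediction collapses to a flat average and the whole edge has to be carried by the residual, which a low-bitrate encoder may then blur away.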

                              1. 1

                                The markings on the road are another smudge.

                              2. 1

                                Just a bit below, there is the same image as a 20 KB JPEG (to be compared with the 20 KB AVIF).

                                1. 2

                                  Sure, I’ve seen that, and it is neat, but there’s no 70KB AVIF to be compared with the 70KB JPEG, so that I can see whether AVIF is still better at that size.

                                  1. 2

                                    For a lot of things, the optimisation you want is the best quality that meets a given size / bandwidth goal, rather than the lowest size that meets a given quality goal. If a 70 KiB JPEG meets your size / bandwidth requirements, then it would be interesting to see how much your quality increases by going to a 70 KiB AVIF.
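                                    A sketch of that "best quality within a size budget" search in Python. The `encode` callback is hypothetical (wrap whatever JPEG or AVIF encoder you are comparing), and the binary search assumes output size never shrinks as the quality setting rises, which real encoders only roughly obey.

                                    ```python
                                    from typing import Callable, Optional, Tuple


                                    def best_quality_under_budget(
                                        encode: Callable[[int], bytes], budget: int, lo: int = 1, hi: int = 100
                                    ) -> Optional[Tuple[int, bytes]]:
                                        """Highest quality setting in [lo, hi] whose output fits in `budget` bytes."""
                                        best = None
                                        while lo <= hi:
                                            q = (lo + hi) // 2
                                            data = encode(q)
                                            if len(data) <= budget:
                                                best = (q, data)  # fits: keep it and try a higher quality
                                                lo = q + 1
                                            else:
                                                hi = q - 1        # too big: try a lower quality
                                        return best
                                    ```

                                    Running it once per format with the same budget (say 70 KiB) gives an equal-size comparison.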

                                2. 6

                                  The fine detail of the road is lost in all of the compressed versions

                                  To my eye, the JPEG and WebP have managed to keep some impression of the detail on the road, whereas the AVIF looks like it’s been attacked with a smudge tool. Like @ethoh, I also wish they’d included an AVIF at a similar size to the JPEG, as I think that their AVIF example, while noticeably smaller, is also noticeably worse quality. (For a meaningful comparison, it would be nice to have several different file sizes with an example of each format at each size.)

                                  The AVIF image kinda reminds me of this paper about vectorizing pixel sprites.

                                  1. 4

                                    I suspect the encoder is low-pass filtering without telling the decoder how to reconstruct the noise. AV1 has this feature called “film grain”, the idea being to parameterize noise, but it’s still a bit manual.
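                                    To illustrate the film grain idea, here is a heavily simplified 1-D Python sketch (hypothetical function names): the encoder denoises and sends only a noise strength, and the decoder regenerates statistically similar grain from a seeded pseudo-random generator. AV1's real tool uses a 2-D autoregressive grain template, a piecewise-linear scaling function that varies the strength with intensity, and its own specified generator; this keeps only the core idea.

                                    ```python
                                    import math
                                    import random


                                    def estimate_grain_sigma(original, denoised):
                                        # Encoder side: summarize the removed noise by its standard deviation.
                                        diffs = [o - d for o, d in zip(original, denoised)]
                                        mean = sum(diffs) / len(diffs)
                                        return math.sqrt(sum((x - mean) ** 2 for x in diffs) / len(diffs))


                                    def synthesize_grain(denoised, sigma, seed, ar_coeff=0.5):
                                        # Decoder side: regenerate noise with similar statistics, not the same pixels.
                                        rng = random.Random(seed)
                                        innovation = sigma * math.sqrt(1.0 - ar_coeff ** 2)  # keeps the AR(1) variance near sigma**2
                                        out, prev = [], 0.0
                                        for px in denoised:
                                            # First-order autoregressive filter: each grain value leans on its neighbour.
                                            g = ar_coeff * prev + rng.gauss(0.0, innovation)
                                            prev = g
                                            out.append(min(255, max(0, round(px + g))))
                                        return out
                                    ```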

                                    Or, it’s the “keyframe filtering bug”: https://www.reddit.com/r/AV1/comments/igshgw/aom_git_let_the_encoder_check_show_existing/

                                    If I understand correctly: for video, noiseless keyframes are better keyframes (in terms of predicting the images that come after), but too much filtering looks bad to humans.

                                    1. 1

                                      Considering it’s a panning shot I’m surprised anyone can see details in the road…

                                      1. 2

                                        Why would panning remove the details from the road? It adds motion blur, but there’s still plenty of detail there (until the AVIF smooths it all out and makes it look like the road is untextured and static relative to the car).

                                        1. 3

                                          Maybe I’m just biased as a photographer, but I automatically exclude blurred images from any consideration of technical comparison - artistically they’re usually fine, as in this case.

                                          1. 2

                                            The motion blur is arguably an important part of the original photo in that it gives a sense of movement. The AVIF compression has completely removed the directional aspect of the motion blur on the road surface.

                                            If the subject was out of focus, you could maybe justify excluding the image on the basis that it is not a good photo to start with, but that’s not the case here. There’s no point doing technical comparisons if they don’t correspond to a meaningful result in terms of human perception.

                                            1. 4

                                              The motion blur is arguably an important part of the original photo in that it gives a sense of movement.

                                              It’s not just the motion blur from panning. The detailed image shows motion blur too, which makes it really hard to compare the quality.

                                              It’s a decent racing car image (if a bit derivative). It’s a bad image for the stated purpose of comparing different compression methods. In fact I wouldn’t be surprised if it’s intentionally chosen to make the comparison to JPG better for AVIF - who knows?

                                              This image could have been chosen instead: https://flic.kr/p/2hgVPG2. It’s a similar subject, but with better light and more detail, and it has both motion blur (in the wheels) and out-of-focus (OOF) blur (in the background).

                                    2. 5

                                      Using AVIF to encode images is a nice idea, and given it’s an open standard I fully support it. WebP never really took off, especially because Apple didn’t support it until very recently. AVIF, however, has a real chance of replacing JPEG (whose successor, JPEG 2000, never took off) as a general-purpose format for lossy image encoding.

                                      The downsides, though, shouldn’t be ignored: AVIF is new (just look at the horrible encoding speed), and it still remains a video encoding format.