1. 5

    Cool, but the next wave of ad blockers will need a completely novel approach once SSAI (server-side ad insertion) takes off, unless we all just collectively reject ad-monetized video content.

    DAI (Google’s SSAI solution) is already in what amounts to a pre-release for larger customers.

    1. 5

      Could you explain quickly what SSAI is?

      1. 6

        Sure! I’ll limit my explanation to HLS (HTTP Live Streaming), since the concept is the same for both HLS and DASH (Dynamic Adaptive Streaming over HTTP), and these are the two most important ABR (Adaptive Bitrate) content types.

        1. You have a manifest file, example.m3u8, that declares where your video fragment files (<N>.ts) live. This file usually sits somewhere “private”, maybe even encrypted with a key that only the SSAI server knows, if the company has enough technical expertise to run the infra for it.
        2. The browser asks for example.m3u8 from some URL that the SSAI server sits in front of. The server fetches the actual manifest (or maybe already has a locally cached version), looks for the special places where the manifest declares that an ad can be inserted, fetches the ad (bidding/etc.), and splices the resulting .ts files into example.m3u8. There’s a rough sketch of this step below.
        3. The SSAI server sends the resulting spliced example.m3u8 back to the client with a few extra .ts files in it, plus updated metadata (this is a big thing I’m glossing over) so it doesn’t break what the browser knows about video duration, etc.
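
        To make step 2 concrete, here’s a minimal sketch of the splice itself, assuming the upstream manifest marks ad slots with #EXT-X-CUE-OUT (cue tag names vary by packager; real deployments often key off SCTE-35 cues, and the ad URLs here are made up):

        ```typescript
        // Toy SSAI splice: insert ad segments wherever the manifest marks an ad slot.
        // Assumption: #EXT-X-CUE-OUT marks the slot (tag conventions vary).
        function spliceAds(manifest: string, adSegmentUrls: string[]): string {
          const out: string[] = [];
          for (const line of manifest.split("\n")) {
            out.push(line);
            if (line.startsWith("#EXT-X-CUE-OUT")) {
              out.push("#EXT-X-DISCONTINUITY"); // timestamps/encoding may change here
              for (const url of adSegmentUrls) {
                out.push("#EXTINF:6.0,"); // hypothetical ad segment duration
                out.push(url); // e.g. https://ads.example/creative-0.ts (made up)
              }
              out.push("#EXT-X-DISCONTINUITY"); // and back to the content
            }
          }
          return out.join("\n");
        }
        ```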


        1. 3

          Who controls the SSAI server? Would that be Google in this case, with the content made available to them by the company that owns the page where the video will be displayed? And does that mean that in order to host an ad network that uses SSAI, you basically have to proxy all traffic for your customers?

          It seems weird to do the ads on the server since (as I understand it) advertisers don’t trust content providers not to cheat, and that’s why ads are fetched on the client from separate servers (which can then be blocked with relative ease).

          Maybe I just totally don’t understand what’s happening here.

          1. 3

            Advertisers don’t trust content providers in general not to cheat.

            However, Google has been caught doing ‘can’t-believe-it’s-not-cheating’ multiple times with no impact, and it took years for it (e.g. putting brands next to KKK videos) to catch up with them on YouTube.

            I suspect YT could pull it off and tell advertisers that’s the new deal.

      2. 4

        I think the next wave, already here, really, are service-specific user agents. Instead of cutting out the advertising, they cut out the content and make a new frame for it.

        These take many different forms including websites (archive.is, youtube downloader sites), scripts (youtube-dl), binary apps (Frost, AlienBlue, NewPipe).

        1. 2

          As @whjms noted, unless they are patching the manifest files on the fly to undo the SSAI (possible, but it would lead to another type of whack-a-mole), it doesn’t matter how you are showing the content.

          1. 1

            Wouldn’t NewPipe still have to display the SSAI ads, since the ads are dynamically inserted into the video?

          2. 2

            It’s already taken off. Quite a few of the YouTube videos I watch (maybe as many as 50%) are sponsored by an audiobook company or a learning-video company.

            The only solution I can think of for this is a crowd-sourced database of video timestamps to skip between; this is an impossible-to-complete task which grows ever larger, and it’s open to abuse. A toy sketch of the client side is below.
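
            Purely to illustrate, the client side of such a database could be as simple as this sketch (the range below is a hypothetical database entry):

            ```typescript
            // Toy sponsor-skipper: jump over crowd-sourced [start, end) ranges on a <video>.
            const sponsorRanges: Array<[number, number]> = [[120.0, 165.5]]; // hypothetical entry

            const video = document.querySelector("video");
            if (video) {
              video.addEventListener("timeupdate", () => {
                for (const [start, end] of sponsorRanges) {
                  if (video.currentTime >= start && video.currentTime < end) {
                    video.currentTime = end; // hop to the end of the sponsored section
                  }
                }
              });
            }
            ```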

            1. 1

              There’s also a machine learning model that was trained to skip sponsorship sections. Personally, though, I’m not so bothered if the sponsorships were picked by the creator and the creator is getting paid directly and reasonably well for them.

              1. 1

                The leading extension that blocks sponsorships relies on user-submitted times; what’s this machine-learning-driven one you’ve mentioned? I’m actually pretty curious about this, as I’ve been planning to build an ad blocker for the TV!

                1. 1

                  It was a recurrent neural net trained on the automatic video transcriptions: Reddit thread (and very good intro video); repo.

            2. 2

              My old employer, a big player in the video space, has been doing SSAI for a few years now.

              I never worked on that directly, because I find it gross, but I suspect you could detect differences in encoding between the “content” and “ad” segments.

              1. 2

                That sounds like it would be fun to make. I suspect you’re right, and I would not be surprised if the differences are huge and glaring. On podcasts, which I listen to much more frequently than I watch online video, the differences are often audible. I can detect the ad spots by ear in many cases, just because the artifacts change when they cut over.

                1. 2

                  I bet that you don’t even need to look at the data, per se. My guess is that the primary method for all of this is HLS, where you have a top-level (text) manifest file that lists the different renditions, and each of those URLs points to another manifest that lists the actual video segment URLs. If I were building SSAI without an eye towards adblockers, I would splice the content and the ads at that second manifest level, so the URLs would suddenly switch over from one URL pattern to another. I believe the manifest also includes the timestamps and segment lengths, so you should be able to detect a partial segment just before you switch from content to ad.

                  It’s possible that they’re instead delivering it all as one MP4 stream, but that seems out of favor these days. Or they could do HLS but have segments that bridge the gap from content to ad, but that might involve re-transcoding, and if it didn’t… well, you might see something interesting with keyframes or something, I suppose? I don’t think they’d bother with that anyhow, since it sounds more complicated.
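
                  Assuming the HLS case, a rough cut of that heuristic might look like this (the 2-second “partial segment” threshold and the base URL are assumptions on my part):

                  ```typescript
                  // Flag likely splice points in an HLS media playlist: a change of segment
                  // host, or a suspiciously short segment right before a switch.
                  function findSpliceCandidates(mediaPlaylist: string): number[] {
                    const candidates: number[] = [];
                    let lastHost: string | null = null;
                    let duration = Number.POSITIVE_INFINITY;
                    const lines = mediaPlaylist.split("\n");
                    for (let i = 0; i < lines.length; i++) {
                      const inf = lines[i].match(/^#EXTINF:([\d.]+)/);
                      if (inf) {
                        duration = parseFloat(inf[1]);
                      } else if (lines[i] && !lines[i].startsWith("#")) {
                        // Segment URL line; resolve relative URLs against an assumed base.
                        const host = new URL(lines[i], "https://cdn.example/").host;
                        if ((lastHost !== null && host !== lastHost) || duration < 2.0) {
                          candidates.push(i);
                        }
                        lastHost = host;
                        duration = Number.POSITIVE_INFINITY; // reset for the next segment
                      }
                    }
                    return candidates;
                  }
                  ```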

                  1. 1

                    I think most of it is currently based around #EXT-X-DISCONTINUITY declarations.
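
                    For anyone unfamiliar, an illustrative (entirely made-up) excerpt of a spliced media playlist:

                    ```
                    #EXTINF:6.0,
                    content_0042.ts
                    #EXT-X-DISCONTINUITY
                    #EXTINF:6.0,
                    ad_0001.ts
                    #EXTINF:6.0,
                    ad_0002.ts
                    #EXT-X-DISCONTINUITY
                    #EXTINF:6.0,
                    content_0043.ts
                    ```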

              2. 2

                Does SSAI get to track you across the web? TBH, I don’t care about the ads themselves, especially in video (that last bit may be because I just don’t watch all that much video). What aggravates me is the whole surveillance aspect of most current online advertising. By my read, SSAI should neuter the ability to track you across different sites. I’m set to call that a flawless victory if ad-supported content is forced to resort to something that can’t track me.

                1. 1

                  They still build it to involve tracking, with JS and cookies and whatnot that all happens before the video stream is requested. I believe if all of that is blocked, you still get ads, just not “retargeted” ones.

              1. 10

                Beat me to the punch, and far better than I could have.

                I do disagree with this point (in the sense that it may be worded a bit strictly for the general case):

                Computers are fast - a lot faster than we can possibly perceive. By some estimates human beings perceive two events as instantaneous at around 100ms. To Drew’s point, this is an eternity for a computer. In that time a modern CPU could execute 10 billion instructions.

                One of the points I wanted to drive home was that we really should care about the death-by-a-thousand-cuts sometimes incurred via these abstractions (imo necessary for security and productivity), when they are often the same cut repeated a few hundred times. Users are penalised when this is applied repeatedly (imagine a 100ms delay after every button press throughout the lifetime of a program). Of course, if you read further along: for program start-up, this delay sure is often worth it.

                In a sense, we should be focusing a lot more on program optimisation for the sake of preserving energy. We should be designing our languages so that they’re conducive to optimisation (looking at you, C, for the pointer alias semantics which arguably killed Itanium).

                Ironically, one of the benchmarks I found was that printing Hello World a thousand times uses half the syscalls in Electron (a pretty large hammer for the nail that is ensuring multi-platform UI consistency) compared to C. :)
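
                If anyone wants to reproduce that sort of measurement, something like this works (using Node as a very rough stand-in for an Electron-style runtime, which is my assumption, not the original benchmark):

                ```typescript
                // hello.ts — print Hello World a thousand times, then compare syscall counts:
                //   strace -cf node hello.js   (after compiling with tsc)
                //   strace -cf ./hello_c       (the equivalent C program)
                for (let i = 0; i < 1000; i++) {
                  console.log("Hello World");
                }
                ```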

                1. 2

                  for the pointer alias semantics which arguably killed Itanium

                  I agree with your general comment. I’ll note that Itanium was largely killed by a combination of no backward compatibility (of highest importance) and being a pain for compiler developers. Mainly backward compatibility. Intel previously lost billions on the iAPX 432 and i960 for similar reasons.

                1. 5

                  If I were you, I’d implement the blocking/slowing as a browser extension. You’d have an API available that lets you interact with things at the HTTP/HTTPS level, without the need to deal with DNS, TCP, or encryption.

                  It’s less work, but admittedly also less fun :-)
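
                  As a minimal sketch with the (Manifest V2) webRequest API; the blocklist hosts are made up, and you’d need the webRequest, webRequestBlocking, and host permissions in manifest.json:

                  ```typescript
                  // background.ts — cancel requests to distracting sites at the HTTP level.
                  const BLOCKLIST = ["news.example", "social.example"]; // hypothetical hosts

                  chrome.webRequest.onBeforeRequest.addListener(
                    (details) => {
                      const host = new URL(details.url).host;
                      // Returning { cancel: true } blocks the request before it leaves the browser.
                      return { cancel: BLOCKLIST.some((blocked) => host.endsWith(blocked)) };
                    },
                    { urls: ["<all_urls>"] },
                    ["blocking"]
                  );
                  ```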

                  1. 4

                    I’m unfamiliar with writing browser extensions, so from my perspective the author has taken the easy way out on this. Do you know of easy resources for getting started with writing extensions? I’ve seen articles describing how to make Hello World extensions, but it’s the steps after that I’m having trouble taking: which APIs can I use, how should I package this, etc.

                    1. 2

                      There certainly are many browser extensions which do this for you already - I use LeechBlock, which acts both as a site blocker and a delaying tool. It’s quite feature-rich, so it pretty much sated my needs for blocking.

                      Of course, there’s a relevant xkcd too - refer to the alt-text.

                      1. 1

                        It’s a good idea. I thought about it while reading. @ac’s advice was to put as much friction as possible between oneself and the addictive sites. Many people have more than one browser, and disabling an extension is usually easy. A network firewall running all connecting traffic through a proxy that enforces this would add much more friction. Even better if you set the machine up so that you had to be admin to change anything, but a good friend has half the password or something. Like a support group, they won’t let you make that change without giving you lots of shit about it (aka positive peer pressure).

                        Just brainstorming here…

                      1. 3

                        SRI is a great resource to use if you can. Unfortunately, there exist just enough external libraries which don’t use versioning (i.e. you can’t say scriptname-0.3.1.js instead of scriptname.js, which may update), so that if you do use it on those scripts, you can’t guarantee it will keep working in perpetuity. The obvious solution is to host the script yourself, but some scripts rely on being loaded from a particular domain (looking at you, utteranc.es!), so it’s not exactly feasible 100% of the time without a lot of effort.
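
                        For scripts that are versioned, generating the integrity value is straightforward; a sketch (the file name is assumed):

                        ```typescript
                        import { createHash } from "node:crypto";
                        import { readFileSync } from "node:fs";

                        // Compute an SRI value for a pinned script, to be used as:
                        //   <script src="scriptname-0.3.1.js" integrity="sha384-..." crossorigin="anonymous">
                        const body = readFileSync("scriptname-0.3.1.js"); // hypothetical pinned copy
                        const digest = createHash("sha384").update(body).digest("base64");
                        console.log(`sha384-${digest}`);
                        ```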

                        Another thing that has changed since this article was written (2015): you can indeed get error reporting when SRI hashes fail. If SRI hashes are required under Content Security Policy (CSP) rules, any failure will be logged to the CSP reporting location.

                        One last point: this is a defense for cross-domain CDNs (i.e. where you don’t serve your HTML through the same CDN). If you host your site through Cloudflare, they technically can (and benevolently do, as a service) perform rewrites on your site, so SRI doesn’t protect you from them being malicious there.

                        1. 3

                          I wish there were an easy way to protect unversioned files, but I don’t think there is. At this point it’s more likely that SRI gets expanded to more HTML elements than that we see a significant change in the integrity mechanisms.

                          Another thing that has changed since this article was written (2015): you can indeed get error reporting when SRI hashes fail. If SRI hashes are required under Content Security Policy (CSP) rules, any failure will be logged to the CSP reporting location.

                          The SRI/CSP integration never really shipped in Firefox and Chrome. It’s always been behind a flag and in the end we removed it from Firefox :(

                          1. 1

                            The SRI/CSP integration never really shipped in Firefox and Chrome. It’s always been behind a flag and in the end we removed it from Firefox :(

                            Oh! That’s a shame.

                            According to this, Chrome still has it, but the page may be out of date.

                            1. 1

                              The problem is, and was, with interfaces that don’t allow supplying integrity metadata.

                              Disallowing non-SRI styles and scripts practically also disallows module scripts and @import in CSS. Not many were willing to bite that bullet.

                              1. 2

                                Hm, that makes total sense. I’m sure there’d be room for something like annotating CSS imports with SRI hashes, and likewise for JS, but I presume no one would care enough for a feature that necessitates language changes (and for two languages at that!).

                                1. 1

                                  Yeah, last I checked, the CSS spec is not defined on top of “fetch” (but SRI depends on it). This means the whole SRI spec can’t easily apply to anything internal to CSS.

                                  Module scripts would be doable though.

                        1. 2

                          Is there any way to protect your site if it’s being served through a CDN? Not just the site’s resources, but the pages and everything.

                          1. 5

                            A proxying CDN like Cloudflare? No, they’re a voluntary man-in-the-middle for your website.

                            1. 2

                              If you’re using something like CloudFlare for your pages, then as far as the browser is concerned CloudFlare is your site.

                            1. 3

                              The website won’t let me scroll normally in several different browsers…

                              EDIT: my mouse has to be over the text in order to scroll the website; if it’s in the white space beside it, I can’t scroll. This is bad.

                              1. 2

                                Well, even worse: the entire site is written in Vue and doesn’t load without JavaScript enabled. For some bizarre reason, the scrollbar for the div holding the content is padded out of view, so it isn’t apparent that it’s scrollable!

                              1. 2

                                Already having lots of fun with it! Unfortunately tilde.town is down for sign-ups right now; they’re in the middle of writing a new tool to help handle sign-ups. :(

                                Pic of it being used on ~town

                                  1. 3

                                    If nothing else (I digress), it provides a substantial list of tools relevant to solving the problem at hand, and where (and briefly how) they’re used. That alone is an invaluable start for further research into the topic, and a goldmine of pointers on how secure systems can be architected.

                                    1. 2

                                      Like hell. They mention all kinds of tools for readers to look into, how they use some of them, and a bunch of articles and presentations. Can’t see how you equate that to zero content.

                                    1. 1

                                      One that I love is Matthew Rayfield’s. It’s a collection of silly and absurd projects, really. http://matthewrayfield.com/

                                      1. 45

                                        I hope there’s an uproar about the name.

                                        Really shitty move for a giant company to create a competing library with such a similar name to an existing project. Bound to cause confusion and potentially steal libcurl users because so many people associate Google with networking and the internet.

                                        1. 22

                                          I wonder how long it takes for google autosuggest to correct libcurl to libcrurl.

                                          1. 11

                                            Looks like crurl was just an internal working name for the library[0]. They’ve changed it already in their bug tracker to libcurl_on_cronet[1].

                                            [0] https://news.ycombinator.com/item?id=20228237

                                            [1] https://chromium-review.googlesource.com/c/chromium/src/+/1652540

                                            1. 7

                                              Holy shit! It’s with a “ru” in the middle instead of a “ur”! I actually missed that until I read your comment and re-read the whole thing letter by letter. Google knows full well that this will cause confusion, since they added a feature to Chrome for this exact problem. Egregious and horrible.

                                              1. 15

                                                Google knows full well that this will cause confusion

                                                I’m not part of the team anymore and have no connection to this project, but my guess is that some engineers thought it was a funny pun/play on words and weren’t trying to “trick” people into downloading their library. I’m not saying you shouldn’t be careful about perceptions when your company has such an outsized influence, but I highly doubt this was an intentional, malicious act.

                                                1. 6

                                                  I’d bet this is exactly what happened. I’ve given projects dumb working names before release, and had them snowball out of my control before.

                                              2. 2

                                                Honestly, I had to double check that I wasn’t reading libcrule.

                                                1. 2

                                                  Honestly, their lack of empathy here, and the need to extend rather than collaborate, indicates in my opinion a concerning move away from OSS. I hope to be corrected, though.

                                                1. 6

                                                  Whilst yes, there aren’t any by-default indicators that the microphone is running on any OS that I know of, there are plenty in what could be considered the second OS of one’s computer: the browser. Both Firefox and Chrome indicate when recording is happening (for both audio and video). Respectively, I know Firefox overlays some icons onto the OS plus an additional indicator in the URL bar, whilst Chrome has an obvious red recording circle on the offending tab. But let’s not forget that both of these (iirc) explicitly ask when a site requests to use them!

                                                  I am in the covering-webcam camp, however, simply because the webcam looks ominous! Irrational, perhaps, but it’s near-zero effort for me.

                                                  But I do agree with the sentiment in the thread so far, in that the content of my screen/filesystem is far more interesting to a malicious actor than that of a webcam. I suppose it may just be a basic human instinct to be averse to being unknowingly watched.