I’m not sure I agree, but I think it depends on what you consider an abstraction. In this post, the author defines an abstraction as something that completely frees you from what’s underneath it. So if TCP is supposed to be reliable, any unreliability that comes through means the TCP abstraction is leaky. But I think this isn’t quite correct.
Abstract means:
(verb) to consider as a general quality or characteristic apart from specific objects or instances
(noun) something that concentrates in itself the essential qualities of anything more extensive or more general, or of several things; essence.
(adj) thought of apart from concrete realities, specific objects, or actual instances
With these definitions, is TCP a leaky abstraction? No, because TCP is the general solution to how to add reliability on top of an unreliable transport. The fact that TCP doesn’t guarantee that a packet arrives under all conditions is part of TCP, not a leak.
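As a rough illustration of failure being part of the interface rather than a leak, here is a minimal sketch using Python's standard `socket` module. The helper names `try_connect` and `unused_local_port` are made up for this example. The point is only that the API itself reports failure, and that nothing in the call says anything about the link layer underneath.

```python
import socket


def unused_local_port():
    """Ask the OS for a free loopback port, then release it immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return probe.getsockname()[1]


def try_connect(host, port, timeout=1.0):
    """Attempt a TCP connection and describe the outcome as a string.

    Failure is reported through the socket API as an OSError subclass;
    the caller never learns (or needs to learn) what the physical link is.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return "failed: " + type(exc).__name__


if __name__ == "__main__":
    # Nothing is listening on this port, so the attempt should fail.
    print(try_connect("127.0.0.1", unused_local_port()))
```

The same call works identically whether the bytes would travel over Ethernet, WiFi, or loopback, which is the sense in which the abstraction holds.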
Another example of a leaky abstraction that people like to use is garbage collection pauses. I don’t think this is a leaky abstraction because it is part of the explicit trade-off that GCs give a developer. The developer can manage memory themselves and possibly get it wrong, or they can leave it to the collector and pay a latency cost later.
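To make the trade-off concrete, here is a small sketch using CPython's `gc` module. CPython's cycle collector is only a stand-in chosen for illustration (the discussion doesn't name a runtime), and `Node`, `make_cycles`, `timed_collect`, and `demo` are hypothetical names. It shows that the cost of collection is something the interface lets you observe and schedule.

```python
import gc
import time


class Node:
    """A tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def make_cycles(count):
    """Create `count` two-node reference cycles, then drop all references."""
    for _ in range(count):
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference counting alone cannot free these


def timed_collect():
    """Run a full collection; return (unreachable objects found, seconds)."""
    start = time.perf_counter()
    found = gc.collect()
    return found, time.perf_counter() - start


def demo(count=10000):
    """Build garbage with automatic collection off, then pay for it at once."""
    was_enabled = gc.isenabled()
    gc.disable()  # take over the decision of when to pay the cost
    try:
        gc.collect()  # start from a clean slate
        make_cycles(count)
        return timed_collect()
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    found, seconds = demo()
    print("collected", found, "objects in", round(seconds * 1000, 2), "ms")
```

The pause is the price named up front, and the module exposes knobs for deciding when it gets paid.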
An example of a leaky abstraction for me is one that tries to present the essential qualities of something, but the specifics of the underlying implementations spill through. For example, take Python’s `subprocess` module, which attempts to abstract process management. However, the options it gives have different meanings on POSIX vs Windows, and one has to be aware of the difference:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed. (Unix only). Or, on Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, you cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.
This is leaky because the same code has different semantics depending on whether I run it in a POSIX environment or a Windows one. `subprocess` does not abstract process management; it leaks various implementation details through. With TCP, I don’t have to concern myself with whether I’m communicating over Ethernet or WiFi; those details don’t leak through the interface.
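A minimal sketch of what that looks like in practice, assuming Python 3 on a POSIX system. `child_sees_fd` and `CHILD_CODE` are hypothetical names for this example. It checks whether a child process inherits an extra file descriptor with `close_fds` on and off. The check itself only makes sense on POSIX, which is rather the point: the caller ends up branching on the platform.

```python
import os
import subprocess
import sys

# Child program: report whether the descriptor number passed in argv is open.
CHILD_CODE = """
import os, sys
try:
    os.fstat(int(sys.argv[1]))
    print("open")
except OSError:
    print("closed")
"""


def child_sees_fd(close_fds):
    """Spawn a child and report whether it inherited an extra descriptor.

    POSIX-only sketch: on Windows, close_fds governs handle inheritance,
    so this descriptor-based check does not carry over unchanged.
    """
    read_end, write_end = os.pipe()
    # Descriptors are non-inheritable by default in Python 3, so opt in here.
    os.set_inheritable(read_end, True)
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE, str(read_end)],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            close_fds=close_fds,
            check=True,
        )
    finally:
        os.close(read_end)
        os.close(write_end)
    return result.stdout.strip()


if __name__ == "__main__":
    if os.name == "posix":
        print("close_fds=False:", child_sees_fd(False))
        print("close_fds=True: ", child_sees_fd(True))
    else:
        # The platform branch is exactly the detail the interface fails to hide.
        print("On Windows, close_fds controls handle inheritance instead.")
```

The quoted documentation describes older behaviour. Since Python 3.7, `close_fds=True` can be combined with redirected standard handles on Windows, but the descriptor-versus-handle difference in meaning remains.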
A leaky abstraction is where I have to do something different depending on what implementation I’m using, not when an abstraction has different semantics than I thought it did.
To me, the goal of an abstraction is to, well, abstract away messy details of something underneath and provide a more succinct and coherent view of something. If, on a single platform, you find yourself needing to understand the underlying mechanics of what’s being abstracted, then you have a leaky abstraction.
To you, it sounds like a leaky abstraction is one where you have to know the mechanism, even if it’s consistent. I don’t think this is a good definition because you can always elevate those things to the abstraction. I think a better definition is that an abstraction leaks when it stops being consistent as the thing it abstracts over changes. In the `subprocess` example, `subprocess` is trying to offer a consistent interface for using processes across multiple OSes. But if the OS varies, the interface is no longer consistent. So it’s trying to make it look like the OS is a constant, but really it’s a variable that you have to be aware of. I don’t think GC is like this.
I completely agree with the idea of a leaky abstraction, but the problem is not with abstractions per se. I think this is a problem with education, or perhaps more accurately, curiosity. Programmers who are curious about how the abstractions they use work will generally be more successful than those who take the abstractions for granted. And part of this means educating yourself: learning how the abstractions work by digging deeper into the code and implementation details.