I want to write a reply post using https://github.com/carlmjohnson/flowmatic but who knows if I’ll get around to it. This is a really common pattern, and it’s cool that it can be written, but it’s too much boilerplate. It should be easier (and with my library, it is 😊).
You should. If you don’t, I might (though I’d have to learn to use it first). I was thinking as I was writing this article that it’s only a matter of time before some library bundles all this boilerplate up into a nice interface using generics. I’ll have to spend some time learning your library.
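To make it concrete for anyone following along, the boilerplate in question is roughly the shape below: a bounded worker pool hidden behind one generic function. This is only a sketch of the idea; ForEach and its signature are made up here, not flowmatic’s actual API.

```go
package pool

import "sync"

// ForEach runs fn on every item using at most `workers` goroutines and
// returns the first error any call produced.
func ForEach[T any](workers int, items []T, fn func(T) error) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)

	ch := make(chan T)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range ch {
				if err := fn(item); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
				}
			}
		}()
	}

	// Feed the workers, then wait for them to drain the channel.
	for _, item := range items {
		ch <- item
	}
	close(ch)
	wg.Wait()
	return firstErr
}
```

A call like ForEach(2, paths, indexImage), with indexImage standing in for whatever per-item function you have, is about the interface I’d hope for from a library like that.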
I started looking at the repo, and as a sanity check, I added a benchmark, ripped out the concurrency, and ran the benchmark again. It could be that I did something wrong in my tests, but I found that things went slightly faster with no concurrency at all. It’s pretty hard to get a speedup on purely CPU-bound operations because the overhead of scheduling is so high.
I’ll admit that I need better benchmarks, but I do have some basic timing information on the indexing phase of a directory with ~900 images (everything from Lorem Picsum). With a single worker it takes my laptop 3m 43s; with two workers it takes 2m 9s (I didn’t bother going higher, since my laptop only has two cores). I wouldn’t call this test purely CPU-bound, since a lot of the time goes into loading images.
Of course, that’s just indexing; you’re probably right about the swapping phase. Caching resized tiles (even if it had to be on disk) would probably have a greater effect on performance.
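Roughly what I have in mind for the cache, sketched under a few assumptions (cachedTile, the cache directory layout, and the PNG format are all mine, not code from the repo): hash the source path and target size, write the resized tile to disk once, and reuse it on later runs.

```go
package mosaic

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"image"
	_ "image/jpeg" // register decoders for image.Decode
	"image/png"
	"os"
	"path/filepath"

	"golang.org/x/image/draw"
)

// cachedTile returns srcPath resized to size x size, reusing a PNG written
// to cacheDir on an earlier run when one exists.
func cachedTile(cacheDir, srcPath string, size int) (image.Image, error) {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d", srcPath, size)))
	cachePath := filepath.Join(cacheDir, hex.EncodeToString(sum[:])+".png")

	// Cache hit: decode the stored tile and skip the resize entirely.
	if f, err := os.Open(cachePath); err == nil {
		defer f.Close()
		return png.Decode(f)
	}

	// Cache miss: decode the original and scale it down.
	src, err := os.Open(srcPath)
	if err != nil {
		return nil, err
	}
	defer src.Close()
	img, _, err := image.Decode(src)
	if err != nil {
		return nil, err
	}
	tile := image.NewRGBA(image.Rect(0, 0, size, size))
	draw.ApproxBiLinear.Scale(tile, tile.Bounds(), img, img.Bounds(), draw.Src, nil)

	// Write the tile back so later runs only decode the small PNG.
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return nil, err
	}
	out, err := os.Create(cachePath)
	if err != nil {
		return nil, err
	}
	defer out.Close()
	if err := png.Encode(out, tile); err != nil {
		return nil, err
	}
	return tile, nil
}
```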
To be clear, I also found that it was faster to run 2 workers rather than 1, and I think it leveled off by 8 workers. What got me the best performance, though, was ripping out the channels and just passing slices from one function to another instead. Keep in mind I was only looking at the testdata directory, so not a huge amount of data.
I think it’s worth adding a benchmarking suite of some kind and figuring out empirically whether you get better performance from having concurrency in one place but not another, e.g. using it to read the files from disk but not to do the tiling.
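Something along these lines is what I mean: decode the files with a handful of goroutines, then walk the resulting slice with no concurrency at all for the CPU-bound step. This is just a sketch; process stands in for the real tiling/indexing step, and errgroup is golang.org/x/sync/errgroup.

```go
package mosaic

import (
	"image"
	_ "image/jpeg" // register decoders for image.Decode
	_ "image/png"
	"os"

	"golang.org/x/sync/errgroup"
)

// loadThenProcess decodes the files concurrently (the I/O-heavy part), then
// runs process over the decoded images sequentially (the CPU-bound part).
// process stands in for whatever the real tiling step is.
func loadThenProcess(paths []string, workers int, process func(image.Image)) error {
	images := make([]image.Image, len(paths))

	g := new(errgroup.Group)
	g.SetLimit(workers)
	for i, p := range paths {
		i, p := i, p // capture loop variables (needed before Go 1.22)
		g.Go(func() error {
			f, err := os.Open(p)
			if err != nil {
				return err
			}
			defer f.Close()
			img, _, err := image.Decode(f)
			if err != nil {
				return err
			}
			images[i] = img
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}

	// No channels or goroutines here: just walk the slice.
	for _, img := range images {
		process(img)
	}
	return nil
}
```

The trade-off is holding all the decoded images in memory at once, which is exactly the kind of thing the benchmarks would have to settle.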
It’s a fun project because there are a lot of avenues to explore. :-)
This has been a very fun project. I wrote about the concurrency, but the color extraction and matching were probably more fun to work out (though they probably wouldn’t be much fun to read about).
You’re absolutely right about the benchmarks. It’s on my list.
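When I do get to it, I expect it to look something like this: one benchmark swept over worker counts, so the sequential and concurrent paths get measured the same way (indexDirectory is a stand-in here, not the repo’s real entry point).

```go
package mosaic

import (
	"fmt"
	"testing"
)

// BenchmarkIndex runs the indexing phase at several worker counts so the
// numbers are directly comparable. indexDirectory is a stand-in for the
// real indexing entry point.
func BenchmarkIndex(b *testing.B) {
	for _, workers := range []int{1, 2, 4, 8} {
		b.Run(fmt.Sprintf("workers=%d", workers), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				if err := indexDirectory("testdata", workers); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```

Then go test -bench Index would spit out one line per worker count to compare.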
But I don’t get the same behavior you do when I remove the concurrency code. It might have shaved off a few seconds from the workers=1 case, but it’s hard to say without better benchmarks because there’s some variation in my testing anyway.
I don’t know how much time you want to spend on this, but the exact code I ran is at https://github.com/pboyd/mosaic/tree/no-concurrency, and I DM’d you a link to the images I’ve been testing with.
In addition to flowmatic being a neat-looking library, the flowmatic vs. stdlib examples in its README are a nice little primer/refresher on how to approach these problems with the stdlib.
I’d read that.