1. 2

    (1) The memory sizes are some fixed numbers. E.g, 32, 64, 128, etc, not random;

    I had a hard time parsing this line. What does “fixed numbers” even mean? Does it mean the numbers have to be compile-time constants?

    1. 2

      Sorry for my poor English. I use the memory size as the key of the hashtable, and the size of the hashtable’s key space affects performance. If the user only allocates memory in a few “fixed” sizes, performance is better; otherwise it may degrade. For example, I use CUDA for some mathematical computation and only allocate memory in a few sizes (8K, 16K), and performance improves remarkably compared with using the original cudaMalloc* APIs. Thanks!

      1. 2

        I still don’t understand what you are saying.

        What is a “fixed” size memory?

        Some magic numbers picked by tuning performance? Do I need compile time known constants? Can I give a number at runtime?

        I guess you actually want to do a page aligned allocation. Is this what you are saying?

        1. 2

          What “fixed” size means in this case is that you expect to have many allocations/frees of the same size, i.e., you are allocating and freeing many objects of the same size throughout execution of the program.

          And you very likely do not want to page-align every allocation of, say, 100 bytes. It would be very inefficient if every allocation of 100 bytes was on a different 4K page.
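
          To make the “many allocations and frees of the same size” case concrete, here is a minimal sketch of a size-keyed cache. It is not the project’s actual code: `SizeKeyedPool`, `allocate`, `release`, and `cached` are hypothetical names, and `std::malloc`/`std::free` stand in for whatever underlying allocator is plugged in.

          ```cpp
          #include <cassert>
          #include <cstddef>
          #include <cstdlib>
          #include <unordered_map>
          #include <vector>

          // Hypothetical sketch: freed blocks are cached per exact size, so a
          // later request for the same size is served without calling malloc.
          class SizeKeyedPool {
          public:
              ~SizeKeyedPool() {
                  // Hand every cached block back to the underlying allocator.
                  for (auto& entry : free_lists_)
                      for (void* p : entry.second) std::free(p);
              }

              void* allocate(std::size_t size) {
                  assert(size > 0);
                  auto it = free_lists_.find(size);
                  if (it != free_lists_.end() && !it->second.empty()) {
                      void* p = it->second.back();  // reuse a cached block
                      it->second.pop_back();
                      return p;
                  }
                  return std::malloc(size);  // cache miss: ask the real allocator
              }

              // The caller passes the size back so the block lands in the right list.
              void release(void* p, std::size_t size) {
                  if (p != nullptr) free_lists_[size].push_back(p);
              }

              // Number of cached (currently free) blocks of this exact size.
              std::size_t cached(std::size_t size) const {
                  auto it = free_lists_.find(size);
                  return it == free_lists_.end() ? 0 : it->second.size();
              }

          private:
              std::unordered_map<std::size_t, std::vector<void*>> free_lists_;
          };
          ```

          With only a few distinct sizes the map stays tiny and almost every request hits a cached block. With arbitrary sizes each size gets its own mostly empty list, which is presumably why performance degrades.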

          1. 1

            Yes, it should be “arbitrary”. Sorry for my poor English.

            1. 1

              I updated the README; I hope this makes it clear. But sorry again for my poor English.

            2. 1

              I think “arbitrary” is what you mean rather than “random”.

          1. 1

            Seems like it’s straightforward to read, well done! However, as general advice, even when the functionality is this simple you should provide at least some unit tests to build confidence in the code, not only for yourself but also for other people who want to use it.

            1. 1

              Thanks very much for your suggestion! Adding unit tests is a really good habit, even if it is just a toy. I will try to add unit tests later, thanks very much!

            1. 5

              Seems that this implements a basic slab cache on top of malloc (or other) using C++ map data structures. Though I think maybe technically a slab cache allocates by dividing up a contiguous block of memory into specifically-sized chunks instead of piecing individual small chunks together in a queue. You can see in your example on your github page that your two 100-byte allocations are not at contiguous memory addresses.

              I don’t know much about the speed of C++ things, but it would be nice to see some speed comparisons to malloc, since I think malloc does some binning/caching of its own instead of immediately returning stuff to the system.

              Also, in allocate_memory(), if _alloc_fun() returns NULL due to underlying allocator (i.e., malloc) failing, it will still increment _used_mem, which will lead to incorrect accounting. It might also cause some problems when the NULL pointer is freed.
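
              A minimal sketch of the accounting fix, assuming a simplified shape for the allocator: `AccountingAllocator`, `alloc_fun`, `used_mem`, and `always_fail` are hypothetical stand-ins, not the real members. The only point is that the counter is updated after the NULL check.

              ```cpp
              #include <cassert>
              #include <cstddef>
              #include <cstdlib>

              // Hypothetical simplification of an allocator that tracks usage.
              struct AccountingAllocator {
                  void* (*alloc_fun)(std::size_t) = std::malloc;  // pluggable backend
                  std::size_t used_mem = 0;                       // bytes handed out

                  void* allocate_memory(std::size_t size) {
                      assert(alloc_fun != nullptr);
                      void* p = alloc_fun(size);
                      if (p == nullptr) {
                          return nullptr;  // backend failed: leave used_mem untouched
                      }
                      used_mem += size;    // count only successful allocations
                      return p;
                  }
              };

              // Backend that always fails, used only to exercise the error path.
              inline void* always_fail(std::size_t) { return nullptr; }
              ```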

              1. 4

                @bio_end_io_t:

                Thanks very much for your comment! I learned a lot.

                Also, in allocate_memory(), if _alloc_fun() returns NULL due to underlying allocator (i.e., malloc) failing, it will still increment _used_mem, which will lead to incorrect accounting.

                Yes, this is a bug and I have already fixed it, thanks!

                1. 1

                  Freeing a NULL pointer shouldn’t be an issue (at least when using free); the C99 standard draft says that:

                  The free function causes the space pointed to by ptr to be deallocated, that is, made available for further allocation. If ptr is a null pointer, no action occurs.

                  1. 1

                    Yes that is correct according to the standard. The issue in ump was with the accounting inside ump, but nanxiao fixed that.

                1. 11

                  Note that SMT doesn’t necessarily have a positive effect on performance; it highly depends on the workload. In all likelihood it will actually slow down most workloads if you have a CPU with more than two cores.

                  In case you’re wondering, this refers to OpenBSD’s giant-locked kernel. Some parts of this kernel are now unlocked (e.g. network stack) but for some workloads 2 CPUs can be faster than 3 or more due to lock contention.

                  1. 1

                    As I understand it, every “physical” CPU can have many cores, and each core can have multiple hardware threads if SMT is supported. So every “hardware thread” is a “logical” CPU. Does the OpenBSD kernel treat physical CPUs, cores, and hardware threads differently, or does it only consider “logical” CPUs? Thanks!

                    1. 2

                      As far as I know the SMT threads were simply exposed as additional CPUs to the scheduler.

                      1. 1

                        @stsp Thanks for your response!

                        If I understand correctly, disabling SMT halves the number of “logical” CPUs, right? For example, if the server has one CPU with 2 cores, and every core has 2 hardware threads, in theory the server has 4 “logical” CPUs. Assume my workload has 4 threads, and every thread is independent and compute-intensive (mostly user-space computation that does not involve the kernel, such as syscalls or network access). Currently the workload can occupy all 4 “logical” CPUs. But if the number of “logical” CPUs is halved, my workload’s 4 threads need to contend for 2 “logical” CPUs. So in this scenario, the workload’s performance should degrade.

                        Is it correct? Thanks in advance!

                        1. 3

                          At least when HT was new, it also meant the caches would be halved unless you disabled HT in bios. So if your threads are doing different things they might suffer from it.

                          1. 1

                            As far as I understand, it doesn’t mean that all 4 threads can progress in parallel, it will depend on which unit in the CPU each thread is utilizing.

                    1. 6

                      One variant of RSS that I like, but doesn’t seem to be commonly used, is PSS (proportional set size), which divides the memory used by shared libraries proportionally among the processes that are using them. Otherwise a shared library loaded by 5 processes is counted in full in each process’s RSS, so it is counted five times across them even though that memory is only used once in total. I found that idea from smem, a tool that measures PSS.
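
                      The arithmetic can be sketched like this. The `Mapping` struct and the numbers are made up for illustration; a real tool such as smem reads per-mapping data from the kernel instead.

                      ```cpp
                      #include <cassert>
                      #include <cstddef>
                      #include <vector>

                      // Hypothetical per-mapping record for one process.
                      struct Mapping {
                          std::size_t size_kib;  // resident size of this mapping
                          std::size_t sharers;   // processes that map it (at least 1)
                      };

                      // RSS counts every resident mapping in full.
                      double rss_kib(const std::vector<Mapping>& maps) {
                          double total = 0.0;
                          for (const Mapping& m : maps) total += static_cast<double>(m.size_kib);
                          return total;
                      }

                      // PSS charges each mapping as size divided by the number of sharers.
                      double pss_kib(const std::vector<Mapping>& maps) {
                          double total = 0.0;
                          for (const Mapping& m : maps) {
                              assert(m.sharers >= 1);
                              total += static_cast<double>(m.size_kib) / static_cast<double>(m.sharers);
                          }
                          return total;
                      }
                      ```

                      For a process with 1000 KiB of private memory and a 500 KiB library shared by 5 processes, RSS is 1500 KiB while PSS is 1000 + 500/5 = 1100 KiB.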

                      1. 1

                        Thanks for showing this project!

                      1. 1

                        Great to see this! Are you planning to send a PR so this gets merged into upstream?

                        1. 4

                          It is already merged into upstream, thanks! :-)

                          1. 1

                            Yeah they were very responsive when I was upstreaming Fuchsia compatibility.

                        1. 5

                          This is a nice effort, but one wonders why the author doesn’t want to use vmstat(8).

                          Side note: The author doesn’t seem to be too familiar with OpenBSD and its conventions. The man page was written in man(7), which is deprecated in favor of mdoc(7) on OpenBSD.

                          1. 4

                            Thanks very much for pointing out that I should use mdoc!

                            Compared to vmstat(8), my simple toy has the following differences:
                            (1) It also displays swap space;
                            (2) It only counts active pages as “used” memory; all others are counted as “free” memory. IMHO, for an end user who doesn’t care about the guts of the operating system, maybe this method is more plausible?

                            All in all, I just wrote a small tool for fun, and thanks very much again for the pertinent advice!

                            1. 2

                              Agreed. Sometimes you don’t really care about everything vmstat offers. free is dirty neat :)

                              • TIL about mdoc
                              1. 1

                                P.S. After some testing, I just changed how free memory is calculated: free pages count as “free” memory, and everything else counts as “used” memory.

                              2. 3

                                Thanks for educating me about the distinction: https://github.com/blinkkin/blinkkin.github.com/wiki/man-vs-mdoc

                                1. 4

                                  I’d suggest Practical UNIX Manuals as introductory reading for mdoc, too: https://manpages.bsd.lv/mdoc.html

                              1. 2

                                Next Year in Christchurch New Zealand! https://twitter.com/linuxconfau2019

                                I look forward to welcoming you in person!

                                1. 1

                                  New Zealand is a beautiful country. Hope I can get the opportunity to take part in linuxconfau2019.

                                  1. 1

                                    While I’m sure there will be some activities associated with the conference, there is a lot more to see and do.

                                    So if you do come, take a bit of vacation leave after the conf so you can do a few trips into the mountains or to the beaches.

                                1. 2

                                  Not really a lot of meat here. @nanxiao, I’m not sure if English is your first language but “cautious” is not really the right word to use here. This post just describes some rules of the language; there isn’t much caution to be had. Perhaps “Be aware …” would be a slightly better title.

                                  1. 1

                                    @apy Got it! Thanks for your comments!

                                  1. 2

                                    Is it just me, or is the page only using half of the screen width, making it quite hard to read on a mobile device?

                                    1. 2

                                      It’s not just you.

                                      1. 2

                                        I fixed the theme, thanks!

                                        1. 1

                                          thanks, much better now

                                      1. 1

                                        You mentioned missing some talks. Is there any official way to watch videos of all the presentations? It’d be nice to see them all recorded and posted as a playlist on YouTube, perhaps in their official channel. I looked around but can’t find a collection of them.

                                        1. 1

                                          I think this is the official youtube channel: https://www.youtube.com/user/CppCon/playlists. It seems the videos for CppCon 2017 are still being uploaded.

                                          1. 1

                                            Thanks. I’ll check again in a week and maybe they will be complete.

                                        1. 1

                                          Ran on macOS just to see what happens:

                                          Architecture:            x86_64
                                          Byte Order:              Little Endian
                                          Total CPU(s):            4
                                          Model name:              MacBookPro11,1
                                          

                                          I appreciate graceful degradation!

                                          1. 1

                                            I updated the code, so it can run on macOS now :-). When you have time, you can try it, thanks!

                                          1. 2

                                            Works as intended for me on Intel Pentium and Xeons with FreeBSD.

                                            Feature request: also show frequencies (min, max, current).

                                            1. 1

                                              Both OpenBSD’s and FreeBSD’s sysctl(3) interfaces lack a MIB code for getting frequencies, and the CPUID instruction doesn’t work for this either. So I need to consider other methods; thanks for your feedback!