Threads for ngoldbaum

    1.  

      You should try looking at the NumPy internals!

      It’s better than it used to be though…

      1.  

        No!! Not this comment from a NumPy Maintainer!!

        Bad code is everywhere :(

      2. 9

        in case you are curious about whether this post has been cross-posted to HN by the example botfly discussed in the blog—it has been. 🙃

        1. 11

          Glad to see this in the 30.1 release notes:

          > Native compilation is now enabled by default.

          It really makes a night and day difference in terms of UI pauses.

          1. 1

            Warning for people who get emacs off of homebrew tho: there was some issue with libgccjit’s formula that meant native compilation was not being enabled there, and I have my doubts about it being enabled with this release either.

              1. 2

                Can confirm that Emacs with libgccjit, including Emacs 30, has been working just fine through emacs-plus for months (at least).

                1. 2

                  I had to brew reinstall gcc libgccjit to make it install the latest Emacs. Just FYI.

                  1. 1

                    I use these builds: https://github.com/jimeh/emacs-builds

                    No 30.1 yet but the recent nightly builds are 31.something.

                2. 1

                  While native compilation makes Emacs fast, it creates another step in my workflow that was just painful enough. I move between boxes and bring my editor with me (I don’t use tramp), so I found myself wanting to quickly spin up my editor on VMs. Enter Emacs native compilation: after going through the lengthy doom install, opening the editor would fire off native compilation, which took a long time too.

                  It caused me to reach for neovim, and now that’s what I use on VMs. I still have my precious emacs on a couple machines, but that’s about it, which is kind of a bummer.

                  TLDR: neovim can bootstrap in 1/4 or 1/5 the time of emacs. Wish I had a solution…

                  1. 2

                    It’s totally fine to turn off nativecomp (either at build time or run time) in situations where it doesn’t make sense to you.

                    1. 1

                      I’m not sure how to turn off native comp at runtime, but I will look into it.

                      The problem with native comp is features don’t always work right until they have been compiled. And I just want to install the editor and use it. Having such a slow install process, with both doom and native comp, is what’s painful.

                      1. 4

                        > The problem with native comp is features don’t always work right until they have been compiled.

                        That shouldn’t be the case, do you have any examples?

                        For disabling at runtime, (setq native-comp-jit-compilation nil) before loading anything should do it. You could make it conditional on system-name or similar to select it only on the machines in question. Emacs will still take advantage of anything already native-compiled but won’t try to compile in the background. There is also native-comp-jit-compilation-deny-list for finer control.

                    2. 1

                      how about sshfs / edit files from the host os?

                      1. 2

                        I’ve tried; I suspect the latency is too high. In addition, it requires extra setup which I’d rather not deal with. Since I have to use teleport and a VPN on different occasions, I’d likely end up with hung sshfs procs all the time.

                      2. 1

                        How portable is the eln-cache? (I would guess that the same gcc version and arch would be enough; then you can just copy that folder. Mine is like 100M, i.e. less than what you download in a minute of doom-scrolling.)

                        1. 1

                          I’ve tried, and while that works, I haven’t figured out all the inputs yet. Most of the VMs I run are different from my desktop or macbook too. A worthy experiment for sure, but I have yet to figure it out.

                    3. 14

                      Note that this is an internal optimization and doesn’t mean that the Python language will support tail-call optimization. Still, up to 45% faster on ARM CPUs?!?!

                      1. 4

                        I was kind of excited reading the headline thinking it was exactly that. oh well

                      2. 11

                        I’m one of the main authors of this guide. Our hope is that it provides useful advice for supporting free-threaded Python in Python projects that depend on native code. This guide is the product of the last 9 months or so of my team at Quansight Labs actively working on support for free-threaded Python in community packages like NumPy, SciPy, Pandas, scikit-learn, Cython, PyO3, and many others on top of that.

                        I’d like to know if anyone has any material they’d like to see here or any suggestions to improve it.

                        Also if you have any questions about free-threaded Python in general I’m happy to try to answer.

                        1. 3

                          Cool to see the common libraries already having support. I’m interested in hearing success stories where performance was improved with the GIL gone. What kind of workload was it? How large was the improvement?

                          1. 4

                            Basically anything using multiprocessing is leaving a ton of performance on the table.

                            See this NumPy PR where I fixed a multithreading scaling issue in NumPy. The details of the PR are less important than the graph in the description, which shows that threading scales to very high thread count and reaches a steady-state level of performance per thread that beats multiprocessing by more than an order of magnitude.

                            A colleague is working on a webserver that exploits threading to spawn AI agents and run pytorch code in the background, all using just the Python threading module. Similar workloads in a GIL-enabled build require something like gevent to work around the GIL.
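                            As a rough illustration of the pattern (not my colleague’s actual code): on a free-threaded build, plain threading parallelizes CPU-bound work directly, with no pickling or process startup. A minimal stdlib-only sketch, with a prime-counting loop standing in for real numerical work:

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """CPU-bound stand-in for a real numerical kernel."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_parallel(n_tasks=4, limit=5000):
    # On a free-threaded (PEP 703) build these tasks can run on separate
    # cores; under the GIL they serialize but still compute correctly.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return list(pool.map(count_primes, [limit] * n_tasks))
```

                            With multiprocessing, the same pattern pays for process spawning plus pickling of inputs and results on every call.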

                            1. 1

                              FWIW the link is broken.

                        2. 2

                          I really want to write an updated version of pytest-leaks, to detect reference counting bugs in NumPy. Sadly the existing plugin is slow and it’s bitrotting.

                          1. 19

                            To me this is actually a major downside. I really don’t like the isolation, because it hides the problematic shared global state.

                            I require my code to be thread-safe and without semantics-changing global state. If some code or dependency is unexpectedly stateful or non-thread-safe, I want it to screw up my tests. Because such problems happen in unexpected places, it’s hard to write dedicated tests for them, but running wide variety of tests in parallel happens to be a good way to serendipitously uncover such issues.
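                            A toy sketch of the kind of hidden shared state I mean (the module-level counter here is hypothetical): tests that exercise unsafe_increment in parallel can serendipitously trip over the race, while isolated process-per-test runs never would.

```python
import threading

counter = 0  # hypothetical hidden module-level state
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1  # unsynchronized read-modify-write: racy under contention

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(worker, n_threads=4, n=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

# run(safe_increment) always returns n_threads * n; run(unsafe_increment)
# may come up short whenever threads actually interleave.
```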

                            1. 3

                              I used this quite profitably to find bugs in free-threaded CPython running the PyO3 tests with cargo test or cargo stress. Rust is able to spawn threads so quickly that it triggered some heretofore unseen races.

                              1. 2

                                This is quite valid, and I’m glad cargo test meets your use case.

                                I am absolutely open to alternative modes of running tests within nextest that meet the quality and reliability bar set by the current process-per-test model.

                              2. 8

                                Often, there are the same links on the front pages of Lobste.rs and Hacker News. I was curious how many, actually?

                                You didn’t ask “why”, but a while ago I noticed one of the main drivers of this is that toddacerdoti will quickly repost many new lobste.rs posts. They run a service that lets you quickly build web bots in a manner analogous to what yahoo pipes used to do, so I guess it’s built using that.

                                1. 4

                                  Yeah, I’ve posted on here a few times and found it was reposted to HN within a few minutes multiple times. Definitely a few repost bots on there.

                                  1. 3

                                    I complained to dang about that obvious bot only to be told “he’s definitely not a bot”.

                                    I’m okay with repost bots if they aren’t trying to be coy about it and can’t use their enormous amounts of karma to influence votes (not saying this is happening, but it’s a strong motivator)

                                  2. 3

                                    Marginally (ir)relevant aside: if you are wondering about the square quotes, they are mjd’s way of writing scare quotes after I made a jokey suggestion in response to one of mjd’s shitposts

                                    1. 3

                                      I definitely interpreted them as “the ceiling of simple”, which works here too I think!

                                      1. 2

                                        You’re practically famous

                                        1. 1

                                          I’m less famous than my email address 🤓

                                        2. 2

                                          You are making the world a better place.

                                          1. 1

                                            My internet-fried brain interpreted this as “simple” being the bugfixes’ stand name. But that’s just me.

                                          2. 7

                                            Yikes! That’s a nasty kernel bug.

                                            1. 4

                                              I’m proud to say that my name shows up in this document for my contributions related to free-threading support in community packages. Back in 2012, when I was struggling with structuring Python projects - understanding what the heck __init__.py files do - I never would have guessed that my name would show up in the Python docs.

                                              1. 10

                                                I’m a NumPy maintainer. AMA about this release or NumPy in general.

                                                1. 2

                                                  No questions, just… thanks! I rarely reach for numpy directly myself outside of silly fun things like advent of code, but I wind up using it a lot just the same (pandas says hello), and the fact that it has been so well maintained for such a long time has made my life easier and given me the ability to get shit done faster. Thank you for keeping this working as well as it has, for so long.

                                                  1. 1

                                                    My biggest gripe with NumPy is that, afaik, it has no way to do various iterative calculations. I cannot define a kernel to do it, and there is just no way I know of to do it. For an example, see https://stackoverflow.com/q/76145123/

                                                    I have an efficient loop-based way to do it using a Python for-loop, but there is no way I know of to tell NumPy to do it efficiently over the array.

                                                    If you know of a special way, feel free to post an answer there.
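                                                    For concreteness, a minimal sketch of the kind of sequential recurrence I mean, using an exponential moving average as a hypothetical example (not necessarily the exact calculation in the linked question) - each output depends on the previous output, so no single vectorized NumPy call expresses it:

```python
def ema(values, alpha=0.5):
    """Exponential moving average: out[i] = alpha*x[i] + (1-alpha)*out[i-1]."""
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out
```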

                                                    1. 2

                                                      I’d recommend using numba or cython to write your own JIT-accelerated kernel or C extension. I agree it’s something numpy isn’t very good at.

                                                    2. 1

                                                      A related limitation that I see in NumPy is that, afaik, it cannot compile a function containing multiple vectorized operations. In contrast, numba, numexpr, jax, and pytorch have compilation features. The underlying goal is faster execution of the compiled function, without my needing any of the intermediate values. It would have to be JIT compiled.

                                                      1. 1
                                                        1. Do you think there are low-hanging fruits for better performance in NumPy in the future?
                                                        2. Have you considered adding an option to preallocate buffers for functions in order to eliminate internal allocations?
                                                        3. Have you considered reimplementing some of the code using Numba? It might make the code accessible to more contributors (though it will make Numba a dependency).
                                                        1. 2
                                                          1. More SIMD in numpy. Support for free-threaded python and better guarantees around thread safety when working with numpy arrays in parallel. More speculatively: it would be cool if NumPy gained some sort of ability to detect chained expressions and calculate more efficient inlined versions of a calculation based on knowing the full expression graph. Something like numexpr but without needing to program in strings. Array API support today opens up the option to write code against numpy on a CPU but then replace numpy with another library like Jax, PyTorch, or CuPy without spending a bunch of effort porting.

                                                          2. You can do that today with the out keyword argument. Unless I’m misunderstanding the question.

                                                          3. NumPy exists to be a low-level CPU-only reference library that builds and runs on a wide variety of platforms. Maybe it grows a way to (optionally?) jit-compile like I alluded to above but I doubt NumPy’s internals will see any near-term changes more revolutionary than replacing custom C templating and codegen with C++.
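                                                          A small sketch of point 2, reusing a preallocated buffer via the out= argument so repeated calls don’t allocate a fresh result array each time (the variable names here are made up for illustration):

```python
import numpy as np

a = np.arange(5, dtype=np.float64)   # [0, 1, 2, 3, 4]
b = np.ones(5, dtype=np.float64)
buf = np.empty(5, dtype=np.float64)  # allocated once, reused below

np.add(a, b, out=buf)            # writes into buf, no fresh allocation
np.multiply(buf, 2.0, out=buf)   # ufuncs can also work fully in place
```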

                                                      2. 1

                                                        I like how this shows that ASan isn’t a panacea: if your buffer overflow happens to be “too far” from the buffer, it won’t catch it. I guess valgrind would, because it can instrument every single malloc/free.

                                                        1. 2

                                                          I believe valgrind fails to catch some other buffer overflows, e.g. if you go off the end of a stack buffer into another buffer in the same stack frame. Standard advice is to use both (of course, valgrind is also much slower).

                                                          1. 2

                                                            Yes. ASan instrumentation inserts red zones between stack objects, so that such buffer overflows can be detected. Valgrind with uninstrumented programs cannot detect this.

                                                            Instrumentation also allows ASan to detect use-after-scope and use-after-return errors.


                                                            Here are some unpublished notes of mine:

                                                            If -fsanitize-address-use-after-scope (default) is enabled, when a variable goes out of scope, its shadow memory is filled with 0xf8 (kAsanStackUseAfterScopeMagic). Accessing the variable will lead to a stack-use-after-scope error.

                                                            Similar to stack-use-after-scope detection, ASan performs stack-use-after-return detection. -fsanitize-address-use-after-return= accepts one of the following values:

                                                            • runtime (default): instrumented code checks a global variable __asan_option_detect_stack_use_after_return to decide whether a fake stack frame is used.
                                                            • always: instrumented code unconditionally creates a fake stack frame. This saves code size.
                                                            • never: don’t detect use-after-return

                                                            When the instrumentation decides to create a fake stack frame, it allocates one using __asan_stack_malloc_{0..10}(uptr ptr, uptr size). The runtime function may return nullptr, in which case a fake stack frame is unavailable, and alloca will be used to allocate the local stack frame; otherwise, stack variables and associated redzones are allocated on the fake stack frame.

                                                        2. 3

                                                          This is neat, I like how it puts the commit graph front and center in the UI.

                                                          1. 1

                                                            This is interesting in that it suggests that, at this point, they should only support Python 3.10 or later, if I understand correctly, whereas many open source projects support all security-supported Python releases, which at the moment includes 3.8 and 3.9 as well.

                                                            (Python is now doing major releases once a year, so this has them supporting 3 major releases at a time.)

                                                            1. 1

                                                              SPEC-0 is only a minimum recommendation. Projects can support more versions if they want to.

                                                              1. 1

                                                                To be fair I was thinking … “you can do that? maybe I should do that!”

                                                            2. 2

                                                              The PR implementing this NEP just got merged, so I thought I’d share here. I learned a ton working on this project, but I’m sure there are also things we can improve.

                                                              One of the features of the design I’m excited about is that all the details of the memory layout and the precise location of the string data on the heap are opaque in the API and can be changed in the future. One thing I think would be really cool is a different layout following the Arrow memory layout for variable-width strings when creating an immutable string array.

                                                              NumPy doesn’t have a concept of immutable arrays yet, but with a no-gil Python on the horizon we need to start thinking about thread safety more carefully. Rust bindings would also make more sense if numpy exposed both mutable and immutable array interfaces.

                                                              1. 1

                                                                How much disruption should I expect from this? For a medium-sized 10k-line Python application that uses NumPy, how many human-hours should we expect to set aside to ensure proper compatibility?

                                                                1. 3

                                                                  From glancing at the changelog, most of the pure-Python changes seem relatively small (renamed and moved APIs and the like), and the usage of ruff automates upgrading a bunch, so probably not too big a deal if you do the dependency management correctly.

                                                                  1. 3

                                                                    If you use the numpy C API, probably a decent amount. If you only use the Python API, you should have to deal with only minimal changes; the vast majority should be automatically fixable with ruff.

                                                                    Also you can try today using the nightly wheel.

                                                                  2. 2

                                                                    I would add that library maintainers should ensure that projects with a C build dependency on numpy are built with numpy 2. A binary built with numpy 2 should be importable under older numpy versions. If you build with an older numpy - and you might be doing that because we used to suggest building with the oldest supported numpy for ABI compatibility - then you’ll produce binaries that are not ABI compatible with numpy 2.
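                                                                    In packaging terms, that means pinning the build requirement to numpy 2 while leaving the runtime floor lower. A hypothetical pyproject.toml sketch (package name and build backend are made up for illustration):

```toml
[build-system]
# Build against numpy 2 so the binary is ABI-compatible with 1.x and 2.x.
requires = ["setuptools", "numpy>=2.0"]
build-backend = "setuptools.build_meta"

[project]
name = "example-extension"      # hypothetical package
version = "0.1"
dependencies = ["numpy>=1.22"]  # runtime floor can stay below the build pin
```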

                                                                    1. 10

                                                                      I’m a NumPy developer, currently working on the upcoming release along with a big new feature that will hopefully be merged in time to be included in NumPy 2.0. AMA.

                                                                      1. 3

                                                                        The UTF-8 dtype looks really cool; would make it a lot easier to use e.g. Rust string APIs for bulk data.

                                                                        1. 3

                                                                          Has the NumPy team debated the option of releasing this as “numpy2”, such that unmaintained library dependencies that depend on NumPy will continue to work?

                                                                          My guess is that wouldn’t be feasible because of the binary nature of NumPy - what happens if numpy2 code attempts to manipulate an object that was created by numpy?

                                                                          I’m asking because I was pondering the same issue earlier today with regards to Pydantic 2 (which broke a huge swathe of dependencies, many of which remain incompatible) - I was pointed to this thread where that team had discussed the same issue: https://github.com/pydantic/pydantic/discussions/5402

                                                                          1. 3

                                                                            > My guess is that wouldn’t be feasible because of the binary nature of NumPy - what happens if numpy2 code attempts to manipulate an object that was created by numpy?

                                                                            This is actually fine; it’s the other way - NumPy 1 attempting to work with the NumPy 2 ABI - that can’t work. Due to some clever hacks, you can import something compiled against NumPy 2.0 if you have NumPy 1.0 installed, so it is possible for downstream packages to simultaneously support NumPy 1.x and NumPy 2.0.

                                                                            However, that doesn’t help with any API changes we made. And we make API changes in minor releases too, so major version number increment or not, unmaintained packages will bitrot either way.