Hi Crustaceans, this is a project I’ve been working on, off and on, over the past months. It’s been working for me, and I thought it was time to share it in case anyone has a similar need for a Python sandbox of sorts. I would love feedback on it :D
It seems that a vast amount of Python’s functionality has been sacrificed, and the resulting sandbox is possibly correct but very limited.
Have you checked out the PyPy sandbox? Their sandboxing technique is more akin to capability-style isolation, and permits the full range of Python’s expressive power.
def ord(i, l=[[]]):
    while len(l) <= i: l.append(l[:])
    return l[i]
This generates an exponentially-sized tree of linear depth and width. We can cleanly sneak under the size limits now; this snippet exhausts my laptop’s memory without a problem:
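As a rough illustration of the growth claim (this is not the memory-exhausting snippet itself), the sketch below counts how many nodes the structure has when it is walked as a tree. The function is renamed `ordinal` here only to avoid shadowing the builtin `ord`, and `tree_size` is a hypothetical helper. The lists share sublists in memory, so the exponential blow-up only appears once something traverses or copies them recursively.

```python
def ordinal(i, l=[[]]):
    # Same construction as above; the mutable default is deliberate and
    # acts as a cache of every ordinal built so far.
    while len(l) <= i:
        l.append(l[:])
    return l[i]


def tree_size(node, cache=None):
    # Count nodes as if shared sublists were separate subtrees.
    # Results are memoized by object identity so counting stays cheap.
    if cache is None:
        cache = {}
    if id(node) not in cache:
        cache[id(node)] = 1 + sum(tree_size(child, cache) for child in node)
    return cache[id(node)]


for n in (1, 5, 10, 20):
    print(n, len(ordinal(n)), tree_size(ordinal(n)))
# 1 1 2
# 5 5 32
# 10 10 1024
# 20 20 1048576
```

The width (`len`) grows linearly while the tree size doubles at every step, which is why a limit on the length of any single list does not catch it.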
Great point, I was hoping for some holes to be exposed. I will need to limit total nodes evaluated and total scope data usage…
Pypy-sandbox is excellent, I’ve used it in the past. While you can limit memory usage, pypy-sandbox doesn’t protect against infinite loops. Also, communication is only via stdin/stdout, which means the scope is unavailable and some sort of serialization is needed for data and exceptions. Also, pypy startup time is in the 100s of ms, which means calls to user functions must be asynchronous and require a queue or offline use.
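To make the serialization point concrete, here is a minimal sketch of a stdin/stdout round trip to a child process. It uses a plain CPython subprocess, not pypy-sandbox, and provides no isolation at all; `CHILD` and `call_child` are hypothetical names. It only shows that data and exceptions have to be encoded on both sides, and that guarding against infinite loops is left to the caller (here via `timeout`).

```python
import json
import subprocess
import sys

# Code run in the child: read a JSON request, write a JSON reply.
CHILD = r"""
import json, sys
request = json.load(sys.stdin)
try:
    result = {"ok": sum(request["numbers"])}
except Exception as exc:
    # Exceptions cannot cross the pipe, so they are flattened to text.
    result = {"error": repr(exc)}
json.dump(result, sys.stdout)
"""


def call_child(payload, timeout=5.0):
    # subprocess.run raises subprocess.TimeoutExpired if the child hangs.
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return json.loads(proc.stdout)


print(call_child({"numbers": [1, 2, 3]}))  # {'ok': 6}
```

Each call pays the interpreter startup cost, which is the latency concern mentioned above.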
Update: I added some checks for total amount of memory used (in bytes) and for the total count of expressions evaluated.
Can you think of any other ways to use up resources or time?
Here we can limit the maximum scope size (including the parent scope):
In [12]: sneklang.MAX_SCOPE_SIZE
Out[12]: 10000
In [14]: sneklang.snek_eval("""
    ...: def ord(i, l=[[]]):
    ...:     while len(l) <= i: l.append(l[:])
    ...:     return l[i]
    ...:
    ...: len(ord(10000))""")
[...]
ScopeTooLarge: Scope has used too much memory at line: 5, column:0
Or by the number of nodes evaluated:
In [19]: sneklang.snek_eval("""while 1: 1""")
---------------------------------------------------------------------------
[...]
TooManyEvaluations: This program has too many evaluations at line: 1, column:9
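For readers wondering how a limit like this can work in principle, here is a minimal sketch of an evaluation budget in a toy AST-walking evaluator built on Python's `ast` module. It is not sneklang's implementation: `TinyEval` and `exceeds_budget` are hypothetical names, the exception name is only borrowed from the session above, and only constants, expression statements and `while` loops are handled.

```python
import ast


class TooManyEvaluations(Exception):
    """Raised when the evaluation budget is exhausted."""


class TinyEval:
    """Toy evaluator for constants, expression statements and while loops."""

    def __init__(self, max_evaluations=1000):
        self.max_evaluations = max_evaluations
        self.count = 0

    def eval(self, node):
        # Every node visit costs one unit, so even `while 1: 1` must stop.
        self.count += 1
        if self.count > self.max_evaluations:
            line = getattr(node, "lineno", "?")
            col = getattr(node, "col_offset", "?")
            raise TooManyEvaluations(f"budget exceeded at line {line}, column {col}")
        handler = getattr(self, "eval_" + type(node).__name__, None)
        if handler is None:
            raise NotImplementedError(type(node).__name__)
        return handler(node)

    def eval_Module(self, node):
        result = None
        for stmt in node.body:
            result = self.eval(stmt)
        return result

    def eval_Expr(self, node):
        return self.eval(node.value)

    def eval_Constant(self, node):
        return node.value

    def eval_While(self, node):
        while self.eval(node.test):
            for stmt in node.body:
                self.eval(stmt)


def exceeds_budget(source, limit):
    # True if evaluating `source` needs more than `limit` node visits.
    try:
        TinyEval(limit).eval(ast.parse(source))
    except TooManyEvaluations:
        return True
    return False


print(exceeds_budget("while 1: 1", 1000))  # True
print(exceeds_budget("42", 1000))          # False
```

Because the counter sits in the single dispatch method, no construct can loop without spending budget.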
Although there is no plan for interactive debugging, there is a small place where sneklang is better than normal python: Coverage.
In [1]: import sneklang
In [9]: source = """
...: a = True
...: b = a or "not truthy"
...: """
In [11]: coverage = sneklang.snek_test_coverage(source)
In [13]: print(sneklang.ascii_format_coverage(coverage, source))
Missing Str on line: 3 col: 9
b = a or "not truthy"
---------^
87% coverage
There is something very important demonstrated here.
In normal Python coverage, logical operators are not counted as separate branches. But above, you can see that the boolean short-circuit is measured, and you are alerted that not all possible conditions have been tested.
In normal Python, boolean short-circuits would have to be rewritten as their equivalent if statements to get proper coverage measurement.
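A small sketch of the rewrite described above, using the hypothetical function names `with_or` and `with_if`. Both behave the same, but only the second gives a line-based coverage tool a separate line for the fallback value.

```python
def with_or(a):
    # One line: a line-based coverage tool marks it covered either way.
    return a or "not truthy"


def with_if(a):
    # Same behaviour, but the fallback now sits on its own line, so a
    # line-based tool can report it as missed when `a` is always truthy.
    if a:
        return a
    return "not truthy"


for value in (True, 0, "", [1]):
    assert with_or(value) == with_if(value)
```

If the tests only ever call these with a truthy `a`, the last line of `with_if` shows up as unexecuted, while `with_or` looks fully covered.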
I am Danish and my brain parsed the name as SNE-KLANG – literally: SNOW-TIMBRE – which made me curious. I was a bit disappointed to find SNEK-LANG… :wink:
I’m always kind of wary of a language VM that wasn’t designed for sandboxing being used as a sandbox, even after being restricted to a subset. I’ve seen a few attempts at this that at first looked like safe restrictions, but after a bit of fuzzing or vuln research in the VM, there ended up being many edge cases of memory corruption that don’t matter when Python is used as an application, but that lead to disastrous consequences when it is used as a sandbox VM.
In the end, if you want to safely run untrusted code, it’s probably better to either use something well tested and designed from the ground up for sandboxing (e.g. Lua) or use proper sandboxing patterns (e.g. seccomp, pledge)?
I appreciate the quest to find a suitable safe subset of Python. For language design that works.
However as you say, the implementation path for the interpreter is not a safe approach. If you rely on finding and patching holes you will never be done. A safe platform starts as something so simple that it is obviously safe. Then you extend it until it becomes useable.
Have you looked at Starlark already? Google needed a configuration language for Bazel and implemented a subset of Python as well, probably with different design constraints.
Cool experiment.
The sandbox is trivially hung:
Let us generate the Von Neumann ordinals:
I imagine you have no plans for a debugger and breakpoints? In other words, no step on snek?
There’s also https://github.com/keith-packard/snek
You may be interested in Starlark.
Yes, you should be wary, here be dragons