Exploring Python Code Execution Patterns for Fun and Profit
Recently, I was looking at the Semgrep output for an open source project and saw an interesting finding about potential arbitrary code execution. The code essentially ran eval, but the interesting part was how it got there: it converted Python code to an abstract syntax tree (AST), compiled it to bytecode, and then evaluated it. This made me curious about other unsafe ways of writing Python code that could lead to code execution. In this article, I will discuss a few patterns that I learnt about through my limited research.
Notes:
- This is by no means a complete list of all patterns that can lead to arbitrary code execution in Python.
- Some of these examples are contrived, and might not make any sense in the real world. I merely use them to demonstrate vulnerable code patterns.
Basic Patterns
Python provides many ways of running system commands through its os and subprocess modules. Some examples include:
- os.system, os.popen, os.execve
- subprocess.Popen, subprocess.run, subprocess.check_call, subprocess.check_output
While the subprocess functions are designed to be safer than the os functions, passing untrusted input to any of them should always be avoided. Some examples of untrusted input leading to remote code execution can be seen in these vulnerabilities reported against Roxy-WI, where user input is concatenated directly into commands passed to os.system, subprocess.Popen and ssh.exec_command.
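To make the difference between concatenating input into a shell command and passing an argument list concrete, here is a minimal sketch. It assumes a POSIX system with /bin/sh and echo available; user_input is a hypothetical attacker-controlled value, and unsafe and safer are illustrative names, not code from Roxy-WI.

```python
import subprocess

# Hypothetical attacker-controlled value: a filename with a second command appended
user_input = "foo; echo injected"

def unsafe(value):
    # String concatenation + shell=True: the shell parses ";" and runs a second command
    return subprocess.run("echo " + value, shell=True,
                          capture_output=True, text=True).stdout

def safer(value):
    # Argument list without a shell: the whole value reaches echo as a single argument
    return subprocess.run(["echo", value], capture_output=True, text=True).stdout

print(repr(unsafe(user_input)))  # 'foo\ninjected\n'
print(repr(safer(user_input)))   # 'foo; echo injected\n'
```

The list form avoids shell parsing entirely, but it only helps if the program being called does not itself interpret the argument as code or options.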
It is also possible to run arbitrary Python code using eval and exec.
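As a minimal illustration, assuming a hypothetical untrusted string, eval runs a single expression and returns its value, while exec runs arbitrary statements. os.getpid is used here only because it is harmless.

```python
import os

# Hypothetical untrusted input: eval treats it as a Python expression, not as data
expr = "__import__('os').getpid()"
result = eval(expr)  # the expression is executed and its value returned
print(result == os.getpid())  # True

# exec runs statements (imports, assignments, ...) and always returns None
namespace = {}
exec("import os\npid = os.getpid()", namespace)
print(namespace["pid"] == os.getpid())  # True
```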
eval and exec with Abstract Syntax Trees
eval and exec accept a string, bytes or a code object. There is also the built-in compile, which compiles source into a code object (bytecode) that can then be passed to eval or exec. However, compile itself only accepts a string, bytes or an AST object. Therefore, none of these functions can, by themselves, eval or exec a function object like this:
```python
>>> eval(my_function)
TypeError: eval() arg 1 must be a string, bytes or code object
```
Although I can’t think of any good reason to evaluate a function like this, someone might have a use case for it. They could convert the function's source to an AST first, compile it to bytecode, and then evaluate or execute it as needed. If a malicious user can control which function is evaluated or executed, they can achieve code execution.
Code example
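```python
import ast
import inspect
import subprocess
import textwrap

def eval_node(func):
    # Recover the function's source code and parse it into an AST
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            try:
                # Wrap the call in an expression node, compile it to bytecode, evaluate it
                return eval(compile(ast.Expression(node), "<ast>", "eval"))
            except Exception:
                pass
    return None

def my_function():
    subprocess.check_output(["id"]).decode("utf-8").strip()

print(eval_node(my_function))
```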
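Here, the source of my_function is parsed into an AST. The first function call found in the tree is wrapped in an ast.Expression, compiled to bytecode and evaluated, which runs the id command and prints its output.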
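The types each of these built-ins accepts can be checked with a short sketch (my_function here is a hypothetical stand-in that simply returns 42):

```python
import ast

# compile() accepts source text, bytes or an AST, and returns a code object
code_from_str = compile("2 * 21", "<string>", "eval")
code_from_ast = compile(ast.parse("2 * 21", mode="eval"), "<ast>", "eval")

# eval() accepts source text, bytes or a code object
print(eval(code_from_str), eval(code_from_ast), eval(b"2 * 21"))  # 42 42 42

def my_function():
    return 42

# ...but a function object is rejected
try:
    eval(my_function)
except TypeError as exc:
    message = str(exc)
print(message)  # eval() arg 1 must be a string, bytes or code object
```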
Recursive attribute lookup
I learnt about this pattern while reading a CVE report for Celery by Calum Hutton. I would highly recommend reading the original article, but I will summarize the finding here.
Python allows attribute lookups to be chained: an attribute of an object can itself have attributes, and so on. Since an imported module becomes an attribute of the module that imports it, it is possible to traverse from one module to another and use the functions defined there. This also means that if an application lets users control this kind of object traversal, a user might be able to steer it towards functions such as os.system, thereby achieving arbitrary code execution. (I am using os.system as an example, but the same applies to something like subprocess.run.)
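A minimal sketch of this kind of traversal, using a hypothetical resolve helper that applies getattr once per dot-separated name (functools.reduce does the looping). Nothing is executed here; the point is only that a harmless-looking starting module leads to os.system.

```python
import functools
import os
import random
import shutil

def resolve(obj, dotted_path):
    # Hypothetical helper: follow each dot-separated name with getattr
    return functools.reduce(getattr, dotted_path.split("."), obj)

# shutil -> os -> system
print(resolve(shutil, "os.system") is os.system)     # True
# random imports os under the private name _os
print(resolve(random, "_os.system") is os.system)    # True
```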
Not every module imports os, so this pattern only affects modules that import os and therefore have it as an attribute. Some examples of such modules are pathlib, shutil, random (which imports it as _os) and posixpath, as the lookups below show. If you are interested in finding more modules, you can look at the Python source code where the modules are defined and check whether they import os.
```python
>>> import pathlib, posixpath, random, shutil
>>> getattr(pathlib, "os")
<module 'os' from '/Users/user/.pyenv/versions/3.9.5/lib/python3.9/os.py'>
>>> getattr(shutil, "os")
<module 'os' from '/Users/user/.pyenv/versions/3.9.5/lib/python3.9/os.py'>
>>> getattr(random, "_os")
<module 'os' from '/Users/user/.pyenv/versions/3.9.5/lib/python3.9/os.py'>
>>> getattr(posixpath, "os")
<module 'os' from '/Users/user/.pyenv/versions/3.9.5/lib/python3.9/os.py'>
```
Code example
This snippet is adapted from Celery source code presented in the original article.
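```python
import sys

def code_exec(d):
    _module = d["module"]
    _type = d["type"]
    try:
        # Start from an already-imported module, e.g. sys.modules["os"]
        cls = sys.modules[_module]
        # Recursive attribute lookup: "system" resolves to os.system
        for name in _type.split("."):
            cls = getattr(cls, name)
    except Exception:
        pass
    _msg = d["message"]
    try:
        # Meant to instantiate a class, but any callable is invoked here
        if isinstance(_msg, (tuple, list)):
            d = cls(*_msg)
        else:
            d = cls(_msg)
    except Exception:
        pass

d = {
    "module": "os",
    "type": "system",
    "message": "id",
}
code_exec(d)
```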
Running this snippet leads to os.system("id") being executed. First, cls is set to the os module using sys.modules. It is then updated to the os.system function via recursive lookup using getattr. Finally, the command is executed when cls(_msg) is evaluated.
CVE-2023-33733
CVE-2023-33733 is a vulnerability in the reportlab Python library. It was reported by Elyas Damej, who has published a great write-up describing the vulnerability and its technical details. This CVE is particularly interesting because it combines recursive attribute lookup with eval to execute arbitrary code.
I would recommend reading the original write-up, but I’ll summarize it here. The reportlab library allows creating PDFs using Python. In 2019, a CVE was discovered which allowed remote code execution because the value of the color attribute in HTML tags was passed to eval without proper sanitization. To fix this, an rl_safe_eval sandbox was implemented, in which the Python builtins are removed and many built-in functions are overridden to prevent access to dangerous functions that could lead to code execution. This sandbox was bypassed by creating a new class, Word, a str subclass specifically crafted to pass the sandbox's checks on attribute names. A Word instance is then used as the attribute name to read the __globals__ attribute of a function provided by the sandbox (pow in the write-up's payload), which exposes the os module and is used to call os.system.
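The core trick can be illustrated with a deliberately simplified, hypothetical check. This is not reportlab's actual sandbox code, only a sketch of why a str subclass with an overridden method can slip past a check on attribute names while getattr still sees the real name:

```python
import os

def safe_getattr(obj, name):
    # Hypothetical, simplified check (not reportlab's code): block dunder names
    if name.startswith("__"):
        raise PermissionError(f"access to {name!r} denied")
    return getattr(obj, name)

class Word(str):
    # A str subclass whose startswith always answers False
    def startswith(self, prefix, *args):
        return False

def some_function():
    pass

try:
    safe_getattr(some_function, "__globals__")
except PermissionError as exc:
    print(exc)  # access to '__globals__' denied

# The same name wrapped in Word passes the check, and getattr still sees "__globals__"
g = safe_getattr(some_function, Word("__globals__"))
print(g["os"] is os)  # True: the module's globals expose os, and with it os.system
```

The real payload is more involved (it also has to work within the sandbox's other restrictions), but the principle is the same: the check and the actual lookup disagree about what the name is.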
Pickle
In Python, the pickle module is used for serializing and deserializing objects. Serialization refers to the process of converting objects in memory to a byte stream. Deserialization is the reverse process, where the byte stream is converted back into objects in memory. Serialization is performed by pickle.dump / pickle.dumps and deserialization by pickle.load / pickle.loads.
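A minimal round trip through both pairs of functions, using an arbitrary dictionary as data: dumps and loads work with bytes objects, while dump and load work with file-like objects.

```python
import io
import pickle

data = {"user": "alice", "roles": ["admin", "dev"]}

# dumps/loads work with bytes objects
blob = pickle.dumps(data)
print(pickle.loads(blob) == data)  # True

# dump/load work with file-like objects
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
print(pickle.load(buffer) == data)  # True
```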
Python allows specifying how an object should be pickled by defining a __reduce__ method. __reduce__ can return a tuple containing a callable and a tuple of arguments for that callable. This information is stored in the pickled data, and when the data is deserialized, pickle calls the callable with those arguments to reconstruct the object. This makes it possible to create an object that leads to code execution when it is deserialized.
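To see when each piece runs, here is a harmless sketch in which the hypothetical Traced class and rebuild function record their calls: __reduce__ is called during pickle.dumps, while the callable it returns is only called during pickle.loads.

```python
import pickle

calls = []

def rebuild(value):
    # Called by pickle.loads with the arguments stored in the pickle
    calls.append(("rebuild", value))
    return value

class Traced:
    def __reduce__(self):
        # Called by pickle.dumps: "to recreate me, call rebuild('hello')"
        calls.append(("__reduce__",))
        return (rebuild, ("hello",))

blob = pickle.dumps(Traced())
print(calls)   # [('__reduce__',)]
result = pickle.loads(blob)
print(calls)   # [('__reduce__',), ('rebuild', 'hello')]
print(result)  # hello
```

Note that the unpickled result is whatever the callable returns, not necessarily an instance of the original class.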
Code example
```python
import pickle

class Payload:
    def __reduce__(self):
        import os
        # Tell pickle to call os.system('whoami') when this data is unpickled
        return (os.system, ('whoami',))

serialized = pickle.dumps(Payload())
pickle.loads(serialized)  # runs whoami
```
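When a Payload object is serialized with pickle.dumps(), its __reduce__ method tells pickle to store a reference to os.system together with the argument 'whoami'. When that data is deserialized with pickle.loads(), pickle calls os.system('whoami'), which runs the command. The deserializing side does not even need the Payload class to be defined.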
This behavior of pickle has led to many CVEs.
Forward References and typing.get_type_hints
I learnt of this technique from a Stack Overflow post about the dangers of unpickling untrusted data.
A forward reference is a reference to a variable, function, or class that is defined later in the code. In type hints, forward references are written as string literals, and typing.get_type_hints is one function that resolves them.

From the documentation: typing.get_type_hints returns a dictionary containing type hints for a function, method, module or class object. In addition, forward references encoded as string literals are handled by evaluating them in globals and locals namespaces.
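For comparison, here is the intended use, sketched with hypothetical names: a function annotated with the string "Node" before the Node class exists, which get_type_hints later resolves to the class itself by evaluating the string.

```python
import typing

# "Node" is a forward reference: the class does not exist yet at this point
def make_node(parent: "Node") -> "Node":
    return parent

class Node:
    pass

# get_type_hints evaluates the strings in make_node's globals, where Node now exists
hints = typing.get_type_hints(make_node)
print(hints["parent"] is Node, hints["return"] is Node)  # True True
```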
This behavior can be misused to run arbitrary code by specifying the code as a forward reference. Setting the __annotations__ attribute of an object to contain such a forward reference does the trick.
Code example
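```python
import typing

class Payload(object):
    def __init__(self):
        self.__annotations__ = {"x": """eval('__import__("os").system("ls")')"""}

p = Payload()
# Resolving the string annotation evaluates it, so "ls" runs here. Some Python versions
# then raise TypeError because the result is not a type, but the command has already run.
print(typing.get_type_hints(p))
```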
Here, we specify a forward reference which, when evaluated, imports os and calls os.system("ls").
It is possible that typing.get_type_hints is used internally by other Python functions, which could make those functions susceptible to similar misuse. The Stack Overflow post mentions functools.singledispatch, which has an inner function, register, that calls typing.get_type_hints. I tried using the above Payload class with functools.singledispatch but was not able to achieve code execution, so I leave this as an exercise for the reader.
While researching this article, I learnt a lot about Python internals. I found this quite interesting, so I plan to dig further into Python internals and research other unsafe patterns that could lead to security issues. I also plan to use what I learn to look for security issues in open source code. Only time will tell how successful I am in these endeavors!
Thanks for reading!