# Insecure Deserialization
*Serialization* describes the process of writing an in-memory object to a binary form, typically for storage on disk or transmission across a network. *Deserialization* is the opposite process: transforming incoming binary data into an in-memory object. If your code uses deserialization, you need to ensure it is not deserializing untrusted input: this presents an opportunity for an attacker to inject malicious code into your web server at runtime.
## Serialization in Python
There are a number of ways to serialize data in Python: the `pickle` module has `dumps(…)` and `loads(…)` functions to write in-memory objects to binary form and read them back in. The NumPy library is frequently used to write and read array data, and Google's Protobuf library allows you to serialize objects in a language-neutral way.
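As a minimal sketch of the `pickle` round trip described above (the `settings` dictionary is an illustrative stand-in for any in-memory object):

```python
import pickle

# An ordinary in-memory object: nested built-in containers.
settings = {"theme": "dark", "recent_files": ["a.txt", "b.txt"], "volume": 0.8}

# dumps(...) converts the object to a bytes value suitable for disk or network.
blob = pickle.dumps(settings)

# loads(...) rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(blob)

print(type(blob).__name__, restored == settings, restored is settings)
# prints: bytes True False
```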
Deserializing data with the `pickle.loads(…)` function (or with the `shelve` module, which is backed by `pickle`) can result in arbitrary code execution. Any class that defines a `__setstate__(state)` method will have that method invoked when the data stream is unpickled. More generally, a pickle stream can name any importable callable together with its arguments (the mechanism behind `__reduce__`), and the unpickler will call it. This allows an attacker to execute code on your server if you unpickle untrusted content.
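To make the risk concrete, the following minimal sketch builds a pickle stream that evaluates an attacker-chosen expression as soon as it is loaded. The `Payload` class name and the harmless `6 * 7` expression are stand-ins for illustration; a real attack would more likely call something like `os.system`.

```python
import pickle


class Payload:
    """Hypothetical attacker-side class; it never needs to exist on the server."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling eval("6 * 7").
        # Both the callable and its arguments are chosen by the attacker.
        return (eval, ("6 * 7",))


# The attacker serializes the payload and sends the bytes to the victim.
malicious_bytes = pickle.dumps(Payload())

# The victim only calls pickle.loads(...), yet the expression is evaluated.
result = pickle.loads(malicious_bytes)
print(result)  # prints 42
```

Note that the value returned by `pickle.loads(…)` is the result of the call, not a `Payload` instance: the code has already run by the time the caller can inspect what was loaded.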
Deserialization vulnerabilities can also occur when reading YAML (*YAML Ain't Markup Language*, originally *Yet Another Markup Language*) files. YAML is a popular file format for storing configuration values, but the unsafe loaders in the `yaml` module (PyYAML), which older releases used by default, allow a YAML file to specify which Python class it deserializes to using the `!!python/object` tag:
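```python
import yaml

class Hero:
    def __init__(self, name, hp, sp):
        self.name = name
        self.hp = hp
        self.sp = sp

# This will return an instance of the Hero class
# (field values are illustrative; recent PyYAML releases require the unsafe
# loader to be requested explicitly, as done here).
hero = yaml.load("""
!!python/object:__main__.Hero
name: Example Hero
hp: 1200
sp: 0
""", Loader=yaml.UnsafeLoader)
```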
## Mitigation
The easiest way to avoid deserialization vulnerabilities is to avoid using serialization altogether. If you need to accept structured data from an HTTP request, XML or JSON are more common formats and less prone to malicious use.
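As a rough illustration of the plain-data approach, the sketch below parses a JSON request body with the standard library and copies only the expected fields into a known type. The `Hero` fields mirror the YAML example; `parse_hero` is a hypothetical helper, not part of any framework.

```python
import json
from dataclasses import dataclass


@dataclass
class Hero:
    name: str
    hp: int
    sp: int


def parse_hero(body: str) -> Hero:
    """Parse a JSON object and build a Hero from explicitly validated fields."""
    # json.loads only ever yields dicts, lists, strings, numbers, booleans, None.
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, hp, sp = data.get("name"), data.get("hp"), data.get("sp")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int, so exclude it explicitly.
    for field, value in (("hp", hp), ("sp", sp)):
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"'{field}' must be an integer")
    return Hero(name=name, hp=hp, sp=sp)


hero = parse_hero('{"name": "Example", "hp": 100, "sp": 5}')
print(hero)
```

The input never chooses which class is instantiated; the receiving code does, after checking each field.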
You can securely deserialize YAML files by turning off the support for custom classes using the `yaml.SafeLoader` class:
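```python
import yaml

# This will return a list of the form ['First', 'Second', 'Third'], but will
# raise an error if the document uses a tag such as `!!python/object`.
yaml.load("- First\n- Second\n- Third\n", Loader=yaml.SafeLoader)
```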
If you *do* use the `pickle` module, you should ensure your byte or object streams come from a trusted source, deserialize to an expected form, and cannot be tampered with. Since code can be executed during the unpickling process, you should never unpickle data sent from an HTTP request.
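One way to enforce an expected form is to restrict which globals the unpickler may resolve, following the "restricting globals" approach described in the Python `pickle` documentation. The allow-list below and the `restricted_loads` helper name are assumptions for illustration; this narrows the attack surface but is not a substitute for only unpickling trusted data.

```python
import builtins
import io
import pickle

# Illustrative allow-list: only these builtins may be referenced by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global (a class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and allow-listed builtins load normally.
print(restricted_loads(pickle.dumps({"ids": [1, 2, 3], "span": range(3)})))


class Payload:
    """Hypothetical malicious object that asks the unpickler to call eval."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# A stream that references eval is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```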
To detect tampering with pickled objects, you should generate a keyed signature (for example, an HMAC) when you write out a byte stream for later use, then verify that signature when reading the objects back in:
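```python
import pickle, hmac, hashlib

# Only those possessing the secret key will be able to generate a valid signature.
secret_key = b'0c07187d-5fd7-486f-a3a2-699200a623a5'

# The signature can be shared publicly, along with the serialized data.
data = pickle.dumps({'example': 'payload'})  # illustrative object
signature = hmac.new(secret_key, data, hashlib.sha256).hexdigest()

# Here we recalculate the signature to check if the data has been tampered with.
recalculated = hmac.new(secret_key, data, hashlib.sha256).hexdigest()

# A change in signature indicates this data isn't safe to deserialize.
if not hmac.compare_digest(signature, recalculated):
    raise ValueError('Signature mismatch: refusing to unpickle')

# If the newly calculated signature is the same as the old value, the data has not
# been tampered with and can be unpickled.
obj = pickle.loads(data)
```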
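The sign-then-verify check can also be packaged as a pair of helpers so that verification always happens before unpickling. This is a sketch under stated assumptions: `dump_signed`, `load_verified`, and the placeholder `SECRET_KEY` are hypothetical names, and SHA-256 is one reasonable choice of digest.

```python
import hashlib
import hmac
import pickle

# Assumption: in practice the key would come from configuration, not source code.
SECRET_KEY = b"replace-with-a-long-random-secret"


def dump_signed(obj):
    """Pickle obj and return a (data, signature) pair."""
    data = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()
    return data, signature


def load_verified(data, signature):
    """Unpickle data only if its HMAC matches the supplied signature."""
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how much
    # of a forged signature was correct.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(data)


data, signature = dump_signed({"user": "alice", "roles": ["admin"]})
print(load_verified(data, signature))

# Appending even a single byte invalidates the signature.
try:
    load_verified(data + b"\x00", signature)
except ValueError as exc:
    print("rejected:", exc)
```

Because `pickle.loads(…)` is only reached after the comparison succeeds, a modified byte stream is rejected without ever being deserialized.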
## CWEs
* [CWE-502](https://cwe.mitre.org/data/definitions/502)