Insecure deserialization is a vulnerability which occurs when untrusted data is deserialized by an application, allowing an attacker to abuse the application’s logic, inflict a denial-of-service (DoS) attack, or even execute arbitrary code. It also occupies the #8 spot in the OWASP Top 10 2017 list.
In order to understand what insecure deserialization is, we first must understand what serialization and deserialization are. We’ll then cover some examples of insecure deserialization and how it can be used to execute code as well as discuss some possible mitigations for this class of vulnerability.
Serialization refers to the process of converting an object into a format which can be persisted to disk (for example saved to a file or a datastore), sent through streams (for example stdout), or sent over a network. The format into which an object is serialized can be either binary or structured text (for example XML, JSON, YAML…). JSON and XML are two of the most commonly used serialization formats within web applications.
Deserialization, on the other hand, is the opposite of serialization: transforming serialized data coming from a file, stream or network socket back into an object.
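To make the two terms concrete, here is a minimal sketch of a round trip using Python’s standard json module. The `user` dictionary and its fields are made up purely for illustration.

```python
import json

# A plain Python object (a dict) that we want to persist or transmit.
# The field names here are hypothetical.
user = {"name": "alice", "roles": ["admin", "dev"], "active": True}

# Serialization: object -> JSON text
serialized = json.dumps(user)

# Deserialization: JSON text -> object
restored = json.loads(serialized)

print(serialized)
print(restored == user)  # prints True
```

The serialized form is just text, so it can be written to a file or sent over a socket, and the receiving side rebuilds an equivalent object from it.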
Web applications make use of serialization and deserialization on a regular basis, and most programming languages even provide native features to serialize data (especially into common formats like JSON and XML). It’s important to understand that safe deserialization of objects is normal practice in software development. The trouble, however, starts when deserializing untrusted user input.
Most programming languages offer the ability to customize deserialization processes. Unfortunately, it’s frequently possible for an attacker to abuse these deserialization features when the application is deserializing untrusted data which the attacker controls. Successful insecure deserialization attacks could allow an attacker to carry out denial-of-service (DoS) attacks, authentication bypasses and remote code execution attacks.
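As a harmless illustration of such a customization hook, the sketch below uses Python’s pickle module (covered in more detail next) and its `__reduce__` method, which lets a class tell pickle which callable to invoke when the data is loaded. The class name `CustomReduce` is hypothetical, and the callable is deliberately the benign built-in `len`.

```python
import pickle


class CustomReduce:
    """Hypothetical class that customizes how it is pickled."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call len('abc')"
        return (len, ("abc",))


data = pickle.dumps(CustomReduce())

# Loading the data calls len('abc') and returns its result,
# not a CustomReduce instance.
result = pickle.loads(data)
print(result)  # prints 3
```

The point to notice is that the serialized data, not the loading code, decides which function gets called. With a benign callable this is a convenience; with attacker-controlled data it becomes a way to run code.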
The following is an example of insecure deserialization in Python. Python’s native module for binary serialization and deserialization is called pickle. This example will serialize an exploit to run the whoami command, and deserialize it with pickle.loads().
    # Import dependencies
    import os
    import pickle


    # Attacker prepares exploit that application will insecurely deserialize
    class Exploit(object):
        def __reduce__(self):
            # When unpickled, this makes pickle call os.system('whoami')
            return (os.system, ('whoami',))


    # Attacker serializes the exploit
    def serialize_exploit():
        shellcode = pickle.dumps(Exploit())
        return shellcode


    # Application insecurely deserializes the attacker's serialized data
    def insecure_deserialization(exploit_code):
        pickle.loads(exploit_code)


    if __name__ == '__main__':
        # Serialize the exploit
        shellcode = serialize_exploit()

        # Attacker's payload runs a `whoami` command
        insecure_deserialization(shellcode)
It’s quite easy to imagine the above scenario in the context of a web application. If you must use a native serialization format like Python’s pickle, be very careful and use it only on trusted input. That is, never deserialize data that has travelled over a network or come from a data source or input stream that is not controlled by your application.
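One way to check that pickled data really was produced by your own application is to sign it with a keyed hash before storing it and to verify the signature before loading it. The following is a minimal sketch using the standard hmac module; the function names and `SECRET_KEY` value are assumptions made for illustration, and in practice the key would come from protected configuration rather than source code.

```python
import hashlib
import hmac
import pickle

# Assumption: a secret known only to the application.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_serialize(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_deserialize(blob):
    """Unpickle blob only if its tag matches; otherwise raise ValueError."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign_and_serialize({"user": "alice"})
print(verify_and_deserialize(blob))

# Flip one bit of the payload to simulate tampering
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    verify_and_deserialize(tampered)
except ValueError as exc:
    print(exc)
```

This only shows that the data has not been modified since the application signed it. It does not make pickle safe for data that originates from users, and it is only as strong as the secrecy of the key.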
To significantly reduce the likelihood of introducing insecure deserialization vulnerabilities, prefer language-agnostic serialization formats such as JSON, XML or YAML, which describe plain data rather than native language objects.
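In practice this usually means parsing the data into basic types first and then building your own objects from fields you have explicitly checked. The sketch below shows one possible shape of that approach with JSON; the `User` class and the `user_from_json` helper are hypothetical names used only for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical application object rebuilt from untrusted JSON."""
    name: str
    is_admin: bool


def user_from_json(text):
    """Parse JSON text and construct a User only from validated fields."""
    data = json.loads(text)  # yields only dicts, lists, strings, numbers, ...
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("name")
    is_admin = data.get("is_admin", False)
    if not isinstance(name, str) or not isinstance(is_admin, bool):
        raise ValueError("unexpected field types")
    return User(name=name, is_admin=is_admin)


print(user_from_json('{"name": "alice", "is_admin": false}'))

try:
    user_from_json('{"name": ["not", "a", "string"]}')
except ValueError as exc:
    print(exc)
```

Because the parser can only ever produce basic data types, the application stays in control of which objects are created and with which values.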
Do bear in mind, however, that it is still possible to introduce vulnerabilities even when using such serialization formats. Chief among these is XML External Entity (XXE) injection, which affects a variety of XML parsers across a variety of programming languages and third-party libraries. Another such example in Python arises when using PyYAML, one of the most popular YAML parsing libraries for Python.
The simplest way to load a YAML file using the PyYAML library in Python is by calling yaml.load(). The following is a simple unsafe example that loads a YAML file and parses it.
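    # Import the PyYAML dependency
    import yaml

    # Open the YAML file
    with open('malicious.yml') as yaml_file:
        # Unsafely deserialize the contents of the YAML file.
        # PyYAML 6.0 and later require an explicit Loader argument; older
        # releases also accepted a bare yaml.load(yaml_file) call.
        contents = yaml.load(yaml_file, Loader=yaml.UnsafeLoader)

        # Print the contents of the key 'foo' in the YAML file
        print(contents['foo'])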
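yaml.load() is not a safe operation when it is used with a loader that can construct arbitrary Python objects (which was the default behavior in older PyYAML releases), and could easily result in code execution if the attacker supplies a YAML file similar to the following.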
foo: !!python/object/apply:subprocess.check_output ['whoami']
Instead, the safe approach is to call yaml.safe_load(), which only constructs standard YAML types such as strings, numbers, lists and dictionaries.
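As a quick check of that behavior, the sketch below feeds the same kind of payload to yaml.safe_load(). It assumes PyYAML is installed and passes the YAML as a string rather than reading it from a file, purely to keep the example self-contained.

```python
import yaml  # PyYAML

# The same kind of payload as in the malicious YAML file above
malicious = "foo: !!python/object/apply:subprocess.check_output ['whoami']"

try:
    yaml.safe_load(malicious)
except yaml.YAMLError as exc:
    # safe_load does not know the python/object/apply tag and refuses it
    print("rejected:", type(exc).__name__)

# Ordinary data still loads as expected
print(yaml.safe_load("foo: bar"))
```

The command is never run: the safe loader raises an error on the unknown tag instead of constructing the object.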
While the above examples were specific to Python (and in the PyYAML example, specific to a Python library), it’s important to note that this is certainly not a problem limited to Python. Applications written in Java, PHP, ASP.NET and other languages can also be susceptible to insecure deserialization vulnerabilities.
Serialization and deserialization vary greatly depending on the programming language, serialization formats and software libraries used. As a result, fortunately, there’s no ‘one-size-fits-all’ approach to attacking an insecure deserialization vulnerability. While this makes the vulnerability harder to find and exploit, it by no means makes it any less dangerous.