Securing applications is not the easiest thing to do. An application has many components: server-side logic, client-side logic, data storage, data transportation, API, and more. With all these components to secure, building a secure application can seem really daunting.
Thankfully, most real-life vulnerabilities share the same root causes. And by studying these common vulnerability types, why they happen, and how to spot them, you can learn to prevent them and secure your application.
The use of every language, framework, or environment exposes the application to a unique set of vulnerabilities. The first step to fixing vulnerabilities in your application is to know what to look for. Today, let’s take a look at 27 of the most common vulnerabilities that affect Python applications, and how you can find and prevent them.
Let’s secure your Python application! The vulnerabilities I will cover in this post are:
- XML external entity attacks (XXE)
- Insecure deserialization
- Remote code execution (RCE)
- SQL injection
- NoSQL injection
- LDAP Injection
- Log injection
- Mail injection
- Template injection (SSTI)
- Regex injection
- XPath injection
- Header injection
- Session injection and insecure cookies
- Host header poisoning
- Sensitive data leaks or information leaks
- Authentication bypass
- Improper access control
- Directory traversal or path traversal
- Arbitrary file writes
- Denial of service attacks (DoS)
- Encryption vulnerabilities
- Insecure TLS configuration and improper certificate validation
- Mass assignment
- Open redirects
- Cross-site request forgery (CSRF)
- Server-side request forgery (SSRF)
- Trust boundary violations
XML External Entity Attacks
XML external entity attacks, or XXE, are when attackers exploit an XML parser to read arbitrary files on your server. Using an XXE, attackers might also be able to retrieve user information, configuration files, or other sensitive information like AWS credentials. To prevent XXE attacks, you need to explicitly disable external entity and DTD processing in your XML parser. You can read in detail about how to prevent XXE in Python applications here.
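As a rough illustration, one conservative approach is to reject any document that carries a DOCTYPE declaration before building the tree, since entity definitions live inside the DTD. The sketch below uses only the standard library; `parse_untrusted_xml` and `DTDForbidden` are hypothetical names, and a maintained library such as defusedxml is usually a better choice in practice.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_untrusted_xml(text: str) -> ET.Element:
    # First pass: scan with expat and abort at the first DOCTYPE declaration.
    def forbid_doctype(name, sysid, pubid, has_internal_subset):
        raise DTDForbidden("DOCTYPE declarations are not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid_doctype
    checker.Parse(text, True)

    # Second pass: the document has no DTD, so build the tree as usual.
    return ET.fromstring(text)
```

Rejecting DTDs outright is stricter than disabling external entities alone, so it may not suit applications that rely on legitimate DTDs.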
Insecure Deserialization
Serialization is a process during which an object in a programming language (say, a Python object) is converted into a format that can be saved to the database or transferred over a network. Deserialization is the opposite: the serialized object is read from a file or the network and converted back into an object. Many programming languages support the serialization and deserialization of objects, including Java, PHP, Python, and Ruby.
Insecure deserialization is a type of vulnerability that arises when an attacker can manipulate the serialized object and cause unintended consequences in the program’s flow. Insecure deserialization bugs are often very critical vulnerabilities: an insecure deserialization bug will often result in authentication bypass, denial of service, or even arbitrary code execution. Learn how attackers can exploit insecure deserialization.
To prevent insecure deserialization, you need to first keep an eye out for patches and keep dependencies up to date. Many insecure deserialization vulnerabilities are introduced via dependencies, so make sure that your third-party code is secure. It also helps to avoid using serialized objects and utilize simple data types instead, like strings and arrays. Finally, you can learn how to deserialize objects safely to prevent insecure deserialization.
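To make the "simple data types" advice concrete, here is a minimal sketch that accepts JSON instead of a serialized Python object and copies out only the fields it expects. The `load_profile` function and its `name` and `age` fields are assumptions made up for this example.

```python
import json


def load_profile(raw: str) -> dict:
    # json.loads only produces plain data (dict, list, str, int, ...),
    # so parsing cannot trigger code execution the way unpickling can.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    name = data.get("name")
    age = data.get("age")
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("age must be an integer")

    # Return only the validated fields; anything else is dropped.
    return {"name": name, "age": age}
```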
Remote Code Execution
Remote code execution vulnerabilities, or RCE, are a class of vulnerabilities that happen when attackers can execute their code on your machine. One of the ways this can happen is through command injection vulnerabilities. They are a type of remote code execution that happens when user input is concatenated directly into a system command. The application cannot distinguish between where the user input is and where the system command is, so the application executes the user input as code. The attacker will be able to execute arbitrary commands on the machine.
One of the easiest ways to prevent command injection is to implement robust input validation in the form of an allowlist. Read about how to implement allowlists to prevent RCE here.
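A sketch of the allowlist idea might look like the following. It assumes the application wants to ping a user-supplied host on a Unix-like system (the `-c` flag is an assumption about the local `ping`), and `build_ping_command` is a hypothetical helper.

```python
import re

# Allowlist: letters, digits, dots, and hyphens, starting with a letter or digit
# so the value can never be mistaken for a command-line option.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def build_ping_command(host: str) -> list:
    if not HOSTNAME_RE.fullmatch(host):
        raise ValueError("host contains characters outside the allowlist")
    # Return an argument list rather than a single shell string.
    return ["ping", "-c", "1", host]
```

Passing the resulting list to `subprocess.run(...)` without `shell=True` means no shell ever interprets the input, which adds a second layer of protection on top of the allowlist.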
Command injection is also a type of injection issue. Injection happens when an application cannot properly distinguish between untrusted user data and code. When injection happens in system OS commands, it leads to command injection. But injection vulnerabilities manifest in other ways too.
SQL Injection
In an SQL injection attack, for example, the attacker injects data to manipulate SQL commands. When the application does not validate user input properly, attackers can insert characters special to the SQL language to mess with the query’s logic, thereby executing arbitrary SQL code. Learn more about how these SQL injection attacks work here.
SQL injections allow attacker code to change the structure of your application’s SQL queries to steal data, modify data, or potentially execute arbitrary commands in the underlying operating system. The best way to prevent SQL injections is to use parameterized statements, which make SQL injection virtually impossible. Learn about how to use parameterized statements in this article.
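For instance, with the standard library's `sqlite3` module a parameterized statement could look like this. The `users` table and the helper names are assumptions for the sake of the example; other database drivers use the same idea with slightly different placeholder syntax.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # The ? placeholder sends the value separately from the SQL text,
    # so the input is always treated as data, never as part of the query.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()
```

With this approach, a classic payload such as `' OR '1'='1` is simply looked up as an unusual username and matches nothing.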
NoSQL Injection
Databases don’t always use SQL. NoSQL databases, or Not Only SQL databases, are those that don’t use the SQL language. NoSQL injection refers to attacks that inject data into the logic of these database languages. NoSQL injections can be just as serious as SQL injections: they can lead to authentication bypass and remote code execution.
Modern NoSQL databases, such as MongoDB, Couchbase, Cassandra, and HBase, are all vulnerable to injection attacks. NoSQL query syntax is database-specific, and queries are often written in the programming language of the application. For the same reason, methods of preventing NoSQL injection in each database are also database-specific. You can learn how to prevent NoSQL injection in MongoDB, Couchbase, Cassandra, and HBase here.
LDAP Injection
The Lightweight Directory Access Protocol (LDAP) is a way of querying a directory service about the system’s users and devices. For instance, it’s used to query Microsoft’s Active Directory. When an application uses untrusted input in LDAP queries, attackers can submit crafted inputs that cause malicious effects. Using LDAP injection, attackers can bypass authentication and mess with the data stored in the directory. You can use parameterized queries to prevent LDAP injection. Find out how LDAP injections work and how to prevent them.
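Where the LDAP library in use does not offer parameterized filters, a common fallback is to escape the characters that are special in search filters. The sketch below hand-rolls that escaping with the hexadecimal escapes defined for LDAP filter strings; the function names and the `uid` attribute are assumptions, and a real application would normally use the escaping helper shipped with its LDAP library.

```python
# Characters that have special meaning inside an LDAP search filter.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    # Replace each special character with its backslash-hex escape.
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_user_filter(username: str) -> str:
    return f"(uid={escape_ldap_filter_value(username)})"
```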
Log Injection
You probably conduct system logging to monitor for malicious activities going on in your network. But have you ever considered that your log file entries could be lying to you? Log files, like other system files, could be tampered with by malicious actors. Attackers often modify log files to cover up their tracks during an attack. Log injection is one of the ways attackers can change your log files. It happens when the attacker tricks the application into writing fake entries in your log files.
Log injection often happens when the application does not sanitize newline characters (“\n”) in input written to logs. Attackers can use the newline character to insert new entries into application logs. Attackers can also inject malicious HTML into log entries to attempt to trigger an XSS in the browser of the admin who views the logs.
To prevent log injection attacks, you need a way to distinguish between real log entries, and fake log entries injected by the attacker. One way to do this is by prefixing each log entry with extra meta-data like a timestamp, process ID, and hostname. You should also treat the contents of log files as untrusted input and validate it before accessing or operating on it.
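In addition to the metadata prefix, user-controlled values can be neutralized before they are written. A minimal sketch, assuming a hypothetical `sanitize_for_log` helper:

```python
def sanitize_for_log(value: str) -> str:
    # Replace real line breaks with their visible escaped form so one
    # user-supplied value can never span more than one log line.
    return value.replace("\r", "\\r").replace("\n", "\\n")
```

A call such as `logger.info("login failed for %s", sanitize_for_log(username))` then keeps a forged entry on the same line as the genuine one, where it is easy to spot.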
Mail Injection
Many web applications send emails to users based on their actions. For instance, if you subscribed to a feed on a news outlet, the website might send you a confirmation with the name of the feed.
Mail injection happens when the application employs user input to determine which addresses to send emails to. This can allow spammers to use your server to send bulk emails to users or enable scammers to conduct social engineering campaigns via your email address. Learn how attackers can achieve mail injection and how you can prevent it here.
Template Injection
Template engines are a type of software used to determine the appearance of a web page. These web templates, written in template languages such as Jinja, provide developers with a way to specify how a page should be rendered by combining application data with web templates. Together, web templates and template engines allow developers to separate server-side application logic from client-side presentation code during web development.
Template injection refers to injection into web templates. Depending on the permissions of the compromised application, attackers might be able to use the template injection vulnerability to read sensitive files, execute code, or escalate their privileges on the system. Learn how template injections work and how to prevent them in this post.
Regex Injection
A regular expression, or regex, is a special string that describes a search pattern in text. Sometimes, applications let users provide their own regex patterns for the server to execute or build a regex with user input. A regex injection attack, or a regular expression denial of service attack (ReDoS), happens when an attacker provides a regex engine with a pattern that takes a long time to evaluate. You can find examples of these patterns in my post here.
Thankfully, regex injection can be reliably prevented by not generating regex patterns from user input, and by constructing well-designed regex patterns whose required computing time does not grow exponentially as the text string grows. You can find some examples of these preemptive measures here.
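When user input has to appear inside a pattern, one option is to escape it so that it is matched literally. A small sketch using `re.escape` (the `count_occurrences` helper is hypothetical):

```python
import re


def count_occurrences(text: str, user_term: str) -> int:
    # re.escape neutralizes regex metacharacters, so the term is matched
    # as a literal string and cannot introduce nested quantifiers.
    pattern = re.compile(re.escape(user_term))
    return len(pattern.findall(text))
```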
XPath Injection
XPath is a query language used for XML documents. Think SQL for XML. XPath is used to query and perform operations on data stored in XML documents. For example, XPath can be used to retrieve salary information of employees stored in an XML document. It can also be used to perform numeric operations or comparisons on that data.
XPath injection is an attack that injects into XPath expressions in order to alter the outcome of the query. Like SQL injection, it can be used to bypass business logic, escalate user privilege, and leak sensitive data. Since applications often use XML to communicate sensitive data across systems and web services, these are the places that are the most vulnerable to XPath injections. Similar to SQL injection, you can prevent XPath injection by using parameterized queries.
Header Injection
Header injection happens when HTTP response headers are dynamically constructed from untrusted input. Depending on which response header the vulnerability affects, header injection can lead to cross-site scripting, open redirect, and session fixation.
For instance, if the Location header can be controlled by a URL parameter, attackers can cause an open redirect by specifying their malicious site in the parameter. Attackers might even be able to execute malicious scripts on the victim’s browser, or force victims to download malware by sending completely controlled HTTP responses to the victim via header injection. More about how these attacks work here.
You can prevent header injections by avoiding writing user input into response headers, stripping new-line characters from user input (newline characters are used to create new HTTP response headers), and using an allowlist to validate header values.
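The last two measures can be combined in a few lines. In this sketch the `Content-Language` use case, the allowed values, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical allowlist of values the application is willing to emit.
ALLOWED_LANGUAGES = {"en", "fr", "de"}


def strip_newlines(value: str) -> str:
    # CR and LF are what let an attacker start a new header line.
    return value.replace("\r", "").replace("\n", "")


def content_language_header(user_choice: str, default: str = "en") -> str:
    cleaned = strip_newlines(user_choice).strip().lower()
    # Anything not explicitly allowed falls back to a safe default.
    return cleaned if cleaned in ALLOWED_LANGUAGES else default
```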
Session Injection and Insecure Cookies
Session injection is a type of header injection. If an attacker can manipulate the contents of their session cookie, or steal someone else’s cookies, they can trick the application into thinking that they are someone else. There are three main ways that an attacker can obtain someone else’s session: session hijacking, session tampering, and session spoofing.
Session hijacking refers to the attacker stealing someone else’s session cookie and using it as their own. Attackers often steal session cookies with XSS or MITM (man-in-the-middle) attacks. Session tampering refers to when attackers can change their session cookie to change how the server interprets their identity. This happens when the session state is communicated in the cookie and the cookie is not properly signed or encrypted. Finally, attackers can “spoof” sessions when session IDs are predictable. If that’s the case, attackers can forge valid session cookies and log in as someone else. Preventing these session management pitfalls requires multiple layers of defense.
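Two of those layers can be sketched with the standard library: unpredictable session IDs to counter spoofing, and a signature over cookie contents to detect tampering. The names below are hypothetical, and the key is generated at import time only for demonstration; a real application would load it from protected configuration, and most web frameworks provide signed sessions out of the box.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: demo-only key; load a persistent secret from configuration instead.
SECRET_KEY = secrets.token_bytes(32)


def new_session_id() -> str:
    # Cryptographically strong randomness makes IDs infeasible to guess.
    return secrets.token_urlsafe(32)


def sign_cookie(value: str) -> str:
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify_cookie(cookie: str) -> Optional[str]:
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```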
Host Header Poisoning
Web servers often host multiple different websites on the same IP address. After an HTTP request arrives at an IP address, the server will forward the request to the host specified in the Host header. Although Host headers are typically set by a user’s browser, it’s still user-provided input and thus should not be trusted.
If a web application does not validate the Host header before using it to construct addresses, attackers can launch a range of attacks, like XSS, server-side request forgery (SSRF), and web cache poisoning attacks via the Host header. For instance, if the application uses the Host header to determine the location of scripts, the attacker could submit a malicious Host header to make the application execute a malicious script:
String scriptURL = "https://" + properties.getProperty("host") + "/script.js";
Learn more about how Host header attacks work here.
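A Python counterpart of the snippet above, with the missing validation added, might look like this. The host names in `ALLOWED_HOSTS` and the `script_url` helper are placeholders for this sketch.

```python
# Hypothetical set of host names this application legitimately serves.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def script_url(host_header: str) -> str:
    host = host_header.strip().lower()
    # Only build URLs from hosts we recognize; reject everything else.
    if host not in ALLOWED_HOSTS:
        raise ValueError("unrecognized Host header")
    return f"https://{host}/script.js"
```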
Sensitive Data Leaks
A sensitive data leak occurs when an application fails to properly protect sensitive information, giving users access to information they shouldn’t have available to them. This sensitive information can include technical details that aid an attack, like software version numbers, internal IP addresses, sensitive filenames, and file paths. It could also include source code that allows attackers to conduct a source code review on the application. Sometimes, the application leaks private information of users, such as their bank account numbers, email addresses, and mailing addresses.
Some common ways that an application can leak sensitive technical details are through descriptive response headers, descriptive error messages with stack traces or database error messages, open directory listings on the system’s file system, and revealing comments in HTML and template files. You can learn how to prevent data leaks in Python here.
Authentication Bypass
Authentication refers to proving one’s identity before executing sensitive actions or accessing sensitive data. If authentication is not implemented correctly on an application, attackers can exploit these misconfigurations to gain access to functionalities they should not be able to. For more details about how you can configure authentication properly in Python, read this tutorial.
Improper Access Control
Authentication bypass issues are essentially improper access control. Improper access control occurs whenever access control in an application is improperly implemented and can be bypassed by an attacker. However, access control comprises more than authentication. While authentication asks a user to prove their identity: “Who are you?”, authorization asks the application “What is this user allowed to do?”. Proper authentication and authorization together ensure that users cannot access functionalities outside of their permissions.
There are several ways of configuring authorization for users: role-based access control, ownership-based access control, access control lists, and more. A good post to reference for implementing access control is here.
Directory Traversal
Directory traversal vulnerabilities are another type of improper access control. They happen when attackers can view, modify, or execute files they shouldn’t have access to by manipulating file paths in user-input fields. This process involves manipulating file path variables the application uses to reference files by adding the ../ characters or other special characters to the file path. The ../ sequence refers to the parent directory of the current directory in Unix systems, so by adding it to a file path, you can often reach system files outside the web directory.
Attackers can often use directory traversals to access sensitive files like configuration files, log files, and source code. To prevent directory traversals, you should validate user input that is inserted into file paths, or avoid direct references to file names and use indirect identifiers instead. Read this tutorial for more information.
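One way to validate a user-supplied file name is to resolve it against a fixed base directory and confirm the result is still inside that directory. In the sketch below, `BASE_DIR` and `resolve_user_path` are hypothetical.

```python
from pathlib import Path

# Hypothetical directory that user-visible files are allowed to live in.
BASE_DIR = Path("/var/www/uploads")


def resolve_user_path(filename: str, base_dir: Path = BASE_DIR) -> Path:
    base = base_dir.resolve()
    # resolve() collapses any ../ segments and follows symlinks.
    candidate = (base / filename).resolve()
    try:
        # relative_to raises ValueError if candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError("path escapes the base directory") from None
    return candidate
```

Absolute inputs such as `/etc/passwd` are rejected as well, because joining an absolute path replaces the base entirely and the containment check then fails.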
Arbitrary File Writes
Arbitrary file write vulnerabilities work similarly to directory traversals. If an application writes files to the underlying machine and determines the output file name via user input, attackers might be able to create arbitrary files on any path they want, or overwrite existing system files. Attackers might be able to alter critical system files like password files or log files, or add their own executables into script directories.
The best way to mitigate this risk is by not creating file names based on any user input, including session information, HTTP input, or anything that the user controls. You should control the file name, path, and extension for every created file. For instance, you can generate a random alphanumeric filename every time the user needs to generate a unique file. You can also strip user input of special characters before creating the file. Learn about these techniques in this post.
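The random-filename suggestion could be sketched as follows, assuming a hypothetical `UPLOAD_DIR` and a small allowlist of extensions chosen by the application rather than the user.

```python
import secrets
from pathlib import Path

# Assumptions for this sketch: a fixed output directory and permitted extensions.
UPLOAD_DIR = Path("/var/app/uploads")
ALLOWED_EXTENSIONS = {".txt", ".csv", ".pdf"}


def new_upload_path(extension: str = ".txt") -> Path:
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("extension is not allowed")
    # The name comes from a secure random source, never from user input.
    return UPLOAD_DIR / f"{secrets.token_hex(16)}{extension}"
```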
Denial of Service Attacks
Denial of service attacks, or DoS attacks, disrupt the target machine so that legitimate users cannot access its services. Attackers can launch DoS attacks by exhausting all the server’s resources, crashing processes, or making too many time-consuming HTTP requests at once.
Denial of service attacks are hard to defend against. But there are ways to minimize your risk by making it as difficult as possible for attackers. For instance, you can deploy a firewall that offers DoS protection, and prevent logic-based DoS attacks by setting limits on file sizes and disallowing certain file types. You can find more detailed steps on preventing denial of service attacks here.
Encryption Vulnerabilities
Encryption issues are probably among the most severe vulnerabilities that can happen in an application. Encryption vulnerabilities refer to when encryption and hashing are not properly implemented. This can lead to widespread data leaks and authentication bypass through session spoofing.
Some common mistakes developers make when implementing encryption on a site are:
- Using weak algorithms
- Using the wrong algorithm for the purpose
- Creating custom algorithms
- Generating weak random numbers
- Mistaking encoding for encryption
A guide to encryption security can be found here.
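As one example of picking the right algorithm for the purpose, passwords are better served by a slow, salted key-derivation function than by a fast general-purpose hash. The sketch below uses PBKDF2 from the standard library; the iteration count is an assumed work factor that should be tuned for your hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; higher is slower for attackers and for you.
ITERATIONS = 600_000


def hash_password(password: str):
    # A fresh random salt per password, from a secure random source.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(candidate, expected)
```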
Insecure TLS Configuration and Improper Certificate Validation
Besides encrypting the information in your data stores properly, you should also make sure that your application is transmitting data properly. A good way of making sure that you are communicating over the Internet securely is to use HTTPS with a modern version of transport layer security (TLS) and a secure cipher suite.
During this process, you need to ensure that you are communicating with a trusted machine, and not a malicious third-party. TLS uses digital certificates as the basis of its public key encryption, and you need to validate these certificates before establishing the connection with the third-party. You should verify that the server you are trying to connect to has a certificate that is issued by a trusted certificate authority (CA) and that none of the certificates in the certificate chain are expired.
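In Python, the standard library's `ssl.create_default_context()` already turns on certificate and host name verification. A minimal sketch that also refuses protocol versions older than TLS 1.2 (the `make_client_context` name is hypothetical):

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # The default context verifies the certificate chain against the
    # system's trusted CAs and checks that the host name matches.
    context = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The main thing to avoid is weakening these defaults, for example by setting `check_hostname = False` or `verify_mode = ssl.CERT_NONE` to silence a certificate error.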
Mass Assignment
“Mass assignment” refers to the practice of assigning values to multiple variables or object properties all at once. Mass assignment vulnerabilities happen when the application automatically assigns user input to multiple program variables or objects. This is a feature in many application frameworks designed to simplify application development.
However, this feature sometimes allows attackers to overwrite, modify, or create new program variables or object properties at will. This can lead to authentication bypass and manipulation of program logic. To prevent mass assignments, you can disable the mass assignment feature with the framework you are using, or use an allowlist to only allow assignment on certain properties or variables.
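The allowlist approach can be illustrated with a plain dataclass. The `User` model, its fields, and `apply_profile_update` are assumptions made up for this sketch; frameworks typically expose an equivalent setting on their forms or serializers.

```python
from dataclasses import dataclass


@dataclass
class User:
    display_name: str
    email: str
    is_admin: bool = False


# Only these properties may be set from user-submitted data.
UPDATABLE_FIELDS = {"display_name", "email"}


def apply_profile_update(user: User, submitted: dict) -> User:
    for field in UPDATABLE_FIELDS:
        if field in submitted:
            setattr(user, field, submitted[field])
    # Keys outside the allowlist, such as is_admin, are silently ignored.
    return user
```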
Open Redirects
Websites often need to automatically redirect their users. For example, this scenario happens when unauthenticated users try to access a page that requires logging in. The website will usually redirect those users to the login page, and then return them to their original location after they are authenticated.
During an open-redirect attack, an attacker tricks the user into visiting an external site by providing them with a URL from the legitimate site that redirects somewhere else. This can lead users to believe that they are still on the original site, and help scammers build a more believable phishing campaign.
To prevent open redirects, you need to make sure the application doesn’t redirect users to malicious locations. For instance, you can disallow offsite redirects completely by validating redirect URLs. There are many other ways of preventing open redirects, like checking the referrer of requests, or using page indexes for redirects. But because it’s difficult to validate URLs, open redirects remain a prevalent issue in modern web applications.
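Disallowing offsite redirects can be approximated by accepting only same-site relative paths. The following is a sketch, not a complete validator, and `safe_redirect_target` is a hypothetical helper.

```python
from urllib.parse import urlparse


def safe_redirect_target(target: str, default: str = "/") -> str:
    try:
        parsed = urlparse(target)
    except ValueError:
        return default
    # Reject absolute URLs and scheme-relative URLs like //evil.example.
    if parsed.scheme or parsed.netloc:
        return default
    # Require a single leading slash; reject backslashes and control
    # characters, which some browsers normalize into an offsite URL.
    if not target.startswith("/") or target.startswith("//"):
        return default
    if "\\" in target or any(ord(ch) < 32 for ch in target):
        return default
    return target
```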
Cross-Site Request Forgery
Cross-site request forgery (CSRF) is a client-side technique used to attack other users of a web application. Using CSRF, attackers can send HTTP requests that pretend to come from the victim, carrying out unwanted actions on a victim’s behalf. For example, an attacker could change your password or transfer money from your bank account without your permission.
Unlike open redirects, there is a reliable way of preventing CSRF: use a combination of CSRF tokens and SameSite cookies, and avoid using GET requests for state-changing actions.
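The token half of that defense boils down to issuing an unpredictable value per session and checking it on every state-changing request. A minimal sketch, where `session` stands in for whatever server-side session store the application uses (an assumption) and both function names are hypothetical:

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    # Store an unguessable token server-side and embed it in the form.
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session: dict, submitted: str) -> bool:
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison of the stored and submitted tokens.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

Most Python web frameworks ship this mechanism already, so enabling the built-in protection is usually preferable to writing your own.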
Server-Side Request Forgery
SSRF, or Server Side Request Forgery, is a vulnerability that happens when an attacker is able to send requests on behalf of a server. It allows attackers to “forge” the request signatures of the vulnerable server, therefore assuming a privileged position on a network, bypassing firewall controls, and gaining access to internal services.
Depending on the permissions given to the vulnerable server, an attacker might be able to read sensitive files, make internal API calls, and access internal services like hidden admin panels. The easiest way to prevent SSRF vulnerabilities is to never make outbound requests based on user input. But if you do need to make outbound requests based on user input, you’ll need to validate those addresses before initiating the request.
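Validating an address before requesting it could start with a check like the one below, which rejects URLs whose host resolves to a private, loopback, or link-local address. `is_public_url` is a hypothetical helper. The check alone does not stop DNS rebinding, because the name may resolve differently when the real request is made, so treat it as one layer rather than a complete defense.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_url(url: str) -> bool:
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the host; numeric IPs are returned as-is.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        try:
            ip = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # is_global is False for private, loopback, and link-local ranges.
        if not ip.is_global:
            return False
    return True
```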
Trust Boundary Violations
“Trust boundaries” refer to where untrusted user input enters a controlled environment. For instance, an HTTP request is considered untrusted input until it has been validated by the server.
There should be a clear distinction between how you store, transport, and process trusted and untrusted input. Trust boundary violations happen when this distinction is not respected, and trusted and untrusted data are confused with each other. For instance, if trusted and untrusted data are stored in the same data structure or database, the application will start confusing the two. In this case, untrusted data might be mistakenly seen as validated.
A good way to prevent trust boundary violations is to never write untrusted input into session stores until it is verified. See an example of this mitigation implemented in Python here.
What other security concepts do you want to learn about? I’d love to know. Feel free to connect on Twitter @vickieli7.
Now that you know how to fix these vulnerabilities, secure your Python application by scanning for these vulnerabilities! ShiftLeft CORE can find these vulnerabilities in your application, show you how to fix these bugs, and protect you from Python security issues.