The Hidden Danger in PDFs: How Misconfigurations Can Expose Sensitive Data?

Patryk Bogdan

28 January, 2025

Overview

A recent security audit revealed a critical vulnerability in the way an application used WeasyPrint to generate PDF invoices from user-provided data. The issue stems from insufficient input validation, which allows attackers to inject HTML tags that WeasyPrint then renders into the generated PDF. This flaw opens the door to extracting sensitive files from the application's infrastructure and to sending requests to remote resources from the server, posing significant security risks.

Vulnerability Breakdown

The application embeds user input, such as first names and surnames, directly into the HTML that the WeasyPrint engine renders into PDFs. Because this input is not sanitized, attackers can inject arbitrary HTML tags. For instance:

Tags like <b> or <h3> let attackers manipulate the formatting of the PDF content.
Tags like <link rel="attachment"> make WeasyPrint embed the referenced file, local or remote, in the PDF as an attachment.

Such crafted inputs exploit WeasyPrint's default configuration, which fetches the resources the HTML references, enabling unauthorized reads of files on the server and requests to internal or external hosts. Attackers can leverage this capability to extract sensitive system data and perform internal reconnaissance.
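A minimal sketch of the injection path, assuming the application builds its invoice HTML with plain string formatting before handing it to WeasyPrint. The audit does not show the real template, so `INVOICE_TEMPLATE`, `render_unsafe`, and `render_safe` are hypothetical names. The sketch contrasts direct interpolation with escaping through Python's standard `html.escape`:

```python
import html

# Hypothetical invoice template: the audited application's real template is not shown.
INVOICE_TEMPLATE = "<html><body><h1>Invoice</h1><p>Customer: {name}</p></body></html>"


def render_unsafe(name: str) -> str:
    """Vulnerable pattern: user input is interpolated into the HTML as-is."""
    return INVOICE_TEMPLATE.format(name=name)


def render_safe(name: str) -> str:
    """Safer pattern: <, >, &, and quotes are escaped, so injected tags become plain text."""
    return INVOICE_TEMPLATE.format(name=html.escape(name, quote=True))


if __name__ == "__main__":
    surname = '<link rel="attachment" href="file:///etc/passwd">'
    # The <link> element survives intact; WeasyPrint would treat it as an attachment request.
    print(render_unsafe(surname))
    # The tag is emitted as &lt;link rel=&quot;attachment&quot; ...&gt;, i.e. literal text in the PDF.
    print(render_safe(surname))
```

Escaping at the point where data enters the HTML is usually more reliable than trying to strip tags afterwards, because the HTML parser never receives markup it could act on.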

Real-World Exploitation

During testing, the auditors demonstrated how severe this vulnerability is. The retrieved files contained:

Payment operator tokens
Credentials for SMS and email gateways
PostgreSQL database access credentials
Hosting system access credentials
JWT encryption keys

Additionally, the vulnerability made it possible to send requests to internal infrastructure from the server that processes the PDFs, highlighting its potential for lateral movement within the application environment.

Finding the Vulnerability

Now, let's move on to the interesting part! To identify this vulnerability and prove that it exists, I sent the following request:

Screenshot of malicious payload

The following response shows that the server accepted the malicious payload and started generating the PDF:

Screenshot of server response

I then downloaded the PDF file generated from the malicious payload:

Screenshot of PDF download

Below is the Python script (ex.py) used during the analysis to extract attachments from the PDF:

Screenshot of Python script
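The script itself is only shown as a screenshot, so the following is not ex.py. It is a minimal, standard-library-only sketch of the same idea, assuming each attachment is stored as an embedded-file stream whose dictionary carries the optional /Type /EmbeddedFile entry and which is either uncompressed or /FlateDecode-compressed. It scans the raw bytes with regular expressions instead of properly parsing the PDF, so treat it as illustrative; `extract_embedded_files` and the `attachment_N.bin` output names are hypothetical, and a full PDF library such as pypdf is the more robust choice.

```python
import re
import sys
import zlib
from pathlib import Path

# One indirect object that carries a stream: "N G obj << dict >> stream ... endstream".
# The dictionary part may not run past "endobj", so each match stays inside one object.
STREAM_OBJECT = re.compile(
    rb"\d+\s+\d+\s+obj\s*<<(?P<dict>(?:(?!endobj).)*?)>>\s*stream\r?\n(?P<data>.*?)\r?\nendstream",
    re.DOTALL,
)
# Assumption: the producer writes the optional /Type /EmbeddedFile entry.
# The \b keeps "/EmbeddedFiles" (the catalog's name-tree key) from matching.
EMBEDDED_TYPE = re.compile(rb"/Type\s*/EmbeddedFile\b")


def extract_embedded_files(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded contents of every embedded-file stream found in pdf_bytes."""
    results = []
    for match in STREAM_OBJECT.finditer(pdf_bytes):
        stream_dict = match.group("dict")
        if not EMBEDDED_TYPE.search(stream_dict):
            continue  # page content, font, image, ...: not an attachment
        data = match.group("data")
        if b"/FlateDecode" in stream_dict:
            data = zlib.decompress(data)  # other filters are not handled in this sketch
        results.append(data)
    return results


if __name__ == "__main__":
    pdf = Path(sys.argv[1]).read_bytes()
    for index, content in enumerate(extract_embedded_files(pdf)):
        # Hypothetical naming: the real file names live in the /Filespec dictionaries.
        out = Path(f"attachment_{index}.bin")
        out.write_bytes(content)
        print(f"wrote {out} ({len(content)} bytes)")
```

Stream objects cannot be packed into compressed object streams, so the embedded file's dictionary should appear in plain text even when the rest of the PDF's objects are compressed; a producer that omits /Type or uses a filter other than /FlateDecode would defeat this sketch.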

Next, I used the ex.py script to extract the attachment from the PDF document:

Screenshot of script execution

After running the script, the following file appears in the directory:

Screenshot of extracted file

Displaying the contents of the extracted file reveals highly sensitive environment variables:

Screenshot of environment variables

That is not all! I then downloaded a second PDF file, generated from another malicious payload:

Screenshot of second PDF

As you can see, the contents of the extracted /etc/passwd file are displayed, confirming unauthorized access to local files:

Screenshot of /etc/passwd contents

Root Cause

This vulnerability stems from WeasyPrint's default configuration, which allows unrestricted access to local files and external resources. Without stringent input validation and output sanitization, the software effectively becomes a bridge for unauthorized data extraction.

To address this vulnerability, organizations should:

Implement Input Validation and Sanitization: User-generated data should be rigorously validated and HTML-escaped (or stripped of any HTML or script tags) before being incorporated into documents.
Restrict Resource Access: Limit the software's access to local files and external resources, allowing only those essential for its operation.
Environment Hardening: a) Keep sensitive configuration files segregated on machines other than the one that generates PDFs. b) Apply the principle of least privilege to the processes involved in PDF generation.
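One way to restrict resource access inside the application is to give WeasyPrint a custom URL fetcher. WeasyPrint's documented `url_fetcher` parameter accepts a callable with the same signature as its `default_url_fetcher`, and a custom fetcher is expected to raise an exception for resources it will not fetch. The sketch below wraps any such fetcher with an allowlist; `make_allowlist_fetcher`, the host `static.example.com`, and the https-only policy are assumptions for illustration, so check the fetcher interface in the documentation for the WeasyPrint version you use:

```python
from typing import Any, Callable, Dict, Iterable
from urllib.parse import urlsplit

# Same shape as a WeasyPrint URL fetcher: takes a URL, returns a dict describing the resource.
Fetcher = Callable[..., Dict[str, Any]]


def make_allowlist_fetcher(
    fetch: Fetcher, allowed_hosts: Iterable[str], allow_data_urls: bool = True
) -> Fetcher:
    """Wrap `fetch` so only https URLs on allowlisted hosts (and optionally data: URLs) are fetched."""
    hosts = {host.lower() for host in allowed_hosts}

    def fetcher(url: str, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        parts = urlsplit(url)  # scheme and hostname come back lowercased
        if parts.scheme == "data" and allow_data_urls:
            return fetch(url, *args, **kwargs)  # inline resources are decoded locally
        if parts.scheme == "https" and parts.hostname in hosts:
            return fetch(url, *args, **kwargs)
        # file://, http://, other schemes, and unknown hosts are all refused.
        raise ValueError(f"Blocked resource URL: {url!r}")

    return fetcher


# Hypothetical wiring into WeasyPrint (not executed here):
#   from weasyprint import HTML, default_url_fetcher
#   safe_fetcher = make_allowlist_fetcher(default_url_fetcher, {"static.example.com"})
#   HTML(string=rendered_html, url_fetcher=safe_fetcher).write_pdf("invoice.pdf")
```

Comparing parsed host names is preferred over prefix matching on the raw URL string: a check such as url.startswith("https://static.example.com") would also accept https://static.example.com.attacker.test/. This complements, rather than replaces, escaping user input.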
