RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine. In versions prior to 0.23.0, a low-privileged authenticated user (normal login account) can execute arbitrary system commands on the server host process via the frontend Canvas CodeExec component, completely bypassing sandbox isolation. This occurs because untrusted data (stdout) is parsed using eval() with no filtering or sandboxing. The intended design was to "automatically convert string results into Python objects," but this effectively executes attacker-controlled code. Additional endpoints lack access control or contain inverted permission logic, significantly expanding the attack surface and enabling chained exploitation. Version 0.23.0 contains a patch for the issue.
RAGFlow, an open-source Retrieval-Augmented Generation engine, contains a critical remote code execution (RCE) vulnerability that requires authentication but only low privileges. Any user with a normal login account can execute arbitrary system commands in the server host process via the Canvas CodeExec component, which can lead to complete system compromise and data exfiltration. The flaw stems from calling eval() on attacker-controlled stdout, which bypasses the intended sandbox isolation.
Step 1: Authentication: An attacker obtains valid credentials for a low-privileged user account on the RAGFlow instance.
Step 2: Payload Injection: The attacker writes code for the CodeExec component whose standard output is itself a Python expression that runs system commands (e.g., os.system('whoami') or subprocess.run(['cat', '/etc/passwd'])). The code that runs inside the sandbox can be harmless; the text it prints to stdout is the actual payload.
Step 3: Code Execution: The attacker submits the crafted code through the frontend Canvas CodeExec component, which runs it and captures its stdout.
Step 4: eval() Execution: To convert the string result into a Python object, the server passes the captured stdout to eval() with no filtering or sandboxing.
Step 5: Command Execution: eval() evaluates the attacker-controlled expression in the server host process, outside the sandbox, so the attacker's commands run on the host operating system.
Step 6: Privilege Escalation (Potential): Depending on the system configuration and the attacker's payload, further exploitation could lead to privilege escalation or lateral movement within the network.
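The flow in the steps above can be illustrated with a harmless, self-contained sketch. It is not RAGFlow's code: the child process is only a stand-in for the sandbox, and the names USER_CODE, run_isolated, and naive_parse are hypothetical. The printed expression simply asks for a process ID, which makes it visible that eval() runs the text in the host process rather than in the isolated child.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for code submitted to a code-execution component.
# The code itself only prints text; that text happens to be a Python expression.
USER_CODE = 'print("__import__(\'os\').getpid()")'


def run_isolated(code):
    """Run code in a child process (a stand-in for a sandbox).

    Returns a (child_pid, stdout_text) tuple.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    return proc.pid, out.strip()


def naive_parse(stdout_text):
    """Vulnerable pattern: evaluates untrusted text in the calling (host) process."""
    return eval(stdout_text)


child_pid, stdout_text = run_isolated(USER_CODE)
result = naive_parse(stdout_text)

# The expression was evaluated by the host, not by the isolated child.
print("child pid:", child_pid)
print("host pid: ", os.getpid())
print("eval gave:", result)
```

The value returned by eval() matches the host's process ID, not the child's. Whatever isolation the child process had no longer applies once its output is evaluated by the parent.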
The vulnerability lies in the Canvas CodeExec component of RAGFlow, specifically in how it handles the output of executed code. The component passes the stdout of a code execution to eval(), intending to convert string results into Python objects, but it does not sanitize or validate that stdout first. Because the submitted code controls what it prints, an attacker can place arbitrary Python code in the stdout stream, and the server then executes it in the host process, outside the sandbox. The unfiltered, unsandboxed eval() call is the root cause. In addition, other endpoints with missing or inverted access control expand the attack surface and enable chained exploitation. This is a classic eval injection (code injection) flaw, closely related to insecure deserialization: attacker-controlled data ends up controlling program execution.
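For comparison, the stated goal of converting string results into Python objects can be met without executing code. The following is a minimal sketch under that assumption, not the patch shipped in version 0.23.0; parse_stdout_safely is a hypothetical helper name. It tries JSON first, then ast.literal_eval, which accepts only literals (strings, numbers, tuples, lists, dicts, sets, booleans, None), and otherwise returns the raw text.

```python
import ast
import json


def parse_stdout_safely(stdout_text):
    """Convert untrusted text into a Python object without executing code.

    Hypothetical helper: tries JSON, then Python literals, and falls back
    to returning the stripped text unchanged.
    """
    text = stdout_text.strip()

    # JSON covers the common structured-output case.
    try:
        return json.loads(text)
    except ValueError:
        pass

    # literal_eval accepts only literal syntax; function calls, attribute
    # access, and names such as __import__ are rejected with ValueError.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        return text


print(parse_stdout_safely("[1, 2, 3]"))
print(parse_stdout_safely("{'a': (1, 2)}"))
print(parse_stdout_safely("__import__('os').getpid()"))
```

With this approach an expression such as __import__('os').getpid() comes back as an inert string instead of being evaluated. Very large or deeply nested literals can still exhaust memory or recursion limits, so a size limit on the input is a sensible companion measure.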