RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine. In versions prior to 0.22.0, API keys and beta (assistant/agent share auth) tokens are generated by an insecure algorithm that makes the two token types mutually derivable. Specifically, both tokens are generated using the same `URLSafeTimedSerializer` with predictable inputs, so an unauthorized user who obtains a shared assistant/agent URL can derive the owner's personal API key. This grants them full control over the assistant/agent owner's account. Version 0.22.0 fixes the issue.
The vulnerability is critical: a shared assistant/agent URL is enough to derive the owner's API key. An attacker who does so gains full control of the victim's account, which can lead to data breaches, unauthorized access, and malicious use of the RAGFlow platform.
Step 1: Victim Creates Assistant/Agent: The legitimate user creates an assistant or agent within RAGFlow and shares it, generating a beta (share auth) URL.
Step 2: Attacker Obtains Shared URL: The attacker gains access to the shared assistant/agent URL (e.g., through phishing, social engineering, or accidental exposure).
Step 3: Token Derivation: The attacker extracts the beta token from the shared URL and examines its structure. Because the beta token and the API key are both produced by the same `URLSafeTimedSerializer` with predictable inputs, the attacker can derive the owner's API key from the beta token.
Step 4: API Key Usage: The attacker uses the derived API key to authenticate to the RAGFlow API, gaining full control over the victim's account, including access to data, models, and potentially other resources.
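Steps 3 and 4 can be illustrated with a toy model. This is a sketch only, not RAGFlow's code: it uses Python's standard `hmac` module in place of `URLSafeTimedSerializer`, and the function names, the `tenant-123` identifier, and the derivation chain itself (API key computed from the beta token) are assumptions invented for illustration. What it shows is that when one deterministic routine with no secret, per-token input produces both tokens, holding one token is enough to recompute the other.

```python
import hashlib
import hmac


def derive_token(key_material: str, label: str) -> str:
    """Toy stand-in for a shared token routine: deterministic, no randomness."""
    mac = hmac.new(key_material.encode(), label.encode(), hashlib.sha256)
    return mac.hexdigest()[:32]


def issue_beta_token(tenant_id: str) -> str:
    # Hypothetical: the share token depends only on a predictable identifier.
    return derive_token(tenant_id, "beta")


def issue_api_key(beta_token: str) -> str:
    # Hypothetical: the API key is a fixed function of the beta token.
    return derive_token(beta_token, "api")


def attacker_derive_api_key(beta_token_from_url: str) -> str:
    # The attacker needs nothing beyond the token embedded in the shared URL.
    return derive_token(beta_token_from_url, "api")


beta = issue_beta_token("tenant-123")
owner_api_key = issue_api_key(beta)
print(attacker_derive_api_key(beta) == owner_api_key)  # prints True
```

In this model the relation runs in one direction only, which is already sufficient for the attack; the advisory describes the real tokens as mutually derivable.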
The vulnerability stems from using the same `URLSafeTimedSerializer`, with predictable inputs, to generate both API keys and beta (assistant/agent share auth) tokens. The serializer, likely configured with a shared secret key and fed predictable timestamps or other metadata, contributes too little entropy and no unique per-token input, so nothing cryptographically separates the two token types. As a result the tokens are mutually derivable: an attacker who obtains a shared assistant/agent URL can work back from the beta token to the owner's API key and gain unauthorized access.
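One way to get the separation described above is to make every token an independent random value. The sketch below is a general hardening pattern, not a description of the change shipped in 0.22.0; `new_token`, `fingerprint`, `verify`, and the `api`/`share` prefixes are hypothetical names chosen for this example.

```python
import hashlib
import hmac
import secrets


def new_token(prefix: str) -> str:
    # 32 bytes from the OS CSPRNG: the value has no relation to any other token.
    return f"{prefix}-{secrets.token_urlsafe(32)}"


def fingerprint(token: str) -> str:
    # Store only a hash server-side, so a database leak does not expose tokens.
    return hashlib.sha256(token.encode()).hexdigest()


def verify(presented: str, stored_fingerprint: str) -> bool:
    # Constant-time comparison avoids leaking match position through timing.
    return hmac.compare_digest(fingerprint(presented), stored_fingerprint)


api_key = new_token("api")        # kept private by the account owner
share_token = new_token("share")  # embedded in the shared URL

stored = fingerprint(api_key)
print(verify(api_key, stored))      # prints True
print(verify(share_token, stored))  # prints False
```

Because the two values are drawn independently, knowing the share token gives an attacker no information about the API key, and either one can be revoked without touching the other.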