Processing LLM Chats

PII Eraser provides specialized /chat/* endpoints designed to sit between your application and Large Language Models (LLMs) like GPT-5, Claude Opus, or Gemini. These endpoints accept and return messages in the standard OpenAI Chat Completions format, so you can sanitize conversations before they leave your infrastructure and forward the output directly to any compatible LLM provider.

Why Use the Chat Endpoint?

You could process chat messages one by one using the Text API, which is, to our knowledge, what all alternatives offered by other vendors do. The Chat endpoints offer several significant advantages over that approach:

Improved Accuracy via Conversational Context: The model sees the full conversation history when processing each message. This dramatically improves detection accuracy for ambiguous terms that depend on earlier turns. For example, if the assistant says "Please give me your phone number to put in the email I'm drafting" and the user responds with "4955 4743", PII Eraser uses the earlier context to correctly identify "4955 4743" as PHONE with higher confidence. This is especially important for numerical PII and PCI, which are some of the most sensitive information types.
No Custom Parsing Logic: The endpoints preserve the OpenAI Chat Completions message format (system, user, assistant, or any custom role). The output can be forwarded directly to an LLM provider without writing complex parsing or reconstruction logic.
Selective Role Processing: You can choose to only process specific roles (e.g., user messages only), leaving system prompts and assistant responses untouched. Messages from excluded roles may still be used internally as context for better detection accuracy.
Incremental Processing: The message_start_index parameter allows you to skip messages that have already been processed in previous requests, resulting in faster processing speeds. Earlier messages are still used for context where necessary, but entities are only returned for messages at or after the specified index.

Stateless Design

Similar to OpenAI's API, the PII Eraser API is stateless. Conversation history isn't stored between requests, nor is there any caching, such as a KV Cache. You must send the relevant conversation history in the messages array with every request.

Message Format

Each message in the messages array must contain a role and content field:

{"role": "user", "content": "My name is Anna Schneider."}

The role field is not limited to the standard OpenAI roles (system, user, assistant). Custom roles such as user1, user2, or agent are also valid, which is useful for multi-party conversation scenarios like call center transcripts.
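As a rough illustration of the multi-party case, the sketch below converts a transcript into a messages array with custom roles. The `transcript_to_messages` helper and the "speaker: utterance" input layout are assumptions made for this example, not part of PII Eraser.

```python
from typing import Dict, List


def transcript_to_messages(transcript: str) -> List[Dict[str, str]]:
    """Turn 'speaker: utterance' lines into role/content messages.

    The 'speaker: utterance' layout is an assumed input format; the
    speaker label is passed through unchanged as the custom role.
    """
    messages = []
    for line in transcript.splitlines():
        speaker, sep, utterance = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a speaker label
        messages.append({"role": speaker.strip(), "content": utterance.strip()})
    return messages


transcript = """agent: Thank you for calling, who am I speaking with?
user1: This is Anna Schneider.
user2: And I'm her husband, I'm also on the line."""

payload = {"messages": transcript_to_messages(transcript)}
for msg in payload["messages"]:
    print(msg["role"], "->", msg["content"])
```

The resulting payload can be posted to any of the /chat/* endpoints in the same way as a conversation that uses the standard roles.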

Extra fields are stripped

Only the role and content fields are processed. Any additional fields on a message object (such as reasoning_content) will be removed from the output. If you need to preserve these fields, store them separately and re-attach them after processing.
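One way to store and re-attach extra fields is sketched below. The helper names `split_extra_fields` and `reattach_extra_fields` are hypothetical, and the sketch assumes the returned messages array has the same length and order as the one you sent.

```python
from typing import Any, Dict, List, Tuple

CORE_FIELDS = ("role", "content")


def split_extra_fields(
    messages: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate role/content from any other per-message fields."""
    core = [{k: m[k] for k in CORE_FIELDS} for m in messages]
    extras = [{k: v for k, v in m.items() if k not in CORE_FIELDS} for m in messages]
    return core, extras


def reattach_extra_fields(
    processed: List[Dict[str, Any]], extras: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Merge the stored extras back into the processed messages by position."""
    if len(processed) != len(extras):
        raise ValueError("message count changed during processing")
    return [{**msg, **extra} for msg, extra in zip(processed, extras)]


original = [
    {"role": "user", "content": "Hi, my name is Anna."},
    {"role": "assistant", "content": "Hello Anna, how can I help?",
     "reasoning_content": "The user introduced herself."},
]
to_send, extras = split_extra_fields(original)

# Hand-written stand-in for the "messages" array returned by /chat/transform.
processed = [
    {"role": "user", "content": "Hi, my name is <NAME>."},
    {"role": "assistant", "content": "Hello <NAME>, how can I help?"},
]
restored = reattach_extra_fields(processed, extras)
print(restored[1]["reasoning_content"])
```

Keep in mind that re-attached fields never pass through PII Eraser, so any sensitive data they contain has to be handled separately.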

Counting Tokens

Similar to /text/count_tokens, use /chat/count_tokens to check the token count of a conversation before processing. This is useful for estimating processing time or verifying that the conversation fits within the configured max_tokens limit.

import json
import requests

payload = {
    "messages": [
        {"role": "system", "content": "Your name is Alfred and you are a helpful assistant."},
        {"role": "user", "content": "Hi, my name is Anna and I need you to write me an email"},
    ]
}

r = requests.post("http://localhost:8000/chat/count_tokens", json=payload)
print(json.dumps(r.json(), indent=4))

Response:

{
    "total_tokens": 35
}
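
A small guard built on this response might look like the following sketch. The `remaining_token_budget` helper is hypothetical, and the limit of 4096 is a placeholder rather than a documented default; substitute the max_tokens value from your own configuration.

```python
def remaining_token_budget(count_response: dict, max_tokens: int) -> int:
    """Return the tokens left under max_tokens (negative when over the limit)."""
    return max_tokens - count_response["total_tokens"]


# Sample /chat/count_tokens response from above.
count_response = {"total_tokens": 35}

# Placeholder limit for illustration only; use your configured max_tokens.
MAX_TOKENS = 4096

remaining = remaining_token_budget(count_response, MAX_TOKENS)
if remaining < 0:
    print(f"Conversation exceeds the limit by {-remaining} tokens")
else:
    print(f"Conversation fits with {remaining} tokens to spare")
```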

Detect

Use /chat/detect when you need to identify PII in a chat in OpenAI Chat Completions format without modifying the message content. This is useful for compliance logging, audit trails, or flagging conversations that contain sensitive data.

Request:

import json
import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, my name is Anna and I need you to write me an email"}
    ],
    "entity_types": ["NAME", "EMAIL", "PHONE"]
}

r = requests.post("http://localhost:8000/chat/detect", json=payload)
print(json.dumps(r.json(), indent=4))

Response:

The response provides a list of lists, where each inner list contains all entities detected in the corresponding message. Low-confidence entities are filtered out before the response is returned. Please see the API Reference for further details.

{
    "entities": [
        [],
        [
            {
                "entity_type": "NAME",
                "start": 15,
                "end": 19,
                "score": 0.9768698215484619
            }
        ]
    ],
    "stats": {
        "total_tokens": 30,
        "tps": 2503.13
    }
}
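
To flag messages or recover the detected text, you can pair each inner list with its message. The sketch below assumes that start and end are character offsets into the message content with an exclusive end, which matches the sample response above; `extract_entity_text` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def extract_entity_text(
    messages: List[Dict[str, str]], entities: List[List[Dict[str, Any]]]
) -> List[Tuple[int, str, str]]:
    """Return (message_index, entity_type, text) for every detected entity.

    Assumes start/end are character offsets with an exclusive end.
    """
    found = []
    for index, (message, message_entities) in enumerate(zip(messages, entities)):
        for entity in message_entities:
            text = message["content"][entity["start"]:entity["end"]]
            found.append((index, entity["entity_type"], text))
    return found


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, my name is Anna and I need you to write me an email"},
]
# The "entities" value from the sample /chat/detect response above.
entities = [
    [],
    [{"entity_type": "NAME", "start": 15, "end": 19, "score": 0.9768698215484619}],
]

# Indices of messages that contain at least one entity.
flagged = [i for i, message_entities in enumerate(entities) if message_entities]

print(flagged)                                   # [1]
print(extract_entity_text(messages, entities))   # [(1, 'NAME', 'Anna')]
```

For audit logs, consider recording only the message index and entity type rather than the extracted text, so that the log itself does not hold PII.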

Transform

Use /chat/transform to anonymize a chat in OpenAI Chat Completions format. The response includes both the anonymized messages array, which can be sent to cloud LLM APIs, and a separate entities list.

All four operators (redact, mask, hash, redact_constant) function identically to their /text/transform counterparts described in Processing Text – Transformation. The operator can be set per-request or as a default in config.yaml, with the request value taking precedence.

Request:

import json
import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, my name is Anna and I need you to write me an email"}
    ],
    "operator": "redact"
}

r = requests.post("http://localhost:8000/chat/transform", json=payload)
print(json.dumps(r.json(), indent=4))

Response:

The response provides the anonymized messages array alongside a list of transformed entities per message, with start/end positions referencing the transformed output text. Please see the API Reference for further details.

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hi, my name is <NAME> and I need you to write me an email"
        }
    ],
    "entities": [
        [],
        [
            {
                "entity_type": "NAME",
                "output_start": 15,
                "output_end": 21
            }
        ]
    ],
    "stats": {
        "total_tokens": 30,
        "tps": 1388.51
    }
}

The difference between detect and transform entities is the same as for the text endpoints; see Detect vs. Transform Entities.
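
Because output_start and output_end refer to the transformed text, they can be used to locate the replacement values in the returned messages. The following sketch works on the sample response above and assumes the offsets are character positions with an exclusive end; both helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def count_entity_types(response: Dict[str, Any]) -> Dict[str, int]:
    """Count transformed entities by type across all messages."""
    counts: Counter = Counter()
    for message_entities in response["entities"]:
        for entity in message_entities:
            counts[entity["entity_type"]] += 1
    return dict(counts)


def replacement_values(response: Dict[str, Any]) -> List[str]:
    """Slice each transformed span out of the anonymized message content."""
    values = []
    for message, message_entities in zip(response["messages"], response["entities"]):
        for entity in message_entities:
            span = message["content"][entity["output_start"]:entity["output_end"]]
            values.append(span)
    return values


# Sample /chat/transform response from above (stats omitted).
response = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, my name is <NAME> and I need you to write me an email"},
    ],
    "entities": [
        [],
        [{"entity_type": "NAME", "output_start": 15, "output_end": 21}],
    ],
}

print(count_entity_types(response))   # {'NAME': 1}
print(replacement_values(response))   # ['<NAME>']
```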

Selective Role Processing

In many generative AI workflows, only user inputs need to be anonymized, with system prompts and assistant responses left as-is to preserve functionality. The chat_roles parameter lets you specify exactly which roles to process.

Messages with roles not in the chat_roles list are returned unmodified in the output, and their corresponding entities entry will be an empty list. However, PII Eraser may still use some or all of these messages internally for context to provide more accurate detection.

Request:

import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful medical assistant that goes by the name of Alfred."},
        {"role": "user", "content": "My date of birth is 15/03/1985 and I live in Rome, can you tell me which hospitals I should consider for complex surgery?"},
        {"role": "assistant", "content": "Certainly, the Gemelli Hospital is well regarded. If you give me your codice fiscale I can check your eligibility."},
        {"role": "user", "content": "Sure it's MRT MTT91 D08F205J."}
    ],
    "entity_types": ["NAME", "ADDRESS", "TAX_ID", "DOB"],
    "operator": "redact",
    "chat_roles": ["user"]
}

r = requests.post("http://localhost:8000/chat/transform", json=payload)

for msg in r.json()["messages"]:
    print(f"{msg['role']}: {msg['content']}")

Output:

system: You are a helpful medical assistant that goes by the name of Alfred.
user: My date of birth is <DOB> and I live in Rome, can you tell me which hospitals I should consider for complex surgery?
assistant: Certainly, the Gemelli Hospital is well regarded. If you give me your codice fiscale I can check your eligibility.
user: Sure it's <TAX_ID>.

Notice that the system and assistant messages are returned unmodified, while in both user messages the enabled entity types have been redacted. Because LOCATION is not among the requested entity_types, the quasi-identifier "Rome" passes through unchanged; in this chat it carries a very low risk of re-identification.
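
If you want to assert this behaviour in your own integration tests, a check along the following lines is one option. `find_altered_excluded_messages` is a hypothetical helper, and the response below is a hand-written stand-in trimmed to the fields the check reads.

```python
from typing import Any, Dict, List


def find_altered_excluded_messages(
    request_messages: List[Dict[str, str]],
    response: Dict[str, Any],
    chat_roles: List[str],
) -> List[int]:
    """Return indices of excluded-role messages that changed or carry entities."""
    problems = []
    rows = zip(request_messages, response["messages"], response["entities"])
    for index, (before, after, message_entities) in enumerate(rows):
        if before["role"] in chat_roles:
            continue  # this role was selected for processing
        if after["content"] != before["content"] or message_entities:
            problems.append(index)
    return problems


request_messages = [
    {"role": "system", "content": "You are a helpful medical assistant that goes by the name of Alfred."},
    {"role": "user", "content": "My date of birth is 15/03/1985 and I live in Rome."},
    {"role": "assistant", "content": "If you give me your codice fiscale I can check your eligibility."},
    {"role": "user", "content": "Sure it's MRT MTT91 D08F205J."},
]

# Hand-written stand-in for a /chat/transform response (offsets omitted).
response = {
    "messages": [
        request_messages[0],
        {"role": "user", "content": "My date of birth is <DOB> and I live in Rome."},
        request_messages[2],
        {"role": "user", "content": "Sure it's <TAX_ID>."},
    ],
    "entities": [[], [{"entity_type": "DOB"}], [], [{"entity_type": "TAX_ID"}]],
}

print(find_altered_excluded_messages(request_messages, response, ["user"]))  # []
```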

OpenAI Chat Redaction

The OpenAI chat PII firewall example on GitHub shows how to integrate PII Eraser when using an OpenAI-compatible LLM.

Incremental Processing

Incremental Processing Requires State Management

As PII Eraser is stateless, you must calculate and track message_start_index in your own application.

For long-running conversations, you can use message_start_index to avoid re-processing messages that have already been anonymized. Messages before the start index may still be read for context, but entities are only detected and returned for messages at or after the specified index.

This is particularly useful for latency-sensitive, real-time chat applications where each new user message triggers a processing call.

Request:

import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful medical assistant that goes by the name of Alfred."},
        {"role": "user", "content": "My date of birth is <DOB> and I live in Rome, can you tell me which hospitals I should consider for complex surgery?"},
        {"role": "assistant", "content": "Certainly, the Gemelli Hospital is well regarded. If you give me your codice fiscale I can check your eligibility."},
        {"role": "user", "content": "Sure it's MRT MTT91 D08F205J."}
    ],
    "entity_types": ["NAME", "ADDRESS", "TAX_ID", "DOB"],
    "operator": "redact",
    "message_start_index": 3
}

r = requests.post("http://localhost:8000/chat/transform", json=payload)

for msg in r.json()["messages"]:
    print(f"{msg['role']}: {msg['content']}")

Output:

system: You are a helpful medical assistant that goes by the name of Alfred.
user: My date of birth is <DOB> and I live in Rome, can you tell me which hospitals I should consider for complex surgery?
assistant: Certainly, the Gemelli Hospital is well regarded. If you give me your codice fiscale I can check your eligibility.
user: Sure it's <TAX_ID>.

PII Eraser processes only message 3 but uses messages 0–2 for context to improve detection accuracy. The entities list will contain empty lists for messages 0–2 and detected entities only for message 3.
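
The state tracking this requires can live in a small client-side object. The sketch below is one possible shape, not an official client: `ChatSanitizerState` is a hypothetical class, and it relies on the response returning the full messages array (as in the output above) so that the sanitized history can be replaced wholesale.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


class ChatSanitizerState:
    """Client-side tracker for sanitized history and message_start_index."""

    def __init__(self) -> None:
        self.sanitized: List[Message] = []  # messages already processed
        self.pending: List[Message] = []    # raw messages not yet processed

    def add(self, role: str, content: str) -> None:
        """Queue a new raw message for the next request."""
        self.pending.append({"role": role, "content": content})

    def build_payload(self, **options: Any) -> Dict[str, Any]:
        """Build a request body that skips the already-sanitized prefix."""
        return {
            "messages": self.sanitized + self.pending,
            "message_start_index": len(self.sanitized),
            **options,
        }

    def apply_response(self, response: Dict[str, Any]) -> None:
        """Adopt the returned messages as the new sanitized history."""
        self.sanitized = list(response["messages"])
        self.pending = []


state = ChatSanitizerState()
state.sanitized = [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "My date of birth is <DOB> and I live in Rome."},
    {"role": "assistant", "content": "If you give me your codice fiscale I can check."},
]
state.add("user", "Sure it's MRT MTT91 D08F205J.")

payload = state.build_payload(operator="redact")
print(payload["message_start_index"])  # 3

# Hand-written stand-in for the response to that request.
fake_response = {
    "messages": state.sanitized + [{"role": "user", "content": "Sure it's <TAX_ID>."}],
}
state.apply_response(fake_response)
```

After `apply_response`, the next call to `build_payload` starts at index 4, so only messages added since the last request are processed.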