Introduction to PII Eraser
Secure, High-Performance PII Detection & Anonymization
PII Eraser is a containerized REST API designed to detect, redact, mask, or hash Personally Identifiable Information (PII), Payment Card Industry (PCI) data, and other sensitive entities in text and chat logs.
It serves as a self-hosted alternative to hyperscaler cloud services, offering lower latency & costs, and complete data sovereignty — your data never leaves your infrastructure.
Core Capabilities
PII Eraser supports four transformation operators that control how detected entities are handled. Each operator can be set globally via config.yaml or per-request via the API.
| Operator | Description | Input | Output |
|---|---|---|---|
redact | Replaces entities with a semantic type tag. | Call Stefan Müller | Call <NAME> |
mask | Replaces characters with a configurable symbol. | ID: 181/815/08155 | ID: ############# |
hash | Replaces entities with a deterministic SHA-256 hash. | Stefan Müller | a8b92f1c... |
redact_constant | Replaces all entities with the same static string. | Call Stefan Müller | Call <REDACTED> |
In addition to transformation, PII Eraser offers detection-only endpoints that return entity types, positions, and confidence scores without modifying the text — ideal for analytics, compliance audits, and NER workflows.
Both text and chat endpoints support all four operators, along with customizable entity types, allow & block lists, and confidence thresholds.
Quick Start
You can be up and running in minutes using Docker.
1. Run the container:
2. Send a request:
3. Response:
{
"text": [
"Hello <NAME>"
],
"entities": [
[
{
"entity_type": "NAME",
"output_start": 6,
"output_end": 12
}
]
],
"stats": {
"total_tokens": 7,
"tps": 4718.14
}
}
Why PII Eraser?
Global & Europe-First Localization
Unlike many US-centric solutions, PII Eraser is built with native, deep support for Western European languages and data formats alongside comprehensive English-language coverage:
| Region | Countries |
|---|---|
| DACH | Germany, Austria, Switzerland |
| France & Benelux | France, Belgium, Netherlands, Luxembourg |
| UK & Ireland | United Kingdom, Ireland |
| Southern Europe | Italy, Spain |
| North America | United States, Canada |
| Oceania | Australia |
Country-specific identifiers — such as the German Steuer-Identifikationsnummer, the French Numéro de sécurité sociale, or the Australian Medicare Number — are detected out of the box. No language codes or country codes are required; PII Eraser handles multilingual and mixed-language input automatically.
See Supported Languages and Supported Entity Types for the complete coverage matrix.
Industry-Leading Accuracy
PII Eraser uses the latest transformer technology to detect sensitive entities. This delivers higher accuracy than legacy regex or rule-based detectors — particularly on real-world data that doesn't fit rigid formats or contain explicit PII type descriptors.
Consider the difference when processing natural conversation:
"Yeah, you can reach me at four nine five five four seven four three."
Pattern-based systems often miss PII expressed in natural language. PII Eraser's transformer models understand context and semantics, catching entities that regex-based approaches cannot. PII Eraser is also optimized for long inputs and numerical entity types such as PCI and identification numbers, areas where transformer models usually perform poorly.
LLM & GenAI Ready
PII Eraser provides dedicated /chat/* endpoints that accept and return messages in the standard OpenAI Chat Completions format. Sanitize conversations before they leave your infrastructure and forward the output directly to any compatible LLM provider — no custom parsing or reconstruction logic required.
The chat endpoints leverage full conversational context for improved detection accuracy, support selective role processing (e.g., anonymize only user messages), and offer incremental processing for latency-sensitive real-time applications.
Massive Context Window
PII Eraser supports up to 1 million tokens per API request and features special optimizations to maintain accuracy on larger inputs. Process entire documents, call transcripts, or database exports in a single call — no chunking, no splitting, no reassembly logic. The limit can be raised further via the max_tokens configuration parameter.
Drop-In Presidio Replacement
PII Eraser provides full compatibility endpoints for Microsoft Presidio Analyzer, allowing you to upgrade your detection accuracy and performance without rewriting your application logic and to continue using Presidio Analyzer.
Enterprise-Grade Security
PII Eraser is built for regulated environments:
- Air-Gapped by Design: PII Eraser deploys as a single, stateless container that runs entirely offline. No telemetry, no usage analytics, no external API calls — ever.
- CPU-Only Inference: No GPU or CUDA dependencies, eliminating the management overhead and persistent patching cycles associated with the large software stack required to use GPUs.
- Minimal Attack Surface: Built on Chainguard distroless base images with a minimal dependency tree, targeting zero known CVEs at build time.
- Hardened Runtime: Read-only filesystem, all Linux capabilities dropped, no root access.
For the full security model, see Security. For support channels, response targets, and vulnerability reporting, see Support.
Optimized Compute Performance
Highly optimized for modern x86 architectures with AVX-512 VNNI and AMX instruction sets. A single c8a.xlarge AWS instance (4 vCPUs) delivers over 3,500 tokens/second, scaling to over 5,800 tokens/second on a c8a.2xlarge. See Benchmarks & Hardware Selection for full results, including Fargate serverless benchmarks. PII Eraser also runs natively on serverless platforms like AWS Fargate and Azure Container Instances without specialized instance provisioning.
Documentation Overview
Explore the full documentation to get the most out of PII Eraser.
User Guide
| Section | Description |
|---|---|
| Processing Text | Detect, redact, mask, and hash PII in text strings. |
| Processing LLM Chats | Anonymize and detect PII in OpenAI-format conversations before sending them to LLM providers. Covers conversational context, selective role processing, and incremental processing. |
| Supported Entity Types | Full reference of general and country-specific entity types, including PCI data, government IDs, and financial identifiers. |
| Supported Languages | Supported languages and countries. |
| Customization | Customize detection via allow lists, block lists and more. |
| Presidio Compatibility | Drop-in migration guide for teams currently using Microsoft Presidio Analyzer. |
| Performance Tuning | Concurrency, batching, connection pooling, and CPU selection for maximum throughput. |
Deployment & Installation
| Section | Description |
|---|---|
| Getting Started | Prerequisites and general deployment guidance. |
| Running with Docker | Local and single-host container setup. |
| AWS Deployment | Production-grade CloudFormation reference implementation with ECS Fargate and EC2 support. |
| Other Platforms | Guidelines for Kubernetes and other orchestrators. |
| Benchmarks & Hardware Selection | Hardware selection guide and processing throughput by AWS instance type for EC2 and Fargate. |
| Security | Container hardening, network isolation, and compliance considerations. |
Reference
| Section | Description |
|---|---|
| API Reference | Interactive OpenAPI documentation for all endpoints. |
| Config File Reference | Complete reference for all config.yaml parameters. |
| Third-Party Licenses | Open-source attribution and license notices. |