Metadata Isolation
Our SaaS never sees your actual row-level data. Only schema metadata and job statuses are tracked.
- Auth & RBAC
- Job scheduling
- Compliance logs
- Policy & metadata
Privacy-First Test Data Platform
Ark discovers PII, masks sensitive fields, and provisions safe environments for engineering and security teams.
Platform: Discover -> Mask -> Subset -> Provision
Deployment: Control plane in SaaS, execution inside your network
Outcome: Safer test databases for CI, QA, and sandbox work
Supported source databases
Ark ships with first-class PostgreSQL and MySQL support today. We only list engines that are available in production.
Additional engines are on the roadmap — ask us if yours is a priority.
Direct Contrast
Why engineering teams are replacing manual scripts and heavy database copies.
Manual scripts and full copies
- Restoring terabytes of production dumps into staging: expensive cloud bills, slow refreshes, and insecure exposure.
- Writing custom SQL scripts that break on schema changes and leave unmasked data exposed.
- Uploading raw database rows to external SaaS runtimes for classification, in violation of privacy guidelines.
With Ark
- Deploy small, consistent slices (e.g. 5% of users): quick to load, easy to manage, zero schema breakage.
- Automatic PII discovery inside schemas, free-text notes, and nested JSON payloads.
- All data transformations run entirely locally via the VPC Agent; the SaaS control plane only manages orchestration.
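Content-based discovery means inspecting the values themselves, not just column names. As a deliberately simple illustration (not Ark's classifier, which also uses schema and LLM signals), the shell sketch below finds and masks email addresses in a free-text column. The sample notes and the `[EMAIL]` token are hypothetical, and a single regex will miss many PII formats.

```shell
#!/usr/bin/env bash
# Sketch only: regex-based detection and masking of email addresses in a
# free-text column. This is NOT Ark's classifier; one regex misses many
# PII formats and is shown purely to illustrate content-based scanning.
set -euo pipefail

# Hypothetical export of a free-text "notes" column, one value per line.
notes=$(cat <<'EOF'
Customer asked for a refund, reach her at jane.doe@example.com
Call back next week
Escalated by ops; cc bob@example.org
EOF
)

# Simplified email pattern (POSIX ERE), good enough for the illustration.
email_re='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# Discovery: count values containing at least one email address.
# grep -c exits 1 when nothing matches, hence the "|| true".
hits=$(printf '%s\n' "$notes" | grep -cE "$email_re" || true)

# Masking: replace every match with a fixed token.
masked=$(printf '%s\n' "$notes" | sed -E "s/$email_re/[EMAIL]/g")

echo "values with emails: $hits"
printf '%s\n' "$masked"
```

A real classifier treats a pattern match like this as one signal among several, combining it with column names and surrounding context before deciding how to mask.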
Feature Catalog
Ark is built to serve platform engineers, security operators, and compliance officers alike.
Slice large databases while preserving foreign key relationships, so tests never fail on missing related rows.
Provision masked, ephemeral test databases directly from your CI/CD pipeline or local developer shell.
Automate test data provisioning programmatically from internal platforms or custom developer portals.
Spin up temporary Postgres, MySQL, or Mongo targets using local containers and populate them with safe subsets.
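Foreign-key-preserving subsetting comes down to sampling a parent table first, then keeping only child rows whose foreign key points at a kept parent. The sketch below shows this with `awk` over two hypothetical CSV exports (`users` and `orders`). The table names, columns, and every-second-user sampling rule are assumptions for illustration, not how Ark's subsetter is implemented.

```shell
#!/usr/bin/env bash
# Minimal sketch of foreign-key-preserving subsetting on CSV exports.
# Tables, columns, and the sampling rule are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Parent table: users(id,name)
cat > "$workdir/users.csv" <<'EOF'
1,alice
2,bob
3,carol
4,dave
EOF

# Child table: orders(id,user_id,total), where user_id -> users.id
cat > "$workdir/orders.csv" <<'EOF'
100,1,9.99
101,2,15.00
102,2,3.50
103,4,42.00
EOF

# Step 1: sample the parent table (keep every second user).
awk -F, 'NR % 2 == 0' "$workdir/users.csv" > "$workdir/users_subset.csv"

# Step 2: load kept user ids from the first file, then print only orders
# whose user_id is in that set, so no order references a missing user.
awk -F, 'NR == FNR { keep[$1] = 1; next } ($2 in keep)' \
  "$workdir/users_subset.csv" "$workdir/orders.csv" > "$workdir/orders_subset.csv"

users_subset=$(cat "$workdir/users_subset.csv")
orders_subset=$(cat "$workdir/orders_subset.csv")
printf 'users:\n%s\norders:\n%s\n' "$users_subset" "$orders_subset"
```

A full subsetter also walks in the other direction. When a kept child row references a parent that was not sampled, it pulls that parent in, and it repeats this process across every foreign key in the schema.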
Developer Workflow
Start with ark-cli for quick, operator-driven runs, then move to the Go or JavaScript SDK when provisioning belongs inside platform code, test harnesses, or internal tools.
# authenticate once
ark-cli login \
  --api-url "https://control.ark.dev" \
  --api-key "$ARK_API_KEY" \
  --tenant-id "$ARK_TENANT_ID"

# inspect available source configs
ark-cli configs list

# provision a masked ephemeral database
ark-cli testenvs create \
  --config "550e8400-e29b-41d4-a716-446655440000" \
  --wait
Control Panel
The control panel is where you manage configs, policies, jobs, and prepared environments.
Security Architecture
Ark keeps orchestration centralized while execution stays close to the source data inside your environment.
Orchestration loop
The VPC Agent runs as a stateless container in your VPC. It requires no inbound ports and stores no data persistently.
Connect the agent with a read-only database user, or point it at a production replica instead of the primary. Ark only needs SELECT-level access to build subsets — write access to production is never required.
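On PostgreSQL and MySQL, the two supported source engines, you can create a SELECT-only login with standard SQL. The sketch below only prints the statements. The role name `ark_agent`, the `appdb` database, the `public` schema, and the `change-me` password are placeholders, not names Ark requires.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that creates a SELECT-only login for the agent.
# "ark_agent", "appdb", "public", and "change-me" are placeholders.
set -euo pipefail

readonly_user_sql() {
  local engine=$1 db=$2 user=${3:-ark_agent}
  case "$engine" in
    postgres)
      # ALTER DEFAULT PRIVILEGES covers tables created later by the role
      # that runs this statement; existing tables get the plain GRANT.
      cat <<EOF
CREATE ROLE $user LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE $db TO $user;
GRANT USAGE ON SCHEMA public TO $user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO $user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO $user;
EOF
      ;;
    mysql)
      cat <<EOF
CREATE USER '$user'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT ON \`$db\`.* TO '$user'@'%';
EOF
      ;;
    *)
      echo "unsupported engine: $engine" >&2
      return 1
      ;;
  esac
}

pg_sql=$(readonly_user_sql postgres appdb)
mysql_sql=$(readonly_user_sql mysql appdb)
printf '%s\n\n%s\n' "$pg_sql" "$mysql_sql"
```

Review the output, then pipe it to your client, for example `readonly_user_sql postgres appdb | psql -d appdb`. On PostgreSQL 14 and later, granting the predefined `pg_read_all_data` role is a broader alternative that covers every schema.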
Why Ark
Every week without automated data governance is a week of mounting risk, wasted engineering hours, and regulatory exposure. Here's what changes when you deploy Ark.
The average data breach costs $4.88M globally, and 82% of breaches involve a human element. Ark takes manual handling out of the loop by automating PII discovery, classification, and masking before data ever reaches a test environment.
Engineers and QA teams spend up to half their time locating, preparing, and cleaning test data. Ark provisions masked, referentially intact datasets on demand, in minutes rather than days.
Cumulative GDPR fines have exceeded €7.1B, and KVKK penalties for data security breaches reach ₺17M per violation. Ark automatically maps every column to KVKK, GDPR, CCPA, and HIPAA, giving auditors the exact legal reference they need.
Unlike SaaS-only tools that upload your rows to external runtimes, Ark's agent runs inside your VPC. Classification, masking, and subsetting happen where your data already lives. Raw data never crosses the network boundary.
Ark's triple-engine classifier (schema + content + LLM) runs entirely inside your environment. Its LLM stage uses local Qwen, Gemma, and Granite models served through Ollama, and no data is sent to OpenAI or any other external API. You get LLM-assisted accuracy without giving up privacy.
Every API key creation, rotation, revocation — every classification run, every masking job — is logged with who, when, and what. When auditors ask 'who accessed this data?', you answer in two clicks.
See the transformation
Ark doesn't just reduce risk — it turns test data into a competitive advantage. Ship faster with realistic data. Sleep better knowing production PII never leaks.
Deployment Ready
Start with the getting started guide, review the architecture, or walk through the product in more detail.
Enterprise-Grade Security & Compliance
Ark's agent runs entirely inside your secure VPC, and the SaaS control plane handles only orchestration. Your actual row-level data never leaves your private network perimeter.