Documentation

Understand Ark before you deploy

Ark helps engineering and security teams share realistic test data without copying raw production databases to the cloud. This page explains what runs where, what stays in your network, and how to reach your first masked sandbox.


Platform model

Two parts, one workflow

Think of Ark as a remote control in the cloud and a worker inside your network. You plan and approve in SaaS; the agent executes next to your data.

Ark SaaS

Control plane

Where your team plans and governs

Web console, API, CLI, and SDKs for your engineers
Tenant users, roles, and audit trail
Database connection registry (metadata only — no live SQL from the cloud)
Classification reports, masking policies, and job orchestration
Ephemeral test-environment requests and status

Hosted by Ark. Holds governance metadata in ark-meta — not your raw database rows.

Your VPC

Data plane agent

Where work actually runs

ark-agent runs inside your network, near your databases
Connects outbound to the control plane — no inbound access from SaaS to your DB
Reads source data, classifies columns, applies masking, builds subsets
Writes masked dumps to your object storage (MinIO / S3)
Provisions ephemeral MySQL or PostgreSQL sandboxes when requested

You deploy and operate this component. Raw production data stays here.

Data boundary

What stays inside your network

The control plane coordinates work and stores governance metadata. It does not need a copy of your production tables to do its job.

Stays in your VPC | Goes to Ark SaaS
Production database rows | Column labels, confidence scores, schema metadata
Database passwords (decrypted in-agent) | Job progress and completion status
Masked SQL/CSV dumps (your object storage) | Masking audit summaries
LLM prompts with samples (if Ollama enabled in VPC) | Approved policy and config versions

Typical paths

Pick the journey that matches your team

Most customers start with one high-value workflow, prove value, then expand to governance and CI automation.

Safe test database in minutes

Engineering & QA

Register your source database connection
Create a source profile and config with row limits
Request a test environment from the UI or CLI
Agent provisions a masked sandbox and returns a ready DSN
Hand the DSN to CI, QA, or a developer

Example

ark-cli testenvs create --config "<config-id>" --wait

Govern sensitive data first

Security & compliance

Connect a source database and run a classification job
Review flagged columns in the console (profiler role)
Approve masking rules for PII, credentials, and embedded JSON/text
Run a subset job with masking enforced
Export compliance views and audit logs from the control plane

Trigger via: Console → Source profile → Run classification job

Automate from CI/CD

Platform & DevOps

Create an API key with tenant scope
Authenticate ark-cli or ark-sdk-go / ark-sdk-js in your pipeline
Trigger test environments or subset jobs on every build
Wait for ready status and inject the DSN into test runners
Tear down ephemeral databases when the pipeline finishes

Example

ark-cli login --api-url "$ARK_URL" --api-key "$ARK_API_KEY" --tenant-id "$TENANT"
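The CI steps above can be sketched as a small POSIX shell wrapper. It uses only the two ark-cli invocations shown on this page (login and testenvs create with --wait). The function names, the ARK_CLI override, and the ARK_CONFIG_ID variable are assumptions made for illustration. This page does not show how the DSN is printed or which command tears an environment down, so both are left as comments rather than guessed.

```shell
#!/bin/sh
# Sketch of a CI step built from the two ark-cli commands shown on this page.
# ARK_CLI is a hypothetical override so the script can be dry-run (ARK_CLI=echo).
set -eu

ARK_CLI="${ARK_CLI:-ark-cli}"

# Authenticate with a tenant-scoped API key taken from CI secrets.
ark_login() {
  "$ARK_CLI" login --api-url "$ARK_URL" --api-key "$ARK_API_KEY" --tenant-id "$TENANT"
}

# Request a test environment for one config and block until it is ready.
# $1 is the config ID.
ark_create_testenv() {
  "$ARK_CLI" testenvs create --config "$1" --wait
}

# Entry point for a pipeline job. ARK_CONFIG_ID is an assumed variable name.
ark_ci_main() {
  ark_login
  ark_create_testenv "$ARK_CONFIG_ID"
  # Next steps (not shown on this page, so not guessed here):
  #   - pass the returned DSN to the test runner
  #   - tear down the ephemeral database when the pipeline finishes
}
```

In a pipeline, source the file and call ark_ci_main. Setting ARK_CLI=echo prints the commands instead of running them, which is a cheap way to review what a job would do.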

Key concepts

Terms you will see in the console

Tenant

Your organization's isolated workspace — users, connections, jobs, and policies are scoped to one tenant.

Connection

A registered source or target database endpoint. In env-ref mode, the agent resolves credentials inside your VPC, and SaaS does not store them as plaintext.

Source profile

The governance object for a source DB: schema snapshot, classification report, and approved masking rules.

Config

A runnable recipe derived from a source profile — extraction limits, masking runtime, target connection, and test-environment settings.

Job

A unit of work dispatched to your agent: classification, subset export, synthetic data generation, or sandbox provisioning.

Test environment

An ephemeral MySQL or PostgreSQL database provisioned in your VPC from a config, pre-loaded with a masked subset.

First deployment

From tenant to first masked database

Ark provisions your tenant

Your Ark contact creates the tenant. An admin logs into the console and invites team members with the right roles.

Deploy the agent in your network

Create an agent in the console, copy the token, and run ark-agent in your VPC with outbound access to the control plane URL.

Register database connections

Add source (and optional target) connections. Test connectivity through the agent — the control plane never opens a direct SQL session.

Classify and approve masking

Run classification on a source profile. Security reviewers approve masking rules before any data is extracted from production.

Create a config and run your first job

Bind the source profile to a target connection, set row limits, then trigger a subset or test-environment job. Monitor progress in the console or CLI.

Agent startup (customer VPC)

# Configure .env from portal provisioning bundle (mTLS certs + gRPC target)
ark-agent start
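
Before starting the agent, it can help to confirm that the provisioning values are actually present in the environment. The sketch below is a generic pre-flight check in POSIX shell. The variable names in its usage comment (ARK_GRPC_TARGET, ARK_TLS_CERT, ARK_TLS_KEY) are placeholders, not documented settings, so substitute the names from your provisioning bundle.

```shell
#!/bin/sh
# Pre-flight sketch: fail fast if expected settings are missing before
# launching the agent. Variable names are supplied by the caller; the ones
# in the usage comment are hypothetical placeholders.

# Print the names of any unset or empty variables among the arguments.
# Returns 0 when all are set, 1 otherwise.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name; empty if unset.
    # Only pass trusted, literal names here, since eval is used.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  return 0
}

# Usage (placeholder names; take the real ones from your .env):
#   set -a; . ./.env; set +a
#   missing_vars ARK_GRPC_TARGET ARK_TLS_CERT ARK_TLS_KEY && ark-agent start
```

Chaining the check with && means ark-agent start only runs when every listed value is set, so a half-filled .env is reported by name rather than surfacing later as a connection error.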

Access surfaces

Console, CLI, or SDK — same platform

Web console

Admins, profilers, operators

Full governance UI: connections, classification review, masking editor, job history, test-environment browser, and audit logs.

Best for: Day-to-day operations and security review

ark-cli

Any team, any language stack

Shell-first access for login, config listing, job triggers, and test-environment lifecycle. Works in CI without a language SDK.

Best for: Scripts, pipelines, and quick operator tasks

Go / JavaScript SDK

Platform engineers

Typed clients for config listing, job orchestration, wait loops, and DSN helpers inside services and internal portals.

Best for: Embedded automation in Go or Node services

Roles at a glance

tenant_admin: Manage users, agents, connections, and tenant settings
profiler: Run classification, review columns, approve masking rules
tenant_user / tenant_ci: Trigger subset jobs, synthetic generation, and test environments

Ready to run your first workflow?

The getting-started guide walks through a focused CI pilot — one source, one config, one masked sandbox.
