Documentation

Understand Ark before you deploy

Ark helps engineering and security teams share realistic test data without copying raw production databases to the cloud. This page explains what runs where, what stays in your network, and how to reach your first masked sandbox.

Platform model

Two parts, one workflow

Think of Ark as a remote control in the cloud and a worker inside your network. You plan and approve in SaaS; the agent executes next to your data.

Ark SaaS

Control plane

Where your team plans and governs

  • Web console, API, CLI, and SDKs for your engineers
  • Tenant users, roles, and audit trail
  • Database connection registry (metadata only — no live SQL from the cloud)
  • Classification reports, masking policies, and job orchestration
  • Ephemeral test-environment requests and status

Hosted by Ark. Holds governance metadata in ark-meta — not your raw database rows.

Your VPC

Data plane agent

Where work actually runs

  • ark-agent runs inside your network, near your databases
  • Connects outbound to the control plane — no inbound access from SaaS to your DB
  • Reads source data, classifies columns, applies masking, builds subsets
  • Writes masked dumps to your object storage (MinIO / S3)
  • Provisions ephemeral MySQL or PostgreSQL sandboxes when requested

You deploy and operate this component. Raw production data stays here.
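The outbound-only split can be pictured as a pull loop. The agent asks the control plane for work, runs it next to the database, and sends back only a status. The Go sketch below is an in-memory illustration under stated assumptions. The `ControlPlane` interface, the `Job` type, and the method names are hypothetical. They are not the real ark-agent protocol, which the startup section describes as gRPC with mTLS.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical unit of work handed to the agent.
type Job struct {
	ID   string
	Kind string
}

// ControlPlane is a hypothetical view of the SaaS side as seen by the agent.
// Every call is initiated by the agent (outbound); SaaS never dials in.
type ControlPlane interface {
	NextJob() (Job, bool)              // agent pulls work
	ReportStatus(jobID, status string) // agent pushes status only, never rows
}

// fakeControlPlane is an in-memory stand-in used for illustration.
type fakeControlPlane struct {
	queue    []Job
	statuses map[string]string
}

func (f *fakeControlPlane) NextJob() (Job, bool) {
	if len(f.queue) == 0 {
		return Job{}, false
	}
	j := f.queue[0]
	f.queue = f.queue[1:]
	return j, true
}

func (f *fakeControlPlane) ReportStatus(jobID, status string) {
	f.statuses[jobID] = status
}

// runLocally stands in for work done next to the database inside the VPC.
func runLocally(j Job) error {
	if j.Kind == "" {
		return errors.New("unknown job kind")
	}
	return nil // reading, masking, and writing dumps would happen here
}

// drain pulls jobs until none remain and reports the outcome of each one.
func drain(cp ControlPlane) int {
	done := 0
	for {
		j, ok := cp.NextJob()
		if !ok {
			return done
		}
		if err := runLocally(j); err != nil {
			cp.ReportStatus(j.ID, "failed")
			continue
		}
		cp.ReportStatus(j.ID, "completed")
		done++
	}
}

func main() {
	cp := &fakeControlPlane{
		queue:    []Job{{ID: "j1", Kind: "classification"}, {ID: "j2"}},
		statuses: map[string]string{},
	}
	fmt.Println(drain(cp), cp.statuses["j1"], cp.statuses["j2"])
}
```

Because every call originates from the agent, the customer firewall needs only an outbound rule. No inbound port has to be opened toward the databases.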

Data boundary

What stays inside your network

The control plane coordinates work and stores governance metadata. It does not need a copy of your production tables to do its job.

Stays in your VPC | Goes to Ark SaaS
Production database rows | Column labels, confidence scores, and schema metadata
Database passwords (decrypted only inside the agent) | Job progress and completion status
Masked SQL/CSV dumps (in your object storage) | Masking audit summaries
LLM prompts containing sample values (when Ollama is enabled in your VPC) | Approved policy and config versions
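The table's split between raw samples and labels can be enforced in code by giving the upstream report no place to hold sample values. The sketch below assumes a hypothetical in-agent `ColumnFinding` type and a hypothetical upstream `ReportEntry` shape. Neither is Ark's actual schema.

```go
package main

import "fmt"

// ColumnFinding is a hypothetical in-agent classification result.
// Samples holds raw values read from production and must never leave the VPC.
type ColumnFinding struct {
	Table      string
	Column     string
	Label      string  // e.g. "email", "phone"
	Confidence float64 // 0.0 to 1.0
	Samples    []string
}

// ReportEntry is the hypothetical shape sent to the control plane:
// labels, scores, and schema metadata only.
type ReportEntry struct {
	Table      string
	Column     string
	Label      string
	Confidence float64
}

// ToControlPlaneReport drops raw samples before anything is sent upstream.
func ToControlPlaneReport(findings []ColumnFinding) []ReportEntry {
	out := make([]ReportEntry, 0, len(findings))
	for _, f := range findings {
		out = append(out, ReportEntry{
			Table:      f.Table,
			Column:     f.Column,
			Label:      f.Label,
			Confidence: f.Confidence,
		})
	}
	return out
}

func main() {
	findings := []ColumnFinding{
		{Table: "users", Column: "email", Label: "email", Confidence: 0.97, Samples: []string{"alice@example.com"}},
	}
	fmt.Printf("%+v\n", ToControlPlaneReport(findings))
}
```

`ReportEntry` has no field for samples, so raw values cannot be serialized upstream by accident. The type itself encodes the boundary.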

Typical paths

Pick the journey that matches your team

Most customers start with one high-value workflow, prove it out, then expand to governance and CI automation.

A

Safe test database in minutes

Engineering & QA

  1. Register your source database connection
  2. Create a source profile and config with row limits
  3. Request a test environment from the UI or CLI
  4. The agent provisions a masked sandbox and returns a ready-to-use DSN
  5. Hand the DSN to CI, QA, or a developer
Example
ark-cli testenvs create --config "<config-id>" --wait
B

Govern sensitive data first

Security & compliance

  1. Connect a source database and run a classification job
  2. Review flagged columns in the console (profiler role)
  3. Approve masking rules for PII, credentials, and sensitive values embedded in JSON or free text
  4. Run a subset job with masking enforced
  5. Export compliance views and audit logs from the control plane
Trigger via Console → Source profile → Run classification job
C

Automate from CI/CD

Platform & DevOps

  1. Create an API key with tenant scope
  2. Authenticate ark-cli or ark-sdk-go / ark-sdk-js in your pipeline
  3. Trigger test environments or subset jobs on every build
  4. Wait for ready status and inject the DSN into test runners
  5. Tear down ephemeral databases when the pipeline finishes
Example
ark-cli login --api-url "$ARK_URL" --api-key "$ARK_API_KEY" --tenant-id "$TENANT"

Key concepts

Terms you will see in the console

Tenant

Your organization's isolated workspace — users, connections, jobs, and policies are scoped to one tenant.

Connection

A registered source or target database endpoint. In env-ref mode, the agent resolves credentials inside your VPC, and SaaS does not store them as plaintext.

Source profile

The governance object for a source DB: schema snapshot, classification report, and approved masking rules.

Config

A runnable recipe derived from a source profile — extraction limits, masking runtime, target connection, and test-environment settings.

Job

A unit of work dispatched to your agent: classification, subset export, synthetic data generation, or sandbox provisioning.

Test environment

An ephemeral MySQL or PostgreSQL database provisioned in your VPC from a config, pre-loaded with a masked subset.

First deployment

From tenant to first masked database

01

Ark provisions your tenant

Your Ark contact creates the tenant. An admin logs in to the console and invites team members with the appropriate roles.

02

Deploy the agent in your network

Create an agent in the console, copy the token, and run ark-agent in your VPC with outbound access to the control plane URL.

03

Register database connections

Add source (and optional target) connections. Test connectivity through the agent — the control plane never opens a direct SQL session.

04

Classify and approve masking

Run classification on a source profile. Security reviewers approve masking rules before any data is extracted from production.

05

Create a config and run your first job

Bind the profile to a target connection, set row limits, then trigger a subset or test-environment job. Monitor progress in the console or CLI.

Agent startup (customer VPC)

# Configure .env from portal provisioning bundle (mTLS certs + gRPC target)
ark-agent start
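According to the startup comment, the agent's .env comes from a provisioning bundle that contains mTLS certificates and a gRPC target. A pre-flight check can confirm those values before `ark-agent start` runs. This Go sketch assumes the .env values are exported into the process environment. The variable names `ARK_GRPC_TARGET`, `ARK_TLS_CERT`, `ARK_TLS_KEY`, and `ARK_TLS_CA` are hypothetical placeholders, so substitute the names from your actual bundle.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredVars are hypothetical .env keys; substitute the names from your bundle.
var requiredVars = []string{"ARK_GRPC_TARGET", "ARK_TLS_CERT", "ARK_TLS_KEY", "ARK_TLS_CA"}

// preflight reports unset variables and certificate paths that do not exist.
// getenv and exists are injected so the check can be tested without real files.
func preflight(getenv func(string) string, exists func(string) bool) []string {
	var problems []string
	for _, name := range requiredVars {
		v := getenv(name)
		switch {
		case v == "":
			problems = append(problems, name+" is not set")
		case strings.HasPrefix(name, "ARK_TLS_") && !exists(v):
			problems = append(problems, name+" points to a missing file: "+v)
		}
	}
	return problems
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	problems := preflight(os.Getenv, fileExists)
	if len(problems) == 0 {
		fmt.Println("pre-flight ok; run: ark-agent start")
		return
	}
	fmt.Println("agent pre-flight failed:")
	for _, p := range problems {
		fmt.Println("  -", p)
	}
}
```

In a container entrypoint, a non-empty problem list would usually stop startup early with a readable message, rather than surfacing later as an opaque TLS handshake error.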

Access surfaces

Console, CLI, or SDK — same platform

Web console

Admins, profilers, operators

Full governance UI: connections, classification review, masking editor, job history, test-environment browser, and audit logs.

Best for: Day-to-day operations and security review

ark-cli

Any team, any language stack

Shell-first access for login, config listing, job triggers, and test-environment lifecycle. Works in CI without a language SDK.

Best for: Scripts, pipelines, and quick operator tasks

Go / JavaScript SDK

Platform engineers

Typed clients for config listing, job orchestration, wait loops, and DSN helpers inside services and internal portals.

Best for: Embedded automation in Go or Node services

Roles at a glance

Role | What it can do
tenant_admin | Manage users, agents, connections, and tenant settings
profiler | Run classification, review columns, approve masking rules
tenant_user / tenant_ci | Trigger subset jobs, synthetic generation, and test environments
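The roles table can be read as a simple grant map. The Go sketch below mirrors the table literally and does not assume that roles inherit from one another. The action names are hypothetical labels, and this is not Ark's enforcement code.

```go
package main

import "fmt"

// Action names are hypothetical labels for the capabilities in the roles table.
type Action string

const (
	ManageUsers       Action = "manage_users"
	ManageAgents      Action = "manage_agents"
	ManageConnections Action = "manage_connections"
	ManageSettings    Action = "manage_settings"
	RunClassification Action = "run_classification"
	ReviewColumns     Action = "review_columns"
	ApproveMasking    Action = "approve_masking"
	TriggerSubset     Action = "trigger_subset"
	TriggerSynthetic  Action = "trigger_synthetic"
	TriggerTestEnv    Action = "trigger_test_env"
)

// grants lists only what the roles table states; no inheritance is assumed.
var grants = map[string][]Action{
	"tenant_admin": {ManageUsers, ManageAgents, ManageConnections, ManageSettings},
	"profiler":     {RunClassification, ReviewColumns, ApproveMasking},
	"tenant_user":  {TriggerSubset, TriggerSynthetic, TriggerTestEnv},
	"tenant_ci":    {TriggerSubset, TriggerSynthetic, TriggerTestEnv},
}

// Can reports whether role is granted action a.
func Can(role string, a Action) bool {
	for _, g := range grants[role] {
		if g == a {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Can("tenant_ci", TriggerTestEnv)) // true
	fmt.Println(Can("tenant_ci", ApproveMasking)) // false
}
```

Under this mapping, a leaked pipeline key scoped to `tenant_ci` can start sandboxes but cannot approve masking rules. Those approvals stay with the profiler role.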

Ready to run your first workflow?

The getting-started guide walks through a focused CI pilot — one source, one config, one masked sandbox.
