
Storage

Hosts are pure compute. Data lives in cloud object storage, encrypted at rest; per-job scoped, short-lived credentials are delivered encrypted to the chosen worker.

This page covers where the data lives (cloud object storage, encrypted at rest) and how a worker is handed a temporary, read-only credential for just the slice of data one job needs.

Impact: Hosts are pure compute — they never hold your data or any long-lived keys, so a machine can't keep or leak your dataset.

Default provider: local-fake (1 provider enabled)
Formats enabled: 3 (csv · json · parquet)
At rest: Parquet ME (Modular Encryption)
Credentials: per-job scoped (short-lived, sealed)
In transit: QUIC + TLS 1.3 (mutual auth)

Providers

DuckDB filesystem extensions
AWS S3: available · httpfs + aws · s3://
MinIO / S3-compatible: available · httpfs + aws · s3://

Self-hosted; requires path-style addressing.

endpoint: minio.local:9000
url_style: path (MinIO) / vhost (AWS)
use_ssl: false
region: us-east-1

No secrets here — the access key / secret are never in config; they arrive per job, encrypted.

Azure ADLS: available · azure · abfss:// / az://
Google Cloud Storage: available · httpfs (S3-interop) / gcs · gcs://
Generic HTTPS: available · httpfs · https://
Local: default, configured · local files · file://
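A minimal sketch, in Python, of how a node might route a storage URL to one of the providers above by its URL scheme. The lookup table and the `resolve_provider` name are hypothetical; the extension names follow the provider list, and GCS is assumed to go through httpfs's S3-interop route.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Hypothetical lookup mirroring the provider list:
# URL scheme -> (provider label, DuckDB extensions assumed to be needed).
SCHEME_TO_PROVIDER: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "s3": ("AWS S3 / MinIO", ("httpfs", "aws")),
    "abfss": ("Azure ADLS", ("azure",)),
    "az": ("Azure ADLS", ("azure",)),
    "gcs": ("Google Cloud Storage", ("httpfs",)),  # assumed S3-interop route
    "https": ("Generic HTTPS", ("httpfs",)),
    "file": ("Local", ()),  # local files need no extension
}


def resolve_provider(url: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (provider, extensions) for a storage URL, or raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SCHEME_TO_PROVIDER:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return SCHEME_TO_PROVIDER[scheme]
```

A bare path with no scheme is rejected here rather than silently treated as local; that choice is an assumption, made so that every read names its provider explicitly.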
Effective storage config
The live storage section of this node's GridConfig from the snapshot. Remote access is off on the loopback run, so the endpoint, region, and TLS knobs are unset here.
default provider: local-fake (configured)
enabled_providers: local-fake
enabled_formats: csv, json, parquet
enable_remote_access: false
require_extensions: true
credential_ttl: 15m
key_ttl: 15m
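One way to treat this section in code is as a typed record with a consistency check. This is a sketch under assumptions: `StorageConfig` and `parse_ttl` are hypothetical names, and the duration syntax (`<integer><s|m|h>`, as in `15m`) is inferred from the values shown.

```python
import re
from dataclasses import dataclass
from typing import Tuple

_DURATION = re.compile(r"^(\d+)([smh])$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_ttl(text: str) -> int:
    """Convert a duration such as '15m' to seconds (assumed <int><s|m|h> syntax)."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical typed view of the storage section shown above."""

    default_provider: str
    enabled_providers: Tuple[str, ...]
    enabled_formats: Tuple[str, ...]
    enable_remote_access: bool
    require_extensions: bool
    credential_ttl: str
    key_ttl: str

    def validate(self) -> None:
        """Raise ValueError if the section is internally inconsistent."""
        if self.default_provider not in self.enabled_providers:
            raise ValueError("default provider must be listed in enabled_providers")
        for name in ("credential_ttl", "key_ttl"):
            if parse_ttl(getattr(self, name)) <= 0:
                raise ValueError(f"{name} must be positive")
```

With the values above, `parse_ttl("15m")` gives 900 seconds for both the credential and the key lifetime.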
Formats
Table formats DuckDB can read directly from object storage. The "enabled" tag marks the formats this node has turned on in its live config.

Format          DuckDB extension(s)   On this node   Note
Parquet         parquet               enabled        Core/bundled; columnar, the default lake format.
CSV             core                  enabled        Built-in reader/writer; schema sniffing.
JSON            json                  enabled        Core/bundled; newline-delimited and nested.
Delta Lake      delta + httpfs        available      Transaction-log-aware table reads.
Apache Iceberg  iceberg               available      Snapshot/manifest-based table reads.
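For the three enabled file formats, DuckDB's `read_parquet`, `read_csv`, and `read_json` table functions do the reading. The sketch below picks a reader by file suffix and refuses formats the node has not enabled; the suffix mapping and the `reader_sql` name are assumptions, and Delta Lake and Iceberg are left out because those tables are directories read through their own extensions rather than single files.

```python
from pathlib import PurePosixPath
from typing import Set

# File suffix -> (format name as used in enabled_formats, DuckDB reader function).
READERS = {
    ".parquet": ("parquet", "read_parquet"),
    ".csv": ("csv", "read_csv"),
    ".json": ("json", "read_json"),
    ".ndjson": ("json", "read_json"),
}


def reader_sql(path: str, enabled_formats: Set[str]) -> str:
    """Build a SELECT over `path` using the reader for its format, if enabled."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader for {suffix or 'suffix-less path'}")
    fmt, reader = READERS[suffix]
    if fmt not in enabled_formats:
        raise PermissionError(f"format {fmt!r} is not enabled on this node")
    literal = "'" + path.replace("'", "''") + "'"  # escape as a SQL string literal
    return f"SELECT * FROM {reader}({literal})"
```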
Per-job scoped credential
The ScopedCredential sealed to the winning worker.
provider: s3
token: opaque STS session / SAS / downscoped token
prefix: s3://acme-lake/orders/2026/
expires_at: issue time + 15m (credential_ttl)

The token is read-only and scoped to the prefix; it cannot touch anything outside it.
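The scope itself is enforced by the storage provider, but a worker can mirror the rule as a pre-flight check. A minimal sketch, assuming the fields shown above map onto a Python dataclass (the class layout and method names are hypothetical); the token is kept out of `repr()` so it does not end up in logs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ScopedCredential:
    """Hypothetical shape of the sealed per-job credential described above."""

    provider: str
    token: str = field(repr=False)  # opaque; excluded from repr() so it is not logged
    prefix: str
    expires_at: datetime

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        """True once the credential's lifetime has run out."""
        if now is None:
            now = datetime.now(timezone.utc)
        return now >= self.expires_at

    def allows(self, url: str) -> bool:
        """Worker-side pre-flight check that `url` stays inside the scoped prefix."""
        if not self.prefix.endswith("/"):
            return False  # only directory-style prefixes are accepted as scopes
        if ".." in url.split("/"):
            return False  # reject path traversal out of the prefix
        return url.startswith(self.prefix)
```

Requiring a trailing `/` on the prefix keeps `orders/2026/` from also matching a sibling such as `orders/20260/`.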

Delivery flow

  1. Requester mints short-lived, downscoped credentials (read-only, one prefix).
  2. Requester seals them to the chosen worker's node / enclave key.
  3. Worker unseals them and runs CREATE SECRET (… SCOPE …) with the delivered token.
  4. Worker reads only that prefix over HTTPS, nothing else in the bucket.
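For step 1 on AWS, one common way to mint such a credential is STS AssumeRole with an inline session policy that narrows a role to read-only access under one prefix. The document does not say which minting mechanism the grid uses, so treat this as an illustrative assumption: the sketch only builds the request parameters (the keyword arguments one would pass to, for example, boto3's `sts.assume_role`), and the role ARN, job id, and function names are hypothetical. Sealing (step 2) needs an asymmetric-crypto library and is not shown.

```python
import json
from typing import Any, Dict


def session_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Read-only IAM session policy limited to one key prefix in one bucket."""
    prefix = prefix.strip("/") + "/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Read objects under the prefix only.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            # List the bucket, but only keys under the prefix.
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


def assume_role_params(role_arn: str, job_id: str, bucket: str, prefix: str,
                       ttl_seconds: int = 900) -> Dict[str, Any]:
    """Keyword arguments for an STS AssumeRole call scoped to one job."""
    if ttl_seconds < 900:
        raise ValueError("AssumeRole accepts DurationSeconds >= 900 (15 minutes)")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": f"job-{job_id}",
        "Policy": json.dumps(session_policy(bucket, prefix)),
        "DurationSeconds": ttl_seconds,
    }
```

A session policy can only narrow the role's permissions, never widen them, so the resulting credentials are at most read-only on `orders/2026/`. AssumeRole's 900-second minimum happens to equal the 15m `credential_ttl` above.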
Sample CREATE SECRET
What the worker runs locally once the scoped token is unsealed.
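CREATE SECRET (
  TYPE      s3,
  KEY_ID    …,                     -- from the unsealed per-job token
  SECRET    …,                     -- from the unsealed per-job token
  ENDPOINT  'minio.local:9000',
  URL_STYLE 'path',                -- path-style addressing (MinIO)
  USE_SSL   false,
  REGION    …,
  SCOPE     's3://bucket/prefix/'  -- the secret applies only under this prefix
);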

KEY_ID and SECRET come from the per-job token; they live only in the worker's process for the life of the job, never on disk or in the grid catalog.
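A minimal sketch of how a worker might render that statement from the unsealed token, with string values escaped as SQL literals. The function names and defaults are hypothetical (the defaults mirror the MinIO sample). The statement omits `PERSISTENT`, and DuckDB secrets are temporary (in-memory) by default, which matches the "never on disk" rule.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_secret_sql(key_id: str, secret: str, region: str, scope: str,
                      endpoint: str = "minio.local:9000", url_style: str = "path",
                      use_ssl: bool = False) -> str:
    """Render the S3 CREATE SECRET statement from an unsealed per-job token.

    The result embeds the secret, so it must never be logged or written to disk.
    """
    if not scope.endswith("/"):
        raise ValueError("scope must be a prefix ending in '/'")
    options = [
        "TYPE s3",
        f"KEY_ID {sql_literal(key_id)}",
        f"SECRET {sql_literal(secret)}",
        f"ENDPOINT {sql_literal(endpoint)}",
        f"URL_STYLE {sql_literal(url_style)}",
        f"USE_SSL {'true' if use_ssl else 'false'}",
        f"REGION {sql_literal(region)}",
        f"SCOPE {sql_literal(scope)}",
    ]
    # No PERSISTENT keyword: the secret stays in memory for this session only.
    return "CREATE SECRET (\n  " + ",\n  ".join(options) + "\n);"
```

With the `duckdb` Python package, the string could then be passed to a connection's `execute` method inside the job's process.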

Encryption & the honest security boundary
Two of the three data states are cryptographically solved; the third depends on the host's hardware tier.
In transit: solved

QUIC + TLS 1.3 with mutual authentication between peers.

At rest: solved

Parquet Modular Encryption: the stored bytes are meaningless without the per-job key.

In use: hardware-dependent

Only guaranteed on L2 confidential-computing hardware. Commodity laptops cannot guarantee RAM confidentiality, so sensitive data is routed only to attested L2 hosts, while laptops handle public data under quorum + reputation.

Net effect: storage operators see only ciphertext, and so do host operators for sensitive data. Plaintext exists only inside an attested L2 enclave or, for public data, transiently in a laptop's RAM under quorum + reputation guards.
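The routing rule can be sketched as a host-selection function. Everything here is an assumption for illustration: the `Host` fields, the reputation threshold, the quorum size of 3, and the choice of a single enclave host for sensitive jobs; how the quorum's results are compared is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Sensitivity(Enum):
    PUBLIC = "public"
    SENSITIVE = "sensitive"


@dataclass(frozen=True)
class Host:
    name: str
    l2_attested: bool   # passed attestation on confidential-computing hardware
    reputation: float   # hypothetical score in [0, 1]


def assign_hosts(hosts: List[Host], sensitivity: Sensitivity,
                 quorum: int = 3, min_reputation: float = 0.8) -> List[Host]:
    """Pick hosts for one job under the in-use routing rule."""
    if sensitivity is Sensitivity.SENSITIVE:
        # Plaintext in RAM is acceptable only inside an attested L2 enclave.
        pool = [h for h in hosts if h.l2_attested]
        needed = 1
    else:
        # Public data may run on commodity hosts: replicate to a quorum of
        # reputable hosts (result comparison not shown).
        pool = [h for h in hosts if h.reputation >= min_reputation]
        needed = quorum
    if len(pool) < needed:
        raise RuntimeError(f"need {needed} eligible host(s), found {len(pool)}")
    return sorted(pool, key=lambda h: h.reputation, reverse=True)[:needed]
```

If no attested host is available, a sensitive job fails rather than falling back to a laptop.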
OS execution sandbox
The boundary around job execution, complementing DuckDB's own lockdown. It is hardened and secure-by-default for the host-serving path; shown here is the live [sandbox] section of this node's config.
status: enabled
enabled: true
process per job: true
backend: auto
egress mode: inherit_storage
Process-per-job + OS confinement. Each FOREIGN job runs in an OS-sandboxed child process (Linux cgroups + seccomp, macOS Seatbelt, Windows Job Objects, selected via backend = auto).
Real DuckDB lockdown. enable_external_access off, configuration locked, allowed directories scoped to the job, an ephemeral temp directory. Where OS confinement can't apply, the host fails safe: it refuses to serve remote-access jobs unconfined rather than running them exposed.
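A sketch of the DuckDB side of the lockdown plus the fail-safe admission check. The setting names (`temp_directory`, `allowed_directories`, `enable_external_access`, `lock_configuration`) are real DuckDB settings, and `allowed_directories` needs a recent DuckDB release. The ordering is an assumption drawn from DuckDB's security guidance: the allow-list is set while external access is still on, and the configuration is locked last so the job's own SQL cannot revert anything. The function names and directory layout are hypothetical.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def admit_job(needs_remote_access: bool, os_confined: bool) -> None:
    """Fail safe: refuse a remote-access job when OS confinement is unavailable."""
    if needs_remote_access and not os_confined:
        raise PermissionError("refusing to serve a remote-access job unconfined")


def lockdown_statements(allowed_dirs: List[str], temp_dir: str) -> List[str]:
    """DuckDB settings for one job, in the order they are applied."""
    dirs = ", ".join(sql_literal(d) for d in allowed_dirs)
    return [
        # Per-job spill directory; the caller deletes it when the job ends.
        f"SET temp_directory = {sql_literal(temp_dir)}",
        # Allow-list first, while external access is still enabled.
        f"SET allowed_directories = [{dirs}]",
        "SET enable_external_access = false",
        # Last, so nothing in the job's SQL can undo the settings above.
        "SET lock_configuration = true",
    ]
```

Each statement would be executed in order on the job's connection before any job SQL runs.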