Managed DevOps for Web3, validators and rollups

Testnet next quarter. Hiring an SRE who actually knows Cosmos SDK is a 4-month process, plus equity. We get your validators live in 3 regions in 5 days, with slashing alerts and a signed uptime SLA.

We arrive with a working stack: Cosmos SDK, Geth, Reth, OP Stack, Arbitrum Orbit, Polygon CDK, EigenDA, Celestia. For each client protocol we wire up signing observability, distributed locks against double-sign, HSM/KMS workflows, and runbooks for Sev-1 events (slashing triggers, peer drops, fork-choice mismatches).

The gap we fill: post-deploy operations for L2s. Conduit, Caldera and AltLayer own sequencer ramp-up; operations after launch (monitoring, reboots, migrations, hard-fork cutovers) are still unclaimed. We take them.

stack we operate

The Web3 subset is listed first. The platform layer is identical across customer profiles.

Web3: Cosmos SDK, Geth, Reth, OP Stack, Arbitrum Orbit, Polygon CDK, EigenDA, Celestia
Platform: Kubernetes, Terraform, Ansible, Prometheus, Grafana, Loki, OpenTelemetry, PagerDuty

what we deploy

Concrete deliverables for Web3 teams. Each one ships end-to-end with repo, IaC and runbooks.

[ Validator set across 3 regions ]

Cosmos SDK / Geth / Reth, key isolation on HSM/KMS, distributed lock against double-sign, slashing alerts, missed-block dashboards, failover playbook.
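
The double-sign guard is easiest to see in miniature. The following is a minimal sketch, assuming CometBFT-style ordering by (height, round, step) and a single in-process guard; in a real deployment this state would sit behind the distributed lock and be shared by every signer instance. `SignPoint` and `DoubleSignGuard` are illustrative names, not part of any client or library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(order=True, frozen=True)
class SignPoint:
    """Position in consensus, compared as the tuple (height, round, step)."""
    height: int
    round: int
    step: int


class DoubleSignGuard:
    """Refuses to sign at or below the last signed position."""

    def __init__(self) -> None:
        self._last: Optional[SignPoint] = None

    def try_sign(self, point: SignPoint) -> bool:
        # Only strictly increasing positions pass. A replayed request, or an
        # older one from a stale instance, is rejected instead of signed.
        if self._last is not None and point <= self._last:
            return False
        self._last = point
        return True


guard = DoubleSignGuard()
first = guard.try_sign(SignPoint(height=10, round=0, step=1))   # accepted
replay = guard.try_sign(SignPoint(height=10, round=0, step=1))  # rejected
```

The key design choice is that the guard fails closed: when in doubt, the request is refused, because a missed block costs far less than a double-sign.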

[ RPC front + load-balancer ]

Geth / Reth read-replicas with per-method rate-limits, hot-path caching, p95 latency SLO, geo-routing for global traffic.
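
Per-method rate-limits are commonly built on a token bucket per (client, method) pair. The sketch below assumes that approach; the limits in `LIMITS` and `DEFAULT_LIMIT` are hypothetical numbers chosen for illustration, and the clock is passed in explicitly so the behaviour is easy to test.

```python
class TokenBucket:
    """Allows `burst` requests at once, refilled at `rate_per_s` tokens/second."""

    def __init__(self, rate_per_s: float, burst: float, now: float) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits as (requests per second, burst).
# Expensive methods such as eth_getLogs get a smaller budget.
LIMITS = {"eth_call": (50.0, 100.0), "eth_getLogs": (2.0, 4.0)}
DEFAULT_LIMIT = (20.0, 40.0)


class MethodLimiter:
    """Keeps one bucket per (client, method) pair."""

    def __init__(self) -> None:
        self._buckets = {}

    def allow(self, client: str, method: str, now: float) -> bool:
        key = (client, method)
        if key not in self._buckets:
            rate, burst = LIMITS.get(method, DEFAULT_LIMIT)
            self._buckets[key] = TokenBucket(rate, burst, now)
        return self._buckets[key].allow(now)
```

In a live proxy, `now` would come from `time.monotonic()` and a rejected call would map to an HTTP 429 or a JSON-RPC error response.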

[ Sequencer for L2 (OP Stack / Orbit) ]

Sequencer + batcher + proposer as separate processes, L1 finality monitoring, switchover playbook, hot-standby in another region.

[ Incentivized testnet: 100 nodes in 72h ]

Burst delivery for incentive programs: bare-metal sourcing, auto-onboarding, equal-load region spread, leaderboard-position dashboard.
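
Equal-load region spread comes down to integer division with the remainder handed out one node at a time. A minimal sketch, with `spread_nodes` and the region names as illustrative assumptions:

```python
def spread_nodes(node_count: int, regions: list[str]) -> dict[str, int]:
    """Split node_count across regions so that counts differ by at most one."""
    if not regions:
        raise ValueError("at least one region is required")
    base, extra = divmod(node_count, len(regions))
    # The first `extra` regions each take one additional node.
    return {region: base + (1 if i < extra else 0)
            for i, region in enumerate(regions)}


plan = spread_nodes(100, ["eu", "us", "ap"])
# plan == {"eu": 34, "us": 33, "ap": 33}
```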

[ DA layer: EigenDA / Celestia ]

Light nodes with signed uptime, retrieval latency tracking, missed-header playbook, sync with the consensus layer.

what we operate 24/7

After handoff the pager lives with us. Coverage tuned for validators and rollups:

  • Signing observability: any missed block triggers Sev-2, two in a row triggers Sev-1.
  • Auto-failover to a hot-standby in another region on peer drop >30s or disk pressure.
  • On-call escalation: p95 first response 15 min for Sev-1.
  • Slashing-trigger watchdog: if the distributed lock stops responding, the signing key flips to read-only in <500ms.
  • Versioned runbooks per protocol: hard-fork cutover, chain halt, fork-choice mismatch, mempool flood.
  • Monthly ops review: what broke, what we fixed, what we are changing in the SLO.
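
The paging and failover rules above are simple enough to write down as code. This sketch assumes a per-block signed/missed history and two health inputs; `severity` and `should_failover` are hypothetical helper names, and the thresholds are the ones stated in the list.

```python
from typing import Iterable, Optional


def severity(signed: Iterable[bool]) -> Optional[str]:
    """Classify the latest block from a history of signed (True) / missed (False).

    Returns None if the latest block was signed, "Sev-2" for a single miss,
    and "Sev-1" when the last two blocks were both missed.
    """
    history = list(signed)
    if not history or history[-1]:
        return None
    if len(history) >= 2 and not history[-2]:
        return "Sev-1"
    return "Sev-2"


def should_failover(peerless_seconds: float, disk_pressure: bool,
                    peer_drop_limit_s: float = 30.0) -> bool:
    """Fail over on a peer drop longer than the limit, or on disk pressure."""
    return peerless_seconds > peer_drop_limit_s or disk_pressure


level = severity([True, True, False, False])  # "Sev-1": two misses in a row
```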

migration scenarios

What we move without downtime and without exposing key material.

testnet to mainnet

Validator-set cutover to mainnet with a key ceremony, state sync, checkpointing, and a rollback plan.

cloud to bare-metal

Validators moved off AWS/GCP onto Latitude.sh or OpenMetal: typical 40% reduction in per-node costs with no latency penalty.

hard-fork cutover

Coordinated client upgrade for a known fork height: pre-flight checks, canary node, rolling restart by region.
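
The upgrade order can be expressed as a list of batches: the canary alone, then one batch per region. A minimal sketch, assuming a static inventory keyed by region; `rollout_plan` and the node names are illustrative, and pre-flight and health checks between batches are left out.

```python
def rollout_plan(nodes_by_region: dict[str, list[str]],
                 canary: str) -> list[list[str]]:
    """Return upgrade batches: the canary first, then one batch per region."""
    all_nodes = [n for nodes in nodes_by_region.values() for n in nodes]
    if canary not in all_nodes:
        raise ValueError(f"canary {canary!r} is not in the inventory")
    batches = [[canary]]
    # Regions are processed in sorted order so the plan is reproducible.
    for region in sorted(nodes_by_region):
        batch = [n for n in nodes_by_region[region] if n != canary]
        if batch:
            batches.append(batch)
    return batches


plan = rollout_plan(
    {"eu": ["eu-1", "eu-2"], "us": ["us-1"], "ap": ["ap-1"]},
    canary="us-1",
)
# plan == [["us-1"], ["ap-1"], ["eu-1", "eu-2"]]
```

Each batch would only start once the previous one is back in sync and signing, so a bad build stops at the canary.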

cross-region sequencer

L2 sequencer moved to another jurisdiction or provider without dropping blocks: hot-standby promote + DNS cutover.

RPC split into geo-clusters

RPC split per-region as traffic grows: anycast / geo-DNS, cache warm-up, per-region rate-limits.

client swap (Geth to Reth)

Parallel sync, per-block checksum verification, smooth switch with no missed slots.
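
Per-block verification amounts to walking both clients over the same height range and stopping at the first mismatch. The sketch below assumes each client is wrapped in a callable that returns the block hash for a height (for example by calling `eth_getBlockByNumber` and reading the `hash` field); `first_divergence` is an illustrative name.

```python
from typing import Callable, Optional


def first_divergence(old_hash: Callable[[int], str],
                     new_hash: Callable[[int], str],
                     start: int, end: int) -> Optional[int]:
    """Return the first height in [start, end] where the hashes differ, else None."""
    for height in range(start, end + 1):
        # Hex strings are compared case-insensitively.
        if old_hash(height).lower() != new_hash(height).lower():
            return height
    return None


# Stand-in data instead of live RPC calls.
geth = {1: "0xaa", 2: "0xbb", 3: "0xcc"}
reth = {1: "0xAA", 2: "0xbb", 3: "0xdd"}
mismatch = first_divergence(geth.__getitem__, reth.__getitem__, 1, 3)  # 3
```

A result of `None` over the full range is the signal that the new client is safe to promote; any other value names the height to investigate.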

cases

Anonymized. NDAs cover names; the numbers are real.

ZK rollup · 6 mo · validator ops + RPC · slashing: 0 · uptime: 99.97% over 90d
Cosmos L1 · 12 mo · 7 validators across 4 regions · missed blocks: <0.02% · governance votes: 100%
OP Stack L2 · 4 mo · sequencer + batcher + RPC · 0 missed batches since launch
Incentivized testnet · 8 weeks · 50-node burst · top-5 operator by uptime

SLA tiers

Three coverage levels. For validators and sequencers we recommend Silver or higher: slashing risk does not tolerate 5×8 coverage.

Tier · Response p95 (Sev-1) · Coverage · Incident report · Engineer hours / mo
Bronze · 30 min · Business hours, 5×8 · Within 48h · 40
Silver · 15 min · 24/7 on-call rotation · Within 24h · 80
Gold · 5 min · 24/7 with dedicated engineer · Within 12h · 160+

FAQ

You hold the keys. We use an HSM/KMS workflow in which keys never leave your control. Signing goes through a signer daemon guarded by a distributed lock; we don't custody key material. An MPC setup (CGGMP21 / FROST) is optional where the protocol supports it.

Architecturally we rule out double-sign through a distributed lock: the signing key flips to read-only if consensus with the other instance can't be reached. Financial responsibility depends on tier: Gold includes a slashing-insurance discussion, Bronze/Silver use a shared-risk model. Across 3 years of ops in the current team: 0 slashing incidents.

Lead time: 72h from signed contract to first live node. A regionally distributed rollout completes within 5-7 days. Send the protocol spec plus target regions; we reply with a concrete window within 24h.

Yes, Cosmos SDK is one of our primary stacks. That covers custom modules, IBC relayers, governance voting, and upgrade-handler migrations across major versions. CometBFT, CosmWasm, IBC v2: all in scope.

Yes, we take over existing infrastructure. Onboarding takes 1 week: inventory the existing infra, import IaC (or regenerate it via Terraform), move keys via a ceremony, take the pager. If anything is critically broken before handoff, we fix it first, then sign the SLA.

ready to ship infra?

Tell us about the workload. We reply within 24 hours.