Stefano Schotten

GPU Infrastructure • Production Engineering • Reliability • Data Center Operations • Automation

I’ve spent two decades building and operating infrastructure where reliability is bounded by physics: compute, storage, networks, power, and cooling, along with the operational discipline required to keep systems predictable.

Linux has been my home base for 25+ years. I still prefer to work close to the system, where performance and reliability are shaped by primitives, contention, and the details that don’t show up in diagrams.

Background

I co-founded and self-funded a cloud and data center project, leading a generational transformation at AMTI from value-added reseller (VAR) to cloud service provider. Over nearly 20 years as CTO, I scaled capacity to 600 kW (including GPU-dense compute), delivered 2,000+ consecutive days of uptime, reduced OPEX by 40%, and drove 60× growth, culminating in an acquisition in 2024. I currently serve as a Data Center & GPU Infrastructure Leader (Advisor) while completing the transition services agreement (TSA).

I’ve also spent years in executive-facing rooms where the job is to translate operational risk into decisions: quantify constraints, model failure modes, and justify what needs to be built before it breaks.

Focus Areas

  • Operable performance across compute, storage, and low-latency networking (RDMA, NVMe-oF) in distributed systems
  • Observability and telemetry: instrumentation that’s trustworthy, time-aligned, and actionable
  • Automation and orchestration: IaC, compliance-by-default, schedulers (Slurm), and Kubernetes patterns
  • Fleet control planes: BMC/OOB management (Redfish/IPMI), inventory, lifecycle, and automated network perimeters
  • Infrastructure economics (FinOps): cost transparency, capacity planning, and constraints-to-dollars decision making
  • Operational maturity and security-by-design: incidents, postmortems, safe change, and CISSP-driven controls

Contact

Stefano Schotten