Stefano Schotten
GPU Infrastructure • Production Engineering • Reliability • Data Center Operations • Automation
I’ve spent two decades building and operating infrastructure where reliability is bounded by physics: compute, storage, networks, power, cooling—and the operational discipline required to keep systems predictable.
Linux has been my home base for 25+ years. I still prefer to work close to the system, where performance and reliability are shaped by primitives, contention, and the details that don’t show up in diagrams.
Background
I co-founded and self-funded a cloud and data center project, leading a generational transformation at AMTI from value-added reseller (VAR) to cloud service provider. Over nearly 20 years as CTO, I scaled capacity to 600 kW (including GPU-dense compute), delivered 2,000+ consecutive days of uptime, reduced OPEX by 40%, and drove 60× growth, culminating in an acquisition in 2024. I’m currently serving as a Data Center & GPU Infrastructure Leader (Advisor) while completing the transition services agreement (TSA).
I’ve also spent years in executive-facing rooms where the job is to translate operational risk into decisions: quantify constraints, model failure modes, and justify what needs to be built before it breaks.
Focus Areas
- Operable performance across compute, storage, and low-latency networking (RDMA, NVMe-oF) in distributed systems
- Observability and telemetry: instrumentation that’s trustworthy, time-aligned, and actionable
- Automation and orchestration: IaC, compliance-by-default, schedulers (Slurm), and Kubernetes patterns
- Fleet control planes: BMC/OOB management (Redfish/IPMI), inventory, lifecycle, and automated network perimeters
- Infrastructure economics (FinOps): cost transparency, capacity planning, and constraints-to-dollars decision making
- Operational maturity and security-by-design: incidents, postmortems, safe change, and CISSP-driven controls
Contact
- Email: s@ure.us
- LinkedIn: linkedin.com/in/schotten
- GitHub: github.com/sch0tten