Senior Technical Program Manager – AI Infrastructure, Site Operations at Cerebras
Interview Preparation Plan
As a Senior Technical Program Manager for AI Infrastructure and Site Operations at Cerebras, you will be responsible for the health, performance, and reliability of the company's advanced AI compute infrastructure. The position combines strategic leadership with hands-on technical expertise and focuses on maximizing compute capacity to support critical AI objectives. You will draw on your Site Reliability Engineering (SRE) background to build robust systems, manage high-stakes technical escalations, and champion customer success. Core duties include overseeing the operation and reliability of the AI compute infrastructure, serving as the primary owner of critical infrastructure systems, and developing automation solutions.

The role suits a proactive problem-solver with extensive experience in large-scale distributed systems and a track record of leading high-performing teams. You will partner with cross-functional teams, including engineering and product, to align on long-term infrastructure strategy and support future AI initiatives. Continuously evaluating and improving existing processes, tools, and technologies to raise system reliability and operational efficiency will be key, as will directing interdisciplinary teams through complex problems in the fast-paced, rapidly evolving AI space.
Key Responsibilities
- Lead and manage the operation and reliability of advanced AI compute infrastructure.
- Drive technical ownership of critical infrastructure systems, ensuring uptime, performance, and capacity optimization.