As a Staff Engineer on the Capacity and Performance Engineering (CPE) Team, you will work on efforts to improve the scalability and efficiency of Cruise’s infrastructure. You will work cross-functionally with engineering and other teams every day to drive efficiency initiatives and to develop tooling that automates them. You will collaborate with engineering to build scalable, efficient platforms and to optimize our existing ones. Your day will involve working across teams such as AI, Product, and Infrastructure to build the structure for cost showback and self-service analysis, leading the CPE team in discovering and executing cloud efficiency opportunities, and working on strategic projects that will shape the future of Cruise. In particular, we’re looking for someone familiar with high-performance computing (HPC) clusters and scheduler logic, with experience making resource-contention tradeoffs. We want an engineer with a proven ability to make gating-logic tradeoffs that meet efficiency targets without over-optimizing, and who is comfortable investing the team’s time in building deep understanding to find novel, data-backed improvements.
What you'll be doing:
- Provide deep visibility into what is happening across all products: run capacity and performance experiments to determine scaling and utilization parameters for various service tiers
- Proactively identify gaps in infrastructure and workflow efficiency, and be a key contributor in driving proposals through to results
- Partner with engineering teams, particularly the Compute and Simulation platforms, to conduct capacity and performance experiments that identify and resolve potential performance bottlenecks
- Work closely with software engineers to reduce their consumption of cloud resources and improve the performance of their workloads
- Work with cloud service providers to proactively negotiate for and retain the SKUs and capabilities required for efficient scaling and capacity readiness
- Work frequently with other teams to coordinate major changes to cross-system architectures, influencing upstream and downstream teams toward the most efficient solution
- Present efficiency opportunities and projected cost savings to the Cruise executive team
- Design, develop, and lead automation for both near-term and long-term capacity planning
What you must have:
- Familiarity with HPC clusters and compute platforms
- 8+ years of experience in a capacity or performance engineering role
- 5+ years of experience managing teams
- Expert-level application performance experience
- Expert knowledge of various public cloud providers
- Expert in data modeling for public cloud
- Expert in budgeting and capacity planning
- Expert in SQL, Python, scripting, and building automation tools
- Self-disciplined and thrives in a fast-moving environment
- Excellent communication skills
The salary range for this position is $166,600 - $245,000. Compensation will vary depending on location, job-related knowledge, skills, and experience. You may also be offered a bonus, long-term incentives, and benefits. These ranges are subject to change.