Position: Software Systems Engineer
Working hours: 9am - 5pm ET (flexible)
Company location: New York, United States (remote)
Salary: USD $4K/month + PTO
Candidates location: Latin America (LATAM)
Please upload your resume in English.
ABOUT THE COMPANY
This YC S23 startup is transforming compute orchestration with a system that improves performance by up to 10x. Its technology enables real-time save, move, and restore of compute workloads, significantly improving efficiency and reducing costs. The company partners with infrastructure providers, including cloud development environments, AI inference services, databases, and MLOps/DataOps platforms, helping them raise gross margins by increasing utilization and cutting costs by 20-80%.
Clients describe the company as a next-generation VMware built for the cloud, one that solves problems previously considered insurmountable. The team is small but high-impact and is led by founders with extensive experience building and scaling successful startups. They are looking for passionate people who love to learn and work at high velocity, and they offer the chance to explore and innovate with cutting-edge technologies.
REQUIREMENTS
Deep understanding of:
Kubernetes (system administration using Helm, Terraform, etc.)
Containers (runtimes: runc, Docker, Podman, etc.; and container orchestration)
Virtualization
Modern (and possibly legacy) infrastructure stacks
Experience with cloud infrastructure (specifically AWS)
Experience with Go, C, Rust, or any other systems-programming language
Experience scaling infrastructure as part of a platform team
Experience productionizing and managing production-grade Kubernetes clusters
Experience with on-call duties
Enthusiasm for open source and Linux (preferably with experience in Gentoo, Arch, or similar distributions)
NICE TO HAVE
Experience supporting data teams with data processing infrastructure (BigQuery, OpenTelemetry, etc.)
Experience implementing observability and monitoring best practices
Experience with high-performance computing (HPC), e.g., SLURM
Experience deploying and scaling ML workloads (training or inference) in production