As a Site Reliability Engineer (SRE) at Shopmonkey, you'll be instrumental in ensuring the reliability, scalability, and performance of our systems and services for both internal and external stakeholders. We're seeking a seasoned professional who demonstrates mastery in computer science fundamentals and possesses a track record of independently implementing and delivering end-to-end, cloud-native solutions. This role requires strong backend expertise, including a deep understanding of our application's infrastructure, alongside proficiency in Site Reliability Engineering principles.
What you will do:
- Work directly with an Engineering Lead and other members of the Platform and engineering teams to ensure reliable system functionality and scalability.
- Lead efforts in designing, building, and maintaining highly scalable, reliable, and secure infrastructure solutions.
- Drive initiatives to improve system reliability, performance, and scalability.
- Act as a subject matter expert in incident response, participating in on-call rotations and resolving production issues promptly.
- Design and implement robust monitoring, alerting, and incident response mechanisms to ensure system uptime and availability.
- Conduct post-incident reviews and implement preventive measures to mitigate future incidents.
- Mentor junior team members and contribute to their professional development.
- Stay abreast of industry best practices and emerging technologies, advocating for their adoption where applicable.
We are looking for people who have:
- Extensive experience in backend development and automation, with proficiency in Bash, Golang, SQL, and TypeScript.
- Strong understanding of Site Reliability Engineering principles and practices.
- Demonstrated experience in designing and implementing scalable and reliable infrastructure solutions.
- Expertise with public cloud providers (GCP, AWS, Azure).
- Expertise with distributed systems managed with Kubernetes.
- Minimum of 7 years of professional software development experience, with a focus on site reliability engineering or infrastructure operations.
- Experience with PubSub/Eventing patterns is advantageous.
- Bachelor's degree in Computer Science or related field, or equivalent practical experience.
In the United States, the range is typically a salary of $160,000 to $180,000 + bonus + equity + benefits. The range provided is Shopmonkey’s reasonable estimate of the base compensation for this role. The actual amount will be based on job-related and non-discriminatory factors such as location, experience, training, skills, and abilities. Consult with your Recruiter during the initial call to determine a more targeted range based on these job-related factors. In addition to this base compensation, company stock options and benefits, as outlined below, are included.