Guidewire is searching for a Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry’s leading Analytics platform. As a member of the Analytics Reliability Team, you’ll be responsible for building and evolving our SRE practice for Analytics.
The Analytics team at Guidewire uses internet scale data collection, adaptive machine learning, and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions.
Downtime and failures are inevitable; what matters is how SREs deal with them. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE's responsibility is to collaborate with developers to troubleshoot and solve problems and to reduce customer impact where possible. After an incident, SREs also go one step further: they document and examine what went wrong and develop measures, such as automated runbooks, to handle the issue going forward.
- This is an on-call position
- Responding to any critical incidents and ticket escalations
- Following and documenting our post-incident response and postmortem processes
- Executing planned patching or improving related automation
- Engineering to reduce toil, tune alerts, and improve documentation
When NOT on-call, you will be responsible for:
- Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems
- Analyzing our AWS infrastructure and related applications for design and architectural opportunities to improve overall reliability
- Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR
- Analyzing incident data to determine the next opportunity to improve reliability
- Influencing engineers to improve application reliability and scalability to run efficiently
- Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation
- Improving operational processes (such as deployments and upgrades) to make them as boring as possible
- Proven experience designing and deploying SLIs, SLOs, and error budgets
- Proven experience triaging and debugging distributed systems on cloud infrastructure
- Proven experience in designing and engineering CI/CD pipelines within Kubernetes and legacy ecosystems
- Proven experience in designing and engineering monitors, dashboards, and synthetic transactions in Datadog
- Proven experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud-native approaches
- Proven experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible
- Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale
- Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent
- AWS Certified in multiple categories
- Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design
- Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions
- Familiarity with open-source distributed data processing frameworks such as Hadoop, Apache Spark, AWS Redshift, etc.
About Guidewire
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.
As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.
For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.
Guidewire Software Inc. provides equal employment opportunities to all applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.