
Title:  Site Reliability Engineer

Location: Canberra, ACT, AU

Requisition ID:  129120

Job Summary

Our TechOps Engineers are the frontline team keeping our large fleet of cloud-hosted Apache Kafka, Cassandra, OpenSearch, Cadence, Valkey, ClickHouse and PostgreSQL clusters up and running. Every day you will diagnose and solve challenging and interesting technical problems, providing a service that some of the leading global names in tech rely on to deliver for millions of end users.

This role is for an Australia-based Senior TechOps Engineer focusing primarily on the open source Cadence technology. It includes operating, maintaining, upgrading and continuously improving the Managed Service for Cadence (across AWS, Azure and GCP) to deliver a great customer experience.

Job Requirements

  • Work with our Managed Service product development team to establish Cadence operational requirements and support procedures.
  • Respond to customer queries and incidents, diagnosing and solving complex technical issues by liaising with customers' engineers. This will include written communication via support tickets and occasional video-call support.
  • The role also provides an opportunity to work extensively with Apache Cassandra, Kafka, OpenSearch and PostgreSQL, along with cloud providers such as AWS, GCP and Azure.
  • Assist/mentor Level-1 team members to develop their technical capabilities on Cadence. 
  • Undertake complex cluster operations such as migrations, upgrades and maintenance. 
  • Provide expert operational support for our nodes running in the cloud (AWS, Azure and GCP) as well as on-premises, using technologies such as Linux (Debian) and Docker, and languages including Java, Python and Bash.
  • Investigate issues and apply standard maintenance procedures to optimize the performance and stability of production systems.
  • Liaise with the Development and Product Management teams through all stages of the development cycle to ensure proper release processes and procedures are followed.
  • Develop and continually improve our suite of internal automation tools, applications and processes.
  • Be a proactive, reliable and supportive member of the TechOps team, and participate in a rotating L2 shift roster.

Skills and Education

We're looking for smart engineers with exceptional communication skills, a positive attitude, and a passion for IT and learning new things. We expect you to be, or quickly become, proficient in the range of technologies we use.

  • You must have at least 3-5 years of working experience, in addition to:
  • Managing production environments, including performance benchmarking and tuning at the application and kernel level.
  • Strong Linux skills with experience in cloud environments are a must, preferably AWS, GCP or Azure. You should be comfortable working from the command line. This is essential: there are no GUIs here.
  • Familiarity with installing and maintaining VMs and applications at scale, including upgrade, migration and lifecycle management.
  • Ability to debug applications using logs and metrics, and to replicate issues in a local environment.
  • Preferably experience with Ansible, Prometheus, Terraform, Grafana and Docker. 
  • Good fundamental computer science / software engineering skills and knowledge, particularly operating system internals, memory management, and networking.  
  • Ideally, programming skills in languages such as Go, Python, Java, Bash scripting and SQL, and source code control using Git.
  • Exceptional ability to communicate clearly and professionally in written and verbal English (essential).  
  • Follow required processes and procedures.  
  • Work as part of a team and use your initiative to get things done.  
  • Passion for all things IT, and especially open source.  
  • Any customer service experience is viewed favorably.


Job Segment: Cloud, Computer Science, Software Engineer, Open Source, Developer, Technology, Engineering
