
Title:  Site Reliability Engineer (Cadence)

Location: Bangalore, Karnataka, IN

Requisition ID:  132780

Job Summary

NetApp is looking for a Senior Techops Engineer to join our growing Instaclustr team in Bangalore, India. NetApp’s Instaclustr offering provides open source as a service, delivering reliability at scale. We manage cutting-edge open-source technologies (Cassandra, Kafka, PostgreSQL, Redis/Valkey, OpenSearch, ClickHouse and Cadence) for our customers around the world.
NetApp Instaclustr makes it easy for our customers to run powerful open-source applications at the highest levels of scale. We have developed a platform that takes care of the whole lifecycle: provisioning infrastructure, installing applications and, most importantly, keeping the applications running reliably in production.  Since being founded in 2013, Instaclustr has grown strongly, with over 300 customers worldwide, and over 18,000 nodes under management.
Our Technical Operations Engineers are the frontline team keeping our large fleet of cloud-hosted open-source clusters up and running. Your work will ensure the security, reliability and performance of world-class systems and databases. You will collaborate with our customers’ technical teams at globally recognised companies in the gaming, banking and logistics sectors, ranging from big multinationals to emerging start-ups.

Roles and Responsibilities

  • Maintain and monitor large-scale Cadence clusters along with Cassandra, Kafka, and OpenSearch clusters in a highly automated cloud environment.
  • Diagnose and resolve complex technical issues by analyzing logs, metrics, and system behavior.
  • Provide operational support for nodes running on AWS, Azure, and GCP using Linux (Debian), Docker, and scripting languages such as Python and Bash.
  • Perform cluster operations including upgrades, migrations, and maintenance.
  • Collaborate with customer engineering teams to troubleshoot and resolve issues related to Cadence and other supported technologies, ensuring clear and professional communication.
  • Participate in Level 2 on-call rotation for incident response and operational support.
  • Develop and enhance internal automation tools and processes to improve efficiency and reliability.
  • Investigate issues by reviewing source code and contribute to code-level fixes when needed.

Education

  • Typically requires a minimum of 4-8 years of related experience with a Bachelor’s degree; or 6 years and a Master’s degree; or a PhD with 3 years of experience; or equivalent experience.


Job Segment: Open Source, Cloud, Linux, Software Engineer, Engineer, Technology, Engineering
