Title:  Site Reliability Engineer (OpenSearch)

Location:  Bangalore, Karnataka, IN

Requisition ID:  135105

Job Summary

NetApp is seeking a Technical Operations Engineer (OpenSearch) to join our growing Instaclustr team in Bangalore, India. In this role, you will be part of a frontline Site Reliability Engineering (SRE) team responsible for ensuring the availability, performance, and reliability of large-scale, cloud-hosted OpenSearch clusters.
You will work in a highly automated environment managing distributed open-source systems at scale, collaborating with global customers across industries such as banking, telecom, gaming, and technology. This role requires strong operational expertise, problem-solving skills, and a passion for learning and working with modern cloud-native and open-source technologies.

Job Requirements

  • Provide end-to-end operational support for OpenSearch clusters deployed across public cloud platforms (AWS, Azure, GCP).
  • Monitor, troubleshoot, and resolve complex production issues, ensuring high availability and performance.
  • Perform cluster lifecycle operations, including upgrades, migrations, maintenance, and scaling activities.
  • Participate in L2 on-call rotations, ensuring timely incident response and resolution.
  • Collaborate with customer engineering teams to diagnose and resolve issues related to OpenSearch and other supported technologies.
  • Work closely with internal teams to enhance reliability, automation, and operational efficiency.
  • Develop and improve automation tools, scripts, and operational processes.
  • Analyse system behaviour and proactively identify opportunities for performance optimisation and reliability improvements.
  • Contribute to knowledge sharing, documentation, and continuous improvement initiatives.

Required Skills & Experience

  • Hands-on experience with OpenSearch (including troubleshooting, upgrades, and migrations) or a strong willingness to develop deep expertise.
  • Experience with public cloud platforms such as AWS, Azure, or GCP.
  • Strong Linux system administration skills and comfort with command-line environments.
  • Solid understanding of distributed systems, networking, and OS internals.
  • Experience with containerisation technologies (e.g., Docker).
  • Strong problem-solving skills with the ability to debug complex production issues.
  • Excellent communication skills (written and verbal) with a customer-focused mindset.
  • Ability to work effectively in a collaborative, fast-paced environment and take ownership of tasks.

Preferred Skills

  • Experience working with other distributed systems such as Cassandra or Kafka.
  • Familiarity with source code debugging and issue investigation (e.g., Jira, codebase review).
  • Programming/scripting skills in Python, Java, or Bash.
  • Experience with Git or version control systems.
  • Prior experience in customer support or technical operations roles.

Education

  • Typically requires a minimum of 4-8 years of related experience with a Bachelor’s degree; or 6 years with a Master’s degree; or 3 years with a PhD; or equivalent experience.

