Title: Site Reliability Engineer (OpenSearch)
Bangalore, Karnataka, IN
Job Summary
NetApp is seeking a Technical Operations Engineer (OpenSearch) to join our growing Instaclustr team in Bangalore, India. In this role, you will be part of a frontline Site Reliability Engineering (SRE) team responsible for ensuring the availability, performance, and reliability of large-scale, cloud-hosted OpenSearch clusters.
You will work in a highly automated environment managing distributed open-source systems at scale, collaborating with global customers across industries such as banking, telecom, gaming, and technology. This role requires strong operational expertise, problem-solving skills, and a passion for learning and working with modern cloud-native and open-source technologies.
Job Requirements
- Provide end-to-end operational support for OpenSearch clusters deployed across public cloud platforms (AWS, Azure, GCP).
- Monitor, troubleshoot, and resolve complex production issues, ensuring high availability and performance.
- Perform cluster lifecycle operations, including upgrades, migrations, maintenance, and scaling activities.
- Participate in L2 on-call rotations, ensuring timely incident response and resolution.
- Collaborate with customer engineering teams to diagnose and resolve issues related to OpenSearch and other supported technologies.
- Work closely with internal teams to enhance reliability, automation, and operational efficiency.
- Develop and improve automation tools, scripts, and operational processes.
- Analyse system behaviour and proactively identify opportunities for performance optimisation and reliability improvements.
- Contribute to knowledge sharing, documentation, and continuous improvement initiatives.
Required Skills & Experience
- Hands-on experience with OpenSearch (including troubleshooting, upgrades, and migrations) or strong willingness to develop deep expertise.
- Experience with public cloud platforms such as AWS, Azure, or GCP.
- Strong Linux system administration skills and comfort with command-line environments.
- Solid understanding of distributed systems, networking, and OS internals.
- Experience with containerisation technologies (e.g., Docker).
- Strong problem-solving skills with the ability to debug complex production issues.
- Excellent communication skills (written and verbal) with a customer-focused mindset.
- Ability to work effectively in a collaborative, fast-paced environment and take ownership of tasks.
Preferred Skills
- Experience working with other distributed systems such as Cassandra or Kafka.
- Familiarity with source code debugging and issue investigation (e.g., Jira, codebase review).
- Programming/scripting skills in Python, Java, or Bash.
- Experience with Git or version control systems.
- Prior experience in customer support or technical operations roles.
Education
- Typically requires 4-8 years of related experience with a Bachelor’s degree; or 6 years with a Master’s degree; or 3 years with a PhD; or equivalent experience.