Title: Site Reliability Engineer
Bangalore, Karnataka, IN
Job Summary
NetApp is looking for a Senior TechOps Engineer - Cassandra to join our growing Instaclustr team in India. NetApp’s Instaclustr offering provides open-source as-a-service, delivering reliability at scale. We manage cutting-edge open-source technologies (Cassandra, Kafka, PostgreSQL, Redis/Valkey, OpenSearch, ClickHouse, and Cadence) for our customers around the world.
NetApp Instaclustr enables our customers to run powerful open-source applications at the highest levels of scale. We have developed a platform that manages the entire lifecycle, encompassing infrastructure provisioning, application installation, and, most importantly, ensuring applications run reliably in production. Since its founding in 2013, Instaclustr has experienced strong growth, with over 300 customers worldwide and more than 18,000 nodes under management.
Our Technical Operations Engineers are the frontline team that keeps our large fleet of cloud-hosted, open-source clusters up and running. Your work will ensure the security, reliability, and performance of world-class systems and databases. You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups.
Job Requirements
Successful candidates for this role will have:
Essential skills:
- Strong experience in Apache Cassandra administration and architecture, with a desire to continuously learn and develop to an expert level.
- Experience in diagnosing and recommending mitigation strategies for Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modeling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries.
- Hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, schema disagreement resolution, and configuration management.
- Experience handling Cassandra maintenance activities, including upgrades and migrations.
- Ability to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase.
- Strong knowledge and experience with Linux, with the ability to work comfortably from the command line.
- Exceptional ability to communicate clearly and professionally in written and verbal English.
- Experience working with at least one public cloud platform, preferably AWS.
- Prior IT customer service or support experience within an ITIL-based environment.
- Strong fundamental computer science and software engineering skills, particularly in operating system internals, memory management, and networking.
- Ability to follow required processes and procedures and work collaboratively within a team.
Preferred skills:
- Programming skills in Python or Java, with experience using Git for source code control.
- Experience working with Docker and Ansible.
Education
Typically requires 5-8 years of related experience with a Bachelor’s degree; or 3 years of experience with a Master’s degree; or a PhD with 3 years of experience; or equivalent experience.