Title: Site Reliability Engineer
Cork, Munster, IE, T12 H682
Job Summary
NetApp is looking for a Staff Engineer to join our growing Instaclustr team in the EU. NetApp’s Instaclustr offering is an open-source-as-a-service platform that delivers reliability at scale. We manage cutting-edge open-source technologies (Cassandra, Kafka, PostgreSQL, Redis/Valkey, OpenSearch, ClickHouse, and Cadence) for our customers around the world.
NetApp Instaclustr enables our customers to run powerful open-source applications at the highest levels of scale. We have developed a platform that manages the entire lifecycle, encompassing provisioning infrastructure, installation of applications, and, most importantly, ensuring that the applications run reliably in production. Since its founding in 2013, Instaclustr has experienced strong growth, with over 300 customers worldwide and more than 23,000 nodes under management.
Our Technical Operations Engineers are the frontline team that keeps our large fleet of cloud-hosted, open-source clusters up and running. Your work will ensure the security, reliability, and performance of world-class systems and databases. You will collaborate with the technical teams of our customers, who are globally recognized companies in the gaming, banking, and logistics industries, ranging from large multinationals to emerging start-ups.
The Role
If you have excellent operational knowledge of managing Cassandra clusters, look no further!
As a Staff Engineer (Cassandra), you will be part of the frontline team responsible for the reliability, availability, and maintenance of our large fleet of cloud-hosted Cassandra clusters. Every day, you will diagnose and solve interesting technical problems, providing Cassandra as a Managed Service in a highly automated environment. Our service is relied on by leading global companies in banking and financial services, telecom, IoT, and tech that interact with millions of end users.
Skills & Experience
We're seeking competent engineers with exceptional communication skills, a positive attitude, and a passion for IT and continuous learning. We expect you to be proficient in, or quickly become proficient in, a range of the technologies we use.
Successful candidates for this role will:
- Hold EU citizenship, such that they are accountable only to EU laws, rules, and regulations (i.e. must not hold dual citizenship with a non-EU country) (essential).
- Have strong experience in Cassandra administration and architecture, and a desire to learn more and develop to a true expert level.
- Have experience diagnosing and recommending mitigation strategies for a range of Cassandra-related issues, including performance degradation due to resource bottlenecks, suboptimal data modelling leading to hot partitions, excessive tombstones, and inefficiencies caused by range slices and poorly constructed queries.
- Have hands-on experience with Cassandra architecture and core administrative tasks, including compactions, repairs, backup and recovery, resolving schema disagreements, and managing configurations.
- Have experience handling Cassandra maintenance activities such as upgrades and migrations.
- Have strong knowledge of and experience with Linux, and be comfortable working from the command line (essential).
- Have an exceptional ability to communicate clearly and professionally in written and verbal English (essential).
- Preferably have past IT customer service/support experience in an ITIL-based setup.
- Have good fundamental computer science/software engineering skills and knowledge, particularly in operating system internals, memory management, and networking.
- Be able to follow required processes and procedures.
- Work as a technical and team lead and use their initiative to get things done.
- Preferably be able to investigate and research Cassandra issues by reviewing the Apache Cassandra codebase and the Cassandra issue tracker.
- Ideally have programming skills in Python or Java and experience with source code control using Git.
- Ideally have knowledge of a public cloud such as AWS and of tools such as Docker and Ansible.
- Preferably have foundational knowledge of and experience working with Kafka and/or OpenSearch.
I'm interested. What else will I be doing?
- Provide expert operational support for our nodes running in the cloud (AWS, Azure, and GCP), using technologies such as Linux (Debian), Docker, and Ansible, and languages including Java, Python, and Bash.
- Liaise with our customers’ engineers in resolving interesting issues related to Apache Cassandra and other supported technologies.
- Undertake complex cluster operations such as migrations, upgrades, and maintenance on our fleet.
- Develop and continually improve our suite of internal automation tools, applications, and processes.
- Work on continuous improvements to reduce operational toil and improve efficiencies.
- Coordinate between business units and the customer to achieve customer and business objectives.
Compensation:
The salary offered will be determined by the candidate's location, qualifications, experience, and education and may be outside of this range. Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package. This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an Employee Stock Purchase Plan, and/or Restricted Stock Units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.
Job Segment:
Cloud, Open Source, Developer, Java, Linux, Technology