Title: Senior Technical Operations Engineer
Morrisville, NC, US
Job Summary
NetApp is looking for a Technical Operations Engineer to join our growing Instaclustr team in the USA. NetApp’s Instaclustr offering provides open source as-a-service, delivering reliability at scale. We manage cutting-edge open-source technologies (Cassandra, Kafka, PostgreSQL, OpenSearch) for our customers around the world.
NetApp Instaclustr makes it easy for our customers to run powerful open-source applications at the highest levels of scale. We have developed a platform that takes care of the whole lifecycle: provisioning infrastructure, installing applications and, most importantly, keeping the applications running reliably in production. Since being founded in 2013, Instaclustr has grown strongly, with over 300 customers worldwide, and over 20,000 nodes under management.
Our Technical Operations Engineers are the frontline team keeping our large fleet of cloud-hosted open-source clusters up and running. Your work will ensure the security, reliability and performance of world-class systems and databases. You will collaborate with our customers’ technical teams, from globally recognized companies in the gaming, banking and logistics sectors, ranging from big multinationals to emerging start-ups.
The Role
As a Senior TechOps Engineer, you will be part of our TechOps team, which maintains and supports our large fleet of cloud-hosted Apache Cassandra, Kafka, OpenSearch and other open-source technology clusters. You will work with other teams, including Product Management, Development and Customer Success, to help drive and shape the way we support our customers.
What else you will do:
1. Provide expert support on incidents, diagnosing and solving data modelling and architectural issues by liaising with customers’ engineers and maintaining a high standard of customer communication.
2. Undertake complex cluster tasks including but not limited to real-time migrations with no downtime, upgrades (minor and major versions), performance tuning and maintenance of our fleet of 13,000+ nodes for our customers.
3. Provide expert support to our nodes running in the cloud (AWS/Azure/GCP), using technologies such as Linux (Debian, Ubuntu), Docker, and languages including Java, Python and bash.
4. Investigate issues and apply standard maintenance procedures to optimize the performance and stability of production and non-production clusters that we manage.
5. Develop and continually improve our suite of internal automation tools, applications, and processes.
6. Be a proactive, reliable and supportive member of the support team, and participate in a 24/7 rotating on-call roster.
Job Requirements
Minimum of 6 years of working experience, in addition to:
1. Designing and maintaining database architecture, data structures, tables, dictionaries and naming conventions to ensure the accuracy and completeness of all data master files. Experience with Apache Cassandra or similar NoSQL databases preferred.
2. Testing systems and upgrades, including debugging, tracking, reproducing, logging and resolving all identified problems, according to approved quality testing scripts, procedures and processes.
3. Identifying architectural issues at scale in cloud environments such as AWS/GCP/Azure.
4. Developing and managing documentation, standards, policies, and procedures related to database operations.
5. Programming skills in Python, Java, bash scripting, SQL, and source code control using Git.
6. Exceptional ability to communicate clearly and professionally in written and verbal English (essential).
7. Customer service experience is a plus.
8. US Citizen.
Education
- Bachelor's degree in a technical or engineering field OR Master's degree with less related experience.
Nearest Major Market: Raleigh