Title: Software Engineer
Bangalore, Karnataka, IN
Job Summary
As a Site Reliability Engineer (SRE) in NetApp's India R&D division, you will be responsible for the development, reliability, automation, and operations of AI-driven services across cloud and on-prem environments. You will be part of the highly skilled NetApp Active IQ technical team, contributing to cutting-edge reliability engineering practices while enabling Generative AI (GenAI) innovation.
Your focus will be on applying SRE principles to GenAI-powered services, ensuring that they are scalable, fault-tolerant, and highly available, and that they meet strict SLAs. You will bridge development and operations by building automation, observability, and self-healing capabilities for AI-driven workloads.
This position requires an individual who is creative, team-oriented, technology-savvy, driven to produce results, and able to work across teams.
Job Requirements
• Design, develop, and maintain SRE automation tools for monitoring, deployment, and scaling of AI/GenAI workloads.
• Implement observability platforms (metrics, tracing, logging) tailored for GenAI services running both in the cloud and on-prem.
• Collaborate with engineering and data science teams to productionize GenAI models at scale.
• Build fault-tolerant infrastructure for AI pipelines using Kubernetes, Docker, and cloud-native tools.
• Drive capacity planning, incident management, and postmortem analysis with a focus on continuous reliability improvement.
• Implement CI/CD with automated testing and validation pipelines.
• Develop self-recovery mechanisms for critical services to minimize downtime.
• Ensure security, compliance, and resilience in AI-based applications and microservices.
• Interact with Active IQ engineering teams across geographies to leverage expertise and contribute to the tech community.
Education
Typically requires no previous professional experience.