Systems Software Engineer, Kubernetes Scale - DGX Cloud
This role drives performance and scalability for NVIDIA's DGX Cloud software stack, focusing on Kubernetes and NVIDIA components such as GPU Operator and DCGM. The engineer will diagnose complex distributed systems issues, build automated testing frameworks, and collaborate with AI teams and open-source communities to optimize large-scale AI infrastructure. Key responsibilities include continuous performance testing, root cause analysis, and contributing to upstream projects such as Kubernetes and others in the CNCF ecosystem.