Systems Software Engineer, Kubernetes Scale - DGX Cloud
This role drives performance and scale characterization for NVIDIA's DGX Cloud software stack, focusing on Kubernetes and NVIDIA components such as GPU Operator, DCGM, and NIM. The engineer will debug large-scale distributed systems, build automated testing and monitoring tools, and collaborate with AI teams and open-source communities to optimize the efficiency and cost of AI infrastructure. Work includes deep performance analysis, CI/CD integration, and contributions to upstream projects such as Kubernetes and other CNCF projects.