Job Summary:
The AI Operations Specialist will be responsible for the implementation, maintenance, and optimization of AI and machine learning (ML) models and systems. This role requires a blend of data science, machine learning, and operational skills to ensure the seamless deployment, monitoring, and performance tuning of AI applications. Knowledge of Dell and Nvidia products is essential for optimizing hardware and software solutions.

Key Responsibilities:

1. Model Deployment and Maintenance:
Deploy AI/ML models into production environments.
Monitor deployed models to ensure they meet performance and accuracy benchmarks.
Implement automated retraining and updating mechanisms for models.

2. System Monitoring and Troubleshooting:
Develop and implement monitoring tools to ensure the health of AI systems.
Troubleshoot and resolve issues related to AI/ML systems in a timely manner.
Collaborate with IT and DevOps teams to ensure system stability and uptime.

3. Performance Optimization:
Analyze the performance of AI models and systems, identifying areas for improvement.
Optimize AI workflows and processes to enhance efficiency and reduce latency.
Implement best practices for model scaling and resource management.

4. Data Management:
Oversee the data pipelines that feed into AI/ML models, ensuring data quality and integrity.
Collaborate with data engineering teams to design and maintain robust data infrastructure.
Ensure compliance with data privacy and security standards.

5. Documentation and Reporting:
Maintain detailed documentation of AI operations, including deployment procedures, troubleshooting steps, and performance metrics.
Prepare reports on AI system performance, incidents, and resolutions for stakeholders.
Contribute to the development of operational guidelines and best practices.

6. Collaboration and Communication:
Work closely with data scientists, engineers, and business stakeholders to understand AI requirements and deliver solutions.
Provide technical support and training to team members and end users.
Stay updated with the latest AI/ML technologies and industry trends.

7. Hardware and Software Expertise:
Leverage expertise in Dell products, including servers, storage solutions, and networking equipment, to support AI infrastructure.
Use Nvidia GPUs and related software (CUDA, TensorRT) to optimize AI/ML model performance.
Stay informed about advancements and updates in Dell and Nvidia technologies to recommend and implement upgrades.

Qualifications:
Bachelor's degree in Computer Science, Engineering, Data Science, or a related field.
Proven experience in AI/ML operations, with hands-on experience in deploying and managing models in production.
Strong knowledge of machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
Proficiency in programming languages such as Python, Java, or R.
Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
Familiarity with monitoring tools and practices (e.g., Prometheus, Grafana).
Excellent problem-solving skills and the ability to work under pressure.
Strong communication skills and the ability to collaborate effectively with cross-functional teams.
In-depth knowledge of Dell products and Nvidia GPUs, including setup, configuration, and optimization.

Preferred Qualifications:
Master's degree in a related field.
Certification in AI/ML or cloud computing (e.g., AWS Certified Machine Learning Specialty).
Experience with big data technologies (e.g., Hadoop, Spark).
Knowledge of MLOps practices and tools (e.g., MLflow, Kubeflow).

Working Conditions:
Office environment with potential for remote work.
May require occasional on-call duty or after-hours support for critical issues.

How to Apply:
Interested candidates should submit their resume and a cover letter detailing their relevant experience and interest in the position to [email address].