Role: Senior MLOps Engineer
Location: Abu Dhabi, UAE (Full Relocation Provided)
Company: AI71

About Us

AI71 is an applied research team committed to building responsible and impactful AI agents that empower knowledge workers. In partnership with the Technology Innovation Institute (TII), we drive innovation through cutting-edge AI research and development. Our mission is to translate breakthroughs in machine learning into transformative products that reshape industries.

Senior MLOps Engineer

AI71 is seeking a Senior MLOps Engineer to lead the development and management of our infrastructure for training, deploying, and maintaining ML models. This role is critical to operationalizing state-of-the-art systems and ensuring high-performance delivery across research and production environments.

The successful candidate will design and implement infrastructure that supports efficient model deployment, inference, monitoring, and retraining. This includes close collaboration with cross-functional teams to integrate machine learning models into scalable, secure production pipelines, enabling the delivery of real-time, data-driven solutions across various domains.

Key Responsibilities

• Model Deployment: Lead the deployment and scaling of LLMs and other deep learning models using inference engines such as vLLM, Triton, or TGI, ensuring optimal performance and reliability.
• Pipeline Engineering: Design and maintain automated pipelines for model fine-tuning, evaluation, versioning, and continuous delivery using tools such as MLflow, SageMaker Pipelines, or Kubeflow.
• Infrastructure Management: Architect and manage cloud-native, cost-effective infrastructure for machine learning workloads on AWS (SageMaker, EC2, EKS, Lambda) or equivalent platforms.
• Performance Optimization: Implement monitoring, logging, and optimization strategies to meet latency, throughput, and availability requirements across ML services.
• Collaboration: Work closely with ML researchers, data scientists, and engineers to support experimentation workflows, streamline deployment, and translate research prototypes into production-ready solutions.
• Automation & DevOps: Develop infrastructure-as-code (IaC) solutions to support repeatable, secure deployments and continuous integration/continuous delivery (CI/CD) for ML systems.
• Model Efficiency: Apply model optimization techniques such as quantization, pruning, and multi-GPU/distributed inference to enhance system performance and cost-efficiency.

Qualifications

• Professional Experience: Minimum of 5 years of experience in MLOps, ML infrastructure, or machine learning engineering, with a strong record of managing end-to-end ML model lifecycles.
• Deployment Expertise: Proven experience deploying large-scale models in production environments with advanced inference techniques.
• Cloud Proficiency: In-depth expertise in cloud services (preferably AWS), including infrastructure management, scaling, and cost optimization for ML workloads.
• Programming Skills: Strong proficiency in Python, with additional experience in C/C++ for performance-sensitive applications.
• Tooling Knowledge: Proficiency in MLOps frameworks such as MLflow, Kubeflow, or SageMaker Pipelines; familiarity with Docker and Kubernetes.
• Optimization Techniques: Hands-on experience with model performance optimization techniques and distributed training frameworks (e.g., DeepSpeed, FSDP, Accelerate).
• Educational Background: Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related technical field.

Why Join AI71?

• Advanced Technology Stack: Work with some of the most capable large language models and cutting-edge ML infrastructure.
• High-Impact Work: Contribute directly to the deployment of AI solutions that deliver measurable business value across industries.
• Collaboration-Driven Environment: Engage with a high-performing, interdisciplinary team focused on continuous innovation.
• Robust Infrastructure: Access high-performance compute resources to support experimentation and scalable deployment.
• Relocation Package: Full support for relocation to Abu Dhabi, with a competitive compensation package and lifestyle benefits.