Site Reliability Engineer


Job details
  • South Bank
  • 1 week ago

Are you a Site Reliability Engineer, Environment Manager, Platform Engineer, or a senior-level DevOps Engineer? Are you looking for an exciting role in a newly formed team that will drive innovation and create best-in-class development environments to support product innovation and delivery? Does a remote-first role sound good to you? If so, then this could be right up your street!

Nicholas Howard is delighted to be recruiting for a Site Reliability Engineer to join a leading systems integrator. Our client helps companies to establish, maintain and grow their IT services, and operate their critical technology in a more cost-effective manner. This is a brand-new role within the strategic engineering team, which sets and maintains design and development standards across IP development.

As a Site Reliability Engineer (SRE), you will ensure the reliability, availability, and performance of services, primarily utilising Microsoft Azure with a focus on containers, serverless, AI, analytics, and database services. You will work closely with development teams to build scalable and resilient systems and provide advisory support to our support teams. Although Azure will be our main cloud platform, experience with AWS would be desirable. Fundamentally, the post-holder will play a crucial role in building the environment for internal development capability.

This is a remote-first role, with time in the office in London once a month.

Key Responsibilities:

  • Collaborate with development teams to design scalable and resilient architectures in Azure.

  • Develop and implement monitoring and alerting solutions to ensure service reliability.

  • Automate operational processes and tasks using Infrastructure as Code (IaC) and scripting.

  • Manage and optimise Azure resources, focusing on:

    • Containers (e.g., Azure Kubernetes Service (AKS), Azure Container Apps).

    • Serverless computing (e.g., Azure Functions, Logic Apps).

    • AI and analytics (e.g., Azure Machine Learning, Synapse Analytics, Data Factory).

    • Database services (e.g., Cosmos DB, Azure SQL, PostgreSQL).

  • Perform root cause analysis for incidents and implement preventative measures.

  • Provide advisory support to platform support teams.

  • Work in a multi-cloud environment. Azure is the primary focus, but experience with AWS (e.g., ECS, Lambda, RDS) is beneficial.

Key Skills and Experience:

  • Proven experience as an SRE, or in a similar role.

  • Strong expertise in Azure services (containers, serverless, AI, analytics, databases).

  • Experience implementing and using monitoring and logging tools (Azure Monitor, Application Insights, Datadog, Grafana).

  • Proficient in scripting & automation (Python, Bash, PowerShell).

  • Infrastructure as Code (IaC) experience (Terraform, Bicep, ARM Templates).

  • Experience making technical decisions and implementing solutions that align with best practices and business goals.

  • Excellent problem-solving and collaboration skills.

  • AWS knowledge and experience would be a plus.

The company offers a highly competitive salary, along with comprehensive benefits including flexible remote working, a generous company pension, health and dental insurance, life assurance, access to the Udemy training platform to support ongoing skills development and training, and a wide range of additional lifestyle perks.

Please register your interest by applying now.
