We are hiring for a next-generation telecoms software company that is seeking a Service Manager to join its expanding team.
Reporting to the Head of Process Outcomes, the Service Manager will be responsible for the overall success of services and related processes, including incident, change, patch, and configuration management.
The responsibilities include:
• Ownership of Incident Management
• Ownership of Change Management
• Support service impacting procedures and activities
• Administration and Management activities
• Undertake ad-hoc projects and other activities as required
The Service Manager will be responsible for keeping the number of repeat incidents low and for steadily reducing the human interaction required to resolve incidents, by utilising our data and machine learning tools and actively focussing on an automated, autonomous and orchestrated approach to identification and resolution.
Key Accountabilities and Activities
1 Ownership of Incident Management including:
• Support the development and implementation of our new Incident Management processes and policies.
• Lead incident response efforts, ensuring timely and effective resolution of issues.
• Actively resolve issues where resolutions have been identified in the form of runbooks, automated or manual activities.
• Conduct post-incident reviews and ensure detailed documentation is maintained.
• Interrogate monitoring tools and alerts to identify issues and monitor overall performance levels and uptime of services.
• Instil a heavy focus on the avoidance of repeat incidents through machine learning tools and a mindset of autonomous and orchestrated resolutions.
• Prioritise activities based on business and customer impact matrices.
2 Ownership of Change Management procedures including:
• Facilitate change advisory board (CAB) meetings to review new requests.
• Capture unapproved changes and the scale of their impact.
• Maintain and improve the change management process.
• Identify and assess potential risks associated with changes and work with stakeholders to mitigate them.
• Identify potential impact of change on services and other internal stakeholders.
• Manage effective communication of changes to internal and external stakeholders.
3 Support service impacting procedures and activities including:
• Support the problem management process as necessary.
• Support the implementation and ensure accurate maintenance of the Configuration Management Database.
• Actively feed requirements into the Data Mesh project.
• Participate in the Major Incident Management and Crisis Management procedures.
• Support the creation, implementation and management of Business Continuity and Disaster Recovery procedures and policies.
4 Administration and Management responsibilities including:
• Actively align priorities with business goals of an automated, orchestrated, and autonomic approach to operations, reducing the operational burden.
• Provide support services across different time zones (UK & Europe) to multiple customers.
• Provide support on out-of-hours matters as necessary.
• Create and present reports on all areas of service management.
• Continuously improve service management and operational processes, with a view to increasing customer satisfaction.
5 Undertake ad-hoc projects and other activities as required
Qualifications / Certifications
Essential
1. Certification in Incident Management
Desirable
2. Certification in Problem Management
Experience and Skills
1. Experience leading service management teams & activities within a software development & network engineering environment.
2. Experienced as a successful Change Manager.
3. Experience implementing new processes.
4. Excellent written and verbal communication skills, specifically on complex technical matters.
5. Experience interpreting data using various tools including Prometheus and Grafana
6. Ability to work independently and confidently corral resources
7. Experience working in the telecommunications industry
8. Understanding of Kubernetes practices
9. Understanding of Agile working practices
10. Understanding of Network Dependencies
11. Experience handling Carrier Outages
12. Experience handling incidents with 3rd Party SLAs