Job Details

The Incident Manager is responsible for overseeing the end-to-end management of incidents impacting cloud and infrastructure services, including AWS, Azure, and OCI environments.
This role ensures rapid restoration of services, effective communication with stakeholders, and continuous improvement through post-incident analysis.
Responsibilities:
Own and manage the full incident lifecycle from detection to closure.
Act as the central command point during major (P1/P2) incidents.
Coordinate cross-functional teams, including cloud, network, and infrastructure teams, as well as CSMs.
Ensure timely incident triage, escalation, and resolution.
Lead incident bridges, war rooms, and crisis calls.
Ensure accurate and timely communication to stakeholders and leadership.
Track incidents against SLAs and ensure compliance with operational targets.
Drive root cause analysis (RCA) and post-incident reviews (PIRs).
Identify recurring issues and recommend preventive and corrective actions.
Maintain and improve incident management processes, playbooks, and runbooks.
Ensure proper documentation and ticket updates in ITSM tools.
Support audits, reporting, and service improvement initiatives.
Qualifications:
5+ years of experience in IT operations, cloud, or infrastructure roles.
2+ years of experience in Incident or Major Incident Management.
ITIL Foundation or ITIL Intermediate (Incident Management) certification preferred.
Cloud certifications (AWS, Azure, OCI) are a plus.
Strong understanding of cloud platforms (AWS, Azure, OCI) and private cloud operations.
Familiarity with monitoring, alerting, and logging tools.
Good understanding of infrastructure components (compute, storage, networking, IAM).
Ability to assess technical impact and prioritize incidents effectively.
Experience with ITSM tools (ServiceNow, Jira, Remedy, etc.).
Strong knowledge of ITIL Incident and Major Incident Management processes.
