Job Purpose
The SRE is the linchpin of our digital resilience, entrusted with ensuring the uninterrupted performance of business-critical systems that directly shape revenue streams and customer experiences. In the era of advanced technologies such as microservices, databases, data streaming, and containers, the SRE assumes a pivotal position as both architect and guardian, distinctly setting this role apart from conventional support functions.
For the SRE, navigating the intricacies of cutting-edge technologies goes beyond routine support tasks. By proactively designing, implementing, and refining the infrastructure, the SRE fortifies it against the challenges posed by the interplay of microservices, databases, data streaming, and containers. This proactive approach is crucial to mitigating risks, minimizing downtime, and improving system reliability, outcomes that traditional support roles are not positioned to deliver. Without the SRE, the business faces increased downtime, potential system failures, and a compromised reputation for reliability, directly impacting our bottom line and customer loyalty. As an SRE, you are the frontline defense of our technological sovereignty, playing an indispensable role in maintaining our digital integrity and ensuring business continuity in a landscape where the reliability of our systems is paramount to Etisalat's enduring success.
Report To Position Name
•Collaborate with development and support teams to define and implement best practices for application deployment and management on OpenShift/Kubernetes clusters.
•Be part of a collaborative team focused on internal and external client satisfaction, efficiency, and flawless execution of the OLAs and SLAs we have with other internal teams.
•Act as a bridge between development and operations by applying engineering principles to optimize system reliability, performance, and availability.
•Manage and provision OpenShift/Kubernetes infrastructure resources
•Implement and enforce security measures and best practices, including access control, network policies, and vulnerability management.
•Evaluate new technologies and tools that can improve the OpenShift/Kubernetes and PaaS platform
•Develop and maintain dashboards and other reporting tools to ensure visibility into the health and performance of the PaaS platform.
•Stay up to date with the latest OpenShift/Kubernetes features, releases, and industry trends, and assess their applicability to our environment
•Optimize the performance of the Kubernetes/OpenShift cluster
•Work closely with the DevOps team to integrate container orchestration tools into the DevOps pipeline
•Ensure that data is backed up regularly to prevent the loss of important information in the event of a disaster
•Monitor application performance to ensure optimal use of resources
•Keep servers running smoothly and efficiently by troubleshooting problems as they arise
•Identify opportunities for improving container orchestration
•Deploy and manage applications on OpenShift/Kubernetes platforms.
•Develop and maintain policies and procedures for access management and authentication for the OpenShift/Kubernetes and PaaS platform and associated infrastructure
•Perform on-call duties for managing and supporting the OpenShift/Kubernetes platforms.
•Ensure the reliability, availability, and performance of databases, including both SQL and NoSQL databases, by monitoring, troubleshooting, and optimizing database configurations.
•Implement and maintain strategies for database scalability, including sharding, replication, and caching, to handle growing data and traffic loads.
•Develop and maintain backup and recovery procedures to safeguard data and minimize downtime in the event of data loss or system failures.
•Identify and optimize slow-performing queries, improving the overall database performance.
•Create and maintain documentation, runbooks, and knowledge-sharing materials for system configurations and procedures.
•Scale systems horizontally and vertically to meet increasing demand and traffic.
•Develop and maintain infrastructure as code (IaC) and automation scripts to eliminate manual, error-prone tasks and streamline deployments and configurations.