On-site Full Time
LIPS Healthcare

Job Details

Role Summary: At LIPS Healthcare, the Site Reliability Engineer (SRE) plays a pivotal role in ensuring that our patient- and public-facing website and internal tools run reliably, securely, and efficiently. The SRE bridges the gap between development and operations, combining software engineering practices with operational excellence. This role is responsible for building, maintaining, and automating systems that enable high availability, scalability, and resilience across digital healthcare platforms.
The SRE also drives application security, operability and observability practices across platforms, ensuring that our services can be delivered with minimal disruption and maximum safety.
Principal Activities/Main Duties and Responsibilities
• Design, build, and maintain CI/CD pipelines ensuring safe, frequent, and zero-downtime deployments.
• Collaborate with software engineers and IT leadership to define availability targets, SLIs, and SLOs for LIPS Healthcare services.
• Implement and maintain observability practices (monitoring, logging, tracing, and alerting).
• Use the Four Golden Signals (latency, traffic, errors, saturation) to monitor system health and reliability.
• Manage cloud infrastructure (Azure) and container orchestration platforms (Kubernetes, Docker).
• Automate infrastructure provisioning using tools such as Terraform or Bash.
• Ensure operability standards: high availability, disaster recovery, and incident response readiness.
• Actively participate in on-call rotations to support production services, following a You Build It, You Run It model.
• Lead incident reviews and post-mortems, embedding lessons learned into system improvements.
• Mentor IT and engineering staff in automation, reliability, and operational best practices.
• Administer and troubleshoot networking systems, ensuring secure and reliable connectivity for internal tools and patient-facing platforms.

Education and Training
• Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
• Proven experience as an SRE, DevOps Engineer, or Systems Engineer in complex environments.
• Certification in cloud platforms (AWS, Azure, or GCP), Kubernetes, or networking.

People Management
• Provide technical leadership and mentoring within cross-functional teams.

Primary Contacts
• Head of IT and CTO
• Software development teams
• Business stakeholders and product owners
• External vendors and service providers

Key Result Areas
Competency – Key Behaviors

Management
• Implements and monitors availability targets aligned to business needs.
• Anticipates risks in system design and proposes mitigation strategies.
• Skilled in project planning, automation, and operational monitoring.

Leadership
• Encourages a reliability-first culture of operability.
• Fosters collaboration between developers, operations, and business stakeholders.
• Leads incident reviews with transparency and accountability.

Relationship Building
• Builds strong partnerships across IT, clinical teams, and external vendors.
• Shares knowledge to upskill teams in modern DevOps and SRE practices.

Business Acumen & Enterprise Knowledge
• Aligns reliability work with LIPS Healthcare’s mission of safe, continuous patient care.
• Balances system availability with delivery speed and cost-effectiveness.

Change Advocacy
• Champions automation and observability as enablers of healthcare transformation.
• Promotes continuous improvement through adoption of SRE practices.

Influencing
• Communicates complex technical concepts clearly to both technical and non-technical stakeholders.
• Gains buy-in for reliability initiatives across business and IT leadership.

Results Orientation
• Delivers measurable improvements in system uptime, performance, and incident response.
• Ensures deployments are frequent, safe, and reliable.

Health, Safety and Security
• Ensure system reliability aligns with healthcare safety standards.
• Protect sensitive patient data in accordance with data protection regulations.
• Promote secure coding, networking, monitoring, and operational practices.

Service Improvement
• Continuously improve deployment pipelines, infrastructure, and observability tooling.
• Evaluate new tools and platforms that enhance system resilience and performance.
• Provide input into IT transformation initiatives.

Quality
• Uphold quality initiatives to improve the reliability of clinical and non-clinical systems.
• Support audits and compliance activities for IT systems.
• Monitor and review reliability metrics against agreed SLIs/SLOs.

Other
• Contribute to the review and update of IT operations policies and procedures.
• Collaborate with other departments to enhance workflows and integrations.
• Perform any other duties commensurate with the role, as requested by IT leadership.

Working Conditions
• Working Days: Monday to Friday (Saturday and Sunday off).
• Working Hours: 11:00 AM to 7:00 PM.
• Work Location: Heliopolis, Cairo.

About LIPS Healthcare
Egypt, Cairo
Hospital & Health Care