Role Purpose:
As a Site Reliability Engineer, you will be responsible for ensuring that our platforms and applications run smoothly and that systems work as expected. You monitor application and platform availability and reliability and ensure that the appropriate levels are met. You are a creative and innovative problem solver who can partner with our development teams to make our services and products reliable and to automate manual, repetitive operational activities.
Key Accountabilities and Decision Ownership:
• Use software as a tool to manage systems, solve problems, and automate resolution to achieve zero-touch operations.
• Design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
• Define, create, promote, and monitor SLOs/SLIs with product owners and development teams.
• Write automation code for provisioning and operating infrastructure at massive scale.
• Work with development teams to make sure the applications fit within the infrastructure and that scalability and reliability are designed and implemented from the ground up.
• Work with development teams on building pipelines and automation for delivering and deploying applications to production.
• Participate in the occasional on-call rotation supporting applications/infrastructure.
• Roll up your sleeves to resolve incidents: formulate theories, test your hypotheses, and narrow down possibilities to find the root cause.
• Write post-mortem reviews and remediation recommendations.
• Document and share knowledge, and promote knowledge sharing.
• Interact with internal and external peers and management to share highly complex information related to areas of expertise and/or to gain acceptance of new or enhanced technology/business solutions.
• Be willing to work on-call shifts.
Core Competencies, Knowledge, and Experience:
• Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related subject
• Minimum 2 years of professional software engineering or IT support experience
• Excellent analytical & problem-solving skills
• An open, flexible and adaptable mindset to cope with a rapidly changing set of tasks in an area of emerging, new technologies.
• Experience working in and managing a multicultural, multinational team
• Ability to navigate a complex organization to create simple solutions
Must Have Technical / Professional Qualifications:
• Good knowledge of Google Cloud Platform and its managed services
• Experience in coding with Python, Java, JavaScript, Scala
• Strong experience in Linux/system administration
• Very good knowledge and understanding of Networks, Databases, DNS, proxies & security
• Knowledge of SQL and relational databases, as well as NoSQL databases such as MongoDB
• Knowledge of technologies, frameworks, and environments such as HTML, CSS, TypeScript, React, Angular, Bootstrap, Node.js, GraphQL and REST APIs, and SSL certificates
• Understanding of web application architectures
• Experience with monitoring solutions such as Google Cloud's operations suite (formerly Stackdriver), Datadog, and Dynatrace
• Good knowledge of container management and orchestration using Docker, Kubernetes, RDK
• Good knowledge of “Infrastructure as Code” using Terraform
• Good knowledge of CI/CD tools, mainly Jenkins and GoCD
• Knowledge of automation/configuration management using Ansible, Puppet, or Chef is a plus
• Good knowledge of version control systems such as GitHub
• Knowledge of SAFe, Agile, and related methodologies
• Good knowledge of microservices architecture
• Good knowledge of backup/recovery, disaster recovery, vulnerability and patch management.
• Delivery-focused and obsessed with customer satisfaction
• Strong level of English and global/multicultural awareness
#_VOIS #movewithus