Valleysoft - Egypt

Job Details

Valleysoft | Center of Excellence is a regional IT services provider based in Egypt, serving clients globally since 2006.
The company collaborates with global partners like Oracle to address diverse business and technical challenges, from enterprise application development to process management.
Valleysoft's vendor-neutral and process-oriented approach, coupled with operational maturity, ensures high-quality and cost-effective services for clients.
Job Summary
We are seeking a highly experienced Lead MLOps Architect with deep AWS expertise to lead the design, architecture, and governance of enterprise-grade ML platforms.
This role requires strong leadership capabilities, hands-on expertise in scalable ML systems, and experience managing large production environments.
Key Responsibilities
Architect and lead enterprise-scale MLOps platforms on AWS.
Define best practices for ML lifecycle management, deployment standards, and governance.
Lead production deployment of ML models using AWS-native services.
Design automated CI/CD pipelines for ML workflows and infrastructure.
Implement advanced monitoring, drift detection, retraining automation, and observability.
Ensure high availability, scalability, security, and cost optimization.
Establish model versioning, reproducibility, and experiment tracking standards.
Lead troubleshooting of complex production issues.
Mentor and lead a team of MLOps and platform engineers.
Collaborate with stakeholders to align ML platform strategy with business objectives.
Required Skills & Qualifications

MLOps & Machine Learning
10–12 years of overall experience with a strong focus on ML production systems
Proven experience leading ML platform architecture and large-scale deployments
Deep understanding of ML lifecycle management, governance, and reproducibility
Hands-on experience with TensorFlow, PyTorch, and scikit-learn
Strong experience with MLflow or enterprise model management tools

AWS Cloud (Mandatory)
Advanced hands-on expertise in Amazon SageMaker (training, pipelines, endpoints), S3, EC2, Lambda, ECR, ECS, EKS, IAM, and CloudWatch
Experience designing secure, compliant, and scalable ML architectures
Experience implementing cost optimization strategies on AWS

DevOps, Containers & IaC
Strong expertise in Docker and Kubernetes (EKS)
Advanced CI/CD implementation
Infrastructure as Code using Terraform and/or CloudFormation
Experience implementing GitOps practices

Programming & Data
Expert-level Python skills
Experience designing robust data pipelines
Strong understanding of SQL/NoSQL systems
Exposure to streaming or real-time ML systems

Preferred Qualifications
AWS Professional-level certifications
Experience with ML security, explainability, and regulatory compliance
Experience building enterprise feature stores
Exposure to real-time inference systems
