- Install, configure, and manage OpenStack services (Nova, Neutron, Cinder, Swift, Keystone) and Red Hat platforms.
- Implement automation for provisioning compute, storage, and networking resources.
- Performance Optimization
- Benchmark and tune HPC workloads for maximum efficiency.
- Leverage GPU/accelerator integration and parallel computing frameworks (MPI, CUDA, OpenCL).
- Security & Compliance
- Ensure secure multi‑tenant HPC environments.
- Apply role‑based access control, encryption, and compliance standards.
- Monitoring & Troubleshooting
- Implement monitoring tools (Prometheus, Grafana, Nagios).
- Diagnose and resolve performance bottlenecks and system failures.
- Collaboration & Support
- Work closely with researchers, developers, and IT teams to support HPC workloads.
- Provide documentation, training, and technical guidance.
- HPC Cluster Management
- Architect, deploy, and maintain HPC clusters on OpenStack, including Day 2 operations.
- Optimize scheduling, resource allocation, and workload balancing.
- OpenStack Administration
- Strong knowledge of HPC concepts, including distributed file systems (Lustre, GPFS); experience with job schedulers (Slurm, PBS, Torque) is a must.
- Deep understanding of OpenStack architecture and APIs.
- Proficiency in Linux system administration (RHEL, CentOS, Ubuntu).
- Knowledge of high‑speed interconnects (InfiniBand, RDMA).
- Familiarity with storage solutions (Ceph, NFS, Swift).
- Experience with infrastructure‑as‑code (Terraform, Ansible, Heat templates).
- Scripting skills in Python, Bash, or Ansible.