To design and maintain scalable, AI-driven compute clusters to support enterprise workloads and data centers with high efficiency and performance.
To design and operate fault-tolerant data center infrastructure, ensuring high availability, redundancy, and failover strategies for mission-critical systems.
To implement and manage MLOps frameworks and Site Reliability Engineering (SRE) principles to streamline AI model lifecycle, ensure operational efficiency, and maintain service reliability.
To automate infrastructure provisioning using Infrastructure-as-Code (IaC), private cloud platforms, and containerized environments to enhance agility and reduce manual effort.
To leverage software-defined infrastructure (SDN, SDS, SDC) for dynamic resource allocation, scalability, and workload optimization.
To manage and optimize virtualized environments, including private cloud, operating systems, and container orchestration platforms across the enterprise.
To oversee security, compliance, and backup operations to ensure enterprise-grade data protection, regulatory adherence, and disaster recovery readiness.
To lead managed services initiatives, including database infrastructure and internal services, to enhance performance, uptime, and cost-efficiency.
To collaborate with Infrastructure Planning and Delivery functions to ensure seamless solution implementation and handover to operational teams.
To provide input to enterprise architecture and future system capacity planning, supporting data center expansion strategies and business continuity.
To manage the operation and SLAs of Data Center services, including collaboration with the Network Group and CT teams to ensure network stability and performance.
To ensure alignment with core data center strategies and architectural standards and oversee the execution of related IT infrastructure projects in assigned regions.
To monitor data center OPEX/CAPEX where applicable, prepare business cases, and manage outsourced vendor contracts, ensuring cost-effectiveness and adherence to quality standards.
To act as the escalation point for regional DC operations, remaining on-call 24/7 for critical incidents and service disruptions.
To review the post-implementation performance of live systems and provide weekly reports to senior management on operational metrics and project progress.
To liaise with business stakeholders in assigned regions to capture infrastructure requirements and deliver tailored IT services accordingly.
To ensure timely and compliant delivery of procured equipment, verifying quality and alignment with purchase orders and technical specifications.
To stay updated on technology trends and business process changes, adapt operations, and guide team members and partners accordingly.
To balance complex technological interdependencies and manage competing demands to deliver optimal 24/7 service to internal customers.
To plan infrastructure for future needs, taking into account the latest developments such as containerization and AI automation tools.
To ensure enterprise-grade security for data centers, including access control, hardening and encryption.
To manage the complex interactions among the many technologies and services in use.
Requirements:
Education
• Bachelor's degree in Information Technology or a related field.
Experience
• At least 5 years of experience in data center operations, including experience supervising or managing others.
• Experience working in a medium-sized or enterprise organization.