Job Description
The Infrastructure Architect at Group 42 designs and improves scalable infrastructure for advanced AI platforms and enterprise operations. The architect creates cloud and on-premises systems, together with networking and storage solutions, that meet security requirements while delivering high availability and performance. The architect works with engineering, DevOps, and product teams to develop solutions that support AI training, inference, and large-scale data processing.
Job ID: 2299
Date Posted: NA
Expiration Date: NA
Main Duties
- Architect and optimize infrastructure for AI platforms, ensuring scalability, security, and high performance across cloud and on-premises environments.
- Design high availability and disaster recovery solutions to maintain resilience of mission-critical AI systems.
- Collaborate with engineering, DevOps, and product teams to enhance infrastructure for AI/ML workloads, networking, and storage.
- Implement automation and Infrastructure as Code (IaC) using tools like Terraform, Ansible, and Kubernetes for streamlined deployment and management.
- Evaluate and integrate emerging technologies to improve cloud efficiency, storage, networking, and overall infrastructure capabilities.
Essential Qualifications
- 12 years of experience in infrastructure architecture, cloud platforms, and systems engineering for large-scale, mission-critical systems.
- Demonstrated expertise with AWS, Azure, and GCP on AI/ML projects that required both scalability and resilience.
- Advanced expertise in cloud computing, networking, storage systems, and IaC tools including Terraform, Ansible, and CloudFormation.
- Expertise in using Docker and Kubernetes to deploy and manage AI/ML workloads.
- Knowledge of networking protocols, VPNs, firewalls, and security best practices for hybrid and cloud environments.
Preferred Qualifications
- Experience with distributed storage systems and high-performance computing systems that support AI and machine learning applications.
- Experience creating disaster recovery plans and business continuity strategies, including designing systems for continuous operation.
- Ability to collaborate with DevOps engineers, software developers, and security personnel to achieve infrastructure objectives.
- Familiarity with AI-powered orchestration tools and autonomous systems for optimizing business processes while maintaining security requirements.
- Understanding of current trends and emerging infrastructure technologies that support the development of advanced AI systems.