Title: NVIDIA DGX Infrastructure Engineer
We are looking for a skilled NVIDIA DGX Infrastructure Engineer to join our team. In this role, you will manage our NVIDIA DGX-based infrastructure and play a crucial part in ensuring its performance, reliability, and scalability.
Key Responsibilities:
- Managing NVIDIA DGX systems and related infrastructure.
- Configuring and optimizing DGX clusters for performance, reliability, and scalability.
- Collaborating with data scientists, AI engineers, and IT teams to integrate DGX systems into AI and deep learning workflows.
- Monitoring system performance and implementing proactive measures to maintain optimal operation.
- Troubleshooting and resolving issues related to DGX systems, including hardware, software, and network components.
- Implementing security measures and best practices to protect the integrity and confidentiality of data and workflows running on DGX systems.
- Documenting infrastructure configurations, processes, and procedures.
- Providing technical guidance and training to team members on DGX-related technologies and best practices.
- Staying current with NVIDIA DGX hardware and software advancements and recommending upgrades or enhancements as needed.