Job Title: Senior Infrastructure Engineer
Location: 665 Clyde Ave., Mountain View, CA 94043 (Onsite)
Job Overview:
We are looking for a highly skilled Senior Infrastructure Engineer to lead the design, implementation, maintenance, and optimization of our infrastructure systems. The role covers a range of platforms, from data centers and virtualized environments to GPU and storage platforms. It requires hands-on expertise in automation, scripting, and monitoring to ensure high availability and security across both on-premises and cloud infrastructure. As a Senior Infrastructure Engineer, you will work both independently and collaboratively, using your strong technical and communication skills to keep infrastructure operations running smoothly.
Key Responsibilities:
- Infrastructure Design & Optimization: Architect, deploy, and maintain infrastructure systems spanning data centers, server hardware, OS (Linux and Windows), virtualization, storage, and GPU clusters.
- Automation & Monitoring: Use tools such as Ansible, Terraform, and Packer for automation; implement robust monitoring and logging to ensure high availability and performance.
- System Operations & Troubleshooting: Develop SOPs for system setup, configuration, maintenance, and troubleshooting; proactively identify performance bottlenecks and optimization opportunities.
- Collaboration & Integration: Work closely with cross-functional teams to integrate infrastructure solutions with applications and services.
- Technology Research & Upgrades: Stay up to date with emerging technologies and recommend upgrades and enhancements to maintain a competitive edge.
- Mentorship & Technical Guidance: Provide mentorship and guidance to junior team members, sharing best practices and troubleshooting expertise.
- On-Call Rotation: Participate in on-call support for urgent infrastructure issues outside regular hours.
Qualifications:
- Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Experience: Minimum of 8 years in infrastructure engineering, with required expertise in:
  - Data Center Operations & System Administration: Proficient in server hardware and Linux/Windows system administration.
  - Virtualization Technologies: Hands-on experience with VMware, KVM, or Hyper-V.
  - Storage & Backup Solutions: Familiarity with enterprise storage systems such as NetApp and backup tools such as Veeam.
  - Automation & Scripting: Proficiency with Ansible, Terraform, and Packer; scripting in Python, Bash, or PowerShell is essential.
  - GPU Clusters & Containers: Experience with GPU management and container orchestration (Kubernetes).
  - Monitoring & Logging Tools: Skilled in implementing and managing comprehensive monitoring and logging systems.
  - Cloud Infrastructure Management: Experience provisioning and managing cloud platforms (AWS, Azure; GCP is a plus).
  - Networking Knowledge: Solid understanding of networking principles, including TCP/IP, DNS, DHCP, VPN, and firewall configurations.
- Problem-Solving & Communication: Excellent troubleshooting, analytical, and communication skills to work effectively across technical and non-technical teams.
- Time Management: Ability to manage multiple projects, meet deadlines, and deliver high-quality results.
- Self-Motivation: Strong sense of ownership and initiative, with a proactive approach to driving projects to completion.