LOCALS ONLY
The client is looking for a Site Reliability Engineer (with experience working with Adobe applications) who is available to work on-site in one of the following locations: Overland Park, KS; Bellevue, WA; or Atlanta, GA.
The Site Reliability Engineer (SRE) will work alongside engineering and operations teams to ensure the reliability and efficiency of Adobe’s services and systems. You will focus on automating processes, improving infrastructure, and resolving performance and availability issues so that customers have a seamless experience. As an SRE, you will help bridge the gap between software engineering and operations, applying a software engineering mindset to system administration tasks.
Key Responsibilities:
- Work on-site in one of the following locations: Overland Park, KS; Bellevue, WA; or Atlanta, GA.
- Ensure the availability, scalability, and reliability of Adobe’s cloud services and platforms.
- Collaborate with development teams to design, deploy, and maintain robust systems that can scale as demand increases.
- Automate manual tasks and improve efficiency in the deployment, monitoring, and management of systems and infrastructure.
- Develop and maintain monitoring, alerting, and incident management systems to identify and resolve potential issues before they impact customers.
- Participate in the on-call rotation and troubleshoot production issues, ensuring fast recovery and minimal downtime.
- Continuously improve system performance by identifying bottlenecks, optimizing processes, and recommending enhancements.
- Analyze and resolve service outages and capacity issues.
- Develop tools to streamline operational tasks and improve the overall reliability of production systems.
- Work closely with security teams to ensure the integrity and security of systems.
- Contribute to the documentation and best practices for site reliability across teams.
Skills & Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role, working with large-scale distributed systems.
- Strong programming skills in languages such as Python, Go, Java, or similar.
- Solid understanding of cloud technologies (AWS, Azure, Google Cloud, etc.) and containerization (Docker, Kubernetes).
- Hands-on experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, ELK stack, or similar.
- Experience with automation and configuration management tools like Terraform, Ansible, or Chef.
- Strong knowledge of systems administration (Linux/Unix-based environments) and networking concepts.
- Expertise in incident management, including troubleshooting and root cause analysis.
- Ability to work collaboratively in a fast-paced, cross-functional team environment.
- Excellent communication skills, both written and verbal.
Preferred Qualifications:
- Experience with CI/CD pipelines and deployment automation.
- Familiarity with database management systems (e.g., MySQL, PostgreSQL, or NoSQL databases).
- Experience with infrastructure as code (IaC) practices.
- Cloud certifications (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.) are a plus.