SRE (Site Reliability Engineer)

Location: Santa Clara, CA

Job Description

Direct line phone number required

LinkedIn profile required

Data Center SRE - Santa Clara (100% onsite) - 15491




U.S. Citizenship Status:  U.S. Citizen; Green Card

Position/Sheet Notes:   100% onsite in Santa Clara, CA


Job Description:

•             Programming and scripting knowledge in Python and shell.

•             Strong understanding of, and hands-on expertise with, Linux and Windows operating systems.

•             Strong troubleshooting and debugging skills.

•             Experience with SQL.

•             Experience with monitoring and observability stacks, e.g., Zabbix, Katana, Elasticsearch, Prometheus, Grafana.

•             Prior experience working with hardware devices such as Tegra boards (e.g., Jetson) is a plus.


•             Manage machine configuration using a configuration management tool, e.g., Puppet.

•             Manage machine status, pool, and OS using automation or the UI.

•             Prepare targets for physical migration.

•             Validate targets after physical migration by submitting tests to each target individually.

•             Debug test failures and identify whether they are related to the migration, a bad target, or an ongoing failure on ToT.

•             Enable the targets for production workload.

•             Monitor job success on targets running production workloads.

•             Collaborate and coordinate closely with platform and lab technicians on the migration.

•             Maintain and update the migration status and publish it weekly.