Job Description
Title: Data Center SRE
Requirements
- Programming and scripting knowledge in Python and shell.
- Strong understanding of, and hands-on expertise with, Linux and Windows operating systems.
- Strong troubleshooting and debugging skills.
- Experience with SQL.
- Experience with monitoring and observability stacks, e.g. Zabbix, Katana, Elasticsearch, Prometheus, Grafana.
- Prior experience working with hardware devices such as Tegra boards (e.g. Jetson) is a plus.
Tasks
- Manage machine configuration using a configuration management tool, e.g. Puppet.
- Manage machine status, pool, and OS using automation or the UI.
- Prepare the targets for physical migration.
- Validate the targets post physical migration by submitting tests to individual targets.
- Debug test failures and identify whether they are caused by the migration, a bad target, or an ongoing failure on ToT.
- Enable the targets for production workloads.
- Monitor job success on targets running production workloads.
- Collaborate and coordinate closely with platform and lab techs on the migration.
- Maintain and update the migration status and publish it weekly.