Shivam Shukla

PROFILE SUMMARY

Seasoned Data Architect with 10+ years of experience designing and optimizing large-scale Azure Data Lakehouse solutions using Databricks, Delta Lake, and Apache Spark. Expert in cloud infrastructure, DevOps, MLOps, and LLM-driven analytics platforms, with a proven record of leading scalable data and AI/ML transformations.

PROFESSIONAL EXPERIENCE

Lead Software Engineer - Data & ML (Senior Manager)

FCS Software Solution | Jun 2022 - Present
  • Architecting and managing large-scale data processing pipelines on Azure Databricks, integrating with Azure Data Lake Gen2, Delta Lake, and Unity Catalog for secure, governed, and scalable data management.
  • Designing, developing, and optimizing Delta Live Tables (DLT) pipelines in Databricks using Python, PySpark, and SQL for real-time and batch data transformation across FinOps datasets.
  • Building reusable CDC-based DLT Python modules integrated with Terraform and GitOps workflows, reducing data pipeline creation time by over 50% and enabling standardized deployment across multiple environments.
  • Migrating legacy pipelines from Apache Hive and traditional Spark clusters to Databricks Unity Catalog, enabling better data governance, fine-grained access control, and Delta Sharing across business units and partners.
  • Implementing Databricks MLflow to track model experiments, manage lifecycle stages, and streamline the deployment of machine learning models to production via Azure ML and AKS.
  • Porting ML workloads, including cost-forecasting and anomaly-detection models, from Azure Kubernetes Service (AKS) to Databricks ML (Spark MLlib), reducing model training time by over 60% through distributed training on Databricks clusters.
  • Integrating data from Snowflake, AWS S3, and Salesforce into Databricks Delta Lake using DLT, Auto Loader, and Lakeflow Connect, enabling unified analytics across cloud and enterprise systems.

Senior Databricks Engineer

Publicis Media | Aug 2021 - Jun 2022
  • Led the development and implementation of a Customer Data Platform (CDP) for a leading CPG client by aggregating and integrating customer data from sources including LiveRamp, Gigya, and Kafka, resulting in enhanced customer segmentation and improved RFM (Recency, Frequency, Monetary) analysis.
  • Architected and optimized Change Data Capture (CDC) pipelines using Debezium and Databricks within an Azure Data Lakehouse environment, achieving a 30% reduction in data pipeline latency.
  • Designed and maintained scalable Databricks workflows for large-scale data processing, ensuring seamless orchestration of ETL tasks across batch and streaming data.
  • Managed and configured Databricks Workspaces, clusters, and job scheduling within Azure to support high-availability analytics workloads and efficient resource utilization.
  • Built a Marketing Mix Modeling (MMM) solution leveraging ML algorithms on Databricks, enabling attribution analysis across multiple digital advertising platforms and improving marketing reach by 10%.
  • Developed containerized microservices to support modular data transformation components, leveraging Azure Kubernetes Service (AKS) for orchestration and deployment.
  • Automated data workflows and pipeline monitoring scripts using Python to improve observability, data quality checks, and job recovery processes.
  • Implemented best practices for security, cost management, and scalability across the Azure services supporting Databricks, including Azure Data Lake Storage (ADLS), Azure Key Vault, and Azure Monitor.
  • Collaborated with data scientists and business stakeholders to ensure alignment between pipeline outputs and analytical models, supporting rapid iteration and model deployment.

Senior Data Engineer

Coforge Tech Ltd. | Jul 2019 - Aug 2021
  • Designed and implemented a scalable data and ML pipeline framework on Azure using Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Functions, Synapse Analytics, and Databricks AutoML for rapid model prototyping.
  • Migrated complex on-premises ETL and ML workflows to cloud-native architectures on Azure and Google Cloud Platform (GCP), reducing infrastructure overhead and improving operational efficiency.
  • Developed and managed Databricks Pipelines for large-scale batch and streaming data processing within an Azure Data Lakehouse architecture, enabling advanced analytics and ML workflows across high-volume datasets.
  • Provisioned and maintained Azure Databricks Workspaces, configured cluster policies, and ensured optimal resource utilization across collaborative data science and engineering teams.
  • Delivered multiple proof-of-concepts (POCs) integrating Logic Apps, Azure Machine Learning, and Databricks, demonstrating low-code orchestration with automated model deployment and monitoring.
  • Implemented Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) for microservices-based architecture, supporting containerized model inference services integrated into the pipeline.
  • Automated key pipeline components and infrastructure provisioning using Python scripts, improving reproducibility and reducing deployment time.
  • Managed version control and CI/CD workflows for Databricks Notebooks using Git integration and Azure DevOps, streamlining collaboration and release processes.
  • Enforced data security and governance practices using Unity Catalog, workspace access controls, and secure networking configurations within Azure.

Senior Associate

Nagarro | Dec 2018 - Jul 2019
  • Integrated legacy AS400 systems with Azure Data Lake, enabling seamless data ingestion for downstream analytics in the Azure ecosystem.
  • Designed and implemented Databricks Pipelines to process high-volume datasets within an Azure Data Lakehouse environment, optimizing data transformation workflows.
  • Built and deployed a Convolutional Neural Network (CNN) model to estimate car damage repair costs for insurance automation, leveraging Python and scalable compute clusters on Databricks.
  • Managed and configured Azure Databricks Workspaces, including cluster provisioning, notebook orchestration, user access control, and performance tuning.
  • Migrated legacy email systems to a cloud-native AWS solution using Route 53, AWS Lambda, SES, and SNS, enabling automated communications across microservices.
  • Administered Azure infrastructure components, including Azure Container Instances and Azure Kubernetes Service (AKS), for hosting containerized microservices.
  • Implemented CI/CD workflows for Databricks notebooks using Databricks Repos and integrated version control with Git.

Programmer

Fidelity International | Nov 2017 - Dec 2018
  • Supported the GFAS record-keeping platform, catering to mutual fund clients across Europe, with a focus on data integrity and compliance within an Azure-hosted architecture.
  • Contributed to GDPR compliance by developing solutions for “Right to be forgotten” and AML (Anti-Money Laundering) use cases, using Databricks for data tracing and erasure.
  • Managed and maintained Azure Databricks Workspaces, ensuring optimal cluster configurations and job orchestration for large-scale data operations.
  • Utilized Azure Container Instances and Kubernetes Services for microservices deployment and orchestration in a modular architecture.
  • Developed reusable Python scripts to automate data validation, transformation, and anonymization processes within the Databricks environment.

System Engineer

Tata Consultancy Services | Oct 2015 - Nov 2017
  • Built Marketing Mix Models (MMM) using Python, SQL, and Spark to measure and optimize the impact of marketing channels on sales performance across multiple campaigns.
  • Developed data pipelines in PySpark for ingesting, cleansing, and aggregating large volumes of marketing and sales data from various sources including CRM, ad platforms, and third-party datasets.
  • Designed and validated regression-based and time-series-based MMM frameworks to support ROI analysis and budget allocation.
  • Migrated legacy AS400 systems to modern Python-based applications, improving performance, maintainability, and integration with modern APIs.
  • Re-architected batch processing jobs from AS400 to Spark-based distributed processing workflows, significantly reducing runtime and increasing fault tolerance.
Databricks DevOps Engineer (Azure)

Experience: 10+ Years

TECHNICAL SKILLS

Python, PySpark, SQL, Spark, Microsoft Azure, AWS, Google Cloud Platform (GCP), Databricks, Azure Data Factory, Azure ML, MS Fabric

EDUCATION

MBA/PGDM (NIIT University) - 2020

CERTIFICATIONS

  • Azure Data Engineer (DP-200, DP-201)
  • Data Scientist Associate (DP-100)
  • AI Engineer (AI-100)
  • Fundamentals: DP-900, AI-900, AZ-900