Designed and implemented scalable data pipelines using Azure Data Factory and Azure Databricks (Python, PySpark), optimizing ETL processes to handle large datasets from diverse sources and improving data processing times by 30%.
Developed and automated end-to-end data workflows leveraging Azure Data Factory, enabling seamless data ingestion, transformation, and storage in Azure Data Lake with 99.9% availability and reliability.
Built complex ETL solutions using PySpark in Azure Databricks, improving data aggregation and transformation performance and increasing processing efficiency by 40% (illustrative PySpark aggregation sketch after this list).
Led the migration of on-premises data pipelines to the Azure cloud using Azure Data Factory and Databricks, reducing infrastructure costs by 25%.
Implemented CI/CD pipelines in Azure DevOps for continuous integration and deployment of Azure Databricks notebooks and data pipeline code, reducing deployment time by 50% and improving code quality with automated testing.
Designed and managed Kafka topics and partitions to ensure fault-tolerant and scalable data streams, improving the reliability and performance of data flows across distributed systems (illustrative topic-configuration sketch after this list).
Developed real-time data streaming pipelines using Azure Event Hubs to ingest large-scale event data from various sources, enabling real-time analytics and event-driven architectures (illustrative consumer sketch after this list).
Collaborated with cross-functional teams to design and maintain data lake architectures using Azure Data Lake and Databricks, ensuring data governance, security, and compliance across multiple environments.
Optimized data transformation logic using PySpark in Databricks and leveraged Azure autoscaling capabilities to cut compute costs by 20% (illustrative cluster-autoscale sketch after this list).
Developed monitoring and alerting systems using Azure Monitor and Log Analytics for proactive identification of data pipeline failures, reducing downtime by 15% through faster incident resolution (illustrative Log Analytics query sketch after this list).
Managed and orchestrated data workflows using Azure Data Factory, ensuring smooth integration of multiple data sources such as SQL Server, Blob Storage, and REST APIs.
Performed data validation and quality checks using PySpark scripts in Azure Databricks, ensuring high accuracy and consistency in data processing and reporting pipelines (illustrative data-quality sketch after this list).
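
Illustrative PySpark aggregation sketch (for the Azure Databricks ETL bullet above). This is a minimal example of the kind of aggregation/transformation step described, not the actual project code; the paths, table layout, and column names (sales_raw, region, amount, order_ts) are assumed placeholders.

    # Illustrative PySpark aggregation step; paths and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl-aggregation-sketch").getOrCreate()

    # Read raw data from a (placeholder) Delta location in the data lake.
    raw = spark.read.format("delta").load("/mnt/datalake/raw/sales_raw")

    # Clean and aggregate: drop bad rows, then roll up amounts per region and day.
    aggregated = (
        raw.filter(F.col("amount").isNotNull())
           .withColumn("order_date", F.to_date("order_ts"))
           .groupBy("region", "order_date")
           .agg(F.sum("amount").alias("total_amount"),
                F.countDistinct("order_id").alias("order_count"))
    )

    # Write the curated result back to the lake, partitioned for downstream reads.
    (aggregated.write.format("delta").mode("overwrite")
        .partitionBy("order_date").save("/mnt/datalake/curated/sales_daily"))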
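
Illustrative Kafka topic-configuration sketch (for the Kafka bullet above). A minimal example using the kafka-python admin client; the broker address, topic name, partition count, replication factor, and retention setting are assumed placeholder values.

    # Illustrative Kafka topic setup with kafka-python; all values are placeholders.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="broker1:9092")

    # More partitions -> more consumer parallelism; replication factor > 1 -> fault tolerance.
    topic = NewTopic(
        name="events.orders",
        num_partitions=12,
        replication_factor=3,
        topic_configs={"retention.ms": "604800000"},  # keep roughly 7 days of events
    )

    admin.create_topics(new_topics=[topic])
    admin.close()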
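
Illustrative Azure Event Hubs consumer sketch (for the real-time streaming bullet above). A minimal example using the azure-eventhub Python SDK; the connection string, consumer group, and hub name are placeholders, and the handler simply prints each event rather than feeding a real analytics sink.

    # Illustrative Azure Event Hubs consumer; connection string and hub name are placeholders.
    from azure.eventhub import EventHubConsumerClient

    def on_event(partition_context, event):
        # Process one event (in practice: parse, enrich, and push to a downstream sink).
        print(partition_context.partition_id, event.body_as_str())
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        conn_str="<EVENT_HUB_CONNECTION_STRING>",
        consumer_group="$Default",
        eventhub_name="telemetry",
    )

    with client:
        # Read from the start of each partition; blocks until interrupted.
        client.receive(on_event=on_event, starting_position="-1")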
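
Illustrative cluster-autoscale sketch (for the cost-optimization bullet above). A minimal example of a Databricks cluster spec with autoscaling enabled, of the kind that can be passed as the new_cluster block to the Databricks Jobs REST API or referenced from an ADF Databricks activity; the Spark version, node type, and worker counts are assumed values.

    # Illustrative Databricks cluster spec with autoscaling; all values are assumptions.
    # A spec like this can be supplied as the "new_cluster" block when creating a job
    # through the Databricks Jobs REST API or an ADF Databricks activity.
    cluster_spec = {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "autoscale": {
            "min_workers": 2,   # scale in when transformation load is light
            "max_workers": 8,   # cap compute spend during peak loads
        },
    }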
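
Illustrative Log Analytics query sketch (for the monitoring and alerting bullet above). A minimal example using the azure-monitor-query SDK to pull failed Azure Data Factory pipeline runs; the workspace ID, the KQL table (ADFPipelineRun, available when ADF diagnostic logs are routed to Log Analytics), and the time window are assumptions, and a real setup would raise an alert rather than print.

    # Illustrative Log Analytics query for failed ADF pipeline runs; the workspace ID
    # and time window are placeholders, not a record of the actual monitoring setup.
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())

    query = """
    ADFPipelineRun
    | where Status == 'Failed'
    | summarize failures = count() by PipelineName
    """

    response = client.query_workspace(
        workspace_id="<LOG_ANALYTICS_WORKSPACE_ID>",
        query=query,
        timespan=timedelta(hours=24),
    )

    # Surface failing pipelines so an alert or notification can be raised downstream.
    for table in response.tables:
        for row in table.rows:
            print(row)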
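
Illustrative PySpark data-quality sketch (for the data validation bullet above). A minimal example of rule-based checks that fail the run when violations are found; the dataset path, key columns, and rules are hypothetical.

    # Illustrative PySpark data-quality checks; path, columns, and rules are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
    df = spark.read.format("delta").load("/mnt/datalake/curated/sales_daily")

    total = df.count()

    checks = {
        # Key columns must not be null.
        "null_keys": df.filter(F.col("region").isNull() | F.col("order_date").isNull()).count(),
        # Aggregated amounts must be non-negative.
        "negative_amounts": df.filter(F.col("total_amount") < 0).count(),
        # No duplicate (region, order_date) grains.
        "duplicates": total - df.dropDuplicates(["region", "order_date"]).count(),
    }

    failed = {name: count for name, count in checks.items() if count > 0}
    if failed:
        # Fail fast so downstream reporting never consumes bad data.
        raise ValueError(f"Data quality checks failed: {failed}")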