Ram Kumar

Senior Data Engineer
Almere

Summary

Experienced Data Engineer with over 6 years of expertise in designing, developing, and implementing data solutions using Databricks. Possesses 18+ years of overall IT experience, with a strong background in data engineering, data quality, and cloud infrastructure. Adept at leveraging Databricks, Azure Data Factory, and Azure DevOps to build scalable, reliable, and efficient data pipelines. Proven track record in driving cost efficiencies and optimizing performance within the financial services industry.

Overview

18 years of professional experience
4 certifications

Work History

Cognizant Technology Solutions (Swedbank)

Senior Data Engineer
05.2022 - Current

Designed and implemented scalable data pipelines using Azure Data Factory and Azure Databricks (Python, PySpark), optimizing ETL processes to handle large datasets from diverse sources, improving data processing times by 30%.

Developed and automated end-to-end data workflows leveraging Azure Data Factory, enabling seamless data ingestion, transformation, and storage in Azure Data Lake, ensuring 99.9% availability and reliability.

Built complex ETL solutions using PySpark in Azure Databricks, improving data aggregation and transformation performance, leading to a 40% increase in processing efficiency.

Led migration efforts for on-premises data pipelines to Azure cloud environment, utilizing Azure Data Factory and Databricks, reducing infrastructure costs by 25%.

Implemented CI/CD pipelines in Azure DevOps for continuous integration and deployment of Azure Databricks notebooks and data pipeline code, reducing deployment time by 50% and improving code quality with automated testing.

Designed and managed Kafka topics and partitions to ensure fault-tolerant and scalable data streams, improving the reliability and performance of data flows across distributed systems.

Developed real-time data streaming pipelines using Azure Event Hub to ingest large-scale event data from various sources, enabling real-time analytics and event-driven architectures.

Collaborated with cross-functional teams to design and maintain data lake architectures using Azure Data Lake and Databricks, ensuring data governance, security, and compliance across multiple environments.

Optimized data transformation logic using PySpark in Databricks to reduce data processing costs by leveraging Azure Auto Scaling capabilities, reducing compute costs by 20%.

Developed monitoring and alerting systems using Azure Monitor and Log Analytics for proactive identification of data pipeline failures, reducing downtime by 15% through faster incident resolution.

Managed and orchestrated data workflows using Azure Data Factory, ensuring smooth integration of multiple data sources such as SQL Server, Blob Storage, and REST APIs.

Performed data validation and quality checks using PySpark scripts in Azure Databricks, ensuring high accuracy and consistency in data processing and reporting pipelines.

Cognizant Technology Solutions (Rabobank)

Senior Data Engineer
02.2021 - 05.2022

Developed and implemented a robust Data Quality framework to ensure data integrity, consistency, and accuracy across various data pipelines, automating validation processes to detect anomalies and improve data reliability.

Designed and built data pipelines in Azure Data Factory (ADF) for efficient ETL/ELT processes, orchestrating workflows to ingest, transform, and load large datasets from multiple sources into cloud-based storage and analytics platforms.

Developed and automated CI/CD pipelines using Azure DevOps, streamlining the deployment of data pipelines, reducing manual intervention, and enhancing the speed and consistency of deployments.

Ensured continuous data delivery by integrating Azure Data Factory with other services such as Databricks, Azure SQL, and Synapse Analytics, optimizing performance and reducing data processing times.

Collaborated with cross-functional teams to integrate data from diverse sources, improving data accessibility and scalability for downstream analytics and reporting.

Tata Consultancy Services (ABN AMRO)

Senior Data Engineer
10.2020 - 01.2021

Designed and implemented scalable data pipelines using Azure Data Factory and Azure Databricks (Python, PySpark), optimizing ETL processes to handle large datasets from diverse sources.

Built complex ETL solutions using PySpark in Azure Databricks, improving data aggregation and transformation performance, leading to a 40% increase in processing efficiency.

Led migration efforts for on-premises data pipelines to the Azure cloud environment, utilizing Azure Data Factory and Databricks, reducing infrastructure costs by 25%.

Implemented CI/CD pipelines in Azure DevOps for continuous integration and deployment of Azure Databricks notebooks and data pipeline code, reducing deployment time by 50% and improving code quality with automated testing.

Developed real-time data streaming pipelines using Azure Event Hub to ingest large-scale event data from various sources, enabling real-time analytics and event-driven architectures.

Designed and managed Kafka topics and partitions to ensure fault-tolerant and scalable data streams, improving the reliability and performance of data flows across distributed systems.

Integrated Kafka with data processing frameworks such as Apache Spark and Databricks to enable real-time analytics and event-driven architectures.

Collaborated with cross-functional teams to design and maintain data lake architectures using Azure Data Lake and Databricks, ensuring data governance, security, and compliance across multiple environments.

Optimized data transformation logic using PySpark in Databricks to reduce data processing costs by leveraging Azure Auto Scaling capabilities, reducing compute costs by 20%.

Developed monitoring and alerting systems using Azure Monitor and Log Analytics for proactive identification of data pipeline failures, reducing downtime by 15% through faster incident resolution.

Managed and orchestrated data workflows using Azure Data Factory, ensuring smooth integration of multiple data sources such as SQL Server, Blob Storage, and REST APIs.

Performed data validation and quality checks using PySpark scripts in Azure Databricks, ensuring high accuracy and consistency in data processing and reporting pipelines.

Tata Consultancy Services (Lloyds Bank)

Senior Data Engineer
10.2017 - 10.2020

Designed and implemented scalable data pipelines using Azure Data Factory and Azure Databricks (Python, PySpark), optimizing ETL processes to handle large datasets from diverse sources, improving data processing times by 30%.

Developed and automated end-to-end data workflows leveraging Azure Data Factory, enabling seamless data ingestion, transformation, and storage in Azure Data Lake, ensuring 99.9% availability and reliability.

Built complex ETL solutions using PySpark in Azure Databricks, improving data aggregation and transformation performance, leading to a 40% increase in processing efficiency.

Led migration efforts for on-premises data pipelines to the Azure cloud environment, utilizing Azure Data Factory and Databricks, reducing infrastructure costs by 25%.

Implemented CI/CD pipelines in Azure DevOps for continuous integration and deployment of Azure Databricks notebooks and data pipeline code, reducing deployment time by 50% and improving code quality with automated testing.

Collaborated with cross-functional teams to design and maintain data lake architectures using Azure Data Lake and Databricks, ensuring data governance, security, and compliance across multiple environments.

Optimized data transformation logic using PySpark in Databricks to reduce data processing costs by leveraging Azure Auto Scaling capabilities, reducing compute costs by 20%.

Developed monitoring and alerting systems using Azure Monitor and Log Analytics for proactive identification of data pipeline failures, reducing downtime by 15% through faster incident resolution.

Managed and orchestrated data workflows using Azure Data Factory, ensuring smooth integration of multiple data sources such as SQL Server, Blob Storage, and REST APIs.

Performed data validation and quality checks using PySpark scripts in Azure Databricks, ensuring high accuracy and consistency in data processing and reporting pipelines.

Tata Consultancy Services (Virgin Media)

Data Engineer
07.2014 - 10.2017

Developed, maintained, and automated scalable ETL/ELT data pipelines.

Sourced, processed, validated, transformed, aggregated, and distributed data from 10+ sources.

Optimized and automated non-performant database queries, complex processes, manual activities, and pipelines.

Ensured timely access to data for 100% of applications.

Developed monitoring and alerting capabilities to ensure 100% of data pipelines were working.

Barclays Technology Center India Ltd

System Analyst
07.2010 - 07.2014

Developed and maintained ETL pipelines using Ab Initio, designing and implementing complex data integration processes to extract, transform, and load data from various sources, ensuring data accuracy and consistency.

Utilized Tivoli Workload Scheduler (TWS) to manage and automate job scheduling, optimizing workload management and ensuring timely execution of batch processes and data workflows.

Automated manual tasks through Unix scripting, creating shell scripts to streamline data processing, system monitoring, and routine maintenance tasks, resulting in increased operational efficiency and reduced manual errors.

Collaborated with stakeholders to gather requirements and translate them into technical specifications, ensuring alignment between business needs and technical solutions.

Implemented performance tuning and optimization strategies for ETL processes and scheduling jobs, improving system performance and reducing processing times.

Tech Mahindra

Senior Software Engineer
08.2009 - 07.2010

Developed and maintained ETL pipelines using Ab Initio, designing and implementing complex data integration processes to extract, transform, and load data from various sources, ensuring data accuracy and consistency.

Utilized Tivoli Workload Scheduler (TWS) to manage and automate job scheduling, optimizing workload management and ensuring timely execution of batch processes and data workflows.

Automated manual tasks through Unix scripting, creating shell scripts to streamline data processing, system monitoring, and routine maintenance tasks, resulting in increased operational efficiency and reduced manual errors.

Collaborated with stakeholders to gather requirements and translate them into technical specifications, ensuring alignment between business needs and technical solutions.

Implemented performance tuning and optimization strategies for ETL processes and scheduling jobs, improving system performance and reducing processing times.

Mphasis Ltd

System Analyst
11.2006 - 02.2008

Developed and maintained data pipelines using Teradata and Unix scripts, designing and implementing complex data integration processes to extract, transform, and load data from various sources, ensuring data accuracy and consistency.

Utilized Tivoli Workload Scheduler (TWS) to manage and automate job scheduling, optimizing workload management and ensuring timely execution of batch processes and data workflows.

Collaborated with stakeholders to gather requirements and translate them into technical specifications, ensuring alignment between business needs and technical solutions.

Implemented performance tuning and optimization strategies for ETL processes and scheduling jobs, improving system performance and reducing processing times.

Education

Veer Bahadur Singh Purvanchal University, U.P., India

Bachelor of Technology in Information Technology
07.2003

Skills

Cloud Platforms:

- Azure Databricks: Expert in designing and developing scalable data pipelines using Databricks and Spark on Azure for real-time and batch data processing
- Azure Data Factory: Experienced in building and orchestrating complex ETL workflows for data integration, migration, and transformation across multiple data sources
- Azure Synapse

Programming & Data Processing:

- Python: Skilled in writing efficient data transformation, manipulation, and automation scripts
- PySpark: Experienced in large-scale distributed data processing with PySpark
- SQL: Expertise in writing complex queries for data retrieval, aggregation, and analysis
- Apache Spark: Deep understanding of Spark for high-performance data processing
- Kafka: Integrated Kafka with data processing frameworks such as Apache Spark and Databricks
- Azure Event Hub: Developed real-time data streaming pipelines for large-scale event ingestion
- Scripting: Unix shell, PowerShell

DevOps & CI/CD:

- Azure DevOps: Hands-on experience in developing CI/CD pipelines for automating data pipeline deployment and testing, using Git, Azure Pipelines, and version control strategies

Data Warehousing & Analytics:

- Azure Synapse Analytics: Experience in building data warehouses and performing data integration with Azure Synapse
- Power BI: Basic knowledge of creating data visualizations for end users to support business insights and decision-making

Version Control & Collaboration:

- Git: Proficient in using Git for version control, code review, and collaboration across teams

Certifications

Databricks Solution Architect (Certificate of Attendance)
Issued: Aug 2024

  • Attended a comprehensive course covering advanced architecture strategies and best practices for building data solutions on the Databricks platform.

Databricks Certified: Data Engineer Associate
Issued: Jan 2024

  • Proficient in building and managing data pipelines using Databricks and Apache Spark, with hands-on experience in data transformations, performance optimization, and pipeline orchestration on Azure.

Microsoft Certified: Azure Fundamentals
Issued: June 2021

  • Verified knowledge of cloud services, Azure workloads, security, privacy, pricing, and support in Azure environments.

Certified Big Data Hadoop Professional
Issued: Feb 2013

  • Gained expertise in Hadoop ecosystem components, including HDFS, MapReduce, Hive, Pig, and HBase for large-scale data storage and processing.

Hobbies and Interests

  • Cricket
  • Volleyball
  • Travelling

Languages

English: Proficient (C2)
Dutch: Elementary (A2)

Timeline

Senior Data Engineer
Cognizant Technology Solutions (Swedbank)
05.2022 - Current
Senior Data Engineer
Cognizant Technology Solutions (Rabobank)
02.2021 - 05.2022
Senior Data Engineer
Tata Consultancy Services (ABN AMRO)
10.2020 - 01.2021
Senior Data Engineer
Tata Consultancy Services (Lloyds Bank)
10.2017 - 10.2020
Data Engineer
Tata Consultancy Services (Virgin Media)
07.2014 - 10.2017
System Analyst
Barclays Technology Center India Ltd
07.2010 - 07.2014
Senior Software Engineer
Tech Mahindra
08.2009 - 07.2010
System Analyst
Mphasis Ltd
11.2006 - 02.2008
Veer Bahadur Singh Purvanchal University, U.P., India
Bachelor of Technology in Information Technology
07.2003