Hello, I am

PRAYAG VERMA

A Software Engineer

Specializing in the Data Domain!

ABOUT

Prayag Verma

Hello, I'm Prayag Verma.

An M.S. in IT graduate from the University of Texas at Dallas

I'm a software engineer specializing in the data domain, with over four years of experience working with global companies like Infosys, Amdocs, and Briston Infotech.

Currently, I'm actively looking for opportunities in data-driven domains such as data engineering, data/solution architecture, ETL/DWH development, and ETL testing.

EDUCATION

Academic Journey

2023 - 2025

University of Texas at Dallas

Information Technology and Management (ITM)

2015 - 2019

Anna University, Chennai

Computer Science and Engineering (CSE)

2013 - 2015

Makatpurh High School, Giridih

Senior Secondary School (+2)


RESUME

SUMMARY

I'm an innovative, independent, and deadline-driven Data Engineer and Architect with over four years of hands-on experience designing, developing, and testing user-centered enterprise data warehouse solutions.

I've worked across diverse domains, including telecommunications, health insurance, and retail, bringing expertise in data engineering, solution architecture, ETL processing, data pipeline streamlining, and data warehousing to every project.

EDUCATION

2023 - 2025
University of Texas at Dallas
Information Technology and Management (ITM)
2015 - 2019
Anna University, Chennai
Computer Science and Engineering (CSE)
2013 - 2015
Makatpurh High School, Giridih
Senior Secondary School (+2)

DOMAIN KNOWLEDGE

Data Engineering, Data Architect, Solution Architect, Data Analyst, Data Science, Big Data, ETL Development, ETL Testing, Business Analyst, SDE

COURSEWORK

AWS Cloud Solution Architecture, Big Data, Business Data Warehousing, Advanced Statistics for Data Science, Business Analytics with R, Database Foundation for Business Analytics, Predictive Analytics for Data Science, and Prescriptive Analytics.

PROFESSIONAL EXPERIENCE

Infosys (Data Engineer)

  • Created and maintained scalable data pipelines using Azure Data Factory and Databricks, demonstrating Ownership by ensuring seamless integration of structured and unstructured data and improving processing efficiency by 10%.
  • Led the migration of flat files and tables (10+ TB each) from IPC workflows to ADLS, Snowflake, and Azure Synapse Analytics, showcasing Think Big by enabling cloud-native scalability and automation.
  • Optimized databases via denormalization and table restructuring, deployed auto-scaling with Azure Monitor (Frugality: cut costs by 8%), and integrated API services, improving query performance by 12% and data consistency by 15%.

Amdocs (Data Engineer)

  • Designed scalable ADF pipelines to ingest and process terabytes of relational and non-relational real-time streaming data, leveraging Teradata, ADLS, Databricks, Azure Event Hubs, and Spark, increasing efficiency by 25%.
  • Migrated 150+ KornShell-based legacy OLTP systems to Azure Data Lake and Snowflake, while building real-time streaming solutions using Azure Event Hubs and Airflow, displaying Are Right, A Lot by ensuring accurate real-time data processing.
  • Modeled Power BI dashboards using DAX, providing real-time customer and billing insights, while manifesting Earn Trust by documenting data mappings, technical specifications, and API integrations for 100% transparency and compliance.

Briston Infotech (Data Engineer)

  • Streamlined cross-domain ETL solutions for the healthcare and retail industries using Informatica PowerCenter, ensuring seamless integration of tabular and non-tabular data into Snowflake and Oracle E-DWH with 99% reliability.
  • Spearheaded query performance tuning by implementing indexing, fact and dimension tables, and star/snowflake schema data modeling techniques, cutting query execution time by 8%.
  • Automated 100+ ETL workflows using the TIDAL scheduler, reducing manual intervention and ensuring seamless data integration.

SKILLS

PROGRAMMING LANGUAGES

Python: 90%
NumPy: 85%
Pandas: 75%
Shell Scripting: 80%

ETL / ORCHESTRATION

ADF: 90%
Airflow: 85%
AWS Glue: 75%
PowerCenter: 70%
DBT: 70%
TIDAL: 65%

STREAMING

Fabric: 80%
Kafka: 75%
Flink: 65%

VISUALIZATION

Power BI: 95%
Tableau: 75%

CI/CD & VERSION CONTROL

Azure Pipelines: 70%
Jenkins: 65%
GitHub Actions: 70%
GitLab: 75%

DATABASES

MySQL: 90%
Oracle: 85%
PL/SQL: 80%
MS SQL Server: 75%
Teradata: 70%
SnowSQL: 80%
MongoDB: 75%
NoSQL: 65%

CLOUD TECHNOLOGIES

Databricks: 90%
ADLS: 95%
Cosmos DB: 80%
Stream Analytics: 70%
Teradata: 65%
Redshift: 70%
DynamoDB: 75%
Athena: 65%
QuickSight: 65%

BIG DATA TECHNOLOGIES

Hadoop: 80%
Hive: 85%
HDFS: 90%
Spark SQL: 70%
Scala: 60%
Sqoop: 75%
Impala: 75%
MapReduce: 65%
HBase: 60%

CONCEPTS/METHODOLOGY

SDLC: 90%
STLC: 95%
Agile: 95%
Data Modeling: 80%
Data Warehousing: 85%
Data Architecting: 90%

PROJECTS

Car Hail Damage Repair

A comprehensive solution that streamlines vehicle damage assessment using AI technology. This innovative system allows users to upload images of damaged vehicles and receive instant analysis of damage severity and repair options through a portable device.

User-Friendly Web App with Flask
Smart Image Processing with OpenCV and Pillow
Machine Learning Analysis powered by TensorFlow
Efficient Data Processing with NumPy
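
For illustration only, here is a minimal sketch of how such an upload-and-score flow could be wired together with Flask, Pillow, NumPy, and TensorFlow. The model file, route name, and severity labels are placeholders, not the project's actual code.

```python
# Minimal sketch of an image upload-and-score flow (assumed file/model names).
import numpy as np
from flask import Flask, jsonify, request
from PIL import Image
import tensorflow as tf

app = Flask(__name__)
# "damage_classifier.h5" is a placeholder; the real model artifact may differ.
model = tf.keras.models.load_model("damage_classifier.h5")
LABELS = ["minor", "moderate", "severe"]  # hypothetical severity classes

@app.route("/assess", methods=["POST"])
def assess():
    # Read the uploaded image and normalize it to the model's input shape.
    img = Image.open(request.files["image"].stream).convert("RGB").resize((224, 224))
    batch = np.asarray(img, dtype="float32")[None, ...] / 255.0
    probs = model.predict(batch)[0]
    return jsonify({"severity": LABELS[int(np.argmax(probs))],
                    "confidence": float(np.max(probs))})

if __name__ == "__main__":
    app.run(debug=True)
```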

Digital License Management

A comprehensive platform for generating and managing digital license keys with seamless API integration. This solution simplifies the software licensing process for developers and businesses, ensuring secure key distribution and validation.

Secure License Key Generation
API Integration for Third-party Applications
User Authentication and Authorization
Integrated Support System

ETLQC - Data Testing Platform

A robust web-based application designed to validate and test data across various sources including flat files, relational databases, and APIs. ETLQC ensures data accuracy and reliability throughout ETL/ELT processes, making it an essential tool for data engineers and quality assurance teams.

Multi-source Data Validation
Automated Testing Workflows
Comprehensive Reporting Dashboard
ETL/ELT Process Integration

Car Auction Data Analysis

A comprehensive analysis of car auction data encompassing both electric and non-electric vehicles from various manufacturers. This project leverages Python and Object-Oriented Programming principles to extract valuable insights from automotive market data.

Electric vs. Non-Electric Vehicle Analysis
Price Prediction Models
Automotive Market Trend Analysis
OOP-based Data Processing Framework

Microsoft Azure Projects

A collection of end-to-end ETL/Data Engineering solutions implemented using Microsoft Azure services. This repository showcases expertise in cloud-based data processing, from minor tasks to full-scale applications, demonstrating versatile skills in modern data architecture.

Cloud-native Data Solutions
Real-time Data Processing
Scalable ETL Architectures
End-to-End Data Pipeline Management

SQL From Zero to Hero

A comprehensive educational repository featuring a carefully curated series of SQL exercises designed to take users from beginner to advanced level. Each exercise includes detailed schemas, challenging questions, and thoroughly explained solutions to build strong SQL fundamentals.

Progressive Learning Path
Real-world Problem Scenarios
Detailed Explanations and Best Practices
Diverse Database Schema Examples
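
To give a flavor of the exercise format (schema, question, solution), here is a small illustrative example using an invented schema rather than one taken from the repository:

```python
# One self-contained exercise in the spirit of the series (illustrative schema and data).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'Asha', 120.0), (2, 'Asha', 80.0), (3, 'Ben', 200.0), (4, 'Ben', 40.0);
""")

# Question: which customers have spent more than 150 in total?
solution = """
    SELECT customer, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 150
    ORDER BY total_spent DESC;
"""
for row in conn.execute(solution):
    print(row)   # ('Ben', 240.0), ('Asha', 200.0)
```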

CERTIFICATION

BLOGS

Coming Soon

Stay tuned for insightful articles and tutorials on data engineering, cloud technologies, and more!

CONTACT

Let's Connect

Feel free to reach out for opportunities, collaborations, or just to say hello!

Location

Seattle, USA

Website

www.prayagverma.com

Send A Message


FAQ

Frequently Asked Questions

Here are some common questions about my background, skills, and how we can work together.

What are your core areas of expertise in data engineering?

My expertise centers around building scalable data pipelines, data warehousing solutions, and ETL/ELT processes. I specialize in cloud platforms like Azure and AWS, with strong skills in Python, SQL, Spark, and modern data tools like Databricks, ADF, and Airflow. I'm particularly strong in designing data architectures that balance performance, cost, and maintainability.

How do you approach data quality and governance in your projects?

I believe data quality is the foundation of any successful data initiative. My approach includes implementing robust validation rules, automated testing pipelines, and monitoring systems to catch issues early. For governance, I work to establish clear data ownership, lineage tracking, and documentation practices. I've developed custom data quality frameworks that integrate with ETL processes to ensure consistency and reliability throughout the data lifecycle.
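
As a rough illustration of what a rule-based check of this kind can look like (column names, rules, and thresholds are made up for the example, not taken from any specific framework):

```python
# Illustrative rule-based data quality checks on a batch (hypothetical rules).
import pandas as pd

def run_quality_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable rule violations for a batch."""
    failures = []
    if df["customer_id"].isna().any():        # completeness rule
        failures.append("customer_id contains nulls")
    if not df["order_id"].is_unique:           # uniqueness rule
        failures.append("order_id has duplicates")
    if (df["amount"] < 0).any():                # validity rule
        failures.append("amount has negative values")
    return failures

# Example batch; in a pipeline this would come from the ETL step under test.
batch = pd.DataFrame({"order_id": [1, 2, 2],
                      "customer_id": ["A", None, "C"],
                      "amount": [10.0, -5.0, 30.0]})
for issue in run_quality_checks(batch):
    print("QUALITY FAILURE:", issue)
```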

Can you tell me about your experience with real-time data processing?

I've designed and implemented several real-time data processing systems using technologies like Apache Kafka, Azure Event Hubs, and Stream Analytics. One notable project involved creating a real-time customer analytics platform that processed millions of events per hour with sub-second latency. This system used a combination of stream processing for immediate insights and batch processing for historical analysis, providing business users with both real-time dashboards and comprehensive reporting capabilities.
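
For illustration, a minimal streaming-consumer sketch using kafka-python; the topic name, payload fields, and the toy running aggregate are assumptions for the example, not the actual platform code:

```python
# Minimal Kafka consumer sketch keeping a tiny rolling aggregate per customer.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "customer-events",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",
)

totals: dict[str, float] = {}
for message in consumer:
    event = message.value
    totals[event["customer_id"]] = totals.get(event["customer_id"], 0.0) + event["amount"]
    print(event["customer_id"], "running total:", totals[event["customer_id"]])
```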

How do you handle large-scale data migrations?

Large-scale data migrations require careful planning and execution. My approach involves thorough source system analysis, detailed mapping documentation, and creating a robust testing strategy before any migration begins. I typically implement the migration in phases, starting with a proof of concept followed by incremental migrations when possible. Throughout the process, I use automated validation to verify data integrity and completeness. I also design fallback mechanisms and maintain parallel systems during the transition period to minimize business disruption.
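
A minimal sketch of the kind of automated reconciliation that can run after each migration phase; the table, column, and in-memory databases below are illustrative stand-ins, and real checks would run against the actual source and target connections:

```python
# Compare row counts and a numeric checksum between source and target tables.
import sqlite3

def reconcile(src, tgt, table: str, numeric_col: str) -> bool:
    """Return True if source and target agree on basic completeness checks."""
    checks = [
        f"SELECT COUNT(*) FROM {table}",
        f"SELECT ROUND(COALESCE(SUM({numeric_col}), 0), 2) FROM {table}",
    ]
    ok = True
    for sql in checks:
        src_val = src.execute(sql).fetchone()[0]
        tgt_val = tgt.execute(sql).fetchone()[0]
        if src_val != tgt_val:
            print(f"MISMATCH for {sql!r}: source={src_val} target={tgt_val}")
            ok = False
    return ok

# Tiny demo with in-memory databases standing in for the legacy and cloud systems.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE billing (id INTEGER, amount REAL)")
    db.executemany("INSERT INTO billing VALUES (?, ?)", [(1, 10.5), (2, 20.0)])
print("migration phase valid:", reconcile(src, tgt, "billing", "amount"))
```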

How can I contribute to your blog section?

I welcome guest contributions on topics related to data engineering, cloud technologies, analytics, or software development! To contribute, simply reach out through the contact form with the subject "Blog Contribution" and include a brief outline of your proposed topic. The ideal length is 1000-2000 words, and I encourage practical, hands-on content that provides value to readers. You'll receive full attribution for your work, and it's a great way to share your knowledge with the community while gaining exposure for your expertise.

What topics are suitable for blog contributions?

The blog welcomes a wide range of technical topics, including but not limited to:

  • Data engineering best practices and patterns
  • Cloud architecture and implementation guides
  • ETL/ELT techniques and tools
  • Performance optimization for data pipelines
  • Big data technologies and frameworks
  • Data modeling and warehouse design
  • Data quality and testing strategies
  • Python, SQL, or Spark tutorials
  • Real-world case studies and problem solving
  • Emerging technologies in the data space

The most valuable contributions share practical insights, provide code examples when relevant, and offer actionable takeaways for readers.

What technologies do you enjoy working with most?

I particularly enjoy working with Azure Databricks, Python, and modern data pipeline orchestration tools like Apache Airflow. I find Databricks especially powerful for its unified analytics platform that combines the best of data engineering and data science capabilities. On the AWS side, I'm excited about the capabilities of services like Glue, Redshift, and Step Functions for building serverless data workflows. I'm also increasingly interested in the intersection of data engineering with MLOps, and how we can build better pipelines to support model training and deployment.

Are you available for freelance or consulting work?

Yes, I'm selectively available for freelance consulting on data engineering projects, particularly those involving complex data architectures, performance optimization, or cloud migrations. I can provide services ranging from architecture review and technical guidance to hands-on implementation and team mentoring. If you have a project in mind, please reach out through the contact form with details about your needs and timeline, and we can discuss how I might be able to help.