New Delhi, India
$12.00 per hour
Big Data professional with 7+ years of overall experience in the design, development, and deployment of big data applications.
Extensively worked on developing Spark applications using Spark Core, Spark SQL, and the Spark Structured Streaming APIs.
Worked on Hadoop ecosystem components such as MapReduce, HDFS, Hive, HBase, Sqoop, Oozie, Kafka, Spark, Impala, and Hue.
Extensive experience with AWS cloud services such as S3, RDS, CloudWatch, Glue, EMR, EC2, and Athena.
Experience working with the Git version control system.
Experience working with Agile methodology.
Tech Stack Expertise
Apache Airflow, Apache NiFi: 4 years
Java, Core Java: 2 years
- January 2016 - February 2023 - 7 Years
- February 2021 - February 2023 - 25 Months
Designed a 3-layered Data Lake architecture for different Real-Money Gaming applications.
Developed a Delta Lake solution by integrating Databricks with AWS.
Migrated data to and from sources such as CleverTap, Google Analytics 4, and Branch.
Set up Apache Airflow infrastructure on AWS using EC2 instances and Docker.
Developed a framework to ingest structured historical and incremental data from RDS into S3 using custom Airflow operators and Apache NiFi.
Built a Spark application to manage upserts from a relational database into Parquet-backed tables in the AWS Glue Catalog.
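The upsert pattern in the last bullet can be sketched as follows. This is a minimal plain-Python illustration of the merge semantics the Spark job would apply (not the actual application, and the `upsert` helper and field names are hypothetical): incoming rows replace existing rows that share a primary key, and new keys are appended.

```python
# Plain-Python sketch of upsert (merge) semantics for a keyed table:
# last write wins per primary key, new keys are inserted.

def upsert(existing_rows, incoming_rows, key="id"):
    """Merge incoming rows into existing rows, keyed by a primary key."""
    merged = {row[key]: row for row in existing_rows}  # index current table by key
    for row in incoming_rows:                          # apply updates and inserts
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

table = [{"id": 1, "status": "old"}, {"id": 2, "status": "old"}]
changes = [{"id": 2, "status": "new"}, {"id": 3, "status": "new"}]
result = upsert(table, changes)
```

In Spark this logic would typically be expressed as a join-and-overwrite on the Parquet data, or as a `MERGE INTO` when Delta Lake is available.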
Project Description: Developed and supported end-to-end data pipelines to process production data and create summarized tables used for audit, data visualization, and analysis across multiple teams. Migrated some of the data pipelines from AWS to GCP. Developed Spark jobs to process data using Dataproc. Developed a pipeline to collect clickstream data using Kinesis Data Streams and Firehose. Responsible for integration testing of datasets with Tableau using the Athena connector.
- April 2019 - December 2020 - 21 Months
Worked as part of the development team to onboard different datasets into the Enterprise Data Hub for risk management.
Developed a metadata-driven validation and ETL framework based on PySpark.
Migrated complex ETL pipelines from Alteryx to Spark jobs.
Responsible for applying row-level security to data exposed to end users using Impala views.
Project Description: Migrated data from Sybase and MySQL to Hive (backed by S3) using Sqoop. Developed Spark logic to load incremental data from upstream into Hive tables. Involved in building scripts for automated deployment of applications into the PROD environment using Gradle, Bamboo, and legacy organizational tools. Worked as part of the DevOps team to monitor and troubleshoot scheduled jobs in production.
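The metadata-driven validation idea above can be sketched in plain Python (the actual framework was PySpark; the `RULES` names and the `validate` helper here are hypothetical): each column's checks come from a metadata dictionary, so a new dataset is onboarded by adding metadata rather than code.

```python
# Sketch of a metadata-driven validation framework: per-column rules are
# looked up from metadata, and rows are split into valid and rejected sets.

RULES = {
    "not_null": lambda v: v is not None,
    "positive": lambda v: v is not None and v > 0,
}

def validate(rows, metadata):
    """Return (valid_rows, rejected_rows) per the column metadata."""
    valid, rejected = [], []
    for row in rows:
        ok = all(
            RULES[rule](row.get(col))
            for col, rules in metadata.items()
            for rule in rules
        )
        (valid if ok else rejected).append(row)
    return valid, rejected

meta = {"trade_id": ["not_null"], "amount": ["positive"]}
good, bad = validate(
    [{"trade_id": "T1", "amount": 10}, {"trade_id": None, "amount": 5}],
    meta,
)
```

In PySpark the same dispatch would build a combined filter `Column` expression from the metadata and split the DataFrame with it.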
- August 2018 - April 2019 - 9 Months
Worked as an Agile team member to develop, test, and deploy the product for an IP Log Management System.
Involved in building data pipelines to ingest and transform data using Spark, loading the output into multiple sinks such as HDFS, Hive, and HBase.
Developed components using the Spark Structured Streaming API to build DataFrames from real-time streaming data on Kafka and performed aggregations on them.
Scheduled Spark jobs on the cluster using Oozie.
Project Description: Developed the "Integrator" component as part of a framework to call REST APIs on Spark DataFrames. Developed JUnit tests for various components in Scala. Contributed to the development of Java modules to read JSON configuration and ingest data into HDFS. Involved in the installation of HDP 3.0 on the development cluster. Performed functional testing of various business scenarios of the product.
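The streaming aggregation mentioned above can be illustrated with a plain-Python tumbling-window count (a sketch of the concept only; the real job used Spark Structured Streaming's `groupBy(window(...))` over Kafka, and the `window_counts` helper and event names here are hypothetical):

```python
# Sketch of tumbling-window aggregation over timestamped events:
# each event is bucketed into a fixed-size window by its event time.

from collections import Counter

def window_counts(events, window_secs=60):
    """Count events per (window_start, key) for tumbling windows."""
    counts = Counter()
    for ts, key in events:                      # ts: epoch seconds
        window_start = ts - (ts % window_secs)  # align down to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "login"), (30, "login"), (65, "click")]
agg = window_counts(events)
```

A Structured Streaming job additionally handles late data with watermarks and emits results incrementally, which this batch sketch omits.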
- January 2016 - August 2018 - 32 Months
Ingested data into Hive from Teradata using Sqoop.
Queried Hive tables using joins, grouping, and views for ad hoc analysis.
Migrated data from RDBMS sources using Sqoop.
Orchestrated and scheduled periodic jobs using Oozie.
Project Description: Involved in various POCs on the big data stack. Understood business rules and prototyped the automated generation of SQL queries. Developed JUnit tests for unit testing of modules. Tested and validated individual modules and their integration using VBA-based macros. Automated the triggering of SQL queries and retrieval of results via JDBC.
Bachelor of Technology, Delhi University
- June 2011 - June 2015