Talent's Information
-
Location
Pune, India
-
Rate
$11.00 per hour
-
Experience
5 Years
-
Languages Known
English, Hindi
Available for
About Varsha
5 years of extensive experience across different domains in the IT industry, including the Hadoop ecosystem, Spark (Python API), cloud services (AWS), and NoSQL databases such as HBase.
Hands-on experience with Big Data core components and the ecosystem, including data ingestion and data processing (PySpark, Hive, Sqoop, HDFS, and MapReduce).
In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
Knowledge of Amazon Web Services, mainly S3, EC2, EMR, IAM roles, and RDS.
Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Spark.
Capable of processing large sets of structured and semi-structured data.
Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external tables in Hive to optimize performance.
Resolved performance issues in Hive scripts through an understanding of joins, grouping, and aggregation.
Experience importing/exporting data to/from RDBMS using Sqoop.
Configured various XML files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
Experience in basic Linux scripting.
Working knowledge of Airflow.
Good knowledge of reporting tools such as Power BI.
Strong problem-solving and technical skills coupled with clear decision-making.
Hands-on experience with mainframes and the Hadoop ecosystem.
Good exposure to requirement analysis, customization, and user acceptance testing.
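The Hive bucketing concept mentioned above can be illustrated with a minimal plain-Python sketch: Hive assigns a row to a bucket by hashing the bucketing column modulo the bucket count. Function and field names here are hypothetical examples, not code from any project.

```python
import zlib

# Minimal sketch of Hive-style bucketing: a row lands in bucket
# hash(key) % num_buckets. Names below are hypothetical.

def bucket_for(key: str, num_buckets: int) -> int:
    """Mimic Hive's bucket assignment: hash the key, modulo bucket count.

    Python's built-in hash() is salted per process for strings, so we use
    a stable CRC32 hash to keep the assignment deterministic, as Hive's is.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets

rows = [{"account_id": f"ACC{i}"} for i in range(10)]
buckets: dict[int, list[dict]] = {}
for row in rows:
    buckets.setdefault(bucket_for(row["account_id"], 4), []).append(row)

# Every row lands in exactly one of the 4 buckets.
assert sum(len(v) for v in buckets.values()) == len(rows)
```

Because the same key always hashes to the same bucket, a bucketed join in Hive can match buckets pairwise instead of shuffling whole tables, which is the performance benefit the bullet above refers to.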
Tech Stack Expertise
-
Python
Python
2 Years
-
AWS
AWS EC2, AWS, AWS RDS
8 Years
Work Experience

Data Engineer
- January 2018 - August 2023 - 5 Years
- India
Projects

Financial Intelligence Unit
- January 2021 - September 2023 - 33 Months
-
The Financial Intelligence Unit project deals with the banking domain.
RESPONSIBILITIES:
Understood the nature of the upstream sources and worked with data storage to handle historical, reference, and delta data using PySpark.
Migrated data between various stores, such as S3 to HBase, HBase to Hive, and Hive to S3, using PySpark.
Shared responsibility for administration of Hadoop and Hive.
Copied all dependencies onto EMR.
Created HBase tables using Phoenix scripts; wrote Spark jobs to process the data.
Imported reference data from RDBMS into HBase.
Created Hive tables via DDL to hold the transformed data.
Experience handling partitioning and bucketing to optimize performance in Hive, and designing both managed and external tables in Hive.
Performed different transformations on reference and delta data, such as data cleaning, deduplication, and joins, using PySpark.
Processed delta data with a script that copies it into HBase at the data location.
Customers can extract reports from S3 for further use, such as visualization.
Analyzed and solved problems and applied hotfixes to production.
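The cleaning, deduplication, and join transformations listed above can be sketched in plain Python; PySpark's `dropDuplicates` and `join` apply the same logic at cluster scale. The record fields (`txn_id`, `branch_id`) are hypothetical examples, not the project's actual schema.

```python
# Plain-Python sketch of the deduplication and join steps the project
# ran in PySpark; field names are hypothetical.

reference = [
    {"branch_id": 1, "branch": "Pune"},
    {"branch_id": 2, "branch": "Mumbai"},
]
delta = [
    {"txn_id": "T1", "branch_id": 1, "amount": 500},
    {"txn_id": "T1", "branch_id": 1, "amount": 500},  # duplicate row
    {"txn_id": "T2", "branch_id": 2, "amount": 900},
]

# Deduplicate on the key column, keeping the first occurrence
# (what PySpark's DataFrame.dropDuplicates(["txn_id"]) does).
seen, deduped = set(), []
for row in delta:
    if row["txn_id"] not in seen:
        seen.add(row["txn_id"])
        deduped.append(row)

# Inner-join the deduplicated delta with the reference data on branch_id.
branches = {r["branch_id"]: r["branch"] for r in reference}
joined = [
    {**row, "branch": branches[row["branch_id"]]}
    for row in deduped
    if row["branch_id"] in branches
]

assert len(joined) == 2 and joined[0]["branch"] == "Pune"
```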

Product Analysis and Marketing (PAAM)
- July 2018 - December 2021 - 42 Months
-
Product Analysis and Marketing (PAAM) is a data hub where data analysis is done on the AWS Glue platform.
RESPONSIBILITIES:
Used AWS Glue as a cloud-based ETL service, with Spark Glue jobs to process data.
Used AWS S3 for upstream and downstream data storage.
Performed different transformations on big data, such as data cleaning, deduplication, and joins.
Set up different Glue jobs to dump data into the Raw, Cleansing, and Processed buckets; thereafter created 5 sub-folders inside each of the 3 main buckets.
Set up different scenarios for the PySpark Glue jobs to write data depending on whether a run succeeded, failed, fetched errors, was backed up, or was processed.
Monitored jobs using CloudWatch, which helps trigger events when a problem occurs.
At the end, uploaded the output data to the S3 Processed bucket, which holds the final processed data ready for visualization.
Applied optimization techniques to long-running jobs to minimize the time required for completion.
Hands-on experience with Amazon services such as Glue, EC2, S3, RDS, Redshift, and EMR.
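The Raw → Cleansing → Processed bucket flow described above can be sketched with an in-memory stand-in for S3. Bucket names, fields, and the cleaning rules below are hypothetical illustrations; a real Glue job would read and write S3 paths via Spark.

```python
# Minimal in-memory sketch of the three-stage Raw -> Cleansing -> Processed
# flow. Python dicts stand in for S3 buckets; all names are hypothetical.

buckets = {"raw": [], "cleansing": [], "processed": []}

# Stage 1: land upstream records in the Raw bucket as-is.
buckets["raw"] = [
    {"product": " Widget ", "units": "3"},
    {"product": "Gadget", "units": None},   # bad record: missing units
    {"product": "Widget", "units": "3"},    # duplicate after cleaning
]

# Stage 2: clean into the Cleansing bucket -- trim strings, drop rows
# with missing values, cast types.
buckets["cleansing"] = [
    {"product": r["product"].strip(), "units": int(r["units"])}
    for r in buckets["raw"]
    if r["units"] is not None
]

# Stage 3: aggregate duplicates into the Processed bucket, which holds
# the final data ready for downstream visualization.
totals: dict[str, int] = {}
for r in buckets["cleansing"]:
    totals[r["product"]] = totals.get(r["product"], 0) + r["units"]
buckets["processed"] = [{"product": p, "units": u} for p, u in totals.items()]
```

Separating the stages into distinct buckets, as the project did, lets a failed run restart from the last good stage instead of re-ingesting raw data.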
Soft Skills
Industry Expertise
Education

MCA
Pune University - June 2009 - June 2012