Talent's Information
-
Location
Delhi, India
-
Rate
$14.00 per Hour
-
Experience
6.5 Years
-
Languages Known
English, Hindi
About Brijesh
6.5 years of software development and bug-fix experience in Big Data (Hadoop, Hive, SQL, Sqoop, Spark, Airflow and other Hadoop components) and Python
Experience with the Hadoop ecosystem and Google Cloud services
Confident and poised in interacting with individuals at all levels, coupled with excellent coordination and liaison skills with the client's management to accomplish the project
Highly organized, able to work at various levels in a project.
Tech Stack Expertise
-
Big Data: BigQuery (0 Years)
Kotlin: Spark (3 Years)
Python: Python, Jupyter (4 Years)
Work Experience
Python Developer
- January 2018 - June 2023 - 5 Years
- India
Projects
Gradeup
- February 2023 - June 2023 - 5 Months
-
A cloud-based data analytics platform used to process user engagement data. This analytics platform is used for user profile reporting.
RESPONSIBILITIES:
Involved in the architecture of the data analytics platform
Planned and designed the architecture: data from the Android app and the web is sent to a Google Pub/Sub topic
Developed a subscriber in Python which sinks data from Pub/Sub to BigQuery (see the sketch after this list)
Wrote SQL scripts which process the data from the raw layer to the analytical layer
Wrote SQL queries in Redash for data visualisation
Uploaded data to the CleverTap analytics tool to track user notifications
Airflow is used to manage the script jobs, and Python is used for writing the DAGs
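Below is a minimal sketch of such a Pub/Sub-to-BigQuery subscriber, assuming the google-cloud-pubsub and google-cloud-bigquery client libraries; the project, subscription, and table names are hypothetical.

```python
# Minimal sketch: stream JSON events from a Pub/Sub subscription into a
# BigQuery raw-layer table. All resource names below are hypothetical.
import json

from google.cloud import bigquery, pubsub_v1

PROJECT_ID = "my-project"                      # hypothetical project
SUBSCRIPTION = "engagement-events-sub"         # hypothetical subscription
TABLE_ID = "my-project.raw.engagement_events"  # hypothetical raw-layer table

bq = bigquery.Client(project=PROJECT_ID)
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION)

def callback(message):
    # Each Pub/Sub message carries one JSON event from the app or web client.
    row = json.loads(message.data.decode("utf-8"))
    errors = bq.insert_rows_json(TABLE_ID, [row])  # streaming insert
    if errors:
        message.nack()  # redeliver on failure
    else:
        message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    streaming_pull.result()  # block and process messages until interrupted
```

Each message is acknowledged only after the streaming insert succeeds, so failed rows are redelivered by Pub/Sub rather than lost.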
Teradata Migration To BigQuery
- October 2020 - January 2023 - 28 Months
In this project we migrated the existing Teradata pipeline to BigQuery; GCS is used for storing the data and Airflow is used as the scheduling tool.
Prepared the design document as per the current architecture
Created new tables in BigQuery as per our new architecture
Converted SAS scripts into BigQuery SQL scripts
Developed the Airflow DAG using Python for executing the BigQuery scripts; Airflow is used as the orchestration tool (see the sketch below)
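A minimal sketch of such a DAG is below, assuming the Airflow Google provider package; the DAG id, schedule, and stored procedure names are hypothetical.

```python
# Minimal sketch: an Airflow DAG that runs the converted BigQuery SQL
# scripts in order. All names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="teradata_migration_bigquery",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_operational = BigQueryInsertJobOperator(
        task_id="load_operational",
        configuration={
            "query": {
                "query": "CALL `my-project.ops.load_operational`()",  # hypothetical procedure
                "useLegacySql": False,
            }
        },
    )
    load_analytical = BigQueryInsertJobOperator(
        task_id="load_analytical",
        configuration={
            "query": {
                "query": "CALL `my-project.ops.load_analytical`()",  # hypothetical procedure
                "useLegacySql": False,
            }
        },
    )
    load_operational >> load_analytical  # operational load runs before analytical
```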
PDM Migration to BigQuery:
In this project we worked with the PDM team to migrate the existing Teradata pipeline to BigQuery. The pipeline manages product-related information that is updated daily, for example when new SKUs are added or deleted. As per the new architecture, we use Nexla for real-time data processing from Google Pub/Sub and then store the files in a GCS bucket.
Prepared the design document as per the current architecture
Created the new ingress layer, operational layer and analytical tables
Converted the existing Teradata stored procedure SQL into BigQuery SQL
Used a Python utility for creating the BigQuery stored procedures (see the sketch after this list)
Built the ingress layer DAG for loading data into the raw layer from the GCS bucket
Built the operational layer DAG for loading data from the raw layer to the operational layer
Built the analytical layer DAG for loading data from the operational layer to the analytical layer
Developed the Airflow DAG using Python for executing the complete pipeline from the raw to the analytical layer
We serve this analytical layer to our BI team for reporting.
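Below is a minimal sketch of such a utility, assuming the google-cloud-bigquery client library; the project, dataset, table, and procedure names are hypothetical.

```python
# Minimal sketch: a Python utility that creates a BigQuery stored procedure
# from a DDL string. All resource names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

DDL = """
CREATE OR REPLACE PROCEDURE `my-project.ops.refresh_product_info`()
BEGIN
  -- Upsert the daily product feed: new SKUs are inserted, existing ones updated.
  MERGE `my-project.analytics.product_info` AS t
  USING `my-project.raw.product_daily` AS s
  ON t.sku = s.sku
  WHEN MATCHED THEN
    UPDATE SET t.description = s.description, t.updated_at = s.updated_at
  WHEN NOT MATCHED THEN
    INSERT (sku, description, updated_at)
    VALUES (s.sku, s.description, s.updated_at);
END;
"""

client.query(DDL).result()  # wait for the DDL job to finish
```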
MAO JSON Feed:
This project involves loading JSON data from a Pub/Sub topic into Nexla and then ingesting it into BigQuery tables, using Cloud Composer to orchestrate the jobs. Currently EOM is used as the OMS system, which provides the CO-export data to EDW through a TIBCO queue and ABI. As our IT infrastructure is migrating to the cloud, BBBY decided to replace EOM with MAO (Manhattan Active Omni), a cloud-based order management system.
Created new tables for the ingress and operational layers as per the new architecture
Built the data pipeline for exporting the data from BigQuery tables to a GCS bucket (see the sketch after this list)
Worked on the implementation of CCPA on existing ecom tables which hold PII data
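A minimal sketch of the BigQuery-to-GCS export step is below, assuming the google-cloud-bigquery client library; the source table, bucket, and file pattern are hypothetical.

```python
# Minimal sketch: export a BigQuery table to newline-delimited JSON files
# in a GCS bucket. All resource names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

extract_job = client.extract_table(
    "my-project.analytics.co_export",               # hypothetical source table
    "gs://my-export-bucket/co_export/part-*.json",  # hypothetical GCS destination
    job_config=bigquery.ExtractJobConfig(
        destination_format="NEWLINE_DELIMITED_JSON"
    ),
)
extract_job.result()  # wait for the export job to complete
```

The wildcard in the destination URI lets BigQuery shard large exports across multiple files.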
Cxreport
- July 2019 - September 2020 - 15 Months
A cloud-based data analytics platform used to process user engagement data. This analytics platform is used for user profile reporting and is used by internal teams. Revenue reports are also generated using this platform.
RESPONSIBILITIES:
Involved in the architecture of the data analytics platform
Planned and designed the architecture: data from the web is sent to the Azure event bus service
Prepared data for cxreport from different data sources, HBase and MySQL
Wrote Hive scripts which process the raw data into meaningful data
Visualised the processed data in the Redash visualisation tool
Airflow is used to schedule the cxreport jobs
Replaced Oozie with Airflow and scheduled all legacy jobs on Airflow (see the sketch after this list)
Monitored Spark streaming jobs
Handled daily ad-hoc queries related to the data
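Below is a minimal sketch of a legacy Oozie job rewritten as an Airflow DAG, assuming the Airflow Apache Hive provider; the DAG id, schedule, and HQL script path are hypothetical.

```python
# Minimal sketch: a Hive script re-scheduled under Airflow instead of Oozie.
# All names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator

with DAG(
    dag_id="cxreport_daily",
    start_date=datetime(2020, 1, 1),
    schedule_interval="0 2 * * *",  # daily at 02:00, like the old Oozie coordinator
    catchup=False,
) as dag:
    raw_to_report = HiveOperator(
        task_id="raw_to_report",
        hql="hql/cxreport_raw_to_report.hql",  # hypothetical script processing raw data
    )
```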
Datalake For Healthcare Data
- May 2017 - June 2019 - 26 Months
-
A real-time streaming solution to integrate data from different data sources, store it on HDFS, and then process that data for the machine learning models and reporting. The solution was designed to deliver doctor recommendation and chest X-ray classification.
RESPONSIBILITIES:
Designed the end-to-end architecture on Pivotal stack to handle the real-time streaming and batch processing.
Apache Kafka and the Hadoop ecosystem were used to collect the data in real-time streams or scheduled batch loads.
Data from all sources was stored and processed in HDFS.
Real-time/batch data analysis and analytics were handled by Apache Kafka and Spark Streaming (see the sketch after this list)
Zeppelin is used for data visualisation and reporting
An in-house Android App was developed to deliver real-time alerts (notifications) to the mobile devices using Google Cloud Messaging.
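A minimal sketch of the Kafka-to-HDFS Spark Structured Streaming job is below; the broker address, topic, and HDFS paths are hypothetical.

```python
# Minimal sketch: read events from Kafka and land them on HDFS as Parquet.
# Broker, topic, and paths below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("healthcare-datalake-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical Kafka broker
    .option("subscribe", "patient-events")             # hypothetical topic
    .load()
)

query = (
    events.selectExpr("CAST(value AS STRING) AS json")  # raw event payload
    .writeStream.format("parquet")
    .option("path", "hdfs:///datalake/raw/patient_events")
    .option("checkpointLocation", "hdfs:///datalake/checkpoints/patient_events")
    .start()
)
query.awaitTermination()  # keep the streaming job running
```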
Fraud Detection In WCPF
- January 2018 - July 2019 - 19 Months
RESPONSIBILITIES:
Installed Hadoop ecosystem components
NiFi is used for raw data ingestion from MySQL into Hive tables
Created Hive tables and loaded the data into them
Curated the data using Spark
Insurance fraud detection: finding fraud using the k-nearest neighbours algorithm and SVM (see the sketch after this list)
Built Docker images of Big Data tools for deployment
Worked on Google Cloud Platform and on Google Dataproc clusters
HBase and Hive integration for data analysis
Spark and Cassandra integration for data analysis
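Below is a minimal sketch of the fraud-classification step with k-nearest neighbours and SVM, assuming scikit-learn and pandas; the input file and label column are hypothetical.

```python
# Minimal sketch: fit KNN and SVM classifiers on curated claims features.
# The input file and label column below are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

claims = pd.read_csv("curated_claims.csv")  # hypothetical curated export
X = claims.drop(columns=["is_fraud"])       # hypothetical label column
y = claims["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

print("KNN accuracy:", knn.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))
```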