Data Engineer

BMT Score: 93%

Available for
  • Remote
  • Onsite
  • Hybrid
About Sai S

  • Data Engineering professional with solid foundational skills and a proven track record of implementations across a variety of data platforms, seeking to contribute data collection and analysis skills to an established healthcare or healthcare-related organization. Self-motivated, with strong personal accountability in both individual and team settings.
  • 8+ years of experience in Data Engineering, data pipeline design, development, and implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
  • Strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Strong experience in writing scripts with the Python, PySpark, and Spark APIs for data analysis; extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, PyExcel, Boto3, psycopg, embedPy, NumPy, and BeautifulSoup.
  • Ability to collaborate with peers in both business and technical areas to deliver optimal business process solutions in line with corporate priorities.
  • Experience with Google Cloud components, Google Container Builder, GCP client libraries, and the Cloud SDK.
  • Experience in setting up monitoring infrastructure for Hadoop cluster using Nagios and Ganglia. 
  • Maintained BigQuery, PySpark, and Hive code by fixing bugs and delivering enhancements requested by business users.
  • Worked with AWS and GCP clouds, using GCP Cloud Storage, Dataproc, Dataflow, and BigQuery, as well as AWS EMR, S3, Glacier, and EC2 instances with EMR clusters.
  • Strong hands-on experience in data modeling, design, and architecture on AWS and Azure, designing cloud-native data lakes and distributed data platforms; expertise in scripting languages including R, Python, and JavaScript.
  • Expertise in Python (2.x/3.x) programming with multiple packages including NumPy, Pandas, boto, boto3
  • Sound experience with Terraform and infrastructure/configuration-as-code tooling.
  • Expertise in Dockerizing applications and configuring container orchestration (Kubernetes, Mesos, AWS ECS, the Helm package manager) and CI/CD.
  • Proficiency with Python-Django, Flask, Boto
  • Experience with hybrid cloud deployments and with on-premises-to-cloud migration deployments and roadmaps.
  • Sound working knowledge of advanced Python modules such as concurrent.futures for asynchronous execution and multithreading for parallel processing in Python (see the first sketch after this list).
  • Strong exposure to building data lake infrastructure and setting up ETL pipelines on AWS/Azure cloud infrastructure.
  • Good working knowledge of creating and maintaining custom Python libraries whose classes centralize frequently used methods invoked from Lambda handlers (see the second sketch after this list).
  • Adept in analyzing large datasets using Apache Spark, Spark ML and Amazon Web Services (AWS).
  • Proficient in working with SQL/NoSQL databases, AWS EC2, S3, EMR, Lambda, SageMaker, and Docker.
  • Experience in Data Governance consulting and implementation of Data Quality frameworks, Data Catalog Metadata management, and Data Lineage.
  • 3+ years of work experience with Agile development and the Scrum methodology.
  • Excellent communication skills; self-motivated, with a high degree of attention to detail; able to work independently.
  • Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Experienced in building automated regression scripts in Python for validating ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB (see the third sketch after this list).
  • Skilled in performing data parsing, data ingestion, data manipulation, data architecture, data modelling, and data preparation.
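
First sketch: a minimal, illustrative example of the concurrent.futures pattern referenced above, not taken from any specific project. The fetch_record_count helper and table names are hypothetical stand-ins for any I/O-bound task.

```python
# Illustrative only: parallelizing I/O-bound work with concurrent.futures.
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_record_count(table_name):
    # Hypothetical helper: stands in for a query or API call per table.
    return table_name, len(table_name)

tables = ["orders", "customers", "claims"]  # hypothetical table names

with ThreadPoolExecutor(max_workers=4) as executor:
    # Submit one task per table and collect results as they finish.
    futures = {executor.submit(fetch_record_count, t): t for t in tables}
    for future in as_completed(futures):
        name, count = future.result()
        print(f"{name}: {count}")
```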
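Second sketch: an illustrative take on the Lambda-handler pattern referenced above, where a small custom class library centralizes frequently used methods so individual handlers stay thin. The S3Helper class, bucket, and key names are hypothetical.

```python
# Illustrative only: a shared helper class reused across Lambda handlers.
import json
import boto3

class S3Helper:
    """Centralizes frequently used S3 operations."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.client = boto3.client("s3")

    def read_json(self, key):
        obj = self.client.get_object(Bucket=self.bucket, Key=key)
        return json.loads(obj["Body"].read())

    def write_json(self, key, payload):
        self.client.put_object(
            Bucket=self.bucket, Key=key, Body=json.dumps(payload).encode("utf-8")
        )

def handler(event, context):
    # The handler itself only wires the shared pieces together.
    helper = S3Helper(bucket="example-bucket")  # hypothetical bucket name
    record = helper.read_json(event["input_key"])
    helper.write_json(event["output_key"], {"rows": len(record)})
    return {"statusCode": 200}
```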
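Third sketch: an illustrative Python/pytest regression check of the kind referenced above, comparing row counts between a source and target database after an ETL run. The connection URLs and table names are hypothetical placeholders.

```python
# Illustrative only: cross-database row-count regression check for an ETL run.
import pytest
import sqlalchemy

SOURCE_URL = "oracle+cx_oracle://user:pass@source-db/orcl"   # hypothetical
TARGET_URL = "postgresql+psycopg2://user:pass@target-db/dw"  # hypothetical

def row_count(url, table):
    # Table names come from the fixed list below, not from user input.
    engine = sqlalchemy.create_engine(url)
    with engine.connect() as conn:
        return conn.execute(sqlalchemy.text(f"SELECT COUNT(*) FROM {table}")).scalar()

@pytest.mark.parametrize("table", ["orders", "claims"])  # hypothetical tables
def test_row_counts_match(table):
    assert row_count(SOURCE_URL, table) == row_count(TARGET_URL, table)
```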

Our Suggestions