Data Engineer

  • Remote

About Anudeep K

I'm a Senior Big Data Consultant with 7 years of experience across multiple business domains, using the latest technologies and platforms. My focus is on Cloud, Big Data, Machine Learning, Data Science and Data Mining. I am a skilled developer and architect with strong problem-solving, debugging and analytical capabilities, who creates the technical vision and actively engages in understanding customer requirements. I focus particularly on software performance and efficiency. I am result-oriented and hands-on, skillfully balancing resource and time constraints while doing it right.

Work Experience


Data Engineer

  • January 2015 - December 2022 - 8 Years
  • India



Big Basket Data Systems Recommendation Engine

  • January 2016 - January 2017 - 13 Months
Role & Responsibility
    Big Basket is India’s largest online supermarket, with a focus on delivering definitive, actionable insights that help enhance the customer experience. Big Basket has a smart basket that recommends products to customers based on popularity and their previous transaction history; this project built the recommendation engine behind it.
    • Developed a scalable recommendation engine that generates recommendations from 1 year of transaction data in 58 minutes and processes 4-6 TB of data. Leveraged Amazon AWS, Amazon Redshift and Spark technologies with machine learning capabilities.
    • Developed a SQL stored procedure that generates the SOQ for the distributor.
    • The batch process implements each product-level run through parameterisation.
    • The batch process is designed using the same ETL framework that is currently used in production, with the same standards of logging and notifications as implemented in that framework.
    • Technologies used: Amazon S3, Amazon EMR with Spark, Amazon Redshift and shell scripting.

    NFR scope, performance: batch processing of the collaborative filtering algorithm takes up to 1 hour (excluding EMR startup and termination time), based on the following run criteria:
    • Parent level: 4.2 lakh customers with cosine score > 0.2, on a 45-node cluster with 1 year of data.
    • Product Group level: 4.17 lakh customers with cosine score > 0.2, on a 15-node cluster with 1 year of data.
    • Minimum orders > 4 during the 1-year period.
    • Data size after applying the cosine filter is up to 250 GB.
    This performance is an upper limit and will be lower depending on the run-level parameters with which the batch is executed. These performance numbers are based on the benchmarking done during the earlier PoC phase under the criteria above. Any increase in the time period, or decrease in the cosine score or minimum order criteria, may require a recalibration to maintain the performance threshold levels.
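    The run criteria above (a minimum order count and a cosine score threshold) can be illustrated with a minimal, single-machine sketch. This is not the production Spark job: it assumes the cosine score is computed between customers' product purchase vectors, and the function names and the (customer, order, product) row layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_customers(orders, min_orders=4, min_cosine=0.2):
    """Return {(customer_a, customer_b): score} for pairs above the threshold.

    orders: iterable of (customer_id, order_id, product_id) rows (assumed layout).
    Only customers with more than `min_orders` distinct orders are considered.
    """
    order_ids = defaultdict(set)
    vectors = defaultdict(lambda: defaultdict(int))
    for customer, order, product in orders:
        order_ids[customer].add(order)
        vectors[customer][product] += 1  # purchase count per product

    eligible = sorted(c for c in vectors if len(order_ids[c]) > min_orders)
    pairs = {}
    # All-pairs comparison: fine for a sketch, not for lakhs of customers.
    for i, a in enumerate(eligible):
        for b in eligible[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score > min_cosine:
                pairs[(a, b)] = score
    return pairs
```

    Lowering `min_cosine` or `min_orders` keeps more customers and pairs, which is why a change in those criteria can call for a recalibration of the performance numbers.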

NGCMH (Next Generation Central Media Hub)

  • January 2018 - February 2019 - 14 Months
Role & Responsibility
    Obsessory is a technology company that provides a web and mobile platform to assist shoppers in discovery, search, comparison, and tracking of items across the Internet. Obsessory’s powerful search engine catalogs millions of products from online stores on a daily basis and uses proprietary algorithms to enhance the depth and breadth of the user’s search. Obsessory employs adaptive and social learning to continuously refine the search results and present the user with the most relevant selection of items. Furthermore, Obsessory helps users keep track of desired items across the Internet and get notified of price changes, and availability of tracked items as well as sales, store events and promotions.

    • Pre-processing:
    • Crawled data from 100+ sites using Nutch.
    • Maintained a fashion-based ontology.
    • Designed the schema and data model.
    • Wrote an algorithm to store all validated data in Cassandra using Spring Data Cassandra, REST programs for validation, normalization and enrichment, and a REST API for developing the UI.
    • Used Spark SQL and Scala to run QA SQL queries based on manual QA validation.
    • Indexing MR programs on top of HBase:
    • To standardize the input merchant data.
    • To upload images to the Rackspace CDN.
    • To extract the color information, including density, from images.
    • To persist the data in HBase tables. The above MR jobs run based on timing and bucketing.
    • Using the image color and density data, users can select one or more colors with different densities, and the result is a list of products where each product image contains all the given colors at the exact densities. This was implemented on top of HBase using a Spring REST web service for the color Obsessed search API.
    • Set up the Spark Streaming and Kafka cluster.
    • Developed a Spark Streaming Kafka app to process Hadoop job logs.
    • Kafka producer to send all slave logs to the Spark Streaming app.
    • Spark Streaming app to process the logs with the given rules and produce bad images, bad records, missed records, etc.
    • Spark Streaming app to collect user action data from the front end.
    • Kafka producer based REST API to collect user events and send them to the Spark Streaming app.
    • Hive queries to generate stock alerts, price alerts, popular product alerts and new arrivals for each user, based on the given likes, favourites and share count information.
    • Worked on the Spark ML library for recommendations.
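    The color search described in this role (return the products whose image contains all the selected colors at the exact densities) can be sketched as below. This is only an illustration of the matching rule, not the HBase/Spring implementation: the in-memory catalog, the `color_search` name and the representation of density as a label per color are all assumptions.

```python
def color_search(products, wanted):
    """Return sorted ids of products matching every wanted color and density.

    products: {product_id: {color_name: density_label}}  (assumed layout)
    wanted:   {color_name: density_label} selected by the user
    """
    return sorted(
        product_id
        for product_id, colors in products.items()
        # Every selected color must be present with exactly the same density.
        if all(colors.get(color) == density for color, density in wanted.items())
    )


# Hypothetical catalog for illustration only.
catalog = {
    "p1": {"red": "high", "blue": "low"},
    "p2": {"red": "high"},
    "p3": {"red": "low", "blue": "low"},
}
two_color_hits = color_search(catalog, {"red": "high", "blue": "low"})
```

    Selecting more colors can only narrow the result, since a product must satisfy every selected color and density pair.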



  • January 2019 - January 2020 - 13 Months
Role & Responsibility
    • Developed a custom connector for Confluent Kafka that pushes JSON messages from the Kafka broker to HDFS. This was a highlight of my work, for which I even received appreciation from Confluent Kafka Platform.
    • Work on Hadoop Cluster with current size of 16 Nodes and 896 Terabytes capacity.
    • Write MapReduce jobs, HiveQL, Pig and Spark.
    • Import data using Sqoop into Hive and HBase from an existing SQL Server. Support code/design analysis, strategy development and project planning.
    • Create reports for the BI team using Sqoop to export data into HDFS and Hive.
    • Develop multiple MapReduce jobs in Java for data cleaning and preprocessing.
    • Involved in requirement analysis, design and development.
    • Export and import data into HDFS, HBase and Hive using Sqoop.
    • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
    • Work closely with the business and analytics team in gathering the system requirements.
    • Load and transform large sets of structured and semi-structured data.
    • Load data into HBase tables using Java MapReduce.
    • Load data into Hive partitioned tables.

Marriott UGI

  • January 2020 - January 2021 - 13 Months
Role & Responsibility
     UNIX Vendors promote their item(s) through demand Tech, which comes as a coupon. One kind is Copient, based on targeted customers who are enrolled as customers; the second comes from newspaper promotions. During the sale, the promotion is applied and the discount is given to the customer. After the sale, the discount has to be reimbursed and maintained in the system for a minimum of 12 months for audit purposes. The volume of data was huge and sending feeds to AFS took a long time, so Hadoop was introduced to bill the sales data that had promotions flagged against the demand Tech promotional feed. Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
    • Coordinate between the business and the offshore team.
    • Gather requirements and prepare the design.
    • Export and Import data into HDFS, HBase and Hive using Sqoop.
    • Involved in creating Hive tables, loading them with data and writing Hive queries.
    • Bulk load HBase using Pig.


B.E. in Electronics and Communication Engineering

Karnataka University
  • June 2011 - June 2014
