As a data scientist, I have performed the following data engineering tasks:

  • Install, configure, and maintain an enterprise Hadoop environment
  • Load large volumes of data from diverse datasets and data platforms into Hadoop, deciding which file format is most efficient for each task
  • Understand the requirements for input-to-output transformations
  • Clean data according to business requirements using streaming APIs or user-defined functions
  • Define Hadoop job flows
  • Build distributed, reliable, and scalable data pipelines that ingest and process data in real time, including impression streams, transaction behaviour data, clickstream data, and other unstructured data
  • Manage Hadoop jobs using a scheduler
  • Review and manage Hadoop log files
  • Design and implement Hive schemas and HBase column-family schemas on HDFS
  • Assign schemas and create Hive tables
  • Deploy and manage HBase clusters
  • Develop efficient Pig and Hive scripts that join datasets using various join techniques
  • Assess the quality of datasets for a Hadoop data lake
  • Store data on HDFS in file formats such as Parquet and Avro to speed up analytics
  • Build new Hadoop clusters
  • Maintain the privacy and security of Hadoop clusters
  • Fine-tune Hadoop applications for high performance and throughput
  • Troubleshoot and debug runtime issues across the Hadoop ecosystem

My Major Skills as a Data Engineer

Data Sources

ETL Process

Optimization Algorithms

Evaluation Metrics

Frameworks

Tools

Languages

Databases

IDEs

Cloud Platforms

Version Control Systems

Major Projects I Have Worked On

  • Branding and promotional offerings for a multispecialty hospital group
  • Optimizing the performance of a multi-location, multispecialty hospital group through actionable insights
  • Analysis and Optimization of Retail Operations and Supply Chain Management
  • Vehicle Monitoring & Real-time Alerting System
  • Network Operations Center (NOC) Automation
  • Online Store Hosting Service
  • Hotel Management System
  • Human Resource Management System (HRMS)