Big Data Engineering

MASTER THE SKILLS

Data is the new oil of the digital revolution. This course introduces all aspects of big data processing with practical examples: batch processing, real-time stream processing, in-memory processing, data warehousing/ETL, and modern key-value stores. It equips you with what it takes to succeed in the modern big data processing world. Each technology is introduced with a practical case study that expands on the classroom instruction. The course ends with a comprehensive capstone project that covers all the technologies learned in the course.

Curriculum

Concepts in Big Data

  • Motivation: what is Big Data?
  • Sources of Big Data
  • Big Data vs Traditional Data
  • Characteristics of Big Data - Volume, Variety, Velocity
  • Data Models - Structured, Semi-Structured and Unstructured data

Distributed Computing Environment for Big Data

  • Distributed Systems
  • Clusters

Distributed Processing of Data

  • Introduction to Hadoop
  • Hadoop Distributed File System (HDFS)
  • Programming using MapReduce (MR) on Hadoop
  • Introduction to Pig

In-memory distributed processing

  • Introduction to Apache Spark
  • Apache Spark Programming Model

Data Warehousing and ETL

  • ETL
  • Data Warehousing Fundamentals
  • Data Lakes

Data Ingestion for Structured / Unstructured Data

  • Introduction to Data Ingestion
  • Sqoop: Data Ingestion in Hadoop
  • Flume: Ingestion of Events data

Data Transformation and Batch Processing

  • Hive
  • Hive vs HBase
  • Query Optimization
  • Batch Processing
  • Using Amazon Elastic MapReduce (EMR)
  • Oozie - Workflow Engine for Hadoop

Stream Processing

  • Introduction to Streaming Data
    • Characteristics of streaming data
    • Components of a real-time stream processing system
    • Features of a real-time stream processing architecture
    • Social Media Data
  • Sourcing Stream Data using Apache Flume
    • Understanding Data Model, connection
  • Stream Processing
    • Elements of a stream processing system
    • Components of a Storm Cluster
    • Configuring Storm Cluster
  • Trident (Storm DSL)
    • Trident (Storm DSL) - Overview
  • Streaming on Spark
    • Understanding Spark Streaming API
    • Understanding DStream
    • Processing a Data Stream
    • Motivation for data store on cloud

NoSQL DBs

  • Motivation for NoSQL DBs
  • Introduction to various types of NoSQL DBs
  • Cassandra, HBase, MongoDB

Case Study

  • Case Study on HDFS
  • Case Study on MR
  • Case Study on Pig/Hive
  • Case Study on Apache Spark
  • Case Study on ETL
  • Case Study on Data Ingestion
  • Case Study on Hive & HBase
  • Case Study on Stream Processing using Spark
  • Case Study on Stream Processing using Trident
  • Case Study on NoSQL DBs

Capstone Project

REGISTER FOR DEMO CLASS

DEMO
DECEMBER

FREE DEMO CLASSES - 10:00 AM IST. Demo classes give you a good feel for our courses. Please register to attend a free demo class.

SCHEDULE

Starting

December, every Sat & Sun, 09:00 AM to 01:00 PM
Length - 16 Weeks

ADDRESS

Class Location

Opposite Google Office, Plot No.1,
Whitefields, Hitech City Road,
Kondapur, HITEC City,
Hyderabad, Telangana 500084, India.
