Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Created By admin Last Updated Mon, 19-Oct-2020 English
  • Course Duration
    7 Hours
  • Mode of Training
    Self-Paced
  • Lessons
    24 Lessons
  • Validity
    Lifetime
What Will I Learn?
  • Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker
  • Setting up Single Node Hadoop and Spark Cluster on Docker
  • Features of Spark Structured Streaming using Spark with Scala
  • Features of Spark Structured Streaming using Spark with Python (PySpark)
  • How to use PostgreSQL with Spark Structured Streaming
  • Basic understanding of Apache Kafka
  • How to build Data Visualisation using Django Web Framework and Flexmonster
  • Fundamentals of Docker and Containerization

Requirements
  • Basic understanding of Programming Language
  • Basic understanding of Apache Hadoop
  • Basic understanding of Apache Spark
Description

In many data centers, servers of different types generate large amounts of data in real time. Each event here is a status report from a server in the data center.

This data needs to be processed in real time to generate insights for the people monitoring the servers and the data center. They track server status continuously and, when issues occur, find a resolution quickly to keep the servers stable.
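To make the idea of server-status events concrete, the generator below sketches a minimal event simulator in Python, in the spirit of the course's "Event Simulator" lesson. The field names (`server_id`, `status`, `cpu_usage`, etc.) are illustrative assumptions, not the course's exact schema:

```python
import json
import random
import time
from datetime import datetime, timezone

# Hypothetical server-status event simulator. Field names and statuses
# are assumptions for illustration, not the course's exact schema.
STATUSES = ["RUNNING", "IDLE", "DOWN", "REBOOTING"]

def generate_event(server_id: int) -> dict:
    """Build one server-status event as a plain dict."""
    return {
        "server_id": f"server-{server_id:03d}",
        "status": random.choice(STATUSES),
        "cpu_usage": round(random.uniform(0.0, 100.0), 2),
        "memory_usage": round(random.uniform(0.0, 100.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def event_stream(num_servers: int, count: int):
    """Yield `count` JSON-encoded events, round-robin across servers."""
    for i in range(count):
        yield json.dumps(generate_event(i % num_servers))

if __name__ == "__main__":
    # In the real pipeline each line would be published to a Kafka topic;
    # here we simply print them at a fixed rate.
    for line in event_stream(num_servers=5, count=10):
        print(line)
        time.sleep(0.1)
```

In the full pipeline these JSON lines would be sent to a Kafka topic instead of stdout, from which Spark Structured Streaming consumes them.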

Since the data is large and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

Hence we build a real-time data pipeline using Apache Kafka, Apache Spark, Apache Hadoop, PostgreSQL, Django, and Flexmonster on Docker to generate insights from this data.
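A minimal docker-compose sketch of such a stack might look like the following. Service names, images, versions, and ports here are illustrative assumptions, not the course's exact configuration:

```yaml
# Illustrative sketch only -- images, versions, and ports are assumptions,
# not the exact configuration used in the course.
version: "3"
services:
  zookeeper:
    image: zookeeper:3.6
    ports:
      - "2181:2181"
  kafka:
    image: bitnami/kafka:2.6.0
    environment:
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
  hadoop-spark:
    # Single-node Hadoop + Spark image, built or pulled per the course setup.
    image: my-hadoop-spark:latest
    ports:
      - "8088:8088"   # YARN ResourceManager UI
      - "4040:4040"   # Spark application UI
  postgres:
    image: postgres:12
    environment:
      POSTGRES_DB: events
      POSTGRES_PASSWORD: example
    ports:
      - "5432:5432"
```

The Django/Flexmonster dashboard would run as a further service (or on the host) and read aggregated results from the `postgres` container.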

The Spark project/data pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster, which runs on top of Docker.
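The core of the PySpark side of the pipeline can be sketched as a Structured Streaming job that reads JSON events from Kafka, parses them against a schema, and writes results out. This is a sketch under assumptions: the topic name `server-status`, the broker address, and the event schema are placeholders, and the console sink stands in for the course's PostgreSQL sink. It requires `pyspark` and a running Kafka broker, so it is not runnable standalone:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Assumed event schema; align it with the fields your event simulator emits.
schema = StructType([
    StructField("server_id", StringType()),
    StructField("status", StringType()),
    StructField("cpu_usage", DoubleType()),
    StructField("memory_usage", DoubleType()),
    StructField("timestamp", StringType()),
])

spark = (SparkSession.builder
         .appName("server-status-pipeline")
         .getOrCreate())

# Read raw events from Kafka (broker address and topic are assumptions).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "server-status")
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Console sink as a placeholder; the course writes to PostgreSQL instead,
# which would typically be done via foreachBatch and a JDBC write.
query = (events.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```

The Scala version of the job follows the same structure: a Kafka source, `from_json` with an explicit schema, and a streaming sink.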

Data visualization is built using the Django web framework and Flexmonster.

Curriculum For This Course
24 Lessons 7 Hours
  • Introduction to Apache Spark 00:32:28 Preview
  • Real Time Spark Project Overview - Building End to End Streaming Data Pipeline 00:08:40 Preview
  • Setting up Docker Environment 00:09:55 Preview
  • Create Single Node Kafka Cluster on Docker 00:08:16
  • Create Single Node Apache Hadoop and Spark Cluster on Docker 00:35:07
  • Setting up IntelliJ IDEA Community Edition (IDE) 00:21:01
  • Setting up PyCharm Community Edition (IDE) 00:16:41
  • Setting up Django Web Framework 00:07:09
  • Event Simulator using Python (Server Status Detail) 00:19:16 Preview
  • Building Streaming Data Pipeline using Scala - Spark Structured Streaming 00:30:57
  • Building Streaming Data Pipeline using PySpark - Spark Structured Streaming 00:28:54
  • Setting up PostgreSQL Database (Events Database) 00:04:56
  • Building Dashboard using Django Web Framework and Flexmonster - Visualization 00:22:21
  • Real Time Spark Project Demo 00:14:31
  • Running Real Time Streaming Data Pipeline using Spark Cluster On Docker 00:10:11
  • Introduction to Docker 00:11:38 Preview
  • Install Docker on Ubuntu 18.04 00:09:57
  • Docker Commands - Commonly Used 00:10:33
  • Create First Docker Image and Container 00:09:49
  • Create MySQL Docker Container 00:10:58
  • Cassandra on Docker Container 00:09:04
  • MongoDB on Docker Container 00:08:01
  • Setting up Docker Compose 00:18:35
  • How to create Docker Volume 00:35:25