Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Created By admin Last Updated Mon, 19-Oct-2020 English
  • Course Duration
    7 Hours
  • Mode of Training
    Self-Paced
  • Lessons
    24 Lessons
  • Validity
    Lifetime
What Will I Learn?
  • Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker
  • Setting up Single Node Hadoop and Spark Cluster on Docker
  • Features of Spark Structured Streaming using Spark with Scala
  • Features of Spark Structured Streaming using Spark with Python (PySpark)
  • How to use PostgreSQL with Spark Structured Streaming
  • Basic understanding of Apache Kafka
  • How to build Data Visualisation using Django Web Framework and Flexmonster
  • Fundamentals of Docker and Containerization

Requirements
  • Basic understanding of Programming Language
  • Basic understanding of Apache Hadoop
  • Basic understanding of Apache Spark
Description

In many data centers, servers of different types generate large amounts of data in real time. Each event here is a status report from a server in the data center.

This data needs to be processed in real time to generate insights for the people monitoring the servers and the data center. They track server status continuously and, when issues occur, find a resolution quickly to keep the servers stable.
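To make the idea of server-status events concrete, the generator below sketches a minimal event simulator in Python, in the spirit of the course's "Event Simulator" lesson. The field names (`server_id`, `status`, `cpu_usage`, etc.) are illustrative assumptions, not the course's exact schema:

```python
import json
import random
import time
from datetime import datetime, timezone

# Hypothetical server-status event simulator. Field names and statuses
# are assumptions for illustration, not the course's exact schema.
STATUSES = ["RUNNING", "IDLE", "DOWN", "REBOOTING"]

def generate_event(server_id: int) -> dict:
    """Build one server-status event as a plain dict."""
    return {
        "server_id": f"server-{server_id:03d}",
        "status": random.choice(STATUSES),
        "cpu_usage": round(random.uniform(0.0, 100.0), 2),
        "memory_usage": round(random.uniform(0.0, 100.0), 2),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def event_stream(num_servers: int, count: int):
    """Yield `count` JSON-encoded events, round-robin across servers."""
    for i in range(count):
        yield json.dumps(generate_event(i % num_servers))

if __name__ == "__main__":
    # In the real pipeline each line would be published to a Kafka topic;
    # here we simply print them at a fixed rate.
    for line in event_stream(num_servers=5, count=10):
        print(line)
        time.sleep(0.1)
```

In the full pipeline these JSON lines would be sent to a Kafka topic instead of stdout, from which Spark Structured Streaming consumes them.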

Since the data is large and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

Hence we build a real-time data pipeline using Apache Kafka, Apache Spark, Apache Hadoop, PostgreSQL, Django, and Flexmonster on Docker to generate insights from this data.
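A minimal docker-compose sketch of such a stack might look like the following. Service names, images, versions, and ports here are illustrative assumptions, not the course's exact configuration:

```yaml
# Illustrative sketch only -- images, versions, and ports are assumptions,
# not the exact configuration used in the course.
version: "3"
services:
  zookeeper:
    image: zookeeper:3.6
    ports:
      - "2181:2181"
  kafka:
    image: bitnami/kafka:2.6.0
    environment:
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
  hadoop-spark:
    # Single-node Hadoop + Spark image, built or pulled per the course setup.
    image: my-hadoop-spark:latest
    ports:
      - "8088:8088"   # YARN ResourceManager UI
      - "4040:4040"   # Spark application UI
  postgres:
    image: postgres:12
    environment:
      POSTGRES_DB: events
      POSTGRES_PASSWORD: example
    ports:
      - "5432:5432"
```

The Django/Flexmonster dashboard would run as a further service (or on the host) and read aggregated results from the `postgres` container.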

The Spark project/data pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster, which runs on top of Docker.
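The core of the PySpark side of the pipeline can be sketched as a Structured Streaming job that reads JSON events from Kafka, parses them against a schema, and writes results out. This is a sketch under assumptions: the topic name `server-status`, the broker address, and the event schema are placeholders, and the console sink stands in for the course's PostgreSQL sink. It requires `pyspark` and a running Kafka broker, so it is not runnable standalone:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Assumed event schema; align it with the fields your event simulator emits.
schema = StructType([
    StructField("server_id", StringType()),
    StructField("status", StringType()),
    StructField("cpu_usage", DoubleType()),
    StructField("memory_usage", DoubleType()),
    StructField("timestamp", StringType()),
])

spark = (SparkSession.builder
         .appName("server-status-pipeline")
         .getOrCreate())

# Read raw events from Kafka (broker address and topic are assumptions).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "server-status")
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Console sink as a placeholder; the course writes to PostgreSQL instead,
# which would typically be done via foreachBatch and a JDBC write.
query = (events.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```

The Scala version of the job follows the same structure: a Kafka source, `from_json` with an explicit schema, and a streaming sink.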

Data visualization is built using the Django web framework and Flexmonster.

Curriculum For This Course
24 Lessons 7 Hours
  • Introduction to Apache Spark 00:32:28 Preview
  • Real Time Spark Project Overview - Building End to End Streaming Data Pipeline 00:08:40 Preview
  • Setting up Docker Environment 00:09:55 Preview
  • Create Single Node Kafka Cluster on Docker 00:08:16
  • Create Single Node Apache Hadoop and Spark Cluster on Docker 00:35:07
  • Setting up IntelliJ IDEA Community Edition (IDE) 00:21:01
  • Setting up PyCharm Community Edition (IDE) 00:16:41
  • Setting up Django Web Framework 00:07:09
  • Event Simulator using Python (Server Status Detail) 00:19:16 Preview
  • Building Streaming Data Pipeline using Scala - Spark Structured Streaming 00:30:57
  • Building Streaming Data Pipeline using PySpark - Spark Structured Streaming 00:28:54
  • Setting up PostgreSQL Database (Events Database) 00:04:56
  • Building Dashboard using Django Web Framework and Flexmonster - Visualization 00:22:21
  • Real Time Spark Project Demo 00:14:31
  • Running Real Time Streaming Data Pipeline using Spark Cluster On Docker 00:10:11
  • Introduction to Docker 00:11:38 Preview
  • Install Docker on Ubuntu 18.04 00:09:57
  • Docker Commands - Commonly Used 00:10:33
  • Create First Docker Image and Container 00:09:49
  • Create MySQL Docker Container 00:10:58
  • Cassandra on Docker Container 00:09:04
  • MongoDB on Docker Container 00:08:01
  • Setting up Docker Compose 00:18:35
  • How to create Docker Volume 00:35:25