
Apache Spark 2.0 with Java

(68 Ratings) 4245 Students Enrolled
Created By John Kennady Last Updated Sat, 23-May-2020 English
  • Course Duration
    3 Hours
  • Mode of Training
  • Lessons
    40 Lessons
  • Validity
    Life Time
$8.99 (regular price $99.99, 91% off) 100% Money Back Guarantee
12k+ satisfied learners
What Will I Learn?
  • Understand the Apache Spark architecture at a high level
  • Learn the difference between structured and semi-structured datasets
  • Develop Apache Spark 2.0 applications using RDD transformations and Spark SQL
  • Learn to work with RDDs, Apache Spark's core abstraction, to process and analyze large datasets
  • Learn to scale up Spark applications on a Hadoop YARN cluster through the AWS Elastic MapReduce service
  • Share information across the nodes of an Apache Spark cluster using broadcast variables and accumulators
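The RDD transformations mentioned above can be sketched in Java. This is a minimal, illustrative example (the class name and sample data are made up, and it assumes Spark 2.0 on the classpath with a local master, not a cluster):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddTransformations {

    // Split lines into words, then keep only words containing "spark".
    static List<String> sparkWords(JavaSparkContext sc, List<String> input) {
        JavaRDD<String> lines = sc.parallelize(input);
        return lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) // flatMap: one line -> many words
                .filter(word -> word.contains("spark"))                     // filter: drop non-matching words
                .collect();                                                 // action: bring results to the driver
    }

    public static void main(String[] args) {
        // Run Spark locally with 2 worker threads (no cluster needed).
        SparkConf conf = new SparkConf().setAppName("RddTransformations").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> result = sparkWords(sc, Arrays.asList(
                "spark makes big data simple",
                "java works well with spark"));
        System.out.println(result);
        sc.stop();
    }
}
```

Note that nothing runs until collect() is called: map, flatMap, and filter are lazy transformations that only describe the computation.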

Requirements
  • Basic Java programming knowledge

This course covers the fundamentals of Apache Spark with Java. You will learn to apply these skills to build Big Data processing pipelines and data analytics applications with Apache Spark, working through 10+ hands-on big data examples along the way.

The course lectures cover the following topics:

  • Architectural overview of Apache Spark.
  • Develop Apache Spark 2.0 applications with Java using RDD transformations and Spark SQL.
  • Learn about RDDs, Apache Spark's core abstraction, to process and analyze large datasets.
  • Learn how to scale up Spark applications on a Hadoop YARN cluster through the AWS Elastic MapReduce service.
  • Learn advanced techniques to optimize and tune Apache Spark jobs.
  • Thoroughly analyze structured and semi-structured datasets with Spark SQL.
  • Learn to share information across the nodes of an Apache Spark cluster using broadcast variables and accumulators.
  • Get an overview of the Big Data ecosystem.
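The broadcast variables and accumulators mentioned in the list above might look like this in Java. This is a minimal sketch with made-up data, again assuming Spark 2.0 on the classpath and a local master:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.util.LongAccumulator;

public class SharedVariables {

    // Returns {keptCount, droppedCount}: filters out stop words while
    // counting the drops with an accumulator.
    static long[] filterWithStats(JavaSparkContext sc, List<String> words, Set<String> stop) {
        // Broadcast: ship the read-only lookup set to every executor once,
        // instead of re-sending it with every task.
        Broadcast<Set<String>> stopWords = sc.broadcast(stop);

        // Accumulator: a counter that tasks write to and the driver
        // reads after an action has run.
        LongAccumulator dropped = sc.sc().longAccumulator("dropped words");

        JavaRDD<String> rdd = sc.parallelize(words);
        long kept = rdd.filter(w -> {
            if (stopWords.value().contains(w)) {
                dropped.add(1);
                return false;
            }
            return true;
        }).count();

        return new long[] { kept, dropped.value() };
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SharedVariables").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        long[] stats = filterWithStats(sc,
                Arrays.asList("the", "spark", "of", "cluster"),
                new HashSet<>(Arrays.asList("a", "the", "of")));
        System.out.println(stats[0] + " kept, " + stats[1] + " dropped");
        sc.stop();
    }
}
```

The accumulator value is only reliable once an action such as count() has forced the computation; reading it before that returns a partial or zero value.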

Apache Spark gives you the power to build cutting-edge applications. Its in-memory cluster computing speeds up iterative algorithms and interactive data mining tasks.
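That in-memory model shows up with cache(): an RDD reused by several actions is computed once and then served from memory instead of being recomputed. A minimal sketch, assuming a local Spark 2.0 setup (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CachingDemo {

    // Squares 1..n, caches the result in memory, and reuses it
    // across two actions; returns the maximum square.
    static int cachedMaxSquare(JavaSparkContext sc, int n) {
        List<Integer> nums = new ArrayList<>();
        for (int i = 1; i <= n; i++) nums.add(i);

        // Pretend the map is expensive; cache() keeps its output in memory
        // so the second action reads it instead of recomputing.
        JavaRDD<Integer> squares = sc.parallelize(nums)
                .map(x -> x * x)
                .cache();

        long count = squares.count();       // first action: computes and caches
        return squares.reduce(Math::max);   // second action: served from memory
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CachingDemo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("max square = " + cachedMaxSquare(sc, 1000));
        sc.stop();
    }
}
```

For datasets too large for memory, persist() with a storage level such as MEMORY_AND_DISK offers the same reuse with spill-to-disk behavior.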

Apache Spark has become one of the most popular tools among big data engineers and data scientists.

Curriculum For This Course
40 Lessons 3 Hours
  • Introduction to Spark 00:02:21 Preview
  • Set up Spark project with IntelliJ IDEA 00:07:23
  • Set up Spark project with Eclipse 00:02:04
  • Run our first Spark job 00:02:44
  • RDD Basics 00:02:40 Preview
  • Create RDDs 00:02:26
  • Map and Filter Transformation 00:08:38
  • Solution to Airports by Latitude Problem 00:01:31
  • FlatMap Transformation 00:06:27
  • Set Operation 00:07:37
  • Actions 00:08:06
  • Solution to Sum of Numbers Problem 00:01:44
  • Important Aspects about RDD 00:01:28
  • Summary of RDD Operations 00:02:24
  • Caching and Persistence 00:05:09
  • Spark Architecture 00:02:56 Preview
  • Introduction to Pair RDD 00:01:33 Preview
  • Create Pair RDDs 00:03:54
  • Filter and MapValue Transformations on Pair RDD 00:04:53
  • Reduce By Key Aggregation 00:05:15
  • Sample solution for the Average House problem 00:03:16
  • Group By Key Transformation 00:04:43
  • Sort By Key Transformation 00:02:49
  • Sample Solution for the Sorted Word Count Problem 00:02:01
  • Data Partitioning 00:04:13
  • Join Operations 00:04:56
  • Accumulators 00:05:31
  • Solution to StackOverflow Survey Follow-up Problem 00:01:21
  • Broadcast Variables 00:06:48
  • Introduction to Spark SQL 00:03:49 Preview
  • Spark SQL in Action 00:14:43
  • Spark SQL practice: House Price Problem 00:01:53
  • Spark SQL Joins 00:06:21
  • Strongly Typed Dataset 00:08:32
  • Use Dataset or RDD 00:02:57
  • Dataset and RDD Conversion 00:02:58
  • Performance Tuning of Spark SQL 00:02:44
  • Introduction to Running Spark in a Cluster 00:04:09
  • Package Spark Application and Use spark-submit 00:08:08
  • Run Spark Application on Amazon EMR (Elastic MapReduce) cluster 00:13:32
