
Big Data Analytics with PySpark

(567 Ratings) 1472 Students Enrolled
Created By Ankit Sharma Last Updated 02-Nov-2019 English
  • Course Duration
    2 Hours
  • Mode of Training
    Self-Paced
  • Lessons
    24 Lessons
  • Placement Assistance
    Guaranteed
Free
12k+ satisfied learners
What Will I Learn?
  • Use Python and Spark together to analyze Big Data
  • Learn about Apache Spark and the Spark 2.0 architecture
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Build and interact with Spark DataFrames using Spark SQL

Requirements
  • General programming skills in any language (preferably Python)
  • Ubuntu, Mac OS, or Windows as an operating system
  • A computer with at least 4GB of memory and an internet connection
Description

Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can run up to 100x faster than Hadoop MapReduce for some in-memory workloads, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Who can enroll in this course:

Big data aspirants who have Python experience and would like to learn how to use it for Big Data.

Anyone who is familiar with another programming language and needs to learn a trending technology like Spark.

Curriculum For This Course
24 Lessons 2 Hours
  • Course Overview 00:01:04 Preview
  • Big Data Overview 00:10:18
  • Traditional Data Storage and Processing Software vs Big Data 00:03:32
  • Timeline of Big Data and Hadoop-based Ecosystems 00:04:37
  • What is Apache Spark 00:07:03
  • Spark API Overview 00:03:55
  • Getting started with Databricks for the eager Sparker 00:11:38
  • Different ways of installation 00:03:48
  • Cloud Digital Ocean Setup Installation 1 00:08:01
  • Python3 and Jupyter notebook installation 2 00:06:23
  • Install Java Scala Py4j Spark installation 3 00:07:01
  • Set Path variable and start Jupyter notebook installation 4 00:06:27
  • Introduction 00:02:38 Preview
  • Spark Session 00:03:03
  • Import JSON data into Dataframe 00:04:52
  • Define Custom schemaType 00:04:09
  • Dataframe as SQL table 00:03:48
  • Dataframe Operation - Part 1 00:02:15
  • Dataframe Operation - Part 2 00:08:50
  • Filter data 00:03:03
  • Handling Missing data 00:06:11
  • Dealing with datetime in Dataframe 00:04:41
  • Introduction 00:06:56
  • Streaming example 00:11:52
