Introduction to Big Data, Hadoop & Spark Architecture
Introduction to Course
What is covered and not covered
Data Explosion, Data Sources, Data types
What is Big Data, Benefits & Big Data Problem
Limitations of Traditional Parallel Systems
Solution using Hadoop Framework
Characteristics and Types of Big Data Systems
What is Hadoop, History of Hadoop
Hadoop Architecture, NameNode, JobTracker
HDFS and MapReduce, MapReduce example
Limitations of Hadoop 1.0 and MapReduce
Hadoop 2.0 and YARN Architecture
What is Apache Spark?
Differences between Apache Spark and MapReduce
Spark Stack Architecture and Advantages
Spark History and Releases
Spark for Data science & Data processing tasks
Learning Scala – Functional Programming
Functions, Methods & Procedures
Function Literals / Anonymous Functions
Higher Order Functions – Function as a variable
Higher Order Functions – Passing function as parameter
Higher Order Functions – Returning a function
Higher Order Functions – Closures
Higher Order Functions – Partially Applied functions
Higher Order Functions – Call by Name, Call by Value
Regular expressions and Pattern Matching
Case classes and Pattern Matching
Learning Scala – Basic & Object Oriented Programming
Scala Installation & Scala REPL Interpreter
First Scala Program, Scala Scripts
Scala Basics – Variables, Types, Control Structures, Loops
Scala Basics – Strings & String interpolations
Scala Basics – Functions without Parameters
Scala Basics – Functions with parameters
Scala Basics – Arrays, Lists, Ranges and Tuples
Classes, Objects and Apply method
Constructors and Parameters
Method Declaration, Call by Name
Singleton Objects, Packaging
Inheritance, Extending a class, Overriding
Traits, Case classes
Hands-on Scala Programming Labs
Creating Strings, String equality & splitting
Finding and replacing patterns in strings
Looping with Foreach, Embedded if statements
Using If construct as a Ternary Operator
Using Match expressions and assigning the result to a variable
Using Pattern matching in Match expressions
Using classes, Objects, Methods and Traits
Using Function Literals
Working with Higher Order Functions
Creating Collections
Using map, flatMap and filter on Collections
Hands on Lab – Using Foreach and reduce on Collections
Spark Essentials
Getting started with Spark
Spark Python and Scala Shells
Spark Context
Spark Runtime Architecture – Workers and Cluster Managers
Spark Runtime Architecture – Driver Programs, Executors and Tasks
How a Spark Application works
Data sources for loading data into Spark
Understanding Hadoop Input and Output Formats
Understanding Data Serialization Formats – Avro and Sequence files
Understanding Columnar file formats – RCFile, ORC and Parquet
Advanced Spark Programming
Data Partitioning in Spark
Operations that benefit from partitioning
Operations that affect partitioning
Saving RDDs
Caching RDDs and Persistence
Word Count program using Spark
Spark Program Lifecycle
Spark Variables
Spark Broadcast Variables
Spark Accumulators and Fault Tolerance
Spark Core Programming – Understanding RDDs
Resilient Distributed Datasets (RDD)
Data sources for creating RDDs
Creating RDDs from text, CSV and TSV files
Creating RDDs from JSON files & Sequence files
Creating RDDs from Hadoop InputFormat
Creating RDDs from HDFS and Amazon S3 files
Creating RDDs from NoSQL Databases
RDD Operations – Transformations and Actions
Lazy evaluations
Loading and Saving RDDs
Passing functions to Spark, Spark Closures
Spark Key Value RDDs, Creating Pair RDDs
Pair RDD Transformations – Aggregations, Grouping, Joins & Sorting
Actions on Pair RDDs
Building and Running a Spark Scala program
Spark Scala API, Spark JAR files
Running a Spark program using spark-submit
Running a Spark program on Standalone Cluster
Running a Spark program on YARN
Launching Spark jobs from Java and Scala
Building a Spark application with Eclipse/Scala IDE and Maven, Maven Dependencies
Building a Spark application with Eclipse/Scala IDE and SBT
Building a Spark Fat JAR
Tuning and Debugging Spark for Performance
Configuring Spark with SparkConf
Components of a Spark program – Jobs, Tasks and Stages
Spark Web UI Deep Dive
Spark RDD Lineage
Spark Logs
Serialization and Memory Management to improve performance
Project Tungsten
Hardware Provisioning and Performance Management
Monitoring and Debugging a Spark Application
Spark SQL and Dataframes Programming I
Spark SQL and Hive Interoperability, Spark SQL Performance Advantages
ETL and Data warehousing with Spark SQL
Initializing Spark SQL using SQLContext
Dataframes Introduction, Caching Dataframes
Creating Dataframe from RDD using case class and toDF method to infer schema
Creating Dataframe from RDD using StructType and createDataFrame to specify schema
Creating Dataframes from Scala Collections
Creating Dataframes from text files, CSV and TSV files
Creating Dataframes from JSON files, Parquet files & Hive Tables
Loading & Saving Dataframes
Hands-on Projects using Spark RDDs
IT Networks © 2019-20 | All Rights Reserved