Big Data Hadoop Certification Training

Gain in-depth, advanced knowledge from our most popular online Big Data Hadoop Certification Course. The curriculum is designed to meet global industry standards. Learners will work with the latest Hadoop technologies to take their careers to the next level.

Data Science & Analytics

35 Hours

Description

Simpliv’s Big Data Hadoop Training provides in-depth knowledge of Big Data and Hadoop ecosystem tools such as HDFS, YARN, MapReduce, Hive, and Pig.

Course Objectives

Hadoop basics and Hadoop ecosystem

Managing, monitoring, scheduling and troubleshooting Hadoop clusters effectively

Working with Apache Spark, Scala and Storm for real-time data analytics

Working with Hive, Pig, HDFS, MapReduce, Sqoop, ZooKeeper and Flume

Testing of Hadoop clusters with MRUnit and other automation tools

Successfully integrating various ETL tools with Hive, Pig, and MapReduce

Target Audience

Software Developers

Project Managers

ETL and Data Warehousing Professionals

Software Architects

Data Analysts & Business Intelligence Professionals

DBAs

Mainframe professionals

Data Engineers

Senior IT Professionals

Testing professionals

Graduates interested in the Big Data field

Basic Understanding

There are no specific prerequisites to learn Hadoop. Prior knowledge of Java and SQL is beneficial.

Course Content

Simpliv LLC
39658 Mission Boulevard,
Fremont, CA 94539, USA

Session 1: Understanding Big Data and Hadoop

  1. Introduction to Big Data & Big Data Challenges
  2. Limitations & Solutions of Big Data Architecture
  3. Hadoop & its Features
  4. Hadoop Ecosystem
  5. Hadoop 2.x Core Components
  6. Hadoop Storage: HDFS (Hadoop Distributed File System)
  7. Hadoop Processing: MapReduce Framework
  8. Different Hadoop Distributions

Session 2: Hadoop Architecture and HDFS

  1. Hadoop 2.x Cluster Architecture
  2. Federation and High Availability Architecture
  3. Typical Production Hadoop Cluster
  4. Hadoop Cluster Modes
  5. Common Hadoop Shell Commands
  6. Hadoop 2.x Configuration Files
  7. Single-Node Cluster & Multi-Node Cluster Setup
  8. Basic Hadoop Administration

Session 3: Hadoop MapReduce Framework

  1. Traditional way vs MapReduce way
  2. Why MapReduce
  3. YARN Components
  4. YARN Architecture
  5. YARN MapReduce Application Execution Flow
  6. YARN Workflow
  7. Anatomy of MapReduce Program
  8. Input Splits, Relation between Input Splits and HDFS Blocks
  9. MapReduce: Combiner & Partitioner

Session 4: Advanced Hadoop MapReduce

  1. Counters
  2. Distributed Cache
  3. MRUnit
  4. Reduce Join
  5. Custom Input Format
  6. Sequence Input Format
  7. XML file Parsing using MapReduce

Session 5: Apache Pig

  1. Introduction to Apache Pig
  2. MapReduce vs Pig
  3. Pig Components & Pig Execution
  4. Pig Data Types & Data Models in Pig
  5. Pig Latin Programs
  6. Shell and Utility Commands
  7. Pig UDF & Pig Streaming
  8. Testing Pig Scripts with PigUnit
  9. Aviation use case in Pig

Session 6: Apache Hive

  1. Introduction to Apache Hive
  2. Hive vs Pig
  3. Hive Architecture and Components
  4. Hive Metastore
  5. Comparison with Traditional Database
  6. Hive Data Types and Data Models
  7. Hive Partition
  8. Hive Bucketing
  9. Hive Tables (Managed Tables and External Tables)
  10. Importing Data
  11. Querying Data & Managing Outputs
  12. Hive Script & Hive UDF
  13. Retail use case in Hive

Session 7: Advanced Apache Hive and HBase

  1. HiveQL: Joining Tables, Dynamic Partitioning
  2. Custom MapReduce Scripts
  3. Hive Indexes and Views
  4. Hive Query Optimizers
  5. Hive Thrift Server
  6. Hive UDF
  7. Apache HBase: Introduction to NoSQL Databases and HBase
  8. HBase vs RDBMS
  9. HBase Components
  10. HBase Architecture
  11. HBase Run Modes
  12. HBase Configuration
  13. HBase Cluster Deployment

Session 8: Advanced Apache HBase

  1. HBase Data Model
  2. HBase Shell
  3. HBase Client API
  4. Hive Data Loading Techniques
  5. Apache ZooKeeper Introduction
  6. ZooKeeper Data Model
  7. ZooKeeper Service
  8. HBase Bulk Loading
  9. Getting and Inserting Data
  10. HBase Filters

Session 9: Processing Distributed Data with Apache Spark

  1. What is Spark
  2. Spark Ecosystem
  3. Spark Components
  4. What is Scala
  5. Why Scala
  6. SparkContext
  7. Spark RDD

Session 10: Oozie

  1. Oozie
  2. Oozie Components
  3. Oozie Workflow
  4. Scheduling Jobs with Oozie Scheduler
  5. Demo of Oozie Workflow
  6. Oozie Coordinator
  7. Oozie Commands
  8. Oozie Web Console
  9. Oozie for MapReduce
  10. Combining flow of MapReduce Jobs
  11. Hive in Oozie
  12. Hadoop Talend Integration

Live Support

Call

+510-849-6155

Mail to

support@simplivlearning.com
