Course Contents of Hadoop

• Hadoop

• What is Big Data?

• What is Hadoop?

• Relation between Big Data and Hadoop

• Why is Hadoop needed?

• Scenarios for adopting Hadoop technology in real-time projects

• Challenges with Big Data

 Storage

 Processing

• How Hadoop addresses Big Data challenges

• Comparison With Other Technologies

 RDBMS

 Data Warehouse

 Teradata

• Different Components of the Hadoop Ecosystem

 Storage Components

 Processing Components

HDFS (HADOOP DISTRIBUTED FILE SYSTEM)

• What is a Cluster Environment?

• Cluster Vs Hadoop Cluster

• Significance of HDFS in Hadoop

• Features of HDFS

• Storage of HDFS

 Block
 How to Configure block size
 Default vs. Configurable Block Size
 Why is the HDFS Block Size so large?
 Design Principles of Block Size

• HDFS Architecture - 5 Daemons of Hadoop

 NameNode and its functionality
 DataNode and its functionality
 JobTracker and its functionality
 TaskTracker and its functionality
 Secondary NameNode and its functionality

• Replication in Hadoop - Failover Mechanism

 Data storage in DataNodes
 Failover Mechanism in Hadoop - Replication
 Replication Configuration
 Custom Replication
 Design Constraints with Replication Factor

• Accessing HDFS

 CLI (Command Line Interface) and HDFS Commands
 Java Based Approach

• Hadoop Archives

MAPREDUCE

• Why is Map Reduce essential in Hadoop?

• Processing Daemons of Hadoop

 Job Tracker
 Roles of Job Tracker
 Drawbacks of Job Tracker failure in a Hadoop Cluster
 How to configure Job Tracker in a Hadoop Cluster
 Task Tracker
 Role of Task Tracker
 Drawbacks of Task Tracker failure in a Hadoop Cluster

• Input Split

 Input Split
 Need of Input Split in Map Reduce
 Input Split Size
 Input Split Size Vs Block Size
 Input Split Vs Mappers

• Map Reduce Life Cycle

 Different Phases of Map Reduce Algorithm
 Different Data types in Map Reduce
 Primitive Data Types Vs Map Reduce Data Types
 How to write a basic Map Reduce Program
 Driver code
 Mapper code
 Reducer code
 Driver Code
 Importance of Driver Code in a Map Reduce program
 How to identify the Driver Code in Map Reduce Program
 Different sections of Driver code
 Mapper Code
 Importance of Mapper Phase in Map Reduce
 How to Write a Mapper Class?
 Methods in Mapper Class
 Reducer Code
 Importance of Reducer phase in Map Reduce
 How to Write Reducer Class?
 Methods in Reducer class
 Identity Mapper & Identity Reducer
 Input Formats in Map Reduce
 TextInputFormat
 KeyValueTextInputFormat
 NLineInputFormat
 DBInputFormat
 SequenceFileInputFormat
 How to use the specific input format in Map Reduce
 Output Formats in Map Reduce
 TextOutputFormat
 keyvalue TextOutputFormat
 NLineOutputFormat
 DBOutputFormat
 SequenceFileOutputFormat
 How to use the specific output format in Map Reduce
 Map Reduce API (Application Programming Interface)
 New API
 Deprecated API
 Combiner in Map Reduce
 Is a Combiner mandatory in Map Reduce?
 How to use the Combiner class in Map Reduce
 Performance tradeoffs with respect to the Combiner
 Partitioner in Map Reduce
 Importance of Partitioner class in Map Reduce
 How to use the Partitioner class in Map Reduce
 HashPartitioner functionality
 How to Write a custom Partitioner
 Compression Techniques in Map Reduce
 Importance of Compression class in Map Reduce
 What is a Codec?
 Compression Types
 GzipCodec
 BZip2Codec
 LzoCodec
 SnappyCodec
 Configurations for Compression Techniques
 How to customize Compression for one job Vs all jobs
 Joins in Map Reduce
 Map Side Join
 Reduce Side Join
 Performance Trade Off
 Distributed Cache
 How to debug MapReduce Jobs and Pseudo Cluster mode
 Introduction to MapReduce Streaming
 Data locality in Map Reduce

• Secondary Sorting Using Map Reduce

Apache PIG

• Introduction to Apache Pig

• Map Reduce Vs Apache Pig

• SQL Vs Apache Pig

• Different data types in Pig

• Modes of Execution in Pig

 Local Mode
 Map Reduce OR Distributed Mode

• Execution Mechanism

 Grunt Shell
 Script
 Embedded

• Transformations in Pig

• How to write a simple Pig Script

• How to develop a complex Pig Script

• Bags, Tuples and Fields in Pig

• UDFs in Pig

 Need for UDFs in Pig
 How to use UDFs
 REGISTER keyword in Pig

• When to use Map Reduce & Apache Pig in real projects

HIVE

• Hive Introduction

• Need of Apache Hive in Hadoop

• Hive Architecture

 Driver
 Compiler
 Executor (Semantic Analyzer)

• Meta Store in Hive

 Importance Of Hive Meta Store
 Embedded meta store configuration
 External meta store configuration
 Communication Mechanism with Metastore

• Hive Integration with Hadoop

• Hive Query Language (HiveQL)

• Configuring Hive with MySQL Metastore

• SQL Vs Hive QL

• Data Slicing Mechanisms

 Partitions In Hive
 Buckets In Hive
 Partitions Vs Bucketing
 Real time Use Cases

• Collection data Types in Hive

 Array
 Struct
 Map

• User Defined Functions (UDFs) in Hive

 UDFs
 UDAFs
 UDTFs
 Need of UDFs in Hive

• Hive Serializer/Deserializer - SerDe

• Hive - HBase Integration

SQOOP

• Introduction to Sqoop

• MySQL Client and Server Installation

• How to connect to a relational database

• Different Sqoop Commands

 Different flavours of Imports
 Export
 Hive-Imports

HBase

• HBase Introduction

• HDFS Vs HBase

• HBase use cases

• HBase basics

 Column families
 Scans

• HBase Architecture

• Clients

 REST
 Thrift
 Java based
 Avro

• Map Reduce Integration

• Map Reduce over HBase

 Schema Definition
 Basic CRUD Operations

Oozie

• Oozie Introduction

• Oozie Architecture

• Oozie Configuration Files

• Oozie Job Submission

 Workflow.xml

 Coordinator.xml

 job.coordinator.properties
