
Big Data Basics – Part 7 – Hadoop Distributions and Resources to Get Started

In the previous post, the 6th part of the Big Data Basics series, we looked at a few related projects, some of the more popular ones in the vast Hadoop Ecosystem. In this post, we will take a look at the Hadoop distributions available in the market, the highlights of each distribution, and a few pointers on how to get started with each of them.

In this post, we will discuss the following aspects and distributions in brief:

  • What are the Hadoop distributions?
  • How are the Hadoop distributions different from one another?
  • Which distribution should I choose?
  • List of Hadoop Distributions
    • Azure HDInsight
    • Cloudera
    • Hortonworks
    • Amazon Elastic MapReduce (EMR)
    • MapR
  • Links/References/Pointers on Getting Started

To continue reading, catch the full article here: Big Data Basics – Part 7 – Hadoop Distributions and Resources to Get Started.

 


Big Data Basics – Part 6 – Related Apache Projects in Hadoop Ecosystem

In the previous post, the 5th part of the Big Data Basics Series, we saw an Introduction to MapReduce. In this post, we will take a look at the Hadoop Ecosystem and Related Apache Projects in the Hadoop Ecosystem.

Apart from HDFS and MapReduce, various other components and frameworks are part of the Hadoop Ecosystem. In this post, we will cover the following:

  • Overview of Hadoop Ecosystem
  • Overview of Related Apache Projects
    • Apache Pig
    • Apache Hive
    • Apache Mahout
    • Apache HBase
    • Apache Sqoop
    • Apache Oozie
    • Apache ZooKeeper
    • Apache Ambari

To continue reading, catch the full article here: Big Data Basics – Part 6 – Related Apache Projects in Hadoop Ecosystem.

 


Big Data Basics – Part 5 – Introduction to MapReduce

In the previous post, the 4th part of the Big Data Basics Series, we saw an Introduction to the Hadoop Distributed File System (HDFS). In this post, we will take a look at the other core component of the Hadoop Ecosystem, MapReduce.

MapReduce is the core component of the Hadoop Ecosystem that is responsible for computation. In this post, we will cover the following aspects of MapReduce:

  • A Few Key Concepts of MapReduce
  • MapReduce Logical Data Flow
  • MapReduce Word Count Example Flow
  • Highlights of MapReduce
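
As a rough illustration of the word count flow listed above, here is a minimal single-machine sketch in Python. It only mimics the map, shuffle, and reduce phases in memory and does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this example, and the full article's walkthrough may differ.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted counts by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in order over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data basics", "big data with hadoop"]))
```

In a real Hadoop job, the map and reduce steps would run in parallel across the nodes of a cluster and the framework would handle the shuffle; here everything runs in a single process purely to show the logical data flow.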

To continue reading, catch the full article here: Big Data Basics – Part 5 – Introduction to MapReduce.

 
