Hadoop is an open-source software framework designed for storing and processing large volumes and varieties of data on clusters of commodity hardware. The Apache Hadoop software library enables distributed processing of that data across clusters of computers using a simple programming model called MapReduce. It is designed to scale up from a single server to clusters of many machines, each offering local computation and storage efficiently.
Hadoop solutions normally involve clusters that are hard to manage and maintain, and in many scenarios they require integration with other tools such as MySQL and Mahout. Hadoop works as a series of MapReduce jobs, each of which is high-latency and depends on the one before it: no job can start until the previous job has finished successfully, as the driver sketch below illustrates.
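A minimal sketch of such a driver, assuming the standard Hadoop MapReduce Java API: the identity `Mapper` and `Reducer` base classes stand in for real job logic, and the intermediate path `/tmp/intermediate` is an arbitrary scratch location chosen for this illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path intermediate = new Path("/tmp/intermediate"); // arbitrary scratch path

    // First job: reads the raw input and writes intermediate output.
    Job first = Job.getInstance(conf, "first job");
    first.setJarByClass(ChainedJobs.class);
    first.setMapperClass(Mapper.class);   // identity mapper as a stand-in
    first.setReducerClass(Reducer.class); // identity reducer as a stand-in
    first.setOutputKeyClass(LongWritable.class);
    first.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(first, new Path(args[0]));
    FileOutputFormat.setOutputPath(first, intermediate);

    // waitForCompletion(true) blocks until the job finishes, so the
    // second job is submitted only after the first succeeds.
    if (!first.waitForCompletion(true)) {
      System.exit(1);
    }

    // Second job: consumes the intermediate output of the first.
    Job second = Job.getInstance(conf, "second job");
    second.setJarByClass(ChainedJobs.class);
    second.setMapperClass(Mapper.class);
    second.setReducerClass(Reducer.class);
    second.setOutputKeyClass(LongWritable.class);
    second.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(second, intermediate);
    FileOutputFormat.setOutputPath(second, new Path(args[1]));

    System.exit(second.waitForCompletion(true) ? 0 : 1);
  }
}
```

The blocking call to `waitForCompletion` is exactly what makes chained pipelines high-latency: each stage must write its full output to HDFS before the next stage can begin.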
MapReduce:
MapReduce is Hadoop's programming model for processing data stored in HDFS. Apache Hadoop can run MapReduce programs written in several languages, including Java, Ruby, and Python, and those programs execute efficiently in parallel across the cluster. A MapReduce job works in the following phases (a complete example follows the list):
1. Map phase
2. Reduce phase
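To make the two phases concrete, here is the classic WordCount example in Java, adapted from the standard Apache Hadoop tutorial: the map phase emits a (word, 1) pair for every word in its input split, and the reduce phase sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this is typically launched with `hadoop jar wordcount.jar WordCount <input path> <output path>`, where the input and output paths refer to HDFS directories.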