Apache Hadoop is a framework written in Java for running applications on large clusters built of commodity hardware.
The Hadoop framework transparently provides applications with both reliability and data motion.
Hadoop implements a computational paradigm named MapReduce, where
the application is divided into many small fragments of work, each of which can
execute or re-execute on any node in the cluster. In addition, it provides a
distributed file system (HDFS) that stores data on the compute nodes, delivering very
high aggregate bandwidth across the cluster.
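To make the "small fragments of work" idea concrete, below is a minimal sketch of the classic word-count job written against Hadoop's org.apache.hadoop.mapreduce API: the mapper turns each split of the input into (word, 1) pairs, and the reducer sums the counts for each word, possibly on a different node. The class name and the input/output paths are illustrative only, not part of any particular project.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each input split is processed independently on some node,
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: all counts for the same word are shuffled to one reducer
  // and summed; a failed task can simply be re-executed elsewhere.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The input and output paths typically point into HDFS, so the framework can schedule each map task on a node that already holds a replica of its split, which is where the high aggregate bandwidth mentioned above comes from.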
All the
modules in Hadoop are designed with the fundamental assumption that hardware
failures (of individual machines or racks of machines) are common and thus
should be automatically handled in software by the framework. Apache Hadoop's
MapReduce and HDFS components were originally derived from Google's MapReduce and Google
File System (GFS) papers, respectively.
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.