Hadoop has become the industry standard for large-scale batch processing, especially for big data applications that require substantial machine learning over large collections. MapReduce sits at the center of much of this processing, while the platform's sub-modules handle specialized tasks within a pipeline or workflow. The core concerns of Hadoop are the storage of files and the processing of those files, and each processing stage is loosely abstracted into sub-projects that address particular needs. There are also various open source alternatives that can be used instead of Hadoop, or in conjunction with it. This is especially true for real-time workloads, where Hadoop often does not fit well, or where requirements go beyond what HDFS can meet. Conversely, some workloads are simply not demanding enough to justify Hadoop, in which case it adds more complexity than it is worth.
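To make the storage-plus-processing split concrete, here is a minimal sketch of the canonical word-count MapReduce job in Java: the mapper reads lines from files stored in HDFS and emits (word, 1) pairs, and the reducer sums the counts for each word. The class name and the input/output paths taken from the command line are illustrative, not prescribed by any particular deployment.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: tokenize each input line and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The division of labor in this sketch mirrors the point above: HDFS handles the storage of the input and output files, while the MapReduce framework schedules the map and reduce tasks across the cluster.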