IJCSE Abstract

Consider a conventional in-memory data structure such as an array, whose size is declared either statically or dynamically. This is not a general solution for large text files, because it requires very large memory allocations, and processing becomes increasingly time-consuming as the data size grows. Existing structures such as lists and heaps process large text files effectively only up to a limit set by the available RAM. Data volumes beyond that limit cannot be handled on a single node; the data must be stored on disk and spread across a cluster. Hadoop addresses these big data problems with the MapReduce technique, in which processing is done in parallel. MapReduce is a functional programming model with two functions, map and reduce, that performs distributed parallel processing. To make retrieval faster, we introduce combiners between the mapper and the reducer. A combiner function is applied to the mapper's output as it is generated. The combined data is passed to the shuffle and sort phase, and from there to the reduce function, which produces the final output. Retrieving the data takes longer when MapReduce runs without combiners than when it runs with combiners. We use computation time and data transfer time to support this claim. This paper presents an effective approach for processing big data using combiners, which are also known as map-side reducers or mini reducers.
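The data flow described above (map, combine, shuffle and sort, reduce) can be illustrated with a small single-process simulation. This is a minimal sketch, not Hadoop code and not the implementation evaluated in the paper: the word-count task, the function names (`map_phase`, `combine`, `shuffle_and_sort`, `reduce_phase`, `run_job`) and the sample input `SPLITS` are all assumptions chosen for illustration. It counts how many key-value pairs reach the shuffle phase with and without a combiner, which stands in for the data transfer cost.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner (map-side mini reducer): sum the counts per word locally."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle_and_sort(pairs):
    """Group all values by key and order the keys, as shuffle and sort would."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return sorted(groups.items())


def reduce_phase(groups):
    """Reducer: sum the grouped counts to get the final total per word."""
    return {word: sum(counts) for word, counts in groups}


def run_job(splits, use_combiner):
    """Run the job; return (result, number of pairs sent to shuffle)."""
    shuffled = []
    for split in splits:  # each split stands in for one mapper's input
        mapper_output = [pair for line in split for pair in map_phase(line)]
        if use_combiner:
            mapper_output = combine(mapper_output)
        shuffled.extend(mapper_output)
    return reduce_phase(shuffle_and_sort(shuffled)), len(shuffled)


# Hypothetical input: two splits, each handled by its own mapper.
SPLITS = [
    ["big data big cluster", "data data node"],
    ["big node", "cluster cluster cluster"],
]

if __name__ == "__main__":
    plain, sent_plain = run_job(SPLITS, use_combiner=False)
    combined, sent_combined = run_job(SPLITS, use_combiner=True)
    # Same final counts either way; fewer pairs shuffled with the combiner.
    print(plain == combined, sent_plain, sent_combined)
```

On this sample input both runs produce the same word counts, but the run without a combiner sends 12 pairs to the shuffle phase while the run with a combiner sends 7. The combiner is safe here because summation is associative and commutative, so partial sums per mapper do not change the final totals.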

INDIAN JOURNAL OF COMPUTER SCIENCE AND ENGINEERING
