In the last post we saw how to load and query data using Hive; this time we are going to use the .NET SDK for Hadoop for the same purpose.
The SDK lets us write MapReduce jobs that query data in a distributed file system environment that can be composed of hundreds or thousands of nodes. MapReduce is a programming model for processing large data sets, typically used for distributed computing on clusters of computers.
In a typical MapReduce program we find two classes: the mapper and the reducer.
The mapper: this is the data collection phase. The mapper breaks up large pieces of work into smaller ones and then takes action on each piece.
The reducer: this is the processing phase. It combines the many results from the map step into a single output.
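To make the two phases concrete, here is a sketch of a classic word-count job written against the .NET SDK for Hadoop. The base classes (`MapperBase`, `ReducerCombinerBase`), the `EmitKeyValue` method, and the job-submission calls follow the Microsoft.Hadoop.MapReduce package as I understand it; treat the exact names and the input/output paths as assumptions to verify against the SDK version you install from NuGet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

// Mapper: the data collection phase. Each input line is split into
// words, and a (word, "1") pair is emitted for every word found.
public class WordCountMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        var words = inputLine.Split(new[] { ' ' },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            context.EmitKeyValue(word.ToLowerInvariant(), "1");
        }
    }
}

// Reducer: the processing phase. All counts emitted for the same word
// arrive together, and are summed into a single output value per word.
public class WordCountReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> values,
                                ReducerCombinerContext context)
    {
        context.EmitKeyValue(key, values.Sum(v => long.Parse(v)).ToString());
    }
}
```

Submitting the job would then look roughly like the following (the paths are placeholders for your own HDFS locations):

```csharp
var hadoop = Hadoop.Connect();
hadoop.MapReduceJob.Execute<WordCountMapper, WordCountReducer>(
    new HadoopJobConfiguration
    {
        InputPath = "/demo/input",
        OutputFolder = "/demo/output"
    });
```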