In the last post we talked about what big data is and how HDInsight brings Big Data capabilities to the Microsoft and Windows world. This time we are going to build an example of how to load data from another system into our Hadoop node using Hive. Hive lets us run MapReduce jobs using a SQL-like scripting language called HiveQL.

For this example I created a file called Products.csv from the Product table of the AdventureWorksDW2012 database. It looks like this:

Product File

The first thing we will do is create a table using Hive. For this we need to open the Hadoop command line; there is a shortcut to it on the desktop.

Now we need to type the following:

HIVE example

1. Type hive to begin using hive commands.


2. We create a table called Product with the columns ProductKey, ProductName, Color, SafetyStockLevel and Status.

>CREATE TABLE Product(ProductKey string, ProductName string, Color string, SafetyStockLevel string, Status string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

3. Load the data into our table


4. Run a query and look at the results.

 >SELECT Color AS sev, COUNT(*) AS cnt FROM Product GROUP BY Color;

 Hive result
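
Step 3 above does not show the command itself. A minimal sketch of how the load could look in HiveQL follows; the path `C:/data/Products.csv` is only a placeholder assumed for illustration, so replace it with wherever the file actually lives on your node.

```sql
-- Assumption: Products.csv sits on the local disk of the node at this placeholder path.
-- LOCAL tells Hive to read from the local file system rather than from HDFS.
LOAD DATA LOCAL INPATH 'C:/data/Products.csv' INTO TABLE Product;

-- Quick sanity check: show a few rows to confirm the columns were split on the commas.
SELECT * FROM Product LIMIT 5;
```

Note that without the OVERWRITE keyword, running LOAD DATA a second time adds the file again, so the counts returned by the query in step 4 would double.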

We can also see the table and the data we loaded by opening the Hadoop NameNode status page and browsing the file system to the hive directory.

Browse the filesystem

table product in hive

table product in hive 2
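
To find where Hive stored the table without clicking through the web page, the Hive prompt can report it as well. This is only a sketch: the statements are standard Hive command-line commands, but the path in the last line is an assumption and should be replaced with the Location value that DESCRIBE FORMATTED prints on your cluster.

```sql
-- Print the table metadata; the "Location:" row is the table's directory in HDFS.
DESCRIBE FORMATTED Product;

-- Print the configured root directory of the Hive warehouse.
SET hive.metastore.warehouse.dir;

-- List the table's files from inside the Hive prompt.
-- Assumption: /hive/warehouse/product is a placeholder; use the Location shown above.
dfs -ls /hive/warehouse/product;
```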