In the last post we talked about what big data is and how HDInsight brings the capabilities of Big Data to the Microsoft and Windows world. This time we are going to walk through an example of loading data from another system into our Hadoop node using Hive. Hive allows us to run MapReduce jobs using a SQL-like scripting language called HiveQL.
For this example I created a file called Products.csv. This file was exported from the Product table of the AdventureWorksDW2012 database and looks like this.
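If you don't have that export handy, a few rows in the same five-column shape can be generated with a short script. The product names and values below are illustrative placeholders, not actual AdventureWorksDW2012 rows:

```python
import csv

# Hypothetical sample rows in the shape
# ProductKey, ProductName, Color, SafetyStockLevel, Status
rows = [
    ("210", "HL Road Frame - Red", "Red", "500", "Current"),
    ("211", "HL Road Frame - Black", "Black", "500", "Current"),
    ("212", "Sport-100 Helmet", "Blue", "4", "Current"),
]

with open("Products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Default dialect writes comma-delimited fields, matching
    # the FIELDS TERMINATED BY ',' clause used later in Hive
    writer.writerows(rows)
```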
Now the first thing we will do is create a table using Hive. For this we need to open the Hadoop command line; a shortcut can be found on the desktop.
Then we type the following:
1. Type hive to start the Hive shell.
2. Create a table called Product with the columns ProductKey, ProductName, Color, SafetyStockLevel and Status.
>CREATE TABLE Product(ProductKey string, ProductName string, Color string, SafetyStockLevel string, Status string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
3. Load the data into our table:
>LOAD DATA LOCAL INPATH 'c:\BigData\Products.csv' OVERWRITE INTO TABLE Product;
4. Run a query and view the results:
>SELECT Color, COUNT(*) AS cnt FROM Product GROUP BY Color;
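The steps above can also be collected into a single HiveQL script. This is a sketch of the same commands; the DROP TABLE guard is an addition of mine so the script can be re-run safely:

```sql
-- Drop any previous copy so the script is re-runnable
DROP TABLE IF EXISTS Product;

-- Step 2: create the table for the CSV columns
CREATE TABLE Product (
    ProductKey string,
    ProductName string,
    Color string,
    SafetyStockLevel string,
    Status string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Step 3: load the local file into the table
LOAD DATA LOCAL INPATH 'c:\BigData\Products.csv' OVERWRITE INTO TABLE Product;

-- Step 4: count products per color
SELECT Color, COUNT(*) AS cnt FROM Product GROUP BY Color;
```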
We can also see the table and the data we loaded from the Hadoop NameNode status page, by browsing the file system down to the hive directory.
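From the Hadoop command line, the same files can be listed directly with the file system shell. The warehouse path below assumes Hive's default layout on this cluster (Hive lowercases table names for its directories); adjust it if your warehouse directory is configured differently:

```shell
# List the files Hive stored for the Product table
hadoop fs -ls /hive/warehouse/product

# Print the contents of the loaded CSV
hadoop fs -cat /hive/warehouse/product/Products.csv
```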