Loading and querying data using Hive with HDInsight

Etiquetas

,

The-hive-l

In the last post we talk about what big data is and how HDInsight bring to the Microsoft and Windows world all the capacities of Big Data. This time we are going to create an example of how we can load data from other system to our Hadoop Node using Hive. Hive allow us to run MapReduce job using a SQL-like scripting language, called HiveQL.

For this example I create a File called Products.csv, this file was created from the Product table of the AdventureWorksDW2012 Database and look like this.

Product File

Now the first thing we will do is create a table using Hive, for this will need to open our Hadoop command line, we can find a direct access in our desktop.

Now we need to type the next:

HIVE example

1. Type hive to begin using hive commands.

>hive

2.  We create a table called Product with the column ProductKey,ProductName, Color, SafetyStockLevel and Status.

>CREATE TABLE Product(ProductKey string,ProductName string, Color string, SafetyStockLevel string, Status string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’;

3. Load the data into our table

>LOAD DATA LOCAL INPATH “c:\BigData\Products.csv” OVERWRITE INTO TABLE Product;

4. Make a query and see the results.

 >SELECT Color AS sev, COUNT(*) AS cnt FROM Product GROUP BY Color;

 Hive result

 Also we can see the table and the data that we load using the Hadoop namenode status and browsing the file system through the hive directory.

Brose the filesystem

table product in hive

table product in hive 2

Anuncios

Introduction to HDInsight and the Big Data explosion

Etiquetas

Big data

When we talk about Big Data we are talking about solution that need to analyze a huge volume of data, so big data is not the solution for every scenario but is the solution for the next scenarios:

  • Scenario that need to offer analytical capacities over petabytes or zeta bytes of information
  • Scenario where a lot of information is generated every second
  • Scenarios where data is generated in huge volume (thousands of MB or GB)

We can say that what every big data solution have in common is that all of them need to deal with load, processing and analysis enormous amount of data.

We can find several examples of big data utilization in social network applications like Facebook, twitter or foursquare where millions of photo are upload every minute, millions of comments, shares, twits, and check in are generated every second.

Keep reading

Using Power View to consume data from a SSAS Tabular Model

Etiquetas

,

Power View

How we can show in our last post one of the advantage of using a tabular model is the possibility of create reports using Power View.

Power View is a reporting services add-in for SharePoint and allows us to create reports and explore data in a rich interactive way without leaving SharePoint. How I said before Power view is reporting services add-in so will need to install Reporting Services integrated with SharePoint and configured properly (You can see how to do it in an early Post).

From a library that use the reporting services content types (In this case the Report Data Source content type), go to New Document and select Report Data Souce.

You will need to configure the connection string and credential information.

45

After created go to the context menu of the new connection file and choose Create Power View Report.

46

And that’s all, we now can see our designer interface and begin to create our reports.

47

48

Comparison between Analysis Services multidimensional model and tabular model

Etiquetas

,

Tabular Model

Without going to deeply we can say that the Tabular Model bring some advantage over the multidimensional model, this advantage are:

-We don’t need to have extensive knowledge about business intelligence to create a tabular model, to create a tabular model we only need to know about relational data base modeling to create a proper data model for our solution. Of course this depend of how far we need to go with our tabular model, the solution will more complex and will required more skill if you need to crate KPI, DAX calculation, etc. We can say that the skill required will be half way between power pivot skills and multidimensional model skills requirements.

-If you want to change the data model is too much easy and quickly implements change in a tabular model than in a multidimensional model. For a multidimensional model you probably need to redefine dimension, measures, relationships, partitions etc.

 -The tabular model allows you to use PowerView as tool for creating reports using Excel o SharePoint interface. PowerView use DAX query to create reports, Tabular Model and PowerPivot use support DAX.

-Tabular model use a In-Memory architecture that improve query performance in some scenarios.

In the other hand we have multidimensional model that have some advantage to:

–          The multidimensional model creates data aggregation in the multidimensional data base making the user query too much faster for complex cubes.

–          We can use different kind of data architecture MOLAP (store aggregated data measures, and calculation for faster query’s), ROLAP (online query )or HOLAP (Hybrid approach).

–          Other features that are supported out of the box for multidimensional model that are not supported for tabular model.

  • WriteBack
  • Translation
  • Parent-child Hierarchies
  • Many-to-many relationships
  • Custom Assemblies
  • Aggregations
  • Actions

 For more information go to Comparing Tabular and Multidimensional Solutions (SSAS)

SQL Server Analysis Services and the new BI Semantic Model

Etiquetas

,

bi semantic model

SQL Server 2012 arrives a new model for create analysis services solutions, this new model is called tabular model and is part of the new BI semantic model composed also for the traditional multidimensional model.

In this opportunity we are going to talk about the new Tabular model. We can say that the new tabular model is an evolution of power pivot; it has the same orientation about flexibility and in-memory access of the data, and use the same query language DAX.

For create a Tabular model solution we need to use the SQL Server Data Tools (before business intelligence development studio) and create a data model in the same way we did it using power pivot, once create a data model we can deploy this model to an instance of SQL Server Analysis Services so the model can be used as a data source for create reports using tools like Reporting Services, Excel and PowerView.

To be able to deploy a tabular model to SQL Server Analysis Services you need to create an instance of SQL Server Analysis Services using Tabular Model, to achieve this you only need to select Tabular Model when you are installing the instance of SQL Server Analysis Services.

Keep reading

BI Tools: Installing and configuring Reporting Services 2012 integrated with SharePoint 2013

Etiquetas

, , , ,

Reporting Services

This time we are going to talk about how to install SQL Server 2012 Reporting Services integrated with SharePoint Server 2013.

The first thing we need to know is that with the new release of SharePoint 2013 Reporting Services became an application services so all the configuration related with Reporting Service will be done in the SharePoint central administration, that change the installation a little bit. Now let’s see how to make the installation.

First thing we need to launch the SQL Server 2012 installation Wizard (I’m assuming you already have a SharePoint Server 2013 installed and configured) and choose New SQL Server stand-alone installation.
Keep reading

Windows Azure as an application platform for the public cloud

Etiquetas

, , , , ,

windows azure components

In this opportunity we are going to talk about some of this component:

Web Sites: Web sites in windows azure allow you to create a web site using visual studio the same way that you use to do it on an on-premise scenario and publish it on the cloud. When you create a web site you can use either create a ASP.NET web form or ASP.NET MVC, this way you can just take care of your application and not about the servers administration, on other hand you count with the possibility of use several instance running your application having fault tolerance and a better performance.

Cloud services: the cloud services are composed for two types of roles, Worker Roles and Web Roles, this approach offer a solution with most possibilities for escalation and a better utilization of the resources because you can create as many instances as you can for any type of role so you can create different numbers of instance for Web Roles (run on IIS) and Worker Roles.

The Web Roles is the role running on IIS and serve the user request, while the Worker roles we can compared with Windows Services that are running and listening for petitions or executing a recurrent process. This way a web role instance might accept requests from users and then pass them to a worker role instance for processing.

Keep reading

Considerations before install Windows Server 2012

Etiquetas

, ,

Windows Server 2012

  1. Which version to install

The first thing to decide is which version of windows server 2012 to install, in this case we must to take in consideration the number of physical and virtual instance of Windows Server that we are allowed to install, the type of instances that it’s possible  to install are POSE(physical operating system environment) and VOSE (virtual operating system environment). The POSE instance refers to the installation on the physical server and the VOSE instance refer to the installation on virtual machines. In the next table we can see the limitations for each version:

Edition POSE Instances VOSE Instances
Data Center 1 Unlimited
Standard 1 2
Foundation 1 0
Essentials 1 (POSE or VOSE) 1 (POSE or VOSE

2. Minimum requirements

The Minimum requirements for the installation are very low considering the resource levels of the servers that we can find today. To install Windows Server 2012 we need:

  • 1.4 GHz 64-bit processor
  • 512 MB RAM
  • 32 GB available disk spaces
  • DVD Drive
  • Super VGA (800×600) or higher resolution
  • Keyboard and mouse
  • Internet access

But there are also a Maximum configuration supported, we should not exceed the next configuration when we install windows server 2012:

Windows Server 2012 Windows Server 2008 R2
Logical processor  640 256
RAM 4 terabytes 2 terabytes
Failover Cluster nodes 63 16

Keep reading

Scale in the cloud

Etiquetas

, , , ,

cloud computing

In High Scalability we cannot lose the opportunity to talk about the cloud, especially because today we cannot talk about scale without talk about cloud computing, that’s why today we are going to talk about Windows Azure.

For those that still don’t know what Windows Azure is, we can say that is a group of services offered by Microsoft that present a solution for different scenarios, this services conform the Windows Azure platform enabling the creation of applications, data base, web services, virtual machines, and different storage type in the cloud bringing with this a lot of advantages, normally this scenarios are:

Keep reading

Gamification for SharePoint Sites

Etiquetas

,

gamification

This year there was much talk about gamification, for those who don’t know what gamification is about, we can say that the gamification concept consist on bring video games aspects to real life applications like web sites, intranets and other applications where users contributions is a key, those aspect are those one that produce that hook  the users to our applications keeping them motivated and making them more productive, more collaborative and spending more time participating in our applications.

Today gamification is widely used in marketing campains and in almost everything where you can see challenging, recognition and competition.

In the case of SharePoint sites in almost every case the participation of the users is fundamental to our applications success, for that reason we can use the principal of gamifications to achieve that users spend more time in our web site, upload more documents, create more quality content, share ideas, tag articles, and get more involve in business process.

So let’s see how gamification can be applied to our SharePoint applications following the next gamifications principles:

 1. Status/reputation: this is normally get it using a leader board, the fact that the user compare itself with other create a motivation base on social encourage.

For your SharePoint site you can create leader board showing the users that upload more documents, create more articles, etc.

leaderboard

2. Surprising and discovery: this video game concepts play with the idea that the user always want know more about the game its playing and what will coming next.

 Try to surprise your user given them access to new zones in your site when they achieve some goals in a business process.

3. Rewards: they key here is to give something to user that make them feel positive about their achievements, this may be virtual gif or real stuff.

4. Loss Aversion: this in difference with rewards means something that you don’t want to lose at unless you keep playing.

5. Power: this principle is involve with what the user can and cannot do in the application, this may mean access to page no one can see, or even power over other users.

6. Feedback: the importance of this is that the users always need to be informed about the result of their actions, their progression, their performance, status, etc.

How we can see gamification is designed to use psychology principles to influence in user behavior.

Right now there are several companies that offer gamificatios products that you can integrate in SharePoint applications some of them are:

uboost

uboost

bigdoor

bigdoor

badgeville

badgeville