DATTA's blog - All about "DATA": December 2012

Saturday, December 22, 2012

SSAS - Designing approaches for dimensional model

In SSAS, there are two ways to design dimensional models. Top-down or bottom up.

Top Down Approach

This is generally not much used option.
The relational data source does not exist in this case. The tables are created from the dimensional model from SSAS.
Cube and dimension wizards has the option to implement this option.
Schema Generation Wizard is used to generate and deploy schema in the database..
DBO permission required in order to create database objects.

Bottom Up Approach

Most of the times this is the approach thats taken.
The data mart already exists.
The dimensional model and cubes are designed from the Relational tables in the Data Mart.
Cube and Dimension wizards have the option to implement this solution.

Monday, December 10, 2012

SQL Server - Cost estimation issues even with up-to-date statistics

The only way to know if statistics are update-to-date or outdated is look at Actual Records count and Estimated Records count in an execution plan. A mismatch indicates that statistics need to be updated. But there are exceptions to this rule. There are situations when even if statistics is up-to-date still there is a difference in estimated and actual count.

First let's check the result with fixed parameter query. In my example I have bloated the data in AdventureWorksDW2008R2.FactInternetSales by duplicating the data multiple times (of couse, ensuring that the Primary key is unique).

T-SQL with fixed query (OK)

T-SQL with local variables (Issue)

Stored procedure with local variables (Issue)

Here are the scenarios when the same query with different implementation style would give a matching estimated and actuals count.

Possible Solutions...

Here are the possible solutions to correct the issue.

Stored procedure without local variables (OK)

Dynamic sql with sp_executesql (OK)

Dynamic sql with EXEC (OK)

Saturday, December 1, 2012

Big Data - 1: A big picture

Everyday we get bombarded with terminologies in connection with big data like NoSQL, RavenDB, Hadoop, MongoDB and so on. Since this is an upcoming area in IT, we struggle to understand what those terms are and how these technologies can help us. I am not an expert in Big data, but I would share what I have known so far.

The diagram above gives a high level view of various terminologies in the big data arena and their relationship with each other.

Big Data :

Big data is the data that is big in terms of

Volume (terabytes, petabytes and above)
Velocity (rate of transmission and reception)
Variety (complexity, text, pictures, videos etc).

Big data by itself is messy and would not make any sense as it is unstructured or semi-structured unlike traditional data. The only way it can be useful is pull out only the relevant information. Pulling out relevant information out of such big data is the real problem that traditional sytems cannot help in. This is where big data technologies come into picture

There are three technology classes that support Big Data. Hadoop, NoSQL and Massively Parallel Processing (MPP) databases. I would start with MPP databases as they have been there for decades.

MPP Databases: These databases spread the data into independent storage and CPU thus achieving parallellism and great processing capability. This is special hardware and software combination that help you achieve such parallellism. The big products in market are IBM's Netezza, HP's Vertica and EMC's Greenplum. Microsoft's SQL Server Parallel Datawarehouse (PDW) is one upcoming MPP database.
Hadoop : Hadoop is one of the technology classes that support big data. It is an opensource version of Mapreduce created by Apache software foundation. In general, Mapreduce is parallel programming framework.
NoSQL : NoSQL stands for "not only SQL". It represents a class of databases that do not follow RDBMS principles (like transactional integrity) and mainly deal with big data. There are four categories of NoSQL databases. Please refer to the diagram above for the database products that fall under the below categories. Few of the NoSQL databases work in conjunction with Hadoop components.

Document store
Graph Database
Key-Value store
Wide Column store

I hope this post was useful in understanding the basic terminologies of big data and related technologies.

In the next big data post, I would provide more details on NoSQL database categories.