Archive

Archive for November, 2009

Understanding ETL

November 29, 2009 Leave a comment

The Extract-Transform-Load (ETL) system is the foundation of the data warehouse. A properly designed ETL system extracts the data from source systems, performs transformations and cleansing and delivers the data in a presentation ready format after which the data will be loaded to the ware house. The following figure is the schematic description of the ETL process.

image

The 4 steps of ETL process are explained below:

Extracting: In this phase, data from different types of source systems are fetched into the staging area. The source systems can be mainframe, production sources or any other OLTP sources. Source files can also be different. Data can be stored in relational tables as well as flat files (e.g. Notepad files). The first job of ETL is to fetch data from these different sources.

Cleansing: In most cases level of data quality in an OLTP source system is different from what is required in a data warehouse. To achieve those data quality, cleansing has to be applied. Data cleansing consists of many discrete steps, including checking for valid values, ensuring consistency across values, removing duplicates and checking whether complex business rules and procedures have been applied.

Conforming: Data conformation is required whenever 2 or more data sources are merged in the data warehouse. Separate data sources cannot be queried together unless some or all the textual labels in these sources have been made identical and unless similar numerical measures have been rationalized.

Delivering: The main motto of this step is to make the data ready for querying. It includes physical structuring the data into a set of simple, symmetric schemas known as dimensional models, or star schemas (we will discuss star schema and dimensional model later). These schemas are a necessary basis for building an OLAP system.

Technorati Tags: ,

Data Warehousing… What is it all about?

November 18, 2009 2 comments

We are all familiar with the term “Database”. What is a Database? We can say that a database is a storage of some meaningful data specific to one’s business/personal need. A database can store any type of data. But if a database is storing the data then what’s the need of a Data Warehouse? So at first we have to know the basic difference between a Database and a Data Warehouse.

A Database is that which is storing only the most current data. In a more technical term, it is known as OLTP source. OLTP stands for Online Transactional Processing. But whenever we use Oracle/DB2/Teradata etc. to store historical data, it becomes a Data Warehouse. By the term Historical data, we mean “Snapshots” of data. That is data of same process taken at different instant of time. Technically its known as OLAP (Online Analytical Processing) system.  A simple example can be given at this context. Lets consider a bank. A database of the bank will store the latest transactional data only. But each day the bank have these types of data. So they need to archive these. So they are storing it in a daily basis, i.e. they are taking snapshots of data and consequently the are creating a data warehouse.

So am I able to draw the line clearly between a database and a data warehouse? Maybe I am ( thinking positively 🙂 ). Now another question comes into the picture. So why do one have to maintain a data warehouse. Because maintaining a data warehouse is a costly job. Day by day the size of the warehouse will grow. The answer is Business. One have to maintain the historical data to analyse the trend of his business and making decisions based on those data. This decisions can be a big factor in case of the present and upcoming performance of his business.

The job of a data warehouse specialist can be in many phase. He/she has to extract the data from different OLTP source systems, apply the modifications or cleansing according to the requirements and then loading this data to the data warehouse. And in another phase he have to fetch the data from the data warehouse, analyse it and develop the report. The end users can see those reports and based on those he/she can make the business decisions.

Ok… lets end it here. I will discuss more about this in depth from the next post. Take care..see you soon friends.

Technorati Tags:
Categories: Data warehousing

SYSTEM FILES>>>

November 4, 2009 2 comments

The Difference between the FAT 32 and NTFS file syetems::

NTFS::

1. Allows access local to w2k w2k3 XP win NT4 with SP4 and later may get access for some file.
2. Maximum size of partition is 2 terabytes and more.
3. Maximum file size is upto 16TB.
4. File and folder encryption is possible only in NTFS.

FAT 32::
1. Allows access to win 95 98 Win Millenium Win2k XP on local partitions.
2. Maximum size of partition is upto 2TB.
3. Maximum file size is upto 4GB
4. File and folder encryption is not possible.

Categories: 1 Tags:

I am BACK ..after a long gap

November 4, 2009 1 comment

Hello everyone. It was a tough 3 months for me. Oh, I forgot, you don’t have a idea of my present status. I have joined Accenture on August and relocated to Bangalore. So my present location is Bangalore aka Bengaluru. The first 2 month was really a hectic one. It was tough to follow a daily schedule of 8-30 to 20-30 with exam after each couple of days. Many friends of mine were unable to sustain in this hectic procedure and some of them got unlucky and eventually  they have been sacked. I got lucky and finally after 2 months of “without food – without sleep” effort I have cleared the training procedure. I have been trained on Informatica, the ETL tool. Previously there was a fascination in my mind that someday I will be a successful programmer. But it was again proved that there is something called as destiny.

By the way, though the training is a tough one, somehow It was also enjoyable. At least we have to think like that. The services provided in the Accenture premises are excellent. And also I have to mention about the building structure. The skyscrapers are really eye catching.

DSCN0668 bang4

          Accenture(Bang6), Divyashree Tech Park                                     Accenture(Bang4), Eco Space SEZ

Accenture (Bang3), Bannerghata Main Road, Dairy Circle

Above are some of the pictures of the Accenture offices in Bangalore. Presently I am working at Bang4 at Eco Space SEZ.

Now lets come to the technical stuff. So in the training period I was allocated Data warehousing as a stream. At first I thought it would be boring and conceptual one. But later on I am in love with this subject and specially the tool Informatica. The subject is totally inline with the present market needs and consumer goods and productions. Anyone can think of many practical scenarios in which the subject will be applicable.

Presently I am working on MicroStrategy, a reporting tool ( a reporting tool is necessary to create reports on the basis of data available in data warehouse).

So if all goes well then in the coming years I am gonna have some competency in Data warehousing field. Lets pray for that. Oh, one more thing. The upcoming posts will contain more about Informatica, Business Objects, MicroStrategy and other concepts of Data warehousing. So Hasta luego.

Categories: Data warehousing