Thursday, February 4, 2010

Datastage interview questions from major companies?

What does level of Granularity of a fact table signify?
In simple terms, level of granularity defines the extent of detail. As an example, let us look at geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, we say the highest level of granularity is STREET.

Differences between star and snowflake schemas ?
The star schema is created when all the dimension tables directly link to the fact table. Since the graphical representation resembles a star it is called a star schema. It must be noted that the foreign keys in the fact table link to the primary key of the dimension table. This sample provides the star schema for a sales_ fact for the year 1998. The dimensions created are Store, Customer, Product_class and time_by_day. The Product table links to the product_class table through the primary key and indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.

What is Fact table?
Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales" , then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foriegn keys for the dimension tables.

What is a Data Warehouse?
Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated….This makes it much easier and more efficient to run queries over data that originally came from different sources. Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

Steps In Building the Data Model
While ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEFIX, specify a bottom-up

Why is Data Modeling Important?
Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common

What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.

What are the methodologies of Data Warehousing?
They are mainly 2 methods.1. Ralph Kimbell Model
2. Inmon Model.
Kimbell model always structed as Denormalised structure.
Inmon model structed as Normalised structure.
Depends on the requirements of the company anyone can follow the company's DWH will choose the one of the above models.

What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports bitmaps.

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
Normalization can be defined as segregating of table into two different tables, so as to avoid duplication of values.?

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information
A factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to record events or
coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn?t sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university

Is it correct/feasible develop a Data Mart using an ODS?
Yes it is correct to develop a Data Mart using an ODS.becoz ODS which is used to?store transaction data and few Days (less historical data) this is what datamart is required so it is coct to develop datamart using ODS .

Explain degenerated dimension.
A Degenerate dimension?is a?Dimension which has only a single attribute.
This dimension is typically represented as a single field in a fact table.
The data items thar are not facts and data items that do not fit into the existing dimensions are termed as Degenerate Dimensions.
Degenerate Dimensions are the fastest way to group similar transactions.
Degenerate Dimensions are used when fact tables represent transactional data.
They can be used as primary key for the fact table but they cannot act as foreign keys.

What is the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Everytime you access the view,? the SQL statement executes.
materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that everytime you run the query, the stored result set is used. Pros include quick query results.

What are non-additive facts?
Fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all.
An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.

No comments:

Post a Comment