Difference between Hashfile and Sequential File?
1. A hash file can be used as a lookup (reference input), but a sequential file cannot (see the analogy sketched after this list).
2. A hash file stores and retrieves records based on a hashing algorithm applied to its key.
3. Performance is better with a hash file when it is used as a reference link (for lookups).
4. In server jobs, we can eliminate duplicates by selecting the key values: rows written with the same key overwrite earlier ones.
5. A hash file must have at least one key column defined.
6. A sequential file is limited to 2 GB in size.
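As a rough analogy only (this is plain Python, not DataStage code): a hash file behaves like a keyed dict, while a sequential file behaves like a flat list that has to be scanned row by row.

# Illustrative analogy only -- not DataStage code.
rows = [("C001", "Dallas"), ("C001", "Plano"), ("C002", "Austin")]

# Hash file: keyed storage; a key is mandatory, and writing a
# duplicate key overwrites the earlier row (duplicates eliminated).
hashed = dict(rows)
print(hashed["C001"])        # direct keyed lookup -> "Plano"

# Sequential file: no key; finding a row means scanning in order.
for key, city in rows:
    if key == "C002":
        print(city)          # found only after scanning from the top
        break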
How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx file. Write a Perl program that does a simple rename of the strings, looking them up in the Excel file. Then import the new .dsx file, preferably into a new project for testing, and recompile all jobs. Be aware that the job names may also be referenced in your job-control or sequencer jobs, so you have to make the corresponding changes to those sequencers.
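The answer above suggests Perl; the following is an equivalent minimal sketch in Python. The file names (rename_map.csv saved from the spreadsheet, project.dsx from the export) are hypothetical placeholders.

import csv

# Hypothetical CSV saved from the spreadsheet: old_name,new_name per row.
with open("rename_map.csv", newline="") as f:
    mapping = {old: new for old, new in csv.reader(f)}

# Simple string rename over the exported project file.
with open("project.dsx") as f:
    dsx = f.read()

for old_name, new_name in mapping.items():
    dsx = dsx.replace(old_name, new_name)

# Write a new .dsx to import into a test project, then recompile.
with open("project_renamed.dsx", "w") as f:
    f.write(dsx)

Note that a plain string replace can also hit unrelated occurrences of a name, so the job names should be distinctive or the matching made stricter.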
Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a Truncate statement to the DB or does it do some kind of Delete logic?
There is no TRUNCATE on ODBC stages; the 'Clear the table...' option issues a DELETE FROM statement. On an OCI stage such as Oracle, you have both Clear and Truncate options. They are radically different in permissions: TRUNCATE requires ALTER TABLE permission, whereas DELETE does not.
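To make the distinction concrete, here is a minimal sketch of the two statements as they might be issued by hand through ODBC; the DSN, credentials, and table name are hypothetical.

import pyodbc

# Hypothetical connection details and table, for illustration only.
conn = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")
cur = conn.cursor()

# What the ODBC stage's "Clear the table..." option effectively runs:
# a logged DELETE, needing only DELETE permission, and rollback-able.
cur.execute("DELETE FROM SALES_STG")
conn.commit()

# What the OCI stage's Truncate option runs on Oracle: DDL that
# deallocates the segment and needs the elevated permission noted above.
cur.execute("TRUNCATE TABLE SALES_STG")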
Tell me about one situation from your last project where you faced a problem, and how did you solve it?
A. Jobs that read data directly from OCI stages were running extremely slowly. I had to stage the data before sending it to the Transformer to make the jobs run faster.
B. A job aborted in the middle of loading some 500,000 rows. We had the option of either cleaning out the already-loaded data and rerunning the fixed job, or restarting the job from the row at which it aborted. To make sure the load was correct, we opted for the former (a back-out sketch follows).
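One common way to make the clean-and-rerun option safe is to stamp every loaded row with a batch id, so an aborted run can be backed out before the rerun. A minimal sketch, with all connection, table, and column names hypothetical:

import pyodbc

BATCH_ID = 20240115  # hypothetical id stamped on every row of the failed run

conn = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")
cur = conn.cursor()

# Back out whatever the aborted run managed to load; the fixed job can
# then be rerun from the first row without creating duplicates.
cur.execute("DELETE FROM SALES_FACT WHERE LOAD_BATCH_ID = ?", BATCH_ID)
conn.commit()
print(f"removed {cur.rowcount} partially loaded rows")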
How will you determine the sequence of jobs to load into the data warehouse?
First we execute the jobs that load the data into the dimension tables, then the fact tables, and then the aggregate tables (if any).
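A minimal control-script sketch of that ordering using the dsjob command-line client; the project and job names are hypothetical, and dsjob option spellings vary by release, so verify the flags against your documentation.

import subprocess

PROJECT = "DWH_PROJ"  # hypothetical project name

# Dimension jobs first, then facts, then aggregates (hypothetical names).
JOB_GROUPS = [
    ["LoadDimCustomer", "LoadDimProduct"],
    ["LoadFactSales"],
    ["LoadAggSalesMonthly"],
]

for group in JOB_GROUPS:
    for job in group:
        # -jobstatus makes dsjob wait and reflect the job result in its
        # exit code; check=True stops the sequence on the first failure.
        subprocess.run(["dsjob", "-run", "-jobstatus", PROJECT, job],
                       check=True)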
What are the command-line functions that import and export DataStage jobs?
A. dsimport.exe - imports DataStage components.
B. dsexport.exe - exports DataStage components.
(Example invocations are sketched below.)
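Hedged example invocations: the host, credentials, project, and path below are hypothetical, and the exact flag syntax differs between DataStage releases, so check the client documentation before relying on it.

import subprocess

# Hypothetical connection flags -- verify the spelling for your release.
common = ["/H=dshost", "/U=dsadm", "/P=secret"]

# Export all components of a project to a .dsx file.
subprocess.run(["dsexport.exe", *common, "DWH_PROJ",
                r"C:\exports\DWH_PROJ.dsx"], check=True)

# Import those components into a (new) test project.
subprocess.run(["dsimport.exe", *common, "DWH_PROJ_TEST",
                r"C:\exports\DWH_PROJ.dsx"], check=True)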
What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Using a scheduling tool such as AutoSys, we can schedule DataStage jobs by invoking a shell script written to run the job (typically via the dsjob command-line client).
What are the different methods of loading dimension tables?
Conventional load:
Before the data is loaded, all the table constraints are checked against it.
Direct load (faster loading):
All constraints are disabled and the data is loaded directly. The data is later checked against the table constraints, and the bad rows are not indexed (illustrated below).
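The answer does not name a tool, but Oracle's SQL*Loader is a familiar example of the conventional vs. direct distinction: the same load runs either way depending on a single flag. The credentials and control file below are hypothetical.

import subprocess

# Conventional path: rows go through the SQL engine and constraints
# are checked as each row is loaded.
subprocess.run(["sqlldr", "userid=etl/secret", "control=sales.ctl"],
               check=True)

# Direct path: blocks are written directly; constraint checking is
# deferred and re-applied after the load.
subprocess.run(["sqlldr", "userid=etl/secret", "control=sales.ctl",
                "direct=true"], check=True)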
What is a linked cube?
A cube can be stored on a single Analysis Server and then defined as a linked cube on other Analysis Servers. End users connected to any of these servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple servers. Linked cubes can be connected using TCP/IP or HTTP. To end users, a linked cube looks like a regular cube.
What is a degenerate dimension table?
A dimension value that is stored in the fact table itself is called a degenerate dimension; it does not have a dimension table of its own. An order or invoice number is a typical example.
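A small sketch of the idea, with hypothetical table and column names: the order number lives on the fact row itself, with no dimension table behind it.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sales_fact (
        date_key     INTEGER,   -- foreign key into a date dimension
        product_key  INTEGER,   -- foreign key into a product dimension
        order_number TEXT,      -- degenerate dimension: no table of its own
        amount       REAL
    )
""")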
Difference between snowflake and star schema. In what situations is a snowflake schema better than a star schema, and when is the opposite true?
Star schema:
A centralized fact table surrounded by different dimensions.
Snowflake schema:
The same star schema, but with dimensions split into further dimensions.
A star schema contains highly denormalized data; a snowflake schema contains partially normalized data.
A star schema cannot have parent tables, but a snowflake schema does contain parent tables.
Why go for a star schema:
1) fewer joins
2) a simpler database design
3) support for drill-up options
Why go for a snowflake schema:
When we need to provide separate dimensions carved out of existing dimensions, we go for a snowflake (a table-level sketch follows).
Disadvantage of a snowflake schema:
Query performance is lower because more joins are involved.
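A table-level sketch of the difference, using hypothetical product/category tables (SQLite here purely to keep the illustration runnable): the star version keeps the category attribute denormalized on the product row, while the snowflake version splits it into a parent table reached by an extra join.

import sqlite3

con = sqlite3.connect(":memory:")

# Star schema: one denormalized PRODUCT dimension, category repeated per row.
con.execute("""
    CREATE TABLE dim_product_star (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT   -- repeated for every product in the category
    )
""")

# Snowflake: the category attributes are split into a parent table,
# joined through a foreign key (one extra join at query time).
con.execute("""
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT
    )
""")
con.execute("""
    CREATE TABLE dim_product_snow (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_key  INTEGER REFERENCES dim_category(category_key)
    )
""")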
What are slowly changing dimensions?
Dimensions that change over time are called slowly changing dimensions. For instance, a product's price changes over time; people change their names for various reasons; country and state names may change over time. These are a few examples of slowly changing dimensions, since changes happen to them over a period of time.
If the data in a dimension table happens to change only rarely, it is called a slowly changing dimension; for example, a person's name or address, which changes infrequently.
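One common way of handling such changes is the so-called Type 2 approach, which preserves history by versioning rows rather than overwriting them. A minimal sketch with hypothetical table and column names (SQLite used only to keep it runnable):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_customer (
        customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id    TEXT,    -- natural (business) key
        city           TEXT,    -- slowly changing attribute
        effective_from TEXT,
        current_flag   INTEGER  -- 1 = current version of this customer
    )
""")

def apply_scd2(customer_id, new_city, change_date):
    # Type 2: expire the current row and insert a new version on change.
    row = con.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND current_flag = 1",
        (customer_id,)).fetchone()
    if row is not None and row[1] == new_city:
        return  # nothing changed; keep the current row
    if row is not None:
        con.execute("UPDATE dim_customer SET current_flag = 0 "
                    "WHERE customer_key = ?", (row[0],))
    con.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, effective_from, current_flag) "
        "VALUES (?, ?, ?, 1)",
        (customer_id, new_city, change_date))

apply_scd2("C001", "Dallas", "2003-01-01")  # first version
apply_scd2("C001", "Austin", "2004-06-15")  # change -> old row expired

Overwriting in place (Type 1) and keeping a previous-value column (Type 3) are the other commonly described approaches.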