Tuesday, February 23, 2010

Real time Problems in datastage with Solutions

1)I am getting this error while opening the administrator client.

DSR_ParseConfig: OPENSEQ of /opt/IBM/InformationServer/Server/PXEngine/etc/master_config.apt took ELSE clause, STATUS()=2 



The default permission for the etc directory is rwxr-xr-x (755). This means that only the owner has write permission. Update the permission of this etc directory (located under ../IBM/InformationServer/Server/PXEngine on the WebSphere DataStage server machine) to give write permission to the user.
To do this, change the permission to allow group write. In the Unix shell, cd to the PXEngine directory and issue the command:

chmod 775 etc 
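To verify the change, list the directory again; with 775 the group write bit is now set:

    cd /opt/IBM/InformationServer/Server/PXEngine
    ls -ld etc        # should now show drwxrwxr-x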



2) How can we do unit testing on DataStage jobs?


Usually, Unit testing covers the testing of individual job/sequence.

We need to ensure that the job completely performs the task it is supposed to do.

It should fulfill all the business logic coded in the job. We should prepare test cases covering all of the key logic coded, for example:

- checking of all functions coded,
- creation of the file/dataset,
- insert/update/delete logic,
- nullability checking, etc.

Unit Testing - we need to ensure every single expression and piece of logic is working fine.
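A minimal shell sketch of the file/dataset check mentioned above: diff the job's output against a saved expected file (the paths here are placeholders for your own environment):

    # Compare actual job output with the expected result
    if diff -q /data/out/customer_dim.txt /data/expected/customer_dim.txt >/dev/null
    then
        echo "PASS: output matches the expected file"
    else
        echo "FAIL: output differs from the expected file"
    fi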



3)I need to do multiple lookups for different columns in the source against the same dataset. I cannot use the Range lookup since the values are not necessarily in a certain range. 

Do I have to create multiple datasets for each lookup? Or are there better options so that I can avoid creating multiple datasets?

A) You need to use multiple Lookup stages, not multiple datasets; each Lookup can reference the same dataset.

4)
I am creating a parallel job using the Change Capture stage, with SQL Server as the source stage and Oracle as the target stage.
I would like Change Capture to do an insert when there is an insert (working),
an update when there is a delete (not working),
and an update when there is an update (working).

Please let me know how I can achieve this.

A) I'm assuming that by "not working" you mean the delete records are not coming out of the Change Capture stage.
If that is the case, open the Change Capture stage, go to the Stage Properties tab, and set the Drop Output For Delete option to False.

5) I just want to understand the partitioning concepts in PX.

Ex. I have a configuration file with 2 nodes. The source file is partitioned on custno,
e.g. the custno values are 1, 2, 3, 4.

So, data-wise there are 4 partitions but only 2 logical nodes.

1> How will the data be allocated to the 2 logical nodes?
2> Also, if I have a stage variable, as I understand it 2 copies will be created,
or will the number of stage variables be equal to the number of data partitions?

A)
In DataStage, partitioning and nodes are different concepts. Partitioning describes how rows are divided across partitions (there are several methods: round robin, hash, modulus, range, entire, same, and so on), while the nodes defined in the configuration file are the parallel resources the partitions run on. With a hash partition on custno and a 2-node configuration, the 4 key values are hashed onto the 2 nodes, so some key values share a node; a stage variable is instantiated once per partition, i.e. one copy per node here.
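For reference, the logical nodes come from the parallel configuration file pointed to by APT_CONFIG_FILE. A minimal two-node file, with the hostname and resource paths as placeholders, looks roughly like this:

    {
      node "node1"
      {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/disk1" {pools ""}
        resource scratchdisk "/data/ds/scratch1" {pools ""}
      }
      node "node2"
      {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/disk2" {pools ""}
        resource scratchdisk "/data/ds/scratch2" {pools ""}
      }
    }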


Saturday, February 13, 2010

datastage faqs

1. How do you generate even numbers in the Surrogate Key or Transformer stage? 2. How many ways are there to remove duplicate values?
To generate even numbers using the Transformer, we can make use of stage variables. Declare a stage variable, say SV1, and initialise it to 0. In the derivation of that stage variable, increment it by 2, which looks like SV1 + 2. By doing this we can generate even numbers.
How many write modes are there in DS?
1. Append
2. Overwrite
3. Create (error if exists)
4. Use existing (discard records)
5. Use existing (discard records and schema)
How to stop a job when its status is running?
To stop a running job, go to DataStage Director and click the Stop button (or Job -> Stop from the menu).
If that doesn't help, go to Job -> Cleanup Resources, select the process which holds a lock and click Logout.
If it still doesn't help, go to the DataStage shell and invoke the following command: ds.tools
It will open an administration panel. Go to 4. Administer processes/locks, then try invoking one of the clear-locks commands (options 7-10).
How to send notifications from DataStage as a text message (SMS) to a cell phone
There are a few possible methods of sending SMS messages from DataStage. However, there is no easy way to do this directly from DataStage, and all methods described below will require some effort.  The easiest way of doing it from the DataStage standpoint is to configure an SMTP (email) server as a mobile phone gateway. In that case, a Notification Activity can be used to send a message with a job log and any desired details. The DSSendMail before-job or after-job subroutine can also be used to send SMS messages.  If configured properly, the recipient's email address will have the following format: 600123456@oursmsgateway.com  If there is no possibility of configuring a mail server to send text messages, you can work around it by using an external application run directly from the operating system. There is a whole bunch of Unix scripts and applications that send SMS messages.  In that solution, you will need to create a batch script which takes care of sending the messages and invoke it from DataStage using the ExecDOS or ExecSH subroutines, passing the required parameters (such as phone number and message body).
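As a rough sketch, the external-application route could be a small wrapper script called through ExecSH; the gateway domain below is the hypothetical one from the example above, and mailx is assumed to be available on the server:

    #!/bin/sh
    # send_sms.sh <phone-number> <message>  -- hypothetical helper invoked via ExecSH
    PHONE="$1"
    MESSAGE="$2"
    # Relay the text through the mail-to-SMS gateway configured on the SMTP server
    echo "$MESSAGE" | mailx -s "DataStage notification" "${PHONE}@oursmsgateway.com"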

About AWK Command in Unix

  1. Renaming within the name:
    ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
    (although in some cases it will fail, as in file_old_and_old)
  2. remove only files:
    ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
    or with awk alone:
    ls -l|awk '$1!~/^drwx/{print $9}'|xargs rm
    Be careful when trying this out in your home directory. We remove files!
  3. remove only directories
    ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
    or
    ls -p | grep /$ | awk '{print "rm -r "$1}' | sh
    or with awk alone:
    ls -l|awk '$1~/^d.*x/{print $9}'|xargs rm -r
    Be careful when trying this out in your home directory. We remove things!
  4. killing processes by name (in this example we kill the process called netscape):
    kill `ps auxww | grep netscape | egrep -v grep | awk '{print $2}'`
    or with awk alone:
    ps auxww | awk '$0~/netscape/&&$0!~/awk/{print $2}' |xargs kill 

The 15 Most Important UNIX commands

  1. man - show manual for a command, example: man ls hit q to exit the man page.
  2. cd - change directory, example: cd /etc/
  3. ls - list directory, similar to dir on windows. example: ls /etc, use ls -l /etc to see more detail
  4. cp - copy a file or directory, example: cp source dest if you want to copy a directory use the -R option for recursive: cp -R /source /dest
  5. mv - move a file, example: mv source dest
  6. rm - remove a file, example: rm somefile to remove a directory you may need the -R option, you can also use the -f option which tells it not to confirm each file: rm -Rf /dir
  7. cat - concatenate, or output a file cat /var/log/messages
  8. more - outputs one page of a file and pauses. example: more /var/log/messages press q to exit before getting to the bottom. You can also pipe to more | more from other commands, for example ls -l /etc | more
  9. scp - secure copy, copies a file over SSH to another server. example:scp /local/file user@host.com:/path/to/save/file
  10. tar - tape archiver, tar takes a bunch of files and munges them into one .tar file, the files are often compressed with the gzip algorithm, and use the .tar.gz extension. to create a tar: tar -cf archive.tar /directory, then to extract the archive to the current directory run tar -xf archive.tar. to use gzip, just add a z to the options, to create a tar.gz: tar -czf archive.tar.gz /dir, to extract it: tar -xzf archive.tar.gz
  11. grep - pattern matcher, grep takes a regular expression, or to match a simple string you can use fast grep, fgrep failure /var/log/messages, I'm usually just looking for a simple pattern so I tend to use fgrep more than regular grep.
  12. find - lists files and directories recursively, one per line. I usually pipe grep into the mix when I use find, eg: find / | fgrep log
  13. tail - prints the last few lines of a file, this is handy for checking log files tail /var/log/messages if you need to see more lines, use the -n option, tail -n 50 /var/log/messages you can also use the -f option, which will continuously show you the end of the file as things are added to it (very handy for watching logs) tail -f /var/log/messages
  14. head - same as tail, but shows the first few lines of the file
  15. vi - text editor, there are several text editors such as emacs, and nano, but vi is usually installed on any server so it's a good one to learn. To edit a file type vi file, to start editing press i, then to save changes and exit press Esc and type :wq, or to quit without saving press Esc and type :q!. There are a million other commands, but that will enable you to edit files at a basic level.

UNIX FAQS


How do you change File Access Permissions?
Every file has following attributes:
owner's user ID ( 16 bit integer )
owner's group ID ( 16 bit integer )
File access mode word
'rwx - rwx - rwx'


(user permission-group permission-others permission)
r-read, w-write, x-execute
To change the access mode, we use chmod(filename,mode).
Example 1:
To change mode of myfile to 'rw-rw-r--' (i.e. read and write permission for user, read and write permission for group, and only read permission for others) we give the args as:
chmod(myfile,0664).
Each operation is represented by discrete values
'r' is 4
'w' is 2
'x' is 1
Therefore, for 'rw' the value is 6(4+2).
Example 2:
To change mode of myfile to 'rwxr--r--' we give the args as:
chmod(myfile,0744).
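From the shell, the same modes can be set either with the octal value or with symbolic flags, for example:

    chmod 664 myfile              # rw-rw-r--
    chmod u=rw,g=rw,o=r myfile    # same result in symbolic form
    chmod 744 myfile              # rwxr--r--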



How do you find out what your shell is?
 - echo $SHELL
What’s the command to find out today’s date? 

- date
What’s the command to find out users on the system?

 - who
How do you find out the current directory you’re in? 

- pwd
How do you remove a file?

 - rm

Can anyone explain the following command to me?
I came across it in one of the shell scripts.
sed -e '/^$/d' -e 's/^[ ][ ]*/ /'
A) This sed command contains two edits:
'/^$/d' deletes all blank lines;
's/^[ ][ ]*/ /' replaces the leading whitespace on each line with a single space.
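For example, to apply it to a file and write the cleaned output to a new file:

    sed -e '/^$/d' -e 's/^[ ][ ]*/ /' input.txt > cleaned.txt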

Wednesday, February 10, 2010

DATASTAGE ARCHITECTURE DIAGRAM

Can one retrieve only the Nth row from a table?


SELECT * FROM t1 a
WHERE  n = (SELECT COUNT(rowid)
              FROM t1 b
             WHERE a.rowid >= b.rowid);

Difference between TRUNCATE, DELETE and DROP commands?


The DELETE command is used to remove some or all rows from a table. A WHERE clause can be used to only remove some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it. Note that this operation will cause all DELETE triggers on the table to fire.
TRUNCATE removes all rows from a table. The operation cannot be rolled back and no triggers will be fired. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.
DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command. Therefore DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.

Important SQL Queries

1)How to delete Duplicate rows in a Table ?

DELETE from table_name A WHERE ROWID>(SELECT min(ROWID) from table_name B WHERE A.Col_name=B.Col_name)

2)How to Identify Duplicates in a Table ?

Select Col1,Col2,Col3....... from table_name GROUP BY Col1,Col2,Col3....Having Count(*)>1;

3)To get Top Five Salary Employee Details ?

Select * from (Select * from table_name ORDER BY Salary Desc) WHERE ROWNUM<=5;

How to release a lock held by jobs?

Go to Job -> Cleanup Resources, select the process which holds the lock and click Logout.

4 Ways of Executing DS Job

1. DataStage Director

2. DataStage routine -- DSRunJob

3. Through a Job Sequencer

4. Through the Unix command line -- dsjob -run
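For example, from the engine host's shell (the project and job names below are placeholders; -jobstatus makes dsjob wait for the job to finish and return its status):

    dsjob -run -jobstatus MY_PROJECT MY_JOB
    dsjob -run -param SOURCE_FILE=/data/in/customers.txt MY_PROJECT MY_JOB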

Aggregator Stage


Aggregator Stage

The Aggregator stage is a processing stage. It classifies data rows from a single input link into groups and computes totals or other aggregate functions for each group. The summed totals for each group are output from the stage via an output link.

The stage editor has three pages:
Stage page. This is always present and is used to specify general information about the stage.
Inputs page. This is where you specify details about the data being grouped and/or aggregated.
Outputs page. This is where you specify details about the groups being output from the stage.
The aggregator stage gives you access to grouping and summary operations. One of the easiest ways to expose patterns in a collection of records is to group records with similar characteristics, then compute statistics on all records in the group. You can then use these statistics to compare properties of the different groups. For example, records containing cash register transactions might be grouped by the day of the week to see which day had the largest number of transactions, the largest amount of revenue, etc.
Records can be grouped by one or more characteristics, where record characteristics correspond to column values. In other words, a group is a set of records with the same value for one or more columns. For example, transaction records might be grouped by both day of the week and by month. These groupings might show that the busiest day of the week varies by season.
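As a loose Unix analogy to what the stage does, the cash-register example could be grouped and summed with awk (assuming a comma-separated file with the day of week in column 1 and the transaction amount in column 2):

    awk -F, '{count[$1]++; revenue[$1] += $2}
             END {for (day in count) print day, count[day], revenue[day]}' transactions.csv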

About Change Capture Stage (IMPORTANT)


Change Capture Stage
The Change Capture stage is a processing stage. The stage compares two data sets and makes a record of the differences. Example before and after data sets are given in the Parallel Job Developer's Guide.
The Change Capture stage takes two input data sets, denoted before and after, and outputs a single data set whose records represent the changes made to the before data set to obtain the after data set. The stage produces a change data set, whose table definition is transferred from the after data set’s table definition with the addition of one column: a change code with values encoding the four actions: insert, delete, copy, and edit. The preserve-partitioning flag is set on the change data set.
The compare is based on a set of key columns, rows from the two data sets are assumed to be copies of one another if they have the same values in these key columns. You can also optionally specify change values. If two rows have identical key columns, you can compare the value columns to see if one is an edited copy of the other.
The stage assumes that the incoming data is hash-partitioned and sorted in ascending order (this is done automatically if (auto) is selected on the partitioning tab). The columns the data is hashed on should be the key columns used for the data compare. You can achieve the sorting and partitioning using the Sort stage or by using the built in sorting and partitioning abilities of the Change Capture stage.
You can use the companion Change Apply stage to combine the changes from the Change Capture stage with the original before data set to reproduce the after data set.
The Change Capture stage is very similar to the Difference stage.
The stage editor has three pages:

Stage page. This is always present and is used to specify general information about the stage.

Inputs page. This is where you specify details about the before and after data sets being compared.

Outputs page. This is where you specify details about the processed data being output from the stage.

The General tab allows you to specify an optional description of the stage.

The Properties tab lets you specify what the stage does. The Advanced tab allows you to specify how the stage executes. The Link Ordering tab allows you to specify which input link carries the before data set and which the after data set.

Concepts Of DWH

1. Difference between Hashfile and Sequential File? What is modulus?

2. What are the Iconv and Oconv functions?

3. How can we join one Oracle source and a Sequential file?

4. How can we implement Slowly Changing Dimensions in DataStage?

5. How can we implement a Lookup in DataStage Server jobs?

6. What are all the third-party tools used with DataStage?

7. What is the difference between a routine, a transform and a function?

8. What are the Job parameters?

9. What is a Plug-in?

10. How can we improve the performance of DataStage jobs?

11. How can we create Containers?

12. What about System variables?

13 What is the use of Usage Analysis?

14 What are the different ways a project can be moved to production? For example, export-import and version control.

15 What database does the DataStage repository use? Answer: the Universe database.

16 How is the scheduling done in the project?

17 Which version of DataStage is used in the project?

18 What performance tuning is required while dealing with large data?

19 What does the reject option in the Transformer do?

20 What is the architecture of DataStage?

21 How do you define and use the job parameters?

22 What are stage variables, system variables and environment variables?

23 How do you use routines in DataStage?

24 What is the difference between a shared container and a local container?

25 How do you connect to Oracle?

26 Please explain any ETL process that you have developed?

27 What is a hash file? What are the types of hash files?

28 If you make any changes in a shared container, will they be reflected in all the jobs
wherever you used this shared container?

29 Have you written any custom routines in your project? If so, explain.

30 How do you get log info into a file?

31 What is before job subroutine/After job subroutine? When do you use them?

32 How do you backup and restore the project?

33 What is Clear Status File and when do you use it?

34 What is Cleanup Resources and when do you use it?

35 Can I join a flat file and an Oracle table and load the result into Oracle? Is this possible?

36 While loading some data into the target, suddenly there is a problem and the loading process
stops; how can you restart loading from the records that were left?

37 What are the general problems that you face in DataStage?

38 What are the various reports that could be generated using DataStage?

39 How do you remove blank spaces from data?

40 What is Active and Passive stage?

41 What all are the stages you have used in your project?

42 Could DataStage generate test cases?

43 What is the difference between a hash file and a sequential file?

44 What is the difference between a transform and a routine?

45 What is a sequencer?

Thursday, February 4, 2010

Data Warehousing Basic Questions

What is Fact Table?
Fact Table is

a) Completely Normalized

b) Partially Normalized

c) Completely De-normalized

d) Partially De-normalized

What is a Data Warehouse?

Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated....This makes it much easier and more efficient to run queries over data that originally came from different sources.
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

What are Data Marts?

Data Mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.

What is a Star Schema?

A relational database schema organized around a central table (fact table) joined to a few smaller tables (dimension tables) using foreign key references. The fact table contains raw numeric items that represent relevant business facts (price, discount values, number of units sold, dollar value, etc.)

What is Dimensional Modelling?

Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Fact tables and Dimension tables. The Fact table contains the facts/measurements of the business, and the Dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.

What is Snow Flake Schema?

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

Data Stage Components

Types of DataStage clients:
  • DataStage Administrator
  • DataStage Designer
  • Datastage Manager
  • Datastage Director

DataStage Administrator
Most DataStage configuration tasks are carried out using the DataStage Administrator, a client program provided with DataStage.

DataStage Designer
DataStage Designer is used to create DataStage jobs that are compiled into executable programs. Design the jobs that extract, integrate, aggregate, load, and transform the data. Create and reuse metadata and job components. It allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating and loading data.

Datastage Manager
DataStage Manager is used to store and manage re-usable metadata for the jobs. It is also used to import and export components between the file system and DataStage projects, and it is the primary interface to the DataStage Repository. Custom routines and transforms can also be created in the Manager.

Datastage Director
DataStage Director is the client component that validates, runs, schedules, and monitors jobs run by the DataStage Server. It is the starting point for most of the tasks a DataStage operator needs to do in respect of DataStage jobs.

Server Components:

1. DataStage Engine

2. Metadata Repository

3. Package Installer


Datastage interview questions from major companies?

What does level of Granularity of a fact table signify?
In simple terms, the level of granularity defines the extent of detail. As an example, let us look at the geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, STREET is the most detailed (finest) level of granularity.

Differences between star and snowflake schemas ?
The star schema is created when all the dimension tables link directly to the fact table. Since the graphical representation resembles a star, it is called a star schema. It must be noted that the foreign keys in the fact table link to the primary keys of the dimension tables. This sample provides the star schema for a sales_fact for the year 1998. The dimensions created are Store, Customer, Product_class and time_by_day. The Product table links to the product_class table through the primary key and indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.

What is Fact table?
The Fact table contains the measurements or metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the Fact table. The Fact table also contains the foreign keys for the dimension tables.

What is a Data Warehouse?
Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated….This makes it much easier and more efficient to run queries over data that originally came from different sources. Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

Steps In Building the Data Model
While the ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up

Why is Data Modeling Important?
Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common

What are the methodologies of Data Warehousing?
There are mainly 2 methods: 1. the Ralph Kimball model,
2. the Inmon model.
The Kimball model is always structured as a denormalised structure.
The Inmon model is structured as a normalised structure.
Depending on the requirements of the company, its DWH will follow one of the above models.

What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQL Server does not support bitmap indexes. Only Oracle supports bitmaps.

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
Normalization can be defined as splitting a table into two or more tables so as to avoid duplication of values. Briefly: First Normal Form requires atomic (single-valued) columns; Second Normal Form additionally requires every non-key column to depend on the whole primary key; Third Normal Form additionally removes transitive dependencies, so non-key columns depend only on the key.

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up across all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).
A factless fact table captures the many-to-many relationships between
dimensions, but contains no numeric or textual facts. They are often used to record events or
coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university

Is it correct/feasible to develop a Data Mart using an ODS?
Yes, it is correct to develop a Data Mart using an ODS, because an ODS stores transaction data with only a few days of history (little historical data), which is what a data mart requires.

Explain degenerated dimension.
A degenerate dimension is a dimension which has only a single attribute.
This dimension is typically represented as a single field in a fact table.
The data items that are not facts, and data items that do not fit into the existing dimensions, are termed degenerate dimensions.
Degenerate dimensions are the fastest way to group similar transactions.
Degenerate dimensions are used when fact tables represent transactional data.
They can be used as a primary key for the fact table, but they cannot act as foreign keys.

What is the difference between a view and a materialized view?
View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view - stores the result of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. Pros include quick query results.

What are non-additive facts?
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all.
An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.

DataStage Terms and Concepts

Aggregator stage
A stage type that computes totals or other functions of sets of
data.

BCPLoad stage
A plug-in stage supplied with DataStage that bulk loads data
into a Microsoft SQL Server or Sybase table.

CFD
COBOL File Description. A text file that describes the format of
a file in COBOL terms.

Column definition
Defines the columns contained in a data table. Includes the
column name and the type of data contained in the column.

Container stage
A built-in stage type that represents a group of stages and links
in a job design.

Data Browser
A tool used from within the DataStage Manager or DataStage
Designer to view the content of a table or file.

Data Stage Interview Questions

Difference between Hashfile and Sequential File?

1. A Hashfile can be used as a lookup, but a sequential file cannot.

2. A Hashfile works based on a hashing algorithm.

3. Performance is better with a Hashfile when it is used as a reference link (for a lookup).

4. We can eliminate duplicates by selecting a key value in server jobs.

5. We must define at least one key in a Hashfile.

6. There is a limit of 2 GB on a sequential file.

How do you rename all of the jobs to support your new File-naming conventions?

Create an Excel spreadsheet with the new and old names. Export the whole project as a dsx. Write a Perl program which can do a simple rename of the strings by looking them up in the Excel file. Then import the new dsx file, probably into a new project for testing. Recompile all jobs. Be cautious that the names of the jobs may also have been changed in your job control jobs or Sequencer jobs, so you have to make the necessary changes to those Sequencers.
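Instead of a Perl program, the renaming step could also be sketched with shell and sed; here old_new.txt is a hypothetical two-column export of the spreadsheet ("OLD_NAME NEW_NAME" per line) and whole_project.dsx is the exported project:

    # Rename every occurrence of each old job name in the exported .dsx
    while read OLD NEW
    do
        # -i (in-place edit) is GNU sed; on other systems redirect to a temp file instead
        sed -i "s/${OLD}/${NEW}/g" whole_project.dsx
    done < old_new.txt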

Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a Truncate statement to the DB or does it do some kind of Delete logic?

There is no TRUNCATE on ODBC stages. It is Clear table blah blah and that is a delete from statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (Truncate requires you to have alter table permissions where Delete doesn't).

Tell me one situation from your last project where you faced a problem, and how did you solve it?

A. The jobs in which data is read directly from OCI stages are running extremely slow. I had to stage the data before sending to the transformer to make the jobs run faster.
B. The job aborts in the middle of loading some 500,000 rows. We had the option of either cleaning/deleting the loaded data and then running the fixed job, or running the job again from the row at which it had aborted. To make sure the load was correct, we opted for the former.


How will you determine the sequence of jobs to load into data warehouse?

First we execute the jobs that load the data into Dimension tables, then Fact tables, then load the Aggregator tables (if any).

What are the command line functions that import and export the DS jobs?

A. dsimport.exe- imports the DataStage components.
B. dsexport.exe- exports the DataStage components.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?

Using the AutoSys tool, we can schedule DataStage jobs by invoking a shell script written to run the job.
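A typical wrapper script that the scheduler calls might look like the sketch below; the install path, project and job names are assumptions for a default installation:

    #!/bin/sh
    # Source the DataStage environment, run the job, and pass its status back to the scheduler
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv
    /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -jobstatus MY_PROJECT MY_JOB
    exit $?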

What are the Different methods of loading Dimension tables?
Conventional Load:
Before loading the data, all the Table constraints will be checked against the data.
Direct load (faster loading):
All the constraints will be disabled and the data will be loaded directly. Later, the data will be checked against the table constraints and the bad data won't be indexed.
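In Oracle, for example, a direct-path load is requested on the SQL*Loader command line; the user, password and control file below are placeholders:

    sqlldr userid=scott/tiger control=load_dim.ctl direct=true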

What is a linked cube?
A cube can be stored on a single analysis server and then defined as a linked cube on other Analysis servers. End users connected to any of these analysis servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple analysis servers. linked cubes can be connected using TCP/IP or HTTP. To end users a linked cube looks like a regular cube.

What is degenerate dimension table?
The values of a dimension which are stored in the fact table are called degenerate dimensions. These dimensions do not have their own dimension table.

Difference between Snowflake and Star Schema. In what situations is a Snowflake schema better to use than a Star schema, and when is the opposite true?
Star schema means a centralized fact table surrounded by different dimensions.
Snowflake means that, in the same star schema, the dimensions are split into further dimension tables.
A star schema contains highly denormalized data; a snowflake schema contains partially normalized data.
A star schema cannot have parent tables, but a snowflake schema does contain parent tables.
Why go for a star schema:
1) fewer joins
2) a simpler database
3) supports drill-up options
Why go for a snowflake schema:
Sometimes we need to provide separate dimensions derived from existing dimensions; in that case we go for a snowflake schema.
Disadvantage of snowflake:
Query performance is lower because more joins are involved.

What are slowly changing dimensions?
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.
If the data in the dimension table happens to change very rarely, it is called a slowly changing dimension.
e.g. changing the name and address of a person, which happens rarely.