I did some SSIS development this week where we had to run a number of instances of a package in parallel. We are using a script component to do some fairly complicated calculations between rows in a collection, but running a single instance the machine was doing little more than idling. First we parametrised our source query so we could pass in some parameters and it would return correctly segmented data. The total number of rows from the query was 16,777,216. We implemented this using the row_number function to determine the split for example:
row_number() over (order by Id) % @NumberOfSplits as Split
Here’s an example of the where clause to get the correct Split:
where Split = @Split
Extract Load Package:
Parallel Controller Package:
Normally when you execute a package in BIDS it displays the number of rows as they flow through the data flow. The same thing is true when executing the Parallel Control Package. However the row count displayed is the total of all the running priligy online pharmacy instances of the packages, see below.
You will notice there were only 6 instances running, this is due to the default SSIS setting; the number of executables equals the number of processors plus two. In my case I had 4 processors and as a result it was only executing 6 of 8 instances. After running for a few seconds you could see the number of rows was far higher than the expected value of 2,097,152 for one split.
When Split 7 and 8 started I noticed the row count dropped significantly and when it finished it only had the total row count of 2 splits as seen below.
I’m not sure why this happens in BIDS and I don’t know what SSIS is doing behind the scenes that causes this behaviour. So it would be great if someone could enlighten us so that we can see whether we can exploit this to gain some additional performance.
by Eugene Niemand @