Friday, February 06, 2009
Using Oracle Parallel Query in a RAC Environment
Let’s say that you have a table scan that is going to read 1,000,000 blocks. Furthermore, ignoring all disk caching, driver optimizations, etc. let’s assume that each physical I/O takes 2ms. So, in order to read this entire table into the SGA it will take 1,000,000 I/Os x 0.002 sec/I/O = 2,000 sec which is 33.3 minutes. So, it will take a half an hour (plus overhead) just to read the data into the SGA. Most of the time which is spent idle waiting on the I/O to return.
Now let’s turn on parallel query with a parallelism of 10. Oracle will internally divide this 1,000,000 block table into multiple pieces and pass those pieces off to the 10 parallel query processes. If you are using anything other than a single SATA drive for your database the I/O subsystem can easily handle the parallelism and will now return that same 1,000,000 blocks in 1/10th of the time, or 3.33 minutes.
Note: I prefer to have control over my parallelism, so I will use the parallel hint to invoke parallel query as such; SELECT /*+ PARALLEL(table_alias, degree) */
Taking it up a notch, now let’s consider that the operations are consuming all of the CPU in the system. I can now additionally add nodes to use in a RAC cluster for parallel query, thus taking advantage of parallelism and RAC. This can be done by extending the hint to include the number of nodes to use in a parallel query operation as such; /*+ PARALLEL(table_alias, degree, instances) */
This is only part of the story. Now it is necessary to specify whether we are talking about 10g or 11g. Let’s start with Oracle 10g.
Parallel Query and Oracle 10g RAC
With Oracle 10g, the number of RAC nodes that will participate in a parallel query operation depends on a couple of parameters; instance_groups and parallel_instance_group. The instance_groups parameter is used to set up instance_groups and the parallel_instance_group parameter is used to specify which group you belong to.
First, set up the instance groups. In this example I have a four node RAC cluster:
INSTANCE_GROUPS
Assign the node a set of instance groups that it participates in
- node1.instance_groups = node1,first_two,all_nodes
- node2.instance_groups = node2,first_two,all_nodes
- node3.instance_groups = node3,last_two,all_nodes
- node4.instance_groups = node4,last_two,all_nodes
Setting the parallel_instance_group parameter will define which group you belong to, thus defining where your parallel queries can run.
For example, if I set my parallel_instance_group parameter to Node1, my parallel queries will only run on node 1 as shown here:
Setting the parameter parallel_instance_group in the init.ora file will specify the default value for each user who logs into that instance. An alter session command can be used to modify that parameter as shown here: SQL> alter session set parallel_instance_group=all_nodes;This will allow the parallel query to span all nodes in the RAC cluster.
Note: You don’t have to be on the node that parallel query is configured to run on. If you happen to log onto node 3 and your parallel_instance_group is Node1 non-parallel queries will run on node 3, but any parallel queries will run on node 1.
Parallel Query and Oracle 11g RAC
So, what about Oracle 11g? If you are running on Oracle 11g forget everything I just told you. One of the best features about Oracle 11g is that it now uses services to specify which nodes your parallel query runs on. So, depending on which service you connect to, the preferred nodes for that service are the nodes that parallel queries will use.
Conclusions
Depending on the operation that you are doing, you might be able to achieve huge performance gains by parallelizing across multiple nodes in a RAC cluster. Of course your mileage might vary, and if you aren’t careful you might increase the load on nodes to a point where you are affecting other users. I have a saying that “If you aren’t careful with parallelism, you might paralyze your system.”