Tuning Tez for Hive (Optimizing Tez):

Step 1 - Determine your YARN NodeManager resource memory 
(yarn.nodemanager.resource.memory-mb) and your YARN minimum container 
size (yarn.scheduler.minimum-allocation-mb).

Your yarn.scheduler.maximum-allocation-mb is usually set to the same value as 
yarn.nodemanager.resource.memory-mb.
yarn.nodemanager.resource.memory-mb is the total amount of RAM on each node 
that YARN can allocate to containers. yarn.scheduler.minimum-allocation-mb is 
the smallest amount of memory YARN will allocate to a single container.
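To see how these settings interact, here is a minimal sketch, not taken from YARN itself. It assumes the Capacity Scheduler behaviour of rounding each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb, models memory only (vcores ignored), and uses hypothetical node sizes. The function names are illustrative.

```python
import math

# Minimal sketch, assuming the Capacity Scheduler rounds each memory request up
# to a multiple of yarn.scheduler.minimum-allocation-mb. Only memory is
# modelled; vcores and other resources are ignored.


def normalize_request(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Return the container size (MB) YARN would grant for a memory request."""
    if request_mb > max_alloc_mb:
        # Requests above yarn.scheduler.maximum-allocation-mb are rejected.
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    rounded = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb
    # Never below the minimum allocation, never above the maximum.
    return min(max(rounded, min_alloc_mb), max_alloc_mb)


def containers_per_node(nm_memory_mb: int, container_mb: int) -> int:
    """Number of containers of this size that fit in one NodeManager's memory."""
    return nm_memory_mb // container_mb


if __name__ == "__main__":
    nm_memory = 40960      # hypothetical yarn.nodemanager.resource.memory-mb
    min_alloc = 4096       # hypothetical yarn.scheduler.minimum-allocation-mb
    max_alloc = nm_memory  # maximum allocation set equal to node memory
    granted = normalize_request(5000, min_alloc, max_alloc)
    print(granted)                                  # 8192
    print(containers_per_node(nm_memory, granted))  # 5
```

A 5000 MB request is rounded up to two minimum allocations, so only five such containers fit on the node. This is why the minimum allocation matters so much for the sizes chosen in the next steps.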

yarn.scheduler.minimum-allocation-mb is a very important setting for sizing 
the Tez Application Master and containers.
Step 2 - Determine your Tez Application Master and Container Size, that is 
tez.am.resource.memory.mb and hive.tez.container.size.
Set tez.am.resource.memory.mb to the same value as 
yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.
Set hive.tez.container.size to the same value as, or a small multiple 
(1 or 2 times) of, the YARN minimum container size yarn.scheduler.minimum-allocation-mb, 
but NEVER more than yarn.scheduler.maximum-allocation-mb. You want to leave 
headroom so that multiple containers can be spun up.
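The Step 2 rules can be expressed as a small helper. This is a minimal sketch assuming all values are in MB; the function name tez_sizes is hypothetical and the 1x/2x limit simply encodes the guidance above.

```python
# Minimal sketch of the Step 2 rules, assuming memory values in MB:
# AM size = minimum allocation; container size = 1x or 2x the minimum
# allocation, and never more than the maximum allocation.


def tez_sizes(min_alloc_mb: int, max_alloc_mb: int, multiple: int = 1) -> dict:
    """Return suggested tez.am.resource.memory.mb and hive.tez.container.size."""
    if multiple not in (1, 2):
        raise ValueError("guidance is 1 or 2 times the minimum allocation")
    container_mb = min_alloc_mb * multiple
    if container_mb > max_alloc_mb:
        raise ValueError(
            "hive.tez.container.size must not exceed "
            "yarn.scheduler.maximum-allocation-mb"
        )
    return {
        "tez.am.resource.memory.mb": min_alloc_mb,
        "hive.tez.container.size": container_mb,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 4 GB minimum allocation, 40 GB maximum allocation.
    print(tez_sizes(4096, 40960, multiple=2))
```

Raising an error instead of silently capping makes an oversized container an explicit mistake, which matches the "NEVER more than" rule.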
As general guidance, don't exceed the memory per processor, since you want one 
processor per container. For example, if a node has 256 GB and 16 cores, you don't 
want your container to be bigger than 16 GB.
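The memory-per-core ceiling is simple division. This minimal sketch reproduces the 256 GB / 16 core example; the helper names are hypothetical.

```python
# Minimal sketch of the "one core per container" guidance: a container should
# not be larger than the node's memory divided by its core count.


def max_container_mb(node_memory_mb: int, cores: int) -> int:
    """Largest container (MB) that keeps roughly one core per container."""
    return node_memory_mb // cores


def within_core_budget(container_mb: int, node_memory_mb: int, cores: int) -> bool:
    """True if the container size respects the memory-per-core ceiling."""
    return container_mb <= max_container_mb(node_memory_mb, cores)


if __name__ == "__main__":
    node_mb = 256 * 1024  # the 256 GB example from the text
    print(max_container_mb(node_mb, 16))           # 16384 (16 GB)
    print(within_core_budget(20480, node_mb, 16))  # False: 20 GB is too big
```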
Set container reuse to true: tez.am.container.reuse.enabled (default is true).
Prewarm containers when HiveServer2 starts; this is under the Hive configurations in Ambari.
Step 3 - Determine the Application Master and Container Java heap sizes 
(tez.am.launch.cmd-opts and hive.tez.java.opts, respectively).
By default, these are BOTH 80% of the container sizes, tez.am.resource.memory.mb and 
hive.tez.container.size, respectively.
NOTE: tez.am.launch.cmd-opts is set automatically, so there is no need to change it.
In HDP 2.3 and above, you also do not need to set hive.tez.java.opts, because the 
heap can be set automatically, controlled by a new property, 
tez.container.max.java.heap.fraction, which defaults to 0.8 in tez-site.xml.

This property is not exposed in Ambari by default. If you wish, you can add it 
under Custom tez-site.
If you wish to make the heap 75% of the container, set the Tez Container 
Java Heap Fraction (tez.container.max.java.heap.fraction) to 0.75.
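The fraction-based heap size is roughly container size times the fraction. This minimal sketch approximates what tez.container.max.java.heap.fraction controls; the exact rounding Tez applies internally is an assumption here (the sketch truncates to whole MB), and the helper names are hypothetical.

```python
# Minimal sketch: derive a heap size from a container size and a heap fraction,
# approximating what tez.container.max.java.heap.fraction controls. Truncating
# to whole MB is an assumption, not necessarily Tez's exact rounding.


def heap_mb(container_mb: int, fraction: float = 0.8) -> int:
    """Heap size in MB for a container of container_mb at the given fraction."""
    if not 0 < fraction < 1:
        raise ValueError("heap fraction must be between 0 and 1")
    return int(container_mb * fraction)


def xmx_option(container_mb: int, fraction: float = 0.8) -> str:
    """Format the heap size as a JVM -Xmx option."""
    return f"-Xmx{heap_mb(container_mb, fraction)}m"


if __name__ == "__main__":
    print(xmx_option(8192))        # -Xmx6553m  (default fraction 0.8)
    print(xmx_option(8192, 0.75))  # -Xmx6144m
```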
If you prefer to set it manually, you can add it to hive.tez.java.opts, for example 
-Xmx7500m -Xms7500m, as long as the heap is a fraction of hive.tez.container.size.
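When setting hive.tez.java.opts by hand, two easy mistakes are a heap that is not smaller than the container and a space between the flag and its size (the JVM expects -Xms7500m, not -Xms 7500m). This is a minimal checking sketch, assuming only -Xmx/-Xms with an optional k/m/g suffix need inspecting; the function names are hypothetical and the 10240 MB container size is illustrative.

```python
import re
from typing import List, Optional

# Minimal sketch of a sanity check for a hand-written hive.tez.java.opts value.
# Assumption: only -Xmx/-Xms with an optional k/m/g suffix are inspected.
_UNIT_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def parse_size_mb(java_opts: str, flag: str) -> Optional[float]:
    """Return the size given to -<flag> in MB, or None if absent or malformed."""
    match = re.search(rf"-{flag}(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_TO_MB[match.group(2).lower()]


def check_java_opts(java_opts: str, container_mb: int) -> List[str]:
    """List problems with the heap options relative to hive.tez.container.size."""
    problems = []
    xmx = parse_size_mb(java_opts, "Xmx")
    if xmx is None:
        problems.append("missing or malformed -Xmx")
    elif xmx >= container_mb:
        problems.append("-Xmx must be a fraction of hive.tez.container.size")
    # The JVM expects the size directly after the flag, e.g. -Xms7500m.
    if "-Xms" in java_opts and parse_size_mb(java_opts, "Xms") is None:
        problems.append("malformed -Xms (no space allowed before the size)")
    return problems


if __name__ == "__main__":
    container = 10240  # hypothetical hive.tez.container.size in MB
    print(check_java_opts("-Xmx7500m -Xms7500m", container))   # []
    print(check_java_opts("-Xmx7500m -Xms 7500m", container))  # reports the -Xms problem
```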
Step 4 - Determine the Hive memory map join settings.
tez.runtime.io.sort.mb is the memory used when the output needs to be sorted.
tez.runtime.unordered.output.buffer.size-mb is the memory used when the output 
does not need to be sorted.
hive.auto.convert.join.noconditionaltask.size is a very important parameter 
for sizing the memory used to perform map joins. You want to perform map joins 
as much as possible.
In Ambari, this is under the Hive configuration.
SET tez.runtime.io.sort.mb to 40% of hive.tez.container.size.
You should rarely set it above 2 GB.
By default, hive.auto.convert.join.noconditionaltask = true.
SET hive.auto.convert.join.noconditionaltask.size to 1/3 of hive.tez.container.size 
(note that this value is in bytes, while hive.tez.container.size is in MB).
SET tez.runtime.unordered.output.buffer.size-mb to 10% of hive.tez.container.size.
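The Step 4 ratios can be computed from the container size. This is a minimal sketch: it treats the "rarely more than 2 GB" guidance for tez.runtime.io.sort.mb as a hard 2048 MB cap (an assumption), and converts the map join threshold to bytes because that property is expressed in bytes. The function name is hypothetical.

```python
# Minimal sketch of the Step 4 ratios for a given hive.tez.container.size (MB).
# Assumption: the "rarely more than 2 GB" guidance for tez.runtime.io.sort.mb
# is applied as a hard cap of 2048 MB.
# hive.auto.convert.join.noconditionaltask.size is expressed in bytes.

SORT_MB_CAP = 2048


def map_join_settings(container_mb: int) -> dict:
    """Return Step 4 settings derived from the Tez container size."""
    return {
        # 40% of the container, capped at 2 GB.
        "tez.runtime.io.sort.mb": min(container_mb * 40 // 100, SORT_MB_CAP),
        # One third of the container, converted from MB to bytes.
        "hive.auto.convert.join.noconditionaltask.size": container_mb * 1024 * 1024 // 3,
        # 10% of the container.
        "tez.runtime.unordered.output.buffer.size-mb": container_mb * 10 // 100,
    }


if __name__ == "__main__":
    # Print in the SET form used in this post, for a 4096 MB container.
    for key, value in map_join_settings(4096).items():
        print(f"SET {key}={value};")
```

Integer arithmetic keeps the results as whole numbers, which is what these properties expect.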
The following parameters control the number of mappers for splittable formats with Tez:
set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split
Increase the min and max split sizes to reduce the number of mappers.
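Why larger split sizes mean fewer mappers: if every grouped split must fall between tez.grouping.min-size and tez.grouping.max-size, the mapper count for a given input is bounded by input/max and input/min. This minimal sketch computes only those bounds; Tez also weighs cluster capacity and data locality when grouping, which it deliberately ignores. The 100 GB input is hypothetical.

```python
# Minimal sketch: bounds on the number of mappers when each grouped split must
# be between tez.grouping.min-size and tez.grouping.max-size. Tez's real
# grouping also considers cluster capacity and locality, ignored here.

MB = 1024 * 1024
GB = 1024 * MB


def mapper_bounds(total_bytes: int, min_size: int = 16 * MB, max_size: int = GB) -> tuple:
    """Return (fewest, most) mappers allowed by the min/max group sizes."""
    fewest = max(1, -(-total_bytes // max_size))  # ceiling division
    most = max(1, total_bytes // min_size)
    return fewest, most


if __name__ == "__main__":
    data = 100 * GB  # hypothetical input size
    print(mapper_bounds(data))                                      # (100, 6400)
    print(mapper_bounds(data, min_size=128 * MB, max_size=2 * GB))  # (50, 800)
```

Raising both limits shrinks both bounds, which is the effect the tuning advice relies on.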