VCAP-DCA Objective 6.2 – Troubleshoot CPU and Memory Performance
This is my first pass at Objective 6.2 – Troubleshoot CPU and Memory Performance. I have covered most of the areas outlined in this objective. Like the other sections, this one is a work in progress, and I will add more detail as my studying continues.
• Identify resxtop/esxtop metrics related to memory and CPU
esxtop does not exist in vMA; you must use resxtop instead.
One limitation of resxtop is the lack of replay mode.
• Identify vCenter Server Performance Chart metrics related to memory and CPU
These metrics can be checked at the cluster or server level; the server level offers more granular options.
You must first have a firm understanding of memory terminology in the VMware world. This blog from Scott Sauer is a must-read if you are not familiar with the terms below.
- Transparent Page Sharing
- Memory Overcommitment
- Memory Overhead
- Memory Balloon Driver
With the ability to overcommit memory, you will want to make sure excessive swapping is not occurring at either the host or the virtual machine level.
Use memory reservations cautiously. Memory that is reserved cannot be used by another virtual machine that may need it.
Memory ballooning relies on a driver installed in the guest with VMware Tools. Without VMware Tools, ballooning is unavailable, which can hurt performance on your server.
There are many counters that can be added and checked, including core usage and reservations. Again, reservations should be used cautiously.
Watch out for virtual machines that consistently use a large percentage of CPU resources. A typical server is idle most of the time, so check whether something out of the ordinary is occurring. If the workload genuinely needs those resources, consider allocating another vCPU.
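High CPU ready times are a dead giveaway that virtual machines are waiting for physical CPU time, which usually points to another issue such as too many vCPUs for the host or a CPU limit.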
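The vCenter charts report CPU ready as a summation in milliseconds, while esxtop shows a percentage, so it helps to convert one to the other before comparing against a threshold. The sketch below assumes the commonly cited conversion (ready milliseconds divided by the sample interval in milliseconds) and a 20 second interval for real-time charts; the function name is my own, and you should check the interval for the chart you are actually viewing.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready summation (ms) into a percentage of the sample interval.

    Assumption: interval_s is the chart's sample interval; 20 s is the value
    usually quoted for real-time charts.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # 2000 ms of ready time within a 20 s sample
    print(cpu_ready_percent(2000))
```

For a virtual machine with several vCPUs the charted value may be summed across them, so a per-vCPU figure would need dividing by the vCPU count as well.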
Virtual machines that have multiple CPUs installed but the incorrect HAL will not benefit from the extra CPUs.
Skills and Abilities
• Troubleshoot ESX/ESXi Host and Virtual Machine CPU performance issues using appropriate metrics
From Duncan Epping's blog, these are four commonly used values to check when troubleshooting CPU performance. His blog entry is updated over time based on community feedback, so read the comments there and check whether any of these thresholds have changed. Ultimately, performance is relative to the environment, so some of this may not always apply.
|Display||Metric||Threshold||Explanation|
|CPU||%RDY||10||Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason’s explanation for vSMP VMs|
|CPU||%CSTP||3||Excessive usage of vSMP. Decrease amount of vCPUs for this particular VM. This should lead to increased scheduling opportunities.|
|CPU||%MLMTD||0||If larger than 0 the world is being throttled. Possible cause: Limit on CPU.|
|CPU||%SWPWT||5||VM waiting on swapped pages to be read from disk. Possible cause: Memory overcommitment.|
If you want to view historical data in a GUI, you can use esxplot or Windows Perfmon to interpret it. To gather the data, run esxtop in batch mode as shown below, replacing the placeholders with your own values.
esxtop -b -d <delay in seconds> -n <iterations> > capturefile.csv
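If you would rather script a quick check than load the capture into a GUI, the batch output can be scanned directly. This is only a rough sketch: it assumes the capture is a perfmon-style CSV whose first row holds counter paths ending in "% Ready", and the host and VM names in the sample are made up. The 10 percent threshold is the %RDY value from the table above.

```python
import csv
import io

READY_THRESHOLD = 10.0  # %RDY threshold taken from the table above


def flag_high_ready(csv_text, threshold=READY_THRESHOLD):
    """Return {column name: peak value} for '% Ready' columns that exceed threshold."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Assumption: ready-time columns end in "% Ready" (perfmon-style counter paths).
    ready_cols = [i for i, name in enumerate(header) if name.endswith("% Ready")]
    peaks = {}
    for row in reader:
        for i in ready_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or truncated samples
            peaks[header[i]] = max(peaks.get(header[i], 0.0), value)
    return {name: peak for name, peak in peaks.items() if peak > threshold}


def build_sample():
    """Build a tiny made-up capture; host and VM names are hypothetical."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([
        "(PDH-CSV 4.0) (UTC)(0)",
        r"\\esx01\Group Cpu(1001:web01)\% Ready",
        r"\\esx01\Group Cpu(1002:db01)\% Ready",
    ])
    writer.writerow(["01/01/2011 10:00:00", "2.5", "12.0"])
    writer.writerow(["01/01/2011 10:00:05", "3.1", "15.5"])
    return out.getvalue()


if __name__ == "__main__":
    for name, peak in flag_high_ready(build_sample()).items():
        print(name, peak)
```

To run it against a real capture, read capturefile.csv into a string and pass that to flag_high_ready instead of the sample. Real captures can contain thousands of columns, so expect this to be slow on long runs.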
• Troubleshoot ESX/ESXi Host and Virtual Machine memory performance issues using appropriate metrics
Again from Duncan Epping's blog, here are five commonly used values to check when troubleshooting memory performance. The same caveats apply as above.
|Display||Metric||Threshold||Explanation|
|MEM||MCTLSZ (I)||1||If larger than 0 host is forcing VMs to inflate balloon driver to reclaim memory as host is overcommitted.|
|MEM||SWCUR (J)||1||If larger than 0 host has swapped memory pages in the past. Possible cause: Overcommitment.|
|MEM||SWR/s (J)||1||If larger than 0 host is actively reading from swap (vswp). Possible cause: Excessive memory overcommitment.|
|MEM||SWW/s (J)||1||If larger than 0 host is actively writing to swap (vswp). Possible cause: Excessive memory overcommitment.|
|MEM||N%L (F)||80||If less than 80 VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”.|
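The memory thresholds above are simple enough to express as a small lookup, which can be handy when checking a set of values copied out of esxtop. This is an illustrative sketch only: the function name and the dictionary layout are my own, and the thresholds are just the ones from the table, so adjust them to your environment.

```python
# (counter, comparison, threshold, meaning) taken from the table above
MEMORY_CHECKS = [
    ("MCTLSZ", "gt", 0, "balloon driver inflated, host is reclaiming memory"),
    ("SWCUR", "gt", 0, "host has swapped memory pages in the past"),
    ("SWR/s", "gt", 0, "host is actively reading from swap"),
    ("SWW/s", "gt", 0, "host is actively writing to swap"),
    ("N%L", "lt", 80, "poor NUMA locality"),
]


def check_memory(sample):
    """Return warning strings for one VM's counters; missing counters are skipped."""
    warnings = []
    for counter, op, threshold, meaning in MEMORY_CHECKS:
        if counter not in sample:
            continue
        value = sample[counter]
        breached = value > threshold if op == "gt" else value < threshold
        if breached:
            warnings.append(f"{counter}={value}: {meaning}")
    return warnings


if __name__ == "__main__":
    # Hypothetical values for a single VM
    for line in check_memory({"MCTLSZ": 512, "SWCUR": 0, "N%L": 60}):
        print(line)
```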
• Use Hot‐Add functionality to resolve identified Virtual Machine CPU and memory performance issues
A couple of good blogs by David Davis and Jason Boche outline what Hot-Add/Hot-Plug is and how to use it. The ability to use it without rebooting the guest virtual machine is extremely limited. On the Microsoft side, Windows Server 2008 Datacenter is necessary to support both features without a reboot, while Windows Server 2008 Enterprise edition does not require a reboot for hot-adding memory. When it comes to removing either hot-added memory or hot-plugged CPUs, a reboot is required for all Windows guest operating systems.
Other relevant blogs and websites related to this section
http://communities.vmware.com/docs/DOC-10352
http://communities.vmware.com/docs/DOC-11812
http://www.boche.net/blog/index.php/2009/01/28/esxtop-drilldown/
http://www.vreference.com/public/vReference-esxtop1.2.pdf
http://labs.vmware.com/flings/esxplot
http://www.simonlong.co.uk/blog/2010/03/24/using-esxtop-with-vmware-esxi/