The Health Group

There are at least eleven predefined health groups in ASWM. Five are hardware-dependent: MB Fans, MB Temperatures, MB Voltages, Backplane, and Power. The other six are hardware-independent: Disks, Drives, Network, Memory, CPU, and RAID (software RAID, Linux only). Each group is explained in more detail below. First, this section describes the general way to check these groups and their sensors.

When you click a health group's title on the health panel, the information panel displays all the sensors associated with that group, arranged in a single column from top to bottom. How a sensor is displayed depends on the type of its value. As mentioned earlier, a sensor value can be numerical or descriptive, so a sensor is displayed either as a numerical sensor or as a descriptive sensor.

A numerical sensor has the following elements: a sensor value, the value's unit, four optional threshold values (fatal-high, fatal-low, warning-high, and warning-low), a maximum value, a minimum value, an average value, and an optional set of properties affiliated with the sensor. A meter graphically shows the relationship between some of these elements.


Fig. A numerical sensor that has no affiliated properties

Fig. A numerical sensor that has affiliated properties
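The elements of a numerical sensor listed above can be modeled as a small data structure. The following is a minimal sketch, not ASWM's actual code: the class name NumericSensor and its status() method are hypothetical, and it assumes that a value equal to a threshold counts as crossing it and that fatal limits take precedence over warning limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericSensor:
    """Hypothetical model of a numerical sensor with optional thresholds."""
    name: str
    value: float
    unit: str
    fatal_high: Optional[float] = None
    warning_high: Optional[float] = None
    warning_low: Optional[float] = None
    fatal_low: Optional[float] = None

    def status(self) -> str:
        """Return 'fatal', 'warning' or 'normal'; unset thresholds are skipped."""
        v = self.value
        # Assumption: fatal limits are checked first so they take precedence.
        if self.fatal_high is not None and v >= self.fatal_high:
            return "fatal"
        if self.fatal_low is not None and v <= self.fatal_low:
            return "fatal"
        if self.warning_high is not None and v >= self.warning_high:
            return "warning"
        if self.warning_low is not None and v <= self.warning_low:
            return "warning"
        return "normal"
```

For example, a temperature sensor reading 72 with warning-high 70 and fatal-high 85 reports "warning", while a fan sensor, which only has low thresholds, ignores the high checks entirely.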

Each numerical sensor has two or three round buttons above the sensor's name. The first is the property button (present only for sensors that have affiliated properties), the next is the threshold values button, and the last is the real-time chart button. Clicking the property button expands the sensor's property section. Clicking the threshold values button expands the section for setting that sensor's threshold values. Clicking the real-time chart button switches the information panel to the real-time chart panel, which begins plotting that sensor's value in a coordinate system.


Fig. A setting section of threshold values is expanded.

Fig. A property section is expanded.

To change threshold values, fill in the preferred values in the appropriate text boxes of the threshold setting section, and then click the "Apply" button. If ASWM successfully transmits the changes to the ASWM agent on that system, the section collapses. The new threshold values appear on the sensor's meter only after the health panel next retrieves data from the ASWM agent.

To reset the minimum, maximum, and average statistics of a sensor's value, click the round button on the right side below the group's title in the information panel.
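The minimum, maximum, and average values and their reset can be kept as running statistics. This is a minimal sketch under the assumption that the average is a plain arithmetic mean of all readings since the last reset; the class SensorStatistics is hypothetical and not part of ASWM.

```python
class SensorStatistics:
    """Hypothetical running min/max/average for one sensor, with a reset."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Forget all previous readings, as the reset button is described to do.
        self.minimum = None
        self.maximum = None
        self._total = 0.0
        self._count = 0

    def add(self, value: float) -> None:
        # Fold a newly retrieved sensor reading into the statistics.
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        self._total += value
        self._count += 1

    @property
    def average(self):
        # None until at least one reading has been recorded.
        return self._total / self._count if self._count else None
```

Keeping only a running total and a count means the statistics take constant memory no matter how long the agent has been collecting readings.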

A descriptive sensor can take a variety of appearances; descriptive sensors in different health groups look different.


Fig. A descriptive sensor in the Backplane group

Fig. A descriptive sensor in the Disks group

Fig. A descriptive sensor in the RAID group

A descriptive sensor has a sensor value and an optional set of properties affiliated with the sensor. The sensor value can always be translated into one or more descriptions or statements of the sensor's status.

MB Fans

This group contains all fan speed sensors on the system except those for fans located on the backplane. A fan speed sensor always has warning-low and fatal-low threshold values, and its unit is always RPM (revolutions per minute). Note that these threshold values cannot be set arbitrarily low: the hardware monitor can only detect fan speeds above a limit imposed by its circuitry, so fan thresholds have a lower bound. An attempt to set an unacceptable value will be rejected.
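The rejection of unacceptable fan thresholds can be pictured as a validation step. This is a minimal sketch, not ASWM's implementation: the function validate_fan_thresholds and the min_detectable_rpm parameter are hypothetical, since the real limit depends on the monitoring circuit, and the check that warning-low is not below fatal-low is an added assumption.

```python
def validate_fan_thresholds(warning_low: int, fatal_low: int,
                            min_detectable_rpm: int) -> None:
    """Raise ValueError for fan thresholds the hardware could never observe.

    min_detectable_rpm is a hypothetical per-board limit.
    """
    for label, rpm in (("warning-low", warning_low), ("fatal-low", fatal_low)):
        if rpm < min_detectable_rpm:
            raise ValueError(
                f"{label} threshold {rpm} RPM is below the detectable "
                f"minimum of {min_detectable_rpm} RPM"
            )
    # Assumption: a slowing fan should trip the warning before the fatal level.
    if warning_low < fatal_low:
        raise ValueError("warning-low must not be lower than fatal-low")
```

A threshold below the detectable minimum could never be reached by a reading, which is why refusing it up front is more useful than silently accepting it.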

If a new fan is installed, whether the system is powered on or off, its sensor is automatically detected and added to the health group. Conversely, if a fan is removed, the hardware monitor cannot tell whether the fan is broken or removed, so an alert is always raised. To resolve this ambiguity, use "rescan sensors" on the agent's configuration page, which is explained later.


Fig. The MB Fans Group

MB Temperatures

This group contains all thermal sensors on the system except those located on the backplane. A thermal sensor always has warning-low, fatal-low, warning-high, and fatal-high threshold values, and its unit is always degrees Celsius.


Fig. The MB Temperatures Group

MB Voltages

This group contains all voltage sensors on the system except those located on the backplane. The number and type of sensors may vary between system models. A voltage sensor always has warning-low, fatal-low, warning-high, and fatal-high threshold values, and its unit is always mV (millivolts).


Fig. The MB Voltages Group

Backplane

This group contains all sensors on the backplane, including the drive bays' power status, drive presence, and the backplane's voltages and temperature. The number and type of sensors may vary between backplane models. The top left of the information panel shows an image of the backplane's circuit board, and the top right shows a picture of the backplane's drive bays that reflects changes in real time. When a tray holding a drive is pulled out of a bay, an animated image bounces at that bay's position in the picture. When a tray holding a drive is pushed into an empty bay, a tray image immediately appears at that bay's position.

When the status of any bay changes (a tray is pulled out or pushed in), a rectangular button called "remap" appears above the Drive Bay 1 title. The remap button lets you confirm the changes to the bays. Applying the "remap" action silences the "existence of hard drive in a bay" and "power status of a bay" sensors, and updates the recorded drive presence and bay power status to reflect the current state.


Fig. The Backplane Group
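The remap behavior described above amounts to comparing the current bay state with the last confirmed state and then accepting the current state as the new baseline. This is a minimal sketch of that idea under stated assumptions; the class BayTracker and its methods are hypothetical and only track drive presence, not bay power status.

```python
from typing import Dict, List


class BayTracker:
    """Hypothetical sketch of the "remap" idea for backplane drive bays.

    baseline maps bay number -> whether a drive was present when the state
    was last confirmed; current holds the latest readings.
    """

    def __init__(self, baseline: Dict[int, bool]) -> None:
        self.baseline = dict(baseline)
        self.current = dict(baseline)

    def update(self, bay: int, present: bool) -> None:
        # A tray was pulled out (False) or pushed in (True).
        self.current[bay] = present

    def changed_bays(self) -> List[int]:
        # Bays whose state differs from the confirmed baseline; while this
        # list is non-empty, the remap button would be shown.
        return sorted(b for b, p in self.current.items()
                      if self.baseline.get(b) != p)

    def remap(self) -> None:
        # Accept the current state as the new baseline, silencing the alerts.
        self.baseline = dict(self.current)
```

Separating the confirmed baseline from the live readings is what allows a deliberate drive swap to be acknowledged once instead of alerting on every poll.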

Power

This group contains all sensors on the system's redundant power supplies. Each power supply may have AC or DC sensors. The number and type of sensors may vary between power supply models.

Disks

This group provides sub-groups containing information about all data storage devices, including hard drives, floppy disk drives, read-only storage (e.g., CD-ROM, DVD-ROM), writable media storage (e.g., CD-R, CD-RW, DVD-R), and removable media storage (e.g., USB storage). On a Linux platform, only hard drive sub-groups are supported.

Each hard disk sub-group has at least one S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) sensor. Currently, S.M.A.R.T. is supported only for IDE and SCSI hard disks. S.M.A.R.T. is a drive technology through which a drive reports its own degradation, enabling the OS to warn the user of a potential failure.

This group also provides a rectangular button called "rescan". If the storage on the system has changed and your OS or the ASWM agent has not recognized the change, click this button to force a rescan.

Each sub-group may have the following properties: Model, Bus Type, Firmware Revision, Serial Number, Total Capacity, Bus ID, Target ID, LUN ID, and Drive Dependence. The "Drive Dependence" shows what file systems are located on this disk.


Fig. The Disks Group

Drives

This group contains sensors that measure the utilization of each active file system. An active file system is a partition that has been assigned a drive letter on a Windows platform, or a device mounted on a directory on a Linux platform. The sensor value always ranges from 0 to 100 and its unit is percent (%): 0% means the file system is empty and 100% means it is full (out of space). A drive utilization sensor always has warning-high and fatal-high threshold values.

Each sensor has affiliated properties such as Volume, Volume Name, File System Type, Capacity, Free Space, and Disk Dependence. "Disk Dependence" shows the physical hard drive on which the file system is located. A single physical hard drive can hold multiple file systems.


Fig. The Drives Group
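The 0 to 100% drive utilization can be derived from the Capacity and Free Space properties. This is a minimal sketch assuming utilization is the used share of capacity; the helper names utilization_percent and mounted_utilization are hypothetical, while shutil.disk_usage is a standard Python call that works for both a Windows drive path and a Linux mount point.

```python
import shutil


def utilization_percent(capacity: int, free: int) -> float:
    """Share of a file system in use: 0 means empty, 100 means full."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    used = capacity - free
    # Clamp to the 0-100 range the sensor is documented to report.
    return max(0.0, min(100.0, used * 100.0 / capacity))


def mounted_utilization(path: str) -> float:
    # shutil.disk_usage returns (total, used, free) in bytes for the
    # file system that contains `path`.
    usage = shutil.disk_usage(path)
    return utilization_percent(usage.total, usage.free)
```

For instance, a 100 GB volume with 25 GB free is 75% utilized.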

Network

This group contains sensors that measure the traffic utilization of each network device. The bandwidth usage (the number of input and output octets) is collected and compared with the transmission speed of the network device. Because the duplex mode of a network card cannot be detected reliably, full duplex is assumed in the calculation. If a network card works in half-duplex mode, the real utilization should be half of the reported value. A network utilization sensor always has warning-high and fatal-high threshold values. The loopback device is not monitored.

Each sensor has affiliated properties: Name, Speed, Discarded Packets, Error Packets, Input Octets, and Output Octets.


Fig. The Network Group
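This manual does not give ASWM's exact utilization formula, so the following is only one plausible reading under stated assumptions: the Input Octets and Output Octets properties are cumulative counters that wrap at 32 bits, and "full duplex assumed" means each direction is measured against the full link speed, with the busier direction reported. The function names are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # assumption: octet counters wrap at 32 bits


def octet_delta(previous: int, current: int,
                modulus: int = COUNTER32_MODULUS) -> int:
    """Octets transferred between two counter samples, allowing one wrap."""
    return (current - previous) % modulus


def network_utilization_percent(prev_in: int, cur_in: int,
                                prev_out: int, cur_out: int,
                                seconds: float, speed_bps: float) -> float:
    """Assumed full-duplex utilization: each direction is compared with
    the full link speed and the busier direction is reported."""
    in_bps = octet_delta(prev_in, cur_in) * 8 / seconds
    out_bps = octet_delta(prev_out, cur_out) * 8 / seconds
    return 100.0 * max(in_bps, out_bps) / speed_bps
```

On a 100 Mbit/s card sampled 10 seconds apart, 62,500,000 input octets correspond to 50 Mbit/s, or 50% under these assumptions.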

Memory

This group contains three kinds of memory usage sensors: physical memory utilization (%), page file utilization (%), and page faults per second (Hz). The first two are self-explanatory. The last one measures how often page faults occur. A page fault here means that data previously swapped out to secondary storage, such as a disk, must be swapped back into physical memory before it can be used. A heavily loaded system (one with too little physical memory or too many running tasks) usually has a high page fault rate. This is an important indicator of system performance, so you should keep an eye on it and tune the system to gain more computing capacity. A memory utilization sensor always has warning-high and fatal-high threshold values.

The group has affiliated properties: Total Physical Memory, Available Physical Memory, Total Page File Size, Available Page File Size, Total Virtual Memory, and Available Virtual Memory.


Fig. The Memory Group
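The memory sensors can be derived from the group's Total/Available property pairs and from a cumulative page fault counter. This is a minimal sketch under those assumptions; the function names are hypothetical. On Linux, for example, /proc/vmstat exposes such a cumulative counter as pgmajfault, but how ASWM obtains the value is not stated here.

```python
def percent_used(total: int, available: int) -> float:
    """Utilization (%) from a Total/Available pair, e.g. physical memory
    or page file size."""
    return 100.0 * (total - available) / total


def page_faults_per_second(prev_count: int, cur_count: int,
                           seconds: float) -> float:
    """Page fault rate (Hz) from two readings of a cumulative counter."""
    return (cur_count - prev_count) / seconds
```

With 8192 MB of physical memory and 2048 MB available, physical memory utilization is 75%.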

CPU

This group contains sensors that measure the load percentage of each CPU. In a multi-processor system, the sensors correspond to logical CPUs if the system supports Hyper-Threading Technology (e.g., Intel Pentium 4 Xeon, Intel Pentium 4 533MHz FSB 3.06GHz, Intel Pentium 4 800MHz FSB 3.2GHz, 3.0GHz, 2.8C GHz, 2.6C GHz, and 2.4C GHz); otherwise, they correspond to physical CPUs.

In addition, there is an overall CPU utilization sensor that reflects the combined load of all CPUs. A CPU utilization sensor always has warning-high and fatal-high threshold values.

This group has the affiliated properties Total Physical Processors, Total Logical Processors, and Threshold Event for CPU Utilization. The first two are self-explanatory, but the last one is more important: it lets you stop ASWM from generating events when the utilization exceeds the threshold values. This is useful if your system is constantly under heavy load.


Fig. The CPU Group
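The overall CPU sensor and the Threshold Event for CPU Utilization property can be sketched as follows. This assumes, without confirmation from this manual, that the overall value is the mean of the per-CPU percentages (so it stays within 0 to 100%) and that a value equal to a threshold counts as exceeding it; both function names are hypothetical.

```python
from typing import List, Optional


def overall_utilization(per_cpu: List[float]) -> float:
    """Assumed overall load: the mean of the per-CPU percentages."""
    return sum(per_cpu) / len(per_cpu)


def threshold_event(value: float, warning_high: float, fatal_high: float,
                    events_enabled: bool) -> Optional[str]:
    """Return the event to raise, or None if none is due or events are off."""
    if not events_enabled:
        # Mirrors turning off "Threshold Event for CPU Utilization".
        return None
    if value >= fatal_high:
        return "fatal"
    if value >= warning_high:
        return "warning"
    return None
```

Suppressing the event rather than the measurement keeps the sensor value visible on the meter while avoiding a flood of events on a machine that is busy by design.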

RAID

This group contains the sensors that detect RAID status. Currently, it is supported only for software RAID on a Linux platform. A RAID status sensor, like a disk S.M.A.R.T. sensor, is a descriptive sensor.

Each sensor has the affiliated properties: RAID name, RAID type, RAID Configuration, Total Disks, Active Disks, Working Disks, Failed Disks, Spare Disks, Members, and Partition Members.


Fig. The RAID Group
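A descriptive RAID sensor turns raw status into readable statements. This manual does not say how ASWM reads software RAID status; on Linux, the md driver reports it in /proc/mdstat with a "[total/active] [UU_]" marker, where "_" marks a missing member. The sketch below assumes that format, and the function describe_md_status is hypothetical.

```python
import re

# Matches the "[total/active] [UU_]" part of a /proc/mdstat status line.
_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def describe_md_status(line: str) -> str:
    """Translate one md status line into a short description."""
    m = _STATUS.search(line)
    if m is None:
        return "unknown"
    total, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
    if active == total and "_" not in flags:
        return f"optimal ({active}/{total} disks active)"
    # Positions in the flag string whose member is missing or failed.
    missing = [i for i, f in enumerate(flags) if f == "_"]
    return f"degraded ({active}/{total} disks active, missing positions {missing})"
```

The same raw marker could also feed properties such as Total Disks and Active Disks, so the description and the property list stay consistent.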