Infinispan Cache Metrics

From NovaOrdis Knowledge Base
Revision as of 15:27, 3 November 2016 by Ovidiu (talk | contribs) (→‎TODO)

Internal

Overview

Each individual Infinispan cache exposes a number of performance metrics, and the cache container managing those caches aggregates some of those metrics at container level. These metrics are described below.

Enabling Statistics

Enabling Cache Statistics

Resetting Statistics

It is possible to reset statistics for an individual cache using the :reset-statistics CLI operation, applied to the management model node corresponding to that cache.

Cache Status

An individual cache instance exposes its status as a String (example: "RUNNING"), over JMX ("cacheStatus") and CLI ("cache-status").

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Time Statistics

The cache statistics mechanism maintains both the number of seconds since the cache was started (elapsed time) and the number of seconds since the cache statistics were last reset (time since reset).
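The relationship between the two values can be illustrated with a minimal sketch. The StatsClock class below is hypothetical and is not part of Infinispan; it only assumes that both values are reported in whole seconds and that a reset moves the "time since reset" origin without touching the start time.

```python
import time


class StatsClock:
    """Hypothetical helper: tracks elapsed time and time since reset, in seconds."""

    def __init__(self, now=time.monotonic):
        self._now = now                      # injectable clock, useful for testing
        self._started_at = now()             # fixed when the cache starts
        self._reset_at = self._started_at    # moved forward on every reset

    def reset(self):
        # Equivalent in spirit to resetting the statistics: only the
        # "time since reset" origin changes.
        self._reset_at = self._now()

    def elapsed_time(self):
        return int(self._now() - self._started_at)

    def time_since_reset(self):
        return int(self._now() - self._reset_at)
```

Until the first reset, both values are identical; after a reset, elapsed time keeps growing while time since reset restarts from zero.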

Read, Write and Remove Count

These statistics are maintained both at individual cache level and at container level, for a specific node.

The number of reads can be calculated by adding the number of hits and misses.

The number of writes is maintained individually as stores.

The number of deletions from the cache can be calculated by adding removeHits and removeMisses.

It is preferable to expose these metrics as a rate per second. Some monitoring solutions can calculate the rate automatically from the raw counter values (see Datadog counters).
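When the monitoring solution does not derive rates by itself, they can be computed from two successive samples of the counters. The sketch below is only an illustration: the CacheCounters type and its field names are made up for this example, and treating a negative delta as "statistics were reset between samples" is an assumption, not documented Infinispan behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCounters:
    """One sample of the cumulative counters (illustrative field names)."""
    timestamp: float   # seconds, taken from any monotonic clock
    hits: int
    misses: int
    stores: int
    remove_hits: int
    remove_misses: int

    @property
    def reads(self) -> int:
        # reads = hits + misses
        return self.hits + self.misses

    @property
    def removes(self) -> int:
        # removes = removeHits + removeMisses
        return self.remove_hits + self.remove_misses


def rates_per_second(previous: CacheCounters, current: CacheCounters) -> dict:
    """Convert two counter samples into per-second rates."""
    interval = current.timestamp - previous.timestamp
    if interval <= 0:
        raise ValueError("samples must be ordered in time")
    pairs = {
        "reads": (previous.reads, current.reads),
        "writes": (previous.stores, current.stores),
        "removes": (previous.removes, current.removes),
    }
    rates = {}
    for name, (before, after) in pairs.items():
        delta = after - before
        if delta < 0:
            # Assumption: the counter went backwards because statistics were
            # reset, so the current value is the count since the reset.
            delta = after
        rates[name] = delta / interval
    return rates
```

Sampling at a fixed interval (for example every 10 seconds) and publishing the resulting rates keeps the graphs comparable across caches with very different uptimes.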

The container also maintains derivative values, such as readWriteRatio, calculated as (hits+misses)/stores, and hitRatio, calculated as hits/(hits+misses).

Cluster-wide values for these metrics are available as 'clusterwide-hits', 'clusterwide-misses', 'clusterwide-hit-ratio' (the cluster-wide hit ratio for the cache: hits/(hits+misses)), 'clusterwide-read-write-ratio' (the cluster-wide read/write ratio of the cache: (hits+misses)/stores), 'clusterwide-remove-hits' and 'clusterwide-remove-misses'.
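The two ratios can be recomputed from the raw counters, which is useful when checking the values reported by the server. The functions below are a sketch written for this page, not Infinispan code; returning None when the denominator is zero is an assumption made here to avoid a division by zero.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """hits / (hits + misses), as a fraction between 0 and 1."""
    reads = hits + misses
    if reads == 0:
        return None  # assumption: no reads yet, so the ratio is undefined
    return hits / reads


def read_write_ratio(hits: int, misses: int, stores: int) -> Optional[float]:
    """(hits + misses) / stores."""
    if stores == 0:
        return None  # assumption: no writes yet, so the ratio is undefined
    return (hits + misses) / stores
```

For example, 75 hits, 25 misses and 50 stores give a hit ratio of 0.75 and a read/write ratio of 2.0. Multiply the hit ratio by 100 where a percentage is expected.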

Individual Cache

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Cache Container

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Average Read, Write and Remove Time

These metrics represent the average response time of an individual cache read, write or remove operation, in milliseconds. For reads, the value includes both hits and misses. May return null if the cache is not started. Maintained as a long. The metric always retains its last value, even after the cache becomes idle. To reset it, you must reset the statistics of each underlying cache individually, with :reset-statistics.

The values for these metrics are also aggregated at container level, across all caches managed by the container. The container value is calculated by averaging the corresponding values of the individual caches it manages.
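The container-level aggregation can be pictured as a plain mean of the per-cache values. The sketch below rests on several assumptions that are not confirmed by the text above: caches that report null are skipped, the mean is not weighted by operation count, and the result is truncated to a whole number of milliseconds because the metric is a long.

```python
from typing import Iterable, Optional


def container_average_time(per_cache_averages: Iterable[Optional[int]]) -> Optional[int]:
    """Average the per-cache average times (milliseconds) into one container value."""
    # Assumption: a cache that is not started reports None and is ignored.
    values = [v for v in per_cache_averages if v is not None]
    if not values:
        return None
    # Unweighted mean, truncated to an integer number of milliseconds.
    return sum(values) // len(values)
```

An unweighted mean gives a mostly idle cache the same influence as a busy one, so the container value should be read as a rough indicator rather than as the true average operation time on the node.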

Individual Cache

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Cache Container

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Number of Entries

Cache Number of Entries

Evictions

The number of evictions is reported both at individual cache level and at container level. It is a long representing the number of cache eviction operations for this specific node. May return null if the cache is not started.

For more details about cache eviction see Infinispan Eviction.

Individual Cache

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Cache Container

JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Invalidations

A long representing the number of cache invalidations. May return null if the cache is not started.

For more details about cache invalidation see Invalidation Mode.

Passivations and Activations

activations is a long representing the number of cache node activation events (bringing a node into memory from a cache store). May return null if the cache is not started. For more details about cache activation see Cache Store Activation.

passivations is a long representing the number of cache node passivation events (writing an entry from memory into a cache store). May return null if the cache is not started. For more details about cache passivation see Cache Store Passivation.

Clustering Performance Statistics

For replicated and distributed caches, RpcManager instances maintain statistics on the number and the quality of internal cluster calls. In this context, "replication" in the metric names means the remote invocation of a method. For example, a put operation on a distributed cache with numOwners=2 causes a single remote invocation, to create the backup copy.

replicationCount represents the number of successful remote invocations.

replicationFailures represents the number of remote invocations that failed.

successRatio is equal to:

             replicationCount
 ---------------------------------------- * 100
  replicationCount + replicationFailures

averageReplicationTime is a long representing the average time spent in the transport layer to duplicate data around the cluster, in milliseconds. It is calculated according to the following formula:

   total time taken by all replications
 ---------------------------------------- 
           replicationCount
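
The two formulas above translate directly into code. This is a sketch for illustration only: the function and parameter names are invented here, and returning None when there is nothing to divide by is an assumption, not the documented server behavior.

```python
from typing import Optional


def success_ratio(replication_count: int, replication_failures: int) -> Optional[float]:
    """replicationCount / (replicationCount + replicationFailures) * 100."""
    total = replication_count + replication_failures
    if total == 0:
        return None  # assumption: no remote invocations recorded yet
    return replication_count / total * 100


def average_replication_time(total_replication_time_ms: int,
                             replication_count: int) -> Optional[int]:
    """Total time taken by all replications / replicationCount, in milliseconds."""
    if replication_count == 0:
        return None  # assumption: no successful replications recorded yet
    # The metric is a long, so the result is truncated to whole milliseconds.
    return total_replication_time_ms // replication_count
```

For example, 3 successful invocations and 1 failure give a success ratio of 75.0, and 1200 ms spent on 300 replications give an average replication time of 4 ms.
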
JDG 6 JMX JDG 6 CLI
JDG 7 JMX JDG 7 CLI

Distributed Cache Metrics

TODO

capacity-factor: A read-write double that controls the proportion of entries that will reside on the local node, compared to the other nodes in the cluster. The value must be positive. This element is only used in 'distributed' cache instances. By default it is undefined, which corresponds to a logical value of 1.0.
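As a rough mental model, each node can be expected to hold a share of the entries proportional to its capacity factor. The function below is a simplified sketch under that assumption; the real distribution depends on how the consistent hash assigns segments, so actual numbers will only approximate these shares. Node names are hypothetical.

```python
from typing import Dict, Optional


def expected_entry_share(capacity_factors: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Approximate fraction of entries per node, proportional to capacity-factor."""
    # An undefined capacity-factor corresponds to a logical value of 1.0.
    effective = {node: 1.0 if cf is None else cf
                 for node, cf in capacity_factors.items()}
    for node, cf in effective.items():
        if cf <= 0:
            # The text above states that the value must be positive.
            raise ValueError(f"capacity-factor for {node} must be positive")
    total = sum(effective.values())
    return {node: cf / total for node, cf in effective.items()}
```

With three nodes configured as undefined, 2.0 and 1.0, the model predicts shares of 0.25, 0.5 and 0.25 respectively.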

virtual-nodes: deprecated

TODO

hit-ratio: A double representing the hit ratio for this node, hits/(hits+misses), where the number of successful attempts is divided by the total number of attempts. Expressed as a percentage. May return null if the cache is not started. Also see clusterwide-read-write-ratio.

read-write-ratio: A double representing the read/write ratio of the cache ((hits+misses)/stores) for this specific node. May return null if the cache is not started.

cache-loader-stores: The number (as long) of cache loader store operations, for this specific node. May return null if the cache is not started.

cache-loader-loads: The number (as long) of cache loader load operations, for this specific node. May return null if the cache is not started.

cache-loader-misses: The number (as long) of cache loader miss operations, for this specific node. May return null if the cache is not started.

prepares: A long representing the number of transaction prepares since the last reset. May return null if the cache is not started.

commits: A long representing the number of transaction commits since the last reset. May return null if the cache is not started.

rollbacks: A long representing the number of transaction rollbacks since the last reset. May return null if the cache is not started.

number-of-locks-available: An integer representing the number of exclusive locks available to this cache. Maintained by LockManager.