DataBot User Manual

From NovaOrdis Knowledge Base

Overview

DataBot is a low-overhead O/S-level event collector that generates events-compatible events. It is designed to run indefinitely as a system daemon, and only one DataBot instance per VM is necessary. DataBot collects timed events and channels them to various destinations, such as files or the network. It can collect memory and CPU usage statistics, among others, as well as WildFly management domain model and JMX metrics.

Overview.png

Concepts

Metric Definition

Metric Source

Data Consumer

Installation

Download the stable release from

https://github.com/NovaOrdis/databot/releases

The release consists of a ZIP file with a name matching "databot-<version>.zip".

Unzip the release file in a conventional binary directory, such as /opt or /usr/local. A "databot-<version>" sub-directory will be created.

Add .../databot-<version>/bin to PATH.

DataBot needs a Java VM to run. It will attempt to use, in this order:

  1. Value of "DATABOT_JAVA_HOME" environment variable, if set.
  2. Value of "JAVA_HOME" environment variable, if set.
  3. The "java" executable found in PATH.
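
The lookup order above can be sketched as a small shell function. This is an illustration, not DataBot's actual launcher code: the function name resolve_java is hypothetical, and the sketch assumes the conventional JDK layout in which the executable is bin/java under the Java home directory.

```shell
# Sketch of the documented lookup order (hypothetical helper, not DataBot code).
# Assumption: the java executable lives at <java-home>/bin/java.
resolve_java() {
    if [ -n "${DATABOT_JAVA_HOME:-}" ]; then
        # 1. DATABOT_JAVA_HOME wins when set
        printf '%s\n' "${DATABOT_JAVA_HOME}/bin/java"
    elif [ -n "${JAVA_HOME:-}" ]; then
        # 2. JAVA_HOME is used next
        printf '%s\n' "${JAVA_HOME}/bin/java"
    else
        # 3. fall back to the "java" found in PATH (non-zero status if none)
        command -v java
    fi
}
```

Running resolve_java in the shell that will start DataBot is a quick way to see which Java VM that environment would most likely select.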

Upgrade

Unzip the release ZIP in the parent directory of the previous DataBot installation and then redirect the symbolic link from the previous release to the current one.

Assuming that DataBot 1.0 is installed in /opt/databot-1.0, the sequence to upgrade to DataBot 1.1 consists of the following steps:

cd /opt
unzip databot-1.1.zip

# remove the symbolic link to the previous release:
rm databot

# re-create the symbolic link to the new release:
ln -s ./databot-1.1 databot

Configuration File

Choose a directory to store the configuration file.

If the configuration is shared by multiple users and is used by just one DataBot instance on the system, /etc/databot is the recommended location. Otherwise, each user can maintain an individual configuration file in ~/.databot (recommended) or in a directory of their choosing. The location of the configuration file should be exposed as the value of the DATABOT_CONF environment variable in the environment of the user who executes DataBot. If DATABOT_CONF is not defined, DataBot attempts to read ~/.databot/databot.yaml.

The configuration file location can be overridden from the command line with the -c|--configuration= option. If this option is specified, the environment variable and the default location are ignored.

Regardless of how the configuration file is declared, DataBot will fail if the file is not found. For details on the configuration file syntax, see the Configuration section below.
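
The precedence described above can be sketched as follows. The function resolve_databot_conf is a hypothetical helper written for illustration, not part of DataBot, and it assumes that DATABOT_CONF holds the path of the configuration file itself.

```shell
# Sketch of the documented precedence (hypothetical helper, not DataBot code).
# $1 is the value passed with -c|--configuration=, or empty if the option was not used.
# Assumption: DATABOT_CONF contains the path of the configuration file itself.
resolve_databot_conf() {
    cli_value="${1:-}"
    if [ -n "$cli_value" ]; then
        conf="$cli_value"                       # command line overrides everything
    elif [ -n "${DATABOT_CONF:-}" ]; then
        conf="$DATABOT_CONF"                    # then the environment variable
    else
        conf="${HOME}/.databot/databot.yaml"    # then the default location
    fi
    if [ ! -f "$conf" ]; then
        # DataBot fails when the file is not found, however it was declared
        echo "configuration file not found: $conf" >&2
        return 1
    fi
    printf '%s\n' "$conf"
}
```

Calling resolve_databot_conf "" shows which file would be picked up when no command line option is given.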

Complete any of the target-specific configuration procedures, if they apply.

Usage

databot [status|stop]

If a DataBot process is already running in the background, an attempt to start another DataBot instance will fail.

To start an instance that runs in the foreground, use the -f|--foreground command line option. In foreground mode, the output is switched automatically from the configured file destination to /dev/stdout, and the output.file configuration, described below, is ignored.

Commands

help

Display in-line help.

version

Display version information.

status

Display whether a background DataBot process already runs on the system. If a process is found running, the command provides more information about it (such as the PID).

stop

Stop the background DataBot process, if running.

Options

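-c|--configuration

Specify the location of the configuration file. When this option is present, the DATABOT_CONF environment variable and the default location are ignored.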

-f|--foreground

Run the command in foreground and automatically switch the output from the configured file destination to /dev/stdout.

-v|--verbose

Turn on DEBUG logging to stdout.

-d|--debug

Start the JVM in debug mode, so it can be accessed by a debugger. It also turns on DEBUG logging.

Configuration

#
# DataBot configuration file
#

#
# sampling interval (in seconds)
#
sampling.interval: 20

#
# override embedded logging configuration
#
logging:
  file: /var/log/databot/databot.log
  loggers:
    - io.novaordis: INFO
    - io.novaordis.utilities.os: INFO

sources:

  local-jboss-instance:
    type: jboss-controller
    host: localhost
    port: 9999
    username: admin
    password: admin123
    classpath:
      - /Users/ovidiu/runtime/jboss-eap-6.4.15/bin/client/jboss-cli-client.jar

  remote-jboss-instance:
    type: jboss-controller
    host: other-host
    port: 10101
    username: admin
    password: something
    classpath:
      - /Users/ovidiu/runtime/jboss-eap-6.4.15/bin/client/jboss-cli-client.jar

  local-jboss-over-jmx:
    type: jmx
    host: localhost
    port: 9999
    classpath:
      - /Users/ovidiu/runtime/jboss-eap-6.4.15/bin/client/jboss-cli-client.jar

  remote-jboss-over-jmx:
    type: jmx
    host: some-jmx-host
    port: 4447
    username: admin
    password: something
    classpath:
      - /Users/ovidiu/runtime/jboss-eap-6.4.15/bin/client/jboss-cli-client.jar

#
# output configuration
#
output:
  file: /var/log/databot/databot.csv
  append: true

#
# metrics
#
metrics:
  - PhysicalMemoryFree
  - PhysicalMemoryTotal
  - CpuUserTime
  - CpuKernelTime
  - CpuIdleTime  
  - ${local-jboss-instance}/subsystem=messaging/hornetq-server=default/jms-queue=DLQ/message-count
  - jmx://admin:admin123@localhost:9999/jboss.as:subsystem=messaging,hornetq-server=default,jms-queue=DLQ/messageCount

Variable Support

The configuration file allows declaration and reference of variables.

Some configuration elements cause implicit variable declaration, and references to those variables can be used in subsequent configuration elements. A metric source is an example: declaring a metric source enables metric definitions to refer to that metric source's name as a variable. When the metric definition is parsed, the variable is evaluated to the metric source's address.

Example:

sources:
  some-jmx-source:
    type: jmx
    host: ...
    ...

...

metrics:
  - ${some-jmx-source}/some.domain:....

Environment variables can also be referred to from the configuration file.
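
The evaluation of a metric source variable can be pictured as a plain prefix substitution. The sketch below is only an illustration of that idea, not DataBot's parser: expand_source_ref is a hypothetical helper, and the address form jmx://host:port is an assumption modeled on the in-line JMX metric definition in the sample configuration.

```shell
# Hypothetical sketch: replace a leading ${name} reference with the source's address.
# $1 = metric definition, $2 = source name, $3 = source address (assumed format)
expand_source_ref() {
    definition="$1"
    name="$2"
    address="$3"
    ref="\${${name}}"                    # the literal text ${name}
    case "$definition" in
        "$ref"*)
            rest=${definition#"$ref"}    # strip the reference, keep the remainder
            printf '%s%s\n' "$address" "$rest"
            ;;
        *)
            printf '%s\n' "$definition"  # no reference: leave the definition as-is
            ;;
    esac
}
```

Under these assumptions, a definition that starts with ${some-jmx-source} and a source address of jmx://some-jmx-host:4447 expand to the same text as the equivalent in-line definition.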

Global Options

sampling.interval

Represents the interval, in seconds, between two successive readings. If not specified, the default value is 10 seconds.

If configured with 0, DataBot will read once and exit.
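
The effect of sampling.interval, including the special value 0, can be sketched as a simple loop. Both functions below are hypothetical stand-ins written for illustration: collect_sample merely prints a timestamp in place of a real reading.

```shell
# Hypothetical stand-in for one reading; a real reading would collect the metrics.
collect_sample() {
    date +%s
}

# Sketch of the sampling loop implied by sampling.interval (not DataBot code).
run_sampling() {
    interval="${1:-10}"          # documented default: 10 seconds
    while :; do
        collect_sample
        if [ "$interval" -eq 0 ]; then
            return 0             # 0 means: read once and exit
        fi
        sleep "$interval"
    done
}
```

With an argument of 0 the loop body runs exactly once, which mirrors the one-shot behavior described above. Any other value keeps sampling until the process is stopped.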

Sources

This section specifies configuration details, such as the address, for the metric sources to be queried for metrics.

The section is optional, as the metric sources can be specified in-line in the metric definition. However, when a large number of metric definitions are declared, it may become cumbersome to specify the full address of the source within each definition, so declaring it in the "sources" section and then referring to it by name is a better alternative.

Each source declared in the "sources" section must be named, and the names must be unique: if two sources are listed with the same name, only the second value is considered, the first one being overwritten by the second.

Data Consumers

"output" and "consumers" can both be used at the same time. "consumers" has built-in ordering, but "output" and "consumers" are map keys at the top level of the configuration file, and the YAML parser does not return the order in which map keys are specified. By convention, the "output" (if it exists) is therefore always placed at the top of the consumer list. In the future we may refactor this for consistency and make the "output" an element of the "consumers" list.

Stdout

The output is sent to /dev/stdout:

...
output: stdout
...

Output CSV File

...
output:
  file: /tmp/databot.csv
  append: true
...

output.file - the name of the output file. If not specified, the default value is /tmp/databot.csv. Note that if the --foreground (or -f) option is used, the output is forcibly sent to /dev/stdout, regardless of the value of the 'output.file' configuration parameter.

output.append - true/false. Indicates whether to append to an already existing output file or to overwrite the existing file. The default value is "true" (append); this configuration will allow accumulation of historical data. Every time DataBot is restarted in "append" mode, a new header line will be inserted in the file.
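
The difference between the two output.append modes can be sketched as follows. This is an illustration rather than DataBot's implementation: start_output is a hypothetical function, the header text is a placeholder because the real header format is not shown here, and writing a header in overwrite mode is an assumption.

```shell
# Hypothetical sketch of what happens to the output file each time DataBot starts.
# $1 = output file, $2 = value of output.append (documented default: true)
start_output() {
    file="$1"
    append="${2:-true}"
    if [ "$append" != "true" ]; then
        : > "$file"              # overwrite mode: start from an empty file
    fi
    # A header line is written on every start, so in append mode the file
    # accumulates one header line per restart (placeholder header text).
    printf '%s\n' "# placeholder header" >> "$file"
}
```

A practical consequence, if the sketch matches the real behavior, is that anything post-processing an appended CSV file should expect header lines in the middle of the data.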

Generic Consumers

...
consumers:
   - io.novaordis.SomeConsumer
...

As of 1.0.8, only fully qualified class names are supported. The classes must be available in the classpath and are instantiated by reflection.

Metrics

This section contains the definitions of the metrics to be collected.

metrics - comma-separated list of the definitions for the metrics to be collected from the system.

Example:

   metrics=PhysicalMemoryUsed,CpuUserTime,jboss:/subsystem=web/connector=http/bytesReceived

For a complete list of supported metrics, syntax details and extensive documentation, see https://kb.novaordis.com/index.php/DataBot_Metric_Reference

jboss.home - the path to a locally accessible JBoss instance. If it needs to monitor JBoss CLI metrics, DataBot must be configured to detect and use the libraries from a JBoss instance it has access to (it does not ship with the required JARs, as those may differ depending on the version of the target JBoss instance). To enable DataBot to build the classpath fragment, jboss.home must be specified in the configuration file.

Example

https://github.com/NovaOrdis/databot/blob/master/main/src/test/resources/data/configuration/reference.yaml

Logging

DataBot Process Logging

The location of the DataBot log is specified in the configuration file as logging.file, and the logging level of each logger is specified under logging.loggers. Valid levels are the log4j log levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL, ALL and OFF.

...
logging:
  file: /tmp/databot.log
  loggers:
    - io.novaordis.databot: DEBUG
...

If the configuration file does not specify logging configuration, the default logging level is INFO and the default logging file is "databot.log", placed in the same directory as the output data file. This is part of the base logging configuration, shipped as $DATABOT_HOME/lib/log4j.xml.

The command line flag "-v", if specified, will modify logging behavior until the configuration file is parsed and the new logging configuration is applied.

Also see:

Alternative log4j Configuration

Garbage Collection Logging

Garbage collection activity in the Java Virtual Machine running the DataBot agent is logged by default. The log file, named databot-gc.log, is placed in the same directory as the data file. The startup script reads the YAML configuration file and infers the location of the directory from the content of the output file configuration element. If the output file is not specified in the YAML configuration file, the garbage collection log is written to /tmp.
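
The directory inference described above amounts to taking the parent directory of the configured output file, with /tmp as the fallback. The function below is a hypothetical sketch of that rule, not the actual startup script, and it takes the output file path as an argument instead of parsing the YAML file.

```shell
# Hypothetical sketch of the garbage collection log directory rule.
# $1 = value of the output file configuration element, or empty if not specified
gc_log_dir() {
    output_file="${1:-}"
    if [ -n "$output_file" ]; then
        dirname "$output_file"   # same directory as the data file
    else
        printf '%s\n' "/tmp"     # fallback when no output file is configured
    fi
}
```

For the sample configuration, whose output file is /var/log/databot/databot.csv, this rule places the garbage collection log in /var/log/databot.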

Target-Specific Configuration Procedures

JBoss

In-Line Help

databot --help

Metric Reference

Metric Reference

Troubleshooting

DataBot can be configured to provide TRACE-level logging information by setting the "io.novaordis" logger to TRACE level in the "logging:" section of the configuration file, as follows:

logging:
  file: /tmp/databot.log
  loggers:
    - io.novaordis: TRACE

Note that this setting has the side effect of enabling TRACE logging in other libraries in use, some of which can be quite verbose. Netty is one such case. To turn off TRACE in other layers, use a configuration similar to:

logging:
  file: /tmp/databot.log
  loggers:
    - io.novaordis: TRACE
    - com: INFO
    - org: INFO