System monitoring controller
The service provides tools to monitor system parameters of local and remote machines.
Collected data can be used later for custom alerts, dashboards, analytics etc.
Monitoring
After setup, the service creates sensors for various system parameters. Reports are sent by provider modules, which can be flexibly configured in the service configuration.
Note
If a system metric name contains symbols that are not allowed in EVA ICS v4 OIDs, these symbols are replaced with triple underscores (“___”).
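The replacement rule from the note can be sketched as follows. This is a minimal illustration, not the service's code: the set of allowed characters below is an assumption made for the example (the authoritative rules are defined by EVA ICS itself), and the function name is hypothetical.

```python
import re

# ASSUMPTION: characters treated as valid in an OID for this example only.
# The real list of allowed symbols is defined by EVA ICS, not by this sketch.
_ALLOWED_CHAR = re.compile(r"[A-Za-z0-9_.\-/]")


def sanitize_metric_name(name: str) -> str:
    """Replace every character outside the assumed allowed set with '___'."""
    return "".join(ch if _ALLOWED_CHAR.fullmatch(ch) else "___" for ch in name)
```

With this assumed character set, a name such as "eth0:1" becomes "eth0___1".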
System
The system provider creates sensors in a sub-group “os”, which display OS version, kernel version, CPU architecture etc. as well as system uptime.
CPU
The cpu provider creates a sub-group “cpu” and displays CPU usage and frequency for every CPU core in the system. The list of CPUs is auto-generated.
Load average
The load average provider creates a sub-group “load_avg” and displays system load averages for 1, 5 and 15 minutes (UNIX/Linux standard).
Memory
The memory provider creates sub-groups “ram” and “swap” and displays memory and swap information.
Disks
The disk provider creates a sub-group “disk” and displays information about mount points.
The list of mount points is detected automatically. When a mount point is removed, the monitoring is stopped. This behaviour can be changed if the list is specified in the service configuration:
only specific mount points are monitored
if a listed mount point is not found, the status of its sensors is set to -1 (ERROR)
Mount point sensors are specified without leading slashes. For system root in Linux/UNIX systems, sensors are created in a sub-group called “SYSTEM_ROOT”.
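The naming rule above can be sketched as a small helper. The function name is hypothetical, and keeping the inner slashes of nested mount points (so that "/var/lib" maps to "var/lib") is an assumption of this example.

```python
def mount_point_group(mount_point: str) -> str:
    """Return the sensor sub-group name for a mount point.

    The system root maps to "SYSTEM_ROOT"; other mount points lose their
    leading slash. Inner slashes are kept as-is (an assumption of this sketch).
    """
    if mount_point == "/":
        return "SYSTEM_ROOT"
    return mount_point.lstrip("/")
```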
Block devices
The block devices provider creates a sub-group “blk” and displays information about block devices (physical disk drives).
The list of block devices is detected automatically (loop devices are ignored). When a block device is removed, the monitoring is stopped. This behaviour can be changed if the list is specified in the service configuration:
only specific devices are monitored
if a listed device is not found, the status of its sensors is set to -1 (ERROR)
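The difference between the automatic and the explicit device list can be illustrated with a short sketch. The function and constant names are hypothetical. The status value 1 for a working resource follows the convention described for the HTTP API, and detecting loop devices by the "loop" name prefix is an assumption of this example.

```python
from typing import Dict, Iterable, List, Optional

STATUS_OK = 1
STATUS_ERROR = -1


def device_statuses(
    detected: Iterable[str], configured: Optional[List[str]] = None
) -> Dict[str, int]:
    """Map device names to sensor statuses.

    Without a configured list, only detected devices are reported and loop
    devices are skipped. With a configured list, only listed devices are
    reported, and a listed device that is missing gets the ERROR status.
    """
    present = set(detected)
    if configured is None:
        # Automatic mode: a removed device simply stops being reported.
        return {d: STATUS_OK for d in sorted(present) if not d.startswith("loop")}
    # Explicit mode: missing devices are reported with status -1.
    return {d: STATUS_OK if d in present else STATUS_ERROR for d in configured}
```

The same pattern applies to mount points and network interfaces when their lists are specified in the service configuration.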
Explanation for certain sensors:
r      read operations per second
rb     bytes read per second
w      write operations per second
wb     bytes written per second
util   block device utilization in % (100 = completely busy)
Note
The block devices provider works on Linux systems only.
Network
The network provider creates a sub-group “network” and displays information about network interfaces.
The list of interfaces is detected automatically. When an interface is removed, the monitoring is stopped. This behaviour can be changed if the list is specified in the service configuration:
only specific interfaces are monitored
if a listed interface is not found, the status of its sensors is set to -1 (ERROR)
Explanation for certain sensors:
rx             received (incoming) packets during the last second
rx_total       total number of received packets
rx_err         received packet errors during the last second
rx_err_total   total number of received packet errors
rxb            received bytes during the last second
rxb_total      total number of received bytes
The same abbreviations apply to transmitted (outgoing) packets. Those sensors use the tx prefix instead of rx.
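The relationship between a cumulative counter (such as rx_total) and its per-second counterpart (such as rx) can be sketched as follows. This only illustrates the arithmetic. How the service samples the counters internally is not described here, and the function name and the handling of counter resets are assumptions.

```python
def per_second_rate(prev_total: int, curr_total: int, elapsed_seconds: float) -> float:
    """Derive a per-second rate from two samples of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_total - prev_total
    if delta < 0:
        # ASSUMPTION: a decreasing counter means it was reset (e.g. after an
        # interface restart), so no rate is reported for this interval.
        return 0.0
    return delta / elapsed_seconds
```

For example, two rx_total samples of 1000 and 1500 taken one second apart give an rx value of 500.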
Setup
Use the template EVA_DIR/share/svc-tpl/svc-tpl-controller-system.yml:
# System controller service
command: svc/eva-controller-system
bus:
  path: var/bus.ipc
config:
  # accept remote agents
  #api:
    # client OID prefix. may contain ${host} variable
    # if no host variable exists, the host name is automatically added at the end
    #client_oid_prefix: sensor:${system_name}/system/host
    #listen: 0.0.0.0:7555
    #max_clients: 128
    # if a front-end server or TLS terminator is used
    #real_ip_header: X-Real-IP
    # host map (name/key)
    # note: the service has its own key database
    # the keys are not related to EVA ICS API keys
    #hosts:
      #- name: host1
        #key: "secret"
  report:
    oid_prefix: sensor:${system_name}/system
    # when started on a secondary point
    #oid_prefix: sensor:${system_name}/system/host/SPOINT_NAME
    # system info and uptime
    system:
      enabled: true
    # cpu info
    cpu:
      enabled: true
    # system load average
    load_avg:
      enabled: true
    # memory info
    memory:
      enabled: true
    # disk info
    disks:
      enabled: true
      # enable specific mount points only
      # note: with an automatically detected list, unavailable mount points are not reported
      #mount_points:
        #- /
        #- /var
    # block device info (Linux only)
    blk:
      enabled: true
      # enable specific devices only
      # note: with an automatically detected list, unavailable devices are not reported,
      # loop devices are omitted from the automatic list
      #devices:
        #- nvme0n1
        #- nvme1n1
        #- sda
        #- sdb
    # network info
    network:
      enabled: true
      # enable specific interfaces only
      # note: with an automatically detected list, unavailable interfaces are not reported
      #interfaces:
        #- eth0
        #- eth1
    # custom tasks with external executables
    # the executable must return either a value or JSON payload
    exe:
      tasks:
        - command: "echo OK"
          name: test
          enabled: true
          interval: 1
          # put the result as-is
          map:
            - name: result
        - command: "sensors -j"
          name: sensors
          enabled: true
          interval: 1
          # parsing JSON values is performed with a lightweight JsonPath syntax:
          # $.some.value - value is in a structure "some", field "value"
          # $.some[1].value[2] - work with array of structures
          # $.[1] - top-level array of values
          # $. - payload top-level (the path can be omitted)
          map:
            - name: fan1
              path: $.bus.fan1.fan1_input
              # an optional value transforming section
              transform:
                - func: multiply # multiply the value by N
                  params: [ 1000 ]
                - func: divide # divide the value by N
                  params: [ 1000 ]
                - func: round # round the value to N digits after the decimal point
                  params: [ 2 ]
                - func: calc_speed # use the value as calc-speed gauge (with N seconds delta)
                  params: [ 1 ]
                - func: invert # invert the value between 0/1
                  #params: []
            - name: fan2
              path: $.bus.fan2.fan2_input
user: eva
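The lightweight path syntax and the stateless transform steps described in the template comments can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, error handling is minimal, rounding follows Python's built-in round, and the stateful calc_speed step is left out.

```python
import re

# Matches either an array index such as "[2]" or a plain field name.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def resolve(payload, path=None):
    """Resolve a path such as "$.some[1].value[2]" against parsed JSON data.

    An omitted path or "$." returns the payload itself.
    """
    if not path or path in ("$", "$."):
        return payload
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path}")
    current = payload
    for index, key in _TOKEN.findall(path[1:]):
        # Exactly one of the two groups is non-empty for every token.
        current = current[int(index)] if index else current[key]
    return current


def apply_transforms(value, steps):
    """Apply transform steps in order (calc_speed is not covered here)."""
    for step in steps:
        func = step["func"]
        params = step.get("params") or []
        if func == "multiply":
            value = value * params[0]
        elif func == "divide":
            value = value / params[0]
        elif func == "round":
            value = round(value, params[0])
        elif func == "invert":
            value = 0 if value else 1
        else:
            raise ValueError(f"unsupported transform in this sketch: {func}")
    return value
```

For a payload like {"bus": {"fan1": {"fan1_input": 1234.567}}}, resolving "$.bus.fan1.fan1_input" and then applying a divide by 1000 followed by a round to 2 digits yields 1.23.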
Create the service using eva-shell:
eva svc create eva.controller.system /opt/eva4/share/svc-tpl/svc-tpl-controller-system.yml
or using the bus CLI client:
cd /opt/eva4
cat DEPLOY.yml | ./bin/yml2mp | \
./sbin/bus ./var/bus.ipc rpc call eva.core svc.deploy -
(see eva.core::svc.deploy for more info)
Monitoring remote hosts
The service monitors only the host on which it is running.
Monitoring secondary points
To monitor a secondary point, the point must run its own instance of the service.
Monitoring non-EVA ICS hosts
Non-EVA ICS hosts can send system telemetry data using either pre-built agents or HTTP API.
Enable the api section in the service configuration and configure the list of allowed hosts and their API keys.
For wide-area networks, it is recommended to use a front-end server to secure the API port with SSL/TLS and to apply additional limits on incoming connections.
Commons
Agents can be downloaded at https://pub.bma.ai/eva-cs-agent/
Note
Agent binaries have their own release cycle and are not updated with every EVA ICS stable build.
The agent configuration is the same for all systems and is similar to the service configuration:
client:
  server_url: http://server-host:7555/
  # enable FIPS-140 mode
  fips: false
  auth:
    name: test
    key: xxx
report:
  system:
    enabled: true
  # cpu info
  cpu:
    enabled: true
  # system load average
  load_avg:
    enabled: true
  # memory info
  memory:
    enabled: true
  # disk info
  disks:
    enabled: true
    # enable specific mount points only
    # note: for automatic mountpoint list reporting unavailable is not supported
    #mount_points:
      #- /
      #- /var
  # block device info (Linux only)
  blk:
    enabled: true
    # enable specific devices only
    # note: for automatic device list reporting unavailable is not supported,
    # loop devices in auto-list are omitted
    #devices:
      #- nvme0n1
      #- nvme1n1
      #- sda
      #- sdb
  # network info
  network:
    enabled: true
    # enable specific interfaces only
    # note: for automatic interface list reporting unavailable is not supported
    #interfaces:
      #- eth0
      #- eth1
Warning
The provided agent binaries are not FIPS-140 compliant and should not be used with HTTPS URLs if FIPS-140 is mandatory. FIPS-140 compliant binaries can be provided to Enterprise customers on request.
Linux agents
The configuration file must be placed at /etc/eva-cs-agent/config.yml.
It is highly recommended to run the agent as a restricted user.
The configuration file should be secured so that only the agent user can access it.
The agent binary can be started manually, e.g. for tests. In this case it writes logs to the system console. When piped or started with systemd or another system launcher, the agent writes its logs to syslog.
For Debian/Ubuntu systems, pre-built .deb packages can be used. The packages automatically create the eva-cs-agent user in the system.
For other systems, the following systemd service template can be used: https://github.com/eva-ics/eva4/blob/stable/svc/controller-system/eva-cs-agent.service
Microsoft Windows agents
The agent executable can be placed in any folder (e.g. C:\ProgramData\eva-cs-agent).
The configuration file must be placed in the same folder as the agent binary and named config.yml.
The configuration file should be secured so that only system administrators and system services can access it.
The agent binary can be started manually, e.g. for tests, with the “run” argument. In this case it writes logs to the system console. When started as a Windows service, the agent writes its logs to the Windows event log (section Application).
To register the Windows agent as a service and start it, use the following commands:
.\eva-cs-agent.exe register
Start-Service EvaCSAgent
or, to register the service under a custom name:
SC.exe create EVA.cs.Agent binPath=path\to\eva-cs-agent.exe
To unregister the service, use the following commands:
Stop-Service EvaCSAgent
.\eva-cs-agent.exe unregister
The last command stops the service on its own; however, it is recommended to stop the service manually first to make sure the instance is not running.
Using HTTP API
Metrics can be sent by custom agents using the service HTTP API:
HTTP header X-System-Name must contain the host name
HTTP header X-Auth-Key must contain the host API key
Requests must be submitted with POST to URL
http://HOST:7555/report
with the following payload:
[
  {
    "i": "some/metric",
    "status": 1,
    "value": 123
  },
  {
    "i": "some/metric",
    "status": 1,
    "value": 777
  }
]
All fields are mandatory. For status and value, the short forms “s” and “v” can be used. Values may contain any data. Status should be set to “1” if the measured resource is working properly, or to “-1” or another negative value for errors (the status register is a 16-bit signed integer).
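A custom agent could build such a request with the Python standard library, as in the sketch below. The helper name, the sample host name and key, and the Content-Type header are assumptions of this example. The URL, the HTTP method, the two X- headers and the payload fields come from the description above.

```python
import json
import urllib.request


def build_report_request(host, system_name, auth_key, metrics, port=7555):
    """Build (without sending) a POST request that reports a list of metrics."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/report",
        data=body,
        method="POST",
        headers={
            "X-System-Name": system_name,
            "X-Auth-Key": auth_key,
            # ASSUMPTION: the payload is declared as JSON.
            "Content-Type": "application/json",
        },
    )


# Sample values are placeholders; the short forms "s" and "v" are used here.
request = build_report_request(
    "HOST", "host1", "secret", [{"i": "some/metric", "s": 1, "v": 123}]
)
# To actually send the report, uncomment the following line:
# urllib.request.urlopen(request, timeout=5)
```

Building the request separately from sending it keeps the example safe to run without a live server and makes the headers easy to inspect.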