Critical Systems Monitoring Piston


#1

1) Give a description of the problem
I am trying to work out a piston that will check critical components every 15 minutes. I know I’m missing something, specifically in how the variables remove devices from the list.

I need help designing this and finding what is wrong. I’m new to WC, but really enjoying it.

2) What is the expected behavior?
Notify me by SMS and email when any of the components have a status of ‘offline’.
Before notifying me, I want to do a refresh. If the component is still offline after the refresh, notify me that it is down. If it is back online, I want it to notify me with a different message.
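
To make the intended flow concrete, here is a rough sketch in Python rather than in piston form. It only illustrates the decision logic described above. The function name `check_component`, the `refresh` callback, and the message wording are all hypothetical and are not part of webCoRE.

```python
from typing import Callable, Optional


def check_component(name: str, status: str,
                    refresh: Callable[[str], str]) -> Optional[str]:
    """Return the message to send for one component, or None if nothing to report.

    `refresh` is a hypothetical callback that refreshes the device and
    returns its new status ("online" or "offline").
    """
    if status != "offline":
        return None  # component looks fine, so no refresh and no message

    # The component reported offline: refresh once before alerting.
    refreshed_status = refresh(name)
    if refreshed_status == "offline":
        return f"{name} is down"
    return f"{name} was offline but is back online after a refresh"
```

For example, `check_component("hub", "offline", lambda n: "online")` returns the "back online" message, while a component whose refresh still reports `"offline"` produces the "is down" message.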

3) What is happening/not happening?
It is working as far as I can test it… but I believe that once a device is online and has been put into ‘devices_responding’, it will remain in there even if it later fails.
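
One way to avoid a stale entry, sketched here in Python purely as an illustration, is to rebuild both lists from scratch on every run instead of only adding to them. The function name `classify` and the status strings are assumptions. In a piston the equivalent would be clearing both variables at the start of each run.

```python
from typing import Dict, List, Tuple


def classify(statuses: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split devices into (devices_responding, devices_not_responding).

    Both lists start empty on every call, so a device that was online
    on the previous run cannot linger in the responding list.
    """
    devices_responding: List[str] = []
    devices_not_responding: List[str] = []
    for name, status in statuses.items():
        if status == "offline":
            devices_not_responding.append(name)
        else:
            devices_responding.append(name)
    return devices_responding, devices_not_responding
```

Calling `classify` once with a device online and again with it offline moves the device from the first list to the second, because nothing is carried over between calls.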

4) Post a Green Snapshot of the piston!

5) Attach any logs (From ST IDE and by turning logging level to Full)
I don’t really have logs yet. I’m looking for advice on how this should be designed/fixed up…


#2

Here’s the one I use.


#3

Mine is a little more complex. I have a 10-minute timer piston that fires off the pistons that check the devices and do the alerts.

Here is the timer:

and here is the one that collects the devices status and does the alerting:


#4

Thanks for that! I went ahead and modified it to my needs.

Would you mind just taking a quick look at this? I’m not 100% sure about what some things mean.

1st IF:
Check the items in the ‘devicestocheck’ variable for status.
if the device to check is offline - set variable isdeviceoffline=true
==Save in {devices_not_responding}
If the device to check is online - set variable isdeviceoffline=false.
==Save in {devices_responding}

2nd IF:
If the devices that are offline are not in ‘devices_not_responding_save’ and if a device is offline,
then send notifications, set the piston state, and save the device not responding into the ‘devices_not_responding_save’ variable.

3rd IF:
What exactly is going on here? I’m a bit confused by using the ‘devices_not_responding’ variable in the IF statement.

4th IF:
If isdevicesoffline = false, the devices are online.
THEN - set string = all systems are good…
Set Piston state to ‘green’ = all good at $now (time).

Here is what I have now:
I also created a scheduled piston that executes this piston every 10 minutes.


#5

The 3rd IF statement is needed to keep it from sending SMSes over and over, because sometimes I receive the “online” events multiple times with the same list of devices.
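
As a rough illustration of that idea (in Python, not piston syntax), the sketch below sends a message only when the offline list differs from the one last saved. The class name `AlertState`, the message text, and the exact comparison are assumptions. The real piston may compare its variables differently.

```python
from typing import List, Optional


class AlertState:
    """Remembers the last offline list that was reported."""

    def __init__(self) -> None:
        # Plays the role of the 'devices_not_responding_save' variable.
        self.devices_not_responding_save: List[str] = []

    def evaluate(self, devices_not_responding: List[str]) -> Optional[str]:
        """Return a message only if the offline list changed since the last report."""
        current = sorted(devices_not_responding)
        if current == self.devices_not_responding_save:
            return None  # same list as last time: stay quiet
        self.devices_not_responding_save = current
        if current:
            return "Offline: " + ", ".join(current)
        return "All systems are good"
```

With this shape, receiving the same event twice in a row produces one message, and the second call returns `None`.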


#6

Oh! That makes sense!

It looks like this is working really well.

Thanks for your help!