Loss of device control over time

kvng · 2024-03-07 19:04:20 UTC

I’ve just had another piston fail and have verified that adjusting the piston logging setting fixes the problem. I’m not sure how to retrieve the requested log information because steps taken to do the retrieval destroy the evidence.

kvng · 2024-03-08 19:14:16 UTC

I’ve had a rash of these failures this week. Housecleaners came through and turned off some of the light switches connected to smart bulbs. It is possible the problems are induced by ZigBee mesh reorganizations or the communications failures associated with turned-off devices.

kvng · 2024-03-08 19:16:43 UTC

Another observation: When device control is lost, it is usually lost at the same time in all pistons that control the device. To restore full function, I need to independently touch all pistons that control that device.

E_Sch · 2024-03-09 03:31:51 UTC

Without some details or logs, this becomes a guessing game.

webcore for triggers may be tracking changes, and if some events are missed being sent, then the tracking can get out of sync.

I hear you say devices may go away and later come back (due to pulling the power). If you need to protect for this, you may want to add a timer trigger in your piston.

In general webcore subscribes to device events. HE wakes webcore up for each trigger, and then webcore runs to determine if the trigger comparison matches.

I’m not sure what ‘touch the piston means’, is this pause/resume, or running a test event?

E_Sch · 2024-03-09 03:48:55 UTC

Do ensure you have the latest webCoRE on HE. HPM repair to get the latest.

kvng · 2024-03-09 16:29:51 UTC

I am currently running WebCoRE 1.0.89. Not the absolute latest (1.0.92 apparently). But quite recent.

“Touch the piston” means doing anything that we have discovered fixes the behavior. Current known options are editing and saving, pausing and resuming, and toggling piston logging.

I know that it would be helpful to have logs. We’ve posted what we can here. Data collection is complicated by the fact that turning on piston logging to collect the data fixes the problem.

When I’ve caught this problem with piston logging enabled, I don’t see anything unusual. There is a normal message that indicates the command was sent to the device. In the Hubitat device event logs, we see the command is registered. What is missing in the failure case is the acknowledgment from the device that the requested change has been made. These responses are present when the piston is working. They are also present when I make changes to the same device when the piston is broken using the Hubitat web GUI or from ActionTiles.

Please let us know what additional information would be helpful to make this less of a guessing game. Please provide any suggestions for how to collect this data.

kvng · 2024-03-16 23:52:37 UTC

One of my devices (Kitchen Ceiling Dishwasher) went into this unresponsive state. This device is controlled by a piston (Kitchen ceiling) responsive to a virtual device (Kitchen Lights). I had logging at medium level enabled prior to the failure so can share some log data. When I turn on Kitchen Lights, the piston sends commands to each light:

The event log for Kitchen Ceiling Dishwasher shows the setColorTemperature command but no acknowledgment from the device.

When I turn off Kitchen Lights, the piston skips turning off this device because state indicates it is already off.

As to what got it into this state, it acknowledged being turned off at night. In the morning there is a rash of commands to get things set up for the day and there is an acknowledgment of a level setting but no acknowledgment that the light was ever turned on, and indeed it remains off in my kitchen.

Again, in this state, I am able to control the light from the HE web interface even though when the piston sends the apparently same commands, the device is unresponsive.
I am able to make the piston work again by changing piston logging. In this case, changing from medium to full logging brings the prison back to health and it now properly controls the device.

E_Sch · 2024-03-17 11:57:08 UTC

Do you need to first turn on the light before changing the color temperature?

kvng · 2024-03-17 13:58:29 UTC

Not usually. The setColorTemperature also includes a level setting which should turn it on. I have 8 identical ceiling lights and in this case, only one of them was misbehaving and stopped misbehaving when I made a change to piston logging.

E_Sch · 2024-03-17 18:51:27 UTC

You can consider using command execution delay to test/try/understand if there is a timing issue in your zwave network. See: 2nd part of this article:

kvng · 2024-03-17 22:27:37 UTC

Thank you for the suggestion. I can experiment with this.

Do you have any more suggestions for additional log information I can collect to help us understand where the problem is? If not, I’m going to disable logging, leaving it on seems to be causing some performance issues.

I am a developer so I understand how frustrating problems like this can be. I see two important clues:

Commands from the Hubitat web interface, ActionTiles or Echo all work. In the failed state, commands from WebCoRE appear to be happening as intended and are logged by Hubitat but they don’t change the device state. No errors are logged, there is just an absence of acknowledgment. Question: what is different about the WebCoRE command?
The problem is resolved by making a minor adjustment (pause/resume, change logging level, save unchanged) to the piston. Question: What do these minor actions do to the piston that could explain the fix?

eibyer · 2024-03-18 00:15:05 UTC

I didn’t read the whole thread but have you tried disabling command optimization? This will force webcore to send the command to the device no matter what wc thinks the device state is.

kvng · 2024-03-18 00:33:33 UTC

I haven’t but the WebCoRE logs show that command optimization is not in play in this failure. The command is being invoked. I see that in WebCoRE logs and HE device event logs.

I suspect the failure is further down the Zigbee stack on Hubitat or possibly at the device but somehow the fault originates with WebCoRE because the device is still controllable from other apps and the problem is fixed by tapping WebCoRE.

kvng · 2024-04-30 19:06:17 UTC

I have tried disabling command optimization and this has improved things. It appears that the failure is related to the combination of command optimization and my habit of using Set Level without first Turn On to turn on lights.
I believe the failure scenario starts when a light fails to respond (for spontaneous reasons unknown) to a Set Level attempting to turn it on. Despite this, the light state remains “off”. Attempts to turn off the light then do nothing because of command optimization. A Turn On or Turn Off command can make the bulb responsive again to Set Level and so with command optimization disabled, the Trun Off command is sent and there is recovery.

kvng · 2024-09-04 21:39:43 UTC

I’ve captured another instance of this and this time with (Minimal) logging enabled on the piston. The output is:

Executed device command [Kitchen Ceiling Fridge].off() (34ms)

This does not turn the light off
Then I turned the light off from the Hubitat web interface and asked my piston to turn it on.

+101ms ║Executed device command [Kitchen Ceiling Fridge].setColorTemperature(2700,17) (13ms)
+102ms ║Executed [Kitchen Ceiling Fridge].setColorTemperature W (13ms)

This did not turn the light on

I’ve mapped these messages to the WebCoRE source code and this leads me to a function called “pcmd” (line 4631). I’m not able to follow it any further to try and determine why this doesn’t cause Hubitat to control this particular device.

Can anyone proficient in Groovy tell me what the pcmd function is doing?

private static void pcmd(device,String cmd,List nprms=[]){
if(nprms.size()>iZ) device."$cmd"(nprms as Object[])
else device."$cmd"()
}

E_Sch · 2024-09-06 21:59:16 UTC

I would suggest look at your device logs in HE, and see if the device shows the commands going to the device. The logging webcore is doing, if it says it sent the command (and there is nothing else in the logs), then it sent it.

You can check command optimization on your piston, and you can turn that off if you see something in the logs saying webcore did not send the command due command optimization.

kvng · 2024-09-08 17:14:33 UTC

Thanks for the suggestions.

Yes, the commands are going to the device. I see that in the HE event logs for the device. What I don’t see is the response from the device acknowledging the change in state requested by the command. When I promptly do the same sort of command directly from the HE web interface or another platform such as Amazon Echo, I see both the command and the response (and the device performs the requested action e.g. turns on).

This leads me to theorize that there is something about the form of the command sent by WebCoRE that HE or the device is not happy about. I was trying to learn more about what might be going on here by understanding the source code.

E_Sch · 2024-09-21 02:30:54 UTC

I don’t expect the format of the command to be wrong.

You may have Hubitat device/mesh issues that you need to check/resolve

Examples may be wave mesh issues (ghosts), etc.

kvng · 2024-09-21 19:26:51 UTC

No doubt, I will work on that. I have some old devices that may be acting as repeaters and may need an upgrade.

The reason I’m pointing the finger at WebCoRE is because the devices that WebCoRE is having trouble controlling are reliably controlled via the Hubitat web interface, ActionTiles, Alexa, Google Home etc. when WebCoRE is having trouble. Also, I can fix the issue by twiddling pistons in the WebCoRE interface.