Loss of device control over time


#21

I’ve just had another piston fail and have verified that adjusting the piston logging setting fixes the problem. I’m not sure how to retrieve the requested log information because steps taken to do the retrieval destroy the evidence.


#22

I’ve had a rash of these failures this week. Housecleaners came through and turned off some of the light switches connected to smart bulbs. It is possible the problems are induced by ZigBee mesh reorganizations or the communications failures associated with turned-off devices.


#23

Another observation: When device control is lost, it is usually lost at the same time in all pistons that control the device. To restore full function, I need to independently touch all pistons that control that device.


#24

Without some details or logs, this becomes a guessing game.

webcore for triggers may be tracking changes, and if some events are missed being sent, then the tracking can get out of sync.

I hear you say devices may go away and later come back (due to pulling the power). If you need to protect for this, you may want to add a timer trigger in your piston.

In general webcore subscribes to device events. HE wakes webcore up for each trigger, and then webcore runs to determine if the trigger comparison matches.

I’m not sure what ‘touch the piston means’, is this pause/resume, or running a test event?


#25

Do ensure you have the latest webCoRE on HE. HPM repair to get the latest.


#26

I am currently running WebCoRE 1.0.89. Not the absolute latest (1.0.92 apparently). But quite recent.

“Touch the piston” means doing anything that we have discovered fixes the behavior. Current known options are editing and saving, pausing and resuming, and toggling piston logging.

I know that it would be helpful to have logs. We’ve posted what we can here. Data collection is complicated by the fact that turning on piston logging to collect the data fixes the problem.

When I’ve caught this problem with piston logging enabled, I don’t see anything unusual. There is a normal message that indicates the command was sent to the device. In the Hubitat device event logs, we see the command is registered. What is missing in the failure case is the acknowledgment from the device that the requested change has been made. These responses are present when the piston is working. They are also present when I make changes to the same device when the piston is broken using the Hubitat web GUI or from ActionTiles.

Please let us know what additional information would be helpful to make this less of a guessing game. Please provide any suggestions for how to collect this data.


#27

One of my devices (Kitchen Ceiling Dishwasher) went into this unresponsive state. This device is controlled by a piston (Kitchen ceiling) responsive to a virtual device (Kitchen Lights). I had logging at medium level enabled prior to the failure so can share some log data. When I turn on Kitchen Lights, the piston sends commands to each light:


The event log for Kitchen Ceiling Dishwasher shows the setColorTemperature command but no acknowledgment from the device.

When I turn off Kitchen Lights, the piston skips turning off this device because state indicates it is already off.

As to what got it into this state, it acknowledged being turned off at night. In the morning there is a rash of commands to get things set up for the day and there is an acknowledgment of a level setting but no acknowledgment that the light was ever turned on, and indeed it remains off in my kitchen.

Again, in this state, I am able to control the light from the HE web interface even though when the piston sends the apparently same commands, the device is unresponsive.
I am able to make the piston work again by changing piston logging. In this case, changing from medium to full logging brings the prison back to health and it now properly controls the device.


#28

Do you need to first turn on the light before changing the color temperature?


#29

Not usually. The setColorTemperature also includes a level setting which should turn it on. I have 8 identical ceiling lights and in this case, only one of them was misbehaving and stopped misbehaving when I made a change to piston logging.


#30

You can consider using command execution delay to test/try/understand if there is a timing issue in your zwave network. See: 2nd part of this article:


#31

Thank you for the suggestion. I can experiment with this.

Do you have any more suggestions for additional log information I can collect to help us understand where the problem is? If not, I’m going to disable logging, leaving it on seems to be causing some performance issues.

I am a developer so I understand how frustrating problems like this can be. I see two important clues:

  • Commands from the Hubitat web interface, ActionTiles or Echo all work. In the failed state, commands from WebCoRE appear to be happening as intended and are logged by Hubitat but they don’t change the device state. No errors are logged, there is just an absence of acknowledgment. Question: what is different about the WebCoRE command?
  • The problem is resolved by making a minor adjustment (pause/resume, change logging level, save unchanged) to the piston. Question: What do these minor actions do to the piston that could explain the fix?

#32

I didn’t read the whole thread but have you tried disabling command optimization? This will force webcore to send the command to the device no matter what wc thinks the device state is.


#33

I haven’t but the WebCoRE logs show that command optimization is not in play in this failure. The command is being invoked. I see that in WebCoRE logs and HE device event logs.

I suspect the failure is further down the Zigbee stack on Hubitat or possibly at the device but somehow the fault originates with WebCoRE because the device is still controllable from other apps and the problem is fixed by tapping WebCoRE.


#34

I have tried disabling command optimization and this has improved things. It appears that the failure is related to the combination of command optimization and my habit of using Set Level without first Turn On to turn on lights.
I believe the failure scenario starts when a light fails to respond (for spontaneous reasons unknown) to a Set Level attempting to turn it on. Despite this, the light state remains “off”. Attempts to turn off the light then do nothing because of command optimization. A Turn On or Turn Off command can make the bulb responsive again to Set Level and so with command optimization disabled, the Trun Off command is sent and there is recovery.