Missed motion events; outage?


#35

If you’d like to play at home, here’s my test:

Not pictured: A simulated light bulb, and a simulated motion switch created in the samsung graph control panel.


#36

Below are some test results for today.

The flipper:
image

The counter:
image

To me, this implies that the counter piston missed 2 each of the active and inactive. I’m not sure if this isolates the issue any more or not, or this this behavior is normal. Has anyone else tried this, or something similar?


#37

I would not expect them to match up.

Your {Flipped} variables change every 20 minutes…
Your {Count} variables only changes when the motion sensor changes.


Pro Tip:

An easy way to keep track of daily totals is to add a “Log to console” command in your code. This lets you easily go back and see what happened throughout the day.


#38

Sorry, blonde moment…

I just noticed that your motion sensor is simulated, and in both pistons…


#39

Wait, why not? Every 20 minutes, I flip the status of the virtual motion sensor.

Oh, you just replied.

I get those a lot.

Do you think this is an issue? Would you be willing to run these two pistons tonight? After posting, I noticed that I had a few of the local variables configured as Dynamic, when they should have been integers.

Do you know if this piston and simulated motion sensor would stay completely in the cloud, or does my local hub get involved?


#40

If memory serves me right, I have never seen a command to a Simulated Switch not be properly processed…

(with the exceptions mentioned above… Pistons looping… multiple pistons firing at once… heavy downloading / streaming / gaming etc)


With SimSwitches, it is never the battery, distance, or poor mesh network… but the other reasons are still a possibility…


Need for back-up (or secondary) pistons)
#41

Today, I happened to have full logging turned on in one of my pistons just as it failed to run, and I saw something very different. Right in the middle of its execution, it timed out with an error that it took more than 20 app execution seconds.

I don’t have the live logs for this anymore (I’m not sure how helpful they would be anyway), but this is a piston that normally executes in under a second. It does not have any waits, and parallelism is turned on, so it should never be waiting at a semaphore.

Here’s the piston:

The timeout would happen somewhere inside the “on events” block, preventing it from turning Switch 16 and Switch 17 on or off.


#42

Also seeing the same with missed/missing events in Webcore but no issues on the SmartThings side. Actiontiles also appears to not be missing anything when I go look at my panels. In the last two days, possibly longer I’ve had long term existing, functional pistons fail to fire on switch on/off or contact/motion state change events.

This morning the scheduled task at 9A local time to turn on a virtual switch and run my downstairs Neato BotVac fired as expected, and ST changed the switch to ON. The second piston that watches for the switch state to change to on and actually send the start command to the vacuum failed to see the state change. This has worked reliably for months before this.

Yesterday evening, twice the long-term piston that monitors my garage entry door contact and turns on the garage light for 2 minutes failed to turn the light back off after the wait was up. Had what was likely another instance of this a week or so ago that was attributed to something else or someone leaving the light on after using the manual switch.

Also yesterday evening the long-term piston that monitors for a virtual switch to turn on and temporarily bump up my thermostat temps failed to see and fire on the on state change for my downstairs thermostat. The same one for the upstairs thermostat/v-switch fired as expected.

There’s definitely something going on with missed events.


#43

In the logs, I see some negative numbers for some events. For example:
+0ms ╔Received event [My home].time = 1577724588004 with a delay of -778ms

Is that sort of message normal? Could one of the servers have a clock that is off? I ran into something once running on a VM: Ping timings were coming back as negative like this. The system engineers adjusted some setting on the VM host to correct the problem.


#44

I think this is normal; I’ve seen this for a while. My understanding is that events aren’t guaranteed to be fired at an exact time – they can be fired at time T +/- 2 seconds.

In fact, I’ve even seen double events when the event fires early enough (it fires again at the correct time). I had to write code in one particular piston that is sensitive to double events to detect a double event and ignore the second one.


#45

Just watched it happen in our bedroom. My bride walked into the bedroom, no lights. I quickly checked and ST registered motion and none of the conditional switches that would prevent the light from coming on were on.

She walked into the master bath and the light came on in the bath as it should. While in the bath the motion sensor timed out. She walked back into the bedroom and the bedroom light turned on as it is supposed to. I don’t have any logs but I know ST saw the motion sensor switch to active because the dashboard had updated.


#46

I’m having motion based lighting rules not fire either. Same symptoms as everyone else: just started acting up since the last ST update, events seem to log in ST but webcore doesn’t trigger. I’m getting weird errors in the live logs too. I’ll try to capture and post.

no matter, i know i need to clean up my pistons. I have stuff with a ton of triggers and lots of semaphores at times.


#47

At this point, I’ve simulated 52 “active” and 52 “inactive”, but the 2nd piston has only seen 41 of each. Earlier today, there was an imbalance of activations and inactivations on the second piston. This eliminates my earlier theory that I was only missing the motion clear events, at least in the test/simulated pistons.

I’m not sure what the next steps are. Is there a smartthings support number to call?


#48

Do you know the date of the last update?


#49

late november, i think? V2 and V3 hubs only. firmware 28 i think


#50

This has nothing to do with the hub. I don’t have a hub in my setup.

It started acting up for me after that big outage on December 19.

Earlier this evening, corresponding to a piston failure, I found this in the live logs for one of my devices:

7:04:13 PM: error org.springframework.jdbc.UncategorizedSQLException: Hibernate operation: could not execute query; uncategorized SQLException for SQL [select this_.id as id88_0_, this_.version as version88_0_, this_.date_created as date3_88_0_, this_.keyas key4_88_0_, this_.type as type88_0_, this_.value as value88_0_ from server_config this_ where this_.key=?]; SQL state [null]; error code [0]; [SimpleAsyncTaskExecutor-702] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:50; busy:1; idle:0; lastwait:30000].; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [SimpleAsyncTaskExecutor-702] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:50; busy:1; idle:0; lastwait:30000]. @line 121 (sendSwitch)


#51

I’m getting similar/same springframework errors in my logs


#52

Are you running tomcat, or do you think that’s from webcore? Googling around, I think the default connection pool is 100, but that error seems to imply that it is configured for 50.


#53

It’s coming from somewhere in the SmartThings ecosystem. I used to think it was from webcore but then I found it in the live logs for a device that uses an official DTH.

I’m starting to suspect that the December 19 outage wasn’t actually resolved properly.


#54

@dds82 I think I agree with you.

Perhaps I’ll revise my test to see if webcore tests clean.

Late last night, I opened an email ticket with them. The auto-reply said that they were unusually busy.