Waiting on semaphore


#1

I’m getting occasional “Piston waited at a semaphore for 10019ms” type messages in many of my pistons. Here’s a simple example:


Here’s some log information:

9/21/2020, 2:00:24 PM +255ms
+1ms ╔Received event [Motion Sensor 1].motion = active with a delay of 62ms
+10130ms ║Piston waited at a semaphore for 10019ms
+10169ms ╚Event processed successfully (10169ms)

9/21/2020, 2:00:24 PM +153ms
+1ms ╔Received event [Motion Sensor 2].motion = inactive with a delay of 84ms
+161ms ╚Event processed successfully (161ms)

I have two sensor events triggering the same piston at almost exactly the same time. This is going to happen sometimes.

The first event is still being processed when the second comes in so the second needs to wait for execution of the piston from the first event to finish and release the semaphore before the second can be processed.

The first event is processed in 161ms but the second waits over 10s for the semaphore and finally gives up and processing of the second event is skipped.


#2

FYI… you will need to redo this piston since routines will cease on Oct. 14th when the Classic app is retired. If you have migrated to new app through the migration tool, use of routInes would cease.


#3

Good to know.

None of the code in my if statement is actually executing in this scenario. The active event will cancel an internal timer and the inactive event may set one up.


#4

*takes a deep breath*

You are attempting to use 5 devices as triggers in the same piston.
(I would expect semaphore delays like crazy)

In a nutshell, there are 15 different events that will execute this piston.

  • Sensor 1 changes to active (piston runs, but nothing happens)
  • Sensor 3 changes to active (piston runs, but nothing happens)
  • Sensor 4 changes to active (piston runs, but nothing happens)
  • Sensor 5 changes to active (piston runs, but nothing happens)
  • Sensor 6 changes to active (piston runs, but nothing happens)
  • Sensor 1 changes to inactive (set wakeup schedule)
  • Sensor 3 changes to inactive (set wakeup schedule)
  • Sensor 4 changes to inactive (set wakeup schedule)
  • Sensor 5 changes to inactive (set wakeup schedule)
  • Sensor 6 changes to inactive (set wakeup schedule)
  • 90 min after Sensor 1 changes to inactive (execute THEN block)
  • 90 min after Sensor 3 changes to inactive (execute THEN block)
  • 90 min after Sensor 4 changes to inactive (execute THEN block)
  • 90 min after Sensor 5 changes to inactive (execute THEN block)
  • 90 min after Sensor 6 changes to inactive (execute THEN block)

Normally, I only use one trigger per piston… but when using “STAYS”, I insist on it.
(I do not want overlapping timers in the same piston)

It is too easy for something unrelated to break the timer.


#5

I don’t know how one would write this type of logic with just one trigger.

I can’t believe WebCORE is designed to flail like this if a piston happens to be triggered in this way.

Processing of the second event should be delayed while the first processes. The second event shouldn’t experience a 10-second wait and then an abort.


#6

If you turn your logging level to Full, I think you will be blown away at what webCoRE is doing behind-the-scenes hundreds of times a day.

My advice is based on:

  • Each STAY command potentially sets up a timer
  • A single STAY timer is easily aborted if the piston runs again
  • Motion sensors pistons are notoriously chatty (dozens of runs per hour)
  • Multiple motion sensors can trigger hundreds of runs per hour
  • Having a timer running, while a new trigger comes in introduces potential for disaster.

… but there is really no need to waste time with this logic, since “Routines” are down to their last 3 weeks…


#7

I will fix my routines issue by changing the home mode directly but I am confident I will still have the semaphore issue once that is done.

I have turned logging to full and seen the glory. There is a lot going on but that’s how it is with computers. They do a lot in the blink of an eye. Am I asking too much for Smartthings to process a few events every time someone walks from room to room in my house?

Each piston has a single timer and on each execution the piston sets the timer for the nearest pending event (if any). It is a common case for a timer to be running when an event comes in - this is always the case for the simplest possible STAY statement once initially triggered. If the situation is as you describe, it would be impossible for STAY to work reliably. I guess I should ask then, are STAY statements fundamentally unreliable?


#8

Why do you think that?

The maximum semaphore wait is about 10 seconds, after which the piston runs regardless.

There is nothing for the piston to do. A motion sensor is active so the very first condition fails.


#9

My recollection looking at the log in full mode indicates is that the piston aborts after waiting 10s. This is proper behavior for a deadlock. It would cause a reentrancy tangle if the second event ignored the semaphore and processed without it.

The nut of the issue is there is no reason processing the second event should have to wait 10s. The semaphore should only be held by the processing of the first event for 161ms. It should be released and the wait should be over and the second event should be processed. It doesn’t look like the instance waiting is being notified that the semaphore has been released by the first.

ETA: The first should set up a 90-minute timer (if none of the other sensors are currently active). The second should cancel the timer.


#10

In a perfect world, a piston would run to completion before it runs again.
(but unlikely with a single motion sensor, and near impossible with a slew of motion sensors)


I love them, but typically only use them in single trigger pistons.
(they are only “unreliable” when mixed with outside triggers)


I will be honest. I have never personally witnessed a semaphore releasing early.
(From what I have seen, it is always about ten sec, regardless)


#11

You only see the, “Piston waited at a semaphore…” message after the 10 second timeout. If there is no timeout, there is no message, not even with full logging. Because only failures are logged we can’t tell when or if this mechanism is working correctly.


#12

There isn’t a deadlock. Piston instances can run in parallel. The default behaviour is just to wait up to ten seconds to try and have them run in sequence. Pistons can run for up to twenty seconds but most run for a few seconds at most.

Most semaphore waits I see last the full ten seconds (plus or minus half a second or so). I don’t know why. I’ve seen hundreds upon hundreds of ten second ones, but only two or three short ones.

The stay timer probably wouldn’t be cancelled. It is more likely the piston will wakeup after ninety minutes and have nothing to do. That’s just how webCoRE seems to work. I think it may be because SmartApps can’t cancel timers of other apps.


#13

Full logging shows much more than failures, and studying the numbers on the left in the log can give clues about semaphores, but again, I have never seen one less than 9500ms.


#14

Different pistons can run in parallel but apparently pistons are not reentrant so the same piston can’t be started if it is already running. If a new event comes in while the piston is already running, that event must wait for the previous run to finish and semaphore to be released before running.

See my reply above. There’s no logging for waits shorter than the 10s timeout. The timeout is 10s so that it is sure to finish aborting cleanly before Smartthings kills it at 20s. The slight variation is because it can take a little time for the piston to be notified of the timeout.


#15

I will try to catch one of these with full logging enabled and post it here.


#16

Got one. This one was a bit more complex. Looks like I remembered/assumed incorrectly - the piston does run after the 10s timeout.

9/21/2020, 6:45:15 PM +11ms
+1ms ╔Received event [Office Motion].motion = inactive with a delay of 82ms
+124ms ║RunTime Analysis CS > 17ms > PS > 75ms > PE > 31ms > CE
+126ms ║Runtime (41394 bytes) successfully initialized in 75ms (v0.3.110.20191009) (124ms)
+127ms ║╔Execution stage started
+153ms ║║Comparison (enum) inactive stays (string) inactive = true (2ms)
+155ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+157ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+159ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+162ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+164ms ║║Cancelling any timed trigger schedules for condition 19
+165ms ║║Cancelling statement #19's schedules...
+166ms ║║Condition #19 evaluated false (35ms)
+167ms ║║Condition group #18 evaluated false (state did not change) (36ms)
+169ms ║╚Execution stage complete. (41ms)
+170ms ╚Event processed successfully (169ms)
9/21/2020, 6:45:00 PM +794ms
+1ms ╔Received event [Office Motion].motion = active with a delay of 84ms
+10136ms ║RunTime Analysis CS > 18ms > PS > 10084ms > PE > 34ms > CE
+10137ms ║**Piston waited at a semaphore for 10018ms**
+10139ms ║Runtime (41472 bytes) successfully initialized in 10084ms (v0.3.110.20191009) (10137ms)
+10140ms ║╔Execution stage started
+10174ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+10177ms ║║Comparison (enum) inactive stays (string) inactive = true (2ms)
+10179ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+10182ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+10184ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+10187ms ║║Cancelling any timed trigger schedules for condition 19
+10187ms ║║Cancelling statement #19's schedules...
+10188ms ║║Condition #19 evaluated false (44ms)
+10189ms ║║Condition group #18 evaluated false (state did not change) (45ms)
+10191ms ║╚Execution stage complete. (51ms)
+10192ms ╚Event processed successfully (10192ms)
9/21/2020, 6:45:10 PM +626ms
+0ms ╔Received event [TV Motion].motion = active with a delay of 68ms
+107ms ║RunTime Analysis CS > 20ms > PS > 59ms > PE > 29ms > CE
+109ms ║Runtime (41390 bytes) successfully initialized in 59ms (v0.3.110.20191009) (108ms)
+110ms ║╔Execution stage started
+135ms ║║Comparison (enum) inactive stays (string) inactive = true (2ms)
+137ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+139ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+142ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+144ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+146ms ║║Cancelling any timed trigger schedules for condition 19
+147ms ║║Cancelling statement #19's schedules...
+148ms ║║Condition #19 evaluated false (34ms)
+149ms ║║Condition group #18 evaluated false (state did not change) (35ms)
+151ms ║╚Execution stage complete. (41ms)
+152ms ╚Event processed successfully (153ms)
9/21/2020, 6:45:09 PM +515ms
+1ms ╔Received event [TV Motion].motion = inactive with a delay of 83ms
+192ms ║RunTime Analysis CS > 20ms > PS > 144ms > PE > 29ms > CE
+195ms ║Runtime (41392 bytes) successfully initialized in 144ms (v0.3.110.20191009) (193ms)
+196ms ║╔Execution stage started
+222ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+225ms ║║Comparison (enum) inactive stays (string) inactive = true (2ms)
+227ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+229ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+231ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+234ms ║║Cancelling any timed trigger schedules for condition 19
+235ms ║║Cancelling statement #19's schedules...
+236ms ║║Condition #19 evaluated false (35ms)
+237ms ║║Condition group #18 evaluated false (state did not change) (36ms)
+238ms ║╚Execution stage complete. (43ms)
+239ms ╚Event processed successfully (239ms)
9/21/2020, 6:45:05 PM +373ms
+1ms ╔Received event [Laundry Motion].motion = inactive with a delay of 81ms
+127ms ║RunTime Analysis CS > 20ms > PS > 73ms > PE > 34ms > CE
+130ms ║Runtime (41393 bytes) successfully initialized in 73ms (v0.3.110.20191009) (127ms)
+131ms ║╔Execution stage started
+157ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+159ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+162ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+164ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+166ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+169ms ║║Cancelling any timed trigger schedules for condition 19
+170ms ║║Cancelling statement #19's schedules...
+171ms ║║Condition #19 evaluated false (35ms)
+172ms ║║Condition group #18 evaluated false (state did not change) (37ms)
+173ms ║╚Execution stage complete. (43ms)
+174ms ╚Event processed successfully (174ms)
9/21/2020, 6:45:00 PM +649ms
+1ms ╔Received event [Dining motion].motion = inactive with a delay of 116ms
+127ms ║RunTime Analysis CS > 25ms > PS > 75ms > PE > 27ms > CE
+129ms ║Runtime (41394 bytes) successfully initialized in 75ms (v0.3.110.20191009) (127ms)
+130ms ║╔Execution stage started
+159ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+162ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+164ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+166ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+168ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+171ms ║║Cancelling any timed trigger schedules for condition 19
+172ms ║║Cancelling statement #19's schedules...
+173ms ║║Condition #19 evaluated false (39ms)
+174ms ║║Condition group #18 evaluated false (state did not change) (40ms)
+175ms ║╚Execution stage complete. (45ms)
+176ms ╚Event processed successfully (176ms)
9/21/2020, 6:44:52 PM +420ms
+1ms ╔Received event [TV Motion].motion = active with a delay of 78ms
+122ms ║RunTime Analysis CS > 23ms > PS > 74ms > PE > 26ms > CE
+124ms ║Runtime (41389 bytes) successfully initialized in 74ms (v0.3.110.20191009) (123ms)
+125ms ║╔Execution stage started
+151ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+154ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+156ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+158ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+160ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+163ms ║║Cancelling any timed trigger schedules for condition 19
+164ms ║║Cancelling statement #19's schedules...
+165ms ║║Condition #19 evaluated false (35ms)
+165ms ║║Condition group #18 evaluated false (state did not change) (36ms)
+167ms ║╚Execution stage complete. (42ms)
+168ms ╚Event processed successfully (169ms)
9/21/2020, 6:44:50 PM +420ms
+1ms ╔Received event [Laundry Motion].motion = active with a delay of 73ms
+114ms ║RunTime Analysis CS > 18ms > PS > 69ms > PE > 26ms > CE
+116ms ║Runtime (41396 bytes) successfully initialized in 69ms (v0.3.110.20191009) (114ms)
+117ms ║╔Execution stage started
+144ms ║║Comparison (enum) active stays (string) inactive = false (2ms)
+146ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+149ms ║║Comparison (enum) active stays (string) inactive = false (1ms)
+151ms ║║Comparison (enum) inactive stays (string) inactive = true (2ms)
+153ms ║║Comparison (enum) inactive stays (string) inactive = true (1ms)
+156ms ║║Cancelling any timed trigger schedules for condition 19
+157ms ║║Cancelling statement #19's schedules...
+158ms ║║Condition #19 evaluated false (35ms)
+159ms ║║Condition group #18 evaluated false (state did not change) (37ms)
+160ms ║╚Execution stage complete. (43ms)
+161ms ╚Event processed successfully (161ms)

#17

No, piston instances happily run in parallel. There is even an ‘enable parallelism’ setting to turn off the default behaviour. The semaphore wait is just a delay to try and serialise execution a bit.

Yes there is. If a piston is delayed by a semaphore it says so. There isn’t a timeout as such, it is a ten second delay loop that can exit early depending on the results of its polling.


#18

Do you have any more information about this behavior? Is my situation a “works as intended”? Why does WebCORE treat the occasional event in this manner? Where is the ‘enable parallelism’ setting?