Wait Instruction – bug demonstration


#1

1) Give a description of the problem
In several of my applications I have found the WAIT Instruction to be problematic. With the Forum’s help, a workaround was usually found. I thought that the misbehavior of the WAIT was limited to being interrupted by any other calls made to the same piston while execution.
These two pistons demonstrate that the WAIT can disrupt other executing instruction. To my mind, this is dramatically troubling. Is there a set of rules by which the WAIT can be used safely?

2) What is the expected behaviour?
The piston and its expected behavior are very trivial. It is triggered, it then executes several Set Variable instructions in a row, and if successful it turns on a lamp (Piston Example #1)

3) What is happening/not happening?
(Piston Example #2) Taking the above piston and putting WAIT instructions between each of the Set Variable instructions causes the code to fail by disrupting other Set Variable instructions. This happens when the delay is set to 2000 milliseconds, however, it doesn’t fail at 1000 msec. I have not tested other delays to see if a different type of disruption might occur.

4) Post a Green Snapshot of the pistonimage
image
image

5) Attach logs after turning logging level to Full

9/13/2019, 9:58:03 AM +66ms
+0ms ╔Received event [Home].time = 1568383082958 with a delay of 107ms
+92ms ║RunTime Analysis CS > 27ms > PS > 50ms > PE > 16ms > CE
+95ms ║Runtime (40240 bytes) successfully initialized in 50ms (v0.3.10f.20190822) (94ms)
+96ms ║╔Execution stage started
+110ms ║║Calculating (integer) 10 - (integer) 2 >> (integer) 8
+113ms ║║Executed virtual command setVariable (1ms)
+118ms ║║Comparison (integer) 8 is_equal_to (integer) 7 = false (1ms)
+120ms ║║Condition #8 evaluated false (4ms)
+121ms ║║Condition group #7 evaluated false (state did not change) (5ms)
+123ms ║╚Execution stage complete. (28ms)
+124ms ╚Event processed successfully (125ms)
9/13/2019, 9:57:52 AM +795ms
+1ms ╔Received event [Desk Chair].contact = open with a delay of 120ms
+85ms ║RunTime Analysis CS > 21ms > PS > 49ms > PE > 15ms > CE
+88ms ║Runtime (40247 bytes) successfully initialized in 49ms (v0.3.10f.20190822) (86ms)
+89ms ║╔Execution stage started
+96ms ║║Comparison (enum) open changes_away_from (string) closed = true (1ms)
+98ms ║║Cancelling condition #2’s schedules…
+98ms ║║Condition #2 evaluated true (5ms)
+99ms ║║Cancelling condition #1’s schedules…
+100ms ║║Condition group #1 evaluated true (state changed) (7ms)
+102ms ║║Cancelling statement #3’s schedules…
+106ms ║║Executed virtual command setVariable (0ms)
+111ms ║║Executed virtual command wait (0ms)
+112ms ║║Waiting for 2000ms
+2117ms ║║Calculating (integer) 0 + (integer) 5 >> (integer) 5
+2120ms ║║Executed virtual command setVariable (1ms)
+2124ms ║║Executed virtual command wait (1ms)
+2125ms ║║Waiting for 2000ms
+4130ms ║║Calculating (integer) 5 + (integer) 1 >> (integer) 6
+4133ms ║║Executed virtual command setVariable (1ms)
+4137ms ║║Executed virtual command wait (1ms)
+4138ms ║║Waiting for 2000ms
+6144ms ║║Calculating (integer) 6 + (integer) 1 >> (integer) 7
+6147ms ║║Executed virtual command setVariable (1ms)
+6152ms ║║Executed virtual command wait (1ms)
+6153ms ║║Waiting for 2000ms
+8157ms ║║Executed virtual command setVariable (1ms)
+8161ms ║║Executed virtual command wait (0ms)
+8163ms ║║Requesting a wake up for Fri, Sep 13 2019 @ 9:58:02 AM EDT (in 2.0s)
+8167ms ║╚Execution stage complete. (8078ms)
+8168ms ║Setting up scheduled job for Fri, Sep 13 2019 @ 9:58:02 AM EDT (in 1.996s)
+8176ms ╚Event processed successfully (8176ms)
indent preformatted text by 4 spaces

REMOVE BELOW AFTER READING
If a solution is found for your question then please mark the post as the solution.


#2

The images you posted are too small to read legibly.


#3

Sorry, are these better?



#4

Woops, duplicated.

The images have been acquired via the Webcore system so I’m not sure why they are readable.


#5

Thanks. Your original post did not allow zooming in, but your latest pics do. :+1:


#6

I wrote all that I presented in a Word Doc. including capturing and storing the piston images there. Then copied and pasted them from the Word doc to the WebCore site. The Word doc must have frozen the images. (I know more information than you needed :slight_smile:


#7

There have been 53 views of this thread and yet no meaningful comments. I am surprised as what I have found, in my view, and yet to be confirmed by others, is very serious. The Wait instruction not only have its own dysfunction (can be interrupted by other calls to a piston) but has an addition dysfunction which scuttles other executing code (in this case, causes Set Variable instruction(s) to miscalculate). The demonstration I have provided is extremely straight forward so that it is unlikely to be flawed. If so, please tell me what I have wrong? Is there a solution to this problem?

All best, Ray


#8

Hi Ray,
I’m interested in this thread, as I think I may be seeing similar issues. I have a piston which triggers on multiple motion sensors. When one triggers it increments a variable, waits 2 mins then decrements it (this uses never cancel task). If a door opens and the counter is 0, it sends an alert. This works most of the time, but occasionally the variable ends up permanently not 0, either positive or negative. I’ve had to introduce a watchdog timer to rest the variable if its remained non zero for 5 mins.


#9

Paul,

Interesting solution regarding the WDT. Good that you have a solution in this case. What concerns me it that it (the Wait dysfunction) is not yet well defined on exactly how it misbehaves. Hopefully this thread will get us all thinking about it.

Cheers,
Ray


#10

This makes sense to me.

Essentially, if a WAIT is very short (approx a second or less), the piston pauses right there, and then continues. If a WAIT is longer than a second, the piston often reschedules a wake up timer to continue the rest of the code. When it comes back from a wakeup, it processes slightly different, by checking the conditions for the current block, to see if it is allowed to continue.

I think you need to set TCP to Never on the WITH block containing the pauses. (not on the IF block)
This is because when the piston resumes, the trigger is no longer true…


A good method to analyze your piston, is to add sporadic “logs to consoles” at key events. Such as:

IF Sensor's contact changes away from closed
Then
    With location  (TCP = Never)         <-- TCP change here
    Set variable {Counter} = 0
    Wait {delay} seconds
    Set variable {Counter} = {Counter+5}
    Log to console = {Counter}           <-- New code
    Wait {delay} seconds
    Set variable {Counter} = {Counter+1}
    Log to console = {Counter}           <-- New code
    Wait {delay} seconds
    Set variable {Counter} = {Counter+1}
    Log to console = {Counter}           <-- New code
etc

This lets you see in the logs below exactly where the code succeeded, and where it failed.

But again, changing TCP to Never on the WITH block should fix this


#11

Thanks for your feedback. I tried having the TCP = Never, on the with block only, and it did not fix the problem.
image
My original post was not clear enough. It is not that I am trying to make this code work, it is rather that it demonstrates that the Wait Instruction can corrupt other instructions when it returns from and suspended delay.

I tried the Log To Console without success. I have not used it before. I get this error.
image
That said, the code is so simple that it is easy to see where it gets corrupted in the trace data already provided.

The Set Variable = 9 fails. It somehow becomes 10, then when decremented by 2 it fails the last test of not being = to 7.
image
It is challenging in that the rules for success and failure of the Wait are so far indeterminate.

It is curious that in this example piston that the 2000 mSec Wait executes 4 of the 5 times within the piston before calling for a suspension and reschedule. It is more of accumulated time in the piston the causes rescheduling?

Is there a path by which a bug (I should say: possible bug, yet to be validated by WC) can be escalated for correction?

I have tried to think of stable workarounds. For example, using a short 100 mSec delay in a counting loop to create any long delay needed, however, I am hesitant as this type of construct may cause the system to not be real-time responsive to other pistons (i.e. choke the system).

Sorry to be a burden. Just trying to find out how this is handled in a tractable way.

Cheers,
Ray


#12

This also makes sense to me.

Per SmartThings infrastructure, a programming logic cannot last more than 20 seconds. To circumvent this, webCoRE tries to keep logic under 10 seconds. Anything beyond that will schedule the piston to wakeup & continue in the future… (Where things are handled slightly different)

To see this in action, make your delay 12 seconds, and each command will only happen AFTER the scheduled wakeup


I am not at my PC now, but in a few hours, I will send you a test piston


#13

Here is that test piston… Working with 100% reliability. I added a few “Log to console” commands, which you can see outputted below the piston.

Notice the first ten seconds was processed during the first execution, and the last 2 seconds carried over into the scheduled wakeup. Changing the duration of the waits will affect just how many commands gets sent in the first execution. (about 10 sec is the max without a wakeup scheduled.)

If I really need to avoid wakeups, then I keep the piston at 9 seconds or less.


Side Note:
This piston has no triggers, so there was no need to change TCP for this test.


Piston Delay / Pause
#14

Notice, if I change each of the waits to 12 seconds, then each wait will force a wakeup:

temp



If I change each of the waits to 10ms, then it is all processed in the same execution:

temp


#15

Thanks for the synergy. With your work I think I have found that bug.

I was absolutely puzzled why your code didn’t fail and mine did. My first attempt was to remove your Log Instructions that were between each Set Variable. This because I thought that these extra instructions affected the real-time dynamics and might mask the failure. They did not. It still failed. I won’t bore you with my many other attempts, but in the end, I did find the problem.

If you take your piston and change the Counter initialization to 10 (at variable define time) then it fails. When the Wait reschedules returns to the piston it must clobber the Set Variable = 9 instruction and sets it to the initialization value. The trace shows the failure.
image
What exactly this means I’m not yet sure but have learned a lot. Does this mean that it is not the Wait rescheduling that is the problem but rather that the Set Variable instruction is venerable to the rescheduling return?

What are your thoughts?

Cheers,

Ray


#16

Yes. Sorry. I was zoomed in on this post, so I did not notice that you were hard coding the variable up top in the define section.

Personally, I NEVER define up top for variables that change often. Instead, I use the code inside the body of the piston to do that at the appropriate time.


Essentially, whenever a piston comes back from a scheduled wakeup, it checks the parameters up top, then checks the conditions in the block it left off, then continues processing (if allowed)


The issue here was the “define” section was processed again at the scheduled wakeup.
I always keep the define section (no value set) if I plan on changing the variable often.