Strangeness in timings and dependency on the internet

mvillion · May 18, 2024, 10:32pm

@jsimmonds Here is a sequence that does not work unless there are the 5 second delays introduced into the code sequence.

This sequence checks to see if a tool is loaded, and if so dismounts the tools it then mounts the watering head then moves to a Z=0 position to ensure that it is safe to move afterwards. If this sequence is executed without the 5 second delays, it will say there is a tool already connected and the go to ‘Park - in error’.

This is what made me state that there is a dependency on internet state rather than local state else the 5 second delay would not be required. I would be most curious to what you find. Maybe I am missing the obvious (but I don’t think so - obviously).

jsimmonds · May 19, 2024, 7:19am

Thanks @mvillion . . rather than now asking you “What’s the code inside Dismount tool SAFELY ?” etc. are you able to create a short self-contained reproducer in Lua which clearly shows where and when the hard delay is needed ?

If not that’s fine . . I’ll just ask for the details of Dismount tool SAFELY

mvillion · May 19, 2024, 7:39am

I don’t know enough about LUA to do that (yet) - but it is nothing really clever.
It checks if something is loaded, then dismount tool, then checks to ensure it worked.
I did it this way to stop the B-C errors popping up if nothing was loaded in the UTM.

Dismount tool - SUB executes dismount_tool()
Park - Error has occurred - Moves to a set location (So it is a visual cue) and e-stops

If the delays from the main code are removed, the error detection fails and the system ‘Parks’.
With the delays introduced, it is reliable code

jsimmonds · May 28, 2024, 11:22am

@mvillion I can reproduce that WAIT requirement (despite lack of a physical UTM on my ‘headless’ bot )

Will report findings as soon as I can gather some additional evidence.

edit

Along the way I see anomalies like this

11:37:25.346 [info] Starting ------ Forum 9120 Mount Tool 7 SAFELY ------

11:37:25.360 [info] Evaluated IF statement: Tool Verification is 0; continuing execution

11:37:25.370 [info] Waiting for 4000 milliseconds

11:37:29.400 [info] Starting Mount Tool

and earlier

11:33:52.147 [info] Starting ------ Forum 9120 Mount Tool 7 SAFELY ------

11:33:52.160 [info] Evaluated IF statement: Tool Verification is 0; branching

11:33:52.202 [info] Starting funky Dismount Tool

11:33:53.627 [info] The Weeder is mounted on the UTM

11:33:55.687 [info] Moving to (120, 100, -50)

Note that the IF block reports condition value 0 for both branches taken

mvillion · May 28, 2024, 4:14pm

Glad to see I am not imagining things.

jsimmonds · May 30, 2024, 12:09pm

Hi @mvillion
Think I’ve located the root issue which causes a band-aid WAIT 5s to be needed.

It’s the way the Lua code in dismount_tool() tries to nil the "mounted_tool_id" in the Device Asset by using integer 0 rather than NUL or nil or whatever.
More to come . .

mvillion · May 30, 2024, 2:48pm

Looking at the LUA code.

dismount_tool() code

       update_device({mounted_tool_id = 0})

verify_tool() code uses

local mounted_tool_id = get_device("mounted_tool_id")

I would be interested in the IF statement code to see the code behind it.

Also, where is the code for update_device() and get_device(). My searches are failing and now finding anything. I think this is user error and not knowing how to drive github. LOL

I am also interested to see what will happen if I drop the wait statement and place a verify_tool() in its place.

jsimmonds · May 31, 2024, 3:48am

Hi @roryaronson

Here’s a summary of what I’ve uncovered relating to this issue raised by @mvillion.

There are bugs and there’s a simple solution I think.

A statement of the issue

Users are forced to insert a blind WAIT 5000ms in the timeline between Lua dismount_tool() and Lua mount_tool().

Here’s what happens without that WAIT block.

Lua dismount_tool() does it’s good work and just before returning it does

update_device({mounted_tool_id = 0})

Looks innocent enough . . local Device db field mounted_tool_id is updated to integer 0 Ok, because there are no constraints on mounted_tool_id values.

Now we call Lua mount_tool() and pretty soon it does some checks

elseif get_device("mounted_tool_id") then
        toast("There is already a tool mounted [ . . . ]

which fails because the Lua language considers integer 0 to be true.

So, what are we waiting for ?
With that 5s WAIT in place we allow DirtyWorker create a changeset that results from our local Device db update and send that up to ‘Head Office’ (the Web App) to make sure local and remote databases remain coherent.
Head Office gets that changeset and tells FBOS that "Er, we don’t like 0 mounted_tool_id, here, have a nil instead.
Being an obedient type, FBOS now updates local Device db with nil for mounted_tool_id.

Now that that’s all settled (usually takes < 3 seconds) … Lua mount_tool() checks will succeed because Lua language treats nil as false.

So what we’re waiting for is DirtyWorker to get it’s work done for the Device db.
No User Sequence should ever have to wait for DirtyWorker.

A simple fix that I came up with is to amend the update_device() code so that it works just like this Elixir code

FarmbotOS.Asset.update_device! %{mounted_tool_id: nil}

i.e. allow nil in the argument map.

I’m not really sure how the Lua → Elixir function mapping works so I didn’t submit a GitHub Issue.

jsimmonds · May 31, 2024, 4:15am

With the Sequence open in the Editor just press the </> toggle code view button on the IF block.

edit

That Celery Script is transpiled at run-time into Elixir.

That’s not correct. The CeleryScript is converted once to a %FarmbotOS.Celery.AST{} structure and stored in the local Sequences db as that. At execution time that AST is fetched and sent to a StepRunner for processing by specialized Elixir code.

end edit

That Tool Verification Sensor reading is eventually performed by Elixir code as a read_cached_pin() function call.

Those are in data_manipulation.ex Elixir code in lib/os/lua/data_manipulation.ex.

mvillion · May 31, 2024, 7:05am

Wow.

Amazing investigation. I never would have thought that ‘0’ was true.

@roryaronson How do we report a bug like this as I certainly do not feel qualified to tackle Github yet.

roryaronson · May 31, 2024, 9:44pm

This thread is enough! @jsimmonds’s explanation was spot on and @Gabriel has implemented a fix to map the Lua 0 to an Elixir nil so that FarmBot has the correct value (nil) immediately, rather than getting corrected by the head office a few seconds later.

Additionally we’re working on some new helpers to replace API calls with retrieving info from FarmBot’s local database. Doing QA on the release right now…

jsimmonds · June 1, 2024, 7:29am

Hi @roryaronson and @Gabriel

While testing on FBOS version 15.4.9-rc0 I’ve noticed a (pre-existing) annoyance which we should be able to fix quickly.

Decsription of Issue

If you’ve configured a MOUNTED TOOL in the Designer/Tools tab then right after an FBOS boot/reboot, if any tool is physically mounted to the UTM across that reboot then the Sequence below (basically @mvillion 's demo) will fail at first try.

Suspected Cause

Reason is that the code behind an IF THEN block uses read_cached_pin() calls and the Tool Verification Sensor pin 63 is not yet in the cache(BotState)

Solution

I can think of a few solutions but first I am curious why read_cached_pin() is used for IFs ?

roryaronson · June 4, 2024, 10:30pm

If I recall correctly, that was implemented for a workflow requiring the pin to be read in one sequence, and then that value used in another sequence. For example:

FarmBot mounts the soil moisture sensor tool, takes a measurement, and then dismounts the tool.
Later, a different sequence is run to water the garden based on the outcome of an IF statement evaluating the cached soil moisture sensor pin reading. If it didn’t check a cached result, then it would attempt to read the pin in the moment, when the soil sensor tool was no longer mounted.

Here is this stated in different words: Soil sensor and sequences - #2 by RickCarlino

A quick solution for that ---- Forum 9120 sequence would be to add a READ SENSOR command for the Tool Verification Pin right before the IF command.

Edit: I suppose at the FBOS level if the cache is empty, it could read the pin in the moment? What were you thinking?

jsimmonds · June 5, 2024, 4:27am

Thanks @roryaronson for that scenario which I hadn’t considered. I had assumed cached pins were invented so that pin values could be read without waiting in the firmware Command Queue if the MCU was already handling a Motion command. Not sure that would ever arise in Sequences because Sequences tend to be very . . . Sequential about everything

Something you’d have to remember to begin your On Boot sequence with

Strongly Agree with that one because it’ll help pins other than the Tool Verification sensor also.
I was thinking about adding code in FBOS startup just to make sure Pin 63 was cached but your suggestion is completely general and I can’t think of any cons, even If the cache load gets delayed in the firmware Command queue.

mvillion · June 5, 2024, 7:01am

Ok.

So, for me, the simple fix is to place a verify_tool statement at the beginning and end of the mount_tool() and dismount_tool() code blocks in FBOS, thus ensuring clean entry and exit. The truth be known, these sequences WANT to know the real state before executing, and the exit verification ensures that anything relying on the real state will be informed promptly.

This respects the cached nature of the code design and ensures the code is executed safely during the mount and dismount blocks.

Win win and zero downside or impact on other functions as far as I can see.

system · July 5, 2024, 7:02am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.