FarmBot control via MQTT: random reconnects

Hello there,

I’m controlling my FarmBot from Python over MQTT, sending RPC requests with movement commands formatted as CeleryScript.
After I publish a command using the Paho MQTT library (QoS 0, TLS), I wait for the
corresponding rpc_ok message before sending further commands.
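
For anyone who wants to reproduce this, here is a minimal sketch of how such a request might be built and published. The helper names (`build_rpc_request`, `move_relative`, `publish_rpc`) are made up for illustration and are not part of any FarmBot library; the payload shape simply mirrors the captured messages in this post, and `client` is assumed to be an already connected Paho client, of which only `publish(topic, payload, qos=...)` is used.

```python
import json
import uuid


def build_rpc_request(body):
    """Wrap a list of CeleryScript nodes in an rpc_request with a fresh UUID label."""
    label = str(uuid.uuid4())
    request = {"kind": "rpc_request", "args": {"label": label}, "body": body}
    return label, json.dumps(request)


def move_relative(x, y, z, speed=100):
    """Build a move_relative node shaped like the one in the capture."""
    return {"kind": "move_relative",
            "args": {"x": x, "y": y, "z": z, "speed": speed}}


def publish_rpc(client, device_id, body, qos=0):
    """Publish to bot/<device_id>/from_clients and return the label to wait for.

    `client` is assumed to be a connected paho.mqtt.client.Client
    (hypothetical setup, not shown here).
    """
    label, payload = build_rpc_request(body)
    client.publish(f"bot/{device_id}/from_clients", payload, qos=qos)
    return label
```

The returned label is what has to show up in the `args.label` field of the rpc_ok message before the next command is sent.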

When monitoring the from_device and from_clients topics, everything looks okay.

Like this:

Incoming MQTT messages:
bot/device_XXXX/from_clients b'{"kind": "rpc_request", "args": {"label": "2db2d145-c5c1-48e9-bb23-29005fabba4c"}, "body": [{"kind": "move_relative", "args": {"x": 200, "y": 0, "z": 0, "speed": 100}}]}'
Incoming MQTT messages:
bot/device_XXXX/from_device b'{"args":{"label":"2db2d145-c5c1-48e9-bb23-29005fabba4c"},"kind":"rpc_ok"}'
Incoming MQTT messages:
bot/device_XXXX/from_clients b'{"kind": "rpc_request", "args": {"label": "0b2a276a-afd1-4caa-a2a9-2539fcf58ba1"}, "body": [{"kind": "move_absolute", "args": {"location": {"kind": "coordinate", "args": {"x": 580, "y": 650, "z": -250}}, "speed": 100, "offset": {"kind": "coordinate", "args": {"x": 0, "y": 0, "z": 0}}}}]}'
Incoming MQTT messages:
bot/device_XXXX/from_device b'{"args":{"label":"9f73fb90-a0f3-4275-b6fa-22b8e9ca5e8f"},"kind":"rpc_ok"}'

But I run into random connection issues: from time to time I get disconnected from the MQTT broker. When monitoring the connection with Wireshark, all I can see is a TCP reset from the broker, as seen in the screenshot:

It seems to be a network-related issue, since the connection just drops. After about 8 minutes the MQTT client reconnects. I tested it on Ubuntu 19.10 and Ubuntu 19.04, with paho-mqtt versions 1.4 and 1.5 as well as with the MQTT.fx client.
The FarmBot is connected via ethernet.

Re-creating an API token every time didn’t change anything, so I stick with the existing one, which is valid for 40 days. Adjusting the MQTT keepalive to values between 10 and 60 seconds didn’t do the trick either.
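
To put numbers on how long each session survives under a given keepalive, something like the following could help. This is only a sketch: the `DisconnectLog` class is a hypothetical helper, and the callback signatures are the ones from paho-mqtt 1.x (the versions mentioned above); newer paho releases use different callback arguments.

```python
import time


class DisconnectLog:
    """Records how long each MQTT session lasted before it dropped."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._connected_at = None
        self.sessions = []  # list of (uptime_seconds, rc) tuples

    # Signatures below follow the paho-mqtt 1.x callback convention.
    def on_connect(self, client, userdata, flags, rc):
        if rc == 0:  # 0 means the broker accepted the connection
            self._connected_at = self._clock()

    def on_disconnect(self, client, userdata, rc):
        # In paho 1.x a non-zero rc marks an unexpected disconnect.
        if self._connected_at is not None:
            uptime = self._clock() - self._connected_at
            self.sessions.append((uptime, rc))
            self._connected_at = None

    def attach(self, client):
        """Install both callbacks on a paho client (or anything similar)."""
        client.on_connect = self.on_connect
        client.on_disconnect = self.on_disconnect
```

With a real client this would be attached before connecting, for example `log.attach(client)` followed by `client.connect(host, 1883, keepalive=30)`, where `host` is whatever broker address your token gives you. Comparing `log.sessions` across keepalive values and networks should show whether the uptimes cluster around a fixed interval, which would point to a timeout somewhere on the path.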

Any advice is appreciated.

@TH007, you’d need to see the broker logs I suppose. Probably only authorized FarmBot staff can.

Just curious about that MQTT monitoring snippet you gave.
The "move_absolute" request label arg doesn’t match the label arg in the "rpc_ok" ?

Also curious about that TCP RST arriving from port 1883 ? I thought that the FarmBot broker normally listens on port 5672 ?

@jsimmonds thanks for your reply. That’s right, the "move_absolute" label doesn’t match the label arg in the "rpc_ok". This was just a bad capture; the rpc_ok with the matching label exists, I just copied the wrong one. Sometimes FarmBot seems to respond to RPCs which weren’t sent over the from_clients topic.

I am using MQTT default port 1883 (unencrypted) to view the info in Wireshark. I think port 5672 is used for AMQP and the FarmBot device itself.

According to the python-examples link ports 1883 or 8883 should be okay.

Maybe the FarmBot staff can help me?

@TH007 You are correct. 1883 is fine for MQTT. Additionally, we use MQTT.fx for debugging locally. I have not seen this issue come up before. I will take a look today and let you know.


That’s curious. Can you exhibit an example trace capture of that ?

Just for comparison, I re-wrote the Python example, in JS (CoffeeScript) and used the MQTT endpoint returned in the Token ( Secure Web Socket "mqtt_ws": )

The MQTT connection (from here in Melbourne, Australia) is rock-solid stable.

@jsimmonds My hunch is that this is a local issue specific to @TH007’s device, but I am not 100% certain. I am making this guess because:

  • Like you, I was able to hold the connection open for hours without problem.
  • We have not received this issue from any other customer as far as I know.

Another suspicion I had was that there was a malformed topic string (the API will kick you off without warning if you try to publish/subscribe to unauthorized topics). Since kicking users off without notice is difficult to debug, I added an error message generator, but it appears that malformed topics are not the issue at play.

Side note: Can you share the snippet you made? That sounds like something that could help folks in a similar situation.

My “comparison” app was not very “comparable” to @TH007 's client . .

  • different code
  • different VM
  • different libraries
  • different Transport over IP.

Did you use a Python client and the un-encrypted TCP connection to port 1883 ?

Of course ! Where can I send it ? ( It’s targeted for Node.js )

My “comparison” app was not very “comparable”

This is true. My main concern is finding problems with the server itself, although local issues (such as the ones you’ve noted) are more likely to be the culprit at this point. In the case of my example, I was using the same transport (MQTT vs. MQTTWS) but a different library (MQTTFx vs. Paho).

Of course ! Where can I send it ?

A new post in the software section would be a good place for it. Thanks for all your help!

The interesting thing with this issue is that we have the feeling the network affects how quickly it occurs, but there is nothing deterministic about it. On some machines in our university network it works most of the time, but on other machines it fails quickly. The TCP reset sometimes happens after 30 seconds, sometimes after 8 minutes, sometimes only after hours. Because we believed it was a problem with our university network, we tested the code in two other networks (outside of the university) with a different service provider, and the problem exists there too. Therefore we believe it is not a local network issue.

As far as I was able to debug my colleague’s issue, it does not seem to occur with the websockets connection in the browser. Furthermore, I have seen the TCP reset only while the FarmBot is being controlled. While idle, I have never seen a disconnect in MQTT.fx. The Python script closes the connection when idle, so we currently cannot observe this behaviour there.

I changed my ( MQTT.js ) test client to use raw TCP to port 1883. No failures so far.
( I’m on a domestic/consumer-grade Internet service in Melbourne, Australia )

It’s my experience that firewall/router appliances these days are configured with very short idle timeouts . . and idle timeouts just silently drop the connection ( neither end is notified ).
This is worth checking in your University network path. That MQTT.js client pings every 60s. I don’t use Python, so that client library that you’ve chosen is an unknown quantity.

Wireshark is good at analyzing “conversations” between endpoints. I think you’ll need to capture enough to do that.

I’ve just understood why this can happen. The MQTT topic bot/device_nnnn/from_device is a shared message bus ( that’s what the Pub-Sub pattern is) . If you have your browser open and logged in to your FarmBot Account, you’ll receive “unexpected” messages on that topic while you’re using your test client !
That’s also why people use a UUID in the label argument . . to match a request with one response.
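
A rough sketch of that label matching, for a client that should ignore responses meant for other clients (such as the browser app). `PendingRpcs` is a hypothetical helper, not part of Paho or any FarmBot library, and it only recognises rpc_ok messages of the shape shown in the capture earlier in this thread.

```python
import json
import threading


class PendingRpcs:
    """Tracks labels we sent and ignores rpc_ok messages for anyone else."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}  # label -> threading.Event

    def expect(self, label):
        """Register a label before (or right after) publishing the request."""
        with self._lock:
            self._events[label] = threading.Event()

    def handle_message(self, payload):
        """Feed every from_device payload here; True means it was ours."""
        try:
            msg = json.loads(payload)
        except ValueError:  # not valid JSON (or not valid UTF-8)
            return False
        if not isinstance(msg, dict) or msg.get("kind") != "rpc_ok":
            return False
        args = msg.get("args")
        label = args.get("label") if isinstance(args, dict) else None
        with self._lock:
            event = self._events.get(label)
        if event is None:
            return False  # response to someone else's request
        event.set()
        return True

    def wait(self, label, timeout=None):
        """Block until the matching rpc_ok arrives; False on timeout."""
        with self._lock:
            event = self._events[label]
        if not event.wait(timeout):
            return False
        with self._lock:
            self._events.pop(label, None)
        return True
```

In a Paho `on_message` callback one would pass `msg.payload` to `handle_message`, while the sending side calls `expect(label)` and then `wait(label, timeout=...)`. A timeout then means no rpc_ok with that label arrived in time, which is different from seeing an rpc_ok that belongs to another client.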
