@jebba We have plans to update the SSH parts of FBOS in the next three months or so. As a side note for anyone wishing to discuss security matters on the forum (since the RSA issue has been brought up once previously this month), please see our responsible disclosure statement prior to posting or privately send the matter to security@farmbot.io.
@RickCarlino SSH Console is way too noisy in the logs.
All that's there is just your SSH session startup. Something is killing IPv4 connectivity between the Bot and everyone else. Is it normal that FBOS reboots after loss of Internet ? ( I've forgotten )
@jrwaters to go further you'll need to connect up to the physical RPi3B Serial Console ( see other posts in here ) with your trusty serial 3V3TTL-USB cable
Another recent post in here discovered inadequate 5V supply to the RPi3B … are you able to check the 5V0 main supply at the RPi3B ?
@Jsimmonds, you are amazing, man. I don't think the loss of connectivity is the cause; rather, I suspect a spurious reboot which immediately causes loss of connectivity. I say that because the time period is so short - it is happening every 5 or so minutes. And there is only about a 40 second period where connectivity is lost (during the reboot). But, of course, I could be wrong. I will look at the other thread no matter what. I love learning about this stuff. Thank you!
From day 1 my voltage icon shows Yellow and sometimes Red FWIW. I might be able to check that but will have to read up as I'm a novice with my multi-meter.
Just wanted to update for those who are interested - especially @jsimmonds.
We swapped out the power supply and the Raspberry Pi and the spurious reboots still happened.
The reboots happen even when the Farmduino is not attached.
This is good news in the sense that we have fewer variables. Some of the main variables left:
Things happening in the network
Software.
I've just downgraded to 10.1.2 and the issue still happens - so it isn't that simple. I've also opened my network completely for the FarmBot. I can use tping and see that the FarmBot can ping things on the Internet.
Next up - that debug cable. My debug cable should arrive tomorrow. I have an idea to eliminate the network from the equation. Tomorrow I'll re-flash and put my iPhone out by the FarmBot and use it as a hotspot. I timed the reboots by sshing into the FarmBot and using the Toolshed uptime command. They always seem to happen after 7.5 minutes and usually before 9 minutes. So, it shouldn't take long to see.
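For anyone who wants to time reboots like this without staying logged in, here is a minimal sketch that probes the bot from another machine on the LAN and works out when it dropped off. It is not part of FBOS or Toolshed. The host name passed to `watch` is up to you, and probing TCP port 22 is an assumption based on SSH being enabled on the bot, as it is in this thread.

```python
import socket
import time
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, was the bot reachable?)


def is_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (first_down, first_up_again) for each completed outage.

    An outage that is still in progress at the end of the samples is ignored.
    """
    windows = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            down_since = ts
        elif reachable and down_since is not None:
            windows.append((down_since, ts))
            down_since = None
    return windows


def seconds_between_outages(windows: List[Tuple[float, float]]) -> List[float]:
    """Return the time from the start of each outage to the start of the next."""
    starts = [start for start, _ in windows]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


def watch(host: str, period: float = 5.0, count: int = 240) -> List[Sample]:
    """Probe host every `period` seconds, `count` times, and return the samples."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), is_reachable(host)))
        time.sleep(period)
    return samples
```

Calling `seconds_between_outages(outage_windows(watch(host)))` with the bot's address gives the spacing between drop-outs, which can be compared with the 7.5 to 9 minute pattern described above.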
Sorry I wasn't clear. When I said "we swapped out the power supply", I meant that I had an official Raspberry Pi power supply because I had a spare Pi of the same version (silkscreen same and everything). So, the Pi was powered by that with no Arduino connected.
But, having yellow voltage has bothered me - I assumed others might have the same. If not the case then sounds like I should swap out that cable. Thank you.
Here is the latest. My cable is not here yet but it will be here tomorrow. In the meantime, I've got Screen working on my Windows box under the Linux sub-system.
I have the Pi inside with no Arduino attached. A couple of other interesting things
With completely different Pi and completely different power supply, confirm same symptoms - reboot every 7 to 8 minutes.
With my power supplies, voltage reading shows GREEN so I suspect the power cable between my Arduino and Pi is flawed in some way or perhaps even my power supply (unlikely) but this is definitely not triggering this issue.
Cleaned up firewall rules and I think I have it pretty pristine - made no difference.
Tried different firewall - made no difference.
WiFi hotspot didn't work as my AT&T coverage at home is poor.
I've tried all the released versions of FarmBot OS that I have and - interestingly enough, 9.2.2 does not exhibit the issue. I let it run for over 15 minutes.
Also interesting is that 10.2.0-rc0 seems immune. I think this is for self hosted users but it is interesting. One thing I noticed is that SSH has periods (couple of seconds) of non-responsiveness with this build.
I tried 10.2.0-rc1 and it rebooted promptly at 7 minutes. This got me thinking about 10.2.0-rc0 - why did it stay up? So, I reloaded 10.2.0-rc0 and looked more closely. It does go through a restart process but it just doesn't disconnect, so you get to see the full reset (I assume).
@jrwaters Interesting note about 9.2.2. I will talk to the developer who managed FBOS at that time to see if they have any input on the matter.
It's unfortunate that the hotspot did not work. If you have a friend with known-good WiFi, it might be worth carrying the RPi to their LAN and trying that (if the TTL cable is going to be delayed significantly).
Only if you are interested in 9.2.2, Rick . . . I'm a bit of a completionist. But I certainly appreciate that you all don't have cycles to dig through old code.
If no TTY cable today, I'll throw my family off the network and put a simple switch on my cable modem and hook laptop and Pi directly!
Cable came - captured the failure with 10.1.3. Logs attached (keys redacted) but the interesting bit is below. I logged back in and catted out the crash.dump but it's too large to attach. If you (Rick or the intellectually curious John Simmonds) want a copy then please give me an e-mail address and I'll share it on Google Drive or something (open to ideas).
Thanks
Jack
eheap_alloc: Cannot allocate 457731892 bytes of memory (of type "ol[ 365.750349] heart: Erlang is crashing … (waiting for crash dump file)
d_heap").
Crash dump is being written to: /root/crash.dump…done
[nbtty: terminating]
[ 365.750375] heart: waiting for dump - timeout set to -1 seconds.
[ 367.995257] erlinit: Erlang VM exited
[ 368.007896] erlinit: Sending SIGTERM to all processes
[ 368.014564] watchdog: watchdog0: watchdog did not stop!
[ 369.695761] erlinit: Sending SIGKILL to all processes
Ok, we have a genuine Crash dump and Reboot issue now vs. a suspected "low power" issue (?) ( What colour is your RPi Power now ? ) Very good.
Re: the crash.dump file . . the Slogan text that you posted is #1 key
The #2 key is the Current Process . .
I don't want to see the dump, but, we can get a good picture of the problem with those 2 items.
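For readers following along, the two items named above (the Slogan line and the one non-empty Current Process entry) can be pulled out of a large crash.dump with a short script instead of scrolling through it. This is a rough sketch that only assumes the plain-text layout visible in the excerpts posted in this thread (`Slogan:` line, `Current Process: <pid>` lines, `=proc:<pid>` sections). The `SAMPLE` text is a cut-down illustration, not a real dump.

```python
import re
from typing import Dict, List, Optional


def slogan(dump: str) -> Optional[str]:
    """Return the text after 'Slogan: ', or None if the line is missing."""
    match = re.search(r"^Slogan: (.*)$", dump, re.MULTILINE)
    return match.group(1) if match else None


def running_pids(dump: str) -> List[str]:
    """Return the pid of every scheduler entry whose 'Current Process' is set."""
    return re.findall(r"^Current Process: (<[\d.]+>)\s*$", dump, re.MULTILINE)


def proc_section(dump: str, pid: str) -> Dict[str, str]:
    """Return the 'Key: value' fields of the '=proc:<pid>' section as a dict."""
    fields: Dict[str, str] = {}
    in_section = False
    for line in dump.splitlines():
        if line.startswith("="):
            # Every section starts with a line beginning with '='.
            in_section = line.strip() == f"=proc:{pid}"
            continue
        if in_section and ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
    return fields


# Cut-down illustration based on the excerpts in this thread.
SAMPLE = """\
Slogan: eheap_alloc: Cannot allocate 457731892 bytes of memory (of type "heap").
=scheduler:1
Current Process:
=dirty_cpu_scheduler:8
Current Process: <0.531.0>
Current Process State: Garbing
=proc:<0.531.0>
State: Garbing
Name: 'Elixir.FarmbotCeleryScript.Scheduler'
Memory: 474553836
=proc:<0.532.0>
State: Waiting
"""

if __name__ == "__main__":
    print(slogan(SAMPLE))
    for pid in running_pids(SAMPLE):
        print(pid, proc_section(SAMPLE, pid).get("Name"))
```

To run it against a real file, read the dump with `open(path, errors="replace").read()` and pass the text to the same three functions.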
FBOS has some limitations on quantities of things in Groups and Farm Events at the moment.
What workload is your bot executing when this dump+reboot occurs ?
( Size of Group that the active Sequence is working with … Number of Farm Events lined up … etc. ?)
Be kind to your family . . With that test, what are you proving/disproving ?
Current Process
In terms of the current process, there are a number of instances in the dump file relating to this - spread out over 127 lines. I hope it won't be offensive to put that much here? Posted down at the bottom.
Power
I have the Raspberry Pi inside with a Canakit power supply so it is green. When in the normal FarmBot electrical box it is nearly always yellow . . sometimes red and rarely green. While not the trigger here, this seems like something I need to resolve.
Workload
Literally asking it to do nothing. In an idle state whether connected to arduino or not, it reboots every 7-8 minutes.
The Test to Which I Referred and the Potential Family Suffering
This problem happens round the clock. I can't keep my system up for 10 minutes and it happens on any modern firmware version (10.X). All of this stuff was working for weeks and weeks so I'm trying to figure out "what changed". It looks like there is a bug because there is a crash but I'm also interested in what triggered it. I've eliminated many variables (the Pi itself, the power supply). So, my network may have changed or my SDCARD may have degraded, etc. While I think my network is clean, I haven't proven that by using a different network. For now, I haven't set this up because, as you recommended, I'm trying to be kind to my family
Current Process Details
Slogan: eheap_alloc: Cannot allocate 457731892 bytes of memory (of type "heap").
System version: Erlang/OTP 22 [erts-10.7.2.1] [source] [smp:4:4] [ds:4:4:10] [async-threads:1]
Compiled: Wed Jul 1 21:37:29 2020
Taints: Elixir.Circuits.I2C.Nif,Elixir.FarmbotCore.Asset.FarmEvent,esqlite3_nif,Elixir.Circuits.GPIO.Nif,asn1rt_nif,crypto
Atoms: 33119
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:3
Scheduler Sleep Info Flags: SLEEPING | POLL_SLEEPING | WAITING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | INACTIVE
Current Process:
=dirty_cpu_scheduler:5
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_cpu_scheduler:6
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_cpu_scheduler:7
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_cpu_scheduler:8
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work:
Current Process: <0.531.0>
Current Process State: Garbing
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | GC | DIRTY_ACTIVE_SYS | DIRTY_RUNNING_SYS
Current Process Program counter: 0x764d5a04 (gen_server:loop/7 + 336)
Current Process CP: 0x00000000 (invalid)
Current Process Limited Stack Trace:
0x4b83329c:SReturn addr 0x73065494 (proc_lib:init_p_do_apply/3 + 36)
0x4b8332b8:SReturn addr 0x2EE61C ()
=dirty_cpu_run_queue
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
=dirty_io_scheduler:9
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:10
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:11
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:12
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:13
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:14
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:15
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:16
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:17
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_scheduler:18
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Process:
=dirty_io_run_queue
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
True, however, all but the one on dirty_cpu_scheduler:8 are nil.
Further down in the =proc: details section of the dump you should find process <0.531.0>. If there is more detail in there, can you post it ?
Ok, but in the background FBOS does a "ton" of housekeeping and checking to keep things in sync with the Web App's view of the world. I've observed crashes similar to this ( where a Process is in the middle of doing a Garbage Collection for itself and suddenly asks for more heap than the system can supply ) . . caused in my case by too many Farm Events ( > 2000 ). The current process was Elixir.FarmbotCeleryScript.Scheduler trying to get the Farm Event time ordering up-to-date ( it carries around a huge amount of state, so it needs to GC quite often ).
That's why I was enquiring about the numbers of "things" the bot is dealing with in your current garden setup.
You're being so thorough up to now. That test needs to happen
Regarding the housekeeping - sorry - I misunderstood. I have only 50-ish plants and just a couple of things scheduled each morning. I don't have any weeds in my bed yet and my camera calibration isn't quite right (see other post) so I haven't scheduled any of that. Once a day I have a moisture sequence check in the early morning and then I have a watering sequence. Still, I'll go back over all of that and see if I changed anything recently. Worst case, I'll delete my scheduled events (after that other test).
Here is the process info:
=proc:<0.531.0>
State: Garbing
Name: 'Elixir.FarmbotCeleryScript.Scheduler'
Spawned as: proc_lib:init_p/5
Spawned by: <0.318.0>
Message queue length: 1
Number of heap fragments: 1651
Heap fragment data: 847508
Link list: [<0.532.0>, <0.318.0>, {to,<0.3982.0>,#Ref<0.1440891332.806092801.248177>}, {to,<0.4753.0>,#Ref<0.1440891332.806354945.4523>}, {to,<0.3976.0>,#Ref<0.1440891332.806092804.227742>}, {to,<0.3975.0>,#Ref<0.1440891332.806092804.227787>}, {to,<0.3977.0>,#Ref<0.1440891332.806092804.227822>}, {to,<0.3979.0>,#Ref<0.1440891332.806092804.227838>}, {to,<0.3981.0>,#Ref<0.1440891332.806092804.227835>}, {to,<0.3980.0>,#Ref<0.1440891332.806092804.227809>}, {to,<0.3978.0>,#Ref<0.1440891332.806092804.227786>}, {from,<0.4753.0>,#Ref<0.1440891332.806354947.167595>}]
Reductions: 93526437
Stack+heap: 38323372
OldHeap: 79467343
Heap unused: 614327
OldHeap unused: 6263772
BinVHeap: 6
OldBinVHeap: 596
BinVHeap unused: 46416
OldBinVHeap unused: 45826
Memory: 474553836
New heap start: 42602018
New heap top: 4B5DB3EC
Stack top: 4B83329C
Stack end: 4B8332C8
Old heap start: 53208018
Old heap top: 64947DE4
Old heap end: 6612CD54
Program counter: 0x764d5a04 (gen_server:loop/7 + 336)
CP: 0x00000000 (invalid)
Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | GC | DIRTY_ACTIVE_SYS | DIRTY_RUNNING_SYS
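As a sanity check on the figures in this process entry, the arithmetic below converts the heap sizes to bytes and compares them with the reported Memory value and with the failed allocation from the Slogan line. It is a sketch under two assumptions that the dump itself does not state: that Stack+heap, OldHeap and Heap fragment data are counted in machine words, and that a word is 4 bytes here (the 32-bit addresses in the dump suggest a 32-bit VM).

```python
WORD_BYTES = 4  # assumption: 32-bit Erlang VM, so one word is 4 bytes

# Values copied from the =proc:<0.531.0> entry and the Slogan line.
stack_heap_words = 38_323_372
old_heap_words = 79_467_343
heap_fragment_words = 847_508
reported_memory_bytes = 474_553_836
requested_bytes = 457_731_892


def words_to_bytes(words: int, word_bytes: int = WORD_BYTES) -> int:
    """Convert a size in machine words to bytes."""
    return words * word_bytes


# Bytes accounted for by the stack, both heaps and the heap fragments.
accounted_bytes = words_to_bytes(
    stack_heap_words + old_heap_words + heap_fragment_words
)

# What is left over should be small if the word-size assumption holds.
unaccounted_bytes = reported_memory_bytes - accounted_bytes

# Memory the process would hold if the failed allocation had succeeded.
would_need_bytes = reported_memory_bytes + requested_bytes

if __name__ == "__main__":
    print(f"accounted for:  {accounted_bytes:,} bytes")
    print(f"left over:      {unaccounted_bytes:,} bytes")
    print(f"would need:     {would_need_bytes:,} bytes "
          f"({would_need_bytes / 2**20:.1f} MiB)")
```

Under those assumptions the three areas add up to within 944 bytes of the reported Memory figure, and this one process would have needed roughly 889 MiB had the allocation gone through. That is close to the 1 GB of RAM on a Raspberry Pi 3B, which would fit the allocation failing, though it does not by itself explain why the Scheduler's state grew that large.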
Regarding that test - I have conceived of the way I want to do it without disrupting the family. Hopefully tonight!
PS - thanks for the commentary on the dump file and happy to extract anything else.
Dear diary. I'm shocked. I took a completely different brand of WiFi router (Asus) and connected it to my cable modem. Then I connected the Farmbot to the Asus. None of my network policies, etc. are in the picture. After about 9 minutes, I got the same crash - "eheap_alloc: Cannot allocate 549278268 bytes of memory (of type "he[ 567.256445] heart: Erlang is crashing … (waiting for crash dump file)
ap")."
Running SpeedTest from this machine (on the same WiFi as Pi) shows normal rates for my home.
I'm running out of ideas. I did try a new SanDisk Class 10 SD card and the result was the same.
@RickCarlino told me that you can run a FarmBot without the RTC daughter card and without the Arduino. I just want to double triple check that. If, for example, there is a bug where you get a crash after 10 minutes if you don't have proper communication with your Arduino or RTC . . . then that would send me in another direction.
@jrwaters if Elixir.FarmbotCeleryScript.Scheduler is the crashing process again, the problem is likely to relate to the total number of Events ( which can be Sequences or Regimens ). @RickCarlino is in the best position to sort this out with you, rather than this post-reply ping-pong that we're conducting
For reference I'd like to see your Events list ( scroll to the bottom of the list )
Here's mine ( which doesn't seem to bother my RPi3B+ on FBOS 10.1.3 )
I didn't suspect events because I created a new FarmBot account and associated my FarmBot with that. The problem still happened [might have lost mind and be making this up but nearly sure].