Soak test code for X-Axis

Hi

When I run this code on my FB Genesis 1.7, the X2 motor drops out on a regular basis.
The code just moves the X axis back and forth 100 times. Mine fails around 7 to 8 times.

It even fails with no load: with the drive belts disconnected, the X2 motor still stops driving.

Can someone please run this code and let me know if you get X2 motor dropout?

local cycles = 100

-- Perform the soak test for the specified number of cycles
for i = 1, cycles do
  move_absolute(500, 0, 0)
  move_absolute(800, 0, 0)
  
  -- Report the current cycle count
  toast("Completed cycle " .. i .. " of " .. cycles)
end

-- Move to the final position 0,0,0
move_absolute(0, 0, 0)

-- Ask the user to verify there are no movement errors and the head is at 0,0,0
send_message("info", "Please verify there are no movement errors in the log and visually confirm the head is at 0,0,0.", "toast")

I’m glad I’m not the only one who noticed this issue. I always assumed it was an issue with the connector and was planning on replacing it. I ran this code for 20 iterations and had 4 cutouts on the X2 axis. I didn’t have the patience to watch it 100 times, and I figured this was enough evidence.

Thank you so much for running the test. I think it is a software issue, but I have no hard evidence at the moment. I will get there.

I have purchased an oscilloscope and will be connecting up to the drive lines to see what is going on. I am hoping with more data, we will get to understand the issue.

I do hope other people run the test, as more evidence will be useful.

No worries, I’m just glad I’m not the only one with this issue. I’ve got an oscilloscope as well (although I’d have to figure out how to use it) so if you want any tests replicated then let me know.


I have captured a trace direct from the circuit board stepper motor output. It makes for an interesting study as there is a visible pattern.

In the attached video you can watch the X2 motor (Top line) and the X1 motor (bottom line) run, stop and change directions - according to the script.

What is interesting is that some of the time (when it is working properly) the motors stop in synchronisation, as you would expect. When it fails, you can see the X2 motor drive line go high before the signal stops on X1; then X1 reverses direction and X2 is stalled. X2 does eventually power up again.

There is a very visible, repeatable pattern. This does not look like a hardware issue. It is too repeatable to be random.

What is even more interesting is the duty cycle of the failures.

It fails for 5 seconds every 20 seconds.
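
To put a number on that: if the 20 seconds is the full repeat period (my assumption; it could also be read as 20 seconds of running between stalls), X2 is stalled for a quarter of the time. A quick sketch of the arithmetic, using a hypothetical helper name:

```cpp
#include <cassert>

// Fraction of each repeat period that the motor spends stalled.
// Assumption: periodSeconds is the full cycle (stalled time + running time).
double stallFraction(double stalledSeconds, double periodSeconds)
{
  assert(periodSeconds > 0.0 && stalledSeconds >= 0.0);
  return stalledSeconds / periodSeconds;
}
```

With the figures from the trace, `stallFraction(5.0, 20.0)` gives 0.25, i.e. a 25% stall duty cycle.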

For those without an oscilloscope, you can watch this by loosening the belts over the X1 and X2 motors so they can spin freely. X2 simply stops spinning.

I have studied movement.cpp and the class in movementaxis.cpp, and everything seems to check out from a logical code structure point of view. (Actually, IMHO it is incredibly well structured.)
This is the main axis call where the decision to step X2 is made.

void MovementAxis::setMotorStep()
{
  stepIsOn = true;

  //digitalWrite(pinStep, HIGH);
  (this->*setMotorStepWrite)();

  if (pin2Enable)
  {
    (this->*setMotorStepWrite2)();
    //digitalWrite(pin2Step, HIGH);
  }
}

The issue is that pin2Enable is only set or reset in the loadPinNumbers function:

void MovementAxis::MovementAxis::loadPinNumbers(int step, int dir, int enable, int min, int max, int step2, int dir2, int enable2)
{
  pinStep = step;
  pinDirection = dir;
  pinEnable = enable;

  pin2Step = step2;
  pin2Direction = dir2;
  pin2Enable = enable2;

  pinMin = min;
  pinMax = max;
}
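
As a sanity check on that reading of the code: going only by the two functions quoted above, pin2Enable changes nowhere except loadPinNumbers, so once it is non-zero every call to setMotorStep should pulse both drivers. Below is a stripped-down stand-in that counts pulses instead of toggling pins. MockAxis and pulsesToSecondMotor are hypothetical names for illustration, not firmware code, and the rest of the firmware is not modelled.

```cpp
#include <cassert>

// Stand-in for MovementAxis (NOT the real firmware class).
// It counts step pulses instead of writing to GPIO pins.
struct MockAxis
{
  int pin2Enable = 0;   // copied from enable2, as in loadPinNumbers
  int stepsMotor1 = 0;  // pulses sent to the first driver
  int stepsMotor2 = 0;  // pulses sent to the second driver

  void loadPinNumbers(int enable2) { pin2Enable = enable2; }

  void setMotorStep()
  {
    ++stepsMotor1;
    if (pin2Enable)  // same test as the quoted code: any non-zero value
    {
      ++stepsMotor2;
    }
  }
};

// Pulse the axis `count` times and return how many pulses reached motor 2.
int pulsesToSecondMotor(int enable2, int count)
{
  assert(count >= 0);
  MockAxis axis;
  axis.loadPinNumbers(enable2);
  for (int i = 0; i < count; ++i)
  {
    axis.setMotorStep();
  }
  return axis.stepsMotor2;
}
```

In this model, a non-zero enable2 means the second motor receives every pulse and a zero enable2 means it receives none. There is no path in which the second motor drops out part-way through a run.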

Which signal exactly ?

If it is the STEP signal, then the X2 Red LED on the Farmduino would be solid ON during an X2 “freeze” . . i.e. eyeballs become the oscilloscope :slight_smile:

That’s a good clue.

In the Farmduino firmware, we need to understand why MovementAxis::resetMotorStep() wasn’t called in a timely manner.

The signals were tapped at the 4-pin stepper motor driver socket.
(Originally tapped at the motors, but I backtracked to the socket to rule out components - cables and motor.)

UPDATE:
I think I have isolated the issue. I think it is the drive chip overheating.
Using a freezer spray on the X2 chip brings the recovery time to instant, rather than 5 seconds.

Interestingly, all chips are hot - I mean really hot. Z is powered on to hold, so is Y, and X is moving back and forth. I think I need to heatsink the chips to improve reliability. Using a visual thermometer, the chips were reaching over 80 °C.

I no longer think it is a software issue. Just a simple overheating issue. Which sort of makes sense as it was failing in a soak test which is designed to, well ‘test’.

I must admit I was not expecting an overheating problem - but it does make the most sense.
Looking into the chip spec, they have an overheating shutdown procedure that protects the chip if the core reaches 150 °C (about 75-80 °C at the outside).

Next is to design a cooling solution. I think it will comprise three elements.

  1. Physical cooling. Heat sink and a small fan should do the trick.
  2. Establish the current requirements for actual reliable movement as mine are set to 100%.
  3. Software to switch from ‘Axis hold’ current rating to ‘Axis moving’ current rating.

It would be good if there was an option to specify these ratings in the Settings panel. I need to keep Z powered on to stop the head dropping. I have a Lua code block that powers up Z (and rehomes after power-up) and de-powers Z (when put back in the park position, where the head drops). I think I could probably drop the Z constant power to 20% of max as ‘hold’. More experiments are required to determine the ratings.
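
The hold/moving switch could look something like the sketch below. This is only an illustration of the idea, not existing FarmBot firmware: AxisState, CurrentProfile and targetCurrentPercent are hypothetical names, and the 20% hold figure is my untested guess from above.

```cpp
#include <cassert>

// What the axis is doing right now.
enum class AxisState { Moving, Holding, Parked };

// Per-axis current ratings, as a percentage of the driver maximum.
struct CurrentProfile
{
  int movingPercent;  // current while stepping
  int holdPercent;    // reduced current while holding position
};

// Pick the driver current for the given state.
int targetCurrentPercent(AxisState state, const CurrentProfile& profile)
{
  assert(profile.movingPercent >= 0 && profile.movingPercent <= 100);
  assert(profile.holdPercent >= 0 && profile.holdPercent <= profile.movingPercent);
  switch (state)
  {
    case AxisState::Moving:  return profile.movingPercent;
    case AxisState::Holding: return profile.holdPercent;
    case AxisState::Parked:  return 0;  // de-powered in the park position
  }
  return 0;  // unreachable; keeps compilers quiet
}
```

For Z with a profile of `{100, 20}`, this gives 100 while moving, 20 while holding the head up, and 0 once parked.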


I have added a long thin heatsink across the four chips and the X-soak test ran perfectly.

100 x X movements with no dropout. This is a first for me.

The heatsink was still getting up to 70 ° C which is still too hot for my liking.

Next step is to reduce the current ratings to reduce the heat on the chips.

See Current settings for X,Y and Z for the motor settings I have calculated for my rig.

I will be looking to add a small fan on the heat sink but I don’t want the fan to run all the time.
I want it to run either when the motors are active or if the temperature exceeds 40 ° C.
Does anyone have any ideas for ‘taps’ I can connect to, to drive the fan when FB is running the motors?
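
One way to express that fan rule (run while the motors are active, or when the heatsink is above 40 ° C) is sketched below. FanController is a hypothetical name, and the lower switch-off threshold is an assumption I have added so the fan does not chatter on and off around 40 ° C. How the motor-active signal and the temperature are actually read is left open.

```cpp
#include <cassert>

// Fan control with hysteresis (hypothetical helper, not FarmBot code).
class FanController
{
public:
  FanController(double onAboveC, double offBelowC)
    : onAboveC_(onAboveC), offBelowC_(offBelowC), running_(false)
  {
    assert(offBelowC <= onAboveC);
  }

  // Call periodically with the latest readings; returns the new fan state.
  bool update(bool motorsActive, double heatsinkTempC)
  {
    if (motorsActive || heatsinkTempC > onAboveC_)
    {
      running_ = true;
    }
    else if (heatsinkTempC < offBelowC_)
    {
      running_ = false;
    }
    // Motors idle and temperature between the thresholds: keep previous state.
    return running_;
  }

private:
  double onAboveC_;   // switch on above this temperature
  double offBelowC_;  // switch off below this temperature (assumed value)
  bool running_;
};
```

With `FanController fan(40.0, 35.0)`, the fan starts as soon as the motors run, stays on while the heatsink is still at 38 ° C after they stop, and switches off once it has cooled below 35 ° C.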

This means I can now turn my attention to the soak testing of head load / unload (that was failing because of the X axis failing).

Here is a photo of the heatsink installed


Progress :slight_smile:

During that successful soak test run,

  • What was “ambient” (in the electronics box, outside the box) ?
  • What was the Raspberry Pi CPU temp during the soak test ? (you can see that in the Web App)

May be over-engineering, based on the actual temperatures measured above ??

The ambient temperature with the box open was 21 ° C.

I did not measure the PI temperature.

I have dropped the current massively so maybe the combination of current drop plus heatsink will be enough.

Final comment from me (I promise :blush: )

Seems odd that the X2 TMC2130 was the only one to enter overtemperature shutdown ?
Is the X2 motor “healthy” ? (i.e. not short-circuiting at elevated temperatures)
Did you compare X1 and X2 motor body temperatures at X2 failure ?

I did wonder about that, but then thought about what the X-axis motors do. X2 is the only motor that does not have an encoder, so a fault condition will not show up on any other motor, as the other axes will ‘fix themselves’. On all other motors, it will show up as an unexplained pause. X2 is the only motor where a fault will be visible.

In my mind, it could have been either of the X motors. I have seen X pause, so I think both X1 and X2 do fail, but X1 will fix itself, thus hiding the problem.

The X-axis motors move the heaviest load as it is moving the weight of the Z, Y and overall supporting body so if any motor was to overheat, the X is the most likely.

I have also witnessed Y ‘pauses’, where the Y axis pauses for a few seconds and then just picks up from where it left off. I was never able to provide a logical explanation for that either - but this scenario fits too. I wondered if it was an intermittent network connection, but as far as I can see, the network is only used to read the plants at the front end of the script. I made that resilient too, with 9 retries over 4 minutes.

I also replaced the Z motor within a few months of setting the system up, as it was randomly dropping the head. This is also explained by the chips overheating. To solve the Z-axis drop, the motor was replaced and I created software to PARK the head in a safe place and power down the axis, then power up the Z-axis and verify home prior to doing work.

A test I could have done was swap the X1 and X2 cabling (reversing the motors too) to see if X1 failed but as I have fitted the heatsink, it does not fail anymore so it is a moot point.

I know that there are two other people with the same symptoms of failing X2 motors, but I believe there are many other people with the fault where the FB compensates, because there is an encoder to detect the movement failure and the chips cool down, resolving the issue for a short while. I think Genesis 1.9 should include a heatsink on these chips. The heatsink provides the required cooling, removing the reliance on the chip’s own overheating protection.

I would also like to introduce current calibration of the X, Y and Z axes using my test scripts. There is no need to run the motors at 100% current when about half is needed. It would be better for longevity, and it also provides feedback: if the system starts to skip steps (because it is harder to move), then the question that must be answered is why it is harder to move.
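
A rough sketch of what such a calibration could look like: step the current down from 100% until the soak test starts failing, then settle on the lowest passing setting plus a margin. calibrateCurrentPercent and the soakTestPasses callback are hypothetical. In practice the callback would set the motor current, run a soak test like the one at the top of this thread, and report whether any steps were missed.

```cpp
#include <cassert>
#include <functional>

// Returns the lowest passing current (in percent) plus a safety margin,
// capped at 100, or -1 if the test fails even at full current.
int calibrateCurrentPercent(const std::function<bool(int)>& soakTestPasses,
                            int stepPercent, int marginPercent)
{
  assert(stepPercent > 0 && marginPercent >= 0);
  int lowestPassing = -1;
  for (int percent = 100; percent > 0; percent -= stepPercent)
  {
    if (!soakTestPasses(percent))
    {
      break;  // first failure: stop lowering the current
    }
    lowestPassing = percent;
  }
  if (lowestPassing < 0)
  {
    return -1;  // not reliable even at 100%
  }
  int withMargin = lowestPassing + marginPercent;
  return withMargin > 100 ? 100 : withMargin;
}
```

Re-running the calibration from time to time would give the feedback mentioned above: if the result creeps upwards, something has become harder to move.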
