New OpenMoCo Arduino Libraries

Hello everybody, I'm happy to announce that the first Alpha versions of the new OpenMoCo libraries are available. 

These libraries are the core of the new TimeLapse Engine designed to run on the nanoMoCo boards, and are already implemented in the (alpha) TLE for the nanoMoCo. While some of these libraries are only useful if you want to use MoCoBus networking (an improved version of the serial protocol used in the TLE, now supporting up to 65k addressable nodes on a single network) over TTL or RS-485, a couple of the others may be particularly interesting to other DIY projects:

OMCamera is a simple, event-driven, non-blocking camera control implementation, automatically handling exposure, focus, exp+focus, and post-exposure delays all with no delays in the execution of your sketch. A simple callback function informs your sketch when the particular action is done.

OMMotor is another simple, event-driven, non-blocking library for motor control. It handles both continuous and specified (move X steps) motion, complete microstep control, and more - all while introducing no delays in your sketch.

OMState is a simple state engine - allowing you to clean up your code and implement your logic flow without a bunch of if's and else's, and instead using simple state codes. Callbacks again trigger the right action when a state is entered.
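
To illustrate the callback-per-state idea in general terms, here is a minimal sketch in plain C++. It is not the OMState API: the class `TinyStateEngine`, its methods, and the example states and handlers are all hypothetical names made up for this illustration, so check the doxygen docs for the real interface.

```cpp
#include <cstddef>

// Hypothetical illustration only; none of these names come from OMState.
typedef void (*StateHandler)();

class TinyStateEngine {
public:
    static const std::size_t kMaxStates = 8;

    TinyStateEngine() : current_(0), pending_(false) {
        for (std::size_t i = 0; i < kMaxStates; ++i) handlers_[i] = 0;
    }

    // Register the callback to run when `state` is entered.
    bool setHandler(std::size_t state, StateHandler fn) {
        if (state >= kMaxStates) return false;
        handlers_[state] = fn;
        return true;
    }

    // Request a transition; the callback fires on the next poll().
    bool enter(std::size_t state) {
        if (state >= kMaxStates) return false;
        current_ = state;
        pending_ = true;
        return true;
    }

    // Call once per pass through loop(); runs at most one callback.
    void poll() {
        if (!pending_) return;
        pending_ = false;
        if (handlers_[current_]) handlers_[current_]();
    }

    std::size_t current() const { return current_; }

private:
    StateHandler handlers_[kMaxStates];
    std::size_t current_;
    bool pending_;
};

// Example states and handlers (also hypothetical).
enum { STATE_IDLE = 0, STATE_SHOOT = 1, STATE_MOVE = 2 };
int g_shots = 0;
int g_moves = 0;
void onShoot() { ++g_shots; }
void onMove()  { ++g_moves; }
```

In a sketch you would register the handlers once in setup(), call `enter()` wherever the logic decides to change state, and call `poll()` on every pass through loop(), which replaces the chain of if's and else's with a table lookup.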

The library documentation (doxygen) is here: http://openmoco.org/docs/OMLibraries/index.html and the source (including the new TLE written against the libraries) is here: http://openmoco.svn.sourceforge.net/viewvc/openmoco/OpenMocoComponents/nanoMoCo/trunk/ (Or, download tarball link: http://openmoco.svn.sourceforge.net/viewvc/openmoco/OpenMocoComponents/nanoMoCo/trunk/?view=tar )

I look forward to any feedback, suggestions, etc. Thanks!

Chris

I love the Doxygen

I love the Doxygen documentation. That makes it a lot easier to understand. Will test this later this weekend. Have you tested the performance of the new stepper code compared to the old one?

I haven't compared the two

I haven't compared the two yet, but I kinda consider it apples and oranges - with the old blocking code, it may have allowed slightly faster stepping, but it prevented really long moves while still retaining control over the engine. Like, you couldn't abort a move in-progress if you were about to smash into something =)

But, there is still tuning to do on the new code. I've identified several areas where it can be made a little faster, but it will never be able to exceed about 10k steps/second while still being responsive. This is because running the interrupt more often than once every 40uS delays other aspects of the sketch so much that normal activity starts to stutter and time out. (At maximum speed, presuming nothing else happened but a bit flip, it takes two interrupt cycles to run a step, so even if we don't account for the operations to actually step the motor, the minimum step-cycle time is 80uS, for a maximum of 12,500 steps/second. This corresponds to the min_stp setting of 80 in the old firmware. Of course, it actually has other stuff to do now, so it won't ever get down to 80uS between actual steps.) The old firmware is a little misleading though, as the digitalWrite() in the old code took several uS itself, so an 80uS delay would never have actually been achieved. Once the new code is a little more streamlined (exchanging memory for CPU cycles), it should match the performance of the old code.
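
The ceiling quoted above is just arithmetic on the interrupt period, and can be sketched as a helper. The function name and parameters are made up for illustration; the only inputs taken from the post are the 40uS interrupt period and the two interrupt cycles per step.

```cpp
// Upper bound on step rate when each step costs a fixed number of
// timer-interrupt periods. Names are illustrative, not from the library.
unsigned long maxStepsPerSecond(unsigned long isrPeriodUs,
                                unsigned long isrCyclesPerStep) {
    const unsigned long usPerSecond = 1000000UL;
    const unsigned long usPerStep = isrPeriodUs * isrCyclesPerStep;
    if (usPerStep == 0) return 0;  // guard against division by zero
    return usPerSecond / usPerStep;
}
```

With a 40uS period and two cycles per step, `maxStepsPerSecond(40, 2)` gives the 12,500 steps/second figure. Anything else the ISR has to do only lowers the real number.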

!c

Ok, I compared performance

Ok, I compared performance with the old and new this weekend, and they both are able to exceed the speed which the stepper I'm testing with is capable of moving at its current voltage (12V). That is, the stepper can only do 1K steps/second at 12V, and both algorithms effectively drive it at that speed. I can get to about 1200 s/s at 16V, but would have to go higher (up to 30V) to get much more speed out of this motor.

I've also implemented a spline-based motion profile that will be very nice. It uses Cubic Hermite curves and also does automatic error correction, as is required when doing time-slicing (I can't slice time such that there's always a whole number of steps or delay cycles in one time period, so error correction both adjusts the delays between steps to prevent moving faster than necessary, and takes additional steps when required). Additionally, it also does dynamic slicing to achieve the best resolution given the input parameters - such that longer moves have smoother curves, and so forth. (It is a poor situation to run timeslices such that there are slices where a whole step isn't taken - running on error correction alone results in rough movement.) One simply inputs the following parameters:

Distance to move
Time to arrive at destination
Time to accelerate to full speed
Time to decelerate to stopping speed

And it takes care of everything from there.

Once I get the final gotchas worked out, I'll add it to the OMMotor library, and also add multi-point translation (so you can map out a complex move as a series of such 'keyframes') in both continuous and SMS motion modes. Afterwards, I may also add linear acceleration profiles. There is no need to add smoothstep, as the CH splines can replicate it closely enough, but some other profiles might be nice, like bounce, etc.

!c

I am starting to understand

I am starting to understand what you are saying, but I am still a bit puzzled :-). The motion profile capabilities look pretty impressive to me. I would love to see the Slim script for that. I am very happy to see that the performance of the new code is comparable since that was the part that gave me some issues initially.

Please let us know when the new code is on SourceForge because I cannot wait to see this running on my NanoEngine board.

What do you mean by the concept of time in "Time to arrive at..."? Is that the time calculated from a specific set point, or really something that you set in time? It would be nice to have some sort of clock, I guess. One clock somewhere on the MocoBus would be enough. That can be an RTC or a clock provided by the machine that is running Slim, or an embedded board that is running Linux (for full control of the camera). It would be great if these clocks were discoverable and accessible via the MocoBus protocol, or am I talking nonsense here?

Well, clock sync can be an

Well, clock sync can be an issue since drift occurs between sync cycles, and generally it takes a few tries each time to sync clocks. But, with nanoMoCo, that's less of an issue since only one device will be a "program timing master", and it triggers cycle changes on the others by changing the state of one of the common lines. So, the clock drift (from the overall set of nodes, not from the 'wall time') is minimal on a network as far as the 'macro' elements go (when to shoot, etc.).

But, that's a big hairy subject - I'm not sure clock drift (about 3-10mS per day) is going to be a major issue in general, unless you're doing week+ shots.

As for what I meant by time? I meant wall-time, i.e.:

Move 3000 steps in the next 30 seconds - take 5 seconds to accelerate, and 2 seconds to decelerate.

In 30 seconds it will have traveled 3,000 steps and now be at a stop. (Having accelerated to full speed and decelerated to stopping.)
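
To make the four inputs concrete, here is one way to turn them into a position-over-time function. This is a sketch only and not the OMMotor code: it assumes the speed ramps up and down along a smoothstep curve (a cubic Hermite curve with flat ends) and cruises at constant speed in between, and every name in it (`MoveProfile`, `cruiseSpeed`, `positionAt`) is made up for the example.

```cpp
// Illustrative only: smoothstep-eased speed ramps, not the OMMotor code.
// Assumes accelTime + decelTime <= totalTime.
struct MoveProfile {
    double distance;   // total steps to travel
    double totalTime;  // seconds from start to stop
    double accelTime;  // seconds to reach full speed
    double decelTime;  // seconds to slow from full speed to a stop
};

// Area under the smoothstep curve s(u) = 3u^2 - 2u^3 from 0 to u.
static double smoothstepArea(double u) {
    return u * u * u - 0.5 * u * u * u * u;
}

// Cruise speed (steps/second) needed to cover the distance on time.
// Each eased ramp covers half of what the same time at cruise would.
double cruiseSpeed(const MoveProfile& p) {
    return p.distance / (p.totalTime - 0.5 * (p.accelTime + p.decelTime));
}

// Position in steps at time t (seconds since the move began).
double positionAt(const MoveProfile& p, double t) {
    if (t <= 0.0) return 0.0;
    if (t >= p.totalTime) return p.distance;
    const double v = cruiseSpeed(p);
    const double decelStart = p.totalTime - p.decelTime;
    if (t < p.accelTime) {
        // Accelerating: integrate the eased speed from 0 to t.
        return v * p.accelTime * smoothstepArea(t / p.accelTime);
    }
    if (t <= decelStart) {
        // Cruising at constant speed.
        return v * (0.5 * p.accelTime + (t - p.accelTime));
    }
    // Decelerating: subtract the distance still left to travel.
    const double w = (p.totalTime - t) / p.decelTime;  // runs 1 -> 0
    return p.distance - v * p.decelTime * smoothstepArea(w);
}
```

For the move above (`MoveProfile p = {3000.0, 30.0, 5.0, 2.0};`), the cruise speed works out to about 113 steps/second, and `positionAt(p, 30.0)` returns 3000.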

I'm currently working on optimizing the algorithm, as initially I chose to slice time into windows such that in any given window (a multiple of 40uS timing cycles) at least one full step was being taken, with error correction for both delay and "not enough steps". It was precise both in steps taken and time to take them. However, I found this to be very rough for short movements, with several "jerking" moments as it jumped through two big speed changes.

I switched to a model wherein all windows are 1mS long and an overflow counter is updated for the time between each step, but this created a situation where updating the current speed would take up to half an entire timing cycle, which meant major drift in arrival time. (Arriving later than anticipated.) This is caused by the fact that the time to calculate the next speed cycle cannot be measured as it's run within the ISR for taking a step.

So, if I can't compensate for the error or reduce the time to process the next position, I'm going to have to switch to something more efficient to calculate, like a quadratic easing along the lines of smoothstep. The cubic form is much nicer, but it may be too much to ask the little processor to keep up with =)

!c

Ok, so after much optimizing,

Ok, so after much optimizing, timing, coding, etc. I have found that it is not possible to have both very high speed stepping (significantly greater than 2,000 steps/second) and accurate, smooth variable splines. Even rudimentary cubic forms take north of 180uS to process position information on the 8-bit processor with no floating-point co-processor. Better mathematicians than I can probably do better, but it presents a stark choice:

Either limited control over the curve (a la smoothstep or linear easing) and high-speed stepping, or fine control over the curve and lower-speed stepping.

I think both cases are meaningful and must be supported.

I think that for those wanting very fast movement, but limited definition of the acceleration curve profiles, we can provide both linear and less expensive quadratic forms combined with a high sampling frequency on when to step. For those wanting the smoothest of motion, with high accuracy in expected position, then they will have to satisfy themselves with a lower stepping frequency.

As already noted in the existing OMMotor library, I have based the motion control around a sample period: on each period Timer1 triggers an ISR that determines whether we should currently be stepping or standing still, given that we know the number of timing cycles in which we must not step to achieve the desired speed. In the latest implementation, I have also added error correction based on fractional cycle delays, to adjust the speed measured across multiple slices so that on average it meets the expected speed. It has also been somewhat optimized: determining what to do in any given cycle takes 20uS or less on average. Fewer decisions or comparisons would improve performance in CPU time, but reduce output reliability. To ensure the CPU remains available for other activities while the motor is running, it only adjusts the motor speed once per millisecond, resulting in both a very smooth, natural movement and approximately 75% of the CPU time available for other activities. Adjusting the speed more often consumes more CPU time for motor driving and offers no appreciable improvement in performance, while increasing the time between adjustments results in rougher movement with minimal improvement in CPU availability. As the average time to run the step determination and cubic interpolation combined in one cycle exceeds 240uS, the shortest achievable step-check period is 250uS, resulting in a maximum step rate of 2,000 steps/second (two checks per step). (See Nyquist theorem.)
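
The sample-period idea can be modelled off the board as a counter that is advanced once per timer interrupt. The sketch below is a simplified illustration, not the OMMotor source: the class name `StepTicker` and its members are hypothetical, and it assumes a step pulse is held high for exactly one tick.

```cpp
// Illustrative model of a fixed-rate timer ISR driving a step pin.
// Names are hypothetical; this is not the OMMotor source.
class StepTicker {
public:
    // lowTicksPerStep: ticks the pin stays low between pulses (at least 1).
    explicit StepTicker(unsigned long lowTicksPerStep)
        : lowTicks_(lowTicksPerStep < 1 ? 1 : lowTicksPerStep),
          lowCount_(lowTicks_), pinHigh_(false), steps_(0) {}

    // Call once per timer interrupt. Returns true when a step pulse starts.
    bool tick() {
        if (pinHigh_) {                // finish the pulse: drop the pin low
            pinHigh_ = false;
            lowCount_ = 1;
            return false;
        }
        if (lowCount_ >= lowTicks_) {  // waited long enough: pulse again
            pinHigh_ = true;
            ++steps_;
            return true;
        }
        ++lowCount_;                   // keep standing still this tick
        return false;
    }

    // Change speed by changing how many ticks to stand still.
    void setLowTicks(unsigned long n) { lowTicks_ = n < 1 ? 1 : n; }

    unsigned long steps() const { return steps_; }

private:
    unsigned long lowTicks_;
    unsigned long lowCount_;
    bool pinHigh_;
    unsigned long steps_;
};
```

With one low tick per step, every step costs two ticks, so 25,000 ticks of 40uS (one second) produce 12,500 steps. Slower speeds are just larger values passed to `setLowTicks()`.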

Currently, using the Hermite spline, the code is able to achieve extremely high levels of accuracy: it always takes the exact number of steps with 0.028% over-speed accuracy. That is, it will arrive either exactly at the time expected or 0.028% faster in a worst-case scenario. To give an example of this: on a 60-second move it will arrive 17mS early. This is a constant error rate at a given speed, so all units will express the exact same error given the same duration, and therefore arrive at the same time. The speed is continuously variable, and this results in a smooth and natural move for both short and long moves.

Every millisecond, the speed of the motor is determined by running the cubic interpolation to determine the location of the motor at the end of the next speed adjustment cycle. From this value, the speed the motor should be moving at (i.e. the number of ISR triggers it should remain low for) is re-calculated, and an overflow counter is updated. When this counter exceeds the desired amount, another step is taken. For serious accuracy, we have to deal with speeds that involve fractional delay periods - for example, a delay of 60uS given a 40uS sampling period results in 1.5 samples run low. This, in itself, is not possible to achieve - instead, we can increment the current delay error by the amount specified in the current time band, and delay an additional sample once the cumulative error reaches 1.0. In a steady state of 0.5 samples of error per step, we re-correct speed every two steps, and it appears to be moving at the exact requested speed to the human eye, and the overall analysis of motion shows the motor arriving at the expected endpoint at the expected time. (However, any one individual step may arrive at its location up to 0.99 samples early.) This allows us to re-create speeds that are not wholly divisible into one sample, for example: 0.0000015 steps per speed calibration window!
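
The fractional-delay correction described above can be sketched as a small accumulator. This is an illustration of the idea only: the class name `FractionalDelay` is hypothetical, and it uses a floating-point error term for readability where real firmware might prefer fixed-point.

```cpp
// Illustrative error-accumulating delay; names are hypothetical.
class FractionalDelay {
public:
    FractionalDelay() : error_(0.0) {}

    // idealSamples: desired delay between steps in sample periods,
    // e.g. 60uS / 40uS = 1.5. Must not be negative.
    // Returns the whole number of samples to wait before the next step.
    unsigned long next(double idealSamples) {
        unsigned long whole = static_cast<unsigned long>(idealSamples);
        error_ += idealSamples - static_cast<double>(whole);
        if (error_ >= 1.0) {  // a full sample of error has built up
            error_ -= 1.0;
            ++whole;          // pay it back with one extra sample
        }
        return whole;
    }

private:
    double error_;
};
```

Feeding it 1.5 repeatedly yields delays of 1, 2, 1, 2 samples, so every pair of steps takes 3 samples in total, which is the requested average of 1.5 samples per step.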

Note that what we're talking about here are the real things happening on the CPU, and not what the motor interprets them as. You cannot exceed the 2,000 step/second limit of the cubic model using microsteps, as it is a limitation of time to calculate, and not a limitation of the motor!

What does 2,000 steps per second look like? On a rotational axis, using a 1.8° motor in full-step mode with a 100:1 gear reduction ratio, the maximum speed is 36° per second (2,000 steps/s × 1.8° / 100), or a complete rotation in 10 seconds.

Finally, what we're left with is two models: one complex, and one simplified. Both should offer the same execution per sample, but the simplified one will be able to operate at higher speeds by limiting the execution time required to calculate the speed for the next cycle to a reasonable fraction of the sample rate. For example, linear interpolation (without acceleration or deceleration control) operates under 5uS. So, with a standard sample handling time of 20uS, this means that 40uS would be the minimum achievable sample size - or, to put it more bluntly: up to 12,500 steps/second. (Which would require an extremely exacting motor driven at extremely high voltages for full stepping, and hope like hell you don't hit a resonant frequency along the way.)

(EDIT: needed to clarify here, as I don't want this to be construed as being as limiting a factor as it appears.)

Of course, all of that above refers to a -single, continuous- move where all four factors are defined. Thus, what we're talking about above are "continuous motion" profiles. As the motions between shots in a shoot-move-shoot style execution are irrelevant in their placement outside of appropriately accelerating for the given load, there is no real value in spending all the time on the cubic interpolation for the movements between shots. Instead, in this case we actually apply -both- cubic interpolation and a faster interpolation like linear or quadratic. The higher-order spline is used to define the number of steps in the motions between shots, and the lower-order splines are used to execute those moves to allow for reasonable acceleration profiles under load. Thus, the output motion looks like the cubic spline, but the individual moves between shots are not limited to the speed limit of the cubic spline, and can therefore execute much faster: up to our theoretical limit of 12,500 steps/second.

There is still a drawback here: the output motion will appear more jagged in a shoot-move-shoot cycle as the number of possible samples in the slice is (interval * time), and the natural smoothing of a continuously variable speed will be eliminated. That is, the spline will dictate that there are a required 3.25 steps in each of the next four shot cycles, resulting in an accumulated error of 1 step by the time we reach the fourth, resulting in the fourth shot having a 4-step move. In most cases, this won't be noticeable, but the overall curve will, by nature, be more jagged due to the reduced sample set.
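
The 3.25-step example can be reproduced with a carry that holds the fraction of a step not yet taken. This is a sketch of the bookkeeping only, with a hypothetical class name (`ShotStepAllocator`); it does not show how the spline itself is evaluated.

```cpp
// Illustrative whole-step allocation for shoot-move-shoot moves.
// Names are hypothetical; assumes idealSteps is not negative.
class ShotStepAllocator {
public:
    ShotStepAllocator() : carry_(0.0) {}

    // idealSteps: fractional steps the spline asks for in this shot cycle.
    // Returns the whole steps to actually take, carrying the remainder.
    long next(double idealSteps) {
        carry_ += idealSteps;
        long whole = static_cast<long>(carry_);  // drop the fraction
        carry_ -= static_cast<double>(whole);    // keep it for later shots
        return whole;
    }

private:
    double carry_;
};
```

Calling `next(3.25)` four times returns 3, 3, 3 and then 4: the quarter steps add up to one extra step on the fourth shot, and the total of 13 steps matches what the spline asked for.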

To compare:

30s continuous move @ 1 frame per second: 30,000 speed changes
30s S-M-S move @ 1 frame per second: 30 speed changes

!c

This almost sounds that we

This almost sounds like we are at the end of what an Arduino can do? I still have to check the new libraries in a bit more detail, but what about the following: maybe we can completely separate the hardware-specific stuff? Then it would be easier to run the code on other hardware like the new BeagleBoard Bone (with a NanoMoco-like shield, or Cape as they call it). That way you would have more freedom in your hardware choice. A NanoMoco board would still be used 80% of the time, I would think, but it would give the possibility to have faster hardware for those who need it.

Well, not entirely at the

Well, not entirely at the end, I was going to update yesterday, but was mired in code. I found a way to approximate the same curves using a linear algorithm (it comes out close enough for our uses - if we were doing high-resolution 3d animation, it wouldn't be as effective =), and with a tweak to the way the high-step timing cycle was done, we're up to a max speed of 5,000 steps/second now! It's even faster than smoothstep =) (which runs at about 75% of the cubic time, or 150% of the new linear formula time).

As for separating the hardware-specific stuff? I'd have to leave that as an exercise to someone ready to support that specific hardware =) Given that it's all just C++, it's not a leap to replace Wiring with another GPIO library for Linux. But I only have so many hours in the day *grin* Until I have more than just me on the dev team here, I have to stick with one target at a time =)

!c

5000 steps should be enough I

5000 steps should be enough I would think. Wow that's fast. Do you have any idea about when you will post this new code? I would love to test this out.

I will do some porting to embedded Linux so that we have a camera controller that can set ISO, shutter speeds etc via the MocoBus: DevTeam++.

Will have the code uploaded

Will have the code uploaded soon (I'm hoping tonight, after we upgrade the site) - I have a largely working library (there are still some issues to work out with non-profiled moves [e.g. move 100 steps without specified accel and decel, or move continuously forever]), and the documentation is still rough around the edges (but now with pictures and graphs =) - also, I wanted to add smoothstep.

Additionally, as you know (but no one else, lol =) I had to get the bootloader development done to finish the next stage in a related project, so we also now have a proper bootloader that will let you upload sketches over the RS-485 network, and as soon as I test out the changes to the MoCoBus libraries, we can go up to 115,200bps on the MoCoBus, which is quite, quite nice!

I'll have the full bootloader source in there as well. Unfortunately, it's going to be quite a bit of re-work to get the timelapse engine re-factored around the new motion profiles, so the timelapse engine firmware won't be working when the libraries get checked in.

!c