-
-
Notifications
You must be signed in to change notification settings - Fork 7k
SoftwareSerial improvements, fixes and optimizations #2032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SoftwareSerial improvements, fixes and optimizations #2032
Conversation
See #2019 for the script I used to test all baudrates, btw. |
Can I build this pull request? |
Can one of the admins verify this patch? |
Good point, fixed. |
I get this after a reset (tested with 9600 and 115200):
Arduino Uno -> FTDI Adapter In the loop i simply translate softserial to normal serial. Apart from the y thing it seems to work fine for the first look with 115200. Great! I will try to integrate it in my PinchangeInterrupt Library, so we can still use PinChangeInterrupt with Arduino and maybe also integrate this as well to the core? Edit: |
@ArduinoBot build this please |
Merged build finished. Test PASSed. |
It works, why does nobody merge this? Its a perfect patch! |
Also included the SoftSerial fix by matthijskooijman arduino/Arduino#2032
@NicoHood, seems I missed your comment on erronous initial bytes before. Can you explain your setup in a bit more detail so I can try to reproduce? |
Use an Arduino Uno, setup softserial and print the hello world in the setup() function. read from another independent arduino via hardware serial. Then reset the uno. on a reset it will always output the y. I dont know if this is a problem of the HW serial from the 2nd arduino or the softserial of the uno. It happens for 9600 and 115200 baud rate (at probably all others too). |
I had a go at reproducing this, with some success. Sometimes I received a 0x00 byte when pressing the reset button, sometimes 0xff (which is shown as ÿ in ISO 8859-1) when releasing the reset button. Looking at the signal everything looks good, but during reset the serial pin is floating. Sometimes it would go low when pressing the reset button, which looks like a start bit followed by all zeroes, causing 0x00 to be received (which is technically incorrect on the receiving side, since the stop bit is missing). Sometimes it would very shortly go low when releasing the reset, which looks like a start bit followed by one bits, causing 0xff to be received. So the bottom line is: During reset, pins are floating, so anything might happen. If this is a problem, add an external pullup to keep the pin high during reset (just tested this, works for me). In other news, someone contacted me by mail asking if this PR was merged yet. He manually applied it, which solved his softwareserial problems on his Yun. |
One more thing: the erratic reset stuff also happens without this PR applied, as can be expected. |
Does this also happens on HW Serial ports? As I remember it does, so its a general problem on any pin, not related to SoftSerial, more related to a general Serial/floating pin problem? Yeah, merge it guys! He did a great job! |
Before enabling interupts, begin would see if the given receive pin actually has an associated PCINT register. If not, the interrupts would not be enabled. Now, the same check is done, but when no register is available, the rx parameters are not loaded at all (which in turn prevents the interrupt from being enabled). This allows all code to use the same "is rx enabled" (which will be added next).
In this case, SoftwareSerial::begin will not have enabled the interrupts, so better not allow the SoftwareSerial instance to enter the listening state either.
The current check is still always false when the old check was, but additionally it will not disable the interrupts when they were never enabled (which shouldn't matter much, but this is more consistent).
This moves the interrupt mask enabling / disabling code into a separate method, so we can call it from multiple spots next.
This allows one to explicitly stop a SoftwareSerial instance from listening, without having to make another one listening.
This prevents interrupts from triggering when the SoftwareSerial instance is not even listening. Additionally, this removes the need to disable interrupts in SoftwareSerial::listen, since no interrupts are active while it touches the variables.
If an interrupt causing overflow would occur between reading _buffer_overflow and clearing it, this overflow condition would be immediately cleared and never be returned by overflow(). By only clearing the overflow flag if an overflow actually occurred, this problem goes away (worst case overflow() returns false even though an overflow _just_ occurred, but then the next call to overflow() will return true).
stopListening also disabled the interrupt, if needed, so calling that function makes more sense. Since stopListening only disables the interrupt when the current SoftwareSerial is the active object, and that can only be the case when _rx_delay_stopbit is non-zero, there is no need to separately check _rx_delay_stopbit anymore.
Before, there was nearly identical code for the inverted and regular cases. However, simply inverting the byte in the inverted case allows using the regular code twice, reducing the generated code size by 100 bytes (on an Arduino Uno and gcc 4.3, on gcc 4.8 the reduction is 50 bytes).
Somehow gcc 4.8 doesn't inline this function, even though it is always called with constant arguments and can be reduced to just a few instructions when inlined. Adding the always_inline attribute makes gcc inline it, saving 46 bytes on the Arduino uno. gcc 4.3 already inlined this function, so there are no space savings there.
This change restructures the loop, to help the compiler generate shorter code (because now only the LSB of the data byte is checked and subsequent bytes are shifted down one by one, it can use th "skip if bit set" instruction). Furthermore, it puts most attributes in local variables, which causes the compiler to put them into registers. This makes the timing-critical part of the code smaller, making it easier to provide accurate timings. On an Arduino uno using gcc 4.3, this saves 58 bytes. On gcc 4.8, this saves 14 bytes.
Similar to SoftwareSerial::write, this rewrites the loop to only touch the MSB and then shift those bits up, allowing the compiler to generate more efficient code. Unlike the write function however, it is not needed to put all instance variables used into local variables, for some reason the compiler already does this (and doing it manually even makes the code bigger). On the Arduino Uno using gcc 4.3 this saves 26 bytes. Using gcc 4.8 this saves 30 bytes. Note that this removes the else clause in the code, making the C code unbalanced, which looks like it breaks timing balance. However, looking at the code generated by the compiler, it turns out that the old code was actually unbalanced, while the new code is properly balanced.
Previously, up to four separate but identical ISR routines were defined, for PCINT0, PCINT1, PCINT2 and PCINT3. Each of these would generate their own function, with a lot of push-popping because another function was called. Now, the ISR_ALIASOF macro from avr-libc is used to declare just the PCINT0 version and make all other ISRs point to that one, saving a lot of program space, as well as some speed because of improved inlining. On an Arduino Uno with gcc 4.3, this saves 168 bytes. With gcc 4.8, this saves 150 bytes.
Since those functions are only called once now, it makes sense to inline them. This saves a few bytes of program space, but also saves a few cycles in the critical RX path.
This shortens the generated code a bit more.
This precalculates the mask register and value, making setRxIntMask considerably less complicated. Right now, this is not a big deal, but simplifying it allows using it inside the ISR next.
Before, the interrupt would remain enabled during reception, which would re-set the PCINT flag because of the level changes inside the received byte. Because interrupts are globally disabled, this would not immediately trigger an interrupt, but the flag would be remembered to trigger another PCINT interrupt immediately after the first one is processed. Typically this was not a problem, because the second interrupt would see the stop bit, or an idle line, and decide that the interrupt triggered for someone else. However, at high baud rates, this could cause the next interrupt for the real start bit to be delayed so much that the byte got corrupted. By clearing the interrupt mask bit for just the RX pin (as opposed to the PCINT mask bit for the entire port), any PCINT events on other bits can still set the PCINT flag and be processed as normal. In this case, it's likely that there will be corruption, but that's inevitable when (other) interrupts happen during SoftwareSerial reception.
Instead of using a lookup table with (wrong) timings, this calculates the timings in SoftwareSerial::begin. This is probably a bit slower, but since it typically happens once, this shouldn't be a problem. Additionally, since the lookup tables can be removed, this is also a lot smaller, as well as supporting arbitrary CPU speeds and baudrates, instead of the limited set that was defined before. Furthermore, this switches to use the _delay_loop_2 function from avr-libc instead of a handcoded delay function. The avr-libc function only takes two instructions, as opposed to four instructions for the old one. The compiler also inlines the avr-libc function, which makes the timings more reliable. The calculated timings directly rely on the instructions generated by the compiler, since a significant amount of time is spent processing (compared to the delays, especially at higher speeds). This means that if the code is changed, or a different compiler is used, the calculations might need changing (though a few cycles more or less shouldn't cause immediate breakage). The timings in the code have been calculated from the assembly generated by gcc 4.8.2 and gcc 4.3.2. The RX baudrates supported by SoftwareSerial are still not unlimited. At 16Mhz, using gcc 4.8.2, everything up to 115200 works. At 8Mhz, it works up to 57600. Using gcc 4.3.2, it also works up to 57600 at 16Mhz and up to 38400 at 8Mhz. Note that at these highest speeds, communication works, but is still quite sensitive to other interrupts (like the millis() interrupts) when bytes are sent back-to-back, so there still are corrupted bytes in RX. TX works up to 115200 for all combinations of compiler and clock rates. This fixes arduino#2019
The debugPulse function definition already checks for _DEBUG, resulting in an empty function definiton and the function call being optimized away.
Previously, the TX pin would be set to output first and then written high (assuming non-inverted logic). When the pin was previously configured for input without pullup (which is normal reset state), this results in driving the pin low for a short when initializing. This could accidenttally be seen as a stop bit by the receiving side. By first writing HIGH and then setting the mode to OUTPUT, the pin will have its pullup enabled for a short while, which is harmless.
4abea08
to
90ca393
Compare
Good point. I just tried hardware TX. I had to use an external pulldown to trigger the problem, since the pin stayed high. With the pulldown, I saw 0x00 reads on hardware serial too. I did notice that I didn't see an 0x255 read without the pulldown, so no short low pulse when releasing reset. This got me thinking: The SoftwareSerial initialization actually switches to OUTPUT LOW for a short while before switching to OUTPUT HIGH, causing this 0x255 read. By switching initialization order, this can be prevented. I added one more commit which does this, so now you might be able to reset without any junk being read (but since floating pins are involved, you might still get junk of course) I also rebased the PR on top of the current ide-1.5.x branch. |
Yeah that makes sense. Setting a pin to output results in a low signal, input in high (used that for 3.3v gamecube controller on 5v pins once). I am really excited to merge this PR as you can see. It fixes all problems we had so far really good. |
Someone has to test it with 8mhz and then it can be merged, i'd say. Not sure if the original supported 8mhz though. I really want to see this in the official next update, since it is such a great fix. |
Results with 8MHZ: Regarding to this, it is acceptable for 8MHZ 115200 (8mhz) for HW Serial works crapy, so why should SS do it at 8mhz :D No other bugs found yet. As stated many times, this PR is ready to merge in my opinion. |
My results with UNO: No bugs found so far @matthijskooijman , these patches are outstanding and such a great speed improvement over the previous version! |
Thanks! |
someone please help me get this patch so that I can test it on my Arduino IDE 1.6.6? Im getting no data from a jpeg camera and Im thinking this might be the issue. |
@quique123, these changes are already included in 1.6.6, so they won't help you :-) For getting help with your issue, I recommend posting on the forum, or asking on the #arduino IRC chat channel. |
Oh ok. Thanks. I have already posted in the forum. It seems like nobody knows what is going on. I've asked the manufacturer and the distributor and no one has been able to help me. MV
|
Also included the SoftSerial fix by matthijskooijman arduino/Arduino#2032
Also included the SoftSerial fix by matthijskooijman arduino/Arduino#2032
This pullrequest contains a bit of everything, for the SoftwareSerial library. The first couple of commits I still had lying around from last year, forgot to send a pullrequest for those earlier. They're mostly cleanups around the introduction of a
stopListening()
method, that can be made to disable RX of a SoftwareSerial instance.The second half of the commits are a lot of optimizations and fixes, to make the SoftwareSerial code faster and more reliable, so it works again at higher baudrates. This is only partly succesful for RX, since at higher baudrates where bytes are sent back-to-back, the CPU spends most of the time in the interrupt handler. At higher baudrates, the interrupt overhead (cleanup and setup) becomes longer than than the available time (from halfway the stopbit to halfway the startbit) and reception starts to fail.
This problem is particularly evident when running at 8Mhz, or when using gcc 4.3.2 (which makes terrible register allocation choices). Using gcc 4.8.2 and running at 16Mhz, things work up to 115200 (though bytes are still regularly corrupted because of the millis() timer ISR delaying the SoftwareSerial interrupt). Using 1.5 or 2 stop bits on the other side will probably help here (though I haven't tested this, and there is no way to make SoftwareSerial do TX with longer stop bits.
There is probably some room for improvment still, but that would require the RX ISR /
recv
method to be handcoded in assembly. The upside of that would be that it's 100% deterministic, without depending on the compiler used, but it's a lot of work that I'm not going to invest right now (getting this far has cost me enough time already).These commits also remove the delay lookup tables that were previously present, since the values can easily be calculated at runtime. This saves around 160 bytes of program memory.
In addition to making the code faster, it is also a lot smaller. Overall, a typical SoftwareSerial sketch on the Uno saves slightly over 400 bytes of program space (including the savings for the lookup table removal).