-
Notifications
You must be signed in to change notification settings - Fork 3k
TCP Packet loss happens due to running of out of Mailbox size. #11517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TCP Packet are getting lost and application fails to receive the packets because the recv_tcp API fails when the receive mailbox size gets full.
Add an LWIP_ASSERT with the error and also when MEMP_TCPIP_MSG_INPKT fails to get memory.
This issue is seen when Streaming Audio from URL http://stream2.dancewave.online:8080/dance.mp3 using HTTP Chunked transfer encoding https://en.wikipedia.org/wiki/Chunked_transfer_encoding
Please review these changes to make the TCP Receive Mailbox size, TCP Default Mailbox and Mailbox configurable via mbed_lib.json. |
@balajicyp, thank you for your changes. |
@@ -253,6 +253,7 @@ tcpip_inpkt(struct pbuf *p, struct netif *inp, netif_input_fn input_fn) | |||
|
|||
msg = (struct tcpip_msg *)memp_malloc(MEMP_TCPIP_MSG_INPKT); | |||
if (msg == NULL) { | |||
LWIP_ASSERT( "failed to alloc MEMP_TCPIP_MSG_INPKT\n", msg != NULL ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a place to assert. That brings the system down just because it failed to allocate a buffer for a received packet. It means the system can be crashed by a trivial packet flood attack.
As far as I can see, a failure to allocate here just drops a packet - as long as we eventually get some memory back, and the sender is retransmitting, progress should be made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm yes I agree I will remove the assert.
The UDP packet receive flooding can cause this assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will remove this
/* don't deallocate p: it is presented to us later again from tcp_fasttmr! */ | ||
LWIP_ASSERT("sys_mbox_trypost failed ret != ERR_OK\n", (ret == ERR_OK)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand your failure mode here. It appears the intent is that if this recv_tcp
fails, the TCP core holds on to the acked data that was not taken by the app and tries to offer it again later. (See comment "If the upper layer can't receive this data, store it" in tcp_input
, and the comment above).
Why isn't this mechanism working? Is there a deadlock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this mechanism does not work and corrupted data is received ( TCP out of order data), because the TCP Streams received has to be particular order for the Chunk Data, it looks for the data 0x0d 0x0a followed by length of the data. When the TCP stream is fail to receive and the next tag index does not match the header (0x0d 0x0a) the audio stops working because of corrupted data.
In the below if you look at the last two TCP sequence numbers there is a packet loss (4880) bytes where each chunk data is of 1400 bytes, the chunk data can be in separate TCP streams, and it keeps track of the start and end of the data and it looks for the marker (0d 0a)
ack no:352366379
ack no:352361499
Log::
netconn_recv_data: received 0x800caa8, len=6340
tcp_recved: received 6340 bytes, wnd 9720 (8760).
tcp seqno:4049006799 ack no:352343019
tcp_r len:2047 sys_mbox_trypost failed..
sys_mbox_trypost failed..
sys_mbox_trypost failed..
Done with chunk: idx 406, data_length 2047 size:406 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
Done with chunk: idx 1813, data_length 2047 size:1400 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
tcp_r len:2047 sys_mbox_trypost failed..
Done with chunk: idx 1173, data_length 2047 size:1173 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
tcp_r len:2047 Done with chunk: idx 533, data_length 2047 size:533 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
Done with chunk: idx 1939, data_length 2047 size:1399 chunk_size:1399 chunk_bytes:1399
chunk data: d a 35
tcp_r len:199 sys_mbox_trypost failed..
netconn_recv_data: received 0x800cd48, len=1460
tcp_recved: received 1460 bytes, wnd 3880 (14600).
tcp seqno:4049006799 ack no:352350319
tcp_r len:1460 Done with chunk: idx 1100, data_length 1460 size:1100 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
sys_mbox_trypost failed..
netconn_recv_data: received 0x800d3a8, len=1460
tcp_recved: received 1460 bytes, wnd 3880 (14600).
tcp seqno:4049006799 ack no:352351779
tcp_r len:1460 Done with chunk: idx 1047, data_length 1460 size:1047 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x800da08, len=1460
tcp_recved: received 1460 bytes, wnd 5340 (13140).
tcp seqno:4049006799 ack no:352351779
tcp_r len:1460 Done with chunk: idx 994, data_length 1460 size:994 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x800e068, len=1460
tcp_recved: received 1460 bytes, wnd 1460 (17020).
tcp seqno:4049006799 ack no:352357119
tcp_r len:1460 Done with chunk: idx 941, data_length 1460 size:941 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x800e6c8, len=1460
tcp_recved: received 1460 bytes, wnd 2920 (15560).
tcp seqno:4049006799 ack no:352357119
tcp_r len:1460 Done with chunk: idx 888, data_length 1460 size:888 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x800b128, len=1460
tcp_recved: received 1460 bytes, wnd 4380 (14100).
tcp seqno:4049006799 ack no:352357119
tcp_r len:1460 Done with chunk: idx 835, data_length 1460 size:835 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x80104dc, len=5840
tcp_recved: received 5840 bytes, wnd 5840 (12640).
tcp seqno:4049006799 ack no:352361499
tcp_r len:2047 Done with chunk: idx 782, data_length 2047 size:782 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
tcp_r len:2047 Done with chunk: idx 142, data_length 2047 size:142 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
Done with chunk: idx 1549, data_length 2047 size:1400 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
tcp_r len:1746 Done with chunk: idx 909, data_length 1746 size:909 chunk_size:1400 chunk_bytes:1400
chunk data: d a 35
netconn_recv_data: received 0x800e580, len=1460
tcp_recved: received 1460 bytes, wnd 2420 (16060).
tcp seqno:4049006799 ack no:352366379
tcp_r len:1460 Done with chunk: idx 570, data_length 1460 size:570 chunk_size:1400 chunk_bytes:1400
chunk data: 65 41 14
Something strange!!! idx 570, data_length 1460
65 41 14 20 16 8a 8b 25 25 05 d2 f6 a8 e8 7d c3
54 d9 9b 0f 33 57 be 95 39 ef 75 32 9b b5 73 98
c6 5d a5 af 65 7a 80 eb 37 c9 52 91 18 c8 24 a9
cb c1 9a b8 57 10 43 13 76 0e 9e 31 7d d6 77 ab
89 96 bb f3 5d 9b cb 0c 22 a3 cc 2e 13 85 9c 72
7d 01 e0 8a 85 0f 83 91 4b 36 ff fb 90 64 c3 8f
f2 7a 2d 58 03 4f 1a 60 53 66 3a e0 2f 03 5c 09
a0 95 60 0c bc a9 41 53 96 6b 80 77 95 30 8e 7a
ed 2a f2 a5 dc a9 c8 b9 9d 56 a7 a7 a6 61 4e 49
57 eb 42 e1 e6 11 a4 91 34 44 38 90 89 9c 05 94
50 28 19 ce 80 ce 0b 59 ac 7f 07 40 49 99 32 c3
39 d3 1d e3 2a ed ef 84 c4 ad 78 87 24 46 29 7e
1b c9 48 c8 af 54 fd b1 4a 86 87 d1 2c 64 7c 6c
d7 5f 1b 5c 0a b7 5e ae 58 cf 35 e7 fd bf 87 12
10 3d 8c 6d e7 f8 d2 d7 b9 84 00 cf 16 e3 e7 66
01 0f c0 88 44 48 41 97 80 90 fc 7d d2 1a 03 0e
28 70 82 5d f5 3c d1 81 71 b9 a6 bc bb e5 8e 51
e7 df 70 a2 0f 7b 2a e5 94 b5 6f 73 f4 c9 67 cd
09 0b 54 98 13 07 43 ce 49 f5 09 06 92 6f e8 1c
6f 08 21 4f e2 7f fa 97 f2 f7 cd f1 55 27 de 81
a9 6f 77 47 f3 97 7b 35 9e 0f b3 fc ad 55 dd a6
be 06 19 77 59 f9 e0 11 35 e2 bf 52 ea 18 b2 4d
24 24 39 d0 7a ce 43 83 11 97 df a2 67 52 f8 45
34 b2 d5 5b 58 cd ea e7 ef bf 96 7a fe 77 0e 73
3f e7 3e b6 5d d6 b1 df 79 f8 7d 9d 4f 84 da 58
42 b1 d9 06 a8 46 40 33 7a d4 0e 67 92 e9 56 0e
32 83 9e 27 7e 50 1a 53 75 b9 31 72 30 ef 32 e1
19 45 e0 57 69 10 99 a5 af 99 1a 1c 34 e3 a2 98
c7 65 cd 82 4c 98 68 81 16 5d f0 f3 a3 21 80 62
ea 37 1f 90 34 b5 0f 5b 2d 59 24 92 5e 42 db 41
6d 06 02 49 05 4f 34 a0 8c 02 7d a6 b0 f4 f5 e4
42 53 9c b6 58 d6 51 c1 d3 4e 47 ff fb 92 64 df
8f f2 7e 1a d7 83 2f 61 c0 52 a6 4a d0 31 e3 5c
4c a9 09 5e 15 95 80 01 71 93 ab 02 b1 80 00 75
2d dd e8 7d 10 07 8f 05 ae b8 b3 f7 1c ae b5 eb
5b 63 2c 35 a9 30 07 8d 5b d0 22 ad 4a b1 fd 6c
cb ba 02 53 36 2a d2 e8 27 17 3b bf 4b 28 77 e4
e9 70 a9 67 cc 80 43 47 54 20 09 e0 f3 a1 f2 39
21 aa c8 6e 83 20 54 c1 63 c0 f9 c3 b2 aa 93 b6
b9 96 3b c1 39 20 b8 d2 27 81 46 0e 38 b1 13 4d
7b b4 c9 d2 22 20 00 bc 4c b1 bb 96 70 28 f0 61
0b a2 80 76 73 f6 79 dc f5 ff ff ff ff f1 b6 69
4f 28 4e ba 3d ff ff ff ff 85 82 16 0b 0f 44 27
35 a4 bd d4 fd bb 35 94 cd 7a d0 e5 26 70 0f 24
2c 54 42 32 a2 08 26 ae de a0 7a 04 84 9c 90 50
bd 20 83 d8 6c 19 63 c4 2d 42 fc 61 88 48 34 d2
0d cd ec 4a f7 af d0 26 42 95 b4 e6 79 1f 78 ad
ec e9 db f3 af 64 fc bd bf 89 ef 69 df bf 71 7f
78 b0 95 2c 6a 56 68 8f 67 d4 28 b3 c0 dc 35 19
e6 65 c5 8d 15 0f 39 67 1e a9 8d 21 69 a4 47 37
90 7d a1 61 5c 72 1c 0a 36 25 63 d9 22 0f 33 4d
f3 89 e1 73 c5 1a b5 24 4a 6e 5c 6f 19 be e0 a1
8a 05 72 71 41 19 40 84 2b f2 79 23 15 c2 8d 42
ab 1d 6d 08 7c 53 f6 99 fe 9f ff ff ff c7 3a 1e
b3 cd fe ff ff fe 67 23 d4 45 00 8d 0d 0a 35 37
37 0d 0a c4 89 70 80 e7 18 6d
0017 00:02:37.121 Audio client HTTP thread ended.
0018 00:02:37.127 audio_client_http_reader_stop
0019 00:02:41.461 Setting EOS flag on last buffer (122368)
0020 00:02:41.461 audio_client_mp3: mp3_thread exit
0021 00:02:42.237 Playback EOS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for info, I'll inspect this log and the code and see if I can deduce anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The apparent bug will take a bit longer to investigate, and I'd rather fix it than put an "assert" into the failure path.
But if you want to make this be a PR for the configuration options alone, that's fine. I suggest moving investigation of the TCP bug to a separate issue.
"help": "Default TCPIP receive mailbox size. Current default (used if null here) is set to 8 in lwipopts.h, unless overridden by target Ethernet drivers.", | ||
"value": null | ||
}, | ||
"mbox-size": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setting doesn't exist in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, It looks like the TCP Fast timer will send packets but meanwhile if the new TCP data is received ( and receive mailbox is not full) then new TCP Data arrives first before the failed TCP packet.
I am fine with having a separate PR for this TCP bug.
I am fine with this PR for configuration options alone.
I had the mbox-size and tcp receive mailbox size, default tcp mail box size in targert overrides sections in mbed_app.json , I will move into this mbed_lib.json.
features/lwipstack/mbed_lib.json
Outdated
"tcp-mss": { | ||
"help": "TCP Maximum segment size. Current default (used if null here) is set to 536 in opt.h, unless overridden by target Ethernet drivers.", | ||
"value": null | ||
}, | ||
"tcpip-mbox-size": { | ||
"help": "TCPIP mailbox size. Current default (used if null here) is set to 8 in lwipopts.h, unless overridden by target Ethernet drivers.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two settings - tcpip-mbox-size, default-tcp-recvmbox-size have never had #ifndef
on them, so the "overridden by target Ethernet drivers" doesn't apply.
The null
stuff only applies when we want to defer to either some target defines (basically a pre-JSON legacy mechanism), or we want to defer to lwIP's own default. Here, lwIP has no default and we have to specify.
Probably simplest to make them straight options with no defaults
"tcpip-mbox-size": {
"help": "TCPIP mailbox size.",
"value": 8
},
#define TCPIP_MBOX_SIZE MBED_CONF_LWIP_TCPIP_MBOX_SIZE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will specify this to default 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the blurb about "defaults" in the help can go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
07b223e
to
121e7bd
Compare
I am done with my requested changes, if anything required please mention. |
@@ -46,7 +46,11 @@ typedef struct { | |||
} sys_mutex_t; | |||
|
|||
// === MAIL BOX === | |||
#ifdef MBED_CONF_LWIP_MBOX_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need an #ifdef
. Default 8 can be in the JSON.
features/lwipstack/lwipopts.h
Outdated
#define MEMP_NUM_TCPIP_MSG_INPKT MBED_CONF_LWIP_MEMP_NUM_TCPIP_MSG_INPKT | ||
#endif | ||
|
||
#ifdef MBED_CONF_LWIP_TCPIP_MBOX_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need an #ifdef
. Default 8 can be in the JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
features/lwipstack/lwipopts.h
Outdated
#define TCPIP_MBOX_SIZE 8 | ||
#endif | ||
|
||
#ifdef MBED_CONF_LWIP_DEFAULT_TCP_RECVMBOX_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need an #ifdef
. Default 8 can be in the JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
features/lwipstack/mbed_lib.json
Outdated
"tcp-mss": { | ||
"help": "TCP Maximum segment size. Current default (used if null here) is set to 536 in opt.h, unless overridden by target Ethernet drivers.", | ||
"value": null | ||
}, | ||
"tcpip-mbox-size": { | ||
"help": "TCPIP mailbox size. Current default (used if null here) is set to 8 in lwipopts.h, unless overridden by target Ethernet drivers.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the blurb about "defaults" in the help can go.
features/lwipstack/mbed_lib.json
Outdated
@@ -179,12 +195,16 @@ | |||
"tcpip-thread-stacksize": 8192, | |||
"default-thread-stacksize": 640, | |||
"memp-num-tcp-seg": 24, | |||
"memp-num-tcpip-msg-inpkt": 32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of making these settings per-target - PSOC6 isn't particularly in need of this, it's a particular application. This will increase memory usage for other applications.
But I guess if PSOC6 has plenty of memory to spare, generally increasing the defaults is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, initial thought was application will override these value of default. I will remove it so that PSOC6 chips with low memory does not take a hit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I will remove this change and application will override these values. The PSOC6 with low memory will not take hit as I will remove this from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good, but squash all the commits together and give them a neater message.
By the way, I've not found time to look into the actual lwIP bug yet. It would be a good idea for you to submit your report to lwIP's own support channels, as it doesn't appear to be an Mbed OS-specific issue. |
@balajicyp Please squash commits as requested above, will run CI right after. |
62be0a5
to
4f8036b
Compare
CI started |
Test run: SUCCESSSummary: 11 of 11 test jobs passed |
Description
TCP Packets are getting lost when the TCP receive mailbox is full and the TCP packets which are ACK-ed by tcpip_thread is not forwarded to application by recv_tcp (i.e to API netconn_recv_data_tcp calls netconn_recv_data calls sys_arch_mbox_tryfetch ( &conn->recvmbox..) which fails to post)
Make the TCP default mailbox, TCP Receive Mailbox and Mailbox to be configurable via mbed_lib.json file.
This issue is found while streaming an Audio from URL http://stream2.dancewave.online:8080/dance.mp3 ( which is using HTTP Chunked Transfer Encoding) https://en.wikipedia.org/wiki/Chunked_transfer_encoding
Pull request type
Reviewers
Release Notes