-
-
Notifications
You must be signed in to change notification settings - Fork 174
[NimBLEClient] When connecting multiple devices, a Panic error occurs on the third device #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the info. That section of client code needs some work I think. If you comment out the whole case statement and test I’d be interested in the results. |
Just did a test using the connection parameters I can see in your log. With the client connected to all 3 servers I have no issues at all. For further investigation on your end I suggest you look here and un-comment those 3 statements and comment out the 3 below so you can see what the NimBLE stack is doing. |
I just had another look at this now and have a theory that the connection parameters are too tight for a third device. Reason being; the default minimum connection event time is 10ms, but the connection interval may be as low as 20ms, so there only room to schedule 2 connections at once. |
fter modifying modlog.h, there is no change in Panic errors after the suggested setConnectionParams (). I'm going to challenge JTAG debugging over the weekend.
|
Any chance to include backtrace? |
I don't know how to analyze backtrace because of my lack of skills... |
There may be one more thing. As far as i know NimBLE got CCC descriptors limit in menuconfig, is it possible it has been exceeded? Dont ask me how many CCC is set by default, because i dont know. You can use this tool standalone for backtrace decode: |
I can see from the nimble command sent that it appears it did not change the connection parameters in the reply. Do you still have your code modified to return 0 in the L2CAP_UPDATE gap event? If so please remove that and try again. |
@chegewara I have patched the problem in this library with the cccds. Side note: you can change the limits in src/nimconfig.h. |
Something strange happened. When I increased CONFIG_BT_NIMBLE_MAX_CCCDS in nimconfig.h from 8 to 16 or 24, the condition became worse, but when I changed it back to 8, Panic error did not occur. When comparing logs, it seems that 'Rejected peer params' are returned for BLE_GAP_EVENT_L2CAP_UPDATE_REQ. After all, all changes made to NimBLE including CONFIG_BT_NIMBLE_MAX_CCCDS have been canceled and there is no difference even if compared with git. Are connection parameters stored inside ESP32? |
By the way, in NimBLEClient :: getPeerType (), there is a process that checks whether there is a difference in the connection parameters for BLE_GAP_EVENT_L2CAP_UPDATE_REQ, and if there is no difference, returns BLE_ERR_CONN_PARMS. |
I'm sorry to bother you over and over. In the case of 'BLE_GAP_EVENT_L2CAP_UPDATE_REQ' in 'NimBLEClient :: handleGapEvent', it seems that 'client->m_pConnParams' was nullptr when a Panic error occurred. What does it mean that 'client->m_pConnParams' is nullptr ... |
No bother at all I’d like to get this stable so your input is greatly appreciated. I’ll have a look in a little while and get back to you with a better response. |
I don't know if this is correct, but I modified 'NimBLEClient :: handleGapEvent()' as follows. case BLE_GAP_EVENT_CONN_UPDATE_REQ:
case BLE_GAP_EVENT_L2CAP_UPDATE_REQ: {
if(client->m_conn_id != event->conn_update_req.conn_handle){
return 0; //BLE_HS_ENOTCONN BLE_ATT_ERR_INVALID_HANDLE
}
NIMBLE_LOGD(LOG_TAG, "Peer requesting to update connection parameters");
NIMBLE_LOGD(LOG_TAG, "MinInterval: %d, MaxInterval: %d, Latency: %d, Timeout: %d, min_ce_len:%d,max_ce_len:%d",
event->conn_update_req.peer_params->itvl_min,
event->conn_update_req.peer_params->itvl_max,
event->conn_update_req.peer_params->latency,
event->conn_update_req.peer_params->supervision_timeout,
event->conn_update_req.peer_params->min_ce_len,
event->conn_update_req.peer_params->max_ce_len);
rc = 0;
// if we set connection params and the peer is asking for new ones, reject them.
if(client->m_pConnParams != nullptr) {
if(event->conn_update_req.peer_params->itvl_min != client->m_pConnParams->itvl_min ||
event->conn_update_req.peer_params->itvl_max != client->m_pConnParams->itvl_max ||
event->conn_update_req.peer_params->latency != client->m_pConnParams->latency ||
event->conn_update_req.peer_params->supervision_timeout != client->m_pConnParams->supervision_timeout ||
event->conn_update_req.peer_params->min_ce_len != client->m_pConnParams->min_ce_len ||
event->conn_update_req.peer_params->max_ce_len != client->m_pConnParams->max_ce_len)
{
//event->conn_update_req.self_params->itvl_min = 6;//client->m_pConnParams->itvl_min;
rc = BLE_ERR_CONN_PARMS;
}
}
else {
rc = BLE_ERR_CONN_PARMS;
}
if(rc != 0) {
NIMBLE_LOGD(LOG_TAG, "Rejected peer params");
}
return rc;
} // BLE_GAP_EVENT_CONN_UPDATE_REQ, BLE_GAP_EVENT_L2CAP_UPDATE_REQ |
So after looking at the nimble code that handles this I think the cause might be in the stack but not necessarily a bug, that's yet to be determined. I think I need to explain this a bit from my interpretation, but fair warning I could be absolutely wrong, I don't know the nimble internals that well and they are hard to follow. What I think is happening is when you accept the peer parameters it changes the timer in the nimble host stack and because the timing between the 3 connections is so close it causes an error. So the code I asked you to use in your app forced the stack to reject the peer request but kept the parameters the same as they showed in your log. As the client connecting to the server you are also the master in control of the connection parameters and they should be set such that you can juggle the 3 connections. The I interpret their meanings to be the minimum and maximum length of the connection event that happens on each connection interval with the server. So if you have a 20ms interval and a 10ms minimum connection event time, this will mean there is only 10ms left, assuming more time was not needed. Now you add another device and it too uses 10ms, now you have no time left, but you add a 3rd device connection and get a crash. That said I'm not sure that's how it works. The information I learned today was that when the peer parameters are accepted it seems to reset internal nimble host timing and may throw an error somewhere with connection timing but I'm not sure. Lastly the code I had you try in your app creates the data for This is all just my theory and I need to find more ways to test it fully but I hope this gives you some idea of whats going on. |
Interesting findings you have with the MAX_CCCDS, I don't think that should play a role in your issue as you are not bonding from what I can see in your code and I have patched the library to not store that data unless bonding. I need to do some research on this. That modification to compare the |
Good news and bad news, I have just found a way to reproduce the problem reliably, bad news is the backtrace provided nothing to work with. Time to deep dive into NimBLE. |
@wakwak-koba After a lot of testing I believe I'm onto the issue and the results are amazingly better connection times :). If you wouldn't mind testing for me this simple change and post your results I'd be very grateful. With NO modification to the library, please use as is in this repo;
To:
Then test with your original code posted above without setting any connection parameters and let me know if it crashes. I think my theory above is somewhat correct and my tests so far have lead me to believe I need to implement some rules for these parameters, I just need to know what they should be. |
My theory is that should be option in one of those events to send back params in case they are not accepted:
This may be good lead and related to it. |
@chegewara Agreed, I think there needs to be a callback for the app developer to choose how to handle it. Most of my testing is connecting to other esp32 servers. There could be issues connecting to other devices that need to be accounted for. One thing seems constant though is you need more interval time for more client connections. Just an aside note; the backtrace simply indicates the error happens at vPortTaskWrapper() and is the only thing in the backtrace. Not sure what’s going on there but seems a potential timing issue. |
I just pushed up a bugfix branch, if you get a chance, check it out and let me know if it helps or not. |
NimBLEClient.cpp Missing |
Sorry about that, bad copy \ paste 😄. Fixed now. |
There is a very good story. #define BLE_GAP_INITIAL_CONN_MIN_CE_LEN 0x0008
#define BLE_GAP_INITIAL_CONN_MAX_CE_LEN 0x0024 I will test a little more, including reconnecting after disconnection, but the response is pretty good. |
That’s great, but also not haha. I really don’t want to alter the NimBLE core files if not necessary. That does indicate that my theory of how those values work was somewhat correct. I still haven’t found how they are used internally though, more work to do. |
I think this issue has been resolved because I have gained unparalleled stability in single task when connecting multiple devices. |
Awesome, glad it's working for you. I'm still experiencing a similar bug in some stress testing but it seems it could be an IDF issue. Feel free to reopen if you see this happening in the future. |
@wakwak-koba @chegewara I think I finally discovered the underlying cause of this issue. Try settting those 2 defines above to 0x0000, the stability speed and reliability in my testing so far is amazing, and much better than the ones above. What I think i found is the esp32 BLE controller is what was using those parameters. NimBLE does not actually do anything with them, that's why they never changed them. They do get passed from NimBLE to the controller though, so 0x0000 should tell it to ignore those values. |
I got the commit up to 2 days ago, but I don't see any change. BLE_GAP.H
|
Yes, I have not changed them in the repo yet as I’m trying to understand them better first. I’m currently discussing them with Espressif. I just meant to try changing them local copy to 0 as a test, you may experience better performance all together. |
Removed previouly un-merged patches for unbonded CCCD storage and `ble_hs_misc_conn_chan_find_reqd()` null pointer exception fix. Applied offically merged versions: mynewt-nimble #730 - nimble/host: Fix setting connection flags after pairing mynewt-nimble #804 - nimble/gap: Fix storing CCC for bonded devices. mynewt-nimble #802 - nimble/host: Add return parameter to the ble_hs_misc_conn_chan_find_reqd() Retaining un-merged: esp-nimble #7 - Allow host to resolve peer RPA without using local RPA.
Removed previouly un-merged patches for unbonded CCCD storage and `ble_hs_misc_conn_chan_find_reqd()` null pointer exception fix. Applied offically merged versions: mynewt-nimble #730 - nimble/host: Fix setting connection flags after pairing mynewt-nimble #804 - nimble/gap: Fix storing CCC for bonded devices. mynewt-nimble #802 - nimble/host: Add return parameter to the ble_hs_misc_conn_chan_find_reqd() Retaining un-merged: esp-nimble #7 - Allow host to resolve peer RPA without using local RPA.
Removed previouly un-merged patches for unbonded CCCD storage and `ble_hs_misc_conn_chan_find_reqd()` null pointer exception fix. Applied offically merged versions: mynewt-nimble #730 - nimble/host: Fix setting connection flags after pairing mynewt-nimble #804 - nimble/gap: Fix storing CCC for bonded devices. mynewt-nimble #802 - nimble/host: Add return parameter to the ble_hs_misc_conn_chan_find_reqd() Retaining un-merged: esp-nimble #7 - Allow host to resolve peer RPA without using local RPA.
Removed previouly un-merged patches for unbonded CCCD storage and `ble_hs_misc_conn_chan_find_reqd()` null pointer exception fix. Applied offically merged versions: mynewt-nimble #730 - nimble/host: Fix setting connection flags after pairing mynewt-nimble #804 - nimble/gap: Fix storing CCC for bonded devices. mynewt-nimble #802 - nimble/host: Add return parameter to the ble_hs_misc_conn_chan_find_reqd() Retaining un-merged: esp-nimble #7 - Allow host to resolve peer RPA without using local RPA.
Removed previouly un-merged patches for unbonded CCCD storage and `ble_hs_misc_conn_chan_find_reqd()` null pointer exception fix. Applied offically merged versions: mynewt-nimble #730 - nimble/host: Fix setting connection flags after pairing mynewt-nimble #804 - nimble/gap: Fix storing CCC for bonded devices. mynewt-nimble #802 - nimble/host: Add return parameter to the ble_hs_misc_conn_chan_find_reqd() Retaining un-merged: esp-nimble #7 - Allow host to resolve peer RPA without using local RPA.
This updates the esp-nimble #7 patch to the latest PR.
I am testing with multiple BLE Buttons with Bluetooth HID profile.
Pressing the button shows the equivalent behavior of pressing a key on the keyboard.
It seems to be very very stable compared to the original BLEClient !!
In simultaneous connection, if the number of BLE Buttons is one or two, no problem has been found, but if it is three, a Panic error will occur immediately after pairing.
The log is too long, so here is the most recent panic error:
I don't know how to look at the backtrace, but it seems that an error occurs immediately after NimBLEClient :: handleGapEvent receives BLE_GAP_EVENT_L2CAP_UPDATE_REQ generated by the device and returns rc = 0 indicating normality.
Forcing the device to reject the request does not cause the Panic error, albeit with a different problem.
Is there anything you can think of ?
I'm just getting started with JTAG, but I'm glad if I can tell you what to try.
The text was updated successfully, but these errors were encountered: