WiFi Client Socket disconnecting #307
Comments
I read the code, and what I can see is that when a client connects, it gets into a loop that sends Hello World every 100ms. Is that what you are getting? Can you post some logs? |
Hi me-no-dev
By the way, I did not mention that the TCP client app I am using on the iPad is 'Telnet Lite', a free app for connecting to TCP server sockets. I know this works reliably: I have tested the same connection against an RPi running as a SoftAP, streaming the same data I am trying to supply from the esp32. With the esp32, after a long period of time the client reports that the server has disconnected.
and finally
and no further output. If I can add additional debug output, either to my code or to the libraries, that may help, but I am unclear how to do this. For example, I have no way of determining the cause of the disconnect; under POSIX I would usually see something like a SIGPIPE, or the socket write would return with errno set to EAGAIN. Most applications I have seen use webserver-type connections, which are short-lived TCP connections, or UDP, which is not connection-based, whereas in this instance I need a TCP connection which persists for a long period of time. Kind Regards (Update)
On the call to select(), I notice that tv.tv_usec is set to 1000000, which of course is 1 second. Does the value of 1000000 possibly get masked in select()? It should probably carry over into the tv.tv_sec field, giving {tv.tv_sec = 1; tv.tv_usec = 0}. Also, if I could enable the log_e() messages on the monitor, it would be useful to add one for the select() error return (select(...) < 0). Apologies for the long post, or if I am misunderstanding the send()/recv() semantics on esp32. |
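To illustrate the carry described above, here is a minimal sketch that folds an out-of-range tv_usec into tv_sec before the timeout is handed to select(). The helper name normalizeTimeval is made up for this example and is not part of the WiFi library. Whether lwIP on the ESP32 accepts, masks, or rejects tv_usec = 1000000 is not established in this thread; normalizing simply sidesteps the question.

```cpp
#include <cassert>
#include <sys/time.h>

// Hypothetical helper (not part of the Arduino WiFi library): fold whole
// seconds out of tv_usec so that 0 <= tv_usec < 1000000 afterwards.
struct timeval normalizeTimeval(struct timeval tv) {
  assert(tv.tv_sec >= 0 && tv.tv_usec >= 0);  // negative timeouts are not handled here
  tv.tv_sec += tv.tv_usec / 1000000;          // carry whole seconds
  tv.tv_usec %= 1000000;                      // keep only the sub-second remainder
  return tv;
}
```

With this, {tv_sec = 0, tv_usec = 1000000} becomes {tv_sec = 1, tv_usec = 0}, which is the value suggested in the comment above.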
After adding some debug to WiFiClient.cpp, I am getting an errno of 104 from the call to send() in WiFiClient::write. This is the error code. I am also monitoring WiFi events through the callback WiFi.onEvent(cbfunc), and I do not see any WiFi event associated with this callback. Any help gratefully received. |
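For anyone wanting to capture the failing errno without patching the library, here is a rough sketch of a write loop that reports where it stopped and why. Everything in it (SendFn, SendResult, sendAll, the retry limit) is hypothetical and only mirrors the POSIX send() contract; it is not how WiFiClient::write is implemented.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <sys/types.h>

// Mirrors POSIX send(): returns bytes written, or -1 with errno set.
using SendFn = std::function<ssize_t(const char* data, size_t len)>;

struct SendResult {
  size_t sent;  // bytes accepted before we stopped
  int error;    // 0 on success, otherwise the errno that stopped the loop
};

// Hypothetical helper: push the whole buffer, retrying a bounded number of
// times on EAGAIN/EWOULDBLOCK and giving up on any other errno.
// On a real device you would delay or yield between retries.
SendResult sendAll(const SendFn& sendFn, const char* data, size_t len,
                   int maxRetries = 10) {
  assert(data != nullptr || len == 0);
  size_t sent = 0;
  int retries = 0;
  while (sent < len) {
    errno = 0;
    ssize_t n = sendFn(data + sent, len - sent);
    if (n > 0) {
      sent += static_cast<size_t>(n);  // partial write: continue with the rest
      retries = 0;
    } else if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK) {
      if (++retries > maxRetries) return {sent, EAGAIN};
    } else {
      return {sent, errno};  // hard error, e.g. ECONNRESET
    }
  }
  return {sent, 0};
}
```

Wrapping the real send() call in a lambda and logging result.error after a failure would show the 104 reported above at the point where the stream breaks.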
The error you get means that the other end sent an RST packet. An RST packet is sent either in the middle of the 3-way handshake, when the server rejects the connection or is unavailable, or in the middle of data transfer, when either the server or the client becomes unavailable or rejects further communication without the formal 4-way TCP connection termination process. |
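A small lookup like the one below can make the raw number easier to read in the serial log. The function name describeSocketError and the wording of the descriptions are mine, and the list only covers a handful of codes that are likely to show up on a long-lived TCP connection. The value 104 matches ECONNRESET in the Linux and newlib headers as far as I know, but the code uses the symbolic constants rather than relying on that.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: translate a few socket-related errno values into text.
std::string describeSocketError(int err) {
  assert(err >= 0);  // errno values are non-negative
  switch (err) {
    case 0:            return "no error";
    case ECONNRESET:   return "ECONNRESET: connection reset by peer (RST received)";
    case ECONNABORTED: return "ECONNABORTED: connection aborted locally";
    case EPIPE:        return "EPIPE: write on a connection that is already closed";
    case ENOTCONN:     return "ENOTCONN: socket is not connected";
    case ETIMEDOUT:    return "ETIMEDOUT: connection timed out";
    case EAGAIN:       return "EAGAIN: would block, try again later";
    default:           return "unrecognized errno " + std::to_string(err);
  }
}
```

On the ESP32 this could be printed with Serial.println(describeSocketError(errno).c_str()) right after the failing call.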
The log message level is set through the board menu in the Arduino IDE or through menuconfig in IDF |
To see the current error:
#include <errno.h>
void printPosixError() {
  Serial.print("Current Socket Error Number: ");
  Serial.println(errno, DEC);
} |
There is a reason why the code does not retry on |
Hi me-no-dev
I tried setting this to "Core Debug Level : Debug"
If I reset back to "None" I do not get this issue. E. |
You should have Serial.begin as the first thing in setup(); otherwise locks will not be initialized and bad things might happen. Also, those backtraces can be decoded using https://github.com/me-no-dev/EspExceptionDecoder |
Thanks for the pointer, so I get the following
I tried previously with a different client running on Linux/Ubuntu, and I am pretty sure I got the same result. I will confirm this is the case and post later. |
Having now connected from an x86 Linux/Ubuntu telnet client, I see exactly the same behavior. At the moment I am running an experiment with wireshark / tcpdump (on the Linux host) attached to the wireless interface, monitoring the connection to the esp32, in the hope this will provide some pointers as to where the problem lies. I have looked in detail at the esp32 WiFiClient and WiFiServer library code; these are thin layers on top of the underlying socket library, so it is unlikely that the problem is there. This problem is very simple for anybody to reproduce, so I am looking for advice here: how could this be progressed, and by whom? Debugging TCP/IP stacks is very much outside of my comfort zone. Maybe I am asking too much of this device/library. In most applications I have seen for the esp32, socket connections are short bursts of information, such as the HTTP request/response model, or the examples use UDP, which is inherently lossy. Are there any applications for this device which use long-established persistent sockets? E. |
After staring at lots of TCPDUMP outputs, I have noticed when the transactions seem to go wrong, normal sequences look like this
and I see hundreds of thousands of these - all successful.
Now the TCP connection is broken. I am no expert, so I would say either the ESP32 TCP/IP library is straying from the spec, or the client telnet program is incorrect. Does anybody know what this transaction means? I am unable to rectify this issue myself, and it is blocking me from completing my design. If I cannot get a solution, I think my only alternative is to switch to the RTL8710 in the hope that it does not suffer the same problem in its TCP/IP library. I really do not want to start coding from scratch again :-( E. |
"F" stands for FIN, which is sent to close the connection. Which IP address is which in your log (ESP32/client)? |
The ESP32 is 192.168.1.1, so it is sending a FIN (finish). I was considering contacting Adam Dunkels, who I think is the author of the TCP/IP library? |
I don't think such heavy artillery is required :) The WiFiServer library enables the SO_KEEPALIVE option on the client socket upon connection. Is the remote client responding to keepalive packets? If it isn't, that would explain why the connection is closed from the ESP side. |
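For reference, this is roughly what toggling SO_KEEPALIVE looks like at the plain sockets level, which makes it easy to experiment with the option on or off. The two helper names are made up for this sketch, and it uses the standard POSIX setsockopt()/getsockopt() calls on an ordinary file descriptor; how WiFiServer applies the option internally may differ in detail.

```cpp
#include <cassert>
#include <sys/socket.h>

// Hypothetical helper: switch SO_KEEPALIVE on or off for a socket descriptor.
// Returns true if setsockopt() succeeded.
bool setKeepAlive(int fd, bool enabled) {
  assert(fd >= 0);
  int flag = enabled ? 1 : 0;
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &flag, sizeof(flag)) == 0;
}

// Hypothetical helper: read the option back.
// Returns 1 if keepalive is on, 0 if off, -1 if getsockopt() failed.
int getKeepAlive(int fd) {
  assert(fd >= 0);
  int flag = 0;
  socklen_t len = sizeof(flag);
  if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &flag, &len) != 0) return -1;
  return flag != 0 ? 1 : 0;
}
```

Reading the value back after setting it is a cheap way to confirm that the option really took effect before drawing conclusions from a long soak test.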
I really cannot say, I have tested the following as Clients Linux/Ubuntu - telnet I cannot say how these apps are configured regarding KEEPALIVE, but these are very reliable client apps - I am pretty sure the problem does not lie with the client.
It is a very long time, somewhere between 30-60 minutes. I was initially thinking the issue was something like a wrap-around/overflow of the seq/ack numbers - but it does not appear to be the case E. |
It would be nice to know whether the time is the same (pointing at a deterministic issue which should be easy to debug) or different (pointing at something more obscure). As a pure experiment, you may try disabling the keepalive option in WiFiServer: https://github.com/espressif/arduino-esp32/blob/master/libraries/WiFi/src/WiFiServer.cpp#L48. Based on the tests we are doing with the ESP-IDF, I know that occasionally we have WiFi disconnection issues which cause dropped TCP connections as well. But these issues happen on a time scale of many hours to days, so what you are seeing is probably a different issue. |
I will give this a go, but I think this is unlikely to be the issue. The same application, which I have running as a server on a Raspberry Pi, runs a WiFi server with KEEPALIVE. I will come back with my findings. |
So it seems removing the KEEPALIVE option made no difference
|
I have now written a socket client in TCL, and I will record the times for connection/disconnect from the esp32 and see if there is a pattern. |
Hi All Almost at exactly the same time as the ESP32 / Server sends the FIN, I see the following.
|
Maybe AP DHCP renew is causing it? Are you connected to the ESP32's AP? |
Yes the ESP32 is configured as SoftAP. |
You can try something like this:
|
BTW seems to be 120 minutes or 3 hours. Also visible in the log above |
yes this is what I am getting 120 - you mean 2 hours right ? |
I'm afraid not :( |
What you can do though is this:
|
You will have to forgive my ignorance, what are the API calls for obtaining Date/Time or a Clock Counter, I have not been able to find anything ? |
better use |
So using millis() connect / disconnect is as follows
I think I make this 46 minutes ?
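Since the actual readings did not survive in this thread, here is a small sketch of the arithmetic with made-up numbers. It assumes millis() returns a 32-bit unsigned count of milliseconds (which wraps after roughly 49.7 days); the names elapsedMillis, Duration and toDuration are only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time between two millis() readings. Unsigned subtraction gives the
// right answer even if the 32-bit counter wrapped once in between.
uint32_t elapsedMillis(uint32_t startMs, uint32_t endMs) {
  return endMs - startMs;
}

struct Duration {
  uint32_t hours;
  uint32_t minutes;
  uint32_t seconds;
};

// Break a millisecond count into hours, minutes and whole seconds.
Duration toDuration(uint32_t ms) {
  uint32_t totalSeconds = ms / 1000;
  Duration d = {totalSeconds / 3600, (totalSeconds % 3600) / 60, totalSeconds % 60};
  assert(d.minutes < 60 && d.seconds < 60);
  return d;
}
```

For example, hypothetical readings of 1000 at connect and 2761000 at disconnect give 2760000 ms, which toDuration() reports as 0 hours, 46 minutes, 0 seconds.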
|
Why are we getting AP_STADISCONNECTED though? This probably needs to be checked by sniffing WiFi packets, to see why the ESP32 makes the station disconnect. |
Is this question for me ?
I have no idea how to do that, but very happy to give it a go if you can provide some instructions. |
They are not explaining the problem but are giving a much better idea. The issue is not at all in LwIP or Client/Server implementation, but rather in the WiFi stack. @igrr should we forward this to IDF? |
We have a very similar test in IDF, which doesn't show such issue; I will try reproducing this, if this does happen in IDF then will move the issue there. Otherwise we will need to do some sniffing with Wireshark to get more info. |
Side note: in AP_STA mode, a solid connection can never be guaranteed, for example if the ESP32 STA starts scanning for APs to connect to. It's likely not the issue here, but some users have run into a similar issue with the ESP8266. So I suggest not treating the WiFi connection as something which is guaranteed to be robust; implementing a reconnection procedure will be necessary for any real-world use case. (Not dismissing the issue; just pointing out that proper disconnect handling needs to be implemented even if this issue is fixed.) |
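As a rough sketch of what such disconnect handling could look like on the server side: drop the dead client, then accept the next one on a later pass through loop(). The StreamSession class is hypothetical. It is written as a template so it can be tried on a PC with fake objects; the member functions it relies on (available(), connected(), stop(), write(const uint8_t*, size_t)) are assumed to match WiFiServer and WiFiClient, so check that against your core version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: keeps at most one streaming client and replaces it
// when the connection is lost.
template <typename Server, typename Client>
class StreamSession {
 public:
  explicit StreamSession(Server& server) : server_(server) {}

  // Call once per loop() iteration. Returns true if the payload was handed
  // to a connected client, false if nobody is connected or the write failed.
  bool service(const uint8_t* data, size_t len) {
    assert(data != nullptr);
    if (!client_.connected()) {
      client_.stop();                 // release the dead socket, if any
      client_ = server_.available();  // pick up a waiting client, if any
      if (!client_.connected()) return false;
      ++sessions_;                    // count each (re)connection
    }
    if (client_.write(data, len) != len) {
      client_.stop();                 // short write: treat the link as dead
      return false;
    }
    return true;
  }

  unsigned sessions() const { return sessions_; }

 private:
  Server& server_;
  Client client_;
  unsigned sessions_ = 0;
};
```

On the device this would be instantiated as StreamSession<WiFiServer, WiFiClient>, with the client side (telnet, a TCL script, and so on) responsible for reconnecting when it sees the drop.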
Please let me know how you get on. |
@igrr @me-no-dev E. |
@eroom1966 Arduino uses IDF, so it is the same thing. In fact, you can compile your arduino code with idf instead of the prebuilt libs that I include. See here: https://github.com/espressif/arduino-esp32#using-as-esp-idf-component |
Thanks for the pointer. The monitor reports
does this mean it is deep in the libraries - not sure what more I can try. |
Very sadly I cannot seem to get past this roadblock for TCP disconnects,
|
Hello, |
Hi All,
I am pretty new to coding under Arduino. I have another project which is coded in vanilla C, and I am in the process of porting it across to Arduino/esp32.
My esp32 is set up as a SoftAP with a listening socket on port 2000.
I make a connection from an iPad and run a telnet connection to port 2000, streaming ASCII data from the esp32.
I cannot figure it out, but after a while I get a disconnect from the client running on the iPad, and I cannot work out if there is a bug in my (very simple) App or the libraries.
Running this connection for about 30-60 minutes will show the disconnect, any pointers gratefully received.
I think my retry code may be redundant, it looks like the underlying library attempts to perform retries on incomplete write()
Code as Follows
Kind Regards
E