WiFi Client Socket disconnecting #307

Closed
eroom1966 opened this issue Apr 10, 2017 · 42 comments

@eroom1966

eroom1966 commented Apr 10, 2017

Hi All,

I am pretty new to coding under Arduino. I have another project coded in vanilla C, and I am in the process of porting it across to Arduino/ESP32.

My esp32 is setup as a SoftAP with a listening socket on port 2000.
I make a connection from an iPad and run a telnet connection to port 2000, streaming ascii data from the esp32.
I cannot figure it out, but after a while I get a disconnect from the client running on the iPad, and I cannot work out if there is a bug in my (very simple) App or the libraries.
Running this connection for about 30-60 minutes will show the disconnect, any pointers gratefully received.

I think my retry code may be redundant; it looks like the underlying library attempts to perform retries on an incomplete write().

Code as Follows
Kind Regards
E

#include <WiFi.h>

#define SOCKET_TIMEOUT 10
#define TICK_TIMEOUT   100

const char *ssid = "my_ssid";
IPAddress apIP(192, 168, 1, 1);

WiFiServer myServer(2000);
WiFiClient myClient;
unsigned char clientConnected = 0;

void connectWifi() {
    WiFi.disconnect();
    WiFi.mode(WIFI_AP_STA);
    WiFi.softAPConfig(apIP, apIP, IPAddress(255, 255, 255, 0));   // subnet FF FF FF 00
    WiFi.softAP(ssid, NULL, 6, 0); //, 1); // no pass, ch=6, ssid=broadcast, max_conns=1
}

void setup() {
    connectWifi();
    myServer.begin();
    Serial.begin(115200);
}

unsigned int cnt=0;
void serverPoll() {
    //
    // Test for new connection
    //
    if (!myClient.connected()) {
        myClient = myServer.available();
        if (myClient.connected()) {
            Serial.write("Connect...");
            cnt=0;
            clientConnected = 1;
            myClient.setNoDelay(true);
            myClient.setTimeout(SOCKET_TIMEOUT);
        }
    }

    if (myClient.connected()) {
        while (myClient.available()) {  // get data from Client
            Serial.write(myClient.read());
        }
    
        // Write
        char buf[64];
        char msg[96];   // separate log buffer so a retry message cannot overwrite the payload
        sprintf(buf, "Hello World %u\r\n", cnt++);
        int len = strlen(buf);

        int rc;
        int bufp = 0;
        do {
            // write() returns the number of bytes sent, or 0 on failure
            rc = myClient.write(buf+bufp, len-bufp);
            if (rc > 0) {
                bufp += rc;
                if (len > bufp) {
                    sprintf(msg, "write() retry rc=%d bufp=%d len=%d\r\n", rc, bufp, len);
                    Serial.write(msg);
                }
            } else {
                sprintf(msg, "write() error returned %d\r\n", rc);
                Serial.write(msg);
            }
        } while((len > bufp) && (rc > 0));   // stop on failure instead of spinning forever

        if (len > bufp) {
            sprintf(msg, "myClient.write(\"Hello World %u\") returned %d\r\n", cnt, rc);
            Serial.write(msg);
        } else {
            Serial.write(buf);
        }

    } else {
        myClient.stop();
        if (clientConnected) {
            Serial.write("Disconnect...");
            clientConnected = 0;
        }
    }
}

void loop() {
    static unsigned int last = 0;
    static unsigned int delta = TICK_TIMEOUT;
    unsigned int now = millis();
    unsigned int diff = now - last;
    unsigned int tick = (diff >= delta);
    if (tick) {
        delta = TICK_TIMEOUT;
        last = now;
        serverPoll();
    }
}

@me-no-dev
Member

I read the code, and what I can see is that when a client connects, it gets into a loop that sends Hello World every 100ms. Is that what you are getting? Can you post some log?

@eroom1966
Author

eroom1966 commented Apr 11, 2017

Hi me-no-dev
You are absolutely correct in your evaluation. The log is pretty simple TBH: at the TCP client, it is simply a stream of

Hello World 1
Hello World 2
Hello World 3
Hello World 4
...

By the way I did not mention, the TCP Client app I am using on the iPad is 'Telnet Lite', which is a freebie to connect to TCP server sockets. I know this works reliably, I have the same connection tested running to an RPI running as a SoftAP, streaming the same data I am trying to supply from the esp32

After a long period of time the client reports that the server has disconnected.
In the server serial monitor I see the same output text

Hello World 1
Hello World 2
Hello World 3
Hello World 4
...

and finally

write() error returned 0

and no further output.

If I can add additional debug either into my code, or the libraries - that may help, but I am unclear how to do this.

For example, I have no way of determining the cause of the disconnect, under POSIX I would usually see something like a SIGPIPE, or that the socket write returned an errno for EAGAIN.
It seems that is abstracted away here, so I am not sure how to identify the underlying issue.
any assistance gladly received.
I did notice in the library code the use of log_e(), but I could not work out how to enable these messages, or where they get sent - is it the serial monitor, or somewhere else ?
(Update, I may have found this, is it Serial.setDebugOutput(true) ?)

I know that most applications I have seen, tend to use webserver type connections, which are short lived TCP connections, or UDP which are not connection based, whereas in this instance, I need a TCP connection which persists for a long period of time.

Kind Regards
E

(Update)
Further reading of the library code (although I may be misunderstanding)
in WiFiClient.cpp WiFiClient::write
I notice the return from send is only ever checked for an error code (<0) else it is OK
If the send() is implemented the same as Linux/Posix, then it either returns the number of characters sent, or an error code. The number of characters sent, can be less than the number of characters requested, in which case the pointer is usually moved on in the input buffer, and a smaller number of chars are requested to be sent on a retry.
Also I see that the errno is only checked for (errno !=EAGAIN), does it also need to be checked against EWOULDBLOCK, ie

if ((errno==EAGAIN) || (errno==EWOULDBLOCK)) {
    // Perform retry
} else {
    // Unhandled errno 
}
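
A minimal sketch of the retry pattern described above, written in plain C++ so it can be tried on a desktop. It is not the WiFiClient implementation. `sendAll`, `SendFn` and `maxRetries` are hypothetical names, and `SendFn` stands in for a send()-like call that returns the number of bytes written, or -1 with errno set.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for send(): returns bytes written, or -1 with errno set.
using SendFn = std::function<long(const char *, std::size_t)>;

// Send the whole buffer, advancing past partial writes and retrying when the
// call reports EAGAIN/EWOULDBLOCK. Returns len on success, -1 on failure.
long sendAll(const SendFn &sendFn, const char *data, std::size_t len, int maxRetries) {
    assert(data != nullptr);
    std::size_t sent = 0;
    int retries = 0;
    while (sent < len) {
        long rc = sendFn(data + sent, len - sent);
        if (rc > 0) {
            sent += static_cast<std::size_t>(rc);   // partial write: move the pointer on
            retries = 0;
        } else if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            if (++retries > maxRetries) {
                return -1;                          // retry budget exhausted
            }
            // A real implementation would wait in select() here before retrying.
        } else {
            return -1;                              // hard error (e.g. ECONNRESET) or rc == 0
        }
    }
    return static_cast<long>(sent);
}
```

On platforms where EAGAIN and EWOULDBLOCK share one value the second comparison is redundant but harmless.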

On the call to select(), I notice tv.tv_usec is set to 1000000, which of course is 1 second. Does the value of 1000000 possibly get masked in select()? It should probably carry into the tv.tv_sec field, giving {tv.tv_sec = 1; tv.tv_usec = 0}. Also, if I could enable the log_e() messages to the monitor, it would be useful to add one for the select() error return (select(...) < 0).
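
For illustration, a small sketch of building a normalized timeout so that tv_usec always stays below one second. `makeTimeout` is a hypothetical helper, and I am not certain how lwIP's select() treats an out-of-range tv_usec, so this only shows the carry into tv_sec.

```cpp
#include <cassert>
#include <sys/time.h>

// Build a normalized timeval from a microsecond count: whole seconds go into
// tv_sec and only the remainder (0..999999) goes into tv_usec.
timeval makeTimeout(long totalUsec) {
    assert(totalUsec >= 0);
    timeval tv;
    tv.tv_sec = totalUsec / 1000000L;
    tv.tv_usec = totalUsec % 1000000L;
    return tv;
}
```

With this helper, a one-second timeout becomes {tv_sec = 1, tv_usec = 0} rather than {0, 1000000}.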

apologies for the long post, or if I am misunderstanding the send()/recv() semantics on esp32

@eroom1966
Author

After adding some debug to WiFiClient.cpp, I am getting an errno of 104 from the call to send() in WiFiClient::write. This is the error code
#define ECONNRESET 104 /* Connection reset by peer */
so what is causing this ?

I am also monitoring WiFi events through the use of the callback WiFi.onEvent(cbfunc), and I do not see any WiFi event associated with this callback

any help gratefully received

@me-no-dev
Member

The error you get means that the other end sent an RST packet.
Here is some info from Google:

A RST packet is sent either in the middle of the 3-way handshake, when the server rejects the connection or is unavailable, or in the middle of data transfer, when either the server or client becomes unavailable or rejects further communication without the formal 4-way TCP connection termination process.
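
As a rough aid for telling these cases apart in application code, here is a sketch that sorts socket errno values into a few buckets. `SocketFailure` and `classifySocketErrno` are hypothetical names, and the grouping is one reasonable choice rather than anything the library does.

```cpp
#include <cerrno>

// Hypothetical buckets for errno values seen after a failed send() or recv().
enum class SocketFailure { Retry, PeerReset, PeerClosed, Other };

SocketFailure classifySocketErrno(int err) {
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return SocketFailure::Retry;        // nothing is wrong, try again later
    }
    if (err == ECONNRESET) {
        return SocketFailure::PeerReset;    // the other end sent an RST
    }
    if (err == EPIPE || err == ENOTCONN) {
        return SocketFailure::PeerClosed;   // connection is already gone
    }
    return SocketFailure::Other;
}
```

Only the Retry bucket is worth retrying on the same socket; the other cases call for closing it and waiting for a new connection.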

@me-no-dev
Member

Log messages level is enabled through the board menu in Arduino IDE or through menuconfig in IDF

@me-no-dev
Member

To see the current error:

#include <errno.h>

void printPosixError(){
  Serial.print("Current Socket Error Number: ");
  Serial.println(errno, DEC);
}

@me-no-dev
Member

There is a reason why the code does not retry on EWOULDBLOCK, but I do not remember what it was now. There were some differences between this POSIX and what I had running in Linux

@eroom1966
Author

Hi me-no-dev

Log messages level is enabled through the board menu in Arduino IDE or through menuconfig in IDF

I tried setting this to "Core Debug Level : Debug"
and at boot I get an error

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0x00
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0008,len:8
load:0x3fff0010,len:2016
load:0x40078000,len:7780
ho 0 tail 12 room 4
load:0x40080000,len:252
entry 0x40080034
E (154) wifi: esp_wifi_set_config 969 wifi is not init
E (154) wifi: esp_wifi_set_config 969 wifi is not init
Guru Meditation Error of type LoadProhibited occurred on core  1. Exception was unhandled.
Register dump:
PC      : 0x400014dc  PS      : 0x00060930  A0      : 0x800d2300  A1      : 0x3ffccbd0  
A2      : 0xffffffff  A3      : 0xfffffffb  A4      : 0x000000ff  A5      : 0x0000ff00  
A6      : 0x00ff0000  A7      : 0xff000000  A8      : 0x3ffc9e00  A9      : 0x00000000  
A10     : 0x00000003  A11     : 0x3ffccb6c  A12     : 0x0000e9fc  A13     : 0x00000064  
A14     : 0x0000000d  A15     : 0x00000001  SAR     : 0x00000010  EXCCAUSE: 0x0000001c  
EXCVADDR: 0xffffffff  LBEG    : 0x400012e5  LEND    : 0x40001309  LCOUNT  : 0x800d10c5  

Backtrace: 0x400014dc:0x3ffccbd0 0x400d2300:0x3ffccbe0 0x400d23c0:0x3ffccc00 0x400d0dd0:0x3ffccc20 0x4010a9fe:0x3ffccc40

CPU halted.

If I reset back to "None" I do not get this issue.

E.

@me-no-dev
Member

You should have Serial.begin as first thing in the setup, else locks will not be initialized and bad things might happen. Also those backtraces can be decoded using https://github.com/me-no-dev/EspExceptionDecoder

@eroom1966
Author

eroom1966 commented Apr 12, 2017

Thanks for the pointer, so I get the following

E (157) wifi: esp_wifi_set_config 969 wifi is not init
E (157) wifi: esp_wifi_set_config 969 wifi is not init
[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 2 - STA_START
[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 12 - AP_START
[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 14 - AP_STACONNECTED
[E][WiFiClient.cpp:218] write(): send() errno=104

I tried previously with a different client running on Linux/Ubuntu, and I am pretty sure I got the same result. I will confirm this is the case and post later.
E.

@eroom1966
Author

eroom1966 commented Apr 13, 2017

Having now connected from a x86 Linux/Ubuntu telnet client I see exactly the same behavior.
I have no idea how to progress or diagnose this bug.

At the moment I am running an experiment with wireshark / tcpdump (on the Linux host) connected to the wireless interface, monitoring the connection to the esp32, in the hope this will provide some pointers regarding where the problem exists.

I have looked in detail at the esp32 WiFiClient and WiFiServer library code, these are thin layers on top of the underlying socket library, so it is unlikely that the problem exists here.
I am much more concerned that there is an underlying issue in either the socket library or TCP/IP stack.

This problem is very simple for anybody to reproduce, so I am looking for advice here - how and who could progress this ? debugging TCP/IP stacks is very much outside of my comfort zone.

Maybe I am asking too much of this device/library, it does seem that in most applications I have seen for the esp32, the socket connection times are short bursts of information such as the HTTP request/response model, or examples which use UDP, which is inherently lossy

Are there any applications for this device which use long established persistent sockets ?

E.
(desperately in need of assistance!)

@eroom1966
Author

eroom1966 commented Apr 15, 2017

After staring at lots of tcpdump output, I have noticed what happens when the transactions go wrong. Normal sequences look like this:

15:38:05.908812 IP (tos 0x0, ttl 255, id 23475, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.43636: Flags [P.], cksum 0x4c8d (correct), seq 12106111:12106131, ack 1, win 5840, length 20
15:38:05.908924 IP (tos 0x10, ttl 64, id 61260, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.43636 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0xece7 (correct), seq 1, ack 12106131, win 29200, length 0

and I see hundreds of thousands of these - all successful.
Whenever I see the disconnect, it is always preceded by the ESP32 sending Flags [FP.] rather than simply Flags [P.]. I have no idea what the difference is, but this is the signature of the failure.

15:38:08.893093 IP (tos 0x0, ttl 255, id 23535, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.43636: Flags [FP.], cksum 0x4d8e (correct), seq 12107131:12107151, ack 1, win 5840, length 20
15:38:08.893227 IP (tos 0x10, ttl 64, id 61311, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.43636 > 192.168.1.1.cisco-sccp: Flags [F.], cksum 0xe8e9 (correct), seq 1, ack 12107152, win 29200, length 0
15:38:08.894829 IP (tos 0x0, ttl 255, id 23536, offset 0, flags [none], proto TCP (6), length 40)
    192.168.1.1.cisco-sccp > 192.168.1.2.43636: Flags [.], cksum 0x442b (correct), seq 12107152, ack 2, win 5839, length 0

Now the TCP connection is broken.
This is totally reproducible, every time it fails - this is what has happened.

I am no expert, so I would say either the ESP32/TCP-IP-Library is straying from the Spec, or the client telnet program is incorrect.
I have tried a telnet client under Linux and under iOS (iPad) - so I am thinking the issue is the ESP32

does anybody know what this transaction means ?

I am unable to rectify this issue myself, and it is blocking me from completing my design, if I cannot get a solution I think my only alternative is to switch to the RTL8710 in the hope that this does not suffer the same problem in the TCP-IP library - I really do not want to start coding from scratch again :-(

E.

@igrr
Member

igrr commented Apr 15, 2017

"F" stands for FIN, which is sent to close the connection. Which IP address is which in your log (ESP32/client)?

@eroom1966
Author

eroom1966 commented Apr 15, 2017

ESP32 is the 192.168.1.1 - so it is sending a FIN (finish)
Client is 192.168.1.2
Hmm, that's bad. I really do not want that, but it clearly explains the issue, which is half the battle.
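
For anyone reading the captures, a small sketch that renders the TCP flag bits the way tcpdump prints them, so [P.] is PSH+ACK and [FP.] is FIN+PSH+ACK. `tcpFlagsToString` is a hypothetical helper; the bit values are the standard TCP header flag positions.

```cpp
#include <cstdint>
#include <string>

// Render the low five TCP flag bits in tcpdump's notation, e.g. "FP.".
std::string tcpFlagsToString(std::uint8_t flags) {
    struct Bit { std::uint8_t mask; char letter; };
    static const Bit bits[] = {
        {0x01, 'F'},  // FIN: sender has finished sending
        {0x02, 'S'},  // SYN: connection setup
        {0x04, 'R'},  // RST: abort the connection
        {0x08, 'P'},  // PSH: deliver data to the application promptly
        {0x10, '.'},  // ACK is printed as a dot
    };
    std::string out;
    for (const Bit &b : bits) {
        if (flags & b.mask) {
            out.push_back(b.letter);
        }
    }
    return out.empty() ? std::string("none") : out;
}
```

A normal data segment in the logs has flags 0x18, and the segment that precedes each disconnect has 0x19.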

I was considering contacting Adam Dunkels, who I think is the author of the TCP/IP library?

@igrr
Member

igrr commented Apr 15, 2017

I don't think such heavy artillery is required :)

WifiServer library is enabling SO_KEEPALIVE option on the client socket upon connection. Is the remote client responding to keepalive packets? If it isn't, that would explain why the connection is closed from the ESP side.
Another thing, is the time before the connection is closed always the same? How long does it usually take?
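
For reference, a minimal sketch of toggling and reading back SO_KEEPALIVE with the standard socket calls, written against a desktop POSIX system. `setKeepAlive` and `keepAliveEnabled` are hypothetical helper names; lwIP offers setsockopt() with the same option, but I have not checked the exact headers needed on the ESP32.

```cpp
#include <cassert>
#include <sys/socket.h>

// Enable or disable SO_KEEPALIVE on an open socket. Returns true on success.
bool setKeepAlive(int fd, bool enable) {
    assert(fd >= 0);
    int value = enable ? 1 : 0;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, sizeof(value)) == 0;
}

// Read the option back. Returns false if the query fails or keepalive is off.
bool keepAliveEnabled(int fd) {
    assert(fd >= 0);
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}
```

Reading the option back after setting it is a cheap way to confirm that an experiment with keepalive disabled really ran with it disabled.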

@eroom1966
Author

eroom1966 commented Apr 15, 2017

Is the remote client sending keepalive packets

I really cannot say, I have tested the following as Clients

Linux/Ubuntu - telnet
Linux/Ubuntu - nc (netcat)
iOS/iPad - Telnet Lite

I cannot say how these apps are configured regarding KEEPALIVE, but these are very reliable client apps - I am pretty sure the problem does not lie with the client.

Another thing, is the time before the connection is closed always the same? How long does it usually take?

It is a very long time, somewhere between 30-60 minutes. I was initially thinking the issue was something like a wrap-around/overflow of the seq/ack numbers - but it does not appear to be the case

E.

@igrr
Member

igrr commented Apr 15, 2017

It would be nice to know whether the time is the same (pointing at a deterministic issue which should be easy to debug) or different (pointing at something more obscure).

As a pure experiment, you may try disabling keepalive option in WiFiServer: https://github.com/espressif/arduino-esp32/blob/master/libraries/WiFi/src/WiFiServer.cpp#L48.

Based on the tests we are doing with ESP-IDF, I know that occasionally we have WiFi disconnection issues which cause dropped TCP connections as well. But these issues happen on a time scale of many hours to days, so what you are seeing is probably a different issue.

@eroom1966
Author

As a pure experiment, you may try disabling keepalive option in WiFiServer: https://github.com/espressif/arduino-esp32/blob/master/libraries/WiFi/src/WiFiServer.cpp#L48.

I will give this a go, but I think this is unlikely to be the issue. The same application, which I have running as a server on a Raspberry Pi, runs a WiFi server with KEEPALIVE. I will come back with my findings.
E.

@eroom1966
Author

So it seems removing the KEEPALIVE option made no difference

17:25:28.628755 IP (tos 0x0, ttl 255, id 1816, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.44286: Flags [FP.], cksum 0x84d0 (correct), seq 2505324:2505344, ack 1, win 5840, length 20
17:25:28.628927 IP (tos 0x0, ttl 64, id 8333, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.44286 > 192.168.1.1.cisco-sccp: Flags [F.], cksum 0x2128 (correct), seq 1, ack 2505345, win 29200, length 0
17:25:28.630032 IP (tos 0x0, ttl 255, id 1817, offset 0, flags [none], proto TCP (6), length 40)
    192.168.1.1.cisco-sccp > 192.168.1.2.44286: Flags [.], cksum 0x7c69 (correct), seq 2505345, ack 2, win 5839, length 0

@eroom1966
Author

I have now written a socket client in TCL, and I will record the times for connection/disconnect from the esp32 and see if there is a pattern.
E.

@eroom1966
Author

eroom1966 commented Apr 16, 2017

Hi All

Almost at exactly the same time as the ESP32 / Server sends the FIN, I see the following.
Is this an issue with the lease time ?

17:14:07.637256 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:19:86:81:1e:39 (oui Unknown), length 300, xid 0xf7c8d90d, Flags [none] (0x0000)
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Request
            Requested-IP Option 50, length 4: 192.168.1.2
            Hostname Option 12, length 13: "PareLnx"
            Parameter-Request Option 55, length 18:
              Subnet-Mask, BR, Time-Zone, Default-Gateway
              Domain-Name, Domain-Name-Server, Option 119, Hostname
              Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
              NTP, Classless-Static-Route, Classless-Static-Route-Microsoft, Static-Route
              Option 252, NTP
            END Option 255, length 0
            PAD Option 0, length 0, occurs 15
17:14:07.649239 IP (tos 0x0, ttl 255, id 63619, offset 0, flags [none], proto UDP (17), length 576)
    192.168.1.1.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 548, xid 0xf7c8d90d, Flags [Broadcast] (0x8000)
          Your-IP 192.168.1.2
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: ACK
            Subnet-Mask Option 1, length 4: 255.255.255.0
            Lease-Time Option 51, length 4: 7200
            Server-ID Option 54, length 4: 192.168.1.1
            Default-Gateway Option 3, length 4: 192.168.1.1
            Domain-Name-Server Option 6, length 4: 192.168.1.1
            BR Option 28, length 4: 192.168.1.255
            MTU Option 26, length 2: 576
            Router-Discovery Option 31, length 1: N
            Vendor-Option Option 43, length 6: 1.4.0.0.0.2
            END Option 255, length 0
            PAD Option 0, length 0, occurs 253

@me-no-dev
Member

Maybe AP DHCP renew is causing it? Are you connected to the ESP32's AP?

@eroom1966
Author

Yes the ESP32 is configured as SoftAP.
I looked at the API calls to see if there was a method to override the default lease time, but if there is, I cannot find it. I am sure this can be set in the lower-level DHCP libraries, but maybe it is not exposed through the Arduino APIs?
If you have a suggestion to increase this time, I could try this as an experiment
E.

@me-no-dev
Member

You can try something like this:

#include "tcpip_adapter.h"
void printLeaseTime(){
  uint32_t leaseTime = 0;
  if(!tcpip_adapter_dhcps_option(TCPIP_ADAPTER_OP_GET,TCPIP_ADAPTER_IP_ADDRESS_LEASE_TIME,(void*)&leaseTime, 4)){
    Serial.printf("DHCPS Lease Time: %u\n", (unsigned int)leaseTime);  // %u was missing its percent sign
  }
}

@me-no-dev
Member

BTW seems to be 120 minutes or 3 hours. Also visible in the log above
Lease-Time Option 51, length 4: 7200

@eroom1966
Author

eroom1966 commented Apr 16, 2017

yes this is what I am getting 120 - you mean 2 hours right ?
Is there a way to print the remaining time ?
there seems to be a 'lease_timer' member of 'dhcps_pool', I am trying to see if there is an API call to get at this field - any ideas ?
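
To keep the units straight: DHCP option 51 in the capture carries the lease time in seconds, while the value printed through the adapter API appears to be in minutes, judging by the numbers in this thread. A tiny sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// DHCP option 51 carries the lease time in seconds.
constexpr std::uint32_t leaseSecondsToMinutes(std::uint32_t seconds) {
    return seconds / 60;
}

constexpr std::uint32_t leaseSecondsToHours(std::uint32_t seconds) {
    return seconds / 3600;
}

// The lease seen in the capture: 7200 s is 120 minutes, which is 2 hours.
static_assert(leaseSecondsToMinutes(7200) == 120, "7200 s is 120 minutes");
static_assert(leaseSecondsToHours(7200) == 2, "7200 s is 2 hours");
```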

@me-no-dev
Member

I'm afraid not :(

@me-no-dev
Member

What you can do though is this:

void startLease(system_event_id_t event){
  //Client connected to AP. Save this time and compare against it :)
}
//...
WiFi.onEvent(startLease, SYSTEM_EVENT_AP_STACONNECTED);

@eroom1966
Author

You will have to forgive my ignorance, what are the API calls for obtaining Date/Time or a Clock Counter, I have not been able to find anything ?
I presume a CPU Counter will be adequate given the clock rate
E.

@me-no-dev
Member

Better use millis(). CPU cycles will overflow very quickly.
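
A sketch of the arithmetic behind that advice, assuming 32-bit counters and, for the cycle counter, a 240 MHz clock (the clock rate is my assumption, not something stated in this thread). `elapsedMs` and `wrapSeconds` are hypothetical helpers.

```cpp
#include <cstdint>

// Elapsed time between two millis()-style readings. Unsigned subtraction
// stays correct across a single 32-bit wraparound.
std::uint32_t elapsedMs(std::uint32_t startMs, std::uint32_t nowMs) {
    return nowMs - startMs;
}

// Seconds until a free-running 32-bit counter wraps at the given tick rate.
double wrapSeconds(double ticksPerSecond) {
    return 4294967296.0 / ticksPerSecond;
}
```

At 240 MHz a 32-bit cycle counter wraps after roughly 18 seconds, while a 32-bit millisecond counter lasts about 49.7 days, which is plenty for timing a connection that drops within the hour.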

@eroom1966
Author

eroom1966 commented Apr 16, 2017

So using millis() connect / disconnect is as follows

startLease at 4942
Connect...[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 15 - AP_STADISCONNECTED
[E][WiFiClient.cpp:218] write(): send() errno=-1
write() retry rc=0 bufp=0 len=20
stopLease at 2800801
[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 14 - AP_STACONNECTED
startLease at 2802119
[D][WiFiGeneric.cpp:174] _eventCallback(): Event: 15 - AP_STADISCONNECTED

I think I make this 46 minutes ?
Which seems unrelated to 7200 (2 hours).
I have no idea what is happening, but it is absolutely repeatable.
The last transactions caught by tcpdump before the [FP.] were as follows:

13:19:10.814447 IP (tos 0x0, ttl 64, id 18058, offset 0, flags [DF], proto UDP (17), length 328)
    192.168.1.2.bootpc > 192.168.1.1.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:19:86:81:1e:39 (oui Unknown), length 300, xid 0x60550666, Flags [none] (0x0000)
          Client-IP 192.168.1.2
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Request
            Hostname Option 12, length 13: "PareLnx"
            Parameter-Request Option 55, length 18:
              Subnet-Mask, BR, Time-Zone, Default-Gateway
              Domain-Name, Domain-Name-Server, Option 119, Hostname
              Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
              NTP, Classless-Static-Route, Classless-Static-Route-Microsoft, Static-Route
              Option 252, NTP
            END Option 255, length 0
            PAD Option 0, length 0, occurs 21
13:19:10.820371 IP (tos 0x0, ttl 255, id 42483, offset 0, flags [none], proto UDP (17), length 576)
    192.168.1.1.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 548, xid 0x60550666, Flags [Broadcast] (0x8000)
          Your-IP 192.168.1.2
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: NACK
            END Option 255, length 0
            PAD Option 0, length 0, occurs 304
13:19:10.820779 IP (tos 0x0, ttl 255, id 42484, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [P.], cksum 0x01c1 (correct), seq 2031171:2031191, ack 1, win 5840, length 20
13:19:10.821562 IP (tos 0x0, ttl 64, id 8110, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0x9f16 (correct), seq 1, ack 2031191, win 29200, length 0
13:19:10.847366 IP (tos 0x0, ttl 255, id 42485, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [P.], cksum 0x01ac (correct), seq 2031191:2031211, ack 1, win 5840, length 20
13:19:10.847427 IP (tos 0x0, ttl 64, id 8111, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0x9f02 (correct), seq 1, ack 2031211, win 29200, length 0
13:19:10.870146 IP (tos 0x0, ttl 255, id 42486, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [P.], cksum 0x0197 (correct), seq 2031211:2031231, ack 1, win 5840, length 20
13:19:10.870211 IP (tos 0x0, ttl 64, id 8112, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0x9eee (correct), seq 1, ack 2031231, win 29200, length 0
13:19:10.894870 IP (tos 0x0, ttl 255, id 42487, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [P.], cksum 0x0182 (correct), seq 2031231:2031251, ack 1, win 5840, length 20
13:19:10.894919 IP (tos 0x0, ttl 64, id 8113, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0x9eda (correct), seq 1, ack 2031251, win 29200, length 0
13:19:10.920251 IP (tos 0x0, ttl 255, id 42488, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [P.], cksum 0x016d (correct), seq 2031251:2031271, ack 1, win 5840, length 20
13:19:10.920305 IP (tos 0x0, ttl 64, id 8114, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [.], cksum 0x9ec6 (correct), seq 1, ack 2031271, win 29200, length 0
13:19:10.929005 IP (tos 0x0, ttl 255, id 14747, offset 0, flags [DF], proto UDP (17), length 173)
    192.168.1.2.mdns > 224.0.0.251.mdns: [udp sum ok] 0*- [0q] 2/0/0 0.4.4.b.f.c.a.0.a.9.b.9.7.f.0.e.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa. (Cache flush) [0s] PTR PareLnx.local., PareLnx.local. (Cache flush) [0s] AAAA fe80::e0f7:9b9a:acf:b440 (145)
13:19:10.939230 IP6 (hlim 1, next-header Options (0) payload length: 36) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 1 group record(s) [gaddr ff02::fb to_ex { }]
13:19:12.303208 IP6 (hlim 1, next-header Options (0) payload length: 56) :: > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 2 group record(s) [gaddr ff02::1:ffcf:b440 to_ex { }] [gaddr ff02::fb to_ex { }]
13:19:12.320127 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 00:19:86:81:1e:39 (oui Unknown), length 300, xid 0x643cb87b, Flags [none] (0x0000)
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Request
            Requested-IP Option 50, length 4: 192.168.1.2
            Hostname Option 12, length 13: "PareLnx"
            Parameter-Request Option 55, length 18:
              Subnet-Mask, BR, Time-Zone, Default-Gateway
              Domain-Name, Domain-Name-Server, Option 119, Hostname
              Netbios-Name-Server, Netbios-Scope, MTU, Classless-Static-Route
              NTP, Classless-Static-Route, Classless-Static-Route-Microsoft, Static-Route
              Option 252, NTP
            END Option 255, length 0
            PAD Option 0, length 0, occurs 15
13:19:12.332087 IP (tos 0x0, ttl 255, id 42493, offset 0, flags [none], proto UDP (17), length 576)
    192.168.1.1.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 548, xid 0x643cb87b, Flags [Broadcast] (0x8000)
          Your-IP 192.168.1.2
          Client-Ethernet-Address 00:19:86:81:1e:39 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: ACK
            Subnet-Mask Option 1, length 4: 255.255.255.0
            Lease-Time Option 51, length 4: 7200
            Server-ID Option 54, length 4: 192.168.1.1
            Default-Gateway Option 3, length 4: 192.168.1.1
            Domain-Name-Server Option 6, length 4: 192.168.1.1
            BR Option 28, length 4: 192.168.1.255
            MTU Option 26, length 2: 576
            Router-Discovery Option 31, length 1: N
            Vendor-Option Option 43, length 6: 1.4.0.0.0.2
            END Option 255, length 0
            PAD Option 0, length 0, occurs 253
13:19:12.383366 IP (tos 0x0, ttl 255, id 42494, offset 0, flags [none], proto TCP (6), length 60)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [FP.], cksum 0x0157 (correct), seq 2031271:2031291, ack 1, win 5840, length 20
13:19:12.383757 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.1 tell 192.168.1.2, length 28
13:19:12.386479 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.1 is-at 24:0a:c4:02:0b:59 (oui Unknown), length 28
13:19:12.386499 IP (tos 0x0, ttl 64, id 8115, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.2.39182 > 192.168.1.1.cisco-sccp: Flags [F.], cksum 0x9eb0 (correct), seq 1, ack 2031292, win 29200, length 0
13:19:12.388128 IP (tos 0x0, ttl 255, id 42495, offset 0, flags [none], proto TCP (6), length 40)
    192.168.1.1.cisco-sccp > 192.168.1.2.39182: Flags [.], cksum 0xf9f1 (correct), seq 2031292, ack 2, win 5839, length 0
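
To help read the DHCP part of the capture above: the client's Request is answered with a NACK, the client then sends a fresh Request from 0.0.0.0, gets an ACK, and the FIN follows shortly after. Whether the NACK is the cause of the disconnect is not established here. The sketch below just names the option 53 message types; `dhcpMessageTypeName` and `isRefusal` are hypothetical helpers.

```cpp
#include <cstdint>

// Names for the DHCP option 53 (message type) values.
const char *dhcpMessageTypeName(std::uint8_t type) {
    switch (type) {
        case 1: return "DISCOVER";
        case 2: return "OFFER";
        case 3: return "REQUEST";
        case 4: return "DECLINE";
        case 5: return "ACK";
        case 6: return "NAK";
        case 7: return "RELEASE";
        case 8: return "INFORM";
        default: return "UNKNOWN";
    }
}

// True when the server refused the client's request (tcpdump prints "NACK").
bool isRefusal(std::uint8_t type) {
    return type == 6;
}
```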

@igrr
Member

igrr commented Apr 16, 2017

Why are we getting AP_STADISCONNECTED though? This probably needs to be checked by sniffing WiFi packets, to see why the ESP32 makes the station disconnect.

@eroom1966
Author

eroom1966 commented Apr 16, 2017

Is this question for me ?

sniffing WiFi packets

I have no idea how to do that, but very happy to give it a go if you can provide some instructions.
I should say I have the "Core Debug" set to VERBOSE in order to get these messages on my serial console.
If you want me to post/upload a larger output from tcpdump, I am happy to do that.
These snippets may not be explaining the full picture

@me-no-dev
Member

They are not explaining the problem but are giving a much better idea. The issue is not at all in LwIP or Client/Server implementation, but rather in the WiFi stack.

@igrr should we forward this to IDF?

@igrr
Member

igrr commented Apr 16, 2017

We have a very similar test in IDF, which doesn't show such issue; I will try reproducing this, if this does happen in IDF then will move the issue there. Otherwise we will need to do some sniffing with Wireshark to get more info.

@igrr
Member

igrr commented Apr 16, 2017

Side note: in AP_STA mode, a solid connection can never be guaranteed, for example if the ESP32 STA starts scanning for APs to connect to. It's likely not the issue here, but some users have bumped into a similar issue with the ESP8266. So I suggest not treating the WiFi connection as something which is guaranteed to be robust; implementing a reconnection procedure will be necessary for any real-world use case.

(Not dismissing the issue; just pointing out that proper disconnect handling needs to be implemented even if this issue is fixed.)
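
One possible building block for such a reconnection procedure, on whichever side initiates the connection, is a capped exponential backoff between attempts. This is a generic technique rather than anything prescribed in this thread; `ReconnectBackoff` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: capped exponential backoff between reconnect attempts.
class ReconnectBackoff {
public:
    ReconnectBackoff(std::uint32_t initialMs, std::uint32_t maxMs)
        : initialMs_(initialMs), maxMs_(maxMs), nextMs_(initialMs) {}

    // Delay to wait before the next attempt; doubles each call up to maxMs.
    std::uint32_t nextDelayMs() {
        const std::uint32_t delay = nextMs_;
        const std::uint64_t doubled = static_cast<std::uint64_t>(nextMs_) * 2;
        nextMs_ = static_cast<std::uint32_t>(std::min<std::uint64_t>(doubled, maxMs_));
        return delay;
    }

    // Call after a successful reconnect so the next failure starts small again.
    void reset() { nextMs_ = initialMs_; }

private:
    std::uint32_t initialMs_;
    std::uint32_t maxMs_;
    std::uint32_t nextMs_;
};
```

With an initial delay of 500 ms and a cap of 4000 ms, successive attempts wait 500, 1000, 2000, 4000, 4000 ms and so on until reset() is called.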

@eroom1966
Author

I will try reproducing this, if this does happen in IDF then will move the issue there

Please let me know how you get on.
E.

@eroom1966
Author

@igrr @me-no-dev
Hi
did you have any success reproducing this issue in esp-idf ?
Yesterday I started to look at using esp-idf as the framework for coding to see if this would solve my issue reported in this thread.
I tried to run the example /examples/performance/tcp_perf (configured as server) as a way to reproduce what I have seen here in this thread using Arduino.
I thought I could use my existing TCP client to connect to this server, but was unsuccessful, the server reported that it could not send any data - I will try to investigate further

E.

@me-no-dev
Member

@eroom1966 Arduino uses IDF, so it is the same thing. In fact, you can compile your arduino code with idf instead of the prebuilt libs that I include. See here: https://github.com/espressif/arduino-esp32#using-as-esp-idf-component

@eroom1966
Author

@me-no-dev
@igrr

Arduino uses IDF, so it is the same thing. In fact, you can compile your arduino code with idf instead of the prebuilt libs that I include. See here: https://github.com/espressif/arduino-esp32#using-as-esp-idf-component

Thanks for the pointer.
So I successfully used this approach to compile the application as a 'component'
I built using
$ make app app-flash monitor
The result is the same: I see a broken socket connection after somewhere between 30 and 60 minutes.

The monitor reports

[E][WiFiClient.cpp:211] write(): 104

Does this mean it is deep in the libraries? I am not sure what more I can try.

@eroom1966
Author

Very sadly I cannot seem to get past this roadblock for TCP disconnects.
Looks like I will need to evaluate the Realtek8710; back to square 1 :-(

@mpm756769

Hello,
I have implemented Wi-Fi code on the ESP32 and I am trying to send ESP32 data to a serial terminal, where I am facing one problem. If I send data in a while(client.connected()) loop, the serial terminal is flooded with data, so I put my data in an if(client.connected()) condition to prevent the flooding. But then I get data only once on the serial terminal, and it is disconnected from my ESP32 module. What should I do to send data continuously to the serial terminal without being disconnected?
Please give me some suggestions. Thanks in advance.
