This repository was archived by the owner on Jan 20, 2025. It is now read-only.

Bug: Using server.serveStatic causes WDT crashes almost every 2nd time for files >150K #984

Closed
ullix opened this issue May 8, 2021 · 179 comments

@ullix

ullix commented May 8, 2021

I have static, binary log files which I download via WiFi. Works well most of the time, but then there are perpetual crashes. It seems the success depends on the size of the files. Up to 64k mostly ok, 64k ... 128k ok for half the trials, beyond that size almost never ok.

I am using the LittleFS file system. The files are built up of 32-byte binary records, and may eventually grow to >2MB. I use this command to serve the download as a static file:

// file system definition:   fs::LITTLEFSFS  * myFS       = &LITTLEFS;
server.serveStatic("/log.cam",      *myFS, "/log.cam");

I also tried this. Works just as well, but no improvement:

server.on("/log.cam", HTTP_GET, [] (AsyncWebServerRequest *request) {request->send(*myFS, "/log.cam", "application/octet-stream");});

The download is triggered from a website, either as a straight link to download the binary file, or as Javascript code inside a function:

Link:   "<a href='/log.cam' >CAM</a>"
...
JS:   const fetchdata     = await fetch('/log.cam', {cache: "no-store"});

Your ReadMe seems to suggest that chunked responses are NOT needed for static files? So, I did not use that here. Am I forgetting any settings, or do I also need to use chunked response when the static file is beyond a certain size? Which size?

@ullix
Author

ullix commented May 9, 2021

Upon further investigation I conclude that this is a bug. For me it is a very serious one, as the only workaround reduces download speed by a factor of more than 10!

As a first test I also coded a chunked response, but as expected, it showed the same failures as reported above. However, this chunk code now allowed me to set the chunk size, and this made a huge difference!

I created a 300k (307200 B) file and downloaded that in various configurations. In all tests it always crashed with a wdt message when the download reached about 100000 bytes; I observed only the two values of 100030 and 101459. Message was:

E (144357) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (144357) task_wdt:  - async_tcp (CPU 1)
E (144357) task_wdt: Tasks currently running:
E (144357) task_wdt: CPU 0: IDLE0
E (144357) task_wdt: CPU 1: loopTask
E (144357) task_wdt: Aborting.
abort() was called at PC 0x401058a4 on core 0

I used various ESP32-WROOM-32E (all latest revision 3) with Flash of 4MB, 8MB, and 16MB. All gave the same result.

Using the code included below I tested for the impact of chunk size, and got a big surprise:

  • 1028 Bytes chunk size : all downloads worked. Even a file of 1.2MB could be downloaded correctly!

  • 1029 Bytes chunk size : only 1 Byte more and all downloads failed when the filesize was 100k or greater! Files of size 90k so far could be downloaded without problems.

I tested the file systems FFat and LittleFS. Outcome was identical.

I noticed that the download was very slow, slower by about an order of magnitude!

So I used a file of 90k, which could be downloaded under all conditions, and measured download times:

  • 1028 Bytes chunk size : 3.1 sec
  • 1029 Bytes chunk size : 0.4 sec
  • (set by async server) : 0.2 sec (varied between 5600 and 1400 Bytes during a download)

Using 1028 Bytes makes it 15 times slower, but is the only option for big files, when you need speed most!

And one last surprise: removing the print options in chunk-code made it even slower, not faster!

Here is the relevant code:

// File system
// either:
    fs::F_Fat       * myFS       = &FFat;
// or:
    fs::LITTLEFSFS  * myFS       = &LITTLEFS;



// ESPAsyncWebServer setup
    server.on("/log.cam",       HTTP_GET, [] (AsyncWebServerRequest *request) { staticDataType = CAM;
                                                                                request->sendChunked(text_plain, onStaticDownLoad);});

// chunk code
size_t onStaticDownLoad(uint8_t *buffer, size_t maxLen, size_t index){

    static  const char * filename;             // file to be sent as chunks
    static  File     chunkfile;                // file handle of filename
    static  size_t   dnlLen     = 1028;        // limit to buffer size
            size_t   countbytes = 0;           // bytes to be returned

    if (index == 0)  {                         // first call of a download: select and open the file
        if (staticDataType == CAM)  filename    = LOGCAM;
        else                        filename    = LOGCPS;
        samm->printf("onStaticDownLoad: START filename: %s index: %6u maxLen: %5u\n", filename, index, maxLen);
        chunkfile       = myFS->open(filename);
    }

    if (!chunkfile) return 0;                  // file could not be opened: end the response

    chunkfile.seek(index);                     // index = number of bytes already sent
    countbytes = chunkfile.read(buffer, min(dnlLen, maxLen));

    if (countbytes == 0) {                     // end of file: returning 0 ends the response
        chunkfile.close();
    }
    samm->printf("onStaticDownLoad: f: %s index: %6u maxLen: %5u countb: %5u \n"
                    , filename
                    , index
                    , maxLen
                    , countbytes
                );

    return countbytes;
}
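
To see how the chunk-size cap translates into callback invocations, here is a small host-side C++ sketch (plain C++, not ESP32 code). It only mimics the callback contract used above: the filler is called repeatedly with a buffer, a maxLen, and the running index until it returns 0. The in-memory `fakeFile`, the names `fillChunk` and `countCallbacks`, and the fixed `maxLen` of 5600 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the file on flash: just bytes in RAM.
static std::vector<uint8_t> fakeFile;

// Same shape as the chunk filler above: copy at most `cap` bytes
// (and never more than maxLen) starting at `index`; return 0 at end of file.
size_t fillChunk(uint8_t* buffer, size_t maxLen, size_t index, size_t cap) {
    if (index >= fakeFile.size()) return 0;            // nothing left: ends the response
    size_t want = std::min({cap, maxLen, fakeFile.size() - index});
    std::memcpy(buffer, fakeFile.data() + index, want);
    return want;
}

// Drive the filler the way a server would and count the data-carrying calls.
size_t countCallbacks(size_t fileSize, size_t cap, size_t maxLen = 5600) {
    fakeFile.assign(fileSize, 'A');
    std::vector<uint8_t> buffer(maxLen);
    size_t index = 0, calls = 0;
    while (size_t n = fillChunk(buffer.data(), maxLen, index, cap)) {
        index += n;                                    // next call starts where this one ended
        ++calls;
    }
    return calls;
}
```

For the 307200-byte test file, `countCallbacks(307200, 1028)` gives 299 data-carrying calls, against 55 when the cap equals the assumed 5600-byte maxLen. This counts callbacks only; it does not explain why the larger chunks trigger the watchdog.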

@ullix ullix changed the title Crashing on downloading static files seems size dependent Using server.serveStatic causes WDT crashes almost every 2nd time for files >150K May 20, 2021
@ullix ullix changed the title Using server.serveStatic causes WDT crashes almost every 2nd time for files >150K Bug: Using server.serveStatic causes WDT crashes almost every 2nd time for files >150K May 20, 2021
@ullix
Author

ullix commented May 20, 2021

This is still a major problem for me, so I investigated further. To simplify I only used the webserver with command

server.serveStatic("/bindataST",      *myFS, "/data.cam");

and only on ESP32-WROOM-32E (revision 3, the latest models) with 4MB Flash and the LittleFS file system. I created binary data files of various sizes. (While saved in binary mode, all bytes were printable ASCII, so the file content was legible to humans.)

The data shown below were compiled with Arduino 2.0 beta 7, and ESP core 1.0.6, but I repeated some tests with Arduino 1.8.13 and ESP core 1.0.4 and got same results.

I automated this by creating an auto-refreshing website containing JavaScript, which called a serveStatic route of the webserver; the results were logged to the Serial Terminal.

Up to a file size of 150 kibibytes all downloads were always successful. In an overnight run with a 90KB file all 15833 downloads were successful, not a single failure.

However, with a file only slightly larger, at 160 kibibytes, Watch-Dog-Timer induced crashes appeared, and their number increased progressively with file size. All were of the same type:

E (69589) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (69589) task_wdt:  - async_tcp (CPU 1)
E (69589) task_wdt: Tasks currently running:
E (69589) task_wdt: CPU 0: IDLE0
E (69589) task_wdt: CPU 1: loopTask
E (69589) task_wdt: Aborting.
abort() was called at PC 0x4014246c on core 0

The successful downloads decreased to as low as 58%, i.e. almost every 2nd download attempt resulted in a crash and reboot! Somewhat strangely, beyond a file size of ~300K the success rate increased a little bit, and was 70% at 1.4MB, the biggest file possible in this setting.

[Image: Auswahl_015]

[Image]

As I had mentioned in previous posts, one can mitigate this WDT problem by using chunked downloads. However, this slows down the transfer drastically, by 40...50 times! Here are two examples run one after the other:

This shows the web page result for serveStatic (picking one when it worked!):

Binary Data in Human-Readable Format
#Total: 1433600 bytes, 1400 kB, 44800 records, dur: 2834 ms, speed: 506 kB/s
1 0123456789ABCDEFGHIJKLMNOPQRSTUV
2 0123456789ABCDEFGHIJKLMNOPQRSTUV

A speed of 506 kB/s is very nice; top-speed observed was even a fantastic ~800kB/s!

With a chunked download you need to limit the maxLen of the chunk to 1028 bytes, or you face again the WDT crash issue. With that limit this is what you get:

Binary Data in Human-Readable Format
#Total: 1433600 bytes, 1400 kB, 44800 records, dur: 124706 ms, speed: 11 kB/s
1 0123456789ABCDEFGHIJKLMNOPQRSTUV
2 0123456789ABCDEFGHIJKLMNOPQRSTUV

Instead of 3 sec it now takes over 2 min, a speed of only 11 kB/s!
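
The reported figures can be checked with a few lines of arithmetic. The sketch below is plain host-side C++ with hypothetical function names; it assumes that the web page computes speed as bytes divided by milliseconds, which is numerically the same as kB/s with 1 kB = 1000 bytes (the "1400 kB" size figure, in contrast, corresponds to 1433600 / 1024).

```cpp
#include <cstdint>

// Throughput in kB/s (1 kB = 1000 bytes) from a byte count and a duration in ms.
// Bytes per millisecond is numerically equal to kB per second.
double throughputKBps(uint64_t bytes, uint64_t durationMs) {
    return static_cast<double>(bytes) / static_cast<double>(durationMs);
}

// How many times slower the second transfer is than the first.
double slowdown(uint64_t fastMs, uint64_t slowMs) {
    return static_cast<double>(slowMs) / static_cast<double>(fastMs);
}
```

With the numbers above, `throughputKBps(1433600, 2834)` is about 505.9 and `throughputKBps(1433600, 124706)` is about 11.5, matching the displayed 506 kB/s and 11 kB/s; `slowdown(2834, 124706)` is about 44, inside the stated 40...50 range.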

I am building a device which logs data, and I will use even the bigger 8MB and 16MB ESP variants and files filling the Flash. This webserver cannot handle this!

@BlueAndi

You download the files in the context of the web server (Async TCP task). Did you try to move it to the loop context?

@ullix
Author

ullix commented May 23, 2021

@BlueAndi How can I do this? I think it is not possible, but please, correct me.

My webserver initialization gets this code:

AsyncWebServer      server(80);
void initWebServer() {
    ...
    server.serveStatic("/bindataST",      *myFS, "/data.cam");
    server.begin(); 
}

Then from a website I call <ESP-IP>/bindataST to download file "data.cam" from my file system LittleFS. That's it. Which part can I relocate?

@BlueAndi

BlueAndi commented May 23, 2021

You can't do this with serveStatic(), but you can handle the request on your own.
Please note, below is just a rough principle I could imagine, which would be worth checking. To be task safe, you can later use a FreeRTOS queue instead of a deferredRequest variable.


static AsyncWebServerRequest* deferredRequest = nullptr;

static void downloadPage(AsyncWebServerRequest* request)
{
    deferredRequest = request;
}

...
void setup()
{
    ...
    server.on("/bindataST/data.cam", HTTP_GET, downloadPage);
    server.begin();
    ...
}

...
void loop()
{
    ....
    if (nullptr != deferredRequest)
    {
        deferredRequest->send(LITTLEFS, "/bindataST/data.cam", String(), true);
        deferredRequest = nullptr;
    }
    ...
}
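
The comment above mentions that a FreeRTOS queue would make the handoff task safe. As a rough host-side illustration of the same idea (standard C++, not FreeRTOS and not ESP32 code), the single-slot handoff can be sketched with std::atomic instead of a queue. `Request`, `postRequest`, and `takeRequest` are hypothetical names; `Request` merely stands in for AsyncWebServerRequest.

```cpp
#include <atomic>

// Hypothetical stand-in for AsyncWebServerRequest; only identity matters here.
struct Request { int id; };

// Single-slot mailbox shared between the "server task" and the "loop task".
static std::atomic<Request*> pendingRequest{nullptr};

// Called on the server-task side: publish the request for the loop to pick up.
// Returns false if a request is already waiting, so the caller can reject it.
bool postRequest(Request* request) {
    Request* expected = nullptr;
    return pendingRequest.compare_exchange_strong(expected, request);
}

// Called on the loop side: atomically take ownership of the pending request,
// leaving the slot empty. Returns nullptr if nothing is waiting.
Request* takeRequest() {
    return pendingRequest.exchange(nullptr);
}
```

In this sketch the loop would call `takeRequest()` on every pass and send the file only when the result is non-null. Unlike the plain pointer variable, a second request arriving before the loop has picked up the first cannot silently overwrite it.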

@zekageri

Wow. This is gold

@ullix
Author

ullix commented May 25, 2021

That sure is an idea! I hadn't expected it is possible. But it works.

However, only to a degree.

I put this in my code and tested file sizes up to 1400k on my ESP32-WROOM-32E with 4MB Flash. The most sensitive file size range, as shown in my graph above #984 (comment), seemed to be around 300k. And again, with 200 ... 300 downloads for each file size, I saw WDT crashes only at 300k and 400k, but it did crash.

Then I pulled out my ESP32-WROOM-32E with 16MB Flash, created a 12MB(!) file, and repeatedly downloaded it overnight. The downloads ran at a very decent 650kB/s, but that still takes nearly 20 sec. This time almost 20% of all download attempts crashed with a triggered Watch-Dog-Timer following the command
deferredRequest->send(*myFS, "/data.cam", String(), true);:

Loop: deferredRequest handling
E (30449) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (30449) task_wdt:  - async_tcp (CPU 0/1)
E (30449) task_wdt: Tasks currently running:
E (30449) task_wdt: CPU 0: IDLE0
E (30449) task_wdt: CPU 1: loopTask
E (30449) task_wdt: Aborting.
abort() was called at PC 0x40140ae4 on core 0

What I find strange is the async_tcp (CPU 0/1) message. What does it mean? Is the WDT triggered from both CPU cores at exactly the same time? Out of a total of 293 crashes, 292 were of this (CPU 0/1) type, and only 1 of (CPU 1) type, none of (CPU 0) type.

So, while greatly improved, this is not the final solution. What else can I do?

@zekageri

zekageri commented May 25, 2021

This seems to be the common async server crash problem to me.

Can you replace the ESPAsyncWebServer lib with this forked version by Depau and try again?

I'm currently testing this fork and it seems very stable

@ullix
Author

ullix commented May 25, 2021

I'll try. Looking at both ReadMEs, the Depau version seems to be taken verbatim from me-no-dev. Where is the difference in the code? Which part is changed?

One other thing bothers me: Both parties offer their code explicitly for ESPs. I learned that neither PROGMEM nor the F() macro makes any sense on these CPUs (https://www.esp32.com/viewtopic.php?f=19&t=20595 ). Yet, both offerings expressly advertise PROGMEM and F(). Are they not aware of this? I don't believe that. So, why this nonsense?

@zekageri

zekageri commented May 25, 2021

Here is the PR by Depau. He fixed some linked list problems and things like that.

About the PROGMEM and F() things I don't know. Probably compatibility solutions between esp8266 and esp32 and things like that. There are so many ESPs around, you never know what Me_no_dev's intention was with these.

@zekageri

> I'll try. Looking at both ReadMEs, the Depau version seems to be taken verbatim from me-no-dev.

Yes. It is a fork from me-no-dev's lib. The readme and almost everything is the same except the linked list and some other fixes.

@Pablo2048

> One other thing bothers me: Both parties offer their code explicitly for ESPs. I learned that neither PROGMEM nor the F() macro makes any sense on these CPUs (https://www.esp32.com/viewtopic.php?f=19&t=20595 ). Yet, both offerings expressly advertise PROGMEM and F(). Are they not aware of this? I don't believe that. So, why this nonsense?

The reason for this is compatibility - the ESP8266 requires using PROGMEM/F() if you don't want to waste RAM. Definitely NOT nonsense...

@ullix
Author

ullix commented May 25, 2021

Getting the Depau lib is not very promising, so far:

I downloaded the ZIP which gave me a file of this name: ESPAsyncWebServer-partial-header.zip. The 'partial' is irritating, but I tried several times and Firefox tells me the download had correctly completed.

Then I tried to install it in Ar1.8.13 and got this message:
[Image: Auswahl_020]

I tried Ar2.0beta7 and got this:
[Image: Auswahl_021]

Then I tried $ gh repo clone Depau/ESPAsyncWebServer and got this:
ERROR: ld.so: object 'libgtk3-nocsd.so.0' from LD_PRELOAD cannot be preloaded (failed to map segment from shared object): ignored.

Now, what?

@BlueAndi

> What I find strange is the async_tcp (CPU 0/1) message. What does it mean? Is the WDT triggered from both CPU cores at exactly the same time? Out of a total of 293 crashes, 292 were of this (CPU 0/1) type, and only 1 of (CPU 1) type, none of (CPU 0) type.

> So, while greatly improved, this is not the final solution. What else can I do?

The async_tcp (CPU 0/1) sounds to me like the loopTask is calling the send() function, which tries to interact with the LwIP stack. But the LwIP stack is already waiting for the async_tcp queue, and the queue might be full. This would cause the deadlock. Just a rough guess.

To narrow it down, I propose the following actions, each to be executed independently:

#define CONFIG_ASYNC_TCP_RUNNING_CORE 0
#define CONFIG_ASYNC_TCP_USE_WDT 0

> One other thing bothers me: Both parties offer their code explicitly for ESPs. I learned that neither PROGMEM nor the F() macro makes any sense on these CPUs (https://www.esp32.com/viewtopic.php?f=19&t=20595 ). Yet, both offerings expressly advertise PROGMEM and F(). Are they not aware of this? I don't believe that. So, why this nonsense?

Just a note on PROGMEM: it depends on the MCU architecture (Harvard, von Neumann) and, to tell the truth, there might be exceptions depending on the addresses used.

@zekageri

zekageri commented May 25, 2021

> Getting the Depau lib is not very promising, so far:

> I downloaded the ZIP which gave me a file of this name: ESPAsyncWebServer-partial-header.zip. The 'partial' is irritating, but I tried several times and Firefox tells me the download had correctly completed.

> Then I tried to install it in Ar1.8.13 and got this message:
> [Image: Auswahl_020]

> I tried Ar2.0beta7 and got this:
> [Image: Auswahl_021]

> Then I tried $ gh repo clone Depau/ESPAsyncWebServer and got this:
> ERROR: ld.so: object 'libgtk3-nocsd.so.0' from LD_PRELOAD cannot be preloaded (failed to map segment from shared object): ignored.

> Now, what?

You have to unzip the lib, and I suggest you use PlatformIO with VSCode.

@ullix
Author

ullix commented May 25, 2021

> You have to unzip the lib.

I don't believe so. Both Ar1.8 and 2.0 have an explicit option to install ZIP libs, as shown here for Ar2.0:

[Image: Auswahl_023]

@ullix
Author

ullix commented May 25, 2021

@BlueAndi I tried your 2nd option first and put this

#define CONFIG_ASYNC_TCP_RUNNING_CORE 0
#define CONFIG_ASYNC_TCP_USE_WDT 0

at the top of the code. Compiled and ran, made a 300k file and downloaded. Out of 10 attempts to download, only 1 succeeded, and 9 crashed. All for WDT, and all with - async_tcp (CPU 0/1). That is the worst-ever result.

I can only use the IDEs Ar1.8 or Ar2.0. I have PlatformIO installed, but so far I have had trouble running it (some installation problem).

What can be done? I can offer my test sketch, if you want to try it.

@BlueAndi

Hmm ... I don't believe that the task watchdog was deactivated. I use platformio inside VSCode, as it offers more options.
Provide your test sketch and I will try during the week.

@ullix
Author

ullix commented May 27, 2021

[Image: Auswahl_038]
I am still seeing these annoying crashes, though progress has been made. I am showing an encouraging update, and the code at the end.

Here is the current state as a graph at exactly the same scale as above (#984 (comment)).

Much, much better, and basically all due to BlueAndi's suggestion for deferring the download execution into the Loop. But still some 1% crashes, which strangely seem to be largely around the 300k.

Here is a graph of the download speed:

[Image]

The smaller files may be impacted by the download overhead, but from 500k onwards 700kB/s are reached and held. I have seen better - up to 800kB/s - with this chip, but 700 would be quite alright.

These data were obtained with the code I am attaching. Please see the READ.ME for usage instructions. I ran it under Arduino 1.8.13, Arduino2.0beta7, and the uecide IDE. Let me know of any issues.

TESTPUB_async_server.zip

EDIT: I forgot: in the IDE select "Core Debug Level = VERBOSE" !

@zekageri

zekageri commented Jun 1, 2021

My async server project has been running for 4 days straight now (with continuous uploads and downloads). Can you try replacing me_no_dev's ESPAsyncWebServer lib with this forked one from Depau?

My earlier link did not point to the fully fixed fork.

@gnalbandian

gnalbandian commented Jun 1, 2021

Folks, how are you?

Many of the problems with this library also stem from the underlying AsyncTCP or ESPAsyncTCP library.
There are many forks that have already got this working reliably.

The combination I'm currently using is:

Lorol's ESPAsyncWebServer fork

Adam5Wu's ESPAsyncTCP fork

Not sure if your specific problems are solved by this combination, but it worked for me. No more crashes.

Also, @philbowles has made his own version of these libraries, and it seems to work fine.
Please check:
https://github.com/philbowles/PangolinMQTT
https://github.com/philbowles/ESPAsyncTCP-master

@ullix
Author

ullix commented Jun 1, 2021

> Can you try replacing me_no_dev's ESPAsyncWebServer lib with this forked one from Depau?

@zekageri :
This time it worked well to download the ZIP and install.

I ran it with the code I had published above. Unfortunately, it crashed as happily as the original me-no-dev code! At best, every second download crashed, but on average it was closer to 2 out of 3 crashing!

So this lib is not an improvement.

Am I correct that your 4-day-running project did not use my code, and that the downloaded files were smaller?

The sweet spot for bad behavior seems to be a 300k file. I make a binary file of records with 32 bytes of printable ASCII chars (makes it easier to read a "binary"), 9600 records, 307200 bytes.

I semi-automate the downloads by using HTML to auto-refresh the web page every 10 sec, and then count reboots and successful downloads from the Serial Monitor log.

I observed in some trials that setting Core Debug Level to anything but 'None' made it worse, though that was also inconsistent. Anyway, I suggest setting Core Debug Level=None, and also commenting out the esp_log_level_set("*", ESP_LOG_VERBOSE). It remains confusing.

@ullix
Author

ullix commented Jun 1, 2021

> Lorol's ESPAsyncWebServer fork

@gnalbandian I tried this Lorol version of the webserver, but nope, same bad performance as with the others. A successful download about once in three trials!

Have you guys really tried to download 300k files? See my comments here #984 (comment) and in the source code, attached here: #984 (comment)

Am I the only one with highly reproducible problems in downloading big files?

@gnalbandian

Have you replaced the ESPAsyncTCP library as well?

@BlueAndi

BlueAndi commented Jun 1, 2021

> Have you replaced the ESPAsyncTCP library as well?

For esp32 AsyncTCP must be replaced, ESPAsyncTCP is only for esp8266.

Funny ... I have my own fork of both AsyncTcp and ESPAsyncWebServer. :-D

@BlueAndi

BlueAndi commented Jun 1, 2021

@ullix Just to be sure, but you don't have the TESTPUB_async_server.ino and etc. as a vscode + platformio project available?

@BlueAndi

BlueAndi commented Jun 1, 2021

And again, to be sure, can you ensure that downloadPage() is not called during an already running download?

Currently it is not ensured:

static void downloadPage(AsyncWebServerRequest* request){

    Serial.println("2 webserver: downloadPage");
    deferredRequest = request;
}
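
One way to rule out overlapping downloads is a "busy" flag that is claimed when a download starts and released only when it ends. The sketch below is host-side standard C++ (not ESP32 code) showing just that claim/release logic; `downloadActive`, `tryBeginDownload`, and `endDownload` are hypothetical names, and how the release is wired to the end of a transfer depends on the server library and is left out.

```cpp
#include <atomic>

// True while one download is considered in flight (hypothetical flag name).
static std::atomic<bool> downloadActive{false};

// Try to claim the single download slot. Only one caller can win;
// every other caller gets false until the slot is released again.
bool tryBeginDownload() {
    bool expected = false;
    return downloadActive.compare_exchange_strong(expected, true);
}

// Release the slot once the transfer has finished or the client has gone away.
void endDownload() {
    downloadActive.store(false);
}
```

In a handler like downloadPage(), the request would be stored only if `tryBeginDownload()` returns true, and answered with an error otherwise. The flag has to stay set until the transfer is really over, not merely until the loop has handed the request to send().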

@ullix
Author

ullix commented Jun 2, 2021

> Have you replaced the ESPAsyncTCP library as well?

@gnalbandian No, I am using the ESP32, and as was pointed out, I need the AsyncTCP for the ESP32. Do you have this as well?

> have the TESTPUB_async_server.ino and etc. as a vscode + platformio

@BlueAndi Unfortunately I have not. I have installed pio, but so far failed to run it. :-( If you manage anything, I would appreciate if you provide a copy.

> ensure that the downloadPage() is not called during an already running download?

It is presently ensured because the download takes <1 sec, while the HTTP refresh is set to 10 sec. Thus there is a >9 sec break before the next download.

Though, I may be mistaken, but I think the ESP32 Async server should manage also that? But presently there is no overlap.

The crashes are all of this pattern:

09:59:26.730 -> 3 Loop: deferredRequest handling
09:59:31.995 -> E (58919) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
09:59:31.995 -> E (58919) task_wdt:  - async_tcp (CPU 0/1)
09:59:31.995 -> E (58919) task_wdt: Tasks currently running:
09:59:31.995 -> E (58919) task_wdt: CPU 0: IDLE0
09:59:31.995 -> E (58919) task_wdt: CPU 1: loopTask
09:59:31.995 -> E (58919) task_wdt: Aborting.
09:59:31.995 -> abort() was called at PC 0x40140f68 on core 0

In most of them there is this (CPU 0/1); very few have (CPU 1). I think I have never seen (CPU 0).

I am attaching my 300k file. Unzip to use, it is highly compressible due to the simple content. I hope that some of you give it a try.

TEST300k.cam.zip

@Pablo2048

@zekageri I know ;-) but I just want to show that with an "ordinary" GitHub link I can just ctrl+click on the library name and I'm in the GitHub repository, but in the case of powerbroker2/SafeString... you simply cannot do this (and even worse is this lib_deps = 1...
@ullix ok, I figured out how to include correct branch of the server - please use https://github.com/yubox-node-org/ESPAsyncWebServer/archive/refs/heads/yuboxfixes-0xFEEDC0DE64-cleanup.zip in your lib_deps and give it a try...

@zekageri

zekageri commented Sep 7, 2021

> @zekageri I know ;-) but I just want to show that with an "ordinary" GitHub link I can just ctrl+click on the library name and I'm in the GitHub repository, but in the case of powerbroker2/SafeString... you simply cannot do this (and even worse is this lib_deps = 1...
> @ullix ok, I figured out how to include correct branch of the server - please use https://github.com/yubox-node-org/ESPAsyncWebServer/archive/refs/heads/yuboxfixes-0xFEEDC0DE64-cleanup.zip in your lib_deps and give it a try...

I was asking that. :D Nvm

@Pablo2048

Oh, sorry for misunderstanding... https://github.com/powerbroker2/SafeString

@ullix
Author

ullix commented Sep 7, 2021

@Pablo2048
First, I made a mistake in my dry-run test, and put a correction into the previous post.
Second, this correction relates to the AsyncTCPSock lib only.
Third, my success did come with the "wrong" one! I hereby declare the "wrong" ESPAsyncWebserver the "right" one! Don't spoil my excitement right away by making me install a thing which might have a regression! :-))
Fourth, I had asked this question already: when you see something distinguished with +sha.6f8bcef, how do you know which one it is, when all you see is:
[Image: Auswahl_010]

@Pablo2048

This sha.6f... thing is the secure hash of a certain commit. GitHub (and many others like Gitea, ...) displays just the first 7 hex characters of the full hash. If you want to know where you are, you have to click on "Commits":
[Image: Screenshot from 2021-09-07 17-01-10]
Here, near "307", and then you see the abbreviated hashes of all commits (so you have to search for the right one). This is the only method I'm aware of...
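
Searching the commit list for the right entry amounts to a prefix comparison between the abbreviated hash in the version string and each full commit hash. A minimal C++ sketch of that comparison, assuming the version string has the form seen in this thread (for example 1.2.3+sha.2f78426); the function names are hypothetical and nothing here talks to GitHub:

```cpp
#include <string>

// Extract the abbreviated commit hash from a version string such as
// "1.2.3+sha.2f78426". Returns an empty string if there is no "+sha." part.
std::string abbrevFromVersion(const std::string& version) {
    const std::string marker = "+sha.";
    std::string::size_type pos = version.find(marker);
    if (pos == std::string::npos) return "";
    return version.substr(pos + marker.size());
}

// True if the full commit hash starts with the abbreviated one.
bool matchesCommit(const std::string& fullHash, const std::string& abbrev) {
    return !abbrev.empty() && fullHash.compare(0, abbrev.size(), abbrev) == 0;
}
```

Given a list of full hashes copied from the "Commits" page, the entry for which `matchesCommit()` returns true is the commit the version string refers to.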

@ullix
Author

ullix commented Sep 7, 2021

> IS <SafeString> 4.1.10 a wrapper for the Arduino String class to work with Strings properly?

Not a wrapper. I'd say an alternative to both the String (capital S) and string (low cap s) class.

I got burned pretty badly by using Strings. Then I changed to SafeString and all is well. While Strings are convenient, certain things become even simpler with SafeStrings. And I don't even want to mention the horrible C functions on strings. You'll find some examples in my "reproducer" https://github.com/ullix/ESP32-Flash-Crash

Much recommended!

@ullix
Author

ullix commented Sep 7, 2021

@Pablo2048 Thanks. I am afraid it will remain cloudy to me for a few more days ...

@zekageri

zekageri commented Sep 7, 2021

> > IS <SafeString> 4.1.10 a wrapper for the Arduino String class to work with Strings properly?

> Not a wrapper. I'd say an alternative to both the String (capital S) and string (low cap s) class.

> I got burned pretty badly by using Strings. Then I changed to SafeString and all is well. While Strings are convenient, certain things become even simpler with SafeStrings. And I don't even want to mention the horrible C functions on strings. You'll find some examples in my "reproducer" https://github.com/ullix/ESP32-Flash-Crash

> Much recommended!

Oh god. Thanks. I will definitely try this one!

@avillacis
Contributor

> I am running out of ideas for more stress tests. I think the problem is solved. Let's call them the "Galapagos libs", first, because they are an evolutionary result, and second, because @avillacis lives "close" to the islands (one of my dream locations).

...And the library is still evolving. As of commit yubox-node-org/AsyncTCPSock@8607ac1 I found and fixed yet another bug that risked incorrectly signaling an RX timeout if previously executed callbacks from the same or other connections took more time than the configured RX timeout, even if by that time there was already more data to read. I also fixed an outbound-connection regression that slipped past me with the write-timeout fix.

> Then I used 300K files, and started the webpage 10 times within approx 1 sec. This has now reached 1600 downloads, all at zero failure with respect to WDT crashes! I see a very occasional net::ERR_EMPTY_RESPONSE, but it is rare, and surely not surprising.

There is also a fix that keeps connections yet to be accepted in the listening socket backlog if CONFIG_LWIP_MAX_SOCKETS connections are already active. This might fix the net::ERR_EMPTY_RESPONSE or at least make it much less frequent.

@zekageri

zekageri commented Sep 8, 2021

I will test this soon.

In my case I have a lot of tasks. Both cores are working hard and fighting for that precious RAM. Maybe this is the problem on my side. One of my tasks is bound to core 0. This is a synchronous Modbus communication. It must be executed as soon as possible. (It is looping every 10ms because of the wdt feed vTaskDelay(1).) It burns the cycles pretty intensely. The other tasks are all bound to core 1.

@zekageri

zekageri commented Sep 8, 2021

Crash on page refresh:

/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/queue.c:1442 (xQueueGenericReceive)- assert failed!
abort() was called at PC 0x4009056d on core 1

ELF file SHA256: 0000000000000000

Backtrace: 0x4008f5c4:0x3ffd82a0 0x4008f83d:0x3ffd82c0 0x4009056d:0x3ffd82e0 0x401a55dc:0x3ffd8320 0x401a5728:0x3ffd8340 0x401a57cd:0x3ffd8370 0x401a6275:0x3ffd83a0 0x401a53cb:0x3ffd8400 0x40090842:0x3ffd8450
  #0  0x4008f5c4:0x3ffd82a0 in invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
  #1  0x4008f83d:0x3ffd82c0 in abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
  #2  0x4009056d:0x3ffd82e0 in xQueueGenericReceive at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/queue.c:2038
  #3  0x401a55dc:0x3ffd8320 in AsyncClient::_clearWriteQueue() at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
  #4  0x401a5728:0x3ffd8340 in AsyncClient::_error(signed char) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
  #5  0x401a57cd:0x3ffd8370 in AsyncClient::_notifyWrittenBuffers(std::deque<AsyncClient::notify_writebuf, std::allocator<AsyncClient::notify_writebuf> >&, int) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
  #6  0x401a6275:0x3ffd83a0 in AsyncClient::_sockIsWriteable() at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
  #7  0x401a53cb:0x3ffd8400 in _asynctcpsock_task(void*) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
  #8  0x40090842:0x3ffd8450 in vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

Rebooting...

Other than that crash, everything is loading at first. It is really fast. No empty responses.
I'll try to reproduce it.

[Image]

Can't cache that favicon....

@ullix
Author

ullix commented Sep 8, 2021

I've got a weird error - not sure what to make out of it.

My 20-page-in-parallel output went well beyond 10000 downloads without any crash. But then the web activity stopped, while everything else kept running. No response to any web click. Closed and reopened browser, changed browser, rebooted, de-/re-powered, re-uploaded code -- nothing. All looked normal, monitor output normal, wifi connects normal, but no response to web calls.

I then erased flash with 'esptool.py erase_flash' and re-uploaded. Everything runs just fine again. Strange.

@Pablo2048

Guys move this to the correct repository please. This issue right now has nothing to do with this library...

@ullix
Author

ullix commented Sep 12, 2021

I was gone over the weekend, and had my FlashCrash tester running. One download after the other, with a 50ms delay, over >2 days. It amounted to over 120 000 downloads, and not a single crash, neither by WDT nor anything else!

It is time these libs move into Arduino mainstream!

I used these libs:

Dependency Graph
|-- <AsyncTCPSock> 0.0.1+sha.8607ac1
|-- <ESP Async WebServer> 1.2.3+sha.2f78426

I observed one strange thing: while the download speed was very consistent between 330 ... 380 kB/s, I found it strange that this cycled in a precise saw-tooth fashion, as seen in the graph:
[Image]

The cycle length is very close to 2500 sec or 1600 downloads. Where does this come from? I have no idea. Doesn't seem to do any harm, but seems strange anyway.

@ullix
Author

ullix commented Sep 13, 2021

Looking at this more closely, I see that the saw tooth is produced solely by the ESP Async WebServer's Send(..., file, ...) command. The actual download is rather smooth, with occasional disturbances. The downloaded file is 300K.

Auswahl_015

The send command is:

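    // filePath, sendBegin, sendDone and durSend must be globals or statics: the capture-less lambda cannot see locals.
    wserver.on(filePath.c_str(), HTTP_GET, [](AsyncWebServerRequest* request){
            sendBegin = micros();                               // timestamp before send()
            request->send(*myFS, filePath, String(), false);    // default content type, download = false
            sendDone  = micros();                               // timestamp after send() returns
            durSend   = sendDone - sendBegin;                   // time spent inside send(), in microseconds
        });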

The durSend is plotted as the blue line. The sum of red and blue gives the total download time.

The libs used are:

|-- <AsyncTCPSock> 0.0.1+sha.8ee20ad
|-- <ESP Async WebServer> 1.2.3+sha.2f78426

I find it strange that the Web Server's send command takes increasing amounts of time and then is kind of "reset" to start values. Its minimum value is 26 ms, its maximum is 168 ms, or >6 times more than the minimum!

Looks like an opportunity for improving download speed!

@proddy

proddy commented Nov 6, 2021

@ullix this change fixed my issue with downloading >145K crashing my ESP32

@0xFEEDC0DE64

May I ask you guys why you spend (waste?) so much time on that async webserver library?
I was working with it for months, and in the end I saved a lot of time by removing it from our projects.
I think it was written for much more powerful microcontrollers, where performance and heap usage aren't an issue, but it was not meant for the ESP32?

@proddy

proddy commented Nov 6, 2021

@0xFEEDC0DE64 thing is it's embedded into the ESPAsyncWebServer library and also the AsyncMQTTClient library. I remember someone did a replacement stub library that used the ESP32 native lwip stuff but can't find it anymore. I also remember seeing an alternative library but can't find that either (I really should start bookmarking things!). Anyway what would you suggest?

@zekageri

zekageri commented Nov 7, 2021

It is much faster than the sync server lib, even when I run the sync server in a different task and core. :/ The websocket is also much faster than any other.

@Pablo2048

For me the advantages are obvious:

  • same API for ESP8266 and ESP32 (easy project migration from ESP8266 to ESP32)
  • automatic static file serving from internal filesystem, serving gzipped files
  • websocket on the same port as the webserver (access behind NAT - try this with separate websocket port and four devices - it's a nightmare)
  • multiple connections
  • speed

@zekageri

> For me the advantages are obvious:
>
>   • same API for ESP8266 and ESP32 (easy project migration from ESP8266 to ESP32)
>   • automatic static file serving from internal filesystem, serving gzipped files
>   • websocket on the same port as the webserver (access behind NAT - try this with separate websocket port and four devices - it's a nightmare)
>   • multiple connections
>   • speed

Do you just let the server automatically send gzipped content if it exists? For me, if I let it send files with the .gz extension, iOS/Safari can't display the page. I must remove the extension. But if I do this, the server does not recognize the zipped files, so I must create an endpoint for every single gz file.

@Pablo2048

I don't have any iXXXX device at hand, so I can't test it. I've read somewhere that Apple devices need the original extensions and not .gz - if this is correct, then I suggest changing the behavior of the async server's static file serving - for example, add a flag named _gzAdded and set it here

_path = _path+".gz";

or move 3 lines from here
int filenameStart = path.lastIndexOf('/') + 1;
before line 511 (at least you can try it...).

@avillacis
Contributor

> Do you just let the server automatically send gzipped content if it exists? For me, if I let it send files with the .gz extension, iOS/Safari can't display the page. I must remove the extension. But if I do this, the server does not recognize the zipped files, so I must create an endpoint for every single gz file.

I use the gzipped functionality extensively in my projects, and I have never known of any issues with serving gzipped content. Then again, I do not have Apple devices to test this either.

The way I understand gzip functionality, it works as follows:

  • For a web URL named /resource.xxx, create it, then gzip into file resource.xxx.gz . Load this gzipped file into filesystem.
  • Use serveStatic() to serve filesystem content.
  • Browser asks for /resource.xxx . Webserver checks filesystem, fails to find /resource.xxx, but notices /resource.xxx.gz exists.
  • Webserver appends Content-Encoding: gzip to response headers, then serves gzipped content from /resource.xxx.gz as /resource.xxx
  • Browser sees response header and knows to uncompress gzipped content before processing.

Of course, this is supposed to be done only if browser announced gzipped-response support by using the "Accept-Encoding: gzip" request header. However, ESPAsyncWebServer does not implement any fallback to expand gzipped content on the fly if the browser fails to announce gzip support, and it could be unadvisable to implement it due to resource constraints.
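
The lookup order described above can be sketched in plain C++, independent of the ESP32. This is only an illustration of the described behaviour, not the library's actual code: `ResolvedFile` and `resolveStatic` are hypothetical names, the `std::set` stands in for the filesystem, and the `clientAcceptsGzip` check models what is "supposed to be done", which the real library may not do. It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical result type for this sketch; not part of ESPAsyncWebServer.
struct ResolvedFile {
    std::string fsPath;  // file to read from the filesystem
    bool gzipEncoded;    // true -> response needs "Content-Encoding: gzip"
};

// files: stand-in for the filesystem (the set of paths that exist).
// Returns nothing when no suitable file exists; a caller could map that to a 404.
std::optional<ResolvedFile> resolveStatic(const std::set<std::string>& files,
                                          const std::string& urlPath,
                                          bool clientAcceptsGzip) {
    // 1. Prefer the plain file if it exists.
    if (files.count(urlPath)) {
        return ResolvedFile{urlPath, false};
    }
    // 2. Otherwise fall back to "<path>.gz", but only for clients that
    //    announced gzip support via Accept-Encoding.
    const std::string gzPath = urlPath + ".gz";
    if (clientAcceptsGzip && files.count(gzPath)) {
        return ResolvedFile{gzPath, true};
    }
    // 3. No on-the-fly decompression: nothing to serve.
    return std::nullopt;
}
```

With a filesystem containing only `/resource.xxx.gz`, a request for `/resource.xxx` from a gzip-capable browser resolves to the `.gz` file with the encoding flag set, while a client without gzip support gets no result.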

@marcboon

marcboon commented Nov 18, 2021

I have done projects using gzipped static content served by server.serveStatic() from ESP8266 to my iPhone without any problems. The process is exactly as @avillacis described.
In case the client does not accept gzipped content (doesn't include it in the accept header), and the unencoded original file is not available, the server should simply return a 404.

@zekageri

Interesting. I tested with a Mac laptop and four different iPhones. They all wanted to download the static files if I left the .gz extension in the file name. All Windows and Android devices work fine.

If I remove the .gz from the file name and tell the client that it is in fact gzipped content, it works on every device.

@stale

stale bot commented Mar 30, 2022

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 30, 2022
@stale

stale bot commented Apr 16, 2022

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Apr 16, 2022