Bug: Using server.serveStatic causes WDT crashes almost every 2nd time for files >150K #984
Upon further investigation I conclude that this is a bug, and for me a very serious one, as it drastically reduces download speed by a factor of more than 10! As a first test I also coded a chunked response, but, as expected, it showed the same failures as reported above. However, the chunk code allowed me to define the chunk size, and this made a huge difference! I created a 300k (307200 B) file and downloaded it in various configurations. In all tests it crashed with a WDT message when the download reached about 100000 bytes; I observed only the two values 100030 and 101459. The message was:
I used various ESP32-WROOM-32E (all latest revision 3) with Flash of 4MB, 8MB, and 16MB. All gave the same result. Using the code included below I tested for the impact of chunk size, and got a big surprise:
I tested the file systems FFat and LittleFS. Outcome was identical. I noticed that the download was very slow, slower by about an order of magnitude! So I used a file of 90k, which could be downloaded under all conditions, and measured download times:
Using 1028 bytes makes it 15 times slower, but it is the only option for big files, exactly when you need speed most! And one last surprise: removing the print statements in the chunk code made it even slower, not faster! Here is the relevant code:
|
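The relevant code did not survive in this copy of the thread. As a rough, host-runnable sketch of the idea (not the poster's code): a chunked-response filler receives `(buffer, maxLen, index)` and returns how many bytes it wrote, with 0 meaning "done", which is the shape of the callback that ESPAsyncWebServer's `beginChunkedResponse()` takes. Capping each chunk at 1028 bytes is what the experiment above amounts to. A `std::string` stands in for the LittleFS file, and `kMaxChunk`, `fillChunk` and `streamAll` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Cap found empirically in this thread; larger chunks led to WDT resets.
// This is an observed workaround value, not a documented library limit.
constexpr std::size_t kMaxChunk = 1028;

// Copies the next piece of `source`, starting at byte `index`, into `buffer`.
// Never writes more than maxLen (what the server offers) or kMaxChunk bytes.
// Returning 0 tells the server that the response is complete.
std::size_t fillChunk(const std::string& source, std::uint8_t* buffer,
                      std::size_t maxLen, std::size_t index) {
    if (index >= source.size()) return 0;
    const std::size_t n = std::min({maxLen, kMaxChunk, source.size() - index});
    std::memcpy(buffer, source.data() + index, n);
    return n;
}

// Calls the filler repeatedly, as the server would, and reassembles the output.
// `chunks` receives the number of non-empty callbacks that were needed.
std::string streamAll(const std::string& source, std::size_t maxLen,
                      std::size_t& chunks) {
    std::vector<std::uint8_t> buffer(maxLen);
    std::string out;
    chunks = 0;
    std::size_t n;
    while ((n = fillChunk(source, buffer.data(), maxLen, out.size())) > 0) {
        out.append(reinterpret_cast<const char*>(buffer.data()), n);
        ++chunks;
    }
    return out;
}
```

On the ESP32 the `memcpy` would presumably become a `file.read(buffer, n)` on an open LittleFS `File`. The cap only limits how much each callback hands over, so the number of callbacks (and their per-call overhead) grows as the cap shrinks, which fits the slowdown measured above.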
This is still a major problem for me, so I investigated further. To simplify I only used the webserver with command
and only on an ESP32-WROOM-32E (revision 3, the latest model) with 4MB Flash and the LittleFS file system. I created binary data files of various sizes. (While saved in binary mode, all bytes were printable ASCII, so the file content was legible to humans.) The data shown below were compiled with Arduino 2.0 beta 7 and ESP core 1.0.6, but I repeated some tests with Arduino 1.8.13 and ESP core 1.0.4 and got the same results. I automated this by creating an auto-refreshing website containing JavaScript, by which a download is triggered on every refresh. Up to a file size of 150 kibibytes all downloads were always successful. In an overnight run with a 90KB file all 15833 downloads were successful, not a single failure. However, from only slightly larger files of 160 kibibytes onwards, the number of watchdog-timer-induced crashes increased progressively. All were of the same type:
The success rate dropped to as low as 58%, i.e. almost every 2nd download attempt resulted in a crash and reboot! Somewhat strangely, beyond a file size of ~300K the success rate increased a little, and was 70% at 1.4MB, the biggest file possible in this setting. As I mentioned in previous posts, one can mitigate this WDT problem by using chunked downloads. However, this slows down the transfer drastically, by a factor of 40 to 50! Here are two examples run one after the other. This shows the web page result for serveStatic (picking one where it worked!):
A speed of 506 kB/s is very nice; the top speed observed was even a fantastic ~800 kB/s! With a chunked download you need to limit the maxLen of the chunk to 1028 bytes, or you face the WDT crash issue again. With that limit this is what you get:
Instead of 3 sec it now takes over 2 min, a speed of only 11 kB/s! I am building a device which logs data, and I will use even the bigger 8MB and 16MB ESP variants and files filling the Flash. This webserver cannot handle this! |
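A quick consistency check of these figures, assuming the 1.4 MB file from the earlier test and 1 kB = 1000 bytes (both are assumptions; the comment does not state the file size): 506 kB/s gives about 2.8 s, 11 kB/s gives about 127 s, and the ratio of about 46 matches the "40...50 times" estimate.

```cpp
#include <cassert>

// Seconds needed to move `bytes` at `kBps` kilobytes per second (1 kB = 1000 B assumed).
double transferSeconds(double bytes, double kBps) {
    return bytes / (kBps * 1000.0);
}

// How many times slower the second speed is than the first.
double slowdown(double fastKBps, double slowKBps) {
    return fastKBps / slowKBps;
}
```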
You download the files in the context of the web server (Async TCP task). Did you try to move it to the loop context? |
@BlueAndi How can I do this? I think it is not possible, but please, correct me. My webserver initialization gets this code:
Then from a website I call |
You can't do this with serveStatic(), but you can handle the request on your own.
|
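One way to structure that hand-over, sketched on the host with hypothetical names: the server callback only records what should be downloaded, and `loop()` picks the job up and does the slow file work. The mutex matters because on the ESP32 the callback runs in the async TCP task while `loop()` runs in the Arduino loop task. This illustrates the deferral idea; it is not the code that was posted in the thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical stand-in for what the handler remembers; a real sketch might
// keep the AsyncWebServerRequest* (or just the file name) instead.
struct PendingDownload {
    std::string path;
};

// One-slot hand-over between the web server callback and loop().
class DownloadDeferrer {
public:
    // Web server callback (async TCP task): only record the job, no file I/O here.
    // Returns false if a job is already waiting, so the caller can reject the request.
    bool enqueue(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (hasJob_) return false;
        job_.path = path;
        hasJob_ = true;
        return true;
    }

    // loop(): take the waiting job, if any, and do the heavy work outside the lock.
    bool poll(const std::function<void(const PendingDownload&)>& serve) {
        PendingDownload job;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!hasJob_) return false;
            job = job_;
            hasJob_ = false;
        }
        serve(job);
        return true;
    }

private:
    std::mutex mutex_;
    PendingDownload job_;
    bool hasJob_ = false;
};
```

If the real sketch stores the `AsyncWebServerRequest*` itself, it presumably also has to make sure the client has not disconnected (and the request object been freed) before `loop()` responds.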
Wow. This is gold |
That sure is an idea! I hadn't expected it to be possible. And it works, though only to a degree. I put this in my code and tested file sizes up to 1400k on my ESP32-WROOM-32E with 4MB Flash. The most sensitive file size range, as shown in my graph above #984 (comment), seemed to be around 300k. And again, with 200 ... 300 downloads for each file size, I saw WDT crashes only at 300k and 400k, but it did crash. Then I pulled out my ESP32-WROOM-32E with 16MB Flash, created a 12MB(!) file, and downloaded it repeatedly overnight. The downloads ran at a very decent 650kB/s, but that still takes nearly 20 sec. This time almost 20% of all download attempts crashed with a triggered watchdog timer following the command
What I find strange is the So, while greatly improved, this is not the final solution. What else can I do? |
This looks like the common async server crash problem to me. Can you replace the ESPAsyncWebServer lib with this forked version by Depau and try again? I'm currently testing this fork and it seems very stable |
I'll try. Looking at both READMEs, the Depau version seems to be taken verbatim from me-no-dev. Where is the difference in the code? Which part was changed? One other thing bothers me: both parties offer their code explicitly for ESPs. I learned that neither PROGMEM nor the F() macro makes any sense on these CPUs (https://www.esp32.com/viewtopic.php?f=19&t=20595 ). Yet both offerings expressly advertise PROGMEM and F(). Are they not aware of this? I don't believe that. So, why this nonsense? |
Here is the PR by Depau. He fixed some linked-list problems and things like that. About the PROGMEM and F() things I don't know. Probably compatibility between the ESP8266 and the ESP32 and things like that. There are so many ESPs around; you never know what Me_no_dev's intention was with these. |
Yes. It is a fork from me-no-dev's lib. The readme and almost everything is the same except the linked list and some other fixes. |
The reason for this is compatibility - ESP8266 requires using PROGMEM/F() if you don't want to waste the RAM. Definitely NOT a nonsense... |
The To debug it down, I propose the following actions, each to be executed independently:
#define CONFIG_ASYNC_TCP_RUNNING_CORE 0
#define CONFIG_ASYNC_TCP_USE_WDT 0
Just a note on PROGMEM: it depends on the MCU architecture (Harvard, von Neumann), and to tell the truth, there might be exceptions depending on the addresses used. |
@BlueAndi I tried your 2nd option first and put this
at the top of the code. I compiled and ran it, made a 300k file, and downloaded it. Out of 10 download attempts only 1 succeeded, and 9 crashed. All for WDT, and all with I can only use the IDEs Arduino 1.8 or Arduino 2.0. I have PlatformIO installed, but so far have had trouble running it (some installation problem). What can be done? I can offer my test sketch, if you want to try it. |
Hmm ... I don't believe that the task watchdog was deactivated. I use platformio inside VSCode, as it offers more options. |
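A plausible reason for that doubt, stated here as an assumption: in an Arduino build each library `.cpp` file is compiled as its own translation unit, so a `#define` at the top of the `.ino` is not visible when `AsyncTCP.cpp` is compiled. Library headers usually supply such settings through an `#ifndef` default, as in the sketch below; the default value `1` is assumed for illustration, not quoted from `AsyncTCP.h`. Only a definition visible in the library's own translation unit, in practice a compiler flag such as `-DCONFIG_ASYNC_TCP_USE_WDT=0` (for example in PlatformIO's `build_flags`), would override it.

```cpp
#include <cassert>

// Typical library-header pattern for tunable defaults: the value below applies
// only if nothing defined the macro earlier in *this* translation unit or on
// the compiler command line (-DCONFIG_ASYNC_TCP_USE_WDT=0).
#ifndef CONFIG_ASYNC_TCP_USE_WDT
#define CONFIG_ASYNC_TCP_USE_WDT 1  // assumed default, for illustration only
#endif

// What code compiled in the same translation unit actually sees.
constexpr bool asyncTcpUsesWdt() {
    return CONFIG_ASYNC_TCP_USE_WDT != 0;
}
```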
Here is the current state as a graph, at exactly the same scale as above (#984 (comment)). Much, much better, and basically all due to BlueAndi's suggestion of deferring the download execution into the loop. But there are still about 1% crashes, which strangely seem to cluster around 300k. Here is a plot of the download speed: The smaller files may be affected by the download overhead, but from 500k onwards 700kB/s are reached and held. I have seen better, up to 800kB/s, with this chip, but 700 would be quite alright. These data were obtained with the code I am attaching. Please see the READ.ME for usage instructions. I ran it under Arduino 1.8.13, Arduino 2.0 beta7, and the uecide IDE. Let me know of any issues. EDIT: I forgot: in the IDE select "Core Debug Level = VERBOSE"! |
My async server project has been running for 4 days straight now (with continuous uploads and downloads). Can you try replacing me_no_dev's ESPAsyncWebServer lib with this forked one from Depau? My earlier link was not the fully fixed fork. |
Folks, how are you? Many of the problems with this library reside in the underlying AsyncTCP or ESPAsyncTCP library as well. The combination I'm currently using is: Lorol's ESPAsyncWebServer fork: Lorol's Adam5Wu ESPAsyncTCP fork Adams Not sure if your specific problems are solved by this combination, but it worked for me. No more crashes. Also, @philbowles has made his own versions of these libraries, and they seem to work fine. |
@zekageri: I ran it with the code I had published above. Unfortunately, it crashed just as happily as the original me-no-dev code! At best, every second download crashed, but on average it was closer to 2 out of 3! So this lib is not an improvement. Am I correct that your 4-day project did not use my code, and that the downloaded files were smaller? The sweet spot for bad behavior seems to be a 300k file. I make a binary file of records with 32 bytes of printable ASCII chars each (which makes the "binary" easier to read): 9600 records, 307200 bytes. I semi-automate the downloads by using HTML to auto-refresh the web page every 10 sec, and then count reboots and successful downloads from the Serial Monitor log. In some trials, setting Core Debug Level to anything but 'None' made it worse, though that was also inconsistent. Anyway, I suggest setting Core Debug Level=None, and also commenting out the |
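A test file of the kind described can be generated with a few lines. This host-side sketch assumes a record layout (an 8-digit record number followed by 24 letters) that is not from the post; only the 32-byte record size, the printable-ASCII content, and the 9600 × 32 = 307200 byte total are. `makeTestFile` and `isPrintableAscii` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

constexpr std::size_t kRecordSize = 32;

// Builds `records` fixed-size records of printable ASCII. The layout
// (8-digit record number + 24 letters) is hypothetical; only the
// 32-byte record size comes from the post.
std::string makeTestFile(std::size_t records) {
    std::string out;
    out.reserve(records * kRecordSize);
    char num[16];
    for (std::size_t i = 0; i < records; ++i) {
        std::snprintf(num, sizeof num, "%08zu", i % 100000000);
        out.append(num, 8);
        for (std::size_t j = 8; j < kRecordSize; ++j) {
            out.push_back(static_cast<char>('A' + (i + j) % 26));
        }
    }
    return out;
}

// True if every byte is printable ASCII (0x20..0x7E), i.e. the "binary" stays human-readable.
bool isPrintableAscii(const std::string& data) {
    return std::all_of(data.begin(), data.end(),
                       [](char c) { return c >= 0x20 && c <= 0x7E; });
}
```

On the ESP32 the bytes would be written to a LittleFS `File` record by record rather than held in RAM; 300 kB in a `std::string` is fine on a PC but not on the target.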
@gnalbandian I tried this Lorol version of the webserver, but nope, same bad performance as with the others: a successful download about once in three trials! Have you guys really tried to download 300k files? See my comments here #984 (comment) and in the source code, attached here: #984 (comment) Am I the only one with highly reproducible problems downloading big files? |
Have you replaced the ESPAsyncTCP library as well? |
For esp32 AsyncTCP must be replaced, ESPAsyncTCP is only for esp8266. Funny ... I have my own fork of both AsyncTcp and ESPAsyncWebServer. :-D |
@ullix Just to be sure: you don't have TESTPUB_async_server.ino etc. available as a VSCode + PlatformIO project? |
And again to be sure, can you ensure that downloadPage() is not called during an already running download? Currently this is not ensured:
|
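Such a check could be a simple busy flag. The sketch below is hypothetical (the thread does not show how `downloadPage()` is implemented); `compare_exchange_strong` makes the check-and-set a single atomic step, which matters because requests and `loop()` run on different tasks.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical busy flag: refuses to start a second download while one is running.
class DownloadGuard {
public:
    // Returns true only for the caller that flips the flag from false to true.
    bool tryBegin() {
        bool expected = false;
        return busy_.compare_exchange_strong(expected, true);
    }
    // Marks the current download as finished (or aborted).
    void end() { busy_.store(false); }
    bool busy() const { return busy_.load(); }

private:
    std::atomic<bool> busy_{false};
};
```

`end()` would need to be called both when the last chunk has been handed out and when the client disconnects early; otherwise an aborted download leaves the flag set and blocks all further downloads.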
@gnalbandian No, I am using the ESP32, and as was pointed out, I need the
@BlueAndi Unfortunately I have not. I have installed pio, but so far failed to run it. :-( If you manage anything, I would appreciate if you provide a copy.
It is presently ensured, because the download takes less than 1 sec, while the HTTP refresh is set to 10 sec, so there is a >9 sec break before the next download. I may be mistaken, but shouldn't the ESP32 async server manage that as well? In any case, at present there is no overlap. The crashes all follow this pattern:
in most of them there is this I am attaching my 300k file. Unzip to use, it is highly compressible due to the simple content. I hope that some of you give it a try. |
@zekageri I know ;-) but I just want to show that with an "ordinary" GitHub link I can just Ctrl+click on the library name and I'm in the GitHub repository, but in the case of
I was asking that. :D Nvm |
Oh, sorry for misunderstanding... https://github.com/powerbroker2/SafeString |
@Pablo2048 |
Not a wrapper. I'd say an alternative to both the String (capital S) and string (lowercase s) classes. I got burned pretty badly using Strings. Then I changed to SafeString and all is well. While Strings are convenient, certain things become even simpler with SafeStrings. And I don't even want to mention the horrible C functions on strings. You'll find some examples in my "reproducer" https://github.com/ullix/ESP32-Flash-Crash Much recommended! |
@Pablo2048 Thanks. I am afraid it will remain cloudy to me for a few more days ... |
Oh god. Thanks. I will definitely try this one! |
...And the library is still evolving. As of commit yubox-node-org/AsyncTCPSock@8607ac1 I found and fixed yet another bug that risked incorrectly signaling an RX timeout if previously executed callbacks from the same or other connections took more time than the configured RX timeout, even if by that time there was already more data to read. I also fixed an outbound-connection regression that slipped past me with the write-timeout fix.
I also added a fix that keeps connections yet to be accepted in the listening socket backlog if CONFIG_LWIP_MAX_SOCKETS connections are already active. This might fix the |
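The RX-timeout bug described above can be illustrated with a small, hypothetical helper (not AsyncTCPSock's actual code): before declaring a timeout, check whether data is already waiting, because a slow callback on another connection can delay the check itself. Unsigned subtraction keeps the elapsed-time computation correct when a `millis()`-style counter wraps; treating a timeout of 0 as "disabled" is an assumed convention.

```cpp
#include <cassert>
#include <cstdint>

// Decides whether a connection really timed out waiting for data.
// dataReadable: the socket already has bytes to read (e.g. as reported by select()).
// nowMs/lastRxMs: millisecond timestamps; unsigned subtraction handles wrap-around.
// rxTimeoutMs == 0 disables the timeout (assumed convention).
bool shouldSignalRxTimeout(bool dataReadable, std::uint32_t nowMs,
                           std::uint32_t lastRxMs, std::uint32_t rxTimeoutMs) {
    if (dataReadable) return false;  // data is waiting: the delay was ours, not the peer's
    if (rxTimeoutMs == 0) return false;
    return static_cast<std::uint32_t>(nowMs - lastRxMs) >= rxTimeoutMs;
}
```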
I will test this soon. In my case I have a lot of tasks. Both cores are working hard and fighting for that precious RAM. Maybe this is the problem on my side. One of my tasks is bound to core 0. It handles synchronous Modbus communication, which must be executed as soon as possible. (It loops every 10ms because of the WDT feed vTaskDelay(1).) It burns cycles pretty intensely. The other tasks all run bound to core 1. |
Crash on page refresh:
/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/queue.c:1442 (xQueueGenericReceive)- assert failed!
abort() was called at PC 0x4009056d on core 1
ELF file SHA256: 0000000000000000
Backtrace: 0x4008f5c4:0x3ffd82a0 0x4008f83d:0x3ffd82c0 0x4009056d:0x3ffd82e0 0x401a55dc:0x3ffd8320 0x401a5728:0x3ffd8340 0x401a57cd:0x3ffd8370 0x401a6275:0x3ffd83a0 0x401a53cb:0x3ffd8400 0x40090842:0x3ffd8450
#0 0x4008f5c4:0x3ffd82a0 in invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
#1 0x4008f83d:0x3ffd82c0 in abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
#2 0x4009056d:0x3ffd82e0 in xQueueGenericReceive at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/queue.c:2038
#3 0x401a55dc:0x3ffd8320 in AsyncClient::_clearWriteQueue() at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
#4 0x401a5728:0x3ffd8340 in AsyncClient::_error(signed char) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
#5 0x401a57cd:0x3ffd8370 in AsyncClient::_notifyWrittenBuffers(std::deque<AsyncClient::notify_writebuf, std::allocator<AsyncClient::notify_writebuf> >&, int) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
#6 0x401a6275:0x3ffd83a0 in AsyncClient::_sockIsWriteable() at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
#7 0x401a53cb:0x3ffd8400 in _asynctcpsock_task(void*) at lib\AsyncTCPSock-master\src/AsyncTCP.cpp:339
#8 0x40090842:0x3ffd8450 in vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
Rebooting...
Other than that crash, everything loads at first. It is really fast. No empty responses. Can't cache that favicon... |
I've got a weird error, and I'm not sure what to make of it. My 20-pages-in-parallel test went well beyond 10000 downloads without any crash. But then it stopped all web activity, while otherwise it kept running. No response to any web click. Closed and reopened the browser, changed browser, rebooted, de-/re-powered, re-uploaded code: nothing. All looked normal, monitor output normal, WiFi connected normally, but no response to web calls. I then erased the flash with `esptool.py erase_flash` and re-uploaded. Everything runs just fine again. Strange. |
Guys move this to the correct repository please. This issue right now has nothing to do with this library... |
Looking at this more closely, I see that the saw tooth is produced solely by the The send command is:
The The libs used are:
I find it strange that the Web Server's send command takes increasing amounts of time and then is kind of "reset" to start values. Its minimum value is 26 ms, its maximum is 168 ms, or >6 times more than the minimum! Looks like an opportunity for improving download speed! |
May I ask you guys why you spend (waste?) so much time in that async webserver library? |
@0xFEEDC0DE64 The thing is, it's embedded in the ESPAsyncWebServer library and also the AsyncMQTTClient library. I remember someone did a replacement stub library that used the ESP32 native lwIP stuff, but I can't find it anymore. I also remember seeing an alternative library but can't find that either (I really should start bookmarking things!). Anyway, what would you suggest? |
It is much faster than the sync server lib, even when I run the sync server in a different task and core. :/ The websocket is also much faster than any other |
For me the advantages are obvious:
|
You just let the server automatically send gzipped content if it exists? For me, if I let it send with the .gz extension, iOS/Safari can't display the page. I must remove the extension. But if I do this, the server does not recognize the zipped files, so I must create an endpoint for every single gz file. |
I don't have any iXXXX device at hand, so I can't test it. I've read somewhere that Apple devices need the original extensions and not .gz. If this is correct, then I suggest changing the behavior of async server static file serving, for example by adding a flag named _gzAdded and setting it here ESPAsyncWebServer/src/WebResponses.cpp Line 512 in 1d46269
or move 3 lines from here ESPAsyncWebServer/src/WebResponses.cpp Line 527 in 1d46269
|
I use the gzipped functionality extensively in my projects, and I have never known of any issues with serving gzipped content. Then again, I do not have Apple devices to test this either. The way I understand gzip functionality, it works as follows:
Of course, this is supposed to be done only if the browser announced gzipped-response support by sending the "Accept-Encoding: gzip" request header. However, ESPAsyncWebServer does not implement any fallback to expand gzipped content on the fly if the browser fails to announce gzip support, and implementing one could be inadvisable due to resource constraints. |
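The negotiation described above can be sketched on the host as follows. It models the intended behavior, not ESPAsyncWebServer's actual code, and all names are hypothetical. One detail that may matter for the Safari reports in this thread (an assumption): the `Content-Type` comes from the original file name, while the `.gz` only changes which file is read and adds `Content-Encoding: gzip`.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of deciding how to answer a static-file request.
struct StaticChoice {
    std::string file;         // file to read from flash
    std::string contentType;  // taken from the ORIGINAL name, never from ".gz"
    bool gzipEncoded;         // whether to send "Content-Encoding: gzip"
};

// Minimal extension-to-MIME lookup for the sketch.
std::string contentTypeFor(const std::string& path) {
    auto endsWith = [&](const std::string& ext) {
        return path.size() >= ext.size() &&
               path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
    };
    if (endsWith(".html")) return "text/html";
    if (endsWith(".css")) return "text/css";
    if (endsWith(".js")) return "application/javascript";
    return "application/octet-stream";
}

// Serves "<path>.gz" only if it exists AND the browser sent gzip in Accept-Encoding.
// Simplification: a header such as "gzip;q=0" would still count as acceptance.
StaticChoice chooseStaticFile(const std::string& path, bool gzExists,
                              const std::string& acceptEncoding) {
    const bool clientAcceptsGzip = acceptEncoding.find("gzip") != std::string::npos;
    if (gzExists && clientAcceptsGzip) {
        return {path + ".gz", contentTypeFor(path), true};
    }
    return {path, contentTypeFor(path), false};
}
```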
I have done projects using gzipped static content served by server.serveStatic() from ESP8266 to my iPhone without any problems. The process is exactly as @avillacis described. |
Interesting. I tested with a Mac laptop and four different iPhones. They all wanted to download the static files if I left the .gz extension in the file name. All Windows and Android devices worked fine. If I remove the .gz from the file name and tell the client that it is in fact zipped content, it works on every device. |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions. |
I have static, binary log files which I download via WiFi. Works well most of the time, but then there are perpetual crashes. It seems the success depends on the size of the files. Up to 64k mostly ok, 64k ... 128k ok for half the trials, beyond that size almost never ok.
I am using the LittleFS file system. The files are built up of 32-byte binary records, and may eventually grow >2MB. I use this command for downloading as a static file:
I also tried this. Works just as well, but no improvement:
The download is triggered from a website, either as a straight link to download the binary file, or as Javascript code inside a function:
Your ReadMe seems to suggest that chunked responses are NOT needed for static files? So, I did not use that here. Am I forgetting any settings, or do I also need to use chunked response when the static file is beyond a certain size? Which size?
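For files like these the record arithmetic is simple: 307200 bytes hold 9600 records, a 2 MB (2 × 1024 × 1024 byte) file holds 65536, and a 1024-byte piece holds exactly 32 records, which fits under the 1028-byte chunk cap found in this thread. A compile-time size check keeps that arithmetic honest; the field layout below is invented for illustration, and only the 32-byte size comes from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-byte log record; the fields are invented, only the size is from the post.
// All members are 4-byte aligned, so the struct has no padding.
struct LogRecord {
    std::uint32_t timestamp;
    float values[7];
};
static_assert(sizeof(LogRecord) == 32, "record must stay exactly 32 bytes");

// Number of complete records in a file of `bytes` bytes.
constexpr std::size_t recordCount(std::size_t bytes) {
    return bytes / sizeof(LogRecord);
}

// Byte offset of record `i`, e.g. for seeking before a record-aligned read.
constexpr std::size_t recordOffset(std::size_t i) {
    return i * sizeof(LogRecord);
}
```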