Replace queue v2 part2 #58

Open

willmmiles wants to merge 5 commits into main

Conversation

willmmiles

This PR is the core of #21: it replaces the FreeRTOS queue with a mutex and an intrusive list. This has a number of small benefits:

  • Queue clears for a closing/errored client can be performed atomically and quickly, even on the LwIP task;
  • Poll coalescence can be performed atomically and quickly, without needing to pump and reload the queue;
  • (Future) It permits pre-allocation of error events, which will allow us to guarantee that the dispose callback will be dispatched even under memory pressure.

Included is a small correctness patch for non-CONFIG_LWIP_TCPIP_CORE_LOCKING systems (e.g. Arduino core 2), which have a potential race in AsyncClient::_close() where the tcp callbacks are unbound on the client task instead of the LwIP task.
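For readers who have not opened the diff, a minimal sketch of the pattern follows. The real implementation is in src/AsyncTCPSimpleIntrusiveList.h and src/AsyncTCP.cpp; the field and method names below (next, client, push_back, remove_if) are simplified assumptions, not the PR's exact API.

```cpp
// Sketch only: a singly-linked intrusive list where each event carries its own
// link pointer, so queue operations never allocate and a whole chain of
// matching events can be detached in one pass while the lock is held.
struct lwip_tcp_event_packet_t {
  lwip_tcp_event_packet_t *next = nullptr;  // intrusive link (assumed name)
  void *client = nullptr;                   // owning client, illustrative only
};

template <typename T> class SimpleIntrusiveList {
  T *head_ = nullptr;
  T **tail_ = &head_;  // pointer to the last `next` slot, for O(1) append

public:
  void push_back(T *item) {
    *tail_ = item;
    tail_ = &item->next;
  }

  T *pop_front() {
    T *item = head_;
    if (item) {
      head_ = item->next;
      if (!head_) {
        tail_ = &head_;
      }
      item->next = nullptr;
    }
    return item;
  }

  // Detach every element matching `pred` and return the detached chain, so the
  // caller can free it outside the queue lock.
  template <typename Pred> T *remove_if(Pred pred) {
    T *removed = nullptr;
    T **removed_tail = &removed;
    T **cur = &head_;
    while (*cur) {
      if (pred(**cur)) {
        T *victim = *cur;
        *cur = victim->next;
        victim->next = nullptr;
        *removed_tail = victim;
        removed_tail = &victim->next;
      } else {
        cur = &(*cur)->next;
      }
    }
    tail_ = cur;
    return removed;
  }
};
```

In the PR, calls into the list are wrapped in a short mutex hold (the queue_mutex_guard discussed further down), which is what makes the atomic queue clears and poll coalescence listed above cheap.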

Commit messages:

Use a simple intrusive list for the event queue. The ultimate goal here is to arrange that certain kinds of events (errors) can be guaranteed to be queued, as client objects will leak if they are discarded. As a secondary improvement, some operations (peeking, remove_if) become more efficient because we can hold the queue lock for longer. This commit is a straight replacement and does not attempt any logic changes.

This eliminates a round-trip through the LwIP lock and allows _tcp_close_api to specialize for AsyncClient.

Ensure that _tcp_close completely disconnects a pcb from an AsyncClient:
- All callbacks are unhooked
- All events are purged
- abort() is called on close() failure
This fixes some race conditions with closing, particularly without CONFIG_LWIP_TCPIP_CORE_LOCKING, where an event might be processed for a now-dead client if it arrived after arg was cleared but before the callbacks were disconnected.
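To make the last commit's contract concrete, here is a sketch of that teardown order in plain LwIP calls. It illustrates the description above and must run on the LwIP/TCPIP task; it is not the PR's literal _tcp_close code.

```cpp
#include "lwip/tcp.h"

// Sketch of the teardown contract: unhook every callback and the arg pointer
// first, so no late event can reach a dead client, then close, and abort if
// the close itself fails so the pcb cannot leak. Must run on the LwIP task.
static void detach_and_close(struct tcp_pcb *pcb) {
  if (pcb == nullptr) {
    return;
  }
  tcp_arg(pcb, nullptr);   // no callback may see the old AsyncClient pointer
  tcp_sent(pcb, nullptr);
  tcp_recv(pcb, nullptr);
  tcp_err(pcb, nullptr);
  tcp_poll(pcb, nullptr, 0);

  // Queued events for this client are purged separately via remove_if.

  if (tcp_close(pcb) != ERR_OK) {
    tcp_abort(pcb);        // abort() on close() failure, per the commit notes
  }
}
```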
mathieucarbou requested a review from Copilot on May 7, 2025 at 08:46

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull Request Overview

This PR replaces the FreeRTOS queue mechanism with a mutex-guarded intrusive list for managing asynchronous TCP event packets, while also refactoring related callback and event handling logic.

  • Introduces SimpleIntrusiveList for event packet management.
  • Replaces FreeRTOS queue APIs with intrusive list operations guarded by a mutex.
  • Adjusts TCP callback binding/teardown and event processing logic to work with the new data structure.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • src/AsyncTCPSimpleIntrusiveList.h: Introduces an intrusive list implementation for event packet management.
  • src/AsyncTCP.cpp: Replaces queue operations with the newly implemented intrusive list; updates TCP callback binding and event handling logic.
Comments suppressed due to low confidence (1)

src/AsyncTCP.cpp:260

  • [nitpick] Consider explicitly capturing 'client' in the lambda (using [client] instead of [=]) to improve the clarity of the intended capture in _remove_events_for_client.
removed_event_chain = _async_queue.remove_if([=](lwip_tcp_event_packet_t &pkt) {
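For clarity, the explicit-capture form the comment suggests would read roughly as follows; the predicate body is illustrative, since only the lambda header appears in the diff.

```cpp
// Capturing `client` by name documents exactly what the predicate depends on.
removed_event_chain = _async_queue.remove_if([client](lwip_tcp_event_packet_t &pkt) {
  return pkt.client == client;  // assumed field/check, shown for illustration
});
```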

From src/AsyncTCP.cpp (the snippet the next comment refers to):

static inline lwip_tcp_event_packet_t *_get_async_event() {
  queue_mutex_guard guard;
  while (1) {

Copilot AI May 7, 2025

[nitpick] Consider refactoring the infinite loop in _get_async_event for clarity, perhaps by using a clearer loop condition or restructuring the nested conditions to improve readability.

@@ -698,6 +684,12 @@ static esp_err_t _tcp_connect(tcp_pcb *pcb, int8_t closed_slot, ip_addr_t *addr,
static err_t _tcp_bind_api(struct tcpip_api_call_data *api_call_msg) {
tcp_api_call_t *msg = (tcp_api_call_t *)api_call_msg;
msg->err = tcp_bind(msg->pcb, msg->bind.addr, msg->bind.port);
if (msg->err != ERR_OK) {
// Close the pcb on behalf of the server without an extra round-trip through the LwIP lock

Copilot AI May 7, 2025

Add a comment clarifying that, on binding failure, the PCB is intentionally closed (with subsequent abort if needed) to ensure proper cleanup, which helps readers understand the rationale behind this block.

Suggested change
// Close the pcb on behalf of the server without an extra round-trip through the LwIP lock
// On binding failure, the pcb is intentionally closed to ensure proper cleanup.
// If closing the pcb fails, it is aborted to prevent resource leaks or undefined behavior.
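For readers without the full hunk, the cleanup both comments describe presumably boils down to something like the following, assuming the close-then-abort pattern stated in the commit notes; this is a sketch, not the PR's exact diff.

```cpp
// Sketch: on bind failure, dispose of the pcb right here on the LwIP thread,
// aborting if the close fails, so the server needs no second locked call.
if (msg->err != ERR_OK) {
  if (tcp_close(msg->pcb) != ERR_OK) {
    tcp_abort(msg->pcb);
  }
  msg->pcb = nullptr;  // illustrative: signal the caller that the pcb is gone
}
```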

@mathieucarbou

@willmmiles: I will first release a version with the pbuf_free fix (and update espasyncws). Then I will go through this PR.

bool holds_mutex;

public:
inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY)){};

That's a great idea, using a singleton pattern 👌
I have a comment here.
It is better like this: holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}
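Putting the suggestion together with the snippet above, the whole guard could look roughly like this. Only the constructor appears in the diff; the singleton accessor, destructor, and bool operator below are illustrative assumptions.

```cpp
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

// Illustrative singleton accessor; the PR's version may differ in detail.
static SemaphoreHandle_t _async_queue_mutex() {
  static SemaphoreHandle_t mutex = xSemaphoreCreateMutex();
  return mutex;
}

// RAII guard with the suggested `== pdTRUE` comparison, so holds_mutex is a
// real boolean rather than an implicitly converted BaseType_t.
class queue_mutex_guard {
  bool holds_mutex;

public:
  inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}
  inline ~queue_mutex_guard() {
    if (holds_mutex) {
      xSemaphoreGive(_async_queue_mutex());
    }
  }
  inline explicit operator bool() const {
    return holds_mutex;
  }
};
```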
