Crash after update from 2.8.0 to 2.10.0 #5057

Closed
olegrok opened this issue Apr 28, 2025 · 11 comments
@olegrok

olegrok commented Apr 28, 2025

After updating the Lua/Tarantool wrapper over librdkafka to the latest version (2.10.0), it started crashing.
Everything works fine on 2.8.0.

The last lines I see in the logs:

rdlist.c:283:33: runtime error: call to function rd_kafka_topic_info_destroy through pointer to incorrect function type 'void (*)(void *)'
/home/runner/work/kafka/kafka/librdkafka/src/rdkafka_topic.c:2052: note: rd_kafka_topic_info_destroy defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior rdlist.c:283:33 
%5|1745844585.671|CONFWARN|rdkafka#consumer-3| [thrd:app]: No `bootstrap.servers` configured: client will not be able to connect to Kafka cluster
rd.h:157:26: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:188:35: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior rd.h:157:26 
@mensfeld

mensfeld commented Apr 28, 2025

Yeah, I see similar crashes in https://github.com/karafka/rdkafka-ruby:

  • rdkafka_metadata_cache.c:92:rd_kafka_metadata_cache_delete: assert: rk->rk_metadata_cache.rkmc_cnt > 0
  • (another place, not sure yet): Segmentation fault (core dumped)

plus a few other stability issues.

@emasab
Contributor

emasab commented Apr 29, 2025

Thanks for reporting, @olegrok.
@mensfeld the first issue could be related to #5051.

@mensfeld

@emasab most likely, yes. I will try to reproduce it further. I located the second one, and although it doesn't happen on 2.8, I consider it self-inflicted and have already implemented appropriate fixes on my side.

@emasab
Contributor

emasab commented Apr 29, 2025

@olegrok the UB at rd.h:157 comes from duplicating a NULL string; I'll look into where that can happen. The one at rdlist.c:283:33 comes from passing `void rd_kafka_topic_info_destroy(rd_kafka_topic_info_t *ti)` instead of a function of type `void (*)(void *ti)`; I'll create a dedicated function for it.
Neither should cause problems when the sanitizer isn't enabled, right? Most implementations return NULL from strdup by default, and the second function can be cast without issues.

@olegrok
Author

olegrok commented Apr 29, 2025

Tests without UBSAN/ASAN also crash, but I don't have a stack trace for that yet.

@emasab
Contributor

emasab commented Apr 29, 2025

@olegrok in that case, maybe try with #5055 and see whether you still get crashes; if so, you can get the stack trace.

@olegrok
Author

olegrok commented Apr 29, 2025

> @olegrok in that case maybe try with #5055 and see if you still have crashes in that case you can get the stacktrace.

I'll try it a bit later.

For now, here is a backtrace of the crash:

Thread 7 "rdk:main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7e64e27fe640 (LWP 2569223)]
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
74	../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) up
#1  0x00007e650e0a8583 in __GI___strdup (s=0x0) at ./string/strdup.c:41
41	./string/strdup.c: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007e650e0a8583 in __GI___strdup (s=0x0) at ./string/strdup.c:41
#2  0x00007e64e3469ce1 in rd_strdup (s=<optimized out>) at /home/oleg/Projects/kafka/librdkafka/src/rd.h:157
#3  rd_kafka_brokers_add0 (rk=rk@entry=0x57100c2c7310, brokerlist=<optimized out>, is_bootstrap_server_list=is_bootstrap_server_list@entry=1 '\001') at rdkafka_broker.c:5425
#4  0x00007e64e3453168 in rd_kafka_rebootstrap_tmr_cb (rkts=<optimized out>, arg=<optimized out>) at rdkafka.c:2092
#5  0x00007e64e347f8ab in rd_kafka_timers_run (rkts=rkts@entry=0x57100c2c8338, timeout_us=timeout_us@entry=0) at rdkafka_timer.c:357
#6  0x00007e64e34573f8 in rd_kafka_thread_main (arg=0x57100c2c7310) at rdkafka.c:2241
#7  0x00007e650e094935 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:439
#8  0x00007e650e126850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@olegrok
Author

olegrok commented Apr 29, 2025

Unfortunately not. I applied #5055 to the master branch.

It still crashes with the same backtrace:

Thread 7 "rdk:main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc3fff640 (LWP 2573522)]
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
74	../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x00007ffff7ca8583 in __GI___strdup (s=0x0) at ./string/strdup.c:41
#2  0x00007fffd5469ce1 in rd_strdup (s=<optimized out>) at /home/oleg/Projects/kafka/librdkafka/src/rd.h:157
#3  rd_kafka_brokers_add0 (rk=rk@entry=0x555558ba4e50, brokerlist=<optimized out>, is_bootstrap_server_list=is_bootstrap_server_list@entry=1 '\001') at rdkafka_broker.c:5425
#4  0x00007fffd5453168 in rd_kafka_rebootstrap_tmr_cb (rkts=<optimized out>, arg=<optimized out>) at rdkafka.c:2092
#5  0x00007fffd547f8ab in rd_kafka_timers_run (rkts=rkts@entry=0x555558ba5e78, timeout_us=timeout_us@entry=0) at rdkafka_timer.c:357
#6  0x00007fffd54573f8 in rd_kafka_thread_main (arg=0x555558ba4e50) at rdkafka.c:2241
#7  0x00007ffff7c94935 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:439
#8  0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@emasab
Contributor

emasab commented May 1, 2025

@olegrok that's because `bootstrap.servers` is NULL. I'll add a check for that, for the case where one wants to add brokers only with `rd_kafka_brokers_add`.

olegrok added a commit to tarantool/kafka that referenced this issue May 1, 2025
This reverts commit 8d68897.

Still can't update to the latest version due to confluentinc/librdkafka#5057
airlock-confluentinc bot pushed a commit that referenced this issue May 5, 2025
given no `bootstrap.servers` is present and brokers were added through `rd_kafka_brokers_add`

Closes #5057
@emasab
Contributor

emasab commented May 5, 2025

@olegrok it should work with #5067.

@emasab emasab added the bug label May 5, 2025
@olegrok
Author

olegrok commented May 5, 2025

> @olegrok it should work with #5067.

Yes. With this patch I observe no crashes. Thanks!

airlock-confluentinc bot pushed a commit that referenced this issue May 21, 2025
given no `bootstrap.servers` is present and brokers were added through `rd_kafka_brokers_add`

Closes #5057
@emasab emasab closed this as completed in 826f585 May 22, 2025
olegrok added a commit to tarantool/kafka that referenced this issue May 22, 2025