simple reproducer for mmap zfs_flush hang #17303

Open
markwiederspahn opened this issue May 5, 2025 · 5 comments

Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@markwiederspahn

System information

Type                  Version/Name
Distribution Name     RHEL, Alma
Distribution Version  8
Kernel Version        4.18.0-553.51.1.el8_10
Architecture          x86_64
OpenZFS Version       2.1.16, 2.2.7, 2.3.1, 2.3.2, 2.3.99-323_g246e5883b

Describe the problem you're observing

A production RHEL 8 system using Python parallel threads with mmap hung when certain jobs were run. The issue was traced to a "flush", which performs an msync() and leads to ZFS sync writes. Originally seen on a 2x12-disk raidz2 array (36 cores, 644G RAM) with 2.1.16; it also occurred with 2.2.7. Debugging moved to an Alma 8 system with an 8-disk raidz1, 8 cores, and 48G RAM. The hang is generally reproducible within a few minutes at n=16; sometimes two concurrent copies of n=16 are needed.

Describe how to reproduce the problem

On stock RHEL8/Alma8, with ZFS installed --enable-debug from the tarballs: dnf install python39 python39-numpy.
Running /usr/bin/python3.9 ./memmap_flush_test.py -j 16 --flush will sometimes hang the system; running two copies at once triggers it faster (a rough sketch of the kind of workload the script drives is at the end of this section).
echo 8191 > /sys/module/zfs/parameters/zfs_flags was used to enable additional debug logging.
No ASSERTs were ever seen.
This seems related to several similar issues: #7964, #8751, and #17159. I attempted to apply the small changes from the latter to my copy of master, but I continue to get hangs; probably I'm doing something wrong and not actually picking up the changes.
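For readers without the attached script, here is a minimal sketch of the kind of workload it drives. This is an illustration only, not the actual memmap_flush_test.py; the file names, array sizes, and iteration counts are made up. The relevant behavior is that each worker dirties pages through a numpy memmap and then calls flush(), which msync()s the mapping.

```python
#!/usr/bin/env python3.9
# Illustrative sketch only -- NOT the actual memmap_flush_test.py.
# File names, sizes, and iteration counts are invented; the point is the
# write-then-flush (msync) pattern driven from several processes at once.
import argparse
import multiprocessing as mp
import os

import numpy as np

MAP_BYTES = 64 * 1024 * 1024   # 64 MiB per worker (arbitrary)
CHUNK = 1024 * 1024            # dirty 1 MiB per iteration


def worker(idx, directory, do_flush, iters=1000):
    # Each worker writes through its own numpy memmap on the test pool.
    path = os.path.join(directory, f"memmap_{idx}.dat")
    arr = np.memmap(path, dtype=np.uint8, mode="w+", shape=(MAP_BYTES,))
    for i in range(iters):
        off = (i % (MAP_BYTES // CHUNK)) * CHUNK
        arr[off:off + CHUNK] = i & 0xFF        # dirty pages via the mapping
        if do_flush:
            arr.flush()                        # mmap.flush() -> msync(MS_SYNC)


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("-j", "--jobs", type=int, default=16)
    ap.add_argument("--flush", action="store_true")
    ap.add_argument("--dir", default=".")      # run from a dataset on the pool
    args = ap.parse_args()

    procs = [mp.Process(target=worker, args=(i, args.dir, args.flush))
             for i in range(args.jobs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == "__main__":
    main()
```

Run from a directory on the test pool; with --flush enabled, the msync() traffic is what exercises the zpl_fsync/zpl_writepages/zfs_putpage path seen in the stack traces below.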

Include any warning/errors/backtraces from the system logs

...
/proc/spl/kstat/zfs/dbgmsg:
1746464023 ffff8f994d87d000 zap_leaf.c:501:zap_entry_read_name(): error 75
1746464023 ffff8f994d87d000 zap_leaf.c:501:zap_entry_read_name(): error 75
1746464023 ffff8f994d87d000 metaslab.c:4021:metaslab_flush(): flushing: txg 17473623, spa xx1, vdev_id 0, ms_id 5231, unflushed_allocs 647168, unflushed_frees 507904, appended 280 bytes
1746464023 ffff8f994d87d000 zap_leaf.c:501:zap_entry_read_name(): error 75
1746464023 ffff8f994d87d000 dbuf.c:3358:dbuf_findbp(): error 2
1746464023 ffff8f994d87d000 zap_leaf.c:501:zap_entry_read_name(): error 75

messages:
May 5 11:57:09 utig1 kernel: INFO: task kworker/u96:1:139110 blocked for more than 120 seconds.
May 5 11:57:09 utig1 kernel: Tainted: P OE -------- - - 4.18.0-553.51.1.el8_10.x86_64 #1
May 5 11:57:09 utig1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 5 11:57:09 utig1 kernel: task:kworker/u96:1 state:D stack:0 pid:139110 ppid:2 flags:0x80004080
May 5 11:57:09 utig1 kernel: Workqueue: writeback wb_workfn (flush-zfs-6)
May 5 11:57:09 utig1 kernel: Call Trace:
May 5 11:57:09 utig1 kernel: __schedule+0x2d1/0x870
May 5 11:57:09 utig1 kernel: schedule+0x55/0xf0
May 5 11:57:09 utig1 kernel: io_schedule+0x12/0x40
May 5 11:57:09 utig1 kernel: __lock_page+0x136/0x270
May 5 11:57:09 utig1 kernel: ? filemap_fdatawait_keep_errors+0x50/0x50
May 5 11:57:09 utig1 kernel: write_cache_pages+0x1f2/0x420
May 5 11:57:09 utig1 kernel: ? zpl_readpages+0x20/0x20 [zfs]
May 5 11:57:09 utig1 kernel: ? __wb_update_bandwidth+0x35/0x1e0
May 5 11:57:09 utig1 kernel: ? _cond_resched+0x15/0x30
May 5 11:57:09 utig1 kernel: zpl_writepages+0xaa/0x200 [zfs]
May 5 11:57:09 utig1 kernel: do_writepages+0xc6/0x1d0
May 5 11:57:09 utig1 kernel: ? ip_finish_output2+0x1a6/0x430
May 5 11:57:09 utig1 kernel: __writeback_single_inode+0x39/0x300
May 5 11:57:09 utig1 kernel: writeback_sb_inodes+0x1ea/0x450
May 5 11:57:09 utig1 kernel: __writeback_inodes_wb+0x5f/0xd0
May 5 11:57:09 utig1 kernel: wb_writeback+0x24c/0x2e0
May 5 11:57:09 utig1 kernel: ? get_nr_inodes+0x35/0x60
May 5 11:57:09 utig1 kernel: wb_workfn+0x392/0x4e0
May 5 11:57:09 utig1 kernel: process_one_work+0x1d3/0x390
May 5 11:57:09 utig1 kernel: worker_thread+0x30/0x390
May 5 11:57:09 utig1 kernel: ? process_one_work+0x390/0x390
May 5 11:57:09 utig1 kernel: kthread+0x134/0x150
May 5 11:57:09 utig1 kernel: ? set_kthread_struct+0x50/0x50
May 5 11:57:09 utig1 kernel: ret_from_fork+0x35/0x40

ps axl | grep D:

5 0 139110 2 20 0 0 0 - D ? 0:16 [kworker/u96:1+flush-zfs-6]
1 0 253108 252676 20 0 6226276 101836 - D pts/0 0:01 /usr/bin/python3.9 ../memmap_flush_test.py -j 16 --flush
1 0 253109 252676 20 0 6226276 101836 - D pts/0 0:01 /usr/bin/python3.9 ../memmap_flush_test.py -j 16 --flush
1 0 253113 252676 20 0 6226276 101836 - D pts/0 0:01 /usr/bin/python3.9 ../memmap_flush_test.py -j 16 --flush
1 0 253118 252676 20 0 6226276 101840 - D pts/0 0:01 /usr/bin/python3.9 ../memmap_flush_test.py -j 16 --flush
...

@markwiederspahn markwiederspahn added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 5, 2025
@tuxoko
Contributor

tuxoko commented May 7, 2025

@markwiederspahn
When the hang happens, do you see any hung task showing inode_wait_for_writeback in the stack trace?
Do you see any thread spinning at close to 100% CPU?

If you don't see those things then it's probably a different issue from #17159.
Though this does look similar to #8751.

@markwiederspahn
Author

No inode_wait_for_writeback, ever. Typically there is a flush-zfs-N kworker in D wait, along with 16 or 32 Python tasks also in D wait. No looping, ever. The pool stops functioning since (I assume) no txgs succeed. Other pools are not affected, and the system as a whole continues to function normally.

@markwiederspahn
Author

The even simpler failure reproducer given by @dweezil in #8751 using fio fails for me if the task count is the same as the number of virtual CPUs: n=8 works fine with 16 HT CPUs, but n=16 fails with at least 2.3.2-1 and recent git master on Alma8. n=8 fails with 8 CPUs with 2.3.2-1 on Ubuntu 24.04, kernel 6.8.0-59-generic. Logs from the latter:
...
1747058658 ffff9b7e2fab5280 spa_history.c:330:spa_history_log_sync(): ioctl create
1747058658 ffff9b7e2fab5280 spa_history.c:294:spa_history_log_sync(): command: zfs create tank/fs
1747058658 ffff9b7e2fab5280 metaslab.c:4013:metaslab_flush(): flushing: txg 9, spa tank, vdev_id 0, ms_id 3, unflushed_allocs 26624, unflushed_frees 0, appended 40 bytes
1747058658 ffff9b7e2fab5280 metaslab.c:4013:metaslab_flush(): flushing: txg 10, spa tank, vdev_id 0, ms_id 4, unflushed_allocs 34816, unflushed_frees 0, appended 112 bytes
1747058664 ffff9b7e2fab5280 metaslab.c:4013:metaslab_flush(): flushing: txg 11, spa tank, vdev_id 0, ms_id 0, unflushed_allocs 330676224, unflushed_frees 17408, appended 144 bytes
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 4, weight 700000000000001, selected txg 13 (604149 ms ago), alloc_txg 11, loaded 621361 ms ago, max_size 536770560
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 12, weight 740000000000001, selected txg 13 (604149 ms ago), alloc_txg 0, loaded 621358 ms ago, max_size 536870912
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 5, weight 740000000000001, selected txg 13 (604149 ms ago), alloc_txg 0, loaded 621358 ms ago, max_size 536870912
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 13, weight 740000000000001, selected txg 13 (604149 ms ago), alloc_txg 0, loaded 621358 ms ago, max_size 536870912
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 6, weight 740000000000001, selected txg 13 (604149 ms ago), alloc_txg 0, loaded 621358 ms ago, max_size 536870912
1747059278 ffff9b7e2fab5280 metaslab.c:2617:metaslab_unload(): metaslab_unload: txg 131, spa tank, vdev_id 0, ms_id 7, weight 740000000000001, selected txg 13 (604149 ms ago), alloc_txg 0, loaded 621358 ms ago, max_size 536870912

/var/log/syslog

2025-05-12T09:08:17.985482-05:00 IG-D03574 kernel: INFO: task kworker/u16:0:28850 blocked for more than 122 seconds.
2025-05-12T09:08:17.985499-05:00 IG-D03574 kernel: Tainted: P OE 6.8.0-59-generic #61-Ubuntu
2025-05-12T09:08:17.985502-05:00 IG-D03574 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2025-05-12T09:08:17.985504-05:00 IG-D03574 kernel: task:kworker/u16:0 state:D stack:0 pid:28850 tgid:28850 ppid:2 flags:0x00004000
2025-05-12T09:08:17.985506-05:00 IG-D03574 kernel: Workqueue: writeback wb_workfn (flush-zfs-2)
2025-05-12T09:08:17.985508-05:00 IG-D03574 kernel: Call Trace:
2025-05-12T09:08:17.985510-05:00 IG-D03574 kernel:
2025-05-12T09:08:17.985530-05:00 IG-D03574 kernel: __schedule+0x27c/0x6b0
2025-05-12T09:08:17.985532-05:00 IG-D03574 kernel: schedule+0x33/0x110
2025-05-12T09:08:17.985534-05:00 IG-D03574 kernel: io_schedule+0x46/0x80
2025-05-12T09:08:17.985536-05:00 IG-D03574 kernel: folio_wait_bit_common+0x136/0x330
2025-05-12T09:08:17.985538-05:00 IG-D03574 kernel: ? __pfx_wake_page_function+0x10/0x10
2025-05-12T09:08:17.985540-05:00 IG-D03574 kernel: __folio_lock+0x17/0x30
2025-05-12T09:08:17.985542-05:00 IG-D03574 kernel: write_cache_pages+0x1cd/0x430
2025-05-12T09:08:17.985544-05:00 IG-D03574 kernel: ? __pfx_zpl_putfolio+0x10/0x10 [zfs]
2025-05-12T09:08:17.985545-05:00 IG-D03574 kernel: zpl_writepages+0xb0/0x1d0 [zfs]
2025-05-12T09:08:17.985547-05:00 IG-D03574 kernel: do_writepages+0xcd/0x1f0
2025-05-12T09:08:17.985549-05:00 IG-D03574 kernel: __writeback_single_inode+0x44/0x290
2025-05-12T09:08:17.985551-05:00 IG-D03574 kernel: writeback_sb_inodes+0x21b/0x520
2025-05-12T09:08:17.985553-05:00 IG-D03574 kernel: __writeback_inodes_wb+0x54/0x100
2025-05-12T09:08:17.985578-05:00 IG-D03574 kernel: ? queue_io+0x115/0x120
2025-05-12T09:08:17.985581-05:00 IG-D03574 kernel: wb_writeback+0x2df/0x350
2025-05-12T09:08:17.985583-05:00 IG-D03574 kernel: wb_do_writeback+0x1f1/0x2a0
2025-05-12T09:08:17.985585-05:00 IG-D03574 kernel: wb_workfn+0x5f/0x240
2025-05-12T09:08:17.985587-05:00 IG-D03574 kernel: ? finish_task_switch.isra.0+0x93/0x300
2025-05-12T09:08:17.985590-05:00 IG-D03574 kernel: ? __schedule+0x284/0x6b0
2025-05-12T09:08:17.985592-05:00 IG-D03574 kernel: process_one_work+0x175/0x350
2025-05-12T09:08:17.985608-05:00 IG-D03574 kernel: worker_thread+0x306/0x440
2025-05-12T09:08:17.985609-05:00 IG-D03574 kernel: ? _raw_spin_lock_irqsave+0xe/0x20
2025-05-12T09:08:17.985610-05:00 IG-D03574 kernel: ? __pfx_worker_thread+0x10/0x10
2025-05-12T09:08:17.985611-05:00 IG-D03574 kernel: kthread+0xef/0x120
2025-05-12T09:08:17.985612-05:00 IG-D03574 kernel: ? __pfx_kthread+0x10/0x10
2025-05-12T09:08:17.985613-05:00 IG-D03574 kernel: ret_from_fork+0x44/0x70
2025-05-12T09:08:17.985614-05:00 IG-D03574 kernel: ? __pfx_kthread+0x10/0x10
2025-05-12T09:08:17.985615-05:00 IG-D03574 kernel: ret_from_fork_asm+0x1b/0x30
2025-05-12T09:08:17.985616-05:00 IG-D03574 kernel:
2025-05-12T09:08:17.985623-05:00 IG-D03574 kernel: INFO: task fio:107822 blocked for more than 122 seconds.
2025-05-12T09:08:17.985631-05:00 IG-D03574 kernel: Tainted: P OE 6.8.0-59-generic #61-Ubuntu
2025-05-12T09:08:17.985659-05:00 IG-D03574 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2025-05-12T09:08:17.986837-05:00 IG-D03574 kernel: task:fio state:D stack:0 pid:107822 tgid:107822 ppid:107593 flags:0x00004002
2025-05-12T09:08:17.986845-05:00 IG-D03574 kernel: Call Trace:
2025-05-12T09:08:17.986848-05:00 IG-D03574 kernel:
2025-05-12T09:08:17.986849-05:00 IG-D03574 kernel: __schedule+0x27c/0x6b0
2025-05-12T09:08:17.986851-05:00 IG-D03574 kernel: schedule+0x33/0x110
2025-05-12T09:08:17.986853-05:00 IG-D03574 kernel: io_schedule+0x46/0x80
2025-05-12T09:08:17.986855-05:00 IG-D03574 kernel: folio_wait_bit_common+0x136/0x330
2025-05-12T09:08:17.986857-05:00 IG-D03574 kernel: ? __pfx_wake_page_function+0x10/0x10
2025-05-12T09:08:17.986859-05:00 IG-D03574 kernel: __folio_lock+0x17/0x30
2025-05-12T09:08:17.986861-05:00 IG-D03574 kernel: zfs_putpage+0x5a7/0x990 [zfs]
2025-05-12T09:08:17.986863-05:00 IG-D03574 kernel: zpl_putpage+0x90/0x1e0 [zfs]
2025-05-12T09:08:17.987571-05:00 IG-D03574 kernel: zpl_putfolio+0xe/0x20 [zfs]
2025-05-12T09:08:17.987578-05:00 IG-D03574 kernel: write_cache_pages+0x15b/0x430
2025-05-12T09:08:17.987580-05:00 IG-D03574 kernel: ? spl_kmem_cache_free+0x180/0x2b0 [spl]
2025-05-12T09:08:17.987581-05:00 IG-D03574 kernel: ? __pfx_zpl_putfolio+0x10/0x10 [zfs]
2025-05-12T09:08:17.988767-05:00 IG-D03574 kernel: zpl_writepages+0xb0/0x1d0 [zfs]
2025-05-12T09:08:17.988773-05:00 IG-D03574 kernel: do_writepages+0xcd/0x1f0
2025-05-12T09:08:17.988775-05:00 IG-D03574 kernel: ? zfs_refcount_remove+0x16/0x30 [zfs]
2025-05-12T09:08:17.988777-05:00 IG-D03574 kernel: ? dnode_rele_and_unlock+0x81/0x250 [zfs]
2025-05-12T09:08:17.988786-05:00 IG-D03574 kernel: filemap_fdatawrite_wbc+0x75/0xb0
2025-05-12T09:08:17.988788-05:00 IG-D03574 kernel: __filemap_fdatawrite_range+0x58/0x80
2025-05-12T09:08:17.988790-05:00 IG-D03574 kernel: filemap_write_and_wait_range+0x59/0xc0
2025-05-12T09:08:17.989725-05:00 IG-D03574 kernel: zpl_fsync+0xb4/0x1f0 [zfs]
2025-05-12T09:08:17.989728-05:00 IG-D03574 kernel: vfs_fsync_range+0x48/0xa0
2025-05-12T09:08:17.989729-05:00 IG-D03574 kernel: ? find_vma+0x34/0x60
2025-05-12T09:08:17.989730-05:00 IG-D03574 kernel: __do_sys_msync+0x1d5/0x2b0
2025-05-12T09:08:17.989731-05:00 IG-D03574 kernel: __x64_sys_msync+0x1c/0x30
2025-05-12T09:08:17.989731-05:00 IG-D03574 kernel: x64_sys_call+0x21ea/0x25a0
2025-05-12T09:08:17.989732-05:00 IG-D03574 kernel: do_syscall_64+0x7f/0x180
2025-05-12T09:08:17.989739-05:00 IG-D03574 kernel: ? balance_dirty_pages_ratelimited+0x10/0x20
2025-05-12T09:08:17.989740-05:00 IG-D03574 kernel: ? fault_dirty_shared_page+0x151/0x190
2025-05-12T09:08:17.989741-05:00 IG-D03574 kernel: ? do_shared_fault+0xa0/0x150
2025-05-12T09:08:17.989742-05:00 IG-D03574 kernel: ? do_fault+0x48/0x260
2025-05-12T09:08:17.989743-05:00 IG-D03574 kernel: ? handle_pte_fault+0x114/0x1d0
2025-05-12T09:08:17.989743-05:00 IG-D03574 kernel: ? __handle_mm_fault+0x654/0x800
2025-05-12T09:08:17.989744-05:00 IG-D03574 kernel: ? __count_memcg_events+0x6b/0x120
2025-05-12T09:08:17.989745-05:00 IG-D03574 kernel: ? count_memcg_events.constprop.0+0x2a/0x50
2025-05-12T09:08:17.989746-05:00 IG-D03574 kernel: ? handle_mm_fault+0xad/0x380
2025-05-12T09:08:17.989746-05:00 IG-D03574 kernel: ? do_user_addr_fault+0x21a/0x670
2025-05-12T09:08:17.989747-05:00 IG-D03574 kernel: ? irqentry_exit_to_user_mode+0x7b/0x260
2025-05-12T09:08:17.989748-05:00 IG-D03574 kernel: ? irqentry_exit+0x43/0x50
2025-05-12T09:08:17.989749-05:00 IG-D03574 kernel: ? exc_page_fault+0x94/0x1b0
2025-05-12T09:08:17.989750-05:00 IG-D03574 kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
2025-05-12T09:08:17.989751-05:00 IG-D03574 kernel: RIP: 0033:0x79dd38925c8d
2025-05-12T09:08:17.989752-05:00 IG-D03574 kernel: RSP: 002b:00007ffedf4e2b40 EFLAGS: 00000293 ORIG_RAX: 000000000000001a
2025-05-12T09:08:17.989753-05:00 IG-D03574 kernel: RAX: ffffffffffffffda RBX: 00005fe1a63e9c00 RCX: 000079dd38925c8d
2025-05-12T09:08:17.989754-05:00 IG-D03574 kernel: RDX: 0000000000000004 RSI: 0000000020000000 RDI: 000079dd00000000
2025-05-12T09:08:17.989762-05:00 IG-D03574 kernel: RBP: 00007ffedf4e2b60 R08: 0000000000000000 R09: 00005fe1873f3d35
2025-05-12T09:08:17.989763-05:00 IG-D03574 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 000079dd2ba8a000
2025-05-12T09:08:17.989763-05:00 IG-D03574 kernel: R13: 0000000000000003 R14: 0000000020000000 R15: 0000000000000000
2025-05-12T09:08:17.989764-05:00 IG-D03574 kernel:
...

@snajpa
Contributor

snajpa commented May 14, 2025

oh, a reproducer, nice, I'll take a look as time permits

@snajpa
Contributor

snajpa commented May 14, 2025

OK, this is with an older kernel; I can't reproduce with a newer one. Guessing some folio-migration-related impedance mismatch.
