Skip to content

Log cases where an onion failure cannot be attributed or interpreted #3629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 4, 2025

Conversation

joostjager
Copy link
Contributor

@joostjager joostjager commented Feb 28, 2025

Create more visibility into these edge cases. The non-attributable failure in particular can be used to disrupt sender operation and it is therefore good to at least log these cases clearly.

Additionally a bug is fixed where an unreadable failure with valid hmac wasn't properly attributed to a node.

Ok(p) => p,
Err(_) => return,
};
let decrypt_result = decrypt_onion_error_packet(&mut encrypted_packet, shared_secret);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check hmac first to distinguish between unreadable failures and hmac mismatches.

Copy link

codecov bot commented Feb 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.17%. Comparing base (eaeed77) to head (1d40811).

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3629    +/-   ##
========================================
  Coverage   89.16%   89.17%            
========================================
  Files         152      152            
  Lines      118791   118947   +156     
  Branches   118791   118947   +156     
========================================
+ Hits       105921   106068   +147     
- Misses      10312    10316     +4     
- Partials     2558     2563     +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joostjager
Copy link
Contributor Author

joostjager commented Feb 28, 2025

Coverage reports suggests more test coverage needed...

@joostjager joostjager force-pushed the log-attribution-failures branch 5 times, most recently from 56b0aa7 to 70c8e56 Compare February 28, 2025 13:16
is_permanent: true,
});
let short_channel_id = Some(route_hop.short_channel_id);
res = Some(FailureLearnings {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug fix. Previously unreadable failures weren't attributable even though the hmac checked out.

@joostjager joostjager force-pushed the log-attribution-failures branch from 70c8e56 to 47caa1e Compare February 28, 2025 13:24
fn build_test_path() -> Path {
Path {
hops: vec![
RouteHop {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustfmt wanted this

.unwrap(),
channel_features: ChannelFeatures::empty(),
node_features: NodeFeatures::empty(),
short_channel_id: 1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a move, but just these short_channel_id's made unique to make it easier to test things for attr errors

@joostjager
Copy link
Contributor Author

Tests added

@joostjager joostjager force-pushed the log-attribution-failures branch 3 times, most recently from cdda315 to fc11fe4 Compare February 28, 2025 13:49

let network_update = Some(NetworkUpdate::NodeFailure {
node_id: route_hop.pubkey,
is_permanent: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC is_permanent NodeFailures result in removing the node from the graph entirely, which seems a bit harsh, I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same was already done for a missing failure code, so it seemed reasonable to apply the same penalty for the more severe case where the message cannot even be decoded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the node is removed for a week? That seems fine to me because in this case there must be a bug in the remote node software.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmmmm, I guess okay if only because we don't have a way to punish all of a node's channels. It does seem like we should find a way to do that, though, without outright graph removal. I know some other stuff does that, but I'd call that legacy needs-updating too :)

Indeed, after a while we'd accept fresh announcements of that node's channels, but generally we don't resync so we'd most likely only get them on restart (or when the node opens new channels).

@joostjager joostjager force-pushed the log-attribution-failures branch from fc11fe4 to 9f16e69 Compare March 3, 2025 09:33
This function still decrypted the message in-place even when it returned
an error. This is confusing for the caller.
Create more visibility into these edge cases. The non-attributable
failure in particular can be used to disrupt sender operation and it is
therefore good to at least log these cases clearly.
This commit fixes a bug where a node penalty was not applied where it
should.
@joostjager joostjager force-pushed the log-attribution-failures branch from 9f16e69 to f38244c Compare March 3, 2025 09:47
@joostjager joostjager requested a review from TheBlueMatt March 3, 2025 09:50
@joostjager joostjager added the weekly goal Someone wants to land this this week label Mar 3, 2025
@ldk-reviews-bot ldk-reviews-bot requested a review from arik-so March 4, 2025 02:30
packet: &mut Vec<u8>, shared_secret: SharedSecret,
) -> Result<msgs::DecodedOnionErrorPacket, msgs::DecodeError> {
/// Decrypt the error packet in-place.
fn decrypt_onion_error_packet(packet: &mut Vec<u8>, shared_secret: SharedSecret) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost wonder whether it's one of those instances where in-place processing is what largely contributes to confusion. Getting a result back with the decrypted packet or with the decryption error, without worrying about the input argument getting changed, seems preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusion was mainly that this used to be a function that returned a value and still had a side effect too. A side effect that was also needed, because the decrypted packet needs to be processed for the next hop.

Could have returned a decrypted copy in addition to the decoded failure to make it a pure function, probably with some performance implications. But I think now that this function has no return value anymore and only decrypts in-place, it's already a lot better.

@arik-so arik-so merged commit 0216d7d into lightningdevkit:main Mar 4, 2025
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
weekly goal Someone wants to land this this week
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants