store,bulk: log when delaying AddSSTable, collect + log more timings in bulk-ingest #41196

Merged
merged 2 commits into from
Sep 30, 2019

Conversation

dt
Member

@dt dt commented Sep 30, 2019

storage: log when AddSSTable requests are delayed

If the rate-limiting and back-pressure mechanisms kick in, they can dramatically delay requests in some cases.
However, it can currently be unclear that this is happening, and the system may simply appear slow.
Logging when requests are delayed by more than a second should help identify when this is the cause of slowness.

Release note: none.

Release justification: low-risk (logging only) change that could significantly help in diagnosing 'stuck' jobs based on logs (which are often all we have to go on).

bulk: track and log more timings

This tracks and logs time spent in the various stages of ingestion - sorting, splitting and flushing.
This helps when trying to diagnose why a job is 'slow' or 'stuck'.

Release note: none.

Release justification: low-risk (logging only) changes that improve ability to diagnose problems.

dt added 2 commits September 30, 2019 13:50
@cockroach-teamcity
Member

This change is Reviewable

@dt dt changed the title from "store,bulk: log when delaying AddSSTable, collectlog more timings in bulk-ingest" to "store,bulk: log when delaying AddSSTable, collect + log more timings in bulk-ingest" Sep 30, 2019
@dt dt requested review from ajwerner and thoszhang September 30, 2019 13:53
Contributor

@ajwerner ajwerner left a comment

:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang)

@dt
Member Author

dt commented Sep 30, 2019

bors r+

craig bot pushed a commit that referenced this pull request Sep 30, 2019
40493: sql: Display inherited constraints in SHOW PARTITIONS  r=andreimatei a=rohany

SHOW PARTITIONS now displays the inherited zone configuration of the
partitions in a separate column. To accomplish this, the
crdb_internal.zones table now holds on to the inherited constraints of
each zone in a separate column. Additionally, the
crdb_internal.partitions table holds on to the zone_id and subzone_id of
the zone configuration the partition refers to. These IDs correspond to
the zone configuration at the lowest point in that partition's
"inheritance chain".

Release justification: Adds a low risk, good to have UX feature.

Fixes #40349.

Release note (sql change):
* SHOW PARTITIONS now displays inherited zone configurations.
* Adds the zone_id, subzone_id columns to crdb_internal.partitions,
which form a link to the corresponding zone config in
crdb_internal.zones which apply to the partitions.
* Rename the config_yaml, config_sql and config_proto columns in
crdb_internal.zones to raw_config_yaml, raw_config_sql,
raw_config_proto.
* Add the columns full_config_sql and full_config_yaml to the
crdb_internal.zones table which display the full/inherited zone
configuration.

41138: movr: Add stats collection to movr workload run r=danhhz a=rohany

This PR adds tracking stats for each kind of query in the movr workload
so that output is displayed from cockroach workload run. Additionally,
this refactors the movr workload to define the work as functions on a
worker struct. This hopefully will avoid a common gotcha of having
different workers sharing the same non-thread-safe histograms object.

Release justification: low risk nice to have feature

Release note: None

41196: store,bulk: log when delaying AddSSTable, collect + log more timings in bulk-ingest r=dt a=dt

storage: log when AddSSTable requests are delayed

If the rate-limiting and back-pressure mechanisms kick in, they can dramatically delay requests in some cases.
However, it can currently be unclear that this is happening, and the system may simply appear slow.
Logging when requests are delayed by more than a second should help identify when this is the cause of slowness.

Release note: none.

Release justification: low-risk (logging only) change that could significantly help in diagnosing 'stuck' jobs based on logs (which are often all we have to go on).

bulk: track and log more timings

This tracks and logs time spent in the various stages of ingestion - sorting, splitting and flushing.
This helps when trying to diagnose why a job is 'slow' or 'stuck'.

Release note: none.

Release justification: low-risk (logging only) changes that improve ability to diagnose problems.


Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: David Taylor <[email protected]>
@craig craig bot merged commit ea0b462 into cockroachdb:master Sep 30, 2019
@craig
Contributor

craig bot commented Sep 30, 2019

Build succeeded

3 participants