pkg: Add support for user specified target cols in IMPORT INTO #39023
Conversation
Force-pushed from 3d73368 to 8a49379.
This is an add-on to cockroachdb#39023. It adds the grammar required to support queries such as: `IMPORT INTO test CSV DATA ('...');` Specifying no target columns implies that data is imported into all columns of the table. Release note: None
Force-pushed from 8a49379 to adabb7b.
Reviewed 4 of 4 files at r1, 11 of 12 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)
pkg/ccl/importccl/import_stmt.go, line 440 at r2 (raw file):
    intoCols = append(intoCols, name.String())
    }
    tableDetails = []jobspb.ImportDetails_Table{{Desc: &importing, IsNew: false, TargetCols: intoCols}}
should we be checking that the named cols are actually valid here?
I think we also want to validate that every column that is not targeted either has a default (if we support filling defaults) or has an implicit null default and is nullable; otherwise there is no point in starting the import, right?
pkg/ccl/importccl/import_stmt.go, line 631 at r2 (raw file):
parentID sqlbase.ID, tables map[string]*sqlbase.TableDescriptor, targetCols map[string]*distsqlpb.ReadImportDataSpec_TargetColList,
Minor nit here/throughout: as a reader, I had a little trouble with this type and knowing what it was/how to read it at first glance. I managed to figure it out, but the parts I struggled with were: what are the keys (a string could be anything)? And is it keyed the same as the tables map right above it? If so, should we just change that map to contain structs? If not, when/why is it different? Also, if `*distsqlpb.ReadImportDataSpec_TargetColList` is just a long-winded `[]string`, maybe we could use that for internal calls and just wrap it in the proto message when building the spec proto. Anyway, this is all just personal preference / style opinion, and I realize you've spent more time in here figuring out what you need to plumb through than I have, so I'm fine with it as-is if you are.
pkg/sql/row/row_converter.go, line 231 at r3 (raw file):
    var txCtx transform.ExprTransformContext
    // Although we don't yet support DEFAULT expressions on visible columns,
can you update this comment to clarify how targeting fewer columns works w.r.t. defaults?
I think I gather that we don't support DEFAULT yet (we should for non-targeted cols) but what does that mean?
Force-pushed from adabb7b to 00a8a2e.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt)
pkg/ccl/importccl/import_stmt.go, line 440 at r2 (raw file):
Previously, dt (David Taylor) wrote…
should we be checking that the named cols are actually valid here?
I think we also want to validate that every column that is not targeted either has a default (if we support filling defaults) or has an implicit null default and is nullable; otherwise there is no point in starting the import, right?
Done. Added a validity check for named cols, and also added a check to disallow DEFAULT expressions and non-nullable non-target columns.
pkg/ccl/importccl/import_stmt.go, line 631 at r2 (raw file):
Previously, dt (David Taylor) wrote…
Minor nit here/throughout: as a reader, I had a little trouble with this type and knowing what it was/how to read it at first glance. I managed to figure it out, but the parts I struggled with were: what are the keys (a string could be anything)? And is it keyed the same as the tables map right above it? If so, should we just change that map to contain structs? If not, when/why is it different? Also, if `*distsqlpb.ReadImportDataSpec_TargetColList` is just a long-winded `[]string`, maybe we could use that for internal calls and just wrap it in the proto message when building the spec proto. Anyway, this is all just personal preference / style opinion, and I realize you've spent more time in here figuring out what you need to plumb through than I have, so I'm fine with it as-is if you are.
Thanks for this suggestion. The tables and targetCols maps did have a 1:1 relation, so I made tables hold structs instead. This let us avoid plumbing the TargetColList separately and made it cleaner.
pkg/sql/row/row_converter.go, line 231 at r3 (raw file):
Previously, dt (David Taylor) wrote…
can you update this comment to clarify how targeting fewer columns works w.r.t. defaults?
I think I gather that we don't support DEFAULT yet (we should for non-targeted cols) but what does that mean?
Done.
@dt RFAL. Addressed comments and added some more tests to ensure we do not support DEFAULT expressions or non-nullable non-target columns.
This change adds the proto definitions required to plumb information about the target columns to be imported into through the import job and the ReadImport processor. Release note: None
This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. TODO: Add the grammar which supports IMPORT INTO without any target columns specified, in which case we import data for all columns. Release note: None
This change adds unit tests for importing into an existing table while specifying a subset of all visible columns as targets. Release note: None
Force-pushed from 00a8a2e to 91aa7c4.
TFTR!
39023: pkg: Add support for user specified target cols in IMPORT INTO r=adityamaru27 a=adityamaru27

This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. (Part of the larger roadmap #26834)

39043: distsqlrun: lookup join efficiency improvements r=jordanlewis a=jordanlewis

- distsqlrun: don't save lookup rows twice
- distsqlrun: stream lookup join output
- distsql: remove unused field from lookup join spec

Previously, lookup join buffered its rendered output rows before emitting any of them, because this used to be a requirement when lookup join was also responsible for doing a second layer of lookups against an index. This was no longer necessary. Now, after the result of a lookup batch is accumulated into memory, rows are rendered and emitted in a row-at-a-time streaming fashion. Also, remove the unused extra index filter expression field from the lookup join spec.

39118: opt: add IndexOrdinal alias r=RaduBerinde a=RaduBerinde

Adding a `cat.IndexOrdinal` alias to make it more explicit when a value is an index ordinal. It is only an alias because a separate type would make things like loops more annoying. Release note: None

39146: sql/tree: store From as value in SelectClause AST node r=nvanbenschoten a=nvanbenschoten

Drive-by cleanup. The value is assumed to be non-nil in a number of places, so we should avoid the unnecessary indirection. Release note: None

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded.