• Table with a variable number of elements in a column

    From Bill Gunshannon@21:1/5 to All on Sat Apr 27 14:59:11 2019
    Not sure what the right terminology for this is, but I will
    provide my example and see if anyone can tell me how I might
    do this.

    I want to create a database table for an index of all my record
    albums.

    The basic stuff is easy.

    Title, Artist, Publisher and the Publisher's ID # as a primary
    unique key.
    But then I get to the hard part.
    Number of tracks and then a list of those tracks.
    This would, obviously, be different and variable from album to album.

    So, I need a way to define a table that has a variable number of fields depending on the value in Number-of-Tracks.

    Is there any way in standard SQL to do this without just having a
    large number of fields and only using the number you need?


    bill

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Gunshannon@21:1/5 to John-Paul Stewart on Sat Apr 27 15:48:08 2019
    On 4/27/19 3:39 PM, John-Paul Stewart wrote:
    On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:


    Not sure what the right terminology for this is, but I will
    provide my example and see if anyone can tell me how I might
    do this.

    I want to create a database table for an index of all my record
    albums.

    The basic stuff is easy.

    Title, Artist, Publisher and the Publisher's ID # as a primary
    unique key.
    But then I get to the hard part.
    Number of tracks and then a list of those tracks.
    This would, obviously, be different and variable from album to album.

    So, I need a way to define a table that has a variable number of fields
    depending on the value in Number-of-Tracks.

    The usual way to do this in a relational database is to have a separate
    table of tracks and define the relationship between the two tables. For example, add a unique album id number to your albums table. (You won't
    need the "number of tracks" field you propose.)

    Well, I am going to want that number anyway. :-)

    Then have a tracks
    table that has album id (which refers back to the albums table), track number, track title, etc. You can then use a JOIN clause in your SQL
    SELECT statement to associate the track and album info with each other,
    or other queries to get the number of tracks in an album (e.g., "select count(*) from tracks where album_id = 1"), or whatever else you need to
    know about it.

    But if I understand this correctly I would need a separate table for
    every album resulting in, potentially, thousands of tables. (OK, in
    my case hundreds but others may like this idea, too, when I finish
    the whole project.)


    You probably want to read up on the concept of "foreign keys" and SQL
    JOIN clauses. That's the usual way to do it in a database, and a big
    part of the relational model.
    Not sure what the "foreign keys" have to do with it, but I understand
    the JOIN part. I was just looking for an efficient way to do this and
    somehow the though of hundreds to thousands of TABLES scares me.

    But, thanks for the info. I will likely end out trying it a couple
    of different ways before deciding which is best.

    bill

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John-Paul Stewart@21:1/5 to Bill Gunshannon on Sat Apr 27 15:39:01 2019
    On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:


    Not sure what the right terminology for this is, but I will
    provide my example and see if anyone can tell me how I might
    do this.

    I want to create a database table for an index of all my record
    albums.

    The basic stuff is easy.

    Title, Artist, Publisher and the Publisher's ID # as a primary
    unique key.
    But then I get to the hard part.
    Number of tracks and then a list of those tracks.
    This would, obviously, be different and variable from album to album.

    So, I need a way to define a table that has a variable number of fields depending on the value in Number-of-Tracks.

    The usual way to do this in a relational database is to have a separate
    table of tracks and define the relationship between the two tables. For example, add a unique album id number to your albums table. (You won't
    need the "number of tracks" field you propose.) Then have a tracks
    table that has album id (which refers back to the albums table), track
    number, track title, etc. You can then use a JOIN clause in your SQL
    SELECT statement to associate the track and album info with each other,
    or other queries to get the number of tracks in an album (e.g., "select count(*) from tracks where album_id = 1"), or whatever else you need to
    know about it.

    You probably want to read up on the concept of "foreign keys" and SQL
    JOIN clauses. That's the usual way to do it in a database, and a big
    part of the relational model.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John-Paul Stewart@21:1/5 to Bill Gunshannon on Sat Apr 27 16:02:31 2019
    On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
    On 4/27/19 3:39 PM, John-Paul Stewart wrote:
    On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:


    Not sure what the right terminology for this is, but I will
    provide my example and see if anyone can tell me how I might
    do this.

    I want to create a database table for an index of all my record
    albums.

    The basic stuff is easy.

    Title, Artist, Publisher and the Publisher's ID # as a primary
    unique key.
    But then I get to the hard part.
    Number of tracks and then a list of those tracks.
    This would, obviously, be different and variable from album to album.

    So, I need a way to define a table that has a variable number of fields
    depending on the value in Number-of-Tracks.

    The usual way to do this in a relational database is to have a separate
    table of tracks and define the relationship between the two tables.  For
    example, add a unique album id number to your albums table.  (You won't
    need the "number of tracks" field you propose.)

    Well, I am going to want that number anyway. :-)

    Yes, but the database can calculate it for you, for each album. Having
    a separate column could lead to problems where the column says something different than the actual number of tracks stored in the table.

                                                      Then have a tracks
    table that has album id (which refers back to the albums table), track
    number, track title, etc.  You can then use a JOIN clause in your SQL
    SELECT statement to associate the track and album info with each other,
    or other queries to get the number of tracks in an album (e.g., "select
    count(*) from tracks where album_id = 1"), or whatever else you need to
    know about it.

    But if I understand this correctly I would need a separate table for
    every album resulting in, potentially, thousands of tables. (OK, in
    my case hundreds but others may like this idea, too, when I finish
    the whole project.)

    No, it's only one "tracks" table. That's why there's an album id column
    in it: so that each track is associated with one album. There'd be
    (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
    on.... Each (album, track) tuple would have to be unique. But you'd
    keep all the tracks for all the albums in one table.


    You probably want to read up on the concept of "foreign keys" and SQL
    JOIN clauses.  That's the usual way to do it in a database, and a big
    part of the relational model.
    Not sure what the "foreign keys" have to do with it,

    The foreign key (in this case, the use of album id in the tracks table)
    is what connects the individual tracks to the albums, and ensures that
    the album id actually exists. By declaring album id (in the tracks
    table) as a foreign key referring to the albums table, Postgres will
    ensure that each track has a corresponding album. Without declaring it
    as a foreign key, you can mistakenly have a track who's album id doesn't
    exist.

    but I understand
    the JOIN part.  I was just looking for an efficient way to do this and somehow the though of hundreds to thousands of TABLES scares me.

    It's just two: albums and tracks.

    But, thanks for the info.  I will likely end out trying it a couple
    of different ways before deciding which is best.

    Another topic to read up on is "database normalization".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John-Paul Stewart@21:1/5 to Bill Gunshannon on Sat Apr 27 17:10:16 2019
    On 2019-04-27 4:29 p.m., Bill Gunshannon wrote:
    On 4/27/19 4:02 PM, John-Paul Stewart wrote:
    On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:

    But if I understand this correctly I would need a separate table for
    every album resulting in, potentially, thousands of tables. (OK, in
    my case hundreds but others may like this idea, too, when I finish
    the whole project.)

    No, it's only one "tracks" table.  That's why there's an album id column
    in it:  so that each track is associated with one album.  There'd be
    (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
    on....  Each (album, track) tuple would have to be unique.  But you'd
    keep all the tracks for all the albums in one table.

    I got that now.
    But I guess I am still stuck with how I define how many tracks columns
    there has to be for each album.  I think I have been away from this for
    too long.  I am missing something that is probably both simple and very apparent.

    By using a tracks table, it's no longer a question of "how many tracks
    columns" but "how many tracks rows" instead. And the answer is "however
    many you populate with the same album id". That's the beauty of it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Gunshannon@21:1/5 to John-Paul Stewart on Sat Apr 27 16:29:47 2019
    On 4/27/19 4:02 PM, John-Paul Stewart wrote:
    On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:
    On 4/27/19 3:39 PM, John-Paul Stewart wrote:
    On 2019-04-27 2:59 p.m., Bill Gunshannon wrote:


    Not sure what the right terminology for this is, but I will
    provide my example and see if anyone can tell me how I might
    do this.

    I want to create a database table for an index of all my record
    albums.

    The basic stuff is easy.

    Title, Artist, Publisher and the Publisher's ID # as a primary
    unique key.
    But then I get to the hard part.
    Number of tracks and then a list of those tracks.
    This would, obviously, be different and variable from album to album.

    So, I need a way to define a table that has a variable number of fields >>>> depending on the value in Number-of-Tracks.

    The usual way to do this in a relational database is to have a separate
    table of tracks and define the relationship between the two tables.  For >>> example, add a unique album id number to your albums table.  (You won't >>> need the "number of tracks" field you propose.)

    Well, I am going to want that number anyway. :-)

    Yes, but the database can calculate it for you, for each album. Having
    a separate column could lead to problems where the column says something different than the actual number of tracks stored in the table.

    Ah yes, I guess I was too tied into the database using the number
    to determine how many fields it needed.


                                                      Then have a tracks
    table that has album id (which refers back to the albums table), track
    number, track title, etc.  You can then use a JOIN clause in your SQL
    SELECT statement to associate the track and album info with each other,
    or other queries to get the number of tracks in an album (e.g., "select
    count(*) from tracks where album_id = 1"), or whatever else you need to
    know about it.

    But if I understand this correctly I would need a separate table for
    every album resulting in, potentially, thousands of tables. (OK, in
    my case hundreds but others may like this idea, too, when I finish
    the whole project.)

    No, it's only one "tracks" table. That's why there's an album id column
    in it: so that each track is associated with one album. There'd be
    (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
    on.... Each (album, track) tuple would have to be unique. But you'd
    keep all the tracks for all the albums in one table.

    I got that now.
    But I guess I am still stuck with how I define how many tracks columns
    there has to be for each album. I think I have been away from this for
    too long. I am missing something that is probably both simple and very apparent.



    You probably want to read up on the concept of "foreign keys" and SQL
    JOIN clauses.  That's the usual way to do it in a database, and a big
    part of the relational model.
    Not sure what the "foreign keys" have to do with it,

    The foreign key (in this case, the use of album id in the tracks table)
    is what connects the individual tracks to the albums, and ensures that
    the album id actually exists. By declaring album id (in the tracks
    table) as a foreign key referring to the albums table, Postgres will
    ensure that each track has a corresponding album. Without declaring it
    as a foreign key, you can mistakenly have a track who's album id doesn't exist.

    Got it, I understand that part now. Like I said, been away from this
    for much too long. (Damn retirement.....)


    but I understand
    the JOIN part.  I was just looking for an efficient way to do this and
    somehow the though of hundreds to thousands of TABLES scares me.

    It's just two: albums and tracks.

    But, thanks for the info.  I will likely end out trying it a couple
    of different ways before deciding which is best.

    Another topic to read up on is "database normalization".


    I will. Amazing how long it takes to learn and how fast you can
    lose the learning when you stop doing it.

    bill

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bill Gunshannon@21:1/5 to John-Paul Stewart on Sat Apr 27 18:40:10 2019
    On 4/27/19 5:10 PM, John-Paul Stewart wrote:
    On 2019-04-27 4:29 p.m., Bill Gunshannon wrote:
    On 4/27/19 4:02 PM, John-Paul Stewart wrote:
    On 2019-04-27 3:48 p.m., Bill Gunshannon wrote:

    But if I understand this correctly I would need a separate table for
    every album resulting in, potentially, thousands of tables. (OK, in
    my case hundreds but others may like this idea, too, when I finish
    the whole project.)

    No, it's only one "tracks" table.  That's why there's an album id column >>> in it:  so that each track is associated with one album.  There'd be
    (album 1, track 1), (album 1, track 2), (album 2, track 1), and so
    on....  Each (album, track) tuple would have to be unique.  But you'd
    keep all the tracks for all the albums in one table.

    I got that now.
    But I guess I am still stuck with how I define how many tracks columns
    there has to be for each album.  I think I have been away from this for
    too long.  I am missing something that is probably both simple and very
    apparent.

    By using a tracks table, it's no longer a question of "how many tracks columns" but "how many tracks rows" instead. And the answer is "however
    many you populate with the same album id". That's the beauty of it.


    Yeah, I went upstairs to watch the golf and think about it and
    it hit me. I was looking at it all wrong. I was trying to label
    the field for each track but being as they are merely sequentially
    numbered values they don't need individual names. Duh...

    Hopefully other stuff will come back to the surface easier than this.

    Thank you very much for kicking me in the head to knock this stuff
    loose so it could float back up to the surface.

    bill

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John-Paul Stewart@21:1/5 to Bill Gunshannon on Sat Apr 27 19:26:48 2019
    On 2019-04-27 6:40 p.m., Bill Gunshannon wrote:

    Yeah, I went upstairs to watch the golf and think about it and
    it hit me.  I was looking at it all wrong. 

    I know the feeling. It's all too easy to get fixated on one idea and
    then "not see the forest for the trees". I've been there, done that far
    more times than I'd care to admit.

    You really don't want to know how badly I mis-used Postgres' arrays in
    the past to achieve the result you initially described before I learned
    about this stuff.

    Thank you very much for kicking me in the head to knock this stuff
    loose so it could float back up to the surface.

    I'm sure there are plenty of topics where I'll need the kick in the
    head. And more where no amount of kicking will help me. That's what I
    like about these newsgroups: they're a great place to get a second opinion.

    Good luck! Your album archive project sounds like more fun than
    anything I'm working on.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From alexander.d.mills@gmail.com@21:1/5 to All on Fri Feb 21 14:02:35 2020
    I have a similar question/answer on StackExchange: https://softwareengineering.stackexchange.com/questions/405567/how-to-use-strict-schema-with-seemingly-fluid-data-type

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christian Barthel@21:1/5 to alexander.d.mills@gmail.com on Sat Feb 22 09:15:15 2020
    alexander.d.mills@gmail.com writes:

    I have a similar question/answer on StackExchange: https://softwareengineering.stackexchange.com/questions/405567/ how-to-use-strict-schema-with-seemingly-fluid-data-type

    I have seen that kind of modelling elsewhere. This is basically
    an EAV (Entity Attribute Value) table? The book [1,2] describes
    it as an anti-pattern which should be avoided because it is
    harder to use, control and query later.

    Example: The email should probably satisfy some constraints but
    it is of type "json" and anything can be stored in it. How about
    typos in the "key" field etc. Note that processing may consume
    more CPU time as well according to Wikipedia [3] - depending on
    the planned size of your database.

    So, before using this pattern, I would think about whether it is
    really necessary to have that flexibility (at the cost of the
    disadvantages) or if it is possible to use normalization theory
    and create relations for the entities.

    | is there a better way to do this other than using JSON for the
    | value column?

    With the example shown, why are you using a "JSON" field instead
    of a text field? If you are using a JSON field, I would store
    all emails in one JSON array field and make (user_id, key)
    unique:

    | id | user_id | key | value | |----|---------|-------|--------------------------------------|
    | 1 | 1 | email | ["foo1@bar.com", "foo2@bar.com", ..] |
    ...

    Why are there more than one email addresses for each user? Do
    they serve a specific purpose? With your design, it seems
    impossible to select some email addresses for certain actions
    (which may be OK?).

    An alternative version might be: use a N:M mapping between
    "user" and "email" (the "email" table contains a list of email
    addresses a user may specify, i.e. "email at work", "private
    mail", etc. and link the user with zero or more email
    addresses).

    (Further N:M mappings might be necessary for other "key"
    values which might lead to a larger number of tables but column
    constraints and queries would be simpler)

    | If not - is there a way to enforce a schema on the JSON
    | somehow?

    You can do this with a trigger that runs before the insertion.
    You have to carefully check the layout of the JSON and the
    constraint of its values (i.e. if the key=email, then the value
    field should have a certain pattern etc.). I would also check
    the "key" field and only allow existing keys (so that it becomes
    possible to add "e-mail", "E-Mail", ... ). PostgreSQL comes with
    nice functions and operators to work with JSON [4].

    | Last question - from my brief research the inverse table design
    | is called an "unpivot" table - but if there is a better name
    | for it please let me know.

    "unpivot" sounds a bit "unspecific". Personally, I'd name it
    "user_meta_data" or "user_contacts" (if there are only email
    addresses).

    [1] D. Fontaine: Mastering PostgreSQL In Application Development
    [2] https://tapoueh.org/blog/2018/03/database-modelization-anti-patterns/#entity-attribute-values
    [3] https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model [4] https://www.postgresql.org/docs/11/functions-json.html
    --
    Christian Barthel <bch@online.de>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)