• Keeping a list of records with named fields that can be updated

    From songbird@21:1/5 to All on Wed Dec 14 13:50:14 2022
    I'm relatively new to Python but not new to programming in general.

    The program domain is accounting and keeping track of stock trades and other related information (dates, cash accounts, interest, dividends, transfers of funds, etc.)

    Assume that all data is in CSV format. There are multiple files.

    Assume there is a coherent starting point and that all data is in order.

    Assume each line contains a description. The description determines what the line is. The number of fields per line does not change within a data file, but lines in other files may have a different number of fields; the only constant is that every line must contain a description.
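A minimal sketch of reading several CSV files whose column sets differ, assuming each file has a header row and that the description lives in a column named `Description` (a hypothetical name). `csv.DictReader` keys each line by its own file's header, so a file with extra columns needs no new code. The sample data is made up, and `io.StringIO` stands in for `open(path, newline="")`.

```python
import csv
import io

# Hypothetical sample data: two "files" with different column sets.
# Only the "Description" column is assumed to exist in every file.
FILES = {
    "2021.csv": "Date,Description,Amount\n2021-01-04,Deposit,100.00\n",
    "2022.csv": "Date,Description,Symbol,Quantity,Amount\n"
                "2022-03-01,Buy,ABC,10,-250.00\n",
}


def read_rows(files):
    """Yield one dict per CSV line, whatever columns each file has."""
    for name, text in files.items():
        # Real code would use open(name, newline="") here.
        reader = csv.DictReader(io.StringIO(text))
        if "Description" not in reader.fieldnames:
            raise ValueError(f"{name}: no Description column")
        for row in reader:
            yield row


rows = list(read_rows(FILES))
```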

    All descriptions are deterministic (none are recursive or reference things from the future). Anything referenced in a description that does not already exist is added to a list (or, in a few cases, to more than one) and may carry some basic information (the date, how many and for how much, or a total amount or a fee or ...). If a field on the line isn't a number, it is either a symbol or a description.

    The default action is simply to keep most parts of the line and to adjust the totals of any previously seen matching description by whatever amounts are on the line. The key is the description.
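The default action described above maps naturally onto a dictionary keyed by description. A minimal sketch, assuming each line has already been reduced to a (description, amount) pair; the sample lines are hypothetical. `defaultdict(Decimal)` starts an unseen description at zero, and `Decimal` avoids binary floating-point rounding in money totals.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (description, amount) pairs taken from CSV lines.
lines = [
    ("Dividend ABC", "12.50"),
    ("Interest", "0.31"),
    ("Dividend ABC", "13.10"),
]

# An unseen description gets Decimal(0) on first use, so new accounts
# or stocks need no new code.
totals = defaultdict(Decimal)
for description, amount in lines:
    totals[description] += Decimal(amount)
```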

    I've already written one program, based on the files I already have, which works; the problem is that new descriptions get added (new accounts, new stocks, etc.), and I don't want to have to write new code manually every time a description changes.

    I started using named tuples (they work for reading in the files and accessing the fields), but I cannot update them, so I need something else to hold the list of unique descriptions and the fields that I need to update. I haven't gotten beyond that yet, as I'm still learning.

    Suggestions?

    Thanks! :)


    songbird

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to songbird on Wed Dec 14 19:23:11 2022
    songbird <songbird@anthive.com> writes:
    I started using named tuples (it works for reading in the
    files and accessing the fields) but I cannot update those

    Instead of named tuples, you could use dictionaries, regular
    classes, or data classes. Some like the library "attrs" to
    reduce boilerplate code in classes, some like "Pydantic" or
    "chili".
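A small sketch of the standard-library options listed above (a dict, a regular class, and a data class), using a hypothetical record with made-up field names. All three allow in-place updates, which a namedtuple does not. attrs, Pydantic, and chili are third-party packages and are not shown.

```python
from dataclasses import dataclass

# The same hypothetical record in three mutable forms.
as_dict = {"description": "Buy ABC", "quantity": 10}


class AsClass:
    def __init__(self, description, quantity):
        self.description = description
        self.quantity = quantity


@dataclass
class AsDataclass:
    description: str
    quantity: int


as_obj = AsClass("Buy ABC", 10)
as_dc = AsDataclass("Buy ABC", 10)

# Each form can be updated in place.
as_dict["quantity"] += 5
as_obj.quantity += 5
as_dc.quantity += 5

# The data class also generates __repr__ and __eq__ for free.
print(as_dc)  # AsDataclass(description='Buy ABC', quantity=15)
```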

    Some books that I deem to be ok:

    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).

    For advanced learners:

    Fluent Python - Luciano Ramalho (2015)
    Pro Python - James Browning (2014)
    The Python Journeyman - Robert Smallshire (2018-01-02)
    Python Applications Programming - Wesley Chun (2012-03)
    Mastering Object-Oriented Python - Steven F. Lott (2014)

    And python.org has:

    Python Tutorial
    The Python Library Reference
    The Python Language Reference
    (and more ...)

    which I like to download as PDF files.

  • From Thomas Passin@21:1/5 to songbird on Wed Dec 14 22:54:04 2022
    Dictionaries and sets are your friends here.
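One way to read that advice, as a sketch with hypothetical descriptions and amounts: a set records which descriptions have been seen, set difference picks out the new ones in each batch of lines, and a dict groups the lines by description.

```python
# Descriptions seen so far (hypothetical) and the lines grouped under each.
known = {"Interest", "Dividend ABC"}
records = {d: [] for d in known}

# A new batch of (description, amount) pairs from the next CSV file.
batch = [("Interest", "0.40"), ("Buy XYZ", "-99.00"), ("Interest", "0.41")]

# Set difference finds descriptions that have never been seen before.
new = {d for d, _ in batch} - known
print(sorted(new))  # ['Buy XYZ']

# Group the lines; setdefault creates a list for any new description.
for description, amount in batch:
    records.setdefault(description, []).append(amount)
known |= new
```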

  • From Peter Otten@21:1/5 to songbird on Thu Dec 15 10:21:14 2022
    While I think what you need is a database instead of the collection of
    csv files, the way to alter namedtuples is to create a new one:

    >>> from collections import namedtuple
    >>> Row = namedtuple("Row", "foo bar baz")
    >>> row = Row(1, 2, 3)
    >>> row._replace(bar=42)
    Row(foo=1, bar=42, baz=3)
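Note that `_replace()` returns a new tuple and leaves `row` unchanged, so the result has to be stored somewhere. A sketch, reusing the `Row` type above, of keeping records in a dict keyed by description (the key "Interest" is hypothetical) and writing the replacement back:

```python
from collections import namedtuple

Row = namedtuple("Row", "foo bar baz")

row = Row(1, 2, 3)
new_row = row._replace(bar=42)  # a new tuple; row itself is untouched
print(row.bar, new_row.bar)     # prints: 2 42

# Records keyed by a (hypothetical) description: rebind the key.
records = {"Interest": Row(1, 2, 3)}
records["Interest"] = records["Interest"]._replace(bar=42)
```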

    An alternative would be dataclasses where basic usage is just as easy:

    >>> from dataclasses import make_dataclass
    >>> Row = make_dataclass("Row", "foo bar baz".split())
    >>> row = Row(1, 2, 3)
    >>> row
    Row(foo=1, bar=2, baz=3)
    >>> row.bar = 42
    >>> row
    Row(foo=1, bar=42, baz=3)
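Since `make_dataclass` takes the field names at run time, the record type can also be built from a CSV file's header, which may help when columns differ between files. A sketch, assuming every column name is a valid Python identifier and every line has exactly as many values as the header; the sample text is made up. The values stay strings, as the csv module returns them.

```python
import csv
import io
from dataclasses import make_dataclass

# Hypothetical CSV text; real code would use open(path, newline="").
text = "Date,Description,Amount\n2022-12-14,Interest,0.31\n"

reader = csv.DictReader(io.StringIO(text))
# Build the record type from this file's header row.
Record = make_dataclass("Record", reader.fieldnames)

# Each row dict's keys match the field names, so pass them as keywords.
records = [Record(**row) for row in reader]
records[0].Amount = "0.35"  # fields are ordinary, mutable attributes
```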

  • From Weatherby,Gerard@21:1/5 to All on Thu Dec 15 13:00:15 2022
    I have a lot of NamedTuples in my codebase, but I no longer add new ones. They were a good option prior to Python 3.7, but dataclasses are much easier to work with and are almost a drop-in substitute.
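A sketch of what "almost a drop-in substitute" can mean in practice, using a hypothetical trade record: the class body and constructor call are the same for `typing.NamedTuple` and `@dataclass`, and the dataclass can be updated in place, but only the NamedTuple is itself a tuple, so unpacking a dataclass needs `dataclasses.astuple()`.

```python
from dataclasses import astuple, dataclass
from typing import NamedTuple


class TradeNT(NamedTuple):
    symbol: str
    shares: int


@dataclass
class TradeDC:
    symbol: str
    shares: int


nt = TradeNT("ABC", 10)
dc = TradeDC("ABC", 10)  # same constructor call

dc.shares += 5  # dataclass: update in place

raised = False
try:
    nt.shares += 5  # NamedTuple: fields are read-only
except AttributeError:
    raised = True

# A NamedTuple unpacks directly; a dataclass goes through astuple().
symbol, shares = nt
dc_symbol, dc_shares = astuple(dc)
```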

    A combination of a default dictionary and a dataclass might meet your needs:



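    import collections
    from dataclasses import dataclass


    @dataclass
    class AccountingEntry:
        description: str
        # other fields


    # Each description maps to the list of entries seen for it.
    ledger = collections.defaultdict(list)

    # get_accounting_entries() stands for whatever reads your CSV lines.
    for ae in get_accounting_entries():
        ledger[ae.description].append(ae)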



  • From songbird@21:1/5 to Peter Otten on Thu Dec 15 08:41:57 2022
    Peter Otten wrote:

    While I think what you need is a database instead of the collection of
    csv files the way to alter namedtuples is to create a new one:

    the files are coming from the web site that stores the
    accounts. i already have manually created files from many
    years ago with some of the same information but to go back
    and reformat all of those would be a lot of work. it is
    much easier to just take the files as supplied and process
    them if i can do that instead.

    i do know database stuff well enough but this is fairly
    simple math and i'd like to avoid creating yet another
    copy in yet another format to have to deal with.


    thanks, i'll give these a try. :)


    songbird

  • From Albert-Jan Roskam@21:1/5 to All on Sat Dec 17 20:45:18 2022
    On Dec 15, 2022 10:21, Peter Otten <__peter__@web.de> wrote:

    >>> from collections import namedtuple
    >>> Row = namedtuple("Row", "foo bar baz")
    >>> row = Row(1, 2, 3)
    >>> row._replace(bar=42)
    Row(foo=1, bar=42, baz=3)

    ====
    Ahh, I always thought these are undocumented methods, but: "In addition to
    the methods inherited from tuples, named tuples support three additional
    methods and two attributes. To prevent conflicts with field names, the
    method and attribute names start with an underscore."
    https://docs.python.org/3/library/collections.html#collections.somenamedtuple._make
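For reference, a quick tour of those documented extras (`_make`, `_asdict`, `_replace`, `_fields`, `_field_defaults`), using a variant of the `Row` above with a default for `baz`. The `defaults` argument needs Python 3.7+, and `_asdict()` returns a plain dict from 3.8 on.

```python
from collections import namedtuple

Row = namedtuple("Row", "foo bar baz", defaults=[0])

row = Row._make(["1", "2", "3"])  # build from any iterable, e.g. a CSV line
print(row._asdict())              # {'foo': '1', 'bar': '2', 'baz': '3'}
print(row._replace(baz="9"))      # Row(foo='1', bar='2', baz='9')
print(Row._fields)                # ('foo', 'bar', 'baz')
print(Row._field_defaults)        # {'baz': 0}
print(Row("a", "b"))              # Row(foo='a', bar='b', baz=0)
```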

  • From Gilmeh Serda@21:1/5 to songbird on Sat Dec 17 19:53:34 2022
    On Wed, 14 Dec 2022 13:50:14 -0500, songbird wrote:

    Suggestions?

    Move it to SQLite. It will most likely be easier to deal with table integrity, such as differences between types, and it's probably faster, too.

    You can always export to csv later.
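A minimal sketch of that route with the standard `sqlite3` module, assuming a `Description` and an `Amount` column (hypothetical names) and storing amounts as integer cents to keep money arithmetic exact. It uses an in-memory database; passing a file path to `sqlite3.connect()` would keep the data between runs. The last step writes the per-description totals back out as CSV.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Hypothetical CSV text; real code would open the downloaded files.
text = "Description,Amount\nInterest,0.31\nDividend ABC,12.50\nInterest,0.40\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lines (description TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
)

# Load every line, converting "12.50" to 1250 cents
# (assumes at most two decimal places).
reader = csv.DictReader(io.StringIO(text))
conn.executemany(
    "INSERT INTO lines VALUES (?, ?)",
    ((r["Description"], int(Decimal(r["Amount"]) * 100)) for r in reader),
)

# Totals per description; new descriptions are included automatically.
query = (
    "SELECT description, SUM(amount_cents) FROM lines "
    "GROUP BY description ORDER BY description"
)
totals = dict(conn.execute(query))

# Export the totals back to CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["description", "total_cents"])
writer.writerows(conn.execute(query))
```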

    --
    Gilmeh

  • From songbird@21:1/5 to Peter Otten on Sun Dec 18 10:44:18 2022
    Peter Otten wrote:
    ...
    While I think what you need is a database instead of the collection of
    csv files the way to alter namedtuples is to create a new one:

    namedtuple is easier to use, as together with the csv
    module's reader it creates the records without me having to
    do any conversion or direct handling myself. it's all
    automagically done. my initial version works, but i'd like
    it to be a bit more elegant and handle descriptions it hasn't
    seen before in a more robust manner.
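For comparison, a sketch showing that the csv module fits a dataclass just as directly: the usual namedtuple idiom `map(NTRow._make, csv.reader(f))` has the counterpart `DCRow(*fields)`. The field names and sample lines are hypothetical, and the data has no header row.

```python
import csv
import io
from collections import namedtuple
from dataclasses import make_dataclass

# Hypothetical header-less CSV text; real code would use open(path, newline="").
text = "2022-12-14,Interest,0.31\n2022-12-15,Buy ABC,-250.00\n"

NTRow = namedtuple("NTRow", ["date", "description", "amount"])
DCRow = make_dataclass("DCRow", NTRow._fields)

# namedtuple: _make() builds a record from each list of strings.
nt_rows = list(map(NTRow._make, csv.reader(io.StringIO(text))))

# dataclass: unpack each list into the constructor instead.
dc_rows = [DCRow(*fields) for fields in csv.reader(io.StringIO(text))]
dc_rows[1].amount = "-245.00"  # and update fields in place
```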


    An alternative would be dataclasses where basic usage is just as easy:

    i do like that i can directly reference each field in a
    dataclass and not have to specify a _replace for each change.

    is there an easy way to convert from namedtuple to dataclass?
    i can see there is a _asdict converter, but don't really like
    how that turns out as then i have to do a bunch of:
    rec['fieldname'] = blah

    rec.fieldname is much easier to understand.


    songbird

  • From Peter Otten@21:1/5 to Albert-Jan Roskam on Mon Dec 19 18:42:43 2022
    On 17/12/2022 20:45, Albert-Jan Roskam wrote:
    Ahh, I always thought these are undocumented methods, but: "In addition to
    the methods inherited from tuples, named tuples support three additional
    methods and two attributes. To prevent conflicts with field names, the
    method and attribute names start with an underscore."
    https://docs.python.org/3/library/collections.html#collections.somenamedtuple._make

    I've read somewhere that Raymond Hettinger regrets the naming and now
    would favour a trailing underscore to avoid name conflicts.

  • From Peter Otten@21:1/5 to songbird on Mon Dec 19 19:29:20 2022
    On 18/12/2022 16:44, songbird wrote:
    is there an easy way to convert from namedtuple to dataclass?
    i can see there is a _asdict converter, but don't really like
    how that turns out as then i have to do a bunch of:
    rec['fieldname'] = blah

    rec.fieldname is much easier to understand.

    I recommend that you use a dataclass /instead/ of a namedtuple, not
    both. However, for a dataclass with the same fields in the same order as
    in your namedtuple the conversion is trivial:

    Create compatible namedtuple and dataclass types:

    >>> from collections import namedtuple
    >>> from dataclasses import make_dataclass
    >>> NTRow = namedtuple("NTRow", ["alpha", "beta", "gamma"])
    >>> DCRow = make_dataclass("DCRow", NTRow._fields)

    Build the namedtuple:

    >>> ntrow = NTRow(1, "two", 3.0)
    >>> ntrow
    NTRow(alpha=1, beta='two', gamma=3.0)

    Convert to dataclass:

    >>> dcrow = DCRow(*ntrow)
    >>> dcrow
    DCRow(alpha=1, beta='two', gamma=3.0)
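`DCRow(*ntrow)` matches fields by position. If the two field orders might differ, one alternative is to unpack `_asdict()` as keyword arguments; the dict is only used to build the dataclass, so the rest of the code can still write `rec.fieldname`. A sketch with a hypothetical, deliberately reordered `DCRow`:

```python
from collections import namedtuple
from dataclasses import make_dataclass

NTRow = namedtuple("NTRow", ["alpha", "beta", "gamma"])
# Hypothetical dataclass with the same fields in a different order.
DCRow = make_dataclass("DCRow", ["gamma", "alpha", "beta"])

ntrow = NTRow(1, "two", 3.0)
dcrow = DCRow(**ntrow._asdict())  # match fields by name, not position
print(dcrow)  # DCRow(gamma=3.0, alpha=1, beta='two')
```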

  • From songbird@21:1/5 to Peter Otten on Mon Dec 19 14:22:34 2022
    Peter Otten wrote:
    ...
    I recommend that you use a dataclass /instead/ of a namedtuple, not
    both. However, for a dataclass with the same fields in the same order as
    in your namedtuple the conversion is trivial:

    thanks, once i get the data in from the file i only have
    to reference it, but for the rest of the code i can use
    the dataclass instead and that will be easier to read than
    dicts. :)

    your help is appreciated. :)


    songbird
