Forum: >>> Magnum BBS <<<

Keeping a list of records with named fields that can be updated

From songbird@21:1/5 to All on Wed Dec 14 13:50:14 2022

I'm relatively new to python but not new to programming in general.

The program domain is accounting and keeping track of stock trades and other related information (dates, cash accounts, interest, dividends, transfers of funds, etc.)

Assume that all data is CSV format. There are multiple files.

Assume there is a coherent starting point and that all data is in order.

Assume each line contains a description. The description determines what the line is. The number of fields per line does not change within a data file, but lines in other files may have a different layout; the only constant is that they all must contain a description.

All descriptions are deterministic (none are recursive or referencing things from the future). All things referenced in the description which do not already exist are added to a list (or perhaps more than one in a few cases) and may contain some basic information (the date, how many and for how much, or a total amount or a fee or ...). If the field of the line isn't a number it is either a symbol or a description.

A default action is simply to keep most parts of the line and to adjust any totals of a previously seen description that matches by whatever amounts are on the line. The key is the description.

I've already written one program based upon the files I already have which works but what happens is that new descriptions are added (new accounts, new stocks, etc.) and I don't want to have to write new code manually every time a description changes.

I started using named tuples (it works for reading in the files and accessing the fields) but I cannot update those so I need to use something else to give me the list of unique descriptions and fields that I need to update. I've not gotten beyond that yet as I'm still learning.

Suggestions?

Thanks! :)

songbird

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Ram@21:1/5 to songbird on Wed Dec 14 19:23:11 2022

songbird <songbird@anthive.com> writes:

I started using named tuples (it works for reading in the
files and accessing the fields) but I cannot update those

Instead of named tuples, you could use dictionaries, regular
classes, or data classes. Some like the library "attrs" to
reduce boilerplate code in classes, some like "Pydantic" or
"chili".

Some books that I deem to be ok:

"Object-Oriented Programming in Python Documentation" - a PDF file,
Introduction to Programming Using Python - Y Daniel Liang (2013),
How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
The Coder's Apprentice - Pieter Spronck (2016-09-21), and
Python Programming - John Zelle (2009).

For advanced learners:

Fluent Python - Luciano Ramalho (2015)
Pro Python - James Browning (2014)
The Python Journeyman - Robert Smallshire (2018-01-02)
Python Applications Programming - Wesley Chun (2012-03)
Mastering Object-Oriented Python - Steven F. Lott (2014)

Also, python.org has:

Python Tutorial
The Python Library Reference
The Python Language Reference
(and more ...)

which I like to download as PDF files.

From Thomas Passin@21:1/5 to songbird on Wed Dec 14 22:54:04 2022

Dictionaries and sets are your friends here.
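
As a minimal sketch of that idea: a dict keyed by description can hold the running totals, and a set can track which descriptions have been seen. The column names (date, description, amount), the sample rows, and the `known` set are assumptions made up for illustration; they are not taken from the original post.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample data; the real column layout is an assumption.
SAMPLE = """date,description,amount
2022-12-01,Dividend ACME,12.50
2022-12-02,Transfer to savings,-100.00
2022-12-05,Dividend ACME,7.50
"""

known = {"Dividend ACME"}   # descriptions seen in earlier files (assumed)
seen = set()                # every description found in this file
totals = {}                 # description -> running total

for rec in csv.DictReader(io.StringIO(SAMPLE)):
    desc = rec["description"]
    seen.add(desc)
    # dict.get supplies a zero total the first time a description appears,
    # so a new description needs no new code
    totals[desc] = totals.get(desc, Decimal("0")) + Decimal(rec["amount"])

# set difference reports the descriptions that are new in this file
new_descriptions = seen - known
print(sorted(new_descriptions))
```

Decimal is used here instead of float because the amounts are money; that is a design choice of this sketch, not something the thread prescribes.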

From Peter Otten@21:1/5 to songbird on Thu Dec 15 10:21:14 2022

While I think what you need is a database instead of the collection of
csv files, the way to alter namedtuples is to create a new one:

>>> from collections import namedtuple
>>> Row = namedtuple("Row", "foo bar baz")
>>> row = Row(1, 2, 3)
>>> row._replace(bar=42)
Row(foo=1, bar=42, baz=3)

An alternative would be dataclasses where basic usage is just as easy:

>>> from dataclasses import make_dataclass
>>> Row = make_dataclass("Row", "foo bar baz".split())
>>> row = Row(1, 2, 3)
>>> row
Row(foo=1, bar=2, baz=3)
>>> row.bar = 42
>>> row
Row(foo=1, bar=42, baz=3)

From Weatherby,Gerard@21:1/5 to All on Thu Dec 15 13:00:15 2022

I have a lot of NamedTuples in my codebase, and I no longer add new ones. They were a good option prior to Python 3.7, but dataclasses are much easier to work with and are almost a drop-in substitute.

A combination of a default dictionary and a dataclass might meet your needs:

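import collections
from dataclasses import dataclass

@dataclass
class AccountingEntry:
    description: str
    # other fields

def get_accounting_entries():
    # placeholder: replace with code that parses the CSV files
    return [AccountingEntry("cash"), AccountingEntry("cash")]

# maps each description to the list of entries that carry it
ledger = collections.defaultdict(list)

for ae in get_accounting_entries():
    # append, so earlier entries with the same description are kept
    ledger[ae.description].append(ae)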
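
If the goal is to adjust a running total per description rather than keep every entry, one possible variation is to store a single mutable dataclass per description and update it in place. This is only a sketch: the `Account` class, its `total` and `count` fields, and the sample `lines` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Account:
    description: str
    total: Decimal = Decimal("0")
    count: int = 0

# (description, amount) pairs standing in for parsed CSV lines (assumed data)
lines = [
    ("Dividend ACME", "12.50"),
    ("Fee", "-1.00"),
    ("Dividend ACME", "7.50"),
]

accounts = {}
for description, amount in lines:
    # setdefault creates the record the first time a description appears
    acct = accounts.setdefault(description, Account(description))
    acct.total += Decimal(amount)   # dataclass fields can be updated in place
    acct.count += 1

for acct in accounts.values():
    print(acct)
```

Because the dict is keyed by description, a description that has never been seen before gets its own record automatically.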

From songbird@21:1/5 to Peter Otten on Thu Dec 15 08:41:57 2022

Peter Otten wrote:

While I think what you need is a database instead of the collection of
csv files the way to alter namedtuples is to create a new one:

the files are coming from the web site that stores the
accounts. i already have manually created files from many
years ago with some of the same information but to go back
and reformat all of those would be a lot of work. it is
much easier to just take the files as supplied and process
them if i can do that instead.

i do know database stuff well enough but this is fairly
simple math and i'd like to avoid creating yet another
copy in yet another format to have to deal with.

thanks, i'll give these a try. :)

songbird

From Albert-Jan Roskam@21:1/5 to All on Sat Dec 17 20:45:18 2022

On Dec 15, 2022 10:21, Peter Otten <__peter__@web.de> wrote:

>>> from collections import namedtuple
>>> Row = namedtuple("Row", "foo bar baz")
>>> row = Row(1, 2, 3)
>>> row._replace(bar=42)
Row(foo=1, bar=42, baz=3)

====
Ahh, I always thought these are undocumented methods, but: "In addition to
the methods inherited from tuples, named tuples support three additional
methods and two attributes. To prevent conflicts with field names, the
method and attribute names start with an underscore."
https://docs.python.org/3/library/collections.html#collections.somenamedtuple._make
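
For reference, the three documented methods are `_make`, `_asdict` and `_replace`, and the two attributes are `_fields` and `_field_defaults`. A short sketch of each, reusing the `Row` example from above (the `defaults=[0]` argument is added here only to give `_field_defaults` something to show):

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so only baz gets a default here
Row = namedtuple("Row", "foo bar baz", defaults=[0])

row = Row._make([1, 2, 3])        # build a Row from any iterable
as_dict = row._asdict()           # field name -> value mapping
changed = row._replace(bar=42)    # returns a new Row; row is unchanged

print(Row._fields)                # ('foo', 'bar', 'baz')
print(Row._field_defaults)        # {'baz': 0}
```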

From Gilmeh Serda@21:1/5 to songbird on Sat Dec 17 19:53:34 2022

On Wed, 14 Dec 2022 13:50:14 -0500, songbird wrote:

Suggestions?

Move it to SQLite. Most likely easier to deal with table integrity, like differences between types. And it's probably faster, too.

You can always export to csv later.
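
A rough sketch of what that could look like with the standard library's sqlite3 and csv modules. The table name, the column names and the sample rows are assumptions for illustration only, and REAL is used for the amounts just to keep the example short.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real column layout is an assumption.
SAMPLE = """date,description,amount
2022-12-01,Dividend ACME,12.50
2022-12-02,Fee,-1.00
2022-12-05,Dividend ACME,7.50
"""

con = sqlite3.connect(":memory:")   # pass a file name instead to persist the data
con.execute("CREATE TABLE entries (date TEXT, description TEXT, amount REAL)")

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)                        # skip the header row
con.executemany("INSERT INTO entries VALUES (?, ?, ?)", reader)

# totals per description, whatever descriptions happen to be in the data
totals = con.execute(
    "SELECT description, SUM(amount) FROM entries "
    "GROUP BY description ORDER BY description"
).fetchall()

# export the result back to CSV
out = io.StringIO()
csv.writer(out).writerows(totals)
print(out.getvalue())
```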

--
Gilmeh

The most important design issue... is the fact that Linux is supposed to
be fun... -- Linus Torvalds at the First Dutch International Symposium on
Linux

From songbird@21:1/5 to Peter Otten on Sun Dec 18 10:44:18 2022

Peter Otten wrote:
...

While I think what you need is a database instead of the collection of
csv files the way to alter namedtuples is to create a new one:

>>> from collections import namedtuple
>>> Row = namedtuple("Row", "foo bar baz")
>>> row = Row(1, 2, 3)
>>> row._replace(bar=42)
Row(foo=1, bar=42, baz=3)

namedtuple is easier to use as that will use the csv and
csvreader and create the records without me having to do any
conversion or direct handling myself. it's all automagically
done. my initial version works, but i'd like it to be a bit
more elegant and handle descriptions it hasn't seen before
in a more robust manner.
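
For readers following along, this is roughly what reading CSV rows into namedtuples looks like, together with one possible way to cope with descriptions the program has no special code for. The sample data, the `handlers` dict and `default_handler` are hypothetical and are not songbird's actual program.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data; header names must be valid Python identifiers.
SAMPLE = """date,description,amount
2022-12-01,Dividend ACME,12.50
2022-12-02,Fee,-1.00
"""

reader = csv.reader(io.StringIO(SAMPLE))
Row = namedtuple("Row", next(reader))   # field names come from the header row
rows = [Row._make(fields) for fields in reader]

# special cases live in a dict; anything else falls through to the default
handlers = {"Fee": lambda row: f"fee of {row.amount}"}

def default_handler(row):
    return f"unhandled: {row.description}"

results = [handlers.get(row.description, default_handler)(row) for row in rows]
print(results)
```

With a dispatch dict like this, a new description is handled by the default until someone decides it deserves its own entry.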

An alternative would be dataclasses where basic usage is just as easy:

>>> from dataclasses import make_dataclass
>>> Row = make_dataclass("Row", "foo bar baz".split())
>>> row = Row(1, 2, 3)
>>> row
Row(foo=1, bar=2, baz=3)
>>> row.bar = 42
>>> row
Row(foo=1, bar=42, baz=3)

i do like that i can directly reference each field in a
dataclass and not have to specify a _replace for each change.

is there an easy way to convert from namedtuple to dataclass?
i can see there is a _asdict converter, but don't really like
how that turns out as then i have to do a bunch of:
rec['fieldname'] = blah

rec.fieldname is much easier to understand.

songbird

From Peter Otten@21:1/5 to Albert-Jan Roskam on Mon Dec 19 18:42:43 2022

On 17/12/2022 20:45, Albert-Jan Roskam wrote:

Ahh, I always thought these are undocumented methods, but: "In addition to
the methods inherited from tuples, named tuples support three additional
methods and two attributes. To prevent conflicts with field names, the
method and attribute names start with an underscore."
https://docs.python.org/3/library/collections.html#collections.somenamedtuple._make

I've read somewhere that Raymond Hettinger regrets the naming and now
would favour a trailing underscore to avoid name conflicts.

From Peter Otten@21:1/5 to songbird on Mon Dec 19 19:29:20 2022

I recommend that you use a dataclass /instead/ of a namedtuple, not
both. However, for a dataclass with the same fields in the same order as
in your namedtuple the conversion is trivial:

Create compatible namedtuple and dataclass types:

>>> NTRow = namedtuple("NTRow", ["alpha", "beta", "gamma"])
>>> DCRow = make_dataclass("DCRow", NTRow._fields)

Build the namedtuple:

>>> ntrow = NTRow(1, "two", 3.0)
>>> ntrow
NTRow(alpha=1, beta='two', gamma=3.0)

Convert to dataclass:

>>> dcrow = DCRow(*ntrow)
>>> dcrow
DCRow(alpha=1, beta='two', gamma=3.0)
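
To use a dataclass instead of a namedtuple from the start, the CSV rows can be read straight into a dataclass built from the header row. This is a sketch under assumptions: the column names (symbol, shares, price) and the sample rows are invented, and the type conversions are one plausible choice.

```python
import csv
import io
from dataclasses import make_dataclass

# Hypothetical sample data; header names must be valid Python identifiers.
SAMPLE = """symbol,shares,price
ACME,10,2.50
INIT,5,4.00
"""

reader = csv.reader(io.StringIO(SAMPLE))
Row = make_dataclass("Row", next(reader))   # header row supplies the field names
rows = [Row(*fields) for fields in reader]

# csv delivers every field as a string; convert and update in place
for row in rows:
    row.shares = int(row.shares)
    row.price = float(row.price)

rows[0].shares += 5    # no _replace needed, the record is mutable
print(rows[0])
```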

From songbird@21:1/5 to Peter Otten on Mon Dec 19 14:22:34 2022

thanks, once i get the data in from the file i only have
to reference it, but for the rest of the code i can use
the dataclass instead and that will be easier to read than
dicts. :)

your help is appreciated. :)

songbird
