• Re: LRU cache

    From Chris Angelico@21:1/5 to Dino on Wed Feb 15 09:45:50 2023
    On Wed, 15 Feb 2023 at 09:37, Dino <dino@no.spam.ar> wrote:


    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?


    Check out functools.lru_cache :)
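
    For what it's worth, a minimal sketch of how that decorator is typically
    applied (the function name, its body and maxsize=1000 here are purely
    illustrative):

    import functools

    @functools.lru_cache(maxsize=1000)      # keep at most 1000 distinct results
    def lookup(key):
        # stand-in for whatever expensive computation or I/O is being cached
        return expensive_fetch(key)         # hypothetical backing function

    Repeated calls with the same argument are then served from the cache, and
    the least recently used entry is discarded once maxsize is exceeded.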

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to All on Tue Feb 14 17:07:00 2023
    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant
    entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?

    thanks

    Dino

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to All on Tue Feb 14 18:21:46 2023
    Dino,

    If I understand your question, you want to treat a dictionary as a sort of
    queue with a maximum number of entries, and you want to remove the least
    useful item to make room for any new one.

    Python dictionaries (3.7 and later) keep entries in the order they were
    inserted. There may already be some variant out there that implements this,
    but it should not be hard to create.

    So you could simply change the method that adds an item to the dictionary:
    if the new item is not already in the dictionary and the cache is full,
    remove the first (oldest) item using whatever method you wish.

    Getting all the keys at once can be avoided by using an iterator, as in
    next(iter(my_dict.keys())) or something like that.

    You can then remove that item using the key and insert your new item at the end.

    Of course, that is not least recently used but least recently entered.
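
    A minimal sketch of that least-recently-entered approach (the names and the
    size limit are just illustrative):

    MAX_ENTRIES = 1000
    cache = {}

    def put(key, value):
        if key not in cache and len(cache) >= MAX_ENTRIES:
            oldest = next(iter(cache))       # first key = earliest inserted
            del cache[oldest]
        cache[key] = value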

    Keeping track of what was accessed last (or never) is a problem that is not
    trivially solved with just a dictionary. You could store a tuple for each
    item that includes a timestamp along with the payload as the value, and then
    search all the values to find the one with the oldest timestamp. That seems
    a bit much, so another architecture would be to maintain a second data
    structure holding keys and timestamps, perhaps kept sorted by timestamp, so
    that every time the cache accesses a value it finds the item in that second
    structure, updates its timestamp, and moves the item to the end.

    But note that some may want to keep an access count so you always know how
    many times an item has been re-used and thus not remove them as easily.

    The main goal of a dictionary is to speed up access and make lookups close
    to constant time. Your algorithm can add so much overhead, depending on how
    it is done, that it can defeat the purpose.

    What some might do is skip the dictionary and use some kind of queue, such
    as a collections.deque, that holds your thousand entries: new items are
    added at the end, accessed items are moved to the front, and a brute-force
    search is used to find an entry. But note that some algorithms like that may
    end up removing the newest item immediately, since it is the least recently
    used if it was just placed at the end.
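
    For illustration, a brute-force sketch of that deque idea (keys kept in
    recency order, values in a plain dict; deque.remove() is a linear scan, so
    this only makes sense for small caches):

    from collections import deque

    MAX_ENTRIES = 1000
    order = deque()        # least recently used key at the left end
    values = {}

    def put(key, value):
        if key in values:
            order.remove(key)
        elif len(values) >= MAX_ENTRIES:
            del values[order.popleft()]      # evict least recently used
        order.append(key)
        values[key] = value

    def get(key):
        if key in values:
            order.remove(key)
            order.append(key)                # mark as most recently used
            return values[key]
        return None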

    An OrderedDict may be one solution, as shown here:

    https://www.geeksforgeeks.org/lru-cache-in-python-using-ordereddict/
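
    Roughly that pattern, as a short sketch (the capacity and names here are
    just illustrative):

    from collections import OrderedDict

    class LRU:
        def __init__(self, capacity=1000):
            self.capacity = capacity
            self._d = OrderedDict()

        def get(self, key, default=None):
            if key not in self._d:
                return default
            self._d.move_to_end(key)         # mark as most recently used
            return self._d[key]

        def put(self, key, value):
            self._d[key] = value
            self._d.move_to_end(key)
            if len(self._d) > self.capacity:
                self._d.popitem(last=False)  # evict least recently used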



    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Dino
    Sent: Tuesday, February 14, 2023 5:07 PM
    To: python-list@python.org
    Subject: LRU cache


    Here's my problem today. I am using a dict() to implement a quick and dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant
    entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is there a tool, library or smart trick that will allow me to remove elements with LRU logic?

    thanks

    Dino
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to Dino on Tue Feb 14 20:40:13 2023
    Chris,

    That is a nice decorator solution with some extra features.

    We don't know if the OP needed a cache that was more general purpose and
    could be accessed from multiple points, and shared across multiple
    functions.



    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Chris Angelico
    Sent: Tuesday, February 14, 2023 5:46 PM
    To: python-list@python.org
    Subject: Re: LRU cache

    On Wed, 15 Feb 2023 at 09:37, Dino <dino@no.spam.ar> wrote:


    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements
    (totally arbitrary number), but I would like to have something
    slightly more sophisticated to free up space for newer and potentially
    more relevant entries.

    I am thinking of the Least Recently Used principle, but how to
    implement that is not immediate. Before I embark on reinventing the
    wheel, is there a tool, library or smart trick that will allow me to
    remove elements with LRU logic?


    Check out functools.lru_cache :)

    ChrisA
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mats Wichmann@21:1/5 to Dino on Tue Feb 14 17:36:49 2023
    On 2/14/23 15:07, Dino wrote:

    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?

    It depends here how fancy you want to get. If you really need the
    behavior of a size-limited cache and you expect there to be a lot of
    churn, then you'll probably want a combination of data structures...
    e.g. a dict-like thing for quick hashed lookups (means your "keys" need
    to be hashable) and a doubly-linked list for ordering so you can quickly
    move a hit to the most-recent position - and maybe you want to keep
    cache stats as well.
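
    As a rough sketch of that combination (a dict for hashed lookups plus a
    doubly-linked list for recency, with simple hit/miss counters; all names
    here are illustrative, not a reference implementation):

    class _Node:
        __slots__ = ("key", "value", "prev", "next")

        def __init__(self, key=None, value=None):
            self.key, self.value = key, value
            self.prev = self.next = None

    class LRUCache:
        def __init__(self, capacity=1000):
            self.capacity = capacity
            self._map = {}                  # key -> node (hashed lookup)
            self._head = _Node()            # sentinel: most-recent end
            self._tail = _Node()            # sentinel: least-recent end
            self._head.next = self._tail
            self._tail.prev = self._head
            self.hits = self.misses = 0     # cache stats

        def _unlink(self, node):
            node.prev.next, node.next.prev = node.next, node.prev

        def _push_front(self, node):
            node.prev, node.next = self._head, self._head.next
            self._head.next.prev = node
            self._head.next = node

        def get(self, key, default=None):
            node = self._map.get(key)
            if node is None:
                self.misses += 1
                return default
            self.hits += 1
            self._unlink(node)
            self._push_front(node)          # move hit to most-recent position
            return node.value

        def put(self, key, value):
            node = self._map.get(key)
            if node is not None:
                node.value = value
                self._unlink(node)
            else:
                node = _Node(key, value)
                self._map[key] = node
                if len(self._map) > self.capacity:
                    lru = self._tail.prev   # least recently used node
                    self._unlink(lru)
                    del self._map[lru.key]
            self._push_front(node)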

    Do look at functools if that well-tested standard library module fits
    your needs. If you need, or are motivated (as a learning experience?), to
    roll your own, my memory tells me the RealPython crew did a tutorial on
    implementing an LRU cache, which might be worth a look (not sure if it is
    one of the all-free ones, or one of the must-be-a-paid-subscriber-to-get-the-advanced-stuff ones).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to Mats Wichmann on Wed Feb 15 14:58:37 2023
    Thank you Mats, Avi and Chris

    btw, functools.lru_cache seems rather different from what I need, but
    maybe I am missing something. I'll look closer.

    On 2/14/2023 7:36 PM, Mats Wichmann wrote:
    On 2/14/23 15:07, Dino wrote:


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Weatherby,Gerard@21:1/5 to Mats Wichmann on Fri Feb 17 02:40:21 2023
    I think this does the trick:

    https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


    #!/usr/bin/env python3
    import collections
    import random
    from typing import Hashable, Any, Optional, Dict, Tuple


    class LruCache:
        """Dictionary-like storage of most recently inserted values"""

        def __init__(self, size: int = 1000):
            """:param size: number of cached entries"""
            assert isinstance(size, int)
            self.size = size
            self.insert_counter = 0
            self.oldest = 0
            self._data: Dict[Hashable, Tuple[Any, int]] = {}  # store values and age index
            self._lru: Dict[int, Hashable] = {}  # age counter dictionary

        def insert(self, key: Hashable, value: Any) -> None:
            """Insert into dictionary"""
            existing = self._data.get(key, None)
            self._data[key] = (value, self.insert_counter)
            self._lru[self.insert_counter] = key
            if existing is not None:
                self._lru.pop(existing[1], None)  # remove old counter value, if it exists
            self.insert_counter += 1
            if (sz := len(self._data)) > self.size:  # is cache full?
                assert sz == self.size + 1
                while (key := self._lru.get(self.oldest, None)) is None:
                    # index may not be present, if value was reinserted
                    self.oldest += 1
                del self._data[key]  # remove oldest key / value from dictionary
                del self._lru[self.oldest]
                self.oldest += 1  # next oldest index
            assert len(self._lru) == len(self._data)

        def get(self, key: Hashable) -> Optional[Any]:
            """Get value or return None if not in cache"""
            if (tpl := self._data.get(key, None)) is not None:
                return tpl[0]
            return None


    if __name__ == "__main__":
        CACHE_SIZE = 1000
        TEST_SIZE = 1_000_000
        cache = LruCache(size=CACHE_SIZE)

        all = []
        for i in range(TEST_SIZE):
            all.append(random.randint(-5000, 5000))

        summary = collections.defaultdict(int)
        for value in all:
            cache.insert(value, value * value)
            summary[value] += 1
        smallest = TEST_SIZE
        largest = -TEST_SIZE
        total = 0
        for value, count in summary.items():
            smallest = min(smallest, count)
            largest = max(largest, count)
            total += count
        avg = total / len(summary)
        print(f"{len(summary)} values occurrences range from {smallest} to {largest}, average {avg:.1f}")

        recent = set()  # most recent entries
        for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most recent entries
            value = all[i]
            if len(recent) < CACHE_SIZE:
                recent.add(value)
            if value in recent:
                if (r := cache.get(value)) != value * value:
                    raise ValueError(f"Cache missing recent {value} {r}")
            else:
                if cache.get(value) is not None:
                    raise ValueError(f"Cache includes old {value}")

    From: Python-list <python-list-bounces+gweatherby=uchc.edu@python.org> on behalf of Dino <dino@no.spam.ar>
    Date: Wednesday, February 15, 2023 at 3:07 PM
    To: python-list@python.org <python-list@python.org>
    Subject: Re: LRU cache

    Thank you Mats, Avi and Chris

    btw, functools.lru_cache seems rather different from what I need, but
    maybe I am missing something. I'll look closer.

    On 2/14/2023 7:36 PM, Mats Wichmann wrote:
    On 2/14/23 15:07, Dino wrote:


    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to Gerard on Fri Feb 17 14:09:15 2023
    Thank you, Gerard. I really appreciate your help

    Dino

    On 2/16/2023 9:40 PM, Weatherby,Gerard wrote:
    I think this does the trick:

    https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Albert-Jan Roskam@21:1/5 to All on Sat Feb 18 11:38:56 2023
    I sometimes use this trick, which I learnt from a book by Martelli.
    Instead of try/except, membership testing with "in" (__contains__) might
    be faster. Probably "depends". Matter of measuring.
    def somefunc(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()  # dict.pop() requires a key; popitem() drops one entry
        try:
            return _cache[arg]
        except KeyError:
            result = expensivefunc(arg)
            _cache[arg] = result
            return result
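
    For comparison, the "in" variant mentioned above might look like this
    (expensivefunc is assumed to be defined elsewhere, as in the snippet above):

    def somefunc_in(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()
        if arg in _cache:
            return _cache[arg]
        result = expensivefunc(arg)
        _cache[arg] = result
        return result
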
    Albert-Jan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Passin@21:1/5 to Albert-Jan Roskam on Sat Feb 18 10:29:32 2023
    On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    I sometimes use this trick, which I learnt from a book by Martelli.
    Instead of try/except, membership testing with "in" (__contains__) might
    be faster. Probably "depends". Matter of measuring.
    def somefunc(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.pop()
        try:
            return _cache[arg]
        except KeyError:
            result = expensivefunc(arg)
            _cache[arg] = result
            return result
    Albert-Jan

    _cache.get(arg) should be a little faster and use slightly fewer
    resources than the try/except.
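
    Concretely, that rewrite might look something like this (a sketch; it uses
    None as the "not cached" marker, which only works if None can never be a
    genuine result; see the follow-ups below):

    def somefunc_get(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()
        result = _cache.get(arg)
        if result is None:               # treat None as "not cached"
            result = expensivefunc(arg)  # expensivefunc as in the quoted snippet
            _cache[arg] = result
        return result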

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rob Cliffe@21:1/5 to Thomas Passin on Sat Feb 18 16:28:28 2023
    On 18/02/2023 15:29, Thomas Passin wrote:
    On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
        I sometimes use this trick, which I learnt from a book by Martelli.
        Instead of try/except, membership testing with "in" (__contains__) might
        be faster. Probably "depends". Matter of measuring.
        def somefunc(arg, _cache={}):
            if len(_cache) > 10 ** 5:
                _cache.pop()
            try:
                return _cache[arg]
            except KeyError:
                result = expensivefunc(arg)
                _cache[arg] = result
                return result
        Albert-Jan

    _cache.get(arg) should be a little faster and use slightly fewer
    resources than the try/except.

    Provided that you can provide a default value to get() which will never
    be a genuine "result".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Albert-Jan Roskam@21:1/5 to All on Sat Feb 18 18:19:22 2023
    On Feb 18, 2023 17:28, Rob Cliffe via Python-list <python-list@python.org>
    wrote:

    On 18/02/2023 15:29, Thomas Passin wrote:
    > On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    >>     I sometimes use this trick, which I learnt from a book by
    Martelli.
    >>     Instead of try/except, membership testing with "in"
    >> (__contains__) might
    >>     be faster. Probably "depends". Matter of measuring.
    >>     def somefunc(arg, _cache={}):
    >>         if len(_cache) > 10 ** 5:
    >>             _cache.pop()
    >>         try:
    >>             return _cache[arg]
    >>         except KeyError:
    >>             result = expensivefunc(arg)
    >>             _cache[arg] = result
    >>             return result
    >>     Albert-Jan
    >
    > _cache.get(arg) should be a little faster and use slightly fewer
    > resources than the try/except.
    >
    Provided that you can provide a default value to get() which will never
    be a genuine "result".

    =====
    This might be better than None:
    _cache.get(arg, Ellipsis)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rob Cliffe@21:1/5 to Albert-Jan Roskam on Sun Feb 19 00:17:02 2023
    On 18/02/2023 17:19, Albert-Jan Roskam wrote:


    On Feb 18, 2023 17:28, Rob Cliffe via Python-list
    <python-list@python.org> wrote:

    On 18/02/2023 15:29, Thomas Passin wrote:
    > On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    >>     I sometimes use this trick, which I learnt from a book by
    Martelli.
    >>     Instead of try/except, membership testing with "in"
    >> (__contains__) might
    >>     be faster. Probably "depends". Matter of measuring.
    >>     def somefunc(arg, _cache={}):
    >>         if len(_cache) > 10 ** 5:
    >>             _cache.pop()
    >>         try:
    >>             return _cache[arg]
    >>         except KeyError:
    >>             result = expensivefunc(arg)
    >>             _cache[arg] = result
    >>             return result
    >>     Albert-Jan
    >
    > _cache.get(arg) should be a little faster and use slightly fewer
    > resources than the try/except.
    >
    Provided that you can provide a default value to get() which will
    never
    be a genuine "result".


    =====

    This might be better than None:
    _cache.get(arg, Ellipsis)


    A common strategy is to have a dedicated sentinel object.  E.g. (untested):

    IMPOSSIBLE_RESULT = object()
    ...
            if _cache.get(arg, IMPOSSIBLE_RESULT) is IMPOSSIBLE_RESULT:
                # arg was not in the cache
                ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)