• Re: LRU cache

    From Chris Angelico@21:1/5 to Dino on Wed Feb 15 09:45:50 2023
    On Wed, 15 Feb 2023 at 09:37, Dino <dino@no.spam.ar> wrote:


    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?


    Check out functools.lru_cache :)
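
    For what it's worth, a minimal sketch of how that decorator is typically
    applied (the function name, its body and maxsize=1000 here are purely
    illustrative):

    import functools

    @functools.lru_cache(maxsize=1000)      # keep at most 1000 distinct results
    def lookup(key):
        # stand-in for whatever expensive computation or I/O is being cached
        return expensive_fetch(key)         # hypothetical backing function

    Repeated calls with the same argument are then served from the cache, and
    the least recently used entry is discarded once maxsize is exceeded.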

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to All on Tue Feb 14 17:07:00 2023
    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant
    entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?

    thanks

    Dino

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to All on Tue Feb 14 18:21:46 2023
    Dino,

    If I understand your question, you want to treat a dictionary as a sort of
    queue with a maximum number of entries, and you want to remove the least
    useful item to make room for any new one.

    Python dictionaries (3.7 and later) keep entries in the order they were
    inserted. There may already be some variant out there that implements this,
    but it should not be hard to create.

    So you could simply change the method that adds an item to the dictionary:
    if the new item is not already in the dictionary and the cache is full,
    remove the first (oldest) item using whatever method you wish.

    Getting all the keys at once can be avoided by using an iterator, as in
    next(iter(my_dict.keys())) or something like that.

    You can then remove that item using the key and insert your new item at the end.

    Of course, that is not least recently used but least recently entered.
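
    A minimal sketch of that least-recently-entered approach (the names and the
    size limit are just illustrative):

    MAX_ENTRIES = 1000
    cache = {}

    def put(key, value):
        if key not in cache and len(cache) >= MAX_ENTRIES:
            oldest = next(iter(cache))       # first key = earliest inserted
            del cache[oldest]
        cache[key] = value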

    Keeping track of what was accessed last (or never) is a problem that is not
    trivially solved with just a dictionary. You could store a tuple for each
    item that includes a timestamp along with the payload as the value, and then
    search all the values to find the one with the oldest timestamp. That seems
    a bit much, so another architecture would be to maintain a second data
    structure holding keys and timestamps, perhaps kept sorted by timestamp, so
    that every time the cache accesses a value it finds the item in that second
    structure, updates its timestamp, and moves the item to the end.

    But note that some may want to keep an access count so you always know how
    many times an item has been re-used and thus not remove them as easily.

    The main goal of a dictionary is to speed up access and make lookups close
    to constant time. Your algorithm can add so much overhead, depending on how
    it is done, that it can defeat the purpose.

    What some might do is skip the dictionary and use some kind of queue, such
    as a collections.deque, that holds your thousand entries: new items are
    added at the end, accessed items are moved to the front, and a brute-force
    search is used to find an entry. But note that some algorithms like that may
    end up removing the newest item immediately, since it is the least recently
    used if it was just placed at the end.
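
    For illustration, a brute-force sketch of that deque idea (keys kept in
    recency order, values in a plain dict; deque.remove() is a linear scan, so
    this only makes sense for small caches):

    from collections import deque

    MAX_ENTRIES = 1000
    order = deque()        # least recently used key at the left end
    values = {}

    def put(key, value):
        if key in values:
            order.remove(key)
        elif len(values) >= MAX_ENTRIES:
            del values[order.popleft()]      # evict least recently used
        order.append(key)
        values[key] = value

    def get(key):
        if key in values:
            order.remove(key)
            order.append(key)                # mark as most recently used
            return values[key]
        return None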

    An OrderedDict may be one solution, as shown here:

    https://www.geeksforgeeks.org/lru-cache-in-python-using-ordereddict/
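
    Roughly that pattern, as a short sketch (the capacity and names here are
    just illustrative):

    from collections import OrderedDict

    class LRU:
        def __init__(self, capacity=1000):
            self.capacity = capacity
            self._d = OrderedDict()

        def get(self, key, default=None):
            if key not in self._d:
                return default
            self._d.move_to_end(key)         # mark as most recently used
            return self._d[key]

        def put(self, key, value):
            self._d[key] = value
            self._d.move_to_end(key)
            if len(self._d) > self.capacity:
                self._d.popitem(last=False)  # evict least recently used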



    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Dino
    Sent: Tuesday, February 14, 2023 5:07 PM
    To: python-list@python.org
    Subject: LRU cache


    Here's my problem today. I am using a dict() to implement a quick and dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant
    entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is there a tool, library or smart trick that will allow me to remove elements with LRU logic?

    thanks

    Dino
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to Dino on Tue Feb 14 20:40:13 2023
    Chris,

    That is a nice decorator solution with some extra features.

    We don't know if the OP needed a cache that was more general purpose and
    could be accessed from multiple points, and shared across multiple
    functions.



    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Chris Angelico
    Sent: Tuesday, February 14, 2023 5:46 PM
    To: python-list@python.org
    Subject: Re: LRU cache

    On Wed, 15 Feb 2023 at 09:37, Dino <dino@no.spam.ar> wrote:


    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements
    (totally arbitrary number), but I would like to have something
    slightly more sophisticated to free up space for newer and potentially
    more relevant entries.

    I am thinking of the Least Recently Used principle, but how to
    implement that is not immediate. Before I embark on reinventing the
    wheel, is there a tool, library or smart trick that will allow me to
    remove elements with LRU logic?


    Check out functools.lru_cache :)

    ChrisA
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mats Wichmann@21:1/5 to Dino on Tue Feb 14 17:36:49 2023
    On 2/14/23 15:07, Dino wrote:

    Here's my problem today. I am using a dict() to implement a quick and
    dirty in-memory cache.

    I am stopping adding elements when I am reaching 1000 elements (totally arbitrary number), but I would like to have something slightly more sophisticated to free up space for newer and potentially more relevant entries.

    I am thinking of the Least Recently Used principle, but how to implement
    that is not immediate. Before I embark on reinventing the wheel, is
    there a tool, library or smart trick that will allow me to remove
    elements with LRU logic?

    It depends here how fancy you want to get. If you really need the
    behavior of a size-limited cache and you expect there to be a lot of
    churn, then you'll probably want a combination of data structures...
    e.g. a dict-like thing for quick hashed lookups (means your "keys" need
    to be hashable) and a doubly-linked list for ordering so you can quickly
    move a hit to the most-recent position - and maybe you want to keep
    cache stats as well.
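
    As a rough sketch of that combination (a dict for hashed lookups plus a
    doubly-linked list for recency, with simple hit/miss counters; all names
    here are illustrative, not a reference implementation):

    class _Node:
        __slots__ = ("key", "value", "prev", "next")

        def __init__(self, key=None, value=None):
            self.key, self.value = key, value
            self.prev = self.next = None

    class LRUCache:
        def __init__(self, capacity=1000):
            self.capacity = capacity
            self._map = {}                  # key -> node (hashed lookup)
            self._head = _Node()            # sentinel: most-recent end
            self._tail = _Node()            # sentinel: least-recent end
            self._head.next = self._tail
            self._tail.prev = self._head
            self.hits = self.misses = 0     # cache stats

        def _unlink(self, node):
            node.prev.next, node.next.prev = node.next, node.prev

        def _push_front(self, node):
            node.prev, node.next = self._head, self._head.next
            self._head.next.prev = node
            self._head.next = node

        def get(self, key, default=None):
            node = self._map.get(key)
            if node is None:
                self.misses += 1
                return default
            self.hits += 1
            self._unlink(node)
            self._push_front(node)          # move hit to most-recent position
            return node.value

        def put(self, key, value):
            node = self._map.get(key)
            if node is not None:
                node.value = value
                self._unlink(node)
            else:
                node = _Node(key, value)
                self._map[key] = node
                if len(self._map) > self.capacity:
                    lru = self._tail.prev   # least recently used node
                    self._unlink(lru)
                    del self._map[lru.key]
            self._push_front(node)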

    Do look at functools if that well-tested standard library module fits
    your needs. If you need, or are motivated (as a learning experience?), to
    roll your own, my memory tells me the RealPython crew did a tutorial on
    implementing an LRU cache, which might be worth a look (not sure if it is
    one of the all-free ones, or one of the must-be-a-paid-subscriber-to-get-the-advanced-stuff ones).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to Mats Wichmann on Wed Feb 15 14:58:37 2023
    Thank you Mats, Avi and Chris

    btw, functools.lru_cache seems rather different from what I need, but
    maybe I am missing something. I'll look closer.

    On 2/14/2023 7:36 PM, Mats Wichmann wrote:
    On 2/14/23 15:07, Dino wrote:


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Weatherby,Gerard@21:1/5 to Mats Wichmann on Fri Feb 17 02:40:21 2023
    I think this does the trick:

    https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


    #!/usr/bin/env python3
    import collections
    import random
    from typing import Hashable, Any, Optional, Dict, Tuple


    class LruCache:
        """Dictionary-like storage of most recently inserted values"""

        def __init__(self, size: int = 1000):
            """:param size: number of cached entries"""
            assert isinstance(size, int)
            self.size = size
            self.insert_counter = 0
            self.oldest = 0
            self._data: Dict[Hashable, Tuple[Any, int]] = {}  # store values and age index
            self._lru: Dict[int, Hashable] = {}  # age counter dictionary

        def insert(self, key: Hashable, value: Any) -> None:
            """Insert into dictionary"""
            existing = self._data.get(key, None)
            self._data[key] = (value, self.insert_counter)
            self._lru[self.insert_counter] = key
            if existing is not None:
                self._lru.pop(existing[1], None)  # remove old counter value, if it exists
            self.insert_counter += 1
            if (sz := len(self._data)) > self.size:  # is cache full?
                assert sz == self.size + 1
                while (key := self._lru.get(self.oldest, None)) is None:
                    # index may not be present, if value was reinserted
                    self.oldest += 1
                del self._data[key]  # remove oldest key / value from dictionary
                del self._lru[self.oldest]
                self.oldest += 1  # next oldest index
            assert len(self._lru) == len(self._data)

        def get(self, key: Hashable) -> Optional[Any]:
            """Get value or return None if not in cache"""
            if (tpl := self._data.get(key, None)) is not None:
                return tpl[0]
            return None


    if __name__ == "__main__":
        CACHE_SIZE = 1000
        TEST_SIZE = 1_000_000
        cache = LruCache(size=CACHE_SIZE)

        all = []
        for i in range(TEST_SIZE):
            all.append(random.randint(-5000, 5000))

        summary = collections.defaultdict(int)
        for value in all:
            cache.insert(value, value * value)
            summary[value] += 1
        smallest = TEST_SIZE
        largest = -TEST_SIZE
        total = 0
        for value, count in summary.items():
            smallest = min(smallest, count)
            largest = max(largest, count)
            total += count
        avg = total / len(summary)
        print(f"{len(summary)} values occurrences range from {smallest} to {largest}, average {avg:.1f}")

        recent = set()  # most recent entries
        for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most recent entries
            value = all[i]
            if len(recent) < CACHE_SIZE:
                recent.add(value)
            if value in recent:
                if (r := cache.get(value)) != value * value:
                    raise ValueError(f"Cache missing recent {value} {r}")
            else:
                if cache.get(value) is not None:
                    raise ValueError(f"Cache includes old {value}")

    From: Python-list <python-list-bounces+gweatherby=uchc.edu@python.org> on behalf of Dino <dino@no.spam.ar>
    Date: Wednesday, February 15, 2023 at 3:07 PM
    To: python-list@python.org <python-list@python.org>
    Subject: Re: LRU cache

    Thank you Mats, Avi and Chris

    btw, functools.lru_cache seems rather different from what I need, but
    maybe I am missing something. I'll look closer.

    On 2/14/2023 7:36 PM, Mats Wichmann wrote:
    On 2/14/23 15:07, Dino wrote:


    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dino@21:1/5 to Gerard on Fri Feb 17 14:09:15 2023
    Thank you, Gerard. I really appreciate your help

    Dino

    On 2/16/2023 9:40 PM, Weatherby,Gerard wrote:
    I think this does the trick:

    https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Albert-Jan Roskam@21:1/5 to All on Sat Feb 18 11:38:56 2023
    I sometimes use this trick, which I learnt from a book by Martelli.
    Instead of try/except, membership testing with "in" (__contains__) might
    be faster. Probably "depends". Matter of measuring.
    def somefunc(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()  # dict.pop() requires a key; popitem() drops one entry
        try:
            return _cache[arg]
        except KeyError:
            result = expensivefunc(arg)
            _cache[arg] = result
            return result
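
    For comparison, the "in" variant mentioned above might look like this
    (expensivefunc is assumed to be defined elsewhere, as in the snippet above):

    def somefunc_in(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()
        if arg in _cache:
            return _cache[arg]
        result = expensivefunc(arg)
        _cache[arg] = result
        return result
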
    Albert-Jan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Passin@21:1/5 to Albert-Jan Roskam on Sat Feb 18 10:29:32 2023
    On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    I sometimes use this trick, which I learnt from a book by Martelli.
    Instead of try/except, membership testing with "in" (__contains__) might
    be faster. Probably "depends". Matter of measuring.
    def somefunc(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.pop()
        try:
            return _cache[arg]
        except KeyError:
            result = expensivefunc(arg)
            _cache[arg] = result
            return result
    Albert-Jan

    _cache.get(arg) should be a little faster and use slightly fewer
    resources than the try/except.
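
    Concretely, that rewrite might look something like this (a sketch; it uses
    None as the "not cached" marker, which only works if None can never be a
    genuine result; see the follow-ups below):

    def somefunc_get(arg, _cache={}):
        if len(_cache) > 10 ** 5:
            _cache.popitem()
        result = _cache.get(arg)
        if result is None:               # treat None as "not cached"
            result = expensivefunc(arg)  # expensivefunc as in the quoted snippet
            _cache[arg] = result
        return result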

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rob Cliffe@21:1/5 to Thomas Passin on Sat Feb 18 16:28:28 2023
    On 18/02/2023 15:29, Thomas Passin wrote:
    On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
        I sometimes use this trick, which I learnt from a book by Martelli.
        Instead of try/except, membership testing with "in" (__contains__) might
        be faster. Probably "depends". Matter of measuring.
        def somefunc(arg, _cache={}):
            if len(_cache) > 10 ** 5:
                _cache.pop()
            try:
                return _cache[arg]
            except KeyError:
                result = expensivefunc(arg)
                _cache[arg] = result
                return result
        Albert-Jan

    _cache.get(arg) should be a little faster and use slightly fewer
    resources than the try/except.

    Provided that you can provide a default value to get() which will never
    be a genuine "result".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Albert-Jan Roskam@21:1/5 to All on Sat Feb 18 18:19:22 2023
    On Feb 18, 2023 17:28, Rob Cliffe via Python-list <python-list@python.org>
    wrote:

    On 18/02/2023 15:29, Thomas Passin wrote:
    > On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    >>     I sometimes use this trick, which I learnt from a book by
    Martelli.
    >>     Instead of try/except, membership testing with "in"
    >> (__contains__) might
    >>     be faster. Probably "depends". Matter of measuring.
    >>     def somefunc(arg, _cache={}):
    >>         if len(_cache) > 10 ** 5:
    >>             _cache.pop()
    >>         try:
    >>             return _cache[arg]
    >>         except KeyError:
    >>             result = expensivefunc(arg)
    >>             _cache[arg] = result
    >>             return result
    >>     Albert-Jan
    >
    > _cache.get(arg) should be a little faster and use slightly fewer
    > resources than the try/except.
    >
    Provided that you can provide a default value to get() which will never
    be a genuine "result".

    =====
    This might be better than None:
    _cache.get(arg, Ellipsis)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rob Cliffe@21:1/5 to Albert-Jan Roskam on Sun Feb 19 00:17:02 2023
    On 18/02/2023 17:19, Albert-Jan Roskam wrote:


    On Feb 18, 2023 17:28, Rob Cliffe via Python-list
    <python-list@python.org> wrote:

    On 18/02/2023 15:29, Thomas Passin wrote:
    > On 2/18/2023 5:38 AM, Albert-Jan Roskam wrote:
    >>     I sometimes use this trick, which I learnt from a book by
    Martelli.
    >>     Instead of try/except, membership testing with "in"
    >> (__contains__) might
    >>     be faster. Probably "depends". Matter of measuring.
    >>     def somefunc(arg, _cache={}):
    >>         if len(_cache) > 10 ** 5:
    >>             _cache.pop()
    >>         try:
    >>             return _cache[arg]
    >>         except KeyError:
    >>             result = expensivefunc(arg)
    >>             _cache[arg] = result
    >>             return result
    >>     Albert-Jan
    >
    > _cache.get(arg) should be a little faster and use slightly fewer
    > resources than the try/except.
    >
    Provided that you can provide a default value to get() which will
    never
    be a genuine "result".


    =====

    This might be better than None:
    _cache.get(arg, Ellipsis)


    A common strategy is to have a dedicated sentinel object.  E.g. (untested):

    IMPOSSIBLE_RESULT = object()
    ...
            if _cache.get(arg, IMPOSSIBLE_RESULT) is IMPOSSIBLE_RESULT:
                # arg was not in the cache
                ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)