• Postgresql equivalent of Python's timeit?

    From Albert-Jan Roskam@21:1/5 to All on Fri Sep 15 17:42:06 2023
    Hi,
    This is more related to Postgresql than to Python, I hope this is ok.
    I want to measure Postgres queries N times, much like Python timeit
    (https://docs.python.org/3/library/timeit.html). I know about EXPLAIN
    ANALYZE and psql \timing, but there's quite a bit of variation in the
    times. Is there a timeit-like function in Postgresql?
    Thanks!
    Albert-Jan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter J. Holzer@21:1/5 to Albert-Jan Roskam via Python-list on Fri Sep 15 19:45:16 2023
    On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
    This is more related to Postgresql than to Python, I hope this is ok.
    I want to measure Postgres queries N times, much like Python timeit
    (https://docs.python.org/3/library/timeit.html). I know about EXPLAIN
    ANALYZE and psql \timing, but there's quite a bit of variation in the
    times. Is there a timeit-like function in Postgresql?

    Why not simply call it n times from Python?

    (But be aware that calling the same query n times in a row is likely to be
    unrealistically fast because most of the data will already be in memory.)
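    A minimal sketch of such a loop might look like the following. The timing
    harness takes any callable; the psycopg2 connection shown in the comment is
    an assumption (any DB-API driver would work the same way), and the table
    name is made up.

```python
import time

def time_n(run_query, n=10):
    """Run run_query() n times and return the elapsed wall-clock
    seconds of each run, like a crude timeit.repeat()."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings

# With a real database you might pass something like (psycopg2 and the
# table name are assumptions, not from the thread):
#
#   import psycopg2
#   conn = psycopg2.connect("dbname=test")
#   cur = conn.cursor()
#   timings = time_n(lambda: cur.execute("SELECT count(*) FROM t"), n=10)

# Stand-in "query" so the sketch runs without a database:
timings = time_n(lambda: sum(range(1000)), n=5)
print(len(timings))
```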

    hp

    --
       _  | Peter J. Holzer    | Story must make more sense than reality.
    |_|_) |                    |
    | |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
    __/   | http://www.hjp.at/ |       challenge!"

  • From Albert-Jan Roskam@21:1/5 to All on Sun Sep 17 11:01:43 2023
    On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
    <python-list@python.org> wrote:

    On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
    >    This is more related to Postgresql than to Python, I hope this is ok.
    >    I want to measure Postgres queries N times, much like Python timeit
    >    (https://docs.python.org/3/library/timeit.html). I know about EXPLAIN
    >    ANALYZE and psql \timing, but there's quite a bit of variation in the
    >    times. Is there a timeit-like function in Postgresql?

    Why not simply call it n times from Python?

    (But be aware that calling the same query n times in a row is likely to be
    unrealistically fast because most of the data will already be in memory.)

    =====
    Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
    don't re-use the same database connection.

  • From avi.e.gross@gmail.com@21:1/5 to All on Sun Sep 17 11:48:46 2023
    Timing things that are fairly simple is hard enough to do repeatably, but
    when it involves access to slower media, and especially network connections
    to servers, the number of things that can change is enormous. There is
    caching at various levels, depending on your hardware, on resource
    contention with other programs running here and there, and on various
    network-like structures, buses, or plain hard disks. As a general rule,
    asking for anything to be repeated multiple times in a row can make your
    results seem slower or faster depending on too many factors, including
    whatever else is running on your machine.

    One approach to running something N times that may average things out a bit
    is simply to put in a pause. Have your program wait a few minutes between
    attempts, and perhaps even do other things within the loop, so that the
    resources you want kept out of a queue have a chance to be flushed as other
    things take their place. Obviously, on a machine or system with lots of
    resources it may take more effort to bring in enough new data to replace
    the old.

    Good luck. Getting reliable numbers is no easy feat, and someone else may
    have trouble duplicating the results with a somewhat different setup.
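    The pause idea above can be sketched like this. The suggestion in the
    thread is minutes between attempts; the much shorter pause here is only so
    the demo finishes quickly, and the jitter is an embellishment, not from the
    thread.

```python
import random
import time

def spaced_runs(run_query, n=5, pause_seconds=1.0):
    """Time n runs of run_query(), sleeping (with a little random
    jitter) between runs so consecutive runs are less likely to hit
    exactly the same warm caches back-to-back."""
    timings = []
    for i in range(n):
        if i:  # no pause before the first run
            time.sleep(pause_seconds + random.uniform(0, pause_seconds))
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in "query" and a tiny pause, purely for demonstration:
timings = spaced_runs(lambda: sum(range(1000)), n=3, pause_seconds=0.01)
print(len(timings))
```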

    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Albert-Jan Roskam via Python-list
    Sent: Sunday, September 17, 2023 5:02 AM
    To: Peter J. Holzer <hjp-python@hjp.at>
    Cc: python-list@python.org
    Subject: Re: Postgresql equivalent of Python's timeit?


  • From Thomas Passin@21:1/5 to AVI GROSS via Python-list on Sun Sep 17 12:23:07 2023
    On 9/17/2023 11:48 AM, AVI GROSS via Python-list wrote:
    [...]

    One thing I have done for timing queries is to construct a series of
    test queries with the query parameters drawn randomly from a large set
    of values. The hope is that random combinations will defeat caching and
    provide a reasonably realistic view of the times.
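    A rough sketch of that approach follows. The parameter pool, the table,
    and the column in the commented query are all made-up names for
    illustration.

```python
import random
import time

# Hypothetical pool of parameter values to draw from.
customer_ids = list(range(1, 10_001))

def time_random_queries(run_query, params, n=20):
    """Time n runs of run_query(value), drawing a fresh random
    parameter each time so consecutive runs are less likely to be
    served entirely from cache."""
    timings = []
    for _ in range(n):
        value = random.choice(params)
        start = time.perf_counter()
        run_query(value)
        timings.append(time.perf_counter() - start)
    return timings

# With a real connection this might be (psycopg2-style placeholder assumed):
#   run = lambda v: cur.execute(
#       "SELECT * FROM orders WHERE customer_id = %s", (v,))
#   timings = time_random_queries(run, customer_ids)

# Stand-in "query" so the sketch runs without a database:
timings = time_random_queries(lambda v: v * 2, customer_ids, n=4)
print(len(timings))
```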

  • From Peter J. Holzer@21:1/5 to Albert-Jan Roskam via Python-list on Sun Sep 17 18:34:17 2023
    On 2023-09-17 11:01:43 +0200, Albert-Jan Roskam via Python-list wrote:
    On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
    <python-list@python.org> wrote:

    [...]

    =====
    Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
    don't re-use the same database connection.

    There is some per-session caching, but the bulk of it is shared between
    sessions or even lives in the operating system. And you wouldn't want to
    get rid of these caches either (which you could do by rebooting, or - a
    bit faster - by restarting postgres and dropping the OS caches via
    /proc/sys/vm/drop_caches on Linux), because that would make the benchmark
    unrealistically slow (unless you want to establish some worst-case
    baseline). During normal operations some data will be cached, but probably
    not all of it, and that will change depending on workload and possibly
    other factors.

    I think Avi's advice to wait for a few minutes between repetitions is
    good. Of course that means that you can't just time the whole thing but
    have to time each query separately and then compute the average. (On the
    bright side that also gives you the opportunity to compute standard
    deviation, min, max, quantiles, etc.)
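    Once you have a list of per-query timings, those summary statistics are a
    few lines with the standard library (the numbers below are fabricated for
    illustration):

```python
import statistics

# Example per-query timings in seconds (made-up values).
timings = [0.012, 0.011, 0.015, 0.010, 0.013, 0.011]

print(f"mean   {statistics.mean(timings):.4f}s")
print(f"stdev  {statistics.stdev(timings):.4f}s")
print(f"min    {min(timings):.4f}s")
print(f"max    {max(timings):.4f}s")

# quantiles() with n=4 returns the three quartile cut points;
# the middle one is the median.
quartiles = statistics.quantiles(timings, n=4)
print(f"median {quartiles[1]:.4f}s")
```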

    hp


  • From Thomas Passin@21:1/5 to Albert-Jan Roskam via Python-list on Sun Sep 17 08:52:30 2023
    On 9/17/2023 5:01 AM, Albert-Jan Roskam via Python-list wrote:
    [...]

    =====
    Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
    don't re-use the same database connection.

    Here is a stack overflow thread that gives a good rundown of ways to get timings from Postgres:

    https://stackoverflow.com/questions/9063402/get-execution-time-of-postgresql-query
