• Updating python3-xlrd for pandas 1.5 compatibility

    From Diane Trout@21:1/5 to All on Thu Feb 23 09:00:01 2023
    XPost: linux.debian.maint.python

    Hi,

    The version of python3-xlrd in unstable/testing, 1.2.0-3, is too old to
    be used with pandas 1.5.3 (see Bug #1031701). As it is a really common
    workflow to use pandas to read Excel files, it'd be nice if the version
    of xlrd in bookworm was compatible.

    Because of the freeze I wanted to check if it was appropriate to upload
    the new version, and what kind of warning I should give to the other
    developers.

    The xlrd changelog says the biggest change in going from 1.2 to 2.0 was
    that they removed the ability to read the newer XML-based Excel files
    (.xlsx) from xlrd, in favor of using openpyxl.
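
    In practice this means a caller picks a reader per file type. A minimal
    sketch, assuming pandas is the consumer; the helper names
    pick_excel_engine and read_spreadsheet and the suffix table are
    hypothetical, made up for illustration rather than taken from xlrd or
    pandas:

```python
from pathlib import Path

# Illustrative mapping (an assumption, not an official table): xlrd 2.x only
# reads legacy .xls files, while openpyxl handles the XML-based formats.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_excel_engine(path):
    """Return the pandas engine name to use for a spreadsheet path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported spreadsheet extension: {suffix!r}") from None


def read_spreadsheet(path):
    """Read a spreadsheet with pandas, choosing the engine explicitly."""
    import pandas as pd  # imported lazily so the helper above works without pandas

    return pd.read_excel(path, engine=pick_excel_engine(path))
```

    Passing the engine explicitly makes a missing or too-old xlrd fail at
    the .xls call site instead of affecting every spreadsheet read.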

    I updated the source package python-xlrd to 2.0.1 and sent it through
    experimental, where no issues were detected by packages that have
    CI tests.

    Unfortunately, there are packages without tests.

    Here's the list of packages I found that have any relationship to
    python-xlrd, whether it looked like the autopkgtests actually exercise
    the xlrd library, and what the level of declared dependency is. ("none"
    means the package lacks autopkgtests.)

    | Package | Tests exercise xlrd? | Dependency |
    |---------+----------------------+------------|
    | nemo | none | Recommends |
    | odoo-14 | none | Depends |
    | ofxstatement-plugins | none | Depends |
    | psychopy | unlikely | Depends |
    | python3-agateexcel | yes | Depends |
    | python3-canmatrix | no | Recommends |
    | python3-drslib | no | Recommends |
    | python3-glue | yes | Depends |
    | python3-pyspectral | probably | Suggests |
    | python3-rows | unlikely | Recommends |
    | python3-tablib | unlikely | Depends |
    | visidata | none | Build-Depends |
    | vistrails | none | Build-Depends |
    | python-xrt | none | Build-Depends |
    | pyutilib | none | Build-Depends |

    Thanks
    Diane

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin@21:1/5 to Diane Trout on Thu Feb 23 10:40:01 2023
    XPost: linux.debian.maint.python

    On 2023-02-22 23:12, Diane Trout wrote:
    > | visidata | none | Build-Depends |

    There seems to be no versioned depends or similar in the code.
    Should be safe, but if in doubt, ask Anja (upstream).

    Cheers

  • From Shengjing Zhu@21:1/5 to elbrus@debian.org on Fri Feb 24 20:00:01 2023
    XPost: linux.debian.maint.python

    On Sat, Feb 25, 2023 at 2:33 AM Paul Gevers <elbrus@debian.org> wrote:
    > pandas has a quite extensive autopkgtest, doesn't it
    > cover this use case? Apparently you knew this earlier, why do you bring
    > this up now?

    Seems a bit unfortunate when pandas updates the version.

    https://salsa.debian.org/science-team/pandas/-/blob/debian/1.5.3+dfsg-2/pandas/tests/io/excel/test_xlrd.py#L13
    https://salsa.debian.org/science-team/pandas/-/blob/debian/1.5.3+dfsg-2/debian/tests/control#L51

    No RC bug was filed against python3-xlrd when pandas was updated to
    1.4.3+dfsg-1.

    --
    Shengjing Zhu

  • From Shengjing Zhu@21:1/5 to zhsj@debian.org on Fri Feb 24 20:00:01 2023
    XPost: linux.debian.maint.python

    On Sat, Feb 25, 2023 at 2:51 AM Shengjing Zhu <zhsj@debian.org> wrote:
    >
    > On Sat, Feb 25, 2023 at 2:33 AM Paul Gevers <elbrus@debian.org> wrote:
    > > pandas has a quite extensive autopkgtest, doesn't it
    > > cover this use case? Apparently you knew this earlier, why do you
    > > bring this up now?
    >
    > Seems a bit unfortunate when pandas updates the version.
    >
    > https://salsa.debian.org/science-team/pandas/-/blob/debian/1.5.3+dfsg-2/pandas/tests/io/excel/test_xlrd.py#L13
    > https://salsa.debian.org/science-team/pandas/-/blob/debian/1.5.3+dfsg-2/debian/tests/control#L51

    Reading the comments in debian/tests/control, it seems python3-blosc and
    python3-snappy are also not compatible with this version of pandas.

    --
    Shengjing Zhu

  • From Diane Trout@21:1/5 to Paul Gevers on Sat Feb 25 00:30:01 2023
    XPost: linux.debian.maint.python

    On Fri, 2023-02-24 at 19:33 +0100, Paul Gevers wrote:
    > Hi Diane,
    >
    > On 23-02-2023 08:12, Diane Trout wrote:
    > > the version of python3-xlrd 1.2.0-3 in unstable/testing is too old
    > > to be used with pandas 1.5.3. (See Bug #1031701).
    >
    > Do I understand correctly that this isn't an issue from the point of
    > view of python3-xlrd and that only pandas is affected? While
    > investigating for this reply I noticed src:pandas doesn't even have a
    > dependency in any of its binaries.

    It looks like the xlrd dependency was commented out because the Debian
    version is too old, though apparently that was done 7 months ago.

    https://salsa.debian.org/science-team/pandas/-/blob/main/debian/control#L45

    Here's the pandas module that conditionally uses xlrd if it's
    available.

    https://salsa.debian.org/science-team/pandas/-/blob/main/pandas/io/excel/_xlrd.py
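
    The general pattern is "import the optional library if it is present
    and new enough, otherwise carry on without it". pandas has its own
    helper for this; the sketch below is not that code, only a
    stdlib-only illustration. The names version_tuple and optional_import
    are hypothetical, and it assumes the import name and the distribution
    name are the same (true for xlrd, not for every package):

```python
import importlib
import re
from importlib import metadata


def version_tuple(version):
    """Turn '1.2.0' or '1.2.0-3' into (1, 2, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def optional_import(name, min_version):
    """Return the module if importable and at least min_version, else None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # not installed: the feature is simply unavailable
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None  # version unknown: treat as unusable in this sketch
    if version_tuple(installed) < version_tuple(min_version):
        return None  # installed but too old, e.g. 1.2.0 when 2.0.1 is needed
    return module


# 2.0.1 is the version packaged in experimental in this thread.
xlrd = optional_import("xlrd", "2.0.1")
```

    With xlrd 1.2.0 installed this yields None, which matches the
    situation in bookworm: the library is present but too old to be used.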


    > > As it is a really common workflow to use pandas to read excel files,
    > > it'd be nice if the version of xlrd in bookworm was compatible.
    >
    > As the maintainer of pandas, do you consider it an RC issue that pandas
    > can't convert it? I guess not because you say "it'd be nice" and you
    > don't even have the required dependency. How severe do you consider
    > this issue for pandas? pandas has a quite extensive autopkgtest,
    > doesn't it cover this use case? Apparently you knew this earlier, why
    > do you bring this up now?

    The issue is somewhere between a minor and a normal bug; it breaks a
    small component of the library.

    I wouldn't claim to be a maintainer of pandas. I feel Rebecca Palmer
    has been doing the vast majority of the work keeping pandas updated in
    Debian.

    I started investigating this after my coworker ran into it while trying
    to process an .xls file. When I looked, I saw that someone else had also
    recently filed the same bug report.


    > > Because of the freeze I wanted to check if it was appropriate to
    > > upload the new version,
    >
    > I'd hope that the "rules" are clear:
    > https://release.debian.org/testing/freeze_policy.html#soft
    > You can contact the Release Team if you need further clarification.
    >
    > > and what kind of warning I should give to the other developers.
    >
    > It depends. I'm worried about what you write below.

    That's fair.

    The counterargument is that xlrd's support for the XML-based .xlsx
    files has been unsafe since Python 3.9, and switching to another
    package such as openpyxl to handle xlsx files has been recommended for
    a while.

    (The xlrd release announcement thread mentions the removal and then
    goes on to discuss the security issues:)
    https://groups.google.com/g/python-excel/c/IRa8IWq_4zk/m/Af8-hrRnAgAJ

    The reason the issue doesn't show up much is that .xls files are
    deprecated by nearly everyone, so this only shows up when you're reading
    old data or files generated by old software.

    The reason this is likely a minor issue is that there's a simple
    workaround, which is to convert your xls file to an xlsx file.
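
    One way to script that workaround, as a sketch only: it assumes
    LibreOffice is installed and its soffice binary is on PATH, which is
    not something discussed in this thread. The function names are
    hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(xls_path, out_dir):
    """Return the soffice argv that converts one .xls file to .xlsx."""
    return [
        "soffice",  # assumption: LibreOffice is installed and on PATH
        "--headless",
        "--convert-to", "xlsx",
        "--outdir", str(out_dir),
        str(xls_path),
    ]


def convert_xls_to_xlsx(xls_path, out_dir="."):
    """Run the conversion and return the path of the file soffice writes."""
    xls_path = Path(xls_path)
    subprocess.run(build_convert_command(xls_path, out_dir), check=True)
    # soffice keeps the stem and swaps the extension inside out_dir.
    return Path(out_dir) / (xls_path.stem + ".xlsx")
```

    The resulting .xlsx file can then be read through openpyxl, so xlrd
    is not involved at all.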

    Here's the pandas discussion about deprecating xlrd for xlsx files:
    https://github.com/pandas-dev/pandas/issues/28547


    > > Here's the list of packages I found that have any relationship to
    > > python-xlrd, if it looked like the autopkgtests actually tested using
    > > the xlrd library and what the level of declared dependency is. (none
    > > means the package lacks autopackage tests)
    > >
    > > | nemo                 | none     | Recommends    |
    > > | odoo-14              | none     | Depends       |
    > > | ofxstatement-plugins | none     | Depends       |
    > > | psychopy             | unlikely | Depends       |
    > > | python3-agateexcel   | yes      | Depends       |
    > > | python3-canmatrix    | no       | Recommends    |
    > > | python3-drslib       | no       | Recommends    |
    > > | python3-glue         | yes      | Depends       |
    > > | python3-pyspectral   | probably | Suggests      |
    > > | python3-rows         | unlikely | Recommends    |
    > > | python3-tablib       | unlikely | Depends       |
    > > | visidata             | none     | Build-Depends |
    > > | vistrails            | none     | Build-Depends |
    > > | python-xrt           | none     | Build-Depends |
    > > | pyutilib             | none     | Build-Depends |
    >
    > If I read everything correctly, it seems like you're too late with this
    > change.


    With a bit more wakefulness, I looked through the packages that have
    any dependency on xlrd.

    I think odoo-14 is the package most likely to have issues. They use
    xlrd and seem to expect to be able to read and write xls and xlsx files
    using xlrd, so updating xlrd would break their ability to process xlsx
    files. Of course, the xlrd upstream considers that code path
    unreliable, and I have no idea how important this feature is to odoo.

    (The odoo repository also has tests, and someone could in theory write
    autopkgtests for it.)

    I couldn't figure out what pyspectral is doing.

    These packages (ofxstatement-plugins, psychopy, python3-agateexcel,
    python3-rows, python3-tablib, and visidata) appear to also depend
    on/recommend openpyxl, so they likely use xlrd for .xls files and
    openpyxl for .xlsx files, as xlrd has been recommending.

    python3-canmatrix uses a different package, python3-xlsxwriter, to deal
    with xlsx files:
    https://salsa.debian.org/python-team/packages/python-canmatrix/-/blob/debian/main/setup.py#L104

    Nemo looks to be using xlrd only for older .xls files, and has a
    different tool for the newer files. They seem to be using MIME types,
    with this block for .xlsx files:
    https://salsa.debian.org/search?search=vnd.openxmlformats-officedocument.spreadsheetml.sheet&nav_source=navbar&project_id=17703&group_id=2992&search_code=true&repository_ref=master

    and this block for .xls files:
    https://salsa.debian.org/cinnamon-team/nemo/-/blob/master/search-helpers/mso-xls.nemo_search_helper

    python3-drslib appears to expect to be used on .xls files, judging from
    a look through:
    https://sources.debian.org/src/drslib/0.3.1.p3-2/drslib/p_cmip5/init.py/

    vistrails only lists xlrd as a build dependency. Its tests seem to
    assume it might work with both xls and xlsx files, but the test code in
    the package seems to only test xls files.

    And as an aside, I found that python-xrt should probably remove
    python3-xlrd from its build dependencies, as the package doesn't seem
    to use it.
    https://codesearch.debian.net/search?q=package%3Apython-xrt+xlrd

    Ultimately, the argument that this is a relatively minor feature cuts
    both ways. It suggests the risk of updating is relatively low, but
    also that there's less reason to update.

    Thank you for your time evaluating this request.
    Diane

