• Re: dxtbx pytest issue

    From Neil Williams@21:1/5 to All on Thu Sep 1 14:10:01 2022
    On Thu, 1 Sep 2022 13:32:36 +0200 (CEST)
    PICCA Frederic-Emmanuel <frederic-emmanuel.picca@synchrotron-soleil.fr>
    wrote:

    > Hello,
    >
    > I am still trying to package[2] dxtbx[1], and now I end up with
    > something strange. When I run the test suite during the build, I have
    > a failure like this:
    >
    > tests/test_dataset_as_flex.py ..F..F..F                       [ 2%]

    Looks like you need a -v option to see more detail.

    > I put the error message below, it is quite long.
    >
    > Now, if I execute by hand only the failing test like this, it works:
    >
    > $ pytest-3 tests/test_dataset_as_flex.py

    The 2% indicates that other tests have already executed before the run
    gets to test_dataset_as_flex.py. When you run the file directly, no other
    tests are run first. This could indicate interference between tests (an
    upstream problem): the tests are not sufficiently isolated and are
    affected by which other tests execute before the failing one. pytest has
    support for solving these problems, if the test suite is written
    correctly.
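
    As a rough illustration of what that support can look like (a generic
    sketch, not code from the dxtbx test suite): an autouse fixture in
    conftest.py can snapshot and restore process-global state such as an
    environment variable, so one test cannot leak its changes into the next.
    The fixture name and the choice of HDF5_PLUGIN_PATH here are assumptions
    for illustration only.

    # conftest.py -- illustrative sketch, not taken from dxtbx
    import os
    import pytest

    @pytest.fixture(autouse=True)
    def _restore_hdf5_plugin_path():
        # Remember the pre-test value (or its absence) ...
        saved = os.environ.get("HDF5_PLUGIN_PATH")
        yield
        # ... and put it back after the test, even if the test (or an import
        # it triggered) modified os.environ directly.
        if saved is None:
            os.environ.pop("HDF5_PLUGIN_PATH", None)
        else:
            os.environ["HDF5_PLUGIN_PATH"] = saved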

    It could also be a version problem with any of the build-dependencies
    (like h5py), so check the various requirements*.txt files and see whether
    upstream has a .gitlab-ci.yml or .github config which clarifies exactly
    which versions are used.
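
    A quick way to see exactly which h5py and HDF5 the Debian build provides
    (h5py.version is part of h5py's public API):

    import h5py

    # One summary string with the h5py, HDF5 and NumPy versions
    print(h5py.version.info)
    # Just the HDF5 library version h5py was built against
    print(h5py.version.hdf5_version)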

    You may need to exclude certain tests if, for example, h5py in Debian is
    built with options that dxtbx cannot handle.
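
    If it does come to that, one way to skip tests without patching the test
    files is a collection hook in a conftest.py. This is only a sketch: the
    filter id 32008 comes from the traceback below (the bshuf_lz4 cases), the
    rest is made up for illustration.

    # conftest.py -- illustrative sketch, not part of dxtbx
    import pytest
    from h5py import h5z

    BSHUF_LZ4 = 32008  # HDF5 filter id seen in the failing traceback

    def pytest_collection_modifyitems(config, items):
        # Skip the bshuf_lz4 parametrisations when the HDF5 filter plugin
        # is not available in this environment.
        if h5z.filter_avail(BSHUF_LZ4):
            return
        skip = pytest.mark.skip(reason="bitshuffle/LZ4 HDF5 filter not available")
        for item in items:
            if "bshuf_lz4" in item.name:
                item.add_marker(skip)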

    > Before investigating further, I would like your advice on how to
    > debug this sort of issue.
    >
    > First, what is the difference between
    >
    > pytest and pytest <file>

    pytest scans the directory tree to collect all the tests it can find.

    pytest <file> only collects the tests from that file.

    Check for a pytest.ini in the upstream source tree, and also setup.cfg.

    Use the -v option to see more of what is going on; the --collect-only
    option is useful too.

    See pytest --help; other filtering options are available as well.
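
    For example, the following (run from the package root) collects the whole
    tree the way a bare pytest run would, without executing anything, so you
    can see which tests precede the failing one; pytest.main accepts the same
    options as the command line:

    import pytest

    # Equivalent to: pytest --collect-only -q
    # Lists every collected test id in execution order without running it.
    pytest.main(["--collect-only", "-q"])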

    You'll need a local build environment (fakeroot debian/rules build etc.)
    and to dig into the Python unit test methods to see what is going on.

    > thanks for your help
    >
    > Frederic
    >
    > [1] https://github.com/cctbx/dxtbx
    > [2] https://salsa.debian.org/science-team/dxtbx



    > full error message
    >
    > [quoted traceback omitted; the complete traceback, ending in
    > "ValueError: Unknown compression filter number: 32008", appears in the
    > original report below]



    --
    Neil Williams
    =============
    https://linux.codehelp.co.uk/


  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 13:50:01 2022
    Hello,

    I am still trying to package[2] dxtbx[1], and now I end up with something strange. When I run the test suite during the build, I have a failure like this:

    tests/test_dataset_as_flex.py ..F..F..F                       [ 2%]

    I put the error message below, it is quite long.

    Now, if I execute by hand only the failing test like this, it works:


    $ pytest-3 tests/test_dataset_as_flex.py
    ============================= test session starts ==============================
    platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.0.0+repack
    rootdir: /home/picca/debian/science-team/dxtbx, configfile: pytest.ini
    plugins: requests-mock-1.9.3, forked-1.4.0, xdist-2.5.0, mock-3.8.2, dials-data-2.4.0
    collected 9 items

    tests/test_dataset_as_flex.py .........                                  [100%]

    ============================== 9 passed in 0.61s ===============================

    Before investigating further, I would like your advice on how to debug this sort of issue.

    First, what is the difference between

    pytest and pytest <file>

    thanks for your help

    Frederic

    [1] https://github.com/cctbx/dxtbx
    [2] https://salsa.debian.org/science-team/dxtbx



    full error message

    ___________ test_dataset_as_flex[int-dataset_as_flex_int-bshuf_lz4] ____________

    type_name = 'int', creator = <function bshuf_lz4 at 0x7fccddd35120>
    converter = <Boost.Python.function object at 0x556b03602bb0>

    @pytest.mark.parametrize(
        "creator",
        [
            uncompressed,
            gzip,
            bshuf_lz4,
        ],
    )
    @pytest.mark.parametrize(
        "type_name,converter",
        [
            ("int", dataset_as_flex_int),
            ("float", dataset_as_flex_float),
            ("double", dataset_as_flex_double),
        ],
    )
    def test_dataset_as_flex(type_name, creator, converter):
        # Create an in-memory HDF5 dataset with unique name
        f = h5py.File(type_name + ".h5", "w", driver="core", backing_store=False)

        shape = (20, 20, 20)
        dataset = creator(f, shape, type_name)

    tests/test_dataset_as_flex.py:64:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    tests/test_dataset_as_flex.py:34: in bshuf_lz4
        return file.create_dataset(
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/group.py:161: in create_dataset
        dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/dataset.py:106: in make_new_dset
        dcpl = filters.fill_dcpl(
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    plist = <h5py._debian_h5py_serial.h5p.PropDCID object at 0x7fccdbbd9760>
    shape = (20, 20, 20), dtype = dtype('int64'), chunks = (10, 10, 10)
    compression = 32008, compression_opts = (0, 2), shuffle = None
    fletcher32 = None, maxshape = None, scaleoffset = None, external = []
    allow_unknown_filter = False

    def fill_dcpl(plist, shape, dtype, chunks, compression, compression_opts,
                  shuffle, fletcher32, maxshape, scaleoffset, external,
                  allow_unknown_filter=False):
        """ Generate a dataset creation property list.

        Undocumented and subject to change without warning.
        """

        if shape is None or shape == ():
            shapetype = 'Empty' if shape is None else 'Scalar'
            if any((chunks, compression, compression_opts, shuffle, fletcher32,
                    scaleoffset is not None)):
                raise TypeError(
                    f"{shapetype} datasets don't support chunk/filter options"
                )
            if maxshape and maxshape != ():
                raise TypeError(f"{shapetype} datasets cannot be extended")
            return h5p.create(h5p.DATASET_CREATE)

        def rq_tuple(tpl, name):
            """ Check if chunks/maxshape match dataset rank """
            if tpl in (None, True):
                return
            try:
                tpl = tuple(tpl)
            except TypeError:
                raise TypeError('"%s" argument must be None or a sequence object' % name)
            if len(tpl) != len(shape):
                raise ValueError('"%s" must have same rank as dataset shape' % name)

        rq_tuple(chunks, 'chunks')
        rq_tuple(maxshape, 'maxshape')

        if compression is not None:
            if isinstance(compression, FilterRefBase):
                compression_opts = compression.filter_options
                compression = compression.filter_id

            if compression not in encode and not isinstance(compression, int):
                raise ValueError('Compression filter "%s" is unavailable' % compression)

            if compression == 'gzip':
                if compression_opts is None:
                    gzip_level = DEFAULT_GZIP
                elif compression_opts in range(10):
                    gzip_level = compression_opts
                else:
                    raise ValueError("GZIP setting must be an integer from 0-9, not %r" % compression_opts)

            elif compression == 'lzf':
                if compression_opts is not None:
                    raise ValueError("LZF compression filter accepts no options")

            elif compression == 'szip':
                if compression_opts is None:
                    compression_opts = DEFAULT_SZIP

                err = "SZIP options must be a 2-tuple ('ec'|'nn', even integer 0-32)"
                try:
                    szmethod, szpix = compression_opts
                except TypeError:
                    raise TypeError(err)
                if szmethod not in ('ec', 'nn'):
                    raise ValueError(err)
                if not (0<szpix<=32 and szpix%2 == 0):
                    raise ValueError(err)

        elif compression_opts is not None:
            # Can't specify just compression_opts by itself.
            raise TypeError("Compression method must be specified")

        if scaleoffset is not None:
            # scaleoffset must be an integer when it is not None or False,
            # except for integral data, for which scaleoffset == True is
            # permissible (will use SO_INT_MINBITS_DEFAULT)

            if scaleoffset < 0:
                raise ValueError('scale factor must be >= 0')

            if dtype.kind == 'f':
                if scaleoffset is True:
                    raise ValueError('integer scaleoffset must be provided for '
                                     'floating point types')
            elif dtype.kind in ('u', 'i'):
                if scaleoffset is True:
                    scaleoffset = h5z.SO_INT_MINBITS_DEFAULT
            else:
                raise TypeError('scale/offset filter only supported for integer '
                                'and floating-point types')

            # Scale/offset following fletcher32 in the filter chain will (almost?)
            # always triggers a read error, as most scale/offset settings are
            # lossy. Since fletcher32 must come first (see comment below) we
            # simply prohibit the combination of fletcher32 and scale/offset.
            if fletcher32:
                raise ValueError('fletcher32 cannot be used with potentially lossy'
                                 ' scale/offset filter')

        external = _normalize_external(external)
        # End argument validation

        if (chunks is True) or \
        (chunks is None and any((shuffle, fletcher32, compression, maxshape,
                                 scaleoffset is not None))):
            chunks = guess_chunk(shape, maxshape, dtype.itemsize)

        if maxshape is True:
            maxshape = (None,)*len(shape)

        if chunks is not None:
            plist.set_chunk(chunks)
            plist.set_fill_time(h5d.FILL_TIME_ALLOC)  # prevent resize glitch

        # scale-offset must come before shuffle and compression
        if scaleoffset is not None:
            if dtype.kind in ('u', 'i'):
                plist.set_scaleoffset(h5z.SO_INT, scaleoffset)
            else: # dtype.kind == 'f'
                plist.set_scaleoffset(h5z.SO_FLOAT_DSCALE, scaleoffset)

        for item in external:
            plist.set_external(*item)

        if shuffle:
            plist.set_shuffle()

        if compression == 'gzip':
            plist.set_deflate(gzip_level)
        elif compression == 'lzf':
            plist.set_filter(h5z.FILTER_LZF, h5z.FLAG_OPTIONAL)
        elif compression == 'szip':
            opts = {'ec': h5z.SZIP_EC_OPTION_MASK, 'nn': h5z.SZIP_NN_OPTION_MASK}
            plist.set_szip(opts[szmethod], szpix)
        elif isinstance(compression, int):
            if not allow_unknown_filter and not h5z.filter_avail(compression):
                raise ValueError("Unknown compression filter number: %s" % compression)
    E           ValueError: Unknown compression filter number: 32008

    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/filters.py:281: ValueError

  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 15:10:01 2022
    For the record, I found it... upstream modifies HDF5_PLUGIN_PATH when the dxtbx module is loaded.

    They assume they are running under conda and override the path. All of this is unnecessary on Debian, since the plugins are properly installed system-wide.

    Cheers

    Fred


    # Ensures that HDF5 has the conda_base plugin path configured.
    #
    # Ideally this will be properly configured by the conda environment.
    # However, currently the dials-installer will not install a path-correct
    # conda_base folder, so it needs to be updated manually.

    _hdf5_plugin_path = libtbx.env.under_base(os.path.join("lib", "hdf5", "plugin"))

    # Inject via the environment if h5py not used yet, or else use h5py
    if "h5py" not in sys.modules:
        os.environ["HDF5_PLUGIN_PATH"] = (
            _hdf5_plugin_path + os.pathsep + os.getenv("HDF5_PLUGIN_PATH", "")
        )
    else:
        # We've already loaded h5py, so setting the environment variable won't work
        # We need to use the h5py API to add a plugin path
        import h5py

        h5_plugin_paths = [h5py.h5pl.get(i).decode() for i in range(h5py.h5pl.size())]
        if _hdf5_plugin_path not in h5_plugin_paths:
            h5py.h5pl.prepend(_hdf5_plugin_path.encode())
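
    For reference, the same low-level h5pl API the snippet uses can also show
    which directories HDF5 will actually search once dxtbx has been imported,
    which makes it easy to check whether the Debian system plugin directory is
    still on the list (a small diagnostic sketch, not dxtbx code):

    import dxtbx   # triggers the plugin-path manipulation above
    import h5py

    # Print the effective HDF5 plugin search path, one directory per line
    for i in range(h5py.h5pl.size()):
        print(h5py.h5pl.get(i).decode())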

  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 15:00:01 2022
    Hello Neil

    > Looks like you need a -v option to see more detail.

    Thanks for the advice. By removing files one by one, I found that the failing behaviour is due to the import of the library itself.
    The failing test passes on its own, but if I add an otherwise unused "import dxtbx" to it, it fails.

    So there is some interaction between the dxtbx package and the h5py library; I will dig into it.
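
    To be concrete, the interaction reduces to something like this (a sketch;
    the dataset parameters are the ones from the failing bshuf_lz4 traceback):

    import dxtbx   # with this import present the call below fails; without it, it passes
    import h5py

    # Same parameters as the failing bshuf_lz4 case in tests/test_dataset_as_flex.py:
    # bitshuffle/LZ4 filter (id 32008) on an in-memory file.
    f = h5py.File("repro.h5", "w", driver="core", backing_store=False)
    f.create_dataset("data", shape=(20, 20, 20), dtype="int64",
                     chunks=(10, 10, 10), compression=32008,
                     compression_opts=(0, 2))
    # -> ValueError: Unknown compression filter number: 32008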

    Cheers

    Fred
