• Re: dxtbx pytest issue

    From Neil Williams@21:1/5 to All on Thu Sep 1 14:10:01 2022
    On Thu, 1 Sep 2022 13:32:36 +0200 (CEST)
    PICCA Frederic-Emmanuel <frederic-emmanuel.picca@synchrotron-soleil.fr>
    wrote:

    > Hello,
    >
    > I am still trying to package[2] dxtbx[1], and now I end up with
    > something strange. When I run the test suite during the build, I have
    > a failure like this:
    >
    > tests/test_dataset_as_flex.py ..F..F..F                       [ 2%]

    Looks like you need a -v option to see more detail.

    > I put the error message below, it is quite long.
    >
    > Now, if I execute by hand only the failing test like this, it works:
    >
    > $ pytest-3 tests/test_dataset_as_flex.py

    The 2% indicates that other tests have already executed before the run
    gets to test_dataset_as_flex.py. When you run the file directly, no other
    tests are run first. This could indicate interference between tests (an
    upstream problem): the tests are not sufficiently isolated and are
    affected by which other tests execute before the failing one. pytest has
    support for solving these problems, if the test suite is written
    correctly.
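
    As a rough illustration of what that support can look like (a generic
    sketch, not code from the dxtbx test suite): an autouse fixture in
    conftest.py can snapshot and restore process-global state such as an
    environment variable, so one test cannot leak its changes into the next.
    The fixture name and the choice of HDF5_PLUGIN_PATH here are assumptions
    for illustration only.

    # conftest.py -- illustrative sketch, not taken from dxtbx
    import os
    import pytest

    @pytest.fixture(autouse=True)
    def _restore_hdf5_plugin_path():
        # Remember the pre-test value (or its absence) ...
        saved = os.environ.get("HDF5_PLUGIN_PATH")
        yield
        # ... and put it back after the test, even if the test (or an import
        # it triggered) modified os.environ directly.
        if saved is None:
            os.environ.pop("HDF5_PLUGIN_PATH", None)
        else:
            os.environ["HDF5_PLUGIN_PATH"] = saved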

    It could also be a version problem with any of the build-dependencies
    (like h5py), so check the various requirements*.txt files and see whether
    upstream has a .gitlab-ci.yml or .github config which clarifies exactly
    which versions are used.
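
    A quick way to see exactly which h5py and HDF5 the Debian build provides
    (h5py.version is part of h5py's public API):

    import h5py

    # One summary string with the h5py, HDF5 and NumPy versions
    print(h5py.version.info)
    # Just the HDF5 library version h5py was built against
    print(h5py.version.hdf5_version)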

    You may need to exclude certain tests if, for example, h5py in Debian is
    built with options that dxtbx cannot handle.
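
    If it does come to that, one way to skip tests without patching the test
    files is a collection hook in a conftest.py. This is only a sketch: the
    filter id 32008 comes from the traceback below (the bshuf_lz4 cases), the
    rest is made up for illustration.

    # conftest.py -- illustrative sketch, not part of dxtbx
    import pytest
    from h5py import h5z

    BSHUF_LZ4 = 32008  # HDF5 filter id seen in the failing traceback

    def pytest_collection_modifyitems(config, items):
        # Skip the bshuf_lz4 parametrisations when the HDF5 filter plugin
        # is not available in this environment.
        if h5z.filter_avail(BSHUF_LZ4):
            return
        skip = pytest.mark.skip(reason="bitshuffle/LZ4 HDF5 filter not available")
        for item in items:
            if "bshuf_lz4" in item.name:
                item.add_marker(skip)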

    > Before investigating further, I would like your advice on how to
    > debug this sort of issue.
    >
    > First, what is the difference between
    >
    > pytest and pytest <file>

    pytest scans the directory tree to collect all the tests it can find.

    pytest <file> only collects the tests from that file.

    Check for a pytest.ini in the upstream source tree, and also setup.cfg.

    Use the -v option to see more of what is going on; the --collect-only
    option is useful too.

    See pytest --help; other filtering options are available as well.
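
    For example, the following (run from the package root) collects the whole
    tree the way a bare pytest run would, without executing anything, so you
    can see which tests precede the failing one; pytest.main accepts the same
    options as the command line:

    import pytest

    # Equivalent to: pytest --collect-only -q
    # Lists every collected test id in execution order without running it.
    pytest.main(["--collect-only", "-q"])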

    You'll need a local build environment (fakeroot debian/rules build etc.)
    and to dig into the Python unit test methods to see what is going on.

    > thanks for your help
    >
    > Frederic
    >
    > [1] https://github.com/cctbx/dxtbx
    > [2] https://salsa.debian.org/science-team/dxtbx



    > full error message
    >
    > [quoted traceback omitted; the complete traceback, ending in
    > "ValueError: Unknown compression filter number: 32008", appears in the
    > original report below]



    --
    Neil Williams
    =============
    https://linux.codehelp.co.uk/


  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 13:50:01 2022
    Hello,

    I am still trying to package[2] dxtbx[1], and now I end up with something strange. When I run the test suite during the build, I have a failure like this:

    tests/test_dataset_as_flex.py ..F..F..F                       [ 2%]

    I put the error message below, it is quite long.

    Now, if I execute by hand only the failing test like this, it works:


    $ pytest-3 tests/test_dataset_as_flex.py
    ============================= test session starts ==============================
    platform linux -- Python 3.10.6, pytest-7.1.2, pluggy-1.0.0+repack
    rootdir: /home/picca/debian/science-team/dxtbx, configfile: pytest.ini
    plugins: requests-mock-1.9.3, forked-1.4.0, xdist-2.5.0, mock-3.8.2, dials-data-2.4.0
    collected 9 items

    tests/test_dataset_as_flex.py .........                                  [100%]

    ============================== 9 passed in 0.61s ===============================

    Before investigating further, I would like your advice on how to debug this sort of issue.

    First, what is the difference between

    pytest and pytest <file>

    thanks for your help

    Frederic

    [1] https://github.com/cctbx/dxtbx
    [2] https://salsa.debian.org/science-team/dxtbx



    full error message

    ___________ test_dataset_as_flex[int-dataset_as_flex_int-bshuf_lz4] ____________

    type_name = 'int', creator = <function bshuf_lz4 at 0x7fccddd35120>
    converter = <Boost.Python.function object at 0x556b03602bb0>

    @pytest.mark.parametrize(
        "creator",
        [
            uncompressed,
            gzip,
            bshuf_lz4,
        ],
    )
    @pytest.mark.parametrize(
        "type_name,converter",
        [
            ("int", dataset_as_flex_int),
            ("float", dataset_as_flex_float),
            ("double", dataset_as_flex_double),
        ],
    )
    def test_dataset_as_flex(type_name, creator, converter):
        # Create an in-memory HDF5 dataset with unique name
        f = h5py.File(type_name + ".h5", "w", driver="core", backing_store=False)

        shape = (20, 20, 20)
        dataset = creator(f, shape, type_name)

    tests/test_dataset_as_flex.py:64:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    tests/test_dataset_as_flex.py:34: in bshuf_lz4
        return file.create_dataset(
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/group.py:161: in create_dataset
        dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/dataset.py:106: in make_new_dset
        dcpl = filters.fill_dcpl(
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    plist = <h5py._debian_h5py_serial.h5p.PropDCID object at 0x7fccdbbd9760>
    shape = (20, 20, 20), dtype = dtype('int64'), chunks = (10, 10, 10)
    compression = 32008, compression_opts = (0, 2), shuffle = None
    fletcher32 = None, maxshape = None, scaleoffset = None, external = []
    allow_unknown_filter = False

    def fill_dcpl(plist, shape, dtype, chunks, compression, compression_opts,
                  shuffle, fletcher32, maxshape, scaleoffset, external,
                  allow_unknown_filter=False):
        """ Generate a dataset creation property list.

        Undocumented and subject to change without warning.
        """

        if shape is None or shape == ():
            shapetype = 'Empty' if shape is None else 'Scalar'
            if any((chunks, compression, compression_opts, shuffle, fletcher32,
                    scaleoffset is not None)):
                raise TypeError(
                    f"{shapetype} datasets don't support chunk/filter options"
                )
            if maxshape and maxshape != ():
                raise TypeError(f"{shapetype} datasets cannot be extended")
            return h5p.create(h5p.DATASET_CREATE)

        def rq_tuple(tpl, name):
            """ Check if chunks/maxshape match dataset rank """
            if tpl in (None, True):
                return
            try:
                tpl = tuple(tpl)
            except TypeError:
                raise TypeError('"%s" argument must be None or a sequence object' % name)
            if len(tpl) != len(shape):
                raise ValueError('"%s" must have same rank as dataset shape' % name)

        rq_tuple(chunks, 'chunks')
        rq_tuple(maxshape, 'maxshape')

        if compression is not None:
            if isinstance(compression, FilterRefBase):
                compression_opts = compression.filter_options
                compression = compression.filter_id

            if compression not in encode and not isinstance(compression, int):
                raise ValueError('Compression filter "%s" is unavailable' % compression)

            if compression == 'gzip':
                if compression_opts is None:
                    gzip_level = DEFAULT_GZIP
                elif compression_opts in range(10):
                    gzip_level = compression_opts
                else:
                    raise ValueError("GZIP setting must be an integer from 0-9, not %r" % compression_opts)

            elif compression == 'lzf':
                if compression_opts is not None:
                    raise ValueError("LZF compression filter accepts no options")

            elif compression == 'szip':
                if compression_opts is None:
                    compression_opts = DEFAULT_SZIP

                err = "SZIP options must be a 2-tuple ('ec'|'nn', even integer 0-32)"
                try:
                    szmethod, szpix = compression_opts
                except TypeError:
                    raise TypeError(err)
                if szmethod not in ('ec', 'nn'):
                    raise ValueError(err)
                if not (0<szpix<=32 and szpix%2 == 0):
                    raise ValueError(err)

        elif compression_opts is not None:
            # Can't specify just compression_opts by itself.
            raise TypeError("Compression method must be specified")

        if scaleoffset is not None:
            # scaleoffset must be an integer when it is not None or False,
            # except for integral data, for which scaleoffset == True is
            # permissible (will use SO_INT_MINBITS_DEFAULT)

            if scaleoffset < 0:
                raise ValueError('scale factor must be >= 0')

            if dtype.kind == 'f':
                if scaleoffset is True:
                    raise ValueError('integer scaleoffset must be provided for '
                                     'floating point types')
            elif dtype.kind in ('u', 'i'):
                if scaleoffset is True:
                    scaleoffset = h5z.SO_INT_MINBITS_DEFAULT
            else:
                raise TypeError('scale/offset filter only supported for integer '
                                'and floating-point types')

            # Scale/offset following fletcher32 in the filter chain will (almost?)
            # always triggers a read error, as most scale/offset settings are
            # lossy. Since fletcher32 must come first (see comment below) we
            # simply prohibit the combination of fletcher32 and scale/offset.
            if fletcher32:
                raise ValueError('fletcher32 cannot be used with potentially lossy'
                                 ' scale/offset filter')

        external = _normalize_external(external)
        # End argument validation

        if (chunks is True) or \
        (chunks is None and any((shuffle, fletcher32, compression, maxshape,
                                 scaleoffset is not None))):
            chunks = guess_chunk(shape, maxshape, dtype.itemsize)

        if maxshape is True:
            maxshape = (None,)*len(shape)

        if chunks is not None:
            plist.set_chunk(chunks)
            plist.set_fill_time(h5d.FILL_TIME_ALLOC)  # prevent resize glitch

        # scale-offset must come before shuffle and compression
        if scaleoffset is not None:
            if dtype.kind in ('u', 'i'):
                plist.set_scaleoffset(h5z.SO_INT, scaleoffset)
            else: # dtype.kind == 'f'
                plist.set_scaleoffset(h5z.SO_FLOAT_DSCALE, scaleoffset)

        for item in external:
            plist.set_external(*item)

        if shuffle:
            plist.set_shuffle()

        if compression == 'gzip':
            plist.set_deflate(gzip_level)
        elif compression == 'lzf':
            plist.set_filter(h5z.FILTER_LZF, h5z.FLAG_OPTIONAL)
        elif compression == 'szip':
            opts = {'ec': h5z.SZIP_EC_OPTION_MASK, 'nn': h5z.SZIP_NN_OPTION_MASK}
            plist.set_szip(opts[szmethod], szpix)
        elif isinstance(compression, int):
            if not allow_unknown_filter and not h5z.filter_avail(compression):
                raise ValueError("Unknown compression filter number: %s" % compression)
    E           ValueError: Unknown compression filter number: 32008

    /usr/lib/python3/dist-packages/h5py/_debian_h5py_serial/_hl/filters.py:281: ValueError

  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 15:10:01 2022
    For the record, I found it... upstream modifies HDF5_PLUGIN_PATH when the dxtbx module is loaded.

    They assume they are running under conda and override the path. All of this is unnecessary on Debian, since the plugins are properly installed system-wide.

    Cheers

    Fred


    # Ensures that HDF5 has the conda_base plugin path configured.
    #
    # Ideally this will be properly configured by the conda environment.
    # However, currently the dials-installer will not install a path-correct
    # conda_base folder, so it needs to be updated manually.

    _hdf5_plugin_path = libtbx.env.under_base(os.path.join("lib", "hdf5", "plugin"))

    # Inject via the environment if h5py not used yet, or else use h5py
    if "h5py" not in sys.modules:
        os.environ["HDF5_PLUGIN_PATH"] = (
            _hdf5_plugin_path + os.pathsep + os.getenv("HDF5_PLUGIN_PATH", "")
        )
    else:
        # We've already loaded h5py, so setting the environment variable won't work
        # We need to use the h5py API to add a plugin path
        import h5py

        h5_plugin_paths = [h5py.h5pl.get(i).decode() for i in range(h5py.h5pl.size())]
        if _hdf5_plugin_path not in h5_plugin_paths:
            h5py.h5pl.prepend(_hdf5_plugin_path.encode())
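
    For reference, the same low-level h5pl API the snippet uses can also show
    which directories HDF5 will actually search once dxtbx has been imported,
    which makes it easy to check whether the Debian system plugin directory is
    still on the list (a small diagnostic sketch, not dxtbx code):

    import dxtbx   # triggers the plugin-path manipulation above
    import h5py

    # Print the effective HDF5 plugin search path, one directory per line
    for i in range(h5py.h5pl.size()):
        print(h5py.h5pl.get(i).decode())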

  • From PICCA Frederic-Emmanuel@21:1/5 to All on Thu Sep 1 15:00:01 2022
    Hello Neil

    > Looks like you need a -v option to see more detail.

    Thanks for the advice. By removing files one by one, I found that the failing behaviour is due to the import of the library itself.
    The failing test passes on its own, but if I add an otherwise unused "import dxtbx" to it, it fails.

    So there is some interaction between the dxtbx package and the h5py library; I will dig into it.
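
    To be concrete, the interaction reduces to something like this (a sketch;
    the dataset parameters are the ones from the failing bshuf_lz4 traceback):

    import dxtbx   # with this import present the call below fails; without it, it passes
    import h5py

    # Same parameters as the failing bshuf_lz4 case in tests/test_dataset_as_flex.py:
    # bitshuffle/LZ4 filter (id 32008) on an in-memory file.
    f = h5py.File("repro.h5", "w", driver="core", backing_store=False)
    f.create_dataset("data", shape=(20, 20, 20), dtype="int64",
                     chunks=(10, 10, 10), compression=32008,
                     compression_opts=(0, 2))
    # -> ValueError: Unknown compression filter number: 32008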

    Cheers

    Fred
