• Exclude 'None' from list comprehension of dicts

    From Loris Bennett@21:1/5 to All on Thu Aug 4 13:51:24 2022
    Hi,

    I am constructing a list of dictionaries via the following list
    comprehension:

    data = [get_job_efficiency_dict(job_id) for job_id in job_ids]

    However,

    get_job_efficiency_dict(job_id)

    uses 'subprocess.Popen' to run an external program and this can fail.
    In this case, the dict should just be omitted from 'data'.

    I can have 'get_job_efficiency_dict' return 'None' and then run

    filtered_data = list(filter(None, data))

    but is there a more elegant way?
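    A minimal sketch of the setup described above, for experimenting. The
    child script, the even/odd failure rule and the use of 'subprocess.run'
    (rather than 'subprocess.Popen') are assumptions made for illustration,
    not the real program. Note that 'filter(None, ...)' drops every falsy
    value, so an empty dict would be removed along with 'None'.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the external program: prints a JSON object
# for even job IDs and exits with status 1 for odd ones.
CHILD = (
    "import json, sys\n"
    "job_id = int(sys.argv[1])\n"
    "if job_id % 2:\n"
    "    sys.exit(1)\n"
    "print(json.dumps({'job_id': job_id, 'efficiency': 0.5}))\n"
)


def get_job_efficiency_dict(job_id):
    """Return a dict for job_id, or None if the external program fails."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, str(job_id)],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return json.loads(proc.stdout)


job_ids = [1, 2, 3, 4]
data = [get_job_efficiency_dict(job_id) for job_id in job_ids]

# filter(None, ...) keeps only truthy items, which removes the None entries.
filtered_data = list(filter(None, data))
print(filtered_data)
```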

    Cheers,

    Loris

    --
    This signature is currently under construction.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Loris Bennett on Thu Aug 4 12:37:39 2022
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
    > data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    > ...
    > filtered_data = list(filter(None, data))

    You could have "get_job_efficiency_dict" return an iterable
    that yields either zero dictionaries or one dictionary.
    For example, a list with either zero entries or one entry.

    Then, use "itertools.chain.from_iterable" to merge all those
    lists with empty lists effectively removed. E.g.,

    print( list( itertools.chain.from_iterable( [[ 1 ], [], [ 2 ], [ 3 ]])))

    will print

    [1, 2, 3]

    . Or, consider a boring old "for" loop:

    data = []
    for job_id in job_ids:
        dictionary = get_job_efficiency_dict( job_id )
        if dictionary:
            data.append( dictionary )

    . It might not be "elegant", but it's quite readable to me.
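    The zero-or-one-entry idea above could be sketched roughly as follows.
    The function name 'get_job_efficiency_dicts' and the rule that odd job
    IDs fail are hypothetical, chosen only so the example runs on its own.

```python
from itertools import chain


def get_job_efficiency_dicts(job_id):
    """Hypothetical variant: return [dict] on success and [] on failure."""
    if job_id % 2:  # pretend that odd job IDs fail
        return []
    return [{"job_id": job_id}]


job_ids = [1, 2, 3, 4]

# chain.from_iterable concatenates the small lists; empty ones add nothing.
data = list(chain.from_iterable(get_job_efficiency_dicts(j) for j in job_ids))
print(data)
```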

  • From Loris Bennett@21:1/5 to Stefan Ram on Thu Aug 4 14:58:28 2022
    ram@zedat.fu-berlin.de (Stefan Ram) writes:

    > "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
    >> data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >> ...
    >> filtered_data = list(filter(None, data))
    >
    > You could have "get_job_efficiency_dict" return an iterable
    > that yields either zero dictionaries or one dictionary.
    > For example, a list with either zero entries or one entry.
    >
    > Then, use "itertools.chain.from_iterable" to merge all those
    > lists with empty lists effectively removed. E.g.,
    >
    > print( list( itertools.chain.from_iterable( [[ 1 ], [], [ 2 ], [ 3 ]])))
    >
    > will print
    >
    > [1, 2, 3]

    'itertools' is a bit of a blind spot of mine, so thanks for pointing that
    out.

    > . Or, consider a boring old "for" loop:
    >
    > data = []
    > for job_id in job_ids:
    >     dictionary = get_job_efficiency_dict( job_id )
    >     if dictionary:
    >         data.append( dictionary )
    >
    > . It might not be "elegant", but it's quite readable to me.

    To me too. However, 'data' can occasionally consist of tens of thousands
    of elements. Would there be a potential performance problem here? Even if
    there is, it wouldn't be so bad, as the aggregation of the data is not
    time-critical and only occurs once a month. Still, I wouldn't want the
    program to be unnecessarily inefficient.

    Cheers,

    Loris

    --
    This signature is currently under construction.

  • From Stefan Ram@21:1/5 to Loris Bennett on Thu Aug 4 13:41:12 2022
    "Loris Bennett" <loris.bennett@fu-berlin.de> writes:
    > To me too. However, 'data' can occasionally consist of many 10,000s of
    > elements. Would there be a potential performance problem here?

    Yes, the "for" loop seems to be slower here by a factor of
    about 2. But in absolute terms, the extra time is negligible
    if the process is not repeated in a loop.

    import timeit

    for _ in range( 5 ):

        start_time = timeit.default_timer()
        data = []
        for i in range( 100000 ):
            if i:
                data.append( i )
        print( timeit.default_timer() - start_time )

        start_time = timeit.default_timer()
        data =[ i for i in range( 100000 )]
        filtered_data = list( filter( None, data ))
        print( timeit.default_timer() - start_time )

        print()
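    A variation on the measurement above, as a rough sketch: each approach
    is wrapped in a function and timed with 'timeit.repeat', which makes the
    runs easier to compare. The function names are made up for this example,
    and the third variant (a comprehension with an 'if' clause) is an
    addition not measured in the post. Timings will differ between machines
    and Python versions.

```python
import timeit


def with_loop(n):
    """Build the list with an explicit for loop and a truthiness test."""
    data = []
    for i in range(n):
        if i:
            data.append(i)
    return data


def with_filter(n):
    """Build the full list first, then drop falsy items with filter()."""
    return list(filter(None, [i for i in range(n)]))


def with_comprehension(n):
    """Filter inside the comprehension itself."""
    return [i for i in range(n) if i]


n = 100_000
for func in (with_loop, with_filter, with_comprehension):
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=5, repeat=3))
    print(f"{func.__name__:20s} {best / 5:.6f} s per call")
```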

  • From Antoon Pardon@21:1/5 to All on Thu Aug 4 20:35:55 2022
    On 4/08/2022 at 13:51, Loris Bennett wrote:
    > Hi,
    >
    > I am constructing a list of dictionaries via the following list
    > comprehension:
    >
    > data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >
    > However,
    >
    > get_job_efficiency_dict(job_id)
    >
    > uses 'subprocess.Popen' to run an external program and this can fail.
    > In this case, the dict should just be omitted from 'data'.
    >
    > I can have 'get_job_efficiency_dict' return 'None' and then run
    >
    > filtered_data = list(filter(None, data))
    >
    > but is there a more elegant way?

    Just wondering, why don't you return an empty dictionary in case of a
    failure? In that case your list will be all dictionaries and empty ones
    will be processed fast enough.

    --
    Antoon Pardon.

  • From MRAB@21:1/5 to Loris Bennett on Thu Aug 4 19:50:16 2022
    On 2022-08-04 12:51, Loris Bennett wrote:
    > Hi,
    >
    > I am constructing a list of dictionaries via the following list
    > comprehension:
    >
    > data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >
    > However,
    >
    > get_job_efficiency_dict(job_id)
    >
    > uses 'subprocess.Popen' to run an external program and this can fail.
    > In this case, the dict should just be omitted from 'data'.
    >
    > I can have 'get_job_efficiency_dict' return 'None' and then run
    >
    > filtered_data = list(filter(None, data))
    >
    > but is there a more elegant way?

    I'm not sure how elegant it is, but:

    data = [result for job_id in job_ids if (result := get_job_efficiency_dict(job_id)) is not None]
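    One behavioural difference between the 'is not None' test above and
    'filter(None, ...)' is worth seeing side by side: the former keeps empty
    dicts, the latter drops them. A small sketch, where the stand-in function
    and its canned results are hypothetical (assignment expressions need
    Python 3.8 or later):

```python
def get_job_efficiency_dict(job_id):
    """Hypothetical stand-in: None means failure, {} means 'no data'."""
    results = {1: {"cpu": 0.9}, 2: None, 3: {}}
    return results[job_id]


job_ids = [1, 2, 3]

# Drops only None; the empty dict for job 3 is kept.
not_none = [
    result
    for job_id in job_ids
    if (result := get_job_efficiency_dict(job_id)) is not None
]

# Drops every falsy value, so the empty dict is removed as well.
truthy = list(filter(None, (get_job_efficiency_dict(j) for j in job_ids)))

print(not_none)
print(truthy)
```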

  • From Weatherby,Gerard@21:1/5 to All on Thu Aug 4 19:01:03 2022
    Or:

    data = [d for d in [get_job_efficiency_dict(job_id) for job_id in job_ids] if d is not None]

    or

    data = []
    for job_id in job_ids:
        if (d := get_job_efficiency_dict(job_id)) is not None:
            data.append(d)

    Personally, I'd go with the latter in my own code.

    --
    Gerard Weatherby | Application Architect NMRbox | NAN | Department of Molecular Biology and Biophysics
    UConn Health 263 Farmington Avenue, Farmington, CT 06030-6406 uchc.edu

  • From Loris Bennett@21:1/5 to Antoon Pardon on Fri Aug 5 07:50:25 2022
    Antoon Pardon <antoon.pardon@vub.be> writes:

    > On 4/08/2022 at 13:51, Loris Bennett wrote:
    >> Hi,
    >>
    >> I am constructing a list of dictionaries via the following list
    >> comprehension:
    >>
    >> data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >>
    >> However,
    >>
    >> get_job_efficiency_dict(job_id)
    >>
    >> uses 'subprocess.Popen' to run an external program and this can fail.
    >> In this case, the dict should just be omitted from 'data'.
    >>
    >> I can have 'get_job_efficiency_dict' return 'None' and then run
    >>
    >> filtered_data = list(filter(None, data))
    >>
    >> but is there a more elegant way?
    >
    > Just wondering, why don't you return an empty dictionary in case of a failure?
    > In that case your list will be all dictionaries and empty ones will be processed
    > fast enough.

    When the list of dictionaries is processed, I would have to check each
    element to see if it is empty. That strikes me as being less efficient
    than filtering out the empty dictionaries in one go, although obviously
    one would need to benchmark that.

    Cheers,

    Loris


    --
    This signature is currently under construction.

  • From avi.e.gross@gmail.com@21:1/5 to Antoon Pardon on Fri Aug 5 16:44:37 2022
    Benchmarking aside, Loris, there are some ideas about such things.

    You are describing a case, in abstract terms, where an algorithm grinds
    away and produces results that may include an occasional or a common
    unwanted result. The question is when to eliminate the unwanted. Do you
    eliminate them immediately at the expense of some extra code at that
    point, or do you wait till much later or even at the end?

    The answer is it DEPENDS, and let me point out that many problems can
    start multi-dimensional (as in processing a 5-D matrix) and produce a
    linear output (as in a 1-D list), or it can be the other way around.
    Sometimes what you want eliminated is something like duplicates. Is it
    easier to remove duplicates as they happen, or later when you have some
    huge data structure containing oodles of copies of each duplicate?

    You can imagine many scenarios and sometimes you need to also look at
    costs. What does it cost to check if a token is valid, as in can the word
    be found in a dictionary? Is it cheaper to wait till you have lots of
    words including duplicates and do one lookup to find a bad word, then mark
    it so future occurrences are removed without that kind of lookup? Or is it
    better to read in the dictionary once and hash it so later access is easy?

    In your case, you have a single simple criterion for recognizing an item
    to leave out. So the above may not apply. But I note we often use
    pre-created software that simply returns a result and then the only
    reasonable way to remove things is after calling it. Empty or unwanted
    items may take up some room, though, so a long-running process may be
    better off pruning as it goes.
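    "Pruning as it goes" could look something like the generator below,
    which never stores the unwanted results at all. This is only a sketch:
    'successful_results' and 'fake_fetch' are hypothetical names, and the
    rule that every third job fails is invented for the example.

```python
def successful_results(job_ids, fetch):
    """Yield only the results that are not None, one at a time."""
    for job_id in job_ids:
        result = fetch(job_id)
        if result is not None:
            yield result


def fake_fetch(job_id):
    """Hypothetical stand-in: job IDs divisible by 3 count as failures."""
    return None if job_id % 3 == 0 else {"job_id": job_id}


# Failed jobs are discarded immediately instead of being filtered later.
kept = list(successful_results(range(10), fake_fetch))
print(len(kept))
```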

  • From Antoon Pardon@21:1/5 to All on Mon Aug 15 14:56:07 2022
    On 5/08/2022 at 07:50, Loris Bennett wrote:
    > Antoon Pardon <antoon.pardon@vub.be> writes:
    >
    >> On 4/08/2022 at 13:51, Loris Bennett wrote:
    >>> Hi,
    >>>
    >>> I am constructing a list of dictionaries via the following list
    >>> comprehension:
    >>>
    >>> data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >>>
    >>> However,
    >>>
    >>> get_job_efficiency_dict(job_id)
    >>>
    >>> uses 'subprocess.Popen' to run an external program and this can fail.
    >>> In this case, the dict should just be omitted from 'data'.
    >>>
    >>> I can have 'get_job_efficiency_dict' return 'None' and then run
    >>>
    >>> filtered_data = list(filter(None, data))
    >>>
    >>> but is there a more elegant way?
    >> Just wondering, why don't you return an empty dictionary in case of a failure?
    >> In that case your list will be all dictionaries and empty ones will be processed
    >> fast enough.
    > When the list of dictionaries is processed, I would have to check each
    > element to see if it is empty. That strikes me as being less efficient
    > than filtering out the empty dictionaries in one go, although obviously
    > one would need to benchmark that.

    I may be missing something but why would you have to check each element
    to see if it is empty? What would go wrong if you just treated empty
    dictionaries the same as non-empty dictionaries?

    --
    Antoon Pardon.

  • From dn@21:1/5 to Antoon Pardon on Tue Aug 16 10:20:29 2022
    On 16/08/2022 00.56, Antoon Pardon wrote:
    > On 5/08/2022 at 07:50, Loris Bennett wrote:
    >> Antoon Pardon <antoon.pardon@vub.be> writes:
    >>
    >>> On 4/08/2022 at 13:51, Loris Bennett wrote:
    >>>> Hi,
    >>>>
    >>>> I am constructing a list of dictionaries via the following list
    >>>> comprehension:
    >>>>
    >>>>     data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >>>>
    >>>> However,
    >>>>
    >>>>     get_job_efficiency_dict(job_id)
    >>>>
    >>>> uses 'subprocess.Popen' to run an external program and this can fail.
    >>>> In this case, the dict should just be omitted from 'data'.
    >>>>
    >>>> I can have 'get_job_efficiency_dict' return 'None' and then run
    >>>>
    >>>>     filtered_data = list(filter(None, data))
    >>>>
    >>>> but is there a more elegant way?
    >>> Just wondering, why don't you return an empty dictionary in case of a
    >>> failure?
    >>> In that case your list will be all dictionaries and empty ones will
    >>> be processed fast enough.
    >> When the list of dictionaries is processed, I would have to check each
    >> element to see if it is empty. That strikes me as being less efficient
    >> than filtering out the empty dictionaries in one go, although obviously
    >> one would need to benchmark that.
    >
    > I may be missing something but why would you have to check each element
    > to see if it is empty? What would go wrong if you just treated empty
    > dictionaries the same as non-empty dictionaries?

    'Truthiness':-

    bool( {} )
    False
    bool( { "a":1 } )
    True
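    How much the truthiness of an empty dict matters depends on what the
    later processing does. A small sketch with invented sample data (the
    keys 'cpu_seconds' and 'wall_seconds' are assumptions, not taken from
    the real job data): iterating over the items of an empty dict is
    harmless, while direct key access needs a guard such as 'if d'.

```python
# Hypothetical results: the empty dict represents a failed job.
data = [
    {"cpu_seconds": 90, "wall_seconds": 100},
    {},
    {"cpu_seconds": 70, "wall_seconds": 140},
]

# Iterating over the items of an empty dict simply does nothing.
totals = {}
for d in data:
    for key, value in d.items():
        totals[key] = totals.get(key, 0) + value

# Direct key access would raise KeyError for {}, so test truthiness first.
cpu_values = [d["cpu_seconds"] for d in data if d]

print(totals)
print(cpu_values)
```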

    --
    Regards,
    =dn

  • From Antoon Pardon@21:1/5 to All on Tue Aug 16 21:31:56 2022
    On 16/08/2022 at 00:20, dn wrote:
    > On 16/08/2022 00.56, Antoon Pardon wrote:
    >> On 5/08/2022 at 07:50, Loris Bennett wrote:
    >>> Antoon Pardon <antoon.pardon@vub.be> writes:
    >>>
    >>>> On 4/08/2022 at 13:51, Loris Bennett wrote:
    >>>>> Hi,
    >>>>>
    >>>>> I am constructing a list of dictionaries via the following list
    >>>>> comprehension:
    >>>>>
    >>>>>     data = [get_job_efficiency_dict(job_id) for job_id in job_ids]
    >>>>>
    >>>>> However,
    >>>>>
    >>>>>     get_job_efficiency_dict(job_id)
    >>>>>
    >>>>> uses 'subprocess.Popen' to run an external program and this can fail.
    >>>>> In this case, the dict should just be omitted from 'data'.
    >>>>>
    >>>>> I can have 'get_job_efficiency_dict' return 'None' and then run
    >>>>>
    >>>>>     filtered_data = list(filter(None, data))
    >>>>>
    >>>>> but is there a more elegant way?
    >>>> Just wondering, why don't you return an empty dictionary in case of a
    >>>> failure?
    >>>> In that case your list will be all dictionaries and empty ones will
    >>>> be processed fast enough.
    >>> When the list of dictionaries is processed, I would have to check each
    >>> element to see if it is empty. That strikes me as being less efficient
    >>> than filtering out the empty dictionaries in one go, although obviously
    >>> one would need to benchmark that.
    >> I may be missing something but why would you have to check each element
    >> to see if it is empty? What would go wrong if you just treated empty
    >> dictionaries the same as non-empty dictionaries?
    > 'Truthiness':-

    In what way is that relevant in this case?

    --
    Antoon Pardon
