• Re: Camera interfaces

    From Dimiter_Popoff@21:1/5 to Don Y on Thu Dec 29 15:33:46 2022
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Thu Dec 29 06:16:53 2022
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays. Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Thu Dec 29 12:06:33 2022
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a
    cell so you get a few grey levels).

    There are some CMOS sensors that let you address pixels individually and
    in a random order (like you got with the DRAM) but by its nature, such a readout method tends to be slow, and space inefficient, so these
    interfaces tend to be only available on smaller camera arrays.

    That is why most sensors read out via row/column shift registers to a
    pixel serial (maybe multiple pixels per clock) output, and if the camera includes its own A/D conversion, might serialize the results to minimize interconnect.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Richard Damon on Thu Dec 29 19:45:40 2022
    On 12/29/2022 19:21, Richard Damon wrote:
    On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    If you are looking for the very latest standards, yes. Enough data is
    out there to handle a lot of basic MIPI operations. Since the small
    player isn't going to be trying to implement the low level interface themselves (or at least shouldn't be trying to),

    So how does one use a MIPI camera without using the low level interface?

    unless you are trying
    to work with a bleeding edge camera (which you probably can't actually
    buy if you are a small player) you can tend to find enough information
    to use the camera.

    That is fair enough, as long as we are talking about some internal
    sensor specifics of the "bleeding edge" cameras.


    My experiance is if you can actually buy the camera normally, there will
    be the data available to use it.

    That's really reassuring. I am more interested in talking to MIPI
    display modules than to cameras (at least the sequence is this) but
    still.

    The big problem is "Grey Market" cameras: bought via unauthorized
    distributors, you are at the mercy of the distributor for the needed data.

    Don't they conform to the MIPI standard? (which I have no access to).



    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    There is a USB video standard, and many USB cameras can just be plugged
    in and used.

    OK, I thought I had seen that some years ago. Might be an escape (though cameras found in phones and tablets etc. are probably all MIPI).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to All on Thu Dec 29 12:21:44 2022
    On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    If you are looking for the very latest standards, yes. Enough data is
    out there to handle a lot of basic MIPI operations. Since the small
    player isn't going to be trying to implement the low level interface
    themselves (or at least shouldn't be trying to), unless you are trying
    to work with a bleeding edge camera (which you probably can't actually
    buy if you are a small player) you can tend to find enough information
    to use the camera.

    My experience is that if you can actually buy the camera normally, there
    will be data available to use it. The big problem is "Grey Market"
    cameras: bought via unauthorized distributors, you are at the mercy of the
    distributor for the needed data.


    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    There is a USB video standard, and many USB cameras can just be plugged
    in and used.
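
    (For reference: the USB video standard mentioned here is UVC, and on a
    Linux host such cameras typically show up through V4L2. Below is a
    minimal sketch, assuming a Linux host and a UVC camera at /dev/video0,
    of negotiating a frame format; the device node and the 640x480 YUYV
    request are illustrative choices, not requirements.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);     /* typical UVC device node */
        if (fd < 0) { perror("open"); return 1; }

        struct v4l2_capability cap;
        if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == 0)
            printf("driver: %s  card: %s\n", cap.driver, cap.card);

        /* Request a modest frame size; the driver adjusts it to what the
           camera actually supports and reports the result back. */
        struct v4l2_format fmt;
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.width       = 640;
        fmt.fmt.pix.height      = 480;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
        fmt.fmt.pix.field       = V4L2_FIELD_NONE;
        if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) { perror("VIDIOC_S_FMT"); return 1; }

        printf("negotiated %ux%u, %u bytes per frame\n",
               fmt.fmt.pix.width, fmt.fmt.pix.height, fmt.fmt.pix.sizeimage);

        /* Actual streaming would continue with VIDIOC_REQBUFS, mmap() of the
           buffers, VIDIOC_QBUF/VIDIOC_STREAMON, etc. */
        close(fd);
        return 0;
    }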

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to All on Thu Dec 29 13:48:10 2022
    On 12/29/22 12:45 PM, Dimiter_Popoff wrote:
    On 12/29/2022 19:21, Richard Damon wrote:
    On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    If you are looking for the very latest standards, yes. Enough data is
    out there to handle a lot of basic MIPI operations. Since the small
    player isn't going to be trying to implement the low level interface
    themselves (or at least shouldn't be trying to),

    So how does one use a MIPI camera without using the low level interface?

    You use a chip that has a MIPI interface: either a CPU or FPGA with a
    built-in MIPI interface, or a MIPI converter chip that converts the MIPI
    interface into something you can deal with.


    unless you are trying to work with a bleeding edge camera (which you
    probably can't actually buy if you are a small player) you can tend to
    find enough information to use the camera.

    That is fair enough, as long as we are talking about some internal
    sensor specifics of the "bleeding edge" cameras.

    Bleeding Edge cameras/displays may need newer versions of MIPI than may
    be easy to find in the consumer market. They may need bleeding edge
    processors.

    As I mention below, more important are the configuration registers,
    which might be harder to get for bleeding edge parts. This is often proprietary, as knowing what is adjustable is often part of the secret
    sauce for those cameras.



    My experience is that if you can actually buy the camera normally, there
    will be data available to use it.

    That's really reassuring. I am more interested in talking to MIPI
    display modules than to cameras (at least the sequence is this) but
    still.

    So you want a chip with MIPI DSI capability built in, or a converter chip.


    The big problem is "Grey Market" cameras, via unauthorized
    distributors you are at the mercy of the distributor to get you the
    needed data.

    Don't they conform to the MIPI standard? (which I have no access to).

    Yes, but MIPI doesn't define the more important configuration registers
    you need to set up for the device.

    MIPI is a video data protocol; it has limited configuration capability.
    Generally there will be something like an I2C bus to the camera that is
    used to configure it.

    (There might be a way to tunnel the configuration over the MIPI lines,
    but I doubt MIPI defines a configuration protocol)
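
    (To make that configuration path concrete: a minimal sketch, assuming a
    Linux host with i2c-dev and a hypothetical sensor at I2C address 0x36
    with 16-bit register addresses. The address and the register numbers
    below are placeholders, not taken from any particular camera's
    datasheet.)

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/i2c-dev.h>

    #define SENSOR_ADDR 0x36    /* hypothetical 7-bit sensor address */

    /* Many image sensors take <reg_hi><reg_lo><value> in one I2C write. */
    static int sensor_write_reg(int fd, uint16_t reg, uint8_t val)
    {
        uint8_t buf[3] = { reg >> 8, reg & 0xff, val };
        return (write(fd, buf, sizeof buf) == (ssize_t)sizeof buf) ? 0 : -1;
    }

    int main(void)
    {
        int fd = open("/dev/i2c-1", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }
        if (ioctl(fd, I2C_SLAVE, SENSOR_ADDR) < 0) { perror("I2C_SLAVE"); return 1; }

        /* Register numbers are placeholders for whatever the vendor's
           register map defines for standby, exposure, streaming, etc. */
        sensor_write_reg(fd, 0x0100, 0x00);   /* e.g. enter software standby */
        sensor_write_reg(fd, 0x3500, 0x10);   /* e.g. coarse exposure        */
        sensor_write_reg(fd, 0x0100, 0x01);   /* e.g. start streaming        */

        close(fd);
        return 0;
    }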




    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    There is a USB video standard, and many USB cameras can just be
    plugged in and used.

    OK, I thought I had seen that some years ago. Might be an escape (though cameras found in phones and tablets etc. are probably all MIPI).


    Yes, the cameras in phones will almost always be MIPI. There is no need
    for them to use a USB Video connection; that is just way too much overhead.

    In fact, a lot of stand-alone USB cameras might have a MIPI-based camera
    core internally, with a MIPI-to-USB interface to send the data to the host.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Rick C on Thu Dec 29 13:58:40 2022
    On 12/29/22 1:20 PM, Rick C wrote:
    On Thursday, December 29, 2022 at 12:06:40 PM UTC-5, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays. Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a
    cell so you get a few grey levels).

    You could probably modulate the timing of the scans to get a range of grey scale, even if small. Let the chip integrate for 1 unit, 2 units, 4 units, etc. of time. I'm assuming light responsiveness of the human eye is logarithmic, rather than linear.
    If not, then 1, 2, 3, 4 units of time. Even 16 levels of grey is much better than black and white.

    It would be a bit of processing to translate the thermometer codes into pixel values, but just time consuming, not hard.


    Yes, at a drastic reduction in frame rate.

    Power-of-two spacing will show a lot of banding in the image, but 16
    levels spaced at about 1.4x exposure steps might be acceptable. (2**16
    dynamic range is extreme and well beyond even what "HDR" video can deal
    with.)
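
    (A quick worked comparison of the two ladders, purely illustrative:
    integration times in Rick's arbitrary "units" for 16 levels, spaced by
    2x and by 1.4x.)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const int levels = 16;
        for (int i = 0; i < levels; i++) {
            double t_pow2 = pow(2.0, i);   /* 1, 2, 4, ... 32768 units */
            double t_14x  = pow(1.4, i);   /* 1, 1.4, ... ~155 units   */
            printf("level %2d: 2^i = %8.0f   1.4^i = %7.1f\n", i, t_pow2, t_14x);
        }
        /* Span of the ladder: 2^15 = 32768:1 versus 1.4^15 ~= 155:1. */
        return 0;
    }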

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to Richard Damon on Thu Dec 29 10:20:53 2022
    On Thursday, December 29, 2022 at 12:06:40 PM UTC-5, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays. Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a
    cell so you get a few grey levels).

    You could probably modulate the timing of the scans to get a range of grey scale, even if small. Let the chip integrate for 1 unit, 2 units, 4 units, etc. of time. I'm assuming light responsiveness of the human eye is logarithmic, rather than linear.
    If not, then 1, 2, 3, 4 units of time. Even 16 levels of grey is much better than black and white.

    It would be a bit of processing to translate the thermometer codes into pixel values, but just time consuming, not hard.
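
    (A sketch of that translation step, under the assumption that each scan
    produces one bit per pixel and that a brighter pixel "trips" at a
    shorter integration time. Buffer layout, sizes, and names are
    illustrative.)

    #include <stdint.h>

    #define N_EXPOSURES 16
    #define N_PIXELS    (64 * 1024)        /* e.g. one 64K x 1 DRAM */

    /* bitplane[e][p] is nonzero if pixel p had tripped by exposure e.
     * The grey level is the index of the first exposure at which it
     * tripped -- i.e. decoding the thermometer code; level 0 means the
     * pixel tripped on the shortest exposure (brightest).
     */
    void decode_thermometer(const uint8_t bitplane[N_EXPOSURES][N_PIXELS],
                            uint8_t grey[N_PIXELS])
    {
        for (int p = 0; p < N_PIXELS; p++) {
            uint8_t level = N_EXPOSURES;   /* never tripped: darkest value */
            for (int e = 0; e < N_EXPOSURES; e++) {
                if (bitplane[e][p]) {
                    level = e;
                    break;
                }
            }
            grey[p] = level;
        }
    }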

    --

    Rick C.

    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Thu Dec 29 12:29:46 2022
    On 12/29/2022 6:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs
    (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.
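
    (To put rough numbers on that "per camera" cost: a back-of-the-envelope
    calculation. The resolution, frame rate, and pixel size are assumptions
    for illustration only.)

    #include <stdio.h>

    int main(void)
    {
        const double width = 640, height = 480;  /* pixels             */
        const double fps   = 30;                  /* frames per second  */
        const double bpp   = 2;                   /* bytes/pixel (YUYV) */

        double write_bw = width * height * bpp * fps;  /* DMA into RAM  */
        double read_bw  = write_bw;                    /* analysis pass */

        printf("per camera: %.1f MB/s in + %.1f MB/s read back = %.1f MB/s\n",
               write_bw / 1e6, read_bw / 1e6, (write_bw + read_bw) / 1e6);
        /* ~18.4 MB/s written plus the same again to read it back; multiply
           by the number of cameras to see where one host's memory bus
           starts to saturate. */
        return 0;
    }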

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Thu Dec 29 12:26:56 2022
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for each pixel (maybe some more modern memories store multiple bits in a cell so you get
    a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    There are some CMOS sensors that let you address pixels individually and in a random order (like you got with the DRAM) but by its nature, such a readout method tends to be slow, and space inefficient, so these interfaces tend to be
    only available on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    That is why most sensors read out via row/column shift registers to a pixel serial (maybe multiple pixels per clock) output, and if the camera includes its
    own A/D conversion, might serialize the results to minimize interconnect.

    Yes, but then you have to store it in memory in order to examine it.
    I.e., if your goal isn't just to pass the image out to a display,
    then having to unpack the serial stream into RAM is an added cost.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Richard Damon on Thu Dec 29 22:11:37 2022
    On 12/29/2022 20:48, Richard Damon wrote:
    On 12/29/22 12:45 PM, Dimiter_Popoff wrote:
    On 12/29/2022 19:21, Richard Damon wrote:
    On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    If you are looking for the very latest standards, yes. Enough data is
    out there to handle a lot of basic MIPI operations. Since the small
    player isn't going to be trying to implement the low level interface
    themselves (or at least shouldn't be trying to),

    So how does one use a MIPI camera without using the low level interface?

    You use a chip that has a MIPI interface: either a CPU or FPGA with a
    built-in MIPI interface, or a MIPI converter chip that converts the MIPI
    interface into something you can deal with.

    An FPGA with MIPI would do, I have not looked for one yet.



    unless you are trying to work with a bleeding edge camera (which you
    probably can't actually buy if you are a small player) you can tend
    to find enough information to use the camera.

    That is fair enough, as long as we are talking about some internal
    sensor specifics of the "bleeding edge" cameras.

    Bleeding Edge cameras/displays may need newer versions of MIPI than may
    be easy to find in the consumer market. They may need bleeding edge processors.

    Well, a 64-bit, GHz-range, 4- or 8-core Power Architecture part should be
    plenty. But I am not after bleeding edge cameras; a decent one I
    can control will do.


    As I mention below, more important are the configuration registers,
    which might be harder to get for bleeding edge parts. This is often proprietary, as knowing what is adjustable is often part of the secret
    sauce for those cameras.

    Do you get that sort of data for decent cameras? Sort of like how
    to focus it etc.? Or do you have to rely on black-box "converters",
    like with WiFi modules which won't let you get around their TCP/IP
    stack?


    My experience is that if you can actually buy the camera normally, there
    will be data available to use it.

    That's really reassuring. I am more interested in talking to MIPI
    display modules than to cameras (at least the sequence is this) but
    still.

    So you want a chip with MIPI DSI capability built in, or a converter chip.

    Not really, no. I want to be able to put the framebuffer data into
    the display like I have been doing with RGB, hsync, vsync etc., via
    a parallel or LVDS interface. Is there enough info out there on how to
    do this with an FPGA? I think I have enough info to do HDMI this way,
    but no MIPI. Well, my guess is that pixel data will still be pixel
    data etc.; it can't be that hard.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Thu Dec 29 16:09:45 2022
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a
    cell so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend to
    be on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually
    and in a random order (like you got with the DRAM) but by its nature,
    such a readout method tends to be slow, and space inefficient, so
    these interfaces tend to be only available on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to
    the camera and then get back the pixel value is going to use up more
    total bandwidth than getting the image in a stream. The one exception
    would be if you need just a very small percentage of the array data, and
    it is scattered over the array so a Region of Interest operation can't
    be used.


    That is why most sensors read out via row/column shift registers to a
    pixel serial (maybe multiple pixels per clock) output, and if the
    camera includes its own A/D conversion, might serialize the results to
    minimize interconnect.

    Yes, but then you have to store it in memory in order to examine it.
    I.e., if your goal isn't just to pass the image out to a display,
    then having to unpack the serial stream into RAM is an added cost.


    Unless you make sure you get a camera with the same image format and
    timing as your display.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Thu Dec 29 15:57:00 2022
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for each
    pixel (maybe some more modern memories store multiple bits in a cell so you
    get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend to be on
    the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually and in
    a random order (like you got with the DRAM) but by its nature, such a
    readout method tends to be slow, and space inefficient, so these interfaces
    tend to be only available on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to the camera and then get back the pixel value is going to use up more total bandwidth than getting the image in a stream. The one exception would be if you
    need just a very small percentage of the array data, and it is scattered over the array so a Region of Interest operation can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly. I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.

    I'm looking for *that* sort of "direct access" in a camera.

    I could *emulate* it by building a module that implements <whatever>
    interface to <whichever> camera and deserializes the data into a
    RAM. Then, mapping that *entire* RAM into the address space of the
    host processor.

    (Keeping the RAM updated would require a pseudo dual-ported architecture; possibly toggling between an "active" RAM and an "updated" RAM so that
    the full bandwidth of the RAM was available to the host)

    Having the host processor (DMA, etc.) perform this task means it loses bandwidth to the "deserialization" activity.

    That is why most sensors read out via row/column shift registers to a pixel
    serial (maybe multiple pixels per clock) output, and if the camera includes
    its own A/D conversion, might serialize the results to minimize interconnect.

    Yes, but then you have to store it in memory in order to examine it.
    I.e., if your goal isn't just to pass the image out to a display,
    then having to unpack the serial stream into RAM is an added cost.

    Unless you make sure you get a camera with the same image format and timing as
    your display.

    I typically don't "display" the images captured. Rather, I use the
    cameras as sensors: is there anything in the path of the closing
    (or opening) garage door that should cause me to inhibit/abort
    those actions? has the mail truck appeared at the mailbox, yet,
    today? *who* is standing at the front door?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Fri Dec 30 01:14:43 2022
    On 12/30/2022 0:57, Don Y wrote:
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value
    for each pixel (maybe some more modern memories store multiple bits
    in a cell so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend
    to be on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually
    and in a random order (like you got with the DRAM) but by its
    nature, such a readout method tends to be slow, and space
    inefficient, so these interfaces tend to be only available on
    smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to
    the camera and then get back the pixel value is going to use up more
    total bandwidth than getting the image in a stream. The one exception
    would be if you need just a very small percentage of the array data,
    and it is scattered over the array so a Region of Interest operation
    can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly.  I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.

    I'm looking for *that* sort of "direct access" in a camera.

    I could *emulate* it by building a module that implements <whatever> interface to <whichever> camera and deserializes the data into a
    RAM.  Then, mapping that *entire* RAM into the address space of the
    host processor.

    (Keeping the RAM updated would require a pseudo dual-ported architecture; possibly toggling between an "active" RAM and an "updated" RAM so that
    the full bandwidth of the RAM was available to the host)

    Having the host processor (DMA, etc.) perform this task means it loses bandwidth to the "deserialization" activity.

    Well of course but are you sure you can really win much? At first
    glance you'd be able to halve the memory bandwidth. But then you may
    run into problems with "doppler" kind of effects (clearly not Doppler
    but you get the idea) if you access the frame being acquired; so you'll
    want that double buffering you are talking about elsewhere (one frame
    being acquired and one having been acquired prior to that). Which would
    mean that somewhere something will have to do the copying you want to
    avoid...
    Since you have already done it with USB cameras I think the practical
    way is to just keep doing it this way; maybe not USB, if you can
    find some more economical way to do it, MIPI or whatever.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to All on Thu Dec 29 19:32:04 2022
    On 12/29/22 3:11 PM, Dimiter_Popoff wrote:
    On 12/29/2022 20:48, Richard Damon wrote:
    On 12/29/22 12:45 PM, Dimiter_Popoff wrote:
    On 12/29/2022 19:21, Richard Damon wrote:
    On 12/29/22 8:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    If you are looking for the very latest standards, yes. Enough data
    is out there to handle a lot of basic MIPI operations. Since the
    small player isn't going to be trying to implement the low level
    interface themselves (or at least shouldn't be trying to),

    So how does one use a MIPI camera without using the low level interface?

    You use a chip that has a MIPI interface: either a CPU or FPGA with a
    built-in MIPI interface, or a MIPI converter chip that converts the
    MIPI interface into something you can deal with.

    An FPGA with MIPI would do, I have not looked for one yet.



    unless you are trying to work with a bleeding edge camera (which you
    probably can't actually buy if you are a small player) you can tend
    to find enough information to use the camera.

    That is fair enough, as long as we are talking about some internal
    sensor specifics of the "bleeding edge" cameras.

    Bleeding Edge cameras/displays may need newer versions of MIPI than
    may be easy to find in the consumer market. They may need bleeding
    edge processors.

    Well, a 64-bit, GHz-range, 4- or 8-core Power Architecture part should be
    plenty. But I am not after bleeding edge cameras; a decent one I
    can control will do.

    Not bleeding edge in processor power, but in MIPI interfaces. I don't
    know if the latest cameras are using a faster version of the MIPI
    interface to move the pixels faster. If so, you need a chip with that
    faster-grade MIPI interface.



    As I mention below, more important are the configuration registers,
    which might be harder to get for bleeding edge parts. This is often
    proprietary, as knowing what is adjustable is often part of the secret
    sauce for those cameras.

    Do you get that sort of data for decent cameras? Sort of like how
    to focus it etc.? Or do you have to rely on black-box "converters",
    like with WiFi modules which won't let you get around their TCP/IP
    stack?

    I haven't heard of team members having trouble getting specs for
    actually available product.

    No, we are a bit bigger than the "hobbyist" market, but nowhere near
    the big boys. Our volumes would be in the 1000s in some cases.



    My experience is that if you can actually buy the camera normally, there
    will be data available to use it.

    That's really reassuring. I am more interested in talking to MIPI
    display modules than to cameras (at least the sequence is this) but
    still.

    So you want a chip with MIPI DSI capability built in, or a converter chip.

    Not really, no. I want to be able to put the framebuffer data into
    the display like I have been doing with RGB, hsync, vsync etc., via
    a parallel or LVDS interface. Is there enough info out there on how to
    do this with an FPGA? I think I have enough info to do HDMI this way,
    but no MIPI. Well, my guess is that pixel data will still be pixel
    data etc.; it can't be that hard.



    (DSI is Display Serial Interface, that is the version of MIPI that a
    MIPI display would use)

    I have used Lattice Crosslink FPGAs to do that sort of work. They are
    small gate arrays designed for protocol conversion.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Thu Dec 29 19:40:59 2022
    On 12/29/22 5:57 PM, Don Y wrote:
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value
    for each pixel (maybe some more modern memories store multiple bits
    in a cell so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend
    to be on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually
    and in a random order (like you got with the DRAM) but by its
    nature, such a readout method tends to be slow, and space
    inefficient, so these interfaces tend to be only available on
    smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to
    the camera and then get back the pixel value is going to use up more
    total bandwidth than getting the image in a stream. The one exception
    would be if you need just a very small percentage of the array data,
    and it is scattered over the array so a Region of Interest operation
    can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly.  I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.


    No, look at your DRAM timing again: the transaction begins with the
    sending of the address, typically over two clock edges with RAS and CAS,
    and then, a couple of clock cycles later, you get the answer back on the
    data bus.

    Yes, the addresses come from an address bus, using address space out of
    the processor, but it is a multi-cycle operation. Typically, you read
    back a "burst" with some minimal caching on the processor side, but that
    is more a minor detail.


    I'm looking for *that* sort of "direct access" in a camera.

    It's been a while, but I thought some CMOS cameras could work on a similar
    basis: strobe a row/column address from pins on the camera, and a few
    clock cycles later you get a burst out of the camera starting at the
    addressed cell.


    I could *emulate* it by building a module that implements <whatever> interface to <whichever> camera and deserializes the data into a
    RAM.  Then, mapping that *entire* RAM into the address space of the
    host processor.

    (Keeping the RAM updated would require a pseudo dual-ported architecture; possibly toggling between an "active" RAM and an "updated" RAM so that
    the full bandwidth of the RAM was available to the host)

    Having the host processor (DMA, etc.) perform this task means it loses bandwidth to the "deserialization" activity.

    That is why most sensors read out via row/column shift registers to
    a pixel serial (maybe multiple pixels per clock) output, and if the
    camera includes its own A/D conversion, might serialize the results
    to minimize interconnect.

    Yes, but then you have to store it in memory in order to examine it.
    I.e., if your goal isn't just to pass the image out to a display,
    then having to unpack the serial stream into RAM is an added cost.

    Unless you make sure you get a camera with the same image format and
    timing as your display.

    I typically don't "display" the images captured.  Rather, I use the
    cameras as sensors:  is there anything in the path of the closing
    (or opening) garage door that should cause me to inhibit/abort
    those actions?  has the mail truck appeared at the mailbox, yet,
    today?  *who* is standing at the front door?


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Thu Dec 29 20:37:53 2022
    On 12/29/2022 4:14 PM, Dimiter_Popoff wrote:
    On 12/30/2022 0:57, Don Y wrote:
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a cell
    so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend to be
    on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually and
    in a random order (like you got with the DRAM) but by its nature, such a
    readout method tends to be slow, and space inefficient, so these
    interfaces tend to be only available on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to the
    camera and then get back the pixel value is going to use up more total
    bandwidth than getting the image in a stream. The one exception would be if
    you need just a very small percentage of the array data, and it is scattered
    over the array so a Region of Interest operation can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly.  I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.

    I'm looking for *that* sort of "direct access" in a camera.

    I could *emulate* it by building a module that implements <whatever>
    interface to <whichever> camera and deserializes the data into a
    RAM.  Then, mapping that *entire* RAM into the address space of the
    host processor.

    (Keeping the RAM updated would require a pseudo dual-ported architecture;
    possibly toggling between an "active" RAM and an "updated" RAM so that
    the full bandwidth of the RAM was available to the host)

    Having the host processor (DMA, etc.) perform this task means it loses
    bandwidth to the "deserialization" activity.

    Well of course but are you sure you can really win much? At first
    glance you'd be able to halve the memory bandwidth.

    I'd save one memory reference, per pixel, per frame; the data is "just
    there" instead of having to be streamed in from a USB device and DMA'ed
    into memory.

    But then you may
    run into problems with "doppler" kind of effects (clearly not Doppler
    but you get the idea) if you access the frame being acquired; so you'll
    want that double buffering you are talking about elsewhere (one frame
    being acquired and one having been acquired prior to that). Which would
    mean that somewhere something will have to do the copying you want to avoid...

    No. The "deserializer" could (conceivably) just toggle two (or N) pointers
    to "captured frame" and "frame being captured". You do this when synthesizing video for similar reasons (if you update the area of the frame buffer that
    is being painted to the visual display AS it is being painted, objects
    that are "in motion" appear to "tear" (visual artifacts).

    Since you have already done it with USB cameras I think the practical
    way is to just keep doing it this way; maybe not USB, if you can
    find some more economical way to do it, MIPI or whatever.

    That was the issue I was exploring. I want to see the sort of performance
    and cost associated with different approaches.

    USB (and some of the camera protocols) are supported on much silicon.
    But, when you start wanting to run multiple cameras from the same
    host.... <frown>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Thu Dec 29 20:32:33 2022
    On 12/29/2022 5:40 PM, Richard Damon wrote:
    On 12/29/22 5:57 PM, Don Y wrote:
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for
    each pixel (maybe some more modern memories store multiple bits in a cell
    so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but tend to be
    on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels individually and
    in a random order (like you got with the DRAM) but by its nature, such a
    readout method tends to be slow, and space inefficient, so these
    interfaces tend to be only available on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address to the
    camera and then get back the pixel value is going to use up more total
    bandwidth than getting the image in a stream. The one exception would be if
    you need just a very small percentage of the array data, and it is scattered
    over the array so a Region of Interest operation can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly.  I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.

    No, look at your DRAM timing again: the transaction begins with the
    sending of the address, typically over two clock edges with RAS and CAS,
    and then, a couple of clock cycles later, you get the answer back on the
    data bus.

    But it's a single memory reference. Look at what happens when you
    deserialize a USB video stream into that same DRAM. The DMAC has
    tied up the bus for the same amount of time that the processor
    would have if it read those same N locations.

    Yes, the addresses come from an address bus, using address space out of the processor, but it is a multi-cycle operation. Typically, you read back a "burst" with some minimal caching on the processor side, but that is more a minor detail.

    I'm looking for *that* sort of "direct access" in a camera.

    It's been a while, but I thought some CMOS cameras could work on a similar
    basis: strobe a row/column address from pins on the camera, and a few
    clock cycles later you get a burst out of the camera starting at the
    addressed cell.

    I don't want the camera to decide which pixels *it* thinks I want to see.
    It sends me a burst of a row -- but the next part of the image I may have wanted to access may have been down the same *column*. Or, in another
    part of the image entirely.

    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display). Scene analysis doesn't necessarily
    conform to that same pattern.

    E.g., if I've imposed a mask on the field to indicate portions that
    are not important, then any bandwidth the camera spends delivering that
    data to memory is wasted. If the memory was "just there", then there
    would be no associated bandwidth impact.
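
    (A sketch of that masked access pattern, assuming the frame really is
    directly addressable; dimensions and the process_pixel() hook are
    illustrative. Pixels whose mask entry is zero are never even read, so
    they cost no bus cycles at all.)

    #include <stdint.h>

    #define W 640
    #define H 480

    extern void process_pixel(int x, int y, uint8_t value);  /* analysis hook */

    void scan_masked(const uint8_t frame[H][W], const uint8_t mask[H][W])
    {
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                if (mask[y][x])                  /* skip "don't care" regions */
                    process_pixel(x, y, frame[y][x]);
    }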

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to blockedofcourse@foo.invalid on Fri Dec 30 00:29:53 2022
    Hi Don,

    On Thu, 29 Dec 2022 12:29:46 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

    On 12/29/2022 6:33 AM, Dimiter_Popoff wrote:
    On 12/29/2022 15:16, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?


    Hah, Don, consider yourself lucky if you find a camera you have
    enough documentation to use at all, serial or whatever.

    The MIPI standards are only for politburo members (last time I looked
    you need to make several millions annually to be able to *apply*
    for membership, which of course costs thousands, annually again).

    Not sure about USB, perhaps USB cameras are covered in the standard
    (yet to deal with that one).

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs
    (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.

    You aren't going to find anything low cost ... if you want bandwidth
    for multiple cameras, you need to look into bus based frame grabbers.
    They still exist, but are (relatively) expensive and getting harder to
    find.

    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to George Neuner on Fri Dec 30 00:27:48 2022
    Hi George!

    [Hope you are faring well... enjoying the COLD! ;) ]

    On 12/29/2022 10:29 PM, George Neuner wrote:
    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs
    (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.

    You aren't going to find anything low cost ... if you want bandwidth
    for multiple cameras, you need to look into bus based frame grabbers.
    They still exist, but are (relatively) expensive and getting harder to
    find.

    So, my options are:
    - reduce the overall frame rate such that N cameras can
    be serviced by the USB (or whatever) interface *and*
    the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
    multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
    effectively be applied

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Fri Dec 30 10:04:12 2022
    On 12/30/2022 9:24 AM, Richard Damon wrote:
    On 12/30/22 2:27 AM, Don Y wrote:
    Hi George!

    [Hope you are faring well... enjoying the COLD!  ;) ]

    On 12/29/2022 10:29 PM, George Neuner wrote:
    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs
    (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.

    You aren't going to find anything low cost ... if you want bandwidth
    for multiple cameras, you need to look into bus based frame grabbers.
    They still exist, but are (relatively) expensive and getting harder to
    find.

    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting from the concept of using "USB Cameras"
    already starts you off with that sort of limit.

    My personal thought on your problem is that you want to put a "cheap"
    processor right on each camera, one with a direct camera interface to pull
    in the image, do your processing there, and send the results over some
    comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would address the hardware. But, it doesn't scale well. I.e., at what point do you throw in
    the towel and say there are too many concurrent images in the scene to
    pile them all onto a single "host" processor?

    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene". But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view. E.g., point a camera across the path of a garage door and have
    it "notice" any deviation from the "unobstructed" image.

    When the scene gets too large to represent in enough detail in a single camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host. If those cameras were just "chunks of memory", then the *imagery* would be easy to examine in a single host -- though the processing power *might* need to increase geometrically (depending on your current goal)

    Moving the processing to "host per camera" implementation gives you more
    MIPS. But, makes coordinating partial results tedious.

    It is unclear what your actual image requirements per camera are, so it is hard
    to say what level camera and processor you will need.

    My first feeling is that you seem to be assuming a fairly cheap camera and then doing some fairly simple processing over the partial image, in which case you might even be able to live with a camera that uses a crude SPI interface to bring the frame in, and a very simple processor.

    I use A LOT of cameras. But, I should be able to swap the camera (upgrade/downgrade) and still rely on the same *local* compute engine.
    E.g., some of my cameras have IR illuminators; it's not important
    in others; some are PTZ; others fixed.

    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the front door. Or, identify the locations of the occupants of a facility.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Fri Dec 30 11:24:13 2022
    On 12/30/22 2:27 AM, Don Y wrote:
    Hi George!

    [Hope you are faring well... enjoying the COLD!  ;) ]

    On 12/29/2022 10:29 PM, George Neuner wrote:
    But, most cameras seem to have (bit- or word-) serial interfaces
    nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs
    (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.

    You aren't going to find anything low cost ... if you want bandwidth
    for multiple cameras, you need to look into bus based frame grabbers.
    They still exist, but are (relatively) expensive and getting harder to
    find.

    So, my options are:
    - reduce the overall frame rate such that N cameras can
      be serviced by the USB (or whatever) interface *and*
      the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
      multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
      effectively be applied


    The fact that you are starting from the concept of using "USB Cameras"
    more or less locks you into that kind of limit.

    My personal thought on your problem is that you want to put a "cheap"
    processor right on each camera -- one with a direct camera interface --
    to pull in the image, do your processing locally, and send the
    results over some comm-link to the central core.

    It is unclear what your actual image requirements per camera are, so it
    is hard to say what level of camera and processor you will need.

    My first feeling is you seem to be assuming a fairly cheap camera and
    then doing some fairly simple processing over the partial image, in
    which case you might even be able to live with a camera that uses a
    crude SPI interface to bring the frame in, and a very simple processor.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Fri Dec 30 13:02:37 2022
    On 12/30/22 12:04 PM, Don Y wrote:
    On 12/30/2022 9:24 AM, Richard Damon wrote:
    On 12/30/22 2:27 AM, Don Y wrote:
    Hi George!

    [Hope you are faring well... enjoying the COLD!  ;) ]

    On 12/29/2022 10:29 PM, George Neuner wrote:
    But, most cameras seem to have (bit- or word-) serial interfaces nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    I built my prototypes (proof-of-principle) using COTS USB cameras.
    But, getting the data out of the serial data stream and into RAM so
    it can be analyzed consumes memory bandwidth.

    I'm currently trying to sort out an approximate cost factor "per
    camera" (per video stream) and looking for ways that I can cut costs (memory bandwidth requirements) to allow greater numbers of
    cameras or higher frame rates.

    You aren't going to find anything low cost ... if you want bandwidth
    for multiple cameras, you need to look into bus based frame grabbers.
    They still exist, but are (relatively) expensive and getting harder to find.

    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting from the concept of using "USB Cameras"
    more or less locks you into that kind of limit.

    My personal thought on your problem is that you want to put a "cheap"
    processor right on each camera -- one with a direct camera interface --
    to pull in the image, do your processing locally, and send the
    results over some comm-link to the central core.

    If I went the frame-grabber approach, that would be how I would address the hardware.  But, it doesn't scale well.  I.e., at what point do you throw in the towel and say there are too many concurrent images in the scene to
    pile them all onto a single "host" processor?

    That's why I didn't suggest that method. I was suggesting each camera has
    its own tightly coupled processor that handles the needs of THAT camera.


    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have
    it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each
    with their own processor, and they each process their own image.

    The only problem is if your image processing algorithm needs to compare
    parts of the images between the two cameras, which seems unlikely.

    It does mean that if you are trying to track something across the cameras, you
    need enough overlap to allow them to hand off the object when it is in
    the overlap.
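
    As a rough rule of thumb (all figures below are assumptions, just for
    illustration), the overlap -- measured at the target's distance -- has to
    span at least the object's width plus however far it can move during one
    detection interval:

        /* Minimum overlap for a clean hand-off: the object must be
           wholly visible to both cameras for at least one detection
           interval.  All numbers are illustrative assumptions. */
        #include <stdio.h>

        int main(void)
        {
            const double object_w = 0.5;   /* m, shoulder width          */
            const double speed    = 1.5;   /* m/s, walking pace          */
            const double t_detect = 0.2;   /* s, a few frames of latency */

            printf("overlap >= %.2f m\n", object_w + speed * t_detect);
            return 0;
        }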

    When the scene gets too large to represent in enough detail in a single camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host.  If those cameras were just "chunks of memory", then the *imagery* would be easy to examine in a single host -- though the processing power *might* need to increase geometrically (depending on your current goal)

    Yes, but your "chunks of memory" model just doesn't exist as a viable
    camera model.

    The CMOS cameras with addressable pixels have "access times"
    significantly longer than your typical memory (and are read-once), so
    they don't really meet that model. Some of them do allow for sending
    multiple small regions of interest and downloading just those regions,
    but this then starts to require moderate processor overhead to be
    loading all these regions and updating the grabber to put them where you
    want.

    And yes, it does mean that there might be some cases where you need a
    core module that has TWO cameras connected to a single processor, either
    to get a wider field of view, or to combine two different types of
    camera (maybe a high res black and white to a low res color if you need
    just minor color information, or combine a visible camera to a thermal
    camera). These just become another tool in your tool box.


    Moving the processing to "host per camera" implementation gives you more MIPS.  But, makes coordinating partial results tedious.

    Depends on what sort of partial results you are looking at.


    It is unclear what your actual image requirements per camera are, so it
    is hard to say what level of camera and processor you will need.

    My first feeling is you seem to be assuming a fairly cheap camera and
    then doing some fairly simple processing over the partial image, in
    which case you might even be able to live with a camera that uses a
    crude SPI interface to bring the frame in, and a very simple processor.

    I use A LOT of cameras.  But, I should be able to swap the camera (upgrade/downgrade) and still rely on the same *local* compute engine.
    E.g., some of my cameras have Ir illuminators; it's not important
    in others; some are PTZ; others fixed.

    Doesn't sound reasonable. If you downgrade a camera, you can't count on
    it being able to meet the same requirements, or you over-specced the
    initial camera.

    You put on a camera a processor capable of handling the tasks you expect
    out of that set of hardware. One type of processor can likely handle a
    variety of different camera setups.


    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the front door.  Or, identify the locations of the occupants of a facility.


    Yes, so you don't want to "Pay" for the capability to recognize a
    visitor in your garage door sensor, so you use different levels of sensor/processor.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Fri Dec 30 14:59:39 2022
    On 12/30/2022 11:02 AM, Richard Damon wrote:
    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting for the concept of using "USB Cameras" sort >>> of starts you with that sort of limit.

    My personal thought on your problem is you want to put a "cheap" processor >>> right on each camera using a processor with a direct camera interface to >>> pull in the image and do your processing and send the results over some
    comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would address the >> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in
    the towel and say there are too many concurrent images in the scene to
    pile them all onto a single "host" processor?

    That's why I didn't suggest that method. I was suggesting each camera has its own tightly coupled processor that handles the needs of THAT camera.

    My existing "module" handles a single USB camera (with a fairly heavy-weight processor).

    But, being USB-based, there is no way to look at *part* of an image.
    And, I have to pay a relatively high cost (capturing the entire
    image from the serial stream) to look at *any* part of it.

    *If* a "camera memory" was available, I would site N of these
    in the (64b) address space of the host and let the host pick
    and choose which parts of which images it wanted to examine...
    without worrying about all of the bandwidth that would have been
    consumed deserializing those N images into that memory (which is
    a continuous process)
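
    Something along these lines is what I have in mind -- purely a sketch;
    the device node, layout, and sizes are all assumptions, not an existing
    driver (error checking omitted):

        /* Hypothetical sketch: map N "camera memories" into the host's
           address space and sample arbitrary pixels directly. */
        #include <fcntl.h>
        #include <stddef.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define N_CAMS      4
        #define CAM_W       640
        #define CAM_H       480
        #define FRAME_BYTES ((size_t)CAM_W * CAM_H)   /* 8-bit mono assumed */

        int main(void)
        {
            volatile uint8_t *cam[N_CAMS];
            int fd = open("/dev/cammem", O_RDONLY);    /* hypothetical node */

            for (int i = 0; i < N_CAMS; i++)
                cam[i] = mmap(NULL, FRAME_BYTES, PROT_READ, MAP_SHARED,
                              fd, (off_t)i * FRAME_BYTES);

            /* Touch only the pixels of interest -- no per-frame
               deserialization cost for the regions never examined. */
            uint8_t p = cam[2][123 * CAM_W + 456];     /* camera 2, (456,123) */
            (void)p;

            for (int i = 0; i < N_CAMS; i++)
                munmap((void *)cam[i], FRAME_BYTES);
            close(fd);
            return 0;
        }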

    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have
    it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each with there own processor, and they each process their own image.

    That's the above approach, but...

    The only problem is if your image processing algoritm need to compare parts of
    the images between the two cameras, which seems unlikely.

    Consider watching a single room (e.g., a lobby at a business) and
    tracking the movements of "visitors". It's unlikely that an individual's movements would always be constrained to a single camera field. There will
    be times when he/she is "half-in" a field (and possibly NOT in the other,
    HALF in the other or ENTIRELY in the other). You can't ignore cases where
    the entire object (or, your notion of what that object's characteristics
    might be) is not entirely in the field as that leaves a vulnerability.

    For example, I watch our garage door with *four* cameras.  A camera is positioned on each side ("door jamb"?) of the door "looking at" the other camera.  This is because a camera can't likely see the full height of the door opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
    and I'll watch *its* side!).

    [The other two cameras are similarly positioned on the overhead *track*
    onto which the door rolls, when open]

    An object in (or near) the doorway can be visible in one (either) or
    both cameras, depending on where it is located. Additionally, one of
    those manifestations may be only "partial" as regards to where it is
    located and intersects the cameras' fields of view.

    The "cost" of watching the door is only the cost of the actual *cameras*.
    The cost of the compute resources is amortized over the rest of the system
    as those can be used for other, non-camera, non-garage related activities.

    It does say that if trying to track something across the cameras, you need enough overlap to allow them to hand off the object when it is in the overlap.

    And, objects that consume large portions of a camera's field of view
    require similar handling (unless you can always guarantee that cameras
    and targets are "far apart")

    When the scene gets too large to represent in enough detail in a single
    camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host.  If those cameras were just >> "chunks of memory", then the *imagery* would be easy to examine in a single >> host -- though the processing power *might* need to increase geometrically >> (depending on your current goal)

    Yes, but your "chunks of memory" model just doesn't exist as a viable camera model.

    Apparently not -- in the COTS sense. But, that doesn't mean I can't
    build a "camera memory emulator".

    The downside is that this increases the cost of the "actual camera"
    (see my above comment wrt amortization).

    And, it just moves the point at which a single host (of fixed capabilities)
    can no longer handle the scene's complexity. (when you have 10 cameras?)

    The CMOS cameras with addressable pixels have "access times" significantly longer than your typical memory (and are read-once), so they don't really meet that model. Some of them do allow for sending multiple small regions of interest and downloading just those regions, but this then starts to require moderate processor overhead to be loading all these regions and updating the grabber to
    put them where you want.

    You would, instead, let the "camera memory emulator" capture the entire
    image from the camera and place the entire image in a contiguous
    region of memory (from the perspective of the host). The cost of capturing
    the portions that are not used is hidden *in* the cost of the "emulator".

    And yes, it does mean that there might be some cases where you need a core module that has TWO cameras connected to a single processor, either to get a wider field of view, or to combine two different types of camera (maybe a high
    res black and white to a low res color if you need just minor color information, or combine a visible camera to a thermal camera). These just become another tool in your tool box.

    I *think* (uncharted territory) that the better investment is to develop algorithms that let me distribute the processing among multiple
    (single) "camera modules/nodes". How would your "two camera" exemplar
    address an application requiring *three* cameras? etc.

    I can, currently, distribute this processing by treating the
    region of memory into which a (local) camera's imagery is
    deserialized as a "memory object" and then exporting *access*
    to that object to other similar "camera modules/nodes".

    But, the access times of non-local memory are horrendous, given
    that the contents are ephemeral (if accesses could be *cached*
    on each host needing them, then these costs diminish).

    So, I need to come up with algorithms that let me export abstractions
    instead of raw data.
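
    E.g., export something on the order of a few tens of bytes per target
    instead of hundreds of KB of pixels per frame.  A sketch only -- the
    field names are purely illustrative:

        /* One possible "abstraction" a camera node could export in
           place of raw pixels.  Illustrative only. */
        #include <stdint.h>

        struct target_report {
            uint32_t camera_id;     /* which node saw it                */
            uint32_t frame_seq;     /* frame the report refers to       */
            uint64_t timestamp_us;  /* capture time                     */
            uint16_t x, y, w, h;    /* bounding box, in that camera's   */
                                    /*   pixel coordinates              */
            uint8_t  kind;          /* e.g. "head", "legs", "unknown"   */
            uint8_t  confidence;    /* 0..255                           */
            uint8_t  truncated;     /* nonzero if box clipped at edge   */
        };

    A neighboring node that sees the "legs" can then correlate on timestamp
    and the known camera geometry without ever touching this node's frame
    buffer.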

    Moving the processing to "host per camera" implementation gives you more
    MIPS.  But, makes coordinating partial results tedious.

    Depends on what sort of partial results you are looking at.

    "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"

    "Ah! I was wondering whose legs those were in *my* image!"

    It is unclear what you actual image requirements per camera are, so it is >>> hard to say what level camera and processor you will need.

    My first feeling is you seem to be assuming a fairly cheep camera and then >>> doing some fairly simple processing over the partial image, in which case >>> you might even be able to live with a camera that uses a crude SPI interface
    to bring the frame in, and a very simple processor.

    I use A LOT of cameras.  But, I should be able to swap the camera
    (upgrade/downgrade) and still rely on the same *local* compute engine.
    E.g., some of my cameras have Ir illuminators; it's not important
    in others; some are PTZ; others fixed.

    Doesn't sound reasonable. If you downgrade a camera, you can't count on it being able to meet the same requirements, or you over speced the initial camera.

    Sorry, I was using up/down relative to "nominal camera", not "specific camera previously selected for application". I'd *really* like to just have a
    single "camera module" (module = CPU+I/O) instead of one for camera type A
    and another for camera type B, etc.

    You put on a camera a processor capable of handling the tasks you expect out of
    that set of hardware.  One type of processor can likely handle a variety of different camera setups.

    Exactly. If a particular instance has an IR illuminator, then you include controls for that in *the* "camera module". If another instance doesn't have this ability, then those controls go unused.
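
    One cheap way to express that (names below are hypothetical, just to
    illustrate) is a capability word the module reads at startup; absent
    features simply leave their controls idle:

        /* Illustrative capability flags for a generic "camera module".
           Names and fields are hypothetical. */
        #include <stdint.h>

        #define CAP_IR_ILLUM   (1u << 0)   /* switchable IR illuminator */
        #define CAP_PTZ        (1u << 1)   /* pan/tilt/zoom supported   */
        #define CAP_AUDIO      (1u << 2)   /* microphone present        */
        #define CAP_LOW_LUX    (1u << 3)   /* usable at night           */

        struct camera_caps {
            uint32_t flags;      /* bitwise OR of CAP_* above           */
            uint16_t width;      /* native resolution                   */
            uint16_t height;
            uint8_t  max_fps;
        };

        /* The same module firmware serves every variant: controls for
           features a given camera lacks are simply never exercised. */
        static inline int has_cap(const struct camera_caps *c, uint32_t f)
        {
            return (c->flags & f) != 0;
        }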

    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the front >> door.  Or, identify the locations of the occupants of a facility.

    Yes, so you don't want to "Pay" for the capability to recognize a visitor in your garage door sensor, so you use different levels of sensor/processor.

    Exactly. But, the algorithms that do the scene analysis can be the same;
    you just parameterize the image and the objects within it that you seek.

    There will likely be some combinations that exceed the capabilities of
    the hardware to process in real-time. So, you fall back to lower
    frame rates or let the algorithms drop targets ("You watch Bob, I'll
    watch Tom!")

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Sat Dec 31 00:39:16 2022
    On 12/30/22 4:59 PM, Don Y wrote:
    On 12/30/2022 11:02 AM, Richard Damon wrote:
    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above) >>>>> - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting for the concept of using "USB
    Cameras" sort of starts you with that sort of limit.

    My personal thought on your problem is you want to put a "cheap"
    processor right on each camera using a processor with a direct
    camera interface to pull in the image and do your processing and
    send the results over some comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would
    address the
    hardware.  But, it doesn't scale well.  I.e., at what point do you
    throw in
    the towel and say there are too many concurrent images in the scene to
    pile them all onto a single "host" processor?

    Thats why I didn't suggest that method. I was suggesting each camera
    has its own tightly coupled processor that handles the need of THAT

    My existing "module" handles a single USB camera (with a fairly
    heavy-weight
    processor).

    But, being USB-based, there is no way to look at *part* of an image.
    And, I have to pay a relatively high cost (capturing the entire
    image from the serial stream) to look at *any* part of it.

    Yep, having chosen USB as your interface, you have limited yourself.

    Since you say you have a fairly heavy-weight processor, that frame grab
    likely isn't your limiting factor.


    *If* a "camera memory" was available, I would site N of these
    in the (64b) address space of the host and let the host pick
    and choose which parts of which images it wanted to examine...
    without worrying about all of the bandwidth that would have been
    consumed deserializing those N images into that memory (which is
    a continuous process)

    But such a camera would almost certainly be designed for the processor
    to be on the same board as the camera (or be VERY slow to access), so it is
    much less apt to allow you to add multiple cameras to one processor.


    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have >>> it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each
    with there own processor, and they each process their own image.

    That's the above approach, but...

    The only problem is if your image processing algoritm need to compare
    parts of the images between the two cameras, which seems unlikely.

    Consider watching a single room (e.g., a lobby at a business) and
    tracking the movements of "visitors".  It's unlikely that an individual's movements would always be constrained to a single camera field.  There will be times when he/she is "half-in" a field (and possibly NOT in the other, HALF in the other or ENTIRELY in the other).  You can't ignore cases where the entire object (or, your notion of what that object's characteristics might be) is not entirely in the field as that leaves a vulnerability.

    Sounds like you aren't overlapping your cameras enough or have
    insufficient coverage. Maybe your problem is the wrong field of view for
    your lens. Maybe you need fewer but better cameras with wider fields of
    view.

    This might be due to trying to use "stock" inexpensive USB cameras.

    For example, I watch our garage door with *four* cameras.  A camera is positioned on each side ("door jam"?) of the door "looking at" the other camera.  This because a camera can't likely see the full height of the door opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" and I'll watch *its* side!).

    Right, and if ANY see a problem, you stop. So no need for inter-camera coordination.

    [The other two cameras are similarly positioned on the overhead *track*
    onto which the door rolls, when open]

    An object in (or near) the doorway can be visible in one (either) or
    both cameras, depending on where it is located.  Additionally, one of
    those manifestations may be only "partial" as regards to where it is
    located and intersects the cameras' fields of view.

    But since you aren't trying to ID, only Detect, there still isn't a need
    for camera-to-camera processing, just camera-to-door-controller.


    The "cost" of watching the door is only the cost of the actual *cameras*.
    The cost of the compute resources is amortized over the rest of the system
    as those can be used for other, non-camera, non-garage related activities.

    It does say that if trying to track something across the cameras, you
    need enough overlap to allow them to hand off the object when it is in
    the overlap.

    And, objects that consume large portions of a camera's field of view
    require similar handling (unless you can always guarantee that cameras
    and targets are "far apart")

    When the scene gets too large to represent in enough detail in a single
    camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host.  If those cameras were
    just
    "chunks of memory", then the *imagery* would be easy to examine in a
    single
    host -- though the processing power *might* need to increase
    geometrically
    (depending on your current goal)

    Yes, but your "chunks of memory" model just doesn't exist as a viable
    camera model.

    Apparently not -- in the COTS sense.  But, that doesn't mean I can't
    build a "camera memory emulator".

    The downside is that this increases the cost of the "actual camera"
    (see my above comment wrt ammortization).

    Yep, implementing this likely costs more than giving the camera a
    dedicated moderate processor to do the major work. It might not handle the
    actual ID problem of your doorbell, but it could likely process the live
    video, take a snapshot of a region with a good view of the visitor
    coming, and send just that to your master system for ID.


    And, it just moves the point at which a single host (of fixed capabilities) can no longer handle the scene's complexity.  (when you have 10 cameras?)

    The CMOS cameras with addressable pixels have "access times"
    significantly lower than your typical memory (and is read once) so
    doesn't really meet that model. Some of them do allow for sending
    multiple small regions of intererst and down loading just those
    regions, but this then starts to require moderate processor overhead
    to be loading all these regions and updating the grabber to put them
    where you want.

    You would, instead, let the "camera memory emulator" capture the entire
    image from the camera and place the entire image in a contiguous
    region of memory (from the perspective of the host).  The cost of capturing the portions that are not used is hidden *in* the cost of the "emulator".

    Yep, you could build your system with a two-port memory buffer between
    them: the frame grabber loading through one port, and the decoding processor on
    the other.

    The most cost-effective way to do this is likely a commercial
    frame-grabber with built-in "two-port" memory that sits in a slot of a PC-
    type computer. These would likely not work with a "USB Camera" (why
    would you need a frame grabber with a camera that has one built in?), so this
    would totally change your cost models.

    IF your current design method is based on using USB cameras, trying to
    do a full custom interface may be out of your field of operation.


    And yes, it does mean that there might be some cases where you need a
    core module that has TWO cameras connected to a single processor,
    either to get a wider field of view, or to combine two different types
    of camera (maybe a high res black and white to a low res color if you
    need just minor color information, or combine a visible camera to a
    thermal camera). These just become another tool in your tool box.

    I *think* (uncharted territory) that the better investment is to develop algorithms that let me distribute the processing among multiple
    (single) "camera modules/nodes".  How would your "two camera" exemplar address an application requiring *three* cameras?  etc.

    The first question is: what processing are you thinking of that needs
    images from 3 cameras?

    Note, my two camera example was a case where the processing needed to be
    done did need data from two cameras.

    If you have another task that needs a different camera, you just build a
    system with one two-camera module and one one-camera module, relaying back
    to a central control, or you nominate one of the modules to be central
    control if the load there is light enough.

    Your garage door example would be built from 4 separate and independent
    one-camera modules, either going to one as the master, or to a 5th module
    acting as the master.

    The cases I can think of for needing to process three cameras together
    would be:

    1) a system stitching images from 3 cameras and generating a single image
    out of it, but that totally breaks your concept of needing only bits of
    the images; that inherently uses most of each camera, and does some stitching processing on the overlaps.

    2) A multi-spectrum system, where again, you are taking the ENTIRE scene
    from the three cameras and producing a merged "false-color" image from
    them. Again, this also breaks your partial-image model.


    I can, currently, distribute this processing by treating the
    region of memory into which a (local) camera's imagery is
    deserialized as a "memory object" and then exporting *access*
    to that object to other similar "camera modules/nodes".

    But, the access times of non-local memory are horrendous, given
    that the contents are ephemeral (if accesses could be *cached*
    on each host needing them, then these costs diminish).

    So, I need to come up with algorithms that let me export abstractions
    instead of raw data.

    Sounds like your current design is very centralized. This limits its scalability.


    Moving the processing to "host per camera" implementation gives you more >>> MIPS.  But, makes coordinating partial results tedious.

    Depends on what sort of partial results you are looking at.

    "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"

    "Ah!  I was wondering whose legs those were in *my* image!"

    It is unclear what you actual image requirements per camera are, so
    it is hard to say what level camera and processor you will need.

    My first feeling is you seem to be assuming a fairly cheep camera
    and then doing some fairly simple processing over the partial image,
    in which case you might even be able to live with a camera that uses
    a crude SPI interface to bring the frame in, and a very simple
    processor.

    I use A LOT of cameras.  But, I should be able to swap the camera
    (upgrade/downgrade) and still rely on the same *local* compute engine.
    E.g., some of my cameras have Ir illuminators; it's not important
    in others; some are PTZ; others fixed.

    Doesn't sound reasonable. If you downgrade a camera, you can't count
    on it being able to meet the same requirements, or you over speced the
    initial camera.

    Sorry, I was using up/down relative to "nominal camera", not "specific
    camera
    previously selected for application".  I'd 8really* like to just have a single "camera module" (module = CPU+I/O) instead of one for camera type A and another for camera type B, etc.


    That only works if you are willing to spend for the sports car, even if
    you just need it to go around the block.

    It depends a bit on how much span of capability you need. A $10 camera
    likely has a very different interface than a $30,000 camera, so it will
    need a different board. Some boards might handle multiple camera
    interface types if it doesn't add a lot to the board, but you are apt to
    find that you need to make some choice.

    Then some tasks will just need a lot more compute power than others.
    Yes, you can just put too much compute power on the simple tasks (and
    early on it might make sense to design only the higher-end processor), but
    ultimately you are going to want the less expensive lower-end processors.

    You put on a camera a processor capable of handling the tasks you
    expect out of that set of hardware.  One type of processor likely can
    handle a variaty of different camera setup with

    Exactly.  If a particular instance has an Ir illuminator, then you include controls for that in *the* "camera module".  If another instance doesn't have
    this ability, then those controls go unused.

    Yes, auxiliary functionality is often cheap to include the hooks for.


    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the
    front
    door.  Or, identify the locations of the occupants of a facility.

    Yes, so you don't want to "Pay" for the capability to recognize a
    visitor in your garage door sensor, so you use different levels of
    sensor/processor.

    Exactly.  But, the algorithms that do the scene analysis can be the same; you just parameterize the image and the objects within it that you seek.

    Actually, "Tracking" can be a very different type of algorithm than "Detecting". You might be able to use a Tracking-based algorithm to
    Detect, but likely a much simpler algorithm can be used (needing fewer resources) to just detect.


    There will likely be some combinations that exceed the capabilities of
    the hardware to process in real-time.  So, you fall back to lower
    frame rates or let the algorithms drop targets ("You watch Bob, I'll
    watch Tom!")



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Richard Damon on Sat Dec 31 01:27:10 2022
    On 12/30/2022 10:39 PM, Richard Damon wrote:
    On 12/30/22 4:59 PM, Don Y wrote:
    On 12/30/2022 11:02 AM, Richard Damon wrote:
    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above) >>>>>> - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting for the concept of using "USB Cameras" sort
    of starts you with that sort of limit.

    My personal thought on your problem is you want to put a "cheap" processor
    right on each camera using a processor with a direct camera interface to >>>>> pull in the image and do your processing and send the results over some >>>>> comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would address the
    hardware.  But, it doesn't scale well.  I.e., at what point do you throw in
    the towel and say there are too many concurrent images in the scene to >>>> pile them all onto a single "host" processor?

    Thats why I didn't suggest that method. I was suggesting each camera has its
    own tightly coupled processor that handles the need of THAT

    My existing "module" handles a single USB camera (with a fairly heavy-weight >> processor).

    But, being USB-based, there is no way to look at *part* of an image.
    And, I have to pay a relatively high cost (capturing the entire
    image from the serial stream) to look at *any* part of it.

    Yep, having chosen USB as your interface, you have limited yourself.

    Doesn't matter. Any serial interface poses the same problem;
    I can't examine the image until I can *look* at it.

    Since you say you have a fairly heavy-weight processor, that frame grab likely
    isn't you limiting factor.

    It becomes an issue when the number of cameras increases
    significantly on a single host. I have one scene that requires
    11 cameras to capture, completely.
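
    To put rough numbers on that (all assumptions -- say 640x480 YUYV at
    30 fps per stream):

        /* Rough per-camera and aggregate bandwidth -- every figure is
           an illustrative assumption, not a measurement. */
        #include <stdio.h>

        int main(void)
        {
            const long w = 640, h = 480;   /* assumed resolution        */
            const long bpp = 2;            /* YUYV: 2 bytes per pixel   */
            const long fps = 30;           /* assumed frame rate        */
            const long cams = 11;          /* the scene mentioned above */

            long per_cam = w * h * bpp * fps;          /* bytes/second  */
            printf("per camera: ~%ld MB/s\n", per_cam / 1000000);
            printf("11 cameras: ~%ld MB/s\n", per_cam * cams / 1000000);

            /* ~18 MB/s per stream, ~200 MB/s aggregate -- and those
               bytes cross the memory bus *again* when the host
               deserializes them into RAM for analysis. */
            return 0;
        }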

    *If* a "camera memory" was available, I would site N of these
    in the (64b) address space of the host and let the host pick
    and choose which parts of which images it wanted to examine...
    without worrying about all of the bandwidth that would have been
    consumed deserializing those N images into that memory (which is
    a continuous process)

    But such a camera would almost certainly be designed for the processor to be on
    the same board as the camera, (or be VERY slow in access), so much less apt allow you to add multiple cameras to one processor.

    Yes. But, if the module is small, then siting the assembly "someplace convenient" isn't a big issue. I.e., my modules are smaller than most webcams/dashcams.

    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have >>>> it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each with >>> there own processor, and they each process their own image.

    That's the above approach, but...

    The only problem is if your image processing algoritm need to compare parts >>> of the images between the two cameras, which seems unlikely.

    Consider watching a single room (e.g., a lobby at a business) and
    tracking the movements of "visitors".  It's unlikely that an individual's >> movements would always be constrained to a single camera field.  There will >> be times when he/she is "half-in" a field (and possibly NOT in the other,
    HALF in the other or ENTIRELY in the other).  You can't ignore cases where >> the entire object (or, your notion of what that object's characteristics
    might be) is not entirely in the field as that leaves a vulnerability.

    Sounds like you aren't overlapping your cameras enough or have insufficent coverage.  Maybe your problem is wrong field of view for your lens. Maybe you
    need fewer but better cameras with wider fields of view.

    Distance from camera to target means you have to play games with optics
    that can distort images.

    I also can't rely on "professional installers", *or* on the cameras remaining aimed in their original configurations.

    This might be due to try to use "stock" inexpensive USB cameras.

    For example, I watch our garage door with *four* cameras.  A camera is
    positioned on each side ("door jam"?) of the door "looking at" the other
    camera.  This because a camera can't likely see the full height of the door >> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" >> and I'll watch *its* side!).

    Right, and if ANY see a problem, you stop. So no need for inter-camera coordination.

    But you don't know there is a problem until you can identify *where*
    the obstruction exists and if that poses a problem for the vehicle
    or the "obstructing item". Doing so requires knowing what the
    object likely is.

    E.g., SWMBO frequently stands in the doorway as I pull the car in or
    out (not enough room between vehicles *in* the garage to allow for
    ease of entry/egress). I'd not want this to be flagged as a
    problem (signalling an alert in the vehicle).

    Likewise, an obstruction on one vehicle-side of the garage shouldn't
    interfere with access to the other side.

    [The other two cameras are similarly positioned on the overhead *track*
    onto which the door rolls, when open]

    An object in (or near) the doorway can be visible in one (either) or
    both cameras, depending on where it is located.  Additionally, one of
    those manifestations may be only "partial" as regards to where it is
    located and intersects the cameras' fields of view.

    But since you aren't trying to ID, only Detect, there still isn't a need for camera-camera processing, just camera-door controller

    The cameras need to coordinate to resolve the location of the object.
    A "toy wagon" would present differently, visually, than a tall person.

    When the scene gets too large to represent in enough detail in a single >>>> camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host.  If those cameras were just >>>> "chunks of memory", then the *imagery* would be easy to examine in a single
    host -- though the processing power *might* need to increase geometrically >>>> (depending on your current goal)

    Yes, but your "chunks of memory" model just doesn't exist as a viable camera
    model.

    Apparently not -- in the COTS sense.  But, that doesn't mean I can't
    build a "camera memory emulator".

    The downside is that this increases the cost of the "actual camera"
    (see my above comment wrt ammortization).

    Yep, implementing this likely costs more than giving the camera a dedicated moderate processor to do the major work. Might not handle the actual ID problem
    of your Door bell, but could likely process the live video, take a snapshot of
    a region with a good view of the vistor coming, and send just that to your master system for ID.

    But, then I could just use one of my existing "modules". If the
    target fits entirely within its field of view, then it has everything
    that it needs for the assigned functionality. If not, then it
    needs to consult with other cameras.

    The CMOS cameras with addressable pixels have "access times" significantly >>> lower than your typical memory (and is read once) so doesn't really meet >>> that model. Some of them do allow for sending multiple small regions of
    intererst and down loading just those regions, but this then starts to
    require moderate processor overhead to be loading all these regions and
    updating the grabber to put them where you want.

    You would, instead, let the "camera memory emulator" capture the entire
    image from the camera and place the entire image in a contiguous
    region of memory (from the perspective of the host).  The cost of capturing >> the portions that are not used is hidden *in* the cost of the "emulator".

    Yep, you could build you system with a two-port memory buffer between the frane
    grabber loading with one port, and the decoding processor on the other.

    Yes. But large *true* dual-port memories are costly. Instead, you would emulate such a device either by time-division multiplexing a single
    physical memory *or* sharing alternate memories (fill one, view the other).
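
    i.e., the usual ping-pong arrangement -- roughly (sketch only; sizes
    assumed, and a real implementation would also need to fence/notify the
    reader at the swap):

        /* Ping-pong emulation of a dual-port frame store: the grabber
           fills one buffer while the host reads the other; the two are
           exchanged at frame boundaries. */
        #include <stdint.h>

        #define FRAME_BYTES (640u * 480u * 2u)   /* assumed frame size  */

        static uint8_t buf_a[FRAME_BYTES];
        static uint8_t buf_b[FRAME_BYTES];

        static uint8_t *fill_buf = buf_a;   /* written by the grabber   */
        static uint8_t *view_buf = buf_b;   /* read by the host         */

        /* Called once per completed frame, e.g. from the grabber's
           end-of-frame interrupt. */
        void swap_buffers(void)
        {
            uint8_t *t = fill_buf;
            fill_buf = view_buf;
            view_buf = t;
        }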

    The most cost effective way to do this is likely a commercial frame-grabber with built "two-port" memory, that sits in a slot of a PC type computer. These
    would likely not work with a "USB Camera" (why would you need a frame grabber with a camera that has it built in) so would be totally changing your cost models.

    Yes, I have a few of these intended for medical imaging apps.
    Way too big; way too expensive. Designed for the wrong type of "host".

    IF your current design method is based on using USB cameras, trying to do a full custom interface may be out of your field of operation.

    And yes, it does mean that there might be some cases where you need a core >>> module that has TWO cameras connected to a single processor, either to get a
    wider field of view, or to combine two different types of camera (maybe a >>> high res black and white to a low res color if you need just minor color >>> information, or combine a visible camera to a thermal camera). These just >>> become another tool in your tool box.

    I *think* (uncharted territory) that the better investment is to develop
    algorithms that let me distribute the processing among multiple
    (single) "camera modules/nodes".  How would your "two camera" exemplar
    address an application requiring *three* cameras?  etc.

    The first question comes, what processing are you thinking of that needs images
    from 3 cameras.

    Note, my two camera example was a case where the processing needed to be done did need data from two cameras.

    If you have another task that needs a different camera, you just build a system
    with one two camera model and one 1 camera module, relaying back to a central control, or you nominate one of the modules to be central control if the load there is light enough.

    Your garage doer example would be built from 4 seperate and independent 1 camera modules, either going to one as the master, or to a 5th module acting as
    the master.

    Yes, but they have to share image data (either raw or abstracted)
    to make deductions about the targets present.

    The cases I can think of for needing to process three cameras together would be:

    1) a system stiching images from 3 cameras and generating a single image out of
    it, but that totally breaks your concept of needing only bits of the images, that inherently is using most of each camera, and doing some stiching processing on the overlaps.

    2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from the three cameras and producing a merged "false-color" image from them. Again,
    this also breaks you partial image model.

    Or, tracking multiple actors in an "arena" -- visitors in a business,
    occupants in a home, etc. In much the same way that the two garage
    door cameras conspire to locate the obstruction's position along the
    line from left doorjamb to right, pairs of cameras can resolve
    a target in an arena and *sets* of cameras (freely paired, as needed)
    can track all locations (and targets) in the arena.
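
    The pairwise step is just plane geometry: two nodes at known positions
    each report a bearing to the target, and the intersection of the two
    rays fixes it.  A minimal sketch (coordinate conventions assumed; no
    handling of measurement error beyond near-parallel rays):

        /* Locate a target from two bearings -- plane geometry only.
           Camera positions and bearings (radians from +x) are assumed
           known from the installation.  Sketch only. */
        #include <math.h>

        struct pt { double x, y; };

        /* Intersect the rays  c1 + t1*(cos a1, sin a1)  and
           c2 + t2*(cos a2, sin a2).  Returns 0 on success. */
        int triangulate(struct pt c1, double a1,
                        struct pt c2, double a2, struct pt *out)
        {
            double d1x = cos(a1), d1y = sin(a1);
            double d2x = cos(a2), d2y = sin(a2);
            double det = d1x * d2y - d1y * d2x;

            if (fabs(det) < 1e-9)            /* rays (nearly) parallel */
                return -1;

            double t1 = ((c2.x - c1.x) * d2y - (c2.y - c1.y) * d2x) / det;

            out->x = c1.x + t1 * d1x;
            out->y = c1.y + t1 * d1y;
            return 0;
        }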

    I can, currently, distribute this processing by treating the
    region of memory into which a (local) camera's imagery is
    deserialized as a "memory object" and then exporting *access*
    to that object to other similar "camera modules/nodes".

    But, the access times of non-local memory are horrendous, given
    that the contents are ephemeral (if accesses could be *cached*
    on each host needing them, then these costs diminish).

    So, I need to come up with algorithms that let me export abstractions
    instead of raw data.

    Sounds like you current design is very centralized. This limits its scalability,

    The current design is completely distributed. The only "shared component"
    is the network switch through which they converse and the RDBMS that acts
    as the persistent store.

    If a site realizes that it needs additional coverage to track <whatever>
    it just adds another camera module and lets the RDBMS know about its general location/functionality (i.e., how it can relate to any other cameras
    covering the same arena).

    My first feeling is you seem to be assuming a fairly cheep camera and then
    doing some fairly simple processing over the partial image, in which case >>>>> you might even be able to live with a camera that uses a crude SPI
    interface to bring the frame in, and a very simple processor.

    I use A LOT of cameras.  But, I should be able to swap the camera
    (upgrade/downgrade) and still rely on the same *local* compute engine. >>>> E.g., some of my cameras have Ir illuminators; it's not important
    in others; some are PTZ; others fixed.

    Doesn't sound reasonable. If you downgrade a camera, you can't count on it >>> being able to meet the same requirements, or you over speced the initial >>> camera.

    Sorry, I was using up/down relative to "nominal camera", not "specific camera
    previously selected for application".  I'd 8really* like to just have a
    single "camera module" (module = CPU+I/O) instead of one for camera type A >> and another for camera type B, etc.

    That only works if you are willing to spend for the sports car, even if you just need it to go around the block.

    If the "extra" bits of the sports car can be used by other elements,
    then those costs aren't directly borne by the camera module, itself.
    E.g., when the garage door is closed, there's no reason the modules
    in the garage can't be busy training speech models or removing
    commercials from recorded broadcast content.

    If, OTOH, you detect objects with a photo-interrupter across the door's
    path, there's scant little it can do when not needed.

    It depends a bit on how much span you need of capability. A $10 camera is likely having a very different interface to a $30,000 camera, so will need a different board. Some boards might handle multiple camera interface types if it
    doesn't add a lot to the board, but you are apt to find that you need to make some choice.

    I don't ever see a need for a $30,000 camera. There may be a need for a
    PTZ model. Or, a low lux model. Or, one with longer focal length. Or,
    shorter (I'd considered putting one *in* the mailbox to examine its
    contents instead of just detecting that it had been "visited").

    Instead of a 4K device, I'd opt for multiple simpler devices better
    positioned.

    But, not radically different in terms of cost, size, etc.

    If you walk into a bank lobby, you don't see *one* super-high resolution,
    wide field camera surveilling the lobby but, rather half a dozen or more watching specific portions of the lobby. Similarly, if you use the
    self-check at the store, there is a camera per checkout station instead
    of one "really good" camera located centrally trying to take it all in.

    This gives installers more leeway in terms of how they cover an arena.

    Then some tasks will just need a lot more computer power than others. Yes, you
    can just put too much computer power on the simple tasks, (and that might make
    sense to early design the higher end processor), but ultimately you are going to want the less expensive lower end processors.

    I can call on surplus processing power from other nodes in the system
    in much the same way that they can call on surplus capabilities from
    a camera module that isn't "seeing" anything interesting, at the moment.

    There will always be limits on what can be done; I'm not going
    to be able to VISUALLY verify that you have the right wrench in
    your hand as you set about working on the car. Or, that you
    are holding an eating utensil instead of a random piece of
    plastic as you traverse the kitchen.

    But, I'll know YOU are in the kitchen and likely the person whose
    voice I hear (to further reinforce the speaker identification
    algorithms).

    You put on a camera a processor capable of handling the tasks you expect out
    of that set of hardware.  One type of processor likely can handle a variaty
    of different camera setup with

    Exactly.  If a particular instance has an Ir illuminator, then you include >> controls for that in *the* "camera module".  If another instance doesn't have
    this ability, then those controls go unused.

    Yes, Auxilary functionality is often cheap to include the hooks for.

    But, it often requires looking at your TOTAL needs instead of designing
    for specific (initial) needs. E.g., my camera modules now include
    audio capabilities as there are instances where I want an audio
    pickup in the same arena that I am monitoring. Silly to have to add
    an "audio module" just because I didn't have the foresight to
    include it with the camera!

    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the front >>>> door.  Or, identify the locations of the occupants of a facility.

    Yes, so you don't want to "Pay" for the capability to recognize a visitor in
    your garage door sensor, so you use different levels of sensor/processor. >>
    Exactly.  But, the algorithms that do the scene analysis can be the same; >> you just parameterize the image and the objects within it that you seek.

    Actually, "Tracking" can be a very different type of algorithm then "Detecting". You might be able to use a Tracking base algorithm to Detect, but
    likely a much simpler algorithm can be used (needing less resources) to just detect.

    My current detection algorithm (e.g., garage) just looks for deltas between "clear" and "obstructed" imagery, conditioned by masks. There is some
    image processing required as things look different at night vs. day, etc.
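
    The core of it is nothing exotic -- something along these lines
    (thresholds and image layout are illustrative assumptions, not my actual
    code):

        /* Masked frame-difference detector. */
        #include <stdint.h>
        #include <stdlib.h>

        /* Count "interesting" pixels: those that differ from the stored
           reference ("clear") image by more than 'thresh', but only
           where the mask allows. */
        long count_deltas(const uint8_t *frame, const uint8_t *reference,
                          const uint8_t *mask, long npix, int thresh)
        {
            long hits = 0;

            for (long i = 0; i < npix; i++) {
                if (!mask[i])
                    continue;               /* outside region of interest */
                if (abs((int)frame[i] - (int)reference[i]) > thresh)
                    hits++;
            }
            return hits;
        }

        /* A caller might flag an obstruction when 'hits' exceeds some
           small fraction of the masked area, with 'thresh' raised at
           night (IR illumination) and relaxed in daylight. */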

    I don't have to "get it right". All I have to do is demonstrate "proof of concept". And, be able to indicate why a particular approach is superior
    to others/existing ones.

    E.g., if you drive a "pickup-on-steroids", you'd need to locate a photointerrupter "obstruction detector" pretty high up off the ground
    to catch the case where the truck bed was in the way of the door.
    Or, some lumber overhanging the end of the bed that you forgot you'd
    brought home! And, you'd likely need *another* detector down low
    to catch toddlers or toy wagons in the path of the door.

    OTOH, doing the detection with a camera catches these use conditions
    in addition to the "nominal" one for which the photointerrupter was
    designed.

    Tracking two/four occupants of a home *suggests* that you can track
    6 or 8. Or, dozens of employees in a business conference room, etc.

    I have no desire to spend my time perfecting any of these
    technologies (I have other goals); just lay the groundwork and the
    framework to make them possible.

    There will likely be some combinations that exceed the capabilities of
    the hardware to process in real-time.  So, you fall back to lower
    frame rates or let the algorithms drop targets ("You watch Bob, I'll
    watch Tom!")

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Sat Dec 31 13:15:47 2022
    On 12/30/2022 5:32, Don Y wrote:
    On 12/29/2022 5:40 PM, Richard Damon wrote:
    On 12/29/22 5:57 PM, Don Y wrote:
    On 12/29/2022 2:09 PM, Richard Damon wrote:
    On 12/29/22 2:26 PM, Don Y wrote:
    On 12/29/2022 10:06 AM, Richard Damon wrote:
    On 12/29/22 8:16 AM, Don Y wrote:
    ISTR playing with de-encapsulated DRAMs as image sensors
    back in school (DRAM being relatively new technology, then).

    But, most cameras seem to have (bit- or word-) serial interfaces nowadays.  Are there any (mainstream/high volume) devices that
    "look" like a chunk of memory, in their native form?

    Using a DRAM in that manner would only give you a single bit value for each pixel (maybe some more modern memories store multiple
    bits in a cell so you get a few grey levels).

    I mentioned the DRAM reference only as an exemplar of how a "true"
    parallel, random access interface could exist.

    Right, and cameras based on parallel random access do exist, but
    tend to be on the smaller and slower end of the spectrum.


    There are some CMOS sensors that let you address pixels
    individually and in a random order (like you got with the DRAM)
    but by its nature, such a readout method tends to be slow, and
    space inefficient, so these interfaces tend to be only available
    on smaller camera arrays.

    But, if you are processing the image, such an approach can lead to
    higher throughput than having to transfer a serial data stream into
    memory (thus consuming memory bandwidth).

    My guess is that in almost all cases, the need to send the address
    to the camera and then get back the pixel value is going to use up
    more total bandwidth than getting the image in a stream. The one
    exception would be if you need just a very small percentage of the
    array data, and it is scattered over the array so a Region of
    Interest operation can't be used.

    No, you're missing the nature of the DRAM example.

    You don't "send" the address of the memory cell desired *to* the DRAM.
    You simply *address* the memory cell, directly.  I.e., if there are
    N locations in the DRAM, then N addresses in your address space are
    consumed by it; one for each location in the array.

    No, look at your DRAM timing again: the transaction begins with the
    sending of the address over typically two clock edges with RAS and
    CAS, and then, a couple of clock cycles later, you get the answer back on the
    data bus.

    But it's a single memory reference.  Look at what happens when you deserialize a USB video stream into that same DRAM.  The DMAC has
    tied up the bus for the same amount of time that the processor
    would have if it read those same N locations.

    Yes, the addresses come from an address bus, using address space out
    of the processor, but it is a multi-cycle operation. Typically, you
    read back a "burst" with some minimal caching on the processor side,
    but that is more a minor detail.

    I'm looking for *that* sort of "direct access" in a camera.

    Its been awhile, but I thought some CMOS cameras could work on a
    similar basis, strobe a Row/Column address from pins on the camera,
    and a few clock cycles later you got a burst out of the camera
    starting at the address cell.

    I don't want the camera to decide which pixels *it* thinks I want to see.
    It sends me a burst of a row -- but the next part of the image I may have wanted to access may have been down the same *column*.  Or, in another
    part of the image entirely.

    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display).  Scene analysis doesn't necessarily
    conform to that same pattern.

    Isn't there a camera doing a protocol which allows you to request
    a specific area only to be transferred? RFB like, VNC does that
    all the time.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Sat Dec 31 11:16:44 2022
    On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display).  Scene analysis doesn't necessarily
    conform to that same pattern.

    Isn't there a camera doing a protocol which allows you to request
    a specific area only to be transferred? RFB like, VNC does that
    all the time.

    That only makes sense if you know, a priori, which part(s) of the
    image you might want to examine. E.g., it would work for
    "exposing" just the portion of the field that "overlaps" some
    other image. I can get fixed parts of partial frames from
    *other* cameras just by ensuring the other camera puts that
    portion of the image in a particular memory object and then
    export that memory object to the node that wants it.

    But, if a target can move into or out of the exposed area, then
    you have to make a return trip to the camera to request MORE of
    the field.

    When your targets are "far away" (like a surveillance camera
    monitoring a parking lot), targets don't move from their
    previous noted positions considerably from one frame to the
    next.

    But, when the camera and targets are in close proximity,
    there's greater (apparent) relative motion in the same
    frame-interval. So, knowing where (x,y + WxH) the portion of
    the image of interest lay, previously, is less predictive
    of where it may lie currently.

    Having the entire image available means the software
    can look <wherever> and <whenever>.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Sat Dec 31 22:13:16 2022
    On 12/31/2022 20:16, Don Y wrote:
    On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display).  Scene analysis doesn't necessarily
    conform to that same pattern.

    Isn't there a camera doing a protocol which allows you to request
    a specific area only to be transferred? RFB like, VNC does that
    all the time.

    That only makes sense if you know, a priori, which part(s) of the
    image you might want to examine.  E.g., it would work for
    "exposing" just the portion of the field that "overlaps" some
    other image.  I can get fixed parts of partial frames from
    *other* cameras just by ensuring the other camera puts that
    portion of the image in a particular memory object and then
    export that memory object to the node that wants it.

    But, if a target can move into or out of the exposed area, then
    you have to make a return trip to the camera to request MORE of
    the field.

    When your targets are "far away" (like a surveillance camera
    monitoring a parking lot), targets don't move from their
    previous noted positions considerably from one frame to the
    next.

    But, when the camera and targets are in close proximity,
    there's greater (apparent) relative motion in the same
    frame-interval.  So, knowing where (x,y+WxH)) the portion of
    the image of interest lay, previously, is less predictive
    of where it may lie currently.

    Having the entire image available means the software
    can look <wherever> and <whenever>.


    Well yes, obviously so, but this is valid whatever the interface.
    Direct access to the sensor cells can't be double buffered so
    you will have to transfer anyway to get the frame you are analyzing
    static.
    Perhaps you could find a way to make yourself some camera module
    using an existing one, MIPI or even USB, since you are looking for low
    overall cost; and add some MCU board to it to do the buffering and
    transfer areas on request. Or may be put enough CPU power together with
    each camera to do most if not all of the analysis... Depending on
    which achieves the lowest cost. But I can't say much on cost, that's
    pretty far from me (as you know).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to as I on Sat Dec 31 14:29:28 2022
    On 12/31/2022 1:13 PM, Dimiter_Popoff wrote:
    On 12/31/2022 20:16, Don Y wrote:
    On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display).  Scene analysis doesn't necessarily
    conform to that same pattern.

    Isn't there a camera doing a protocol which allows you to request
    a specific area only to be transferred? RFB like, VNC does that
    all the time.

    That only makes sense if you know, a priori, which part(s) of the
    image you might want to examine.  E.g., it would work for
    "exposing" just the portion of the field that "overlaps" some
    other image.  I can get fixed parts of partial frames from
    *other* cameras just by ensuring the other camera puts that
    portion of the image in a particular memory object and then
    export that memory object to the node that wants it.

    But, if a target can move into or out of the exposed area, then
    you have to make a return trip to the camera to request MORE of
    the field.

    When your targets are "far away" (like a surveillance camera
    monitoring a parking lot), targets don't move from their
    previous noted positions considerably from one frame to the
    next.

    But, when the camera and targets are in close proximity,
    there's greater (apparent) relative motion in the same
    frame-interval.  So, knowing where (x,y+WxH)) the portion of
    the image of interest lay, previously, is less predictive
    of where it may lie currently.

    Having the entire image available means the software
    can look <wherever> and <whenever>.

    Well yes, obviously so, but this is valid whatever the interface.
    Direct access to the sensor cells can't be double buffered so
    you will have to transfer anyway to get the frame you are analyzing
    static.

    I would assume the devices would have evolved an "internal buffer"
    (as I said, my experience with *DRAM* in this manner was 40+ years
    ago)

    Perhaps you could find a way to make yourself some camera module
    using an existing one, MIPI or even USB, since you are looking for low overall cost; and add some MCU board to it to do the buffering and
    transfer areas on request. Or may be put enough CPU power together with
    each camera to do most if not all of the analysis... Depending on
    which achieves the lowest cost. But I can't say much on cost, that's
    pretty far from me (as you know).

    My current approach gives me that -- MIPS, size, etc. But, the cost
    of transferring parts of the image (without adding a specific mechanism)
    is a "shared page" (DSM). So, host (on node A) references part of
    node *B*s frame buffer and the page (on B) containing that memory
    address gets shipped back to node A and mapped into A's memory.

    An agency on A could "touch" a "pixel-per-block" and cause the entire
    frame to be transferred to A, from B (or, I can treat the entire frame
    as a coherent object and arrange for ALL of it to be transferred when
    ANY of it is referenced). Some process on B could alternate between
    multiple such "memory objects" ("this one is complete, but I'm busy
    filling this OTHER one with data from the camera interface")
    to give me a *virtual* "camera memory device".
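
    Roughly, in C -- just a sketch; the buffer sizes and the camera/DSM
    calls are invented stand-ins for whatever the driver and the
    memory-object layer actually provide:

        #include <stdint.h>
        #include <stddef.h>
        #include <stdbool.h>

        #define FRAME_W   640
        #define FRAME_H   480

        /* Two "memory objects" on node B: one is exported (stable,
           readable by node A), the other is being filled from the
           camera interface.  The roles swap every frame. */
        static uint8_t frame[2][FRAME_H][FRAME_W];

        extern void camera_fill(uint8_t (*dst)[FRAME_W]);   /* blocks for one frame */
        extern void export_object(const void *base, size_t len);

        void capture_loop(volatile bool *run)
        {
            int filling = 0;                     /* buffer currently being filled */

            while (*run) {
                camera_fill(frame[filling]);             /* deserialize one frame  */
                export_object(frame[filling],            /* publish it to node A   */
                              sizeof frame[filling]);
                filling ^= 1;                            /* ...and fill the other  */
            }
        }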

    But, transport delays make this unsuitable for real-time work;
    a megabyte of imagery would require 100ms to transfer, in "raw"
    form. (I could encode it on the originating host; transfer it
    and then decode it on the receiving host -- at the expense of MIPS.
    This is how I "record" video without saturating the network)

    So, you (B) want to "abstract" the salient features of the image
    while it is on B and then transfer just those to A. *Use*
    them, on A, and then move on to the next set of features
    (that B has computed while A was busy chewing on the last set)

    Or, give A direct access to the native data (without A having
    to capture video streams from each of the cameras that it wants
    to potentially examine)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to blockedofcourse@foo.invalid on Sat Dec 31 17:40:08 2022
    [Hope you are faring well... enjoying the COLD! ;) ]

    Not very. Don't think I have your latest email.


    On Fri, 30 Dec 2022 14:59:39 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

    On 12/30/2022 11:02 AM, Richard Damon wrote:
    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting from the concept of using "USB Cameras" sort of starts you with that sort of limit.

    My personal thought on your problem is you want to put a "cheap" processor
    right on each camera using a processor with a direct camera interface to
    pull in the image and do your processing and send the results over some
    comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would address the hardware.  But, it doesn't scale well.  I.e., at what point do you throw in
    the towel and say there are too many concurrent images in the scene to
    pile them all onto a single "host" processor?

    That's why I didn't suggest that method. I was suggesting each camera has its own tightly coupled processor that handles the need of THAT

    My existing "module" handles a single USB camera (with a fairly heavy-weight >processor).

    But, being USB-based, there is no way to look at *part* of an image.
    And, I have to pay a relatively high cost (capturing the entire
    image from the serial stream) to look at *any* part of it.

    *If* a "camera memory" was available, I would site N of these
    in the (64b) address space of the host and let the host pick
    and choose which parts of which images it wanted to examine...
    without worrying about all of the bandwidth that would have been
    consumed deserializing those N images into that memory (which is
    a continuous process)

    That's the way all cameras work - at least low level. The camera
    captures a field (or a frame, depending) on its CCD, and then the CCD
    pixel data is read out serially by a controller.

    What you are looking for is some kind of local frame buffering at the
    camera. There are some "smart" cameras that provide that ... and also generally a bunch of image analysis functions that you may or may not
    find useful. I haven't played with any of them in a long time, and
    when I did the image functions were too primitive for my purpose, so I
    really can't recommend anything.


    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each with their own processor, and they each process their own image.

    That's the above approach, but...

    The only problem is if your image processing algorithm needs to compare parts of
    the images between the two cameras, which seems unlikely.

    Consider watching a single room (e.g., a lobby at a business) and
    tracking the movements of "visitors". It's unlikely that an individual's
    movements would always be constrained to a single camera field. There will
    be times when he/she is "half-in" a field (and possibly NOT in the other,
    HALF in the other or ENTIRELY in the other). You can't ignore cases where
    the entire object (or, your notion of what that object's characteristics
    might be) is not entirely in the field as that leaves a vulnerability.

    I've done simple cases following objects from one camera to another,
    but not dealing with different angles/points of view - the cameras had contiguous views with a bit of overlap. That made it relatively easy.

    Following a person, e.g., seen quarter-behind in one camera, and
    tracking them to another camera that sees a side view - from the
    /other/ side -

    Just following a person is easy, but tracking a specific person,
    particularly when multiple people are present, gets very complicated
    very quickly.


    For example, I watch our garage door with *four* cameras. A camera is
    positioned on each side (door jamb) of the door "looking at" the other
    camera. This is because a camera can't likely see the full height of the door
    opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
    and I'll watch *its* side!).

    [The other two cameras are similarly positioned on the overhead *track*
    onto which the door rolls, when open]

    An object in (or near) the doorway can be visible in one (either) or
    both cameras, depending on where it is located. Additionally, one of
    those manifestations may be only "partial" as regards to where it is
    located and intersects the cameras' fields of view.

    The "cost" of watching the door is only the cost of the actual *cameras*.
    The cost of the compute resources is amortized over the rest of the system
    as those can be used for other, non-camera, non-garage related activities.

    It does say that if trying to track something across the cameras, you need enough overlap to allow them to hand off the object when it is in the overlap.

    And, objects that consume large portions of a camera's field of view
    require similar handling (unless you can always guarantee that cameras
    and targets are "far apart")

    When the scene gets too large to represent in enough detail in a single
    camera's field of view, then there needs to be a way to coordinate
    multiple cameras to a single (virtual?) host.  If those cameras were just
    "chunks of memory", then the *imagery* would be easy to examine in a single
    host -- though the processing power *might* need to increase geometrically
    (depending on your current goal)

    Yes, but your "chunks of memory" model just doesn't exist as a viable camera >> model.

    Apparently not -- in the COTS sense. But, that doesn't mean I can't
    build a "camera memory emulator".

    The downside is that this increases the cost of the "actual camera"
    (see my above comment wrt amortization).

    And, it just moves the point at which a single host (of fixed capabilities) can no longer handle the scene's complexity. (when you have 10 cameras?)

    The CMOS cameras with addressable pixels have "access times" significantly longer than your typical memory (and are read-once), so they don't really meet that
    model. Some of them do allow for sending multiple small regions of intererst >> and down loading just those regions, but this then starts to require moderate
    processor overhead to be loading all these regions and updating the grabber to
    put them where you want.

    You would, instead, let the "camera memory emulator" capture the entire
    image from the camera and place the entire image in a contiguous
    region of memory (from the perspective of the host). The cost of capturing the portions that are not used is hidden *in* the cost of the "emulator".

    And yes, it does mean that there might be some cases where you need a core
    module that has TWO cameras connected to a single processor, either to get a
    wider field of view, or to combine two different types of camera (maybe a high
    res black and white to a low res color if you need just minor color
    information, or combine a visible camera to a thermal camera). These just
    become another tool in your tool box.

    I *think* (uncharted territory) that the better investment is to develop
    algorithms that let me distribute the processing among multiple
    (single) "camera modules/nodes". How would your "two camera" exemplar
    address an application requiring *three* cameras? etc.

    I can, currently, distribute this processing by treating the
    region of memory into which a (local) camera's imagery is
    deserialized as a "memory object" and then exporting *access*
    to that object to other similar "camera modules/nodes".

    But, the access times of non-local memory are horrendous, given
    that the contents are ephemeral (if accesses could be *cached*
    on each host needing them, then these costs diminish).

    So, I need to come up with algorithms that let me export abstractions
    instead of raw data.
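
    Something along these lines -- a sketch only, with invented field
    names; the mapping between camera and "room" coordinates is the
    hard part and is just a stub here:

        #include <stdint.h>

        /* What a camera node exports instead of pixels: a (possibly
           partial) sighting of a target, in its own image coordinates. */
        typedef struct {
            uint32_t camera_id;     /* which observer produced this       */
            uint32_t target_id;     /* tentative identity (0 = unknown)   */
            uint16_t x, y, w, h;    /* bounding box, image coordinates    */
            uint8_t  visible;       /* % of the target believed in view   */
            uint8_t  confidence;    /* 0..100                             */
        } sighting_t;

        /* Needs each camera's calibration; stubbed for the sketch. */
        extern void to_room_coords(const sighting_t *s, float xy[2]);

        /* Are two partial sightings plausibly the same target?  Same
           tentative identity, or close together in room coordinates. */
        int same_target(const sighting_t *a, const sighting_t *b)
        {
            float pa[2], pb[2];

            if (a->target_id != 0 && a->target_id == b->target_id)
                return 1;
            to_room_coords(a, pa);
            to_room_coords(b, pb);
            return (pa[0] - pb[0]) * (pa[0] - pb[0]) +
                   (pa[1] - pb[1]) * (pa[1] - pb[1]) < 0.25f;  /* ~0.5 m */
        }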

    Moving the processing to a "host per camera" implementation gives you more MIPS.  But, makes coordinating partial results tedious.

    Depends on what sort of partial results you are looking at.

    "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"

    "Ah! I was wondering whose legs those were in *my* image!"

    It is unclear what your actual image requirements per camera are, so it is hard to say what level of camera and processor you will need.

    My first feeling is you seem to be assuming a fairly cheap camera and then
    doing some fairly simple processing over the partial image, in which case
    you might even be able to live with a camera that uses a crude SPI interface
    to bring the frame in, and a very simple processor.

    I use A LOT of cameras.  But, I should be able to swap the camera
    (upgrade/downgrade) and still rely on the same *local* compute engine.
    E.g., some of my cameras have Ir illuminators; it's not important
    in others; some are PTZ; others fixed.

    Doesn't sound reasonable. If you downgrade a camera, you can't count on it being able to meet the same requirements, or you over-specced the initial camera.

    Sorry, I was using up/down relative to "nominal camera", not "specific camera
    previously selected for application". I'd *really* like to just have a
    single "camera module" (module = CPU+I/O) instead of one for camera type A
    and another for camera type B, etc.

    You put on a camera a processor capable of handling the tasks you expect out of
    that set of hardware.  One type of processor likely can handle a variety of different camera setups with

    Exactly. If a particular instance has an Ir illuminator, then you include
    controls for that in *the* "camera module". If another instance doesn't have
    this ability, then those controls go unused.

    Watching for an obstruction in the path of a garage door (open/close)
    has different requirements than trying to recognize a visitor at the front door.  Or, identify the locations of the occupants of a facility.

    Yes, so you don't want to "Pay" for the capability to recognize a visitor in your garage door sensor, so you use different levels of sensor/processor.

    Exactly. But, the algorithms that do the scene analysis can be the same;
    you just parameterize the image and the objects within it that you seek.

    There will likely be some combinations that exceed the capabilities of
    the hardware to process in real-time. So, you fall back to lower
    frame rates or let the algorithms drop targets ("You watch Bob, I'll
    watch Tom!")


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to George Neuner on Sat Dec 31 18:35:07 2022
    On 12/31/2022 3:40 PM, George Neuner wrote:

    [Hope you are faring well... enjoying the COLD! ;) ]

    Not very. Don't think I have your latest email.

    Hmmm... I wondered why I hadn't heard from you! (I trashed a bunch
    of email aliases trying to shake off spammers -- you know, the
    businesses that "need" your email address in order for you to
    place an order... and then feel like you will be DELIGHTED to
    receive an ongoing stream of solicitations! The problem with
    aliases is that you can't "undelete" them -- they get permanently
    excised from the mail domain's name space, for obvious reasons!)

    On Fri, 30 Dec 2022 14:59:39 -0700, Don Y
    <blockedofcourse@foo.invalid> wrote:

    On 12/30/2022 11:02 AM, Richard Damon wrote:
    So, my options are:
    - reduce the overall frame rate such that N cameras can
       be serviced by the USB (or whatever) interface *and*
       the processing load
    - reduce the resolution of the cameras (a special case of the above)
    - reduce the number of cameras "per processor" (again, above)
    - design a "camera memory" (frame grabber) that I can install
       multiply on a single host
    - develop distributed algorithms to allow more bandwidth to
       effectively be applied

    The fact that you are starting from the concept of using "USB Cameras" sort
    of starts you with that sort of limit.

    My personal thought on your problem is you want to put a "cheap" processor
    right on each camera using a processor with a direct camera interface to
    pull in the image and do your processing and send the results over some
    comm-link to the center core.

    If I went the frame-grabber approach, that would be how I would address the
    hardware.  But, it doesn't scale well.  I.e., at what point do you throw in
    the towel and say there are too many concurrent images in the scene to pile them all onto a single "host" processor?

    That's why I didn't suggest that method. I was suggesting each camera has its
    own tightly coupled processor that handles the need of THAT

    My existing "module" handles a single USB camera (with a fairly heavy-weight >> processor).

    But, being USB-based, there is no way to look at *part* of an image.
    And, I have to pay a relatively high cost (capturing the entire
    image from the serial stream) to look at *any* part of it.

    *If* a "camera memory" was available, I would site N of these
    in the (64b) address space of the host and let the host pick
    and choose which parts of which images it wanted to examine...
    without worrying about all of the bandwidth that would have been
    consumed deserializing those N images into that memory (which is
    a continuous process)

    That's the way all cameras work - at least low level. The camera
    captures a field (or a frame, depending) on its CCD, and then the CCD
    pixel data is read out serially by a controller.

    What you are looking for is some kind of local frame buffering at the

    Exactly. And, bring that buffer out to a set of pins for random
    access -- like a DRAM (memory). In that way, I could explore whatever
    parts of the image I deemed necessary -- without paying a price
    (bandwidth) to pull the image data into "my" memory.
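
    From the software's point of view it would be nothing more than
    this (a sketch -- the base address, width, and one-byte-per-pixel
    format are all made up):

        #include <stdint.h>

        #define CAM_BASE  0x80000000UL   /* where the "camera memory" is mapped */
        #define CAM_WIDTH 1280           /* pixels per row                      */

        /* One byte per pixel, row-major -- just like a chunk of RAM. */
        static volatile uint8_t *const cam = (volatile uint8_t *)CAM_BASE;

        /* Any pixel is a single memory reference; nothing has to be
           streamed in first, and untouched pixels cost nothing. */
        static inline uint8_t pixel(unsigned x, unsigned y)
        {
            return cam[(unsigned long)y * CAM_WIDTH + x];
        }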

    camera. There are some "smart" cameras that provide that ... and also generally a bunch of image analysis functions that you may or may not
    find useful. I haven't played with any of them in a long time, and
    when I did the image functions were too primitive for my purpose, so I
    really can't recommend anything.

    ISTM that the better solution is to develop algorithms that can
    process portions of the scene, concurrently, on different "hosts".
    Then, coordinate these "partial results" to form the desired result.

    I already have a "camera module" (host+USB camera) that has adequate
    processing power to handle a "single camera scene".  But, these all
    assume the scene can be easily defined to fit in that camera's field
    of view.  E.g., point a camera across the path of a garage door and have it "notice" any deviation from the "unobstructed" image.

    And if one camera can't fit the full scene, you use two cameras, each with their own processor, and they each process their own image.

    That's the above approach, but...

    The only problem is if your image processing algorithm needs to compare parts of
    the images between the two cameras, which seems unlikely.

    Consider watching a single room (e.g., a lobby at a business) and
    tracking the movements of "visitors". It's unlikely that an individual's
    movements would always be constrained to a single camera field. There will
    be times when he/she is "half-in" a field (and possibly NOT in the other,
    HALF in the other or ENTIRELY in the other). You can't ignore cases where
    the entire object (or, your notion of what that object's characteristics
    might be) is not entirely in the field as that leaves a vulnerability.

    I've done simple cases following objects from one camera to another,
    but not dealing with different angles/points of view - the cameras had contiguous views with a bit of overlap. That made it relatively easy.

    Yes. Each camera needs to grok the physical space in order to
    understand "references" provided by another observer into that
    space.

    For the garage door cameras, it's relatively simple: you're
    looking at a very narrow strip of 2-space (the plane of the
    door) from opposing ends. You *know* that the door opening
    has the same physical dimensions as seen on each door jamb
    by the opposing cameras, even if it "appears differently" to
    the two observers. And, you know that anything seen by a
    camera is located between that camera and its counterpart
    (though it may not be visible to the counterpart).

    What you don't know is how "thick" (along the vision axis)
    the object might be (e.g., a person vs. a vehicle). But, I
    don't see that knowledge adding much value to warrant further
    complicating the design.

    Following a person, e.g., seen quarter-behind in one camera, and
    tracking them to another camera that sees a side view - from the
    /other/ side -

    Just following a person is easy, but tracking a specific person,
    particularly when multiple people are present, gets very complicated
    very quickly.

    Yes. You need enough detail to be able to distinguish *easily*
    between candidates. You're not just "counting bodies".

    In the *home* environment, the actors are likely not malevolent;
    it's in their best interest for the system to know who/where they
    are. But, I don't think that's necessarily true in commercial and
    industrial environments. Even though it is similarly in THEIR best
    interests, I think the actors, there, are more likely to express
    hostility towards their overlords in that way.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Sun Jan 1 16:04:49 2023
    On 12/31/2022 23:29, Don Y wrote:
    On 12/31/2022 1:13 PM, Dimiter_Popoff wrote:
    On 12/31/2022 20:16, Don Y wrote:
    On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
    Serial protocols inherently deliver data in a predefined pattern
    (often intended for display).  Scene analysis doesn't necessarily
    conform to that same pattern.

    Isn't there a camera doing a protocol which allows you to request
    a specific area only to be transferred? RFB like, VNC does that
    all the time.

    That only makes sense if you know, a priori, which part(s) of the
    image you might want to examine.  E.g., it would work for
    "exposing" just the portion of the field that "overlaps" some
    other image.  I can get fixed parts of partial frames from
    *other* cameras just by ensuring the other camera puts that
    portion of the image in a particular memory object and then
    export that memory object to the node that wants it.

    But, if a target can move into or out of the exposed area, then
    you have to make a return trip to the camera to request MORE of
    the field.

    When your targets are "far away" (like a surveillance camera
    monitoring a parking lot), targets don't move from their
    previous noted positions considerably from one frame to the
    next.

    But, when the camera and targets are in close proximity,
    there's greater (apparent) relative motion in the same
    frame-interval.  So, knowing where (x,y+WxH)) the portion of
    the image of interest lay, previously, is less predictive
    of where it may lie currently.

    Having the entire image available means the software
    can look <wherever> and <whenever>.

    Well yes, obviously so, but this is valid whatever the interface.
    Direct access to the sensor cells can't be double buffered so
    you will have to transfer anyway to get the frame you are analyzing
    static.

    I would assume the devices would have evolved an "internal buffer"
    (as I said, my experience with *DRAM* in this manner was 40+ years
    ago)

    Perhaps you could find a way to make yourself some camera module
    using an existing one, MIPI or even USB, since you are looking for low
    overall cost; and add some MCU board to it to do the buffering and
    transfer areas on request. Or may be put enough CPU power together with
    each camera to do most if not all of the analysis... Depending on
    which achieves the lowest cost. But I can't say much on cost, that's
    pretty far from me (as you know).

    My current approach gives me that -- MIPS, size, etc.  But, the cost
    of transferring parts of the image (without adding a specific mechanism)
    is a "shared page" (DSM).  So, host (on node A) references part of
    node *B*s frame buffer and the page (on B) containing that memory
    address gets shipped back to node A and mapped into A's memory.

    I assume A and B are connected over Ethernet via tcp/ip? Or are they
    just two cores on the same chip or something?


    ....

    But, transport delays make this unsuitable for real-time work;
    a megabyte of imagery would require 100ms to transfer, in "raw"
    form.  (I could encode it on the originating host; transfer it
    and then decode it on the receiving host -- at the expense of MIPS.
    This is how I "record" video without saturating the network)

    100 ms latency would be an issue if you face say A-Train
    (for you and the rest who have not watched "The Boys" - he is a
    super (a "sup" as they have it) who can run fast enough to not
    be seen by normal humans...) :-).


    So, you (B) want to "abstract" the salient features of the image
    while it is on B and then transfer just those to A.  *Use*
    them, on A, and then move on to the next set of features
    (that B has computed while A was busy chewing on the last set)

    Or, give A direct access to the native data (without A having
    to capture video streams from each of the cameras that it wants
    to potentially examine)


    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,
    no noise etc. In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Sun Jan 1 23:53:05 2023
    On 1/1/2023 23:28, Don Y wrote:
    On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
    ....
    ....
    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,

    Yes, but if the receiving end has no interest in those areas
    of the image, then you're just wasting effort (bandwidth)
    transfering them -- esp if the areas of interest will need
    that bandwidth!

    But nothing is stopping the receiving end from requesting a particular area
    and the sending side from sending just the changed parts of it.
    I am not suggesting you use RFB, I use it just as an example.


    no noise etc.  In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    I think the solution is to share abstractions.  Design the
    algorithms so they can address partial "objects of interest"
    and report on those.  Then, coordinate those partial results
    to come up with a unified concept of what's happening in
    the observed scene.

    Well I think this is the way to go, too. This implies enough
    CPU horsepower per camera, which nowadays might be practical.

    But, this is a fair bit harder than just trying to look at
    a unified frame buffer and detect objects/motion!

    Well yes but you lose the framebuffer transfer problem, no
    need to do your "remote virtual machine" for that etc.


    OTOH, if it was easy, it would be boring ("What's to be learned
    from doing something that's already been done?")


    Not only that; if it were easy everyone else would be doing it :-).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Sun Jan 1 14:28:20 2023
    On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
    Perhaps you could find a way to make yourself some camera module
    using an existing one, MIPI or even USB, since you are looking for low
    overall cost; and add some MCU board to it to do the buffering and
    transfer areas on request. Or may be put enough CPU power together with
    each camera to do most if not all of the analysis... Depending on
    which achieves the lowest cost. But I can't say much on cost, that's
    pretty far from me (as you know).

    My current approach gives me that -- MIPS, size, etc.  But, the cost
    of transferring parts of the image (without adding a specific mechanism)
    is a "shared page" (DSM).  So, host (on node A) references part of
    node *B*s frame buffer and the page (on B) containing that memory
    address gets shipped back to node A and mapped into A's memory.

    I assume A and B are connected over Ethernet via tcp/ip? Or are they
    just two cores on the same chip or something?

    If A had direct access to the camera on B, then they'd be the same node,
    right? :>

    A maps a memory object that has been defined on B into A's
    memory space at a particular address (range). So, A *pretends*
    it has a local copy of the frame buffer.

    When A references ANY address in that range, a fault causes a
    request to be sent to B for a copy of the datum referenced.

    Of course, it would be silly to just return that one value;
    any subsequent references would also have to cause a page fault
    (because you couldn't allow A to reference an adjacent address
    for which data is not yet available, locally). So, B ships a copy
    of the entire page over to A and A instantiates that copy as a local
    page marked as "present" (so, subsequent references incur no faults).

    The application has fine-grained control over the *policy* that is
    used, here. So, if he knows that an access to address N will then
    be followed by N+3000r16, he can arrange for the page containing
    N *and* N+3000r16 to both be shipped over (so there isn't a
    fault triggered when the N+3000r16 reference occurs).

    [There are also provisions that allow multiple *writers*
    to shared regions so the memory behaves, functionally (but
    not temporally!) like LOCAL "shared memory"]
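
    In (pseudo) C the policy hook amounts to something like this --
    purely a sketch; the function names and the hint mechanism are
    invented for illustration:

        #include <stdint.h>
        #include <stddef.h>

        #define PAGE_SIZE  4096u
        #define PAGE_OF(a) ((a) & ~(uintptr_t)(PAGE_SIZE - 1))

        /* Application-supplied policy: given the faulting address,
           list any additional pages worth fetching in the same trip. */
        typedef size_t (*prefetch_hint_t)(uintptr_t fault_addr,
                                          uintptr_t extra[], size_t max);

        extern void fetch_page_from_owner(uintptr_t page);  /* ask node B    */
        extern void map_present(uintptr_t page);            /* stop faulting */

        void dsm_fault(uintptr_t fault_addr, prefetch_hint_t hint)
        {
            uintptr_t extra[8];
            size_t    n = hint ? hint(fault_addr, extra, 8) : 0;

            /* Always bring over the page actually touched...            */
            fetch_page_from_owner(PAGE_OF(fault_addr));
            map_present(PAGE_OF(fault_addr));

            /* ...plus whatever the policy says will be touched next
               (e.g., the page containing fault_addr + 0x3000).          */
            for (size_t i = 0; i < n; i++) {
                fetch_page_from_owner(PAGE_OF(extra[i]));
                map_present(PAGE_OF(extra[i]));
            }
        }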

    But, that's a shitload of overhead if you want to treat
    the remote frame buffer AS IF it was local.

    But, transport delays make this unsuitable for real-time work;
    a megabyte of imagery would require 100ms to transfer, in "raw"
    form.  (I could encode it on the originating host; transfer it
    and then decode it on the receiving host -- at the expense of MIPS.
    This is how I "record" video without saturating the network)

    100 ms latency would be an issue if you face say A-Train
    (for you and the rest who have not watched "The Boys" - he is a
    super (a "sup" as they have it) who can run fast enough to not
    be seen by normal humans...) :-).

    The bigger problem is throughput. You don't care if all of your
    references are skewed 100ms in time; add enough buffering to
    ensure every frame remains available for that full 100ms and
    just expect the results to be "late".

    The problem happens when there's another frame coming before
    you've finished processing the current frame. And so on.
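
    (For scale -- assuming a nominal 30 fps camera -- the per-frame
    budget is ~33 ms; if the raw transfer alone costs ~100 ms you are
    three frames behind before any processing starts, and buffering
    only postpones the collapse.)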

    So, while it is "slick" and eliminates a lot of explicit remote
    access code being exposed to the algorithm (e.g., "get me location
    X,Y of the remote frame buffer"), it's just not practical for the
    application.

    So, you (B) want to "abstract" the salient features of the image
    while it is on B and then transfer just those to A.  *Use*
    them, on A, and then move on to the next set of features
    (that B has computed while A was busy chewing on the last set)

    Or, give A direct access to the native data (without A having
    to capture video streams from each of the cameras that it wants
    to potentially examine)


    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,

    Yes, but if the receiving end has no interest in those areas
    of the image, then you're just wasting effort (bandwidth)
    transfering them -- esp if the areas of interest will need
    that bandwidth!

    no noise etc.  In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    I think the solution is to share abstractions. Design the
    algorithms so they can address partial "objects of interest"
    and report on those. Then, coordinate those partial results
    to come up with a unified concept of what's happening in
    the observed scene.

    But, this is a fair bit harder than just trying to look at
    a unified frame buffer and detect objects/motion!

    OTOH, if it was easy, it would be boring ("What's to be learned
    from doing something that's already been done?")

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to All on Sun Jan 1 20:59:12 2023
    On Sun, 1 Jan 2023 14:28:20 -0700, Don Y <blockedofcourse@foo.invalid>
    wrote:

    On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:

    The bigger problem is throughput. You don't care if all of your
    references are skewed 100ms in time; add enough buffering to
    ensure every frame remains available for that full 100ms and
    just expect the results to be "late".

    The problem happens when there's another frame coming before
    you've finished processing the current frame. And so on.

    So, while it is "slick" and eliminates a lot of explicit remote
    access code being exposed to the algorithm (e.g., "get me location
    X,Y of the remote frame buffer"), it's just not practical for the >application.

    All cameras have a free-run "demand" mode in which (between resets)
    the CCD is always accumulating - waiting to be read out. But many
    also have a mode in which they do nothing until commanded.

    In any event, without command the controller will just service the CCD
    - it won't transfer the image anywhere unless asked.

    Many "smart" cameras can do ersatz stream compression by double
    buffering internally and performing image subtraction to remove
    unchanging (to some threshold) images. In a motion activated
    environment this can greatly cut down on the number of images YOU have
    to process.

    Better ones also offer a suite of onboard image processing functions:
    motion detection, contrast expansion, thresholding, line finding ...
    now even some offer pattern object recognition. If the functions they
    provide are useful, it can pay to take advantage of them.

    I know you are (thinking of) designing your own ... you should maybe
    think hard about what smarts you want onboard.



    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,

    Yes, but if the receiving end has no interest in those areas
    of the image, then you're just wasting effort (bandwidth)
    transfering them -- esp if the areas of interest will need
    that bandwidth!

    That's true, but protocols like VNC's "copyrect" encoding essentially
    divide the image into a large checkerboard, and transfer only those
    "squares" where the underlying image has changed. What is considered a
    "change" could be further limited on the sending side by
    pre-processing: erosion and/or thresholding.
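
    The sending-side test is cheap.  A sketch of the idea, with the
    tile size and thresholds picked arbitrarily:

        #include <stdint.h>
        #include <stdlib.h>

        #define W        640
        #define H        480
        #define TILE     16     /* checkerboard square size             */
        #define PIX_THR  12     /* per-pixel delta that counts          */
        #define CNT_THR  8      /* changed pixels before a tile is sent */

        /* Mark which TILE x TILE squares differ enough from the
           previous frame to be worth sending; return how many.  The
           caller ships only the dirty squares. */
        int dirty_tiles(const uint8_t cur[H][W], const uint8_t prev[H][W],
                        uint8_t dirty[H / TILE][W / TILE])
        {
            int n = 0;

            for (int ty = 0; ty < H / TILE; ty++)
                for (int tx = 0; tx < W / TILE; tx++) {
                    int changed = 0;
                    for (int y = ty * TILE; y < (ty + 1) * TILE; y++)
                        for (int x = tx * TILE; x < (tx + 1) * TILE; x++)
                            if (abs((int)cur[y][x] - (int)prev[y][x]) > PIX_THR)
                                changed++;
                    dirty[ty][tx] = (changed > CNT_THR);
                    n += dirty[ty][tx];
                }
            return n;
        }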

    The biggest problem always is how much extra buffering you need for as-yet-unprocessed images in the stream - while you're working on one
    thing, you easily can lose something else.


    no noise etc.  In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    And RLE or copyrect can be combined further with lossless LZ
    compression.


    For really good results, wavelet compression is the best - it
    basically reduces the whole image to a set of equation coefficients,
    and you can preserve (or degrade) detail in the reconstructed image by
    altering how many coefficients are calculated from the original.

    But it is compute intensive: you really need a DSP or SIMD CPU to do
    it efficiently.
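
    One level of the simplest (Haar) transform shows where the
    coefficients come from -- a sketch only; a real codec recurses on
    the coarse quadrant and then quantizes/entropy-codes the result:

        #include <stdint.h>

        /* One level of a 2-D Haar transform on an 8-bit image (w and h
           even).  Top-left quadrant of 'out' is a half-size coarse
           image; the other three quadrants hold detail coefficients.
           Keeping fewer of the details degrades the reconstruction.  */
        void haar_level(const uint8_t *in, int16_t *out, int w, int h)
        {
            int hw = w / 2, hh = h / 2;

            for (int y = 0; y < hh; y++)
                for (int x = 0; x < hw; x++) {
                    int a = in[(2 * y)     * w + 2 * x];
                    int b = in[(2 * y)     * w + 2 * x + 1];
                    int c = in[(2 * y + 1) * w + 2 * x];
                    int d = in[(2 * y + 1) * w + 2 * x + 1];

                    out[y        * w + x     ] = (int16_t)((a + b + c + d) / 4); /* average  */
                    out[y        * w + x + hw] = (int16_t)((a - b + c - d) / 4); /* horiz.   */
                    out[(y + hh) * w + x     ] = (int16_t)((a + b - c - d) / 4); /* vert.    */
                    out[(y + hh) * w + x + hw] = (int16_t)((a - b - c + d) / 4); /* diagonal */
                }
        }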


    I think the solution is to share abstractions. Design the
    algorithms so they can address partial "objects of interest"
    and report on those. Then, coordinate those partial results
    to come up with a unified concept of what's happening in
    the observed scene.

    But, this is a fair bit harder than just trying to look at
    a unified frame buffer and detect objects/motion!

    OTOH, if it was easy, it would be boring ("What's to be learned
    from doing something that's already been done?")

    As I said previously, smart cameras can do things like motion
    detection onboard, and report the AOI along with the image.


    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Sun Jan 1 23:50:49 2023
    On 1/1/2023 2:53 PM, Dimiter_Popoff wrote:
    On 1/1/2023 23:28, Don Y wrote:
    On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:
    ....
    ....
    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,

    Yes, but if the receiving end has no interest in those areas
    of the image, then you're just wasting effort (bandwidth)
    transfering them -- esp if the areas of interest will need
    that bandwidth!

    But nothing is stopping the receiving end from requesting a particular area
    and the sending side from sending just the changed parts of it.
    I am not suggesting you use RFB, I use it just as an example.

    I'm trying to hide the fact that there are bits of code (and I/O's)
    operating on different processors. I.e., a single processor would
    <somehow> have all of these images accessible to it. I'd like to
    maintain that illusion by hiding any transfers/mapping "under the
    surface" so the main algorithm can concentrate on the problem at
    hand, and not the implementation platform.

    It's like having virtual memory instead of forcing the application
    to drag in "overlays" due to hardware constraints on the address
    space. Or, having to push data out to disk when the amount of
    local memory is exceeded.

    These are just nuisances that interfere with the design of the algorithm.

    But, I may be able to use the shared memory mechanism as a way to "hint"
    to the OS as to which parts of the image are of interest to the remote
    node. Then, arrange for the pager to only send the differences over
    the wire -- counting on the local pager to instantiate a duplicate
    copy of the previous image (which is likely still available on that
    host).

    I.e., bastardize CoW for the purpose.
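
    On the sending side that boils down to something like this -- a
    sketch; "prev" is the copy the remote pager is known to already
    hold:

        #include <stdint.h>
        #include <stddef.h>
        #include <string.h>

        #define PAGE_SIZE 4096u

        extern void send_page(size_t page_index, const uint8_t *data);

        /* Ship only the pages of the new frame that differ from the
           copy the remote node already has; it rebuilds the rest from
           its local (previous) copy. */
        size_t send_frame_delta(const uint8_t *cur, const uint8_t *prev,
                                size_t frame_bytes)
        {
            size_t sent = 0;

            for (size_t off = 0; off < frame_bytes; off += PAGE_SIZE) {
                size_t len = frame_bytes - off;
                if (len > PAGE_SIZE)
                    len = PAGE_SIZE;
                if (memcmp(cur + off, prev + off, len) != 0) {
                    send_page(off / PAGE_SIZE, cur + off);
                    sent++;
                }
            }
            return sent;     /* pages actually put on the wire */
        }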

    no noise etc.  In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    I think the solution is to share abstractions.  Design the
    algorithms so they can address partial "objects of interest"
    and report on those.  Then, coordinate those partial results
    to come up with a unified concept of what's happening in
    the observed scene.

    Well I think this is the way to go, too. This implies enough
    CPU horsepower per camera, which nowadays might be practical.

    I've got enough for a single camera. But, if I had to handle a
    multicamera *scene* (completely) with that processor, I'd be
    running out of MIPS.

    But, this is a fair bit harder than just trying to look at
    a unified frame buffer and detect objects/motion!

    Well yes but you lose the framebuffer transfer problem, no
    need to do your "remote virtual machine" for that etc.

    The question will be where the effort pays off quickest. E.g.,
    dropping the effective frame rate may make simpler solutions
    more practical.

    OTOH, if it was easy, it would be boring ("What's to be learned
    from doing something that's already been done?")

    Not only that; if it were easy everyone else would be doing it :-).

    I have no problem letting other people invent wheels that I
    can freely use. Much of my current architecture is pieced
    together from ideas gleaned over the past several decades
    (admittedly, on bigger iron than "MCUs"). It's only now
    that it is economically feasible for me to exploit some of these
    technologies.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to George Neuner on Mon Jan 2 00:27:56 2023
    On 1/1/2023 6:59 PM, George Neuner wrote:
    On Sun, 1 Jan 2023 14:28:20 -0700, Don Y <blockedofcourse@foo.invalid>
    wrote:

    On 1/1/2023 7:04 AM, Dimiter_Popoff wrote:

    The bigger problem is throughput. You don't care if all of your
    references are skewed 100ms in time; add enough buffering to
    ensure every frame remains available for that full 100ms and
    just expect the results to be "late".

    The problem happens when there's another frame coming before
    you've finished processing the current frame. And so on.

    So, while it is "slick" and eliminates a lot of explicit remote
    access code being exposed to the algorithm (e.g., "get me location
    X,Y of the remote frame buffer"), it's just not practical for the
    application.

    All cameras have a free-run "demand" mode in which (between resets)
    the CCD is always accumulating - waiting to be read out. But many
    also have a mode in which they do nothing until commanded.

    The implication in my comments was that you would want to target a
    certain frame rate as a performance metric. Whether that has to
    be the cameras nominal rate or something slower than that would
    likely depend on the scene being analyzed.

    In any event, without command the controller will just service the CCD
    - it won't transfer the image anywhere unless asked.

    Many "smart" cameras can do ersatz stream compression by double
    buffering internally and performing image subtraction to remove
    unchanging (to some threshold) images. In a motion activated
    environment this can greatly cut down on the number of images YOU have
    to process.

    Better ones also offer a suite of onboard image processing functions:
    motion detection, contrast expansion, thresholding, line finding ...
    now even some offer pattern object recognition. If the functions they provide are useful, it can pay to take advantage of them.

    I don't yet know what will be useful. So far, my algorithms have been two-dimensional versions of photo-interrupters. I don't care what I'm
    seeing, just that I'm seeing it in a certain place under certain
    conditions.

    Visually tracking targets will be considerably harder.

    Previously, I required the targets to wear a beacon that I could
    locate wirelessly. This works because it gives the user a means
    of interacting with the system audibly without having to clutter
    the space with utterances (and sort out what's intentional and
    what is extraneous).

    But, that only makes sense for folks using "personal audio".
    Anyone without such a device would be invisible to the system.

    Switching to vision will (?) let me allow anyone in the arena
    to interact/participate. And, can potentially let nonverbal
    users interact without having to wear a "transducer".

    I know you are (thinking of) designing your own ... you should maybe
    think hard about what smarts you want onboard.

    Thinking hard is easy. Knowing WHAT to think about is hard!

    In RFB, the server can - and should - decide which parts of the
    framebuffer have changed and send across only them. Which works
    fine for computer generated images - plenty of single colour areas,

    Yes, but if the receiving end has no interest in those areas
    of the image, then you're just wasting effort (bandwidth)
    transfering them -- esp if the areas of interest will need
    that bandwidth!

    That's true, but protocols like VNC's "copyrect" encoding essentially
    divide the image into a large checkerboard, and transfers only those "squares" where the underlying image has changed. What is considered a "change" could be further limited on the sending side by
    pre-processing: erosion and/or thresholding.

    I would assume you could adaptively size regions so you looked at the
    cost of sending the contents AND size information for multiple smaller
    regions vs. just the contents (size implied) for larger ones -- which
    may only contain a small amount of deltas.
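
    (Illustrative numbers only: with, say, an 8-byte rectangle header
    and one byte per pixel, three scattered 16x16 changed tiles cost
    3 x (8 + 256) = 792 bytes, while the single 64x64 rectangle that
    bounds them costs 8 + 4096 = 4104 bytes -- so many small regions
    win until their header overhead starts to rival the pixels they
    carry.)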

    The biggest problem always is how much extra buffering you need for as-yet-unprocessed images in the stream - while you're working on one
    thing, you easily can lose something else.

    Yes. The "easy" approach is to treat it as HRT and plan on processing
    every frame in a single frame time. Latency can be large-ish -- as long
    as throughput is guaranteed.

    no noise etc.  In your case you might have to resort to jpeg
    the image downgrading its quality so "small" changes would
    disappear, I think those who write video encoders do something
    like that (for my vnc server lossless RLE was plenty, but it
    is not very efficient when the screen is some real life photo,
    obviously).

    And RLE or copyrect can be combined further with lossless LZ
    compression.

    For really good results, wavelet compression is the best - it
    basically reduces the whole image to a set of equation coefficients,
    and you can preserve (or degrade) detail in the reconstructed image by altering how many coefficients are calculated from the original.

    But it is compute intensive: you really need a DSP or SIMD CPU to do
    it efficiently.

    Time spent compressing and decompressing equates with time on the
    wire, transferring UNcompressed data. There's a point at which it's
    probably smarter to just use more network bandwidth than waste
    MIPS trying to conserve it.

    My immediate concern is making a "wise" (not necessarily "optimal")
    HARDWARE implementation decision so I can have some boards cut.
    And, start planning the sort of capabilities/feature that I can develop
    with those facilities.

    I think the solution is to share abstractions. Design the
    algorithms so they can address partial "objects of interest"
    and report on those. Then, coordinate those partial results
    to come up with a unified concept of what's happening in
    the observed scene.

    But, this is a fair bit harder than just trying to look at
    a unified frame buffer and detect objects/motion!

    OTOH, if it was easy, it would be boring ("What's to be learned
    from doing something that's already been done?")

    As I said previously, smart cameras can do things like motion
    detection onboard, and report the AOI along with the image.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)