• Re: Extract the values belonging to a specific key.

    From Ed Morton@21:1/5 to hongy...@gmail.com on Fri Dec 10 09:03:49 2021
    On 12/10/2021 8:51 AM, hongy...@gmail.com wrote:
    I've a file which has the following content:

    "key1":[
    "val11",
    "val12",
    ...
    ],
    "key2":[
    "val21",
    "val22",
    ...
    ],
    "key3":[
    "val31",
    "val32",
    ...
    ],
    ...

    I want to extract the values belonging to a specific key. Any hints?

    Regards,
    HZ


    Determine what language that is, then google for a tool that understands that language, and use that tool to operate on the file.

    Ed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to hongy...@gmail.com on Fri Dec 10 16:13:56 2021
    On 10.12.2021 15:51, hongy...@gmail.com wrote:
    [snip]

    To get the whole key block (for example for "key2"), use

    awk '/"key2":/,/],/'

    which would produce

    "key2":[
    "val21",
    "val22",
    ...
    ],

    Adjustments may be necessary if you want something different.
    This may fail depending on the actual data.

    Awk is flexible enough to, e.g., omit the first and last line of the
    key block, or extract the valXY substrings from the data lines.

    Left as homework.
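    One way to do the homework, as a sketch that assumes the exact layout shown in the question (one quoted value per line, the closing "]" or "]," on its own line). The function name key_values is made up for this example:

```shell
# key_values KEY: read the question's layout on stdin and print the values
# of the "KEY":[ ... ] block, one per line, without the surrounding quotes.
# Assumes one value per line and a closing "]" or "]," on its own line.
key_values() {
  awk -v key="$1" '
    # Work on a copy of the line with leading blanks removed.
    { line = $0; sub(/^[ \t]+/, "", line) }
    # Opening line of the wanted block, e.g. "key2":[  (literal match, no regex)
    !inblock && index(line, "\"" key "\":") == 1 { inblock = 1; next }
    # Closing line of the block: "]" or "]," (this drops the last line)
    inblock && line ~ /^[]],?[ \t]*$/ { inblock = 0; next }
    # Data line: the value is the text between the first pair of double quotes
    inblock { if (split(line, f, "\"") >= 3) print f[2] }
  '
}

# Demo on data shaped like the question
printf '%s\n' '"key1":[' '"val11",' '"val12"' '],' \
              '"key2":[' '"val21",' '"val22"' '],' | key_values key2
```

    On the real file this would be, e.g., key_values dde < /mnt/live/packages_choice.json. Matching the key with index() instead of a regex means a key name is taken literally.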

    Janis


  • From hongyi.zhao@gmail.com@21:1/5 to All on Fri Dec 10 06:51:03 2021
    I've a file which has the following content:

    "key1":[
    "val11",
    "val12",
    ...
    ],
    "key2":[
    "val21",
    "val22",
    ...
    ],
    "key3":[
    "val31",
    "val32",
    ...
    ],
    ...

    I want to extract the values belonging to a specific key. Any hints?

    Regards,
    HZ

  • From Grant Taylor@21:1/5 to hongy...@gmail.com on Fri Dec 10 10:26:33 2021
    On 12/10/21 7:51 AM, hongy...@gmail.com wrote:
    I've a file which has the following content:

    Been there.
    Done that.
    Will do it again.

    I want to extract the values belonging to a specific key. Any hints?

    If I don't need to do anything other than extract the section, I'd
    probably use (GNU) sed.

    sed -n '/"key#":/,/\],/p'

    (From memory, untested.)

    You may need to tweak the pattern matches, especially if the closing
    "]," matches multiple parts of the structure. In that case, anchor it
    with something like "^ ],$".

    This is probably similar to Janis's awk recommendation. I find sed
    somewhat simpler than awk for this (at least in my head).
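    A sketch of the anchored version, assuming POSIX or GNU sed and the same one-item-per-line layout; key_block is a hypothetical name. Both ends of the range are anchored, so a value that happens to contain "]," no longer closes the block early:

```shell
# key_block KEY: print the whole "KEY":[ ... ] block from stdin with sed,
# anchoring both ends of the range so that a "]," appearing in the middle of
# some other line cannot end the range early.
# KEY is spliced into the regex, so it must not contain regex metacharacters
# or slashes (fine for names such as key2 or dde).
key_block() {
  sed -n '/^[[:space:]]*"'"$1"'":\[[[:space:]]*$/,/^[[:space:]]*],\{0,1\}[[:space:]]*$/p'
}

# Demo: prints the four lines of the key2 block
printf '%s\n' '"key1":[' '"val11"' '],' '"key2":[' '"val21",' '"val22"' '],' |
  key_block key2
```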

    There may also be tools for working with this format directly. It's reminiscent of JSON and YAML, both of which have dedicated tools to make
    this a lot ... something ... let's go with precise.



    --
    Grant. . . .
    unix || die

  • From Keith Thompson@21:1/5 to Janis Papanagnou on Fri Dec 10 11:10:05 2021
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
    [snip]

    Both sed and awk are line-oriented. The file looks very much like JSON.
    If that's what it is, then another file might be valid JSON but with a completely different line structure.
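    To make that concrete: the snippet below runs the awk range from earlier in the thread over the same key2 data, but serialized on one line, which is just as valid as JSON. This is an illustration, not output from the original file:

```shell
# The key2 data from the question, but serialized on one line (valid JSON
# once the braces are present). The line-based range pattern now matches
# its start and its end on the same record, so awk prints the whole line
# instead of just the key2 values.
printf '%s\n' '{"key1":["val11","val12"],"key2":["val21","val22"],"key3":["val31"]}' |
  awk '/"key2":/,/],/'
```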

    We don't have enough information to suggest a robust solution. I
    suspect the real answer is "find a tool that handles JSON", but we don't
    know for sure that it's JSON.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Tavis Ormandy@21:1/5 to hongy...@gmail.com on Fri Dec 10 22:43:24 2021
    On 2021-12-10, hongy...@gmail.com wrote:
    I've a file which has the following content:

    "key1":[
    "val11",
    "val12",
    ...
    ],

    I want to extract the values belonging to a specific key. Any hints?


    This looks like JSON; there's a tool called jq that makes querying
    it easy. Try something like `jq -r '.key2[]' < yourfile`.
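    A slightly more general form of the same idea, assuming jq 1.5 or later; jq_values is a hypothetical wrapper name. `--arg` makes the key a shell parameter instead of hard-coding it into the filter:

```shell
# jq_values KEY: print every element of the array stored under KEY in the
# JSON object read from stdin. --arg passes the key in as the jq variable $k,
# and "// []" turns a missing key into an empty array instead of an error.
jq_values() {
  jq -r --arg k "$1" '.[$k] // [] | .[]'
}

# Demo: prints val21 and val22
printf '%s\n' '{"key1":["val11","val12"],"key2":["val21","val22"]}' | jq_values key2
```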

    Tavis.


    --
    _o) $ lynx lock.cmpxchg8b.com
    /\\ _o) _o) $ finger taviso@sdf.org
    _\_V _( ) _( ) @taviso

  • From hongyi.zhao@gmail.com@21:1/5 to Tavis Ormandy on Fri Dec 10 15:52:18 2021
    On Saturday, December 11, 2021 at 6:43:30 AM UTC+8, Tavis Ormandy wrote:
    On 2021-12-10, hongy...@gmail.com wrote:
    I've a file which has the following content:

    "key1":[
    "val11",
    "val12",
    ...
    ],

    I want to extract the values belonging to a specific key. Any hints?

    This looks like JSON; there's a tool called jq that makes querying
    it easy. Try something like `jq -r '.key2[]' < yourfile`.

    In fact, I am processing the "live/packages_choice.json" file coming from the following image:
    http://cdimage.deepin.com/releases/20.3/deepin-desktop-community-20.3-amd64.iso

    The original file, although it has a .json extension, lacks the opening `{` at the very beginning, which makes jq fail. If I add the missing `{` as the first line of the file, jq works, e.g., for the `dde` key:

    $ jq -r '.dde[]' < packages_choice.json
    deepin-desktop-server
    deepin-default-settings
    dde-desktop
    dde-dock
    dde-launcher
    dde-control-center
    startdde
    dde-session-ui
    deepin-artwork
    dde-file-manager
    dde-qt5integration
    plymouth-theme-deepin-logo
    deepin-wallpapers
    fonts-noto
    dde-introduction
    dde-kwin
    deepin-screensaver
    dde
    dde-calendar
    network-manager-integration-plugins
    deepin-terminal

    Without the missing `{`, jq fails with the following error:

    $ jq -r '.dde[]' < /mnt/live/packages_choice.json
    jq: error (at <stdin>:2): Cannot index string with string "dde"
    parse error: Expected string key before ':' at line 2, column 10


    However, both sed and awk can handle this malformed JSON file:

    $ awk '/"dde":/,/\]/' /mnt/live/packages_choice.json | sed '1d;$d' | awk -F\" '{print $2}'
    deepin-desktop-server
    deepin-default-settings
    dde-desktop
    dde-dock
    dde-launcher
    dde-control-center
    startdde
    dde-session-ui
    deepin-artwork
    dde-file-manager
    dde-qt5integration
    plymouth-theme-deepin-logo
    deepin-wallpapers
    fonts-noto
    dde-introduction
    dde-kwin
    deepin-screensaver
    dde
    dde-calendar
    network-manager-integration-plugins
    deepin-terminal

    $ sed -n '/"dde":/,/\]/p' /mnt/live/packages_choice.json | sed '1d;$d' | awk -F\" '{print $2}'
    deepin-desktop-server
    deepin-default-settings
    dde-desktop
    dde-dock
    dde-launcher
    dde-control-center
    startdde
    dde-session-ui
    deepin-artwork
    dde-file-manager
    dde-qt5integration
    plymouth-theme-deepin-logo
    deepin-wallpapers
    fonts-noto
    dde-introduction
    dde-kwin
    deepin-screensaver
    dde
    dde-calendar
    network-manager-integration-plugins
    deepin-terminal

    Regards,
    HZ

  • From Ed Morton@21:1/5 to hongy...@gmail.com on Fri Dec 10 18:43:42 2021
    On 12/10/2021 5:52 PM, hongy...@gmail.com wrote:
    [snip]

    In fact, I am processing the "live/packages_choice.json" file coming from the following image:
    http://cdimage.deepin.com/releases/20.3/deepin-desktop-community-20.3-amd64.iso

    The original file, although it has a .json extension, lacks the opening `{` at the very beginning, which makes jq fail. If I add the missing `{` as the first line of the file, jq works, e.g., for the `dde` key:

    $ jq -r '.dde[]' < packages_choice.json

    Then why would you look for anything beyond:

    { echo '{'; cat packages_choice.json; } | jq -r '.dde[]'

    or similar? Anything you come up with using sed or awk to parse the JSON
    will be far less robust than using `jq` and it doesn't seem necessary if
    all you need to do is add a leading `{` to the input and then `jq` works.
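    If the same command has to run against both fixed and unfixed copies of the file, the `{` can be prepended only when it is missing. A minimal sketch; add_brace is a hypothetical helper and only inspects the first line:

```shell
# add_brace: copy stdin to stdout, emitting a "{" line first unless the
# first line already starts with "{" (leading blanks ignored). Running it on
# an already-fixed file therefore changes nothing.
add_brace() {
  awk 'NR == 1 && !/^[ \t]*[{]/ { print "{" } { print }'
}

# Demo: a "{" line is added in front of the unfixed input
printf '%s\n' '"dde":[' '"dde-dock"' ']' '}' | add_brace
```

    With the file from the thread this would be used as add_brace < packages_choice.json | jq -r '.dde[]'.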

    Ed.

  • From Keith Thompson@21:1/5 to hongy...@gmail.com on Fri Dec 10 16:37:13 2021
    "hongy...@gmail.com" <hongyi.zhao@gmail.com> writes:
    [snip]

    In fact, I am processing the "live/packages_choice.json" file coming from the following image:
    http://cdimage.deepin.com/releases/20.3/deepin-desktop-community-20.3-amd64.iso

    (Deepin is apparently a Chinese Linux distribution.)

    If you had mentioned from the start that the file's name ends in ".json", it
    would have saved a lot of time. Don't hide information like that.

    The original file, although it has a .json extension, lacks the opening
    `{` at the very beginning, which makes jq fail. If I add the missing `{`
    as the first line of the file, jq works, e.g., for the `dde` key:

    So it's not valid JSON, while the file name implies that it is.
    Perhaps consider reporting this problem to the site that provided it.
    (Maybe there's some valid reason for it, though.)

    $ jq -r '.dde[]' < packages_choice.json
    [...]

    Without the missing `{`, jq fails with the following error:

    $ jq -r '.dde[]' < /mnt/live/packages_choice.json
    jq: error (at <stdin>:2): Cannot index string with string "dde"
    parse error: Expected string key before ':' at line 2, column 10

    I'd say the best way to read information from the file is to create a
    copy with the required '{' and '}' added and then use jq. Or just
    something like:

    ( echo '{' ; cat packages_choice.json ; echo '}' ) | jq ...

    You've already made jq work.
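    A sketch of the "corrected copy" approach with a parse check before querying. HZ reports that only the leading `{` is missing, so this adds just that (append an `echo '}'` inside the braces if the closing brace is missing too). fix_copy and the file names in the usage line are hypothetical. `jq empty` parses its input and prints nothing, so its exit status tells you whether the copy is valid JSON:

```shell
# fix_copy SRC DEST: write DEST as SRC with a leading "{" line, then use
# "jq empty" (parse only, print nothing) to confirm DEST is valid JSON.
# Returns non-zero if the copy cannot be written or still does not parse.
fix_copy() {
  { echo '{'; cat "$1"; } > "$2" || return
  jq empty "$2"
}
```

    e.g. fix_copy /mnt/live/packages_choice.json packages_fixed.json && jq -r '.dde[]' packages_fixed.json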

    If your only requirement is to parse that one specific version of that
    one specific file, do whatever works. If you might need to extract
    information from a different or future version, any sed/awk solution is
    likely to break (since line breaks can mostly be inserted or removed arbitrarily in JSON).

    However, both sed and awk can handle this malformed JSON file:

    $ awk '/"dde":/,/\]/' /mnt/live/packages_choice.json | sed '1d;$d' | awk -F\" '{print $2}'
    [snip]
    $ sed -n '/"dde":/,/\]/p' /mnt/live/packages_choice.json | sed '1d;$d' | awk -F\" '{print $2}'
    [snip]

    You could almost certainly do that with a single awk command. But
    again, awk and sed are probably the wrong tools.
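    For reference, the single awk command could look like the sketch below. It keeps the same line-oriented assumptions (and the same fragility) as the pipelines above; key_values_awk is a hypothetical name:

```shell
# key_values_awk KEY [FILE...]: the awk | sed '1d;$d' | awk -F\" pipeline
# collapsed into one awk process. It skips the "KEY": line, stops at the
# first line containing "]", and prints the text between the first pair of
# double quotes on each line in between. Reads stdin if no FILE is given.
key_values_awk() {
  key=$1; shift
  awk -F'"' -v key="$key" '
    !inblock && index($0, "\"" key "\":") { inblock = 1; next }
    inblock && /[]]/ { exit }
    inblock { print $2 }
  ' "$@"
}

# Demo: prints dde-dock and startdde
printf '%s\n' '{' '"dde":[' '"dde-dock",' '"startdde"' '],' '"other":[' '"x"' ']' '}' |
  key_values_awk dde
```

    On the real file: key_values_awk dde /mnt/live/packages_choice.json. Unlike the range pattern, it exits after the first block, so a second "dde": line further down would be ignored.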

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips
    void Void(void) { Void(); } /* The recursive call of the void */

  • From hongyi.zhao@gmail.com@21:1/5 to Ed Morton on Fri Dec 10 16:56:25 2021
    On Saturday, December 11, 2021 at 8:43:48 AM UTC+8, Ed Morton wrote:
    On 12/10/2021 5:52 PM, hongy...@gmail.com wrote:
    [snip]

    In fact, I am processing the "live/packages_choice.json" file coming from the following image:
    http://cdimage.deepin.com/releases/20.3/deepin-desktop-community-20.3-amd64.iso

    The original file, although it has a .json extension, lacks the opening `{` at the very beginning, which makes jq fail. If I add the missing `{` as the first line of the file, jq works, e.g., for the `dde` key:

    $ jq -r '.dde[]' < packages_choice.json
    Then why would you look for anything beyond:

    { echo '{'; cat packages_choice.json; } | jq -r '.dde[]'

    Yes. It works:

    $ { echo '{'; cat /mnt/live/packages_choice.json; } | jq -r '.dde[]'
    deepin-desktop-server
    deepin-default-settings
    dde-desktop
    dde-dock
    dde-launcher
    dde-control-center
    startdde
    dde-session-ui
    deepin-artwork
    dde-file-manager
    dde-qt5integration
    plymouth-theme-deepin-logo
    deepin-wallpapers
    fonts-noto
    dde-introduction
    dde-kwin
    deepin-screensaver
    dde
    dde-calendar
    network-manager-integration-plugins
    deepin-terminal

    or similar? Anything you come up with using sed or awk to parse the JSON
    will be far less robust than using `jq` and it doesn't seem necessary if
    all you need to do is add a leading `{` to the input and then `jq` works.

    It was only after I checked the file that I discovered the problem. This is a bug/error introduced by the image creator and should be fixed in a future version. So, I agree that in this case, `jq` is the right tool.

    HZ

  • From hongyi.zhao@gmail.com@21:1/5 to hongy...@gmail.com on Fri Dec 10 16:23:22 2021
    On Saturday, December 11, 2021 at 7:52:21 AM UTC+8, hongy...@gmail.com wrote:
    [snip]

    In fact, the final format I want should look like this:

    $ sudo mount deepin-desktop-community-20.3-amd64.iso /mnt
    $ sed -n '/"dde":/,/\]/p' /mnt/live/packages_choice.json |
    sed '1d;$d' |
    awk -F\" '{print $2}' |
    grep -Ev '^(dde|plymouth-.*|dde-session-.*|network-manager-.*)$' |
    awk '{line[NR] = $0}
    END {
        for (i = 1; i <= NR; i++) {
            if (i == 1) printf "env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends "
            print line[i] (i < NR ? " \\" : "")
        }
    }'
    env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends deepin-desktop-server \
    deepin-default-settings \
    dde-desktop \
    dde-dock \
    dde-launcher \
    dde-control-center \
    startdde \
    deepin-artwork \
    dde-file-manager \
    dde-qt5integration \
    deepin-wallpapers \
    fonts-noto \
    dde-introduction \
    dde-kwin \
    deepin-screensaver \
    dde-calendar \
    deepin-terminal


    Although the script above looks ugly and crappy, it does produce the data in exactly the format I ultimately need. It's currently used by the Dockerfile here [1].

    [1] https://github.com/hongyi-zhao/dockerfile
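    A jq-based sketch of the same final output, assuming jq 1.5 or later built with regex support (needed for `test`); apt_line is a hypothetical name. jq selects and filters the package names, and sed only adds the apt-get prefix to the first line and " \" to every line except the last:

```shell
# apt_line KEY EXCLUDE_RE: read the (brace-fixed) JSON on stdin, take the
# array under KEY, drop names matching EXCLUDE_RE (a jq test() regex), and
# format the rest as one apt-get command continued with backslashes.
apt_line() {
  jq -r --arg k "$1" --arg re "$2" '.[$k][] | select(test($re) | not)' |
    sed '1s/^/env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends /; $!s/$/ \\/'
}

# Demo with a tiny stand-in for packages_choice.json
printf '%s\n' '{"dde":["dde-dock","dde","plymouth-theme-deepin-logo","deepin-terminal"]}' |
  apt_line dde '^(dde|plymouth-.*|dde-session-.*|network-manager-.*)$'
```

    On the ISO's file: { echo '{'; cat /mnt/live/packages_choice.json; } | apt_line dde '^(dde|plymouth-.*|dde-session-.*|network-manager-.*)$'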

    HZ
