• double bracket integer index in pandas; is this legal syntax?

    From Artie Ziff@21:1/5 to All on Wed May 3 03:41:26 2023
    Hello,

    I hope that pandas questions are OK here.

    In a pandas lecture, I did not get the expected result.

    I tried this on two different platforms
    (old macOS distro and up-to-date Ubuntu Linux distro, 22.04)

    The Linux distro has:
    python 3.10.11
    pandas 1.5.2
    conda 23.3.1

    Is this double bracket form, df[[1]], deprecated... maybe?

    There is data in a dataframe, df.

    subset = df[[1]]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 3811, in __getitem__
        indexer = self.columns._get_indexer_strict(key, "columns")[1]
      File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
      File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Int64Index([1], dtype='int64')] are in the [columns]"

    What could be making this fail?

    Thank you!
    az

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cameron Simpson@21:1/5 to Artie Ziff on Wed May 3 21:37:24 2023
    On 03May2023 03:41, Artie Ziff <artie.ziff@gmail.com> wrote:
    >I hope that pandas questions are OK here.

    There are some pandas users here.

    >In a pandas lecture, I did not get the expected result.
    >I tried this on two different platforms
    >(old macOS distro and up-to-date Ubuntu Linux distro, 22.04)
    >
    >The Linux distro has:
    >python 3.10.11
    >pandas 1.5.2
    >conda 23.3.1
    >
    >Is this double bracket form, df[[1]], deprecated... maybe?
    >
    >There is data in a dataframe, df.
    >
    >subset = df[[1]]

    Whether this works depends on the column labels of the dataframe. You're
    supplying this index:

    [1]

    which is a list of ints (with just one int).

    Have a look at this page: https://pandas.pydata.org/docs/user_guide/indexing.html

    If you supply a list, it expects a list of column labels. Is 1 a valid
    label for your particular dataframe?
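A minimal sketch of the "list of labels" point, using two small hypothetical dataframes built inline (not the course data): `df[[1]]` looks up the *label* 1, so it works when the columns are labelled 0, 1, 2 (as `read_csv(..., header=None)` would produce) and raises `KeyError` when the columns have string labels. Selecting by position goes through `.iloc` instead.

```python
import pandas as pd

# Hypothetical frame with integer column labels 0, 1, 2
# (read_csv(..., header=None) gives labels like these).
int_cols = pd.DataFrame([[10, 20, 30], [40, 50, 60]])

# 1 is a valid *label* here, so the double-bracket lookup succeeds.
by_label = int_cols[[1]]
print(by_label)

# Hypothetical frame with string column labels, like the gapminder data.
str_cols = pd.DataFrame({"country": ["Afghanistan", "Albania"],
                         "year": [1952, 1952]})

try:
    str_cols[[1]]  # 1 is not one of the labels "country"/"year"
except KeyError as exc:
    print("KeyError:", exc)

# Positional selection (second column) uses .iloc instead of [].
by_position = str_cols.iloc[:, [1]]
print(by_position)
```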

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

  • From Artie Ziff@21:1/5 to All on Wed May 3 17:52:00 2023
    I agree with your analysis, Cameron.

    The code came from a video course, "Pandas Data Analysis with Python Fundamentals" by Daniel Chen.

    I am curious why the author may have said this. To avoid attaching
    screenshots, I'll describe this section of the content. Perhaps someone can say, "oh that's how it used to work"... haha

    D.CHEN:
    "You can also subset the columns by number. If we wanted to get the first column from our data set, we would use zero":

    df = pandas.read_csv('./data/gapminder.tsv', sep='\t')
    subset = df[[0]]
    print(subset.head())
           country
    0  Afghanistan
    1  Afghanistan
    2  Afghanistan
    3  Afghanistan
    4  Afghanistan

    Data for the course:
    https://github.com/chendaniely/pandas_for_everyone.git

    "df[[0]]" is being described to the course student as selecting the first column of data. :-)
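For comparison, a hedged sketch of how "the first column by number" can be written with current pandas. It uses a small inline stand-in for the gapminder data (the `country`/`year` columns and values are assumptions, not read from `gapminder.tsv`): `.iloc` selects by position, and `df.columns[0]` converts a position into a label for ordinary `[]` indexing.

```python
import pandas as pd

# Hypothetical stand-in for gapminder.tsv (the real file has more columns and rows).
df = pd.DataFrame({
    "country": ["Afghanistan"] * 5,
    "year": [1952, 1957, 1962, 1967, 1972],
})

# Positional selection: passing the list [0] keeps the result a DataFrame.
subset = df.iloc[:, [0]]
print(subset.head())

# Label-based equivalent: look up the label at position 0, then use [[...]].
same = df[[df.columns[0]]]
print(same.equals(subset))
```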

    I'll study that link.
    Thank you.

  • From Cameron Simpson@21:1/5 to Artie Ziff on Thu May 4 12:29:45 2023
    On 03May2023 17:52, Artie Ziff <artie.ziff@gmail.com> wrote:
    >The code came from a video course, "Pandas Data Analysis with Python Fundamentals" by Daniel Chen.
    >
    >I am curious why the author may have said this. To avoid attaching screenshots, I'll describe this section of the content. Perhaps someone can say, "oh that's how it used to work"... haha

    Unlikely; Python indices (and by implication Pandas indices) have
    counted from 0 since forever. I suspect just a typo/braino.

    >D.CHEN:
    >"You can also subset the columns by number. If we wanted to get the first column from our data set, we would use zero":
    >
    >df = pandas.read_csv('./data/gapminder.tsv', sep='\t')
    >subset = df[[0]]
    >print(subset.head())
    >       country
    >0  Afghanistan
    >1  Afghanistan
    >2  Afghanistan
    >3  Afghanistan
    >4  Afghanistan
    >
    >Data for the course:
    >https://github.com/chendaniely/pandas_for_everyone.git
    >
    >"df[[0]]" is being described to the course student as selecting the first column of data. :-)

    Well, I would say it makes a new dataframe with just the first column.

    So:

    df[ 0 ] # spaces for clarity

    would (probably, need to check) return the Series for the column
    whose label is 0, if there is one; that is the first column only if
    its label really is 0.
    versus:

    df[ [0] ] # spaces for clarity

    makes a new dataframe with only that column.

    A dataframe can be thought of as an array of Series (one per column).
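Cameron's single- versus double-bracket distinction can be checked with a small sketch on a hypothetical two-column frame: one label in single brackets gives a Series, a list of labels gives a one-column DataFrame, and `.iloc` behaves the same way by position. Iterating with `DataFrame.items()` shows the "one Series per column" view.

```python
import pandas as pd

# Hypothetical example frame with string column labels.
df = pd.DataFrame({"country": ["Afghanistan", "Albania"],
                   "year": [1952, 1952]})

col_series = df["country"]    # a single label -> Series
col_frame = df[["country"]]   # a list of labels -> DataFrame
print(type(col_series).__name__, type(col_frame).__name__)

# Positional equivalents go through .iloc.
print(type(df.iloc[:, 0]).__name__, type(df.iloc[:, [0]]).__name__)

# "An array of Series": each column comes back as its own Series.
for label, series in df.items():
    print(label, type(series).__name__, series.tolist())
```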

    Cheers,
    Cameron Simpson <cs@cskk.id.au>
