Forum: >>> Magnum BBS <<<

double bracket integer index in pandas; Is this a legal syntax

From Artie Ziff@21:1/5 to All on Wed May 3 03:41:26 2023

Hello,

I hope that pandas questions are OK here.

In a pandas lecture, I did not get the expected result.

I tried this on two different platforms
(old macOS distro and up-to-date Ubuntu Linux distro, 22.04)

The Linux distro has:
python 3.10.11
pandas 1.5.2
conda 23.3.1

Is this double bracket form, df[[1]], deprecated... maybe?

There is data in a dataframe, df.

subset = df[[1]]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 3811, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/dks/anaconda3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([1], dtype='int64')] are in the [columns]"

What could be making this fail?

Thank you!
az

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Cameron Simpson@21:1/5 to Artie Ziff on Wed May 3 21:37:24 2023

On 03May2023 03:41, Artie Ziff <artie.ziff@gmail.com> wrote:

> I hope that pandas questions are OK here.

There are some pandas users here.

> In a pandas lecture, I did not get the expected result.
> I tried this on two different platforms
> (old macOS distro and up-to-date Ubuntu Linux distro, 22.04)
>
> The Linux distro has:
> python 3.10.11
> pandas 1.5.2
> conda 23.3.1
>
> Is this double bracket form, df[[1]], deprecated... maybe?
>
> There is data in a dataframe, df.
>
> subset = df[[1]]

Whether this works depends on the contents of the dataframe. You're
supplying this index:

[1]

which is a list of ints (with just one int).

Have a look at this page: https://pandas.pydata.org/docs/user_guide/indexing.html

If you supply a list, it expects a list of labels. Is 1 a valid label
for your particular dataframe?
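
One way to answer that question is to ask the frame directly. The sketch below uses two small made-up frames (not the course data): one with string column labels, where 1 is not a label and the lookup fails with a KeyError like the one in the traceback, and one with default integer labels, where the same expression works.

```python
import pandas as pd

# Made-up stand-in data; the column names are assumptions for illustration.
df = pd.DataFrame({"country": ["Afghanistan", "Albania"], "year": [1952, 1952]})

print(list(df.columns))   # the labels that df[[...]] will be matched against
print(1 in df.columns)    # False: 1 is not a column label in this frame

try:
    df[[1]]
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("KeyError:", exc)

# The same values with default integer column labels (0, 1),
# so 1 is now a valid label and the lookup succeeds.
df_int = pd.DataFrame([["Afghanistan", 1952], ["Albania", 1952]])
subset = df_int[[1]]
print(subset)
```

If `1 in df.columns` prints False for your data, the KeyError is the expected outcome rather than a sign that the syntax is deprecated.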

Cheers,
Cameron Simpson <cs@cskk.id.au>


From Artie Ziff@21:1/5 to All on Wed May 3 17:52:00 2023

I agree with your analysis, Cameron.

The code came from a video course, "Pandas Data Analysis with Python Fundamentals" by Daniel Chen.

I am curious why the author may have said this. To avoid attaching
screenshots, I'll describe this section of the content. Perhaps someone can say, "oh that's how it used to work"... haha

D.CHEN:
"You can also subset the columns by number. If we wanted to get the first column from our data set, we would use zero":

df = pandas.read_csv('./data/gapminder.tsv', sep='\t')

subset = df[[0]]
print(subset.head())

       country
0  Afghanistan
1  Afghanistan
2  Afghanistan
3  Afghanistan
4  Afghanistan
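
For anyone following along on a recent pandas, output like the above can be obtained by selecting the column by position with `.iloc`, or by looking the label up from its position first. This is only a sketch on a made-up frame standing in for gapminder.tsv; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the course data.
df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
        "continent": ["Asia", "Asia", "Asia"],
        "year": [1952, 1957, 1962],
    }
)

# Select by position: all rows, the column at position 0.
# Passing a list ([0]) keeps the result a DataFrame.
first_by_position = df.iloc[:, [0]]

# Equivalent: turn the position into its label, then select by label.
first_by_label = df[[df.columns[0]]]

print(first_by_position.head())
```

Both forms avoid relying on what `df[[0]]` does when 0 is not a column label.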

Data for the course:
https://github.com/chendaniely/pandas_for_everyone.git

"df[[0]]" is being described to the course student as selecting the first column of data. :-)

I'll study that link.
Thank you.


From Cameron Simpson@21:1/5 to Artie Ziff on Thu May 4 12:29:45 2023

On 03May2023 17:52, Artie Ziff <artie.ziff@gmail.com> wrote:

> The code came from a video course, "Pandas Data Analysis with Python
> Fundamentals" by Daniel Chen.
>
> I am curious why the author may have said this. To avoid attaching
> screenshots, I'll describe this section of the content. Perhaps someone can
> say, "oh that's how it used to work"... haha

Unlikely; Python indices (and by implication Pandas indices) have
counted from 0 since forever. I suspect just a typo/braino.

> D.CHEN:
> "You can also subset the columns by number. If we wanted to get the first
> column from our data set, we would use zero":
>
> df = pandas.read_csv('./data/gapminder.tsv', sep='\t')
>
> subset = df[[0]]
> print(subset.head())
>
>        country
> 0  Afghanistan
> 1  Afghanistan
> 2  Afghanistan
> 3  Afghanistan
> 4  Afghanistan
>
> Data for the course:
> https://github.com/chendaniely/pandas_for_everyone.git
>
> "df[[0]]" is being described to the course student as selecting the first
> column of data. :-)

Well, I would say it makes a new dataframe with just the first column.

So:

df[ 0 ] # spaces for clarity

would return the Series for the column labelled 0 (or raise a KeyError if
no column has that label), versus:

df[ [0] ] # spaces for clarity

which makes a new dataframe with only that column.

A dataframe can be thought of as an array of Series (one per column).
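
A quick way to see the single- versus double-bracket difference is to look at the types that come back. This is a sketch on a small made-up frame, using a string label so that it does not depend on how integer keys are treated.

```python
import pandas as pd

# Made-up data; column names are assumptions for illustration.
df = pd.DataFrame({"country": ["Afghanistan", "Albania"], "year": [1952, 1952]})

as_series = df["country"]     # single brackets: one label, one Series
as_frame = df[["country"]]    # double brackets: list of labels, a DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame

# Walking the frame column by column yields (label, Series) pairs.
for label, column in df.items():
    print(label, type(column).__name__)
```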

Cheers,
Cameron Simpson <cs@cskk.id.au>

