Get a list from Pandas DataFrame column headers – Dev

The best answers to the question “Get a list from Pandas DataFrame column headers” in the category Dev.

QUESTION:

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won’t know how many columns there will be or what they will be called.

For example, if I’m given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would get a list like this:

>>> header_list
['y', 'gdp', 'cap']

ANSWER:

There is a built-in method which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array and this has a helper function .tolist to return a list.

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For those who hate typing, you can just call list on df, as so:

list(df)

ANSWER:

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use (as shown in Ed Chum’s answer):

list(my_dataframe)

ANSWER:

It gets even simpler (by Pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

ANSWER:

I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: % timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)