4. Data Manipulation#
4.1. Learning objectives#
Understand DataFrames
Manipulate, transform and analyse DataFrames
4.2. Introduction#
Dataframes, as we saw the previous section, is a tabular data format, with columns and rows. Columns are variable names and rows are individual observations. Just like other data structures dicussed in previous chapters, a DataFrame handles different data types:
int64: whole numbers (e.g age = 30)
float64; numbers with decimals (e.g. height = 1.9, temperature = 33.0)
datetime64: date and time
category: catogories or factors (e.g. gender = male, female; race = black, white, …)
bool: True or False
object: string
According to Hadley Wickman, a DataFrame should be in a tidy format:
Each variable is a column; each column is a variable.
Each observation is row; each row is an observation.
Each value is a cell; each cell is a single value.
The pandas package provide a toolbox of working analysing, querying and transforming DataFrames. We will continue working with the `gapminder dataset. First load pandas and read the data, saving it as “gapminder”:
import pandas as pd
gapminder = pd.read_csv("https://raw.githubusercontent.com/aubreympungose/data-science-course/main/weeks/data/gapminder.csv")
Let’s examine the data by looking at few observations:
gapminder.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
You can check how many variables are in the data and what type of data:
gapminder.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country 1704 non-null object
1 continent 1704 non-null object
2 year 1704 non-null int64
3 lifeExp 1704 non-null float64
4 pop 1704 non-null int64
5 gdpPercap 1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
You can see that gapminder has 6 columns and 1704 rows. The columns in the dataset are:
country: Simply the country
continent: Continent
year: The year data was collected
lifeExp: Life expectancy of a country in year
pop: Population of the country in a year
gdpPercap: Gross Domestic Product of a country in a year
You can also print the columns in the dataset by using .columns attribute.
gapminder.columns
Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
You can get the number of rows by using .index attribute:
gapminder.index
RangeIndex(start=0, stop=1704, step=1)
4.3. MANIPULATING COLUMNS#
4.3.1. Access/Select columns#
We may be interested in some of the columns in the gampminder dataframe, let say, for example, “country” only. You can select a specific column by taking the name of the DataFrame, followed by square brackets ([]) and passing the name of the “column” inside square brackets:
gapminder["country"]
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
...
1699 Zimbabwe
1700 Zimbabwe
1701 Zimbabwe
1702 Zimbabwe
1703 Zimbabwe
Name: country, Length: 1704, dtype: object
To Access multiple, columns you follow the same method as above, except that you wrap column names list ([]), so you end up with double sqaure brackets. Let’s say we want to select “country”, “pop”, and “year” columns:
gapminder[["country", "pop", "year"]]
| country | pop | year | |
|---|---|---|---|
| 0 | Afghanistan | 8425333 | 1952 |
| 1 | Afghanistan | 9240934 | 1957 |
| 2 | Afghanistan | 10267083 | 1962 |
| 3 | Afghanistan | 11537966 | 1967 |
| 4 | Afghanistan | 13079460 | 1972 |
| ... | ... | ... | ... |
| 1699 | Zimbabwe | 9216418 | 1987 |
| 1700 | Zimbabwe | 10704340 | 1992 |
| 1701 | Zimbabwe | 11404948 | 1997 |
| 1702 | Zimbabwe | 11926563 | 2002 |
| 1703 | Zimbabwe | 12311143 | 2007 |
1704 rows × 3 columns
You can alse access columns by their index using .iloc() function. Remember that an index in Python starts at 0, for example, to access the first column
gapminder.iloc[:, 0]
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
...
1699 Zimbabwe
1700 Zimbabwe
1701 Zimbabwe
1702 Zimbabwe
1703 Zimbabwe
Name: country, Length: 1704, dtype: object
In the above code, the colon (:) specify that we want to retun all rows; you will more abouth this when selecting by both columns and rows.
To access multiple columns, you pass a list of indexes of columns. For example, select the 1st, 3rd and 5th columns:
gapminder.iloc[:, [0, 2, 4]]
| country | year | pop | |
|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333 |
| 1 | Afghanistan | 1957 | 9240934 |
| 2 | Afghanistan | 1962 | 10267083 |
| 3 | Afghanistan | 1967 | 11537966 |
| 4 | Afghanistan | 1972 | 13079460 |
| ... | ... | ... | ... |
| 1699 | Zimbabwe | 1987 | 9216418 |
| 1700 | Zimbabwe | 1992 | 10704340 |
| 1701 | Zimbabwe | 1997 | 11404948 |
| 1702 | Zimbabwe | 2002 | 11926563 |
| 1703 | Zimbabwe | 2007 | 12311143 |
1704 rows × 3 columns
Suppose you want access a range of of columns, for example, from the 1st to the 3rd column:
gapminder.iloc[:, 0:3]
| country | continent | year | |
|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 |
| 1 | Afghanistan | Asia | 1957 |
| 2 | Afghanistan | Asia | 1962 |
| 3 | Afghanistan | Asia | 1967 |
| 4 | Afghanistan | Asia | 1972 |
| ... | ... | ... | ... |
| 1699 | Zimbabwe | Africa | 1987 |
| 1700 | Zimbabwe | Africa | 1992 |
| 1701 | Zimbabwe | Africa | 1997 |
| 1702 | Zimbabwe | Africa | 2002 |
| 1703 | Zimbabwe | Africa | 2007 |
1704 rows × 3 columns
4.3.2. Rename columns#
You may need to change column names. For example, in the gapminder dataset, we may change the “pop” column to population. You will use the .rename() function, pass the column argument, and specify column names within the dictionary ({"old_name":"new_name"}). We will assign the results to the new DataFrame named gapminder_new:
gapminder_new = gapminder.rename(columns = {"pop":"population"})
gapminder_new.head()
| country | continent | year | lifeExp | population | gdpPercap | |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
The “pop” column has been changed to “population”.
You can also rename multiple columns; in this case, rename “lifeExp” and “gdpPercap”:
gapminder_new = gapminder.rename(columns = {"lifeExp":"life_expectancy", "gdpPercap":"gdp_pc"})
gapminder_new.head()
| country | continent | year | life_expectancy | pop | gdp_pc | |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
4.3.3. Making new columns, Removing columns#
To create a new column you simply take the name of the dataframe, pass the string of the new columnn inside sqaure bracket, and assign values:
gapminder["new_column"] = 20
gapminder.head()
| country | continent | year | lifeExp | pop | gdpPercap | new_column | |
|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | 20 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | 20 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | 20 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | 20 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | 20 |
There is a new column added that has values of 20.
In many cases, we may need to create a new column resulting from manipulating other columns. For example, we know that the GDP per capita is calculated from dividing the total GDP by a population. If we want to create a new column of the total GDP, we need to multiply GDP per capita by the population size of the country:
gapminder["total_gdp"] = gapminder["gdpPercap"] * gapminder["pop"]
gapminder.head()
| country | continent | year | lifeExp | pop | gdpPercap | new_column | total_gdp | |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | 20 | 6.567086e+09 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | 20 | 7.585449e+09 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | 20 | 8.758856e+09 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | 20 | 9.648014e+09 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | 20 | 9.678553e+09 |
We have a new column (“total_gdp”) added to the gapminder DataFrame.
You can remove columns by using `.drop() function:
gapminder = gapminder.drop(["total_gdp", "new_column"], axis = 1)
gapminder.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
4.4. Manipulate Rows#
You can rows by any column by using .sort_values() function:
gapminder.sort_values(["lifeExp"])
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 1292 | Rwanda | Africa | 1992 | 23.599 | 7290203 | 737.068595 |
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 552 | Gambia | Africa | 1952 | 30.000 | 284320 | 485.230659 |
| 36 | Angola | Africa | 1952 | 30.015 | 4232095 | 3520.610273 |
| 1344 | Sierra Leone | Africa | 1952 | 30.331 | 2143249 | 879.787736 |
| ... | ... | ... | ... | ... | ... | ... |
| 1487 | Switzerland | Europe | 2007 | 81.701 | 7554661 | 37506.419070 |
| 695 | Iceland | Europe | 2007 | 81.757 | 301931 | 36180.789190 |
| 802 | Japan | Asia | 2002 | 82.000 | 127065841 | 28604.591900 |
| 671 | Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.978670 |
| 803 | Japan | Asia | 2007 | 82.603 | 127467972 | 31656.068060 |
1704 rows × 6 columns
You can see that Rwanda had the lowest life expectancy.
To sort values into descending order (highest to lowest), you pass ascending = False:
gapminder.sort_values(["pop"], ascending = False)
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 299 | China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 |
| 298 | China | Asia | 2002 | 72.028 | 1280400000 | 3119.280896 |
| 297 | China | Asia | 1997 | 70.426 | 1230075000 | 2289.234136 |
| 296 | China | Asia | 1992 | 68.690 | 1164970000 | 1655.784158 |
| 707 | India | Asia | 2007 | 64.698 | 1110396331 | 2452.210407 |
| ... | ... | ... | ... | ... | ... | ... |
| 1299 | Sao Tome and Principe | Africa | 1967 | 54.425 | 70787 | 1384.840593 |
| 1298 | Sao Tome and Principe | Africa | 1962 | 51.893 | 65345 | 1071.551119 |
| 420 | Djibouti | Africa | 1952 | 34.812 | 63149 | 2669.529475 |
| 1297 | Sao Tome and Principe | Africa | 1957 | 48.945 | 61325 | 860.736903 |
| 1296 | Sao Tome and Principe | Africa | 1952 | 46.471 | 60011 | 879.583586 |
1704 rows × 6 columns
You can see that China had the highest population.
You may be interested in observations/rows that meet certain conditions. For example, in the gapminder data, you may need to keel all observations from the African continent, and save as a new DataFrame called gapminder_africa:
gapminder_africa = gapminder[gapminder["continent"] == "Africa"]
gapminder_africa.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 24 | Algeria | Africa | 1952 | 43.077 | 9279525 | 2449.008185 |
| 25 | Algeria | Africa | 1957 | 45.685 | 10270856 | 3013.976023 |
| 26 | Algeria | Africa | 1962 | 48.303 | 11000948 | 2550.816880 |
| 27 | Algeria | Africa | 1967 | 51.407 | 12760499 | 3246.991771 |
| 28 | Algeria | Africa | 1972 | 54.518 | 14760787 | 4182.663766 |
Filter all the observations that are in the year 2007 and ave it as `gapminder_2007:
gapminder_2007 = gapminder[gapminder["year"] == 2007]
gapminder_2007.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 11 | Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 |
| 23 | Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 |
| 35 | Algeria | Africa | 2007 | 72.301 | 33333216 | 6223.367465 |
| 47 | Angola | Africa | 2007 | 42.731 | 12420476 | 4797.231267 |
| 59 | Argentina | Americas | 2007 | 75.320 | 40301927 | 12779.379640 |
when filtering rows, you are using boolean operators:
==: equal to!=: not equal to>: greater than<: less than>=: greater than or equal to<=: less than or equal to
You can also filter rows by combining 2 conditions. Let us say we want:
countries in Africa
the year 2007
and save as africa_2007:
africa_2007 = gapminder[(gapminder["continent"] == "Africa") & (gapminder["year"] == 2007)]
africa_2007.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 35 | Algeria | Africa | 2007 | 72.301 | 33333216 | 6223.367465 |
| 47 | Angola | Africa | 2007 | 42.731 | 12420476 | 4797.231267 |
| 131 | Benin | Africa | 2007 | 56.728 | 8078314 | 1441.284873 |
| 167 | Botswana | Africa | 2007 | 50.728 | 1639131 | 12569.851770 |
| 203 | Burkina Faso | Africa | 2007 | 52.295 | 14326203 | 1217.032994 |
Sometimes you may need to filter multiple rows, let say, more than 5 countries. It would be inconvinient to type country == "..." five times. This is where the .isin() function is useful; you pass the list of observations you want. We will work with with observations from the year 2007.
gapminder_2007 = gapminder[gapminder["year"] == 2007]
countries = gapminder_2007[gapminder_2007["country"].isin(["China", "Brazil", "India", "South Africa"])]
countries.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 179 | Brazil | Americas | 2007 | 72.390 | 190010647 | 9065.800825 |
| 299 | China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 |
| 707 | India | Asia | 2007 | 64.698 | 1110396331 | 2452.210407 |
| 1415 | South Africa | Africa | 2007 | 49.339 | 43997828 | 9269.657808 |
You can also filter rows by using their index positions, let say, the 1st and 3rd rows:
gapminder.iloc[[0, 4]]
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
You can also filter a range of rows For example, we may select top 10 countries with the highes population in 2007. So we will use the gapminder_2007 DataFrame. You will need to arrange by population first
gapminder_2007 = gapminder_2007.sort_values(["pop"], ascending = False)
gapminder_2007.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 299 | China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 |
| 707 | India | Asia | 2007 | 64.698 | 1110396331 | 2452.210407 |
| 1619 | United States | Americas | 2007 | 78.242 | 301139947 | 42951.653090 |
| 719 | Indonesia | Asia | 2007 | 70.650 | 223547000 | 3540.651564 |
| 179 | Brazil | Americas | 2007 | 72.390 | 190010647 | 9065.800825 |
You can see that observations are sorted in descending order from highest popultion to the lowest; now filter:
gapminder_2007.iloc[0:10]
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 299 | China | Asia | 2007 | 72.961 | 1318683096 | 4959.114854 |
| 707 | India | Asia | 2007 | 64.698 | 1110396331 | 2452.210407 |
| 1619 | United States | Americas | 2007 | 78.242 | 301139947 | 42951.653090 |
| 719 | Indonesia | Asia | 2007 | 70.650 | 223547000 | 3540.651564 |
| 179 | Brazil | Americas | 2007 | 72.390 | 190010647 | 9065.800825 |
| 1175 | Pakistan | Asia | 2007 | 65.483 | 169270617 | 2605.947580 |
| 107 | Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.253792 |
| 1139 | Nigeria | Africa | 2007 | 46.859 | 135031164 | 2013.977305 |
| 803 | Japan | Asia | 2007 | 82.603 | 127467972 | 31656.068060 |
| 995 | Mexico | Americas | 2007 | 76.195 | 108700891 | 11977.574960 |
Another way of filtering rows is to use the .query() method. This is how you would filter observations from the African continent:
gapminder_africa = gapminder_2007.query("continent == 'Africa'")
gapminder_africa.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 1139 | Nigeria | Africa | 2007 | 46.859 | 135031164 | 2013.977305 |
| 467 | Egypt | Africa | 2007 | 71.338 | 80264543 | 5581.180998 |
| 515 | Ethiopia | Africa | 2007 | 52.947 | 76511887 | 690.805576 |
| 335 | Congo, Dem. Rep. | Africa | 2007 | 46.462 | 64606759 | 277.551859 |
| 1415 | South Africa | Africa | 2007 | 49.339 | 43997828 | 9269.657808 |
Filter observations where life expectancy is equal to or greater 80 and the year is 2007; assign the results to high_life_expect:
high_life_expect = gapminder.query("lifeExp >= 80 and year == 2007")
high_life_expect.head()
| country | continent | year | lifeExp | pop | gdpPercap | |
|---|---|---|---|---|---|---|
| 71 | Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.36744 |
| 251 | Canada | Americas | 2007 | 80.653 | 33390141 | 36319.23501 |
| 539 | France | Europe | 2007 | 80.657 | 61083916 | 30470.01670 |
| 671 | Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 |
| 695 | Iceland | Europe | 2007 | 81.757 | 301931 | 36180.78919 |
4.5. Summary statistics and aggregating#
One of the important task in data science is to explore the data using summary statistics. There are various methods used to summarise data, including:
Summary stat |
pandas function |
|---|---|
Mean/average |
|
Median/mid-point |
|
Sum of values |
|
Minimum |
|
Maximum |
|
Standard deviation: spread/dispersion of data |
|
Count or total number of observations |
|
Let explore these functions. What is the average/mean life expectancy of all countries?
gapminder["lifeExp"].mean()
59.47443936619714
What is the median GDP per capita?
gapminder["gdpPercap"].median()
3531.8469885
Find the minimum population:
gapminder["pop"].min()
60011
You can experiment with other functions listed in the above table.
4.5.1. Grouping#
In many cases, you may need to get summary statistics by a category or group. In above examples, although it is important to get the mean population of the world, but this tells us little about variaion in different regions. pandas has .groupby() function that group observations into categories. For example, you may be interested on finding the average the GDP per capita in each continent. Here you would use the .groupby() function, passing the “continent” column inside and follow by the summary statistics function:
gapminder.groupby("continent")["gdpPercap"].mean()
continent
Africa 2193.754578
Americas 7136.110356
Asia 7902.150428
Europe 14469.475533
Oceania 18621.609223
Name: gdpPercap, dtype: float64
The result show mean wealth (GDP per capita) by a continent. Wouldn’t be nice to sort these statistics from highest to lowest?
gapminder.groupby("continent")["gdpPercap"].mean().sort_values(ascending = False)
continent
Oceania 18621.609223
Europe 14469.475533
Asia 7902.150428
Americas 7136.110356
Africa 2193.754578
Name: gdpPercap, dtype: float64
Now We can see that Oceania has the highest average wealth and Africa has the lowest.
Sometimes you may need to find multiple summary statistics: mean, median, standard deviation, minimum,and maximum at the same time. .agg() function is able to do this:
gapminder.groupby("continent")["gdpPercap"].agg(["mean", "median", "std", "min", "max"])
| mean | median | std | min | max | |
|---|---|---|---|---|---|
| continent | |||||
| Africa | 2193.754578 | 1192.138217 | 2827.929863 | 241.165876 | 21951.21176 |
| Americas | 7136.110356 | 5465.509853 | 6396.764112 | 1201.637154 | 42951.65309 |
| Asia | 7902.150428 | 2646.786844 | 14045.373112 | 331.000000 | 113523.13290 |
| Europe | 14469.475533 | 12081.749115 | 9355.213498 | 973.533195 | 49357.19017 |
| Oceania | 18621.609223 | 17983.303955 | 6358.983321 | 10039.595640 | 34435.36744 |
What if you want to find a summary stat of multiple columns? For example, find the mean pf these columns per continent: life expectancy, population, GDP per capita:
gapminder.groupby("continent").agg(
mean_life_expect = ("lifeExp", "mean"),
mean_pop = ("pop", "mean"),
mean_gdp_pc = ("gdpPercap", "mean")
)
| mean_life_expect | mean_pop | mean_gdp_pc | |
|---|---|---|---|
| continent | |||
| Africa | 48.865330 | 9.916003e+06 | 2193.754578 |
| Americas | 64.658737 | 2.450479e+07 | 7136.110356 |
| Asia | 60.064903 | 7.703872e+07 | 7902.150428 |
| Europe | 71.903686 | 1.716976e+07 | 14469.475533 |
| Oceania | 74.326208 | 8.874672e+06 | 18621.609223 |
4.6. Conclusion#
In this chapter you have learned about different ways of manipulating DataFrames. Next, we go to other pandas methods.