# Bar plot and proportion

edited August 2020

(10 points)

Using your own random data from last Friday's forum question, let's use Colab to examine the activity_level variable. Specifically:

• Generate a bar chart for activity_level and
• Compute the proportion of folks whose activity level is high.

Note that this lab asks more of you than the last one: the Colab link above leads to a blank notebook. You can still find sample code that should help in our class presentation on Categorical Data.
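For orientation, here is a minimal sketch of the two computations the lab asks for. It assumes a hypothetical 100-row stand-in for the class data (the real `df` comes from the forum exercise and is not reproduced here), so the counts it prints will not match anyone's post.

```python
import random

import pandas as pd

# Hypothetical stand-in for the class data set: 100 rows with a random
# activity_level. The real data comes from the forum exercise, not this code.
random.seed(0)
levels = ["none", "moderate", "high"]
df = pd.DataFrame({"activity_level": [random.choice(levels) for _ in range(100)]})

# Count how many rows fall into each category.
value_counts = df["activity_level"].value_counts()

# Proportion of rows whose activity level is high.
high_proportion = value_counts["high"] / len(df)

print(value_counts)
print(high_proportion)
```

The bar chart is then one more call on `value_counts`, as the replies below show.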


• edited August 2020

Data Set

import pandas as pd
df.tail()

Value Counts

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()
activity_level
high    41
none    31
moderate    28

Proportion of folks whose activity level is high is: 0.41

value_counts['high']/len(df)

Bar Chart

value_counts.plot.bar(figsize=(12,7), rot = 0);
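As a cross-check on the 0.41 above, the counts can be turned into proportions for every category at once. This is only a sketch: the Series below is typed in from the posted counts (which sum to 100) rather than computed from the poster's actual data.

```python
import pandas as pd

# Counts copied from the post above; this Series stands in for
# df['activity_level'].value_counts() on that poster's data.
value_counts = pd.Series({"high": 41, "none": 31, "moderate": 28}, name="activity_level")

# Dividing by the total gives the proportion for each category.
proportions = value_counts / value_counts.sum()
high_proportion = proportions["high"]

print(proportions)
```

On a real DataFrame, `df['activity_level'].value_counts(normalize=True)` should give the same proportions directly.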

• edited August 2020
import pandas as pd
df.tail()

Value count

value_counts = df['activity_level'].value_counts()
value_counts

moderate    37
none        37
high        26

The proportion whose activity level is high is .26

Bar chart

value_counts.plot.bar(figsize=(12,7), rot = 0);
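One optional refinement: `value_counts()` sorts by frequency, so the bars come out in count order rather than none, moderate, high. A possible sketch for putting them in that order, using the counts from the post above typed in by hand (the axis labels are my own choice, not part of the assignment):

```python
import pandas as pd

# Counts copied from the post above, standing in for the real value_counts.
value_counts = pd.Series({"moderate": 37, "none": 37, "high": 26}, name="activity_level")

# Reorder the categories from least to most active before plotting.
ordered = value_counts.reindex(["none", "moderate", "high"])

# Same plotting call as in the thread, with axis labels added.
ax = ordered.plot.bar(figsize=(12, 7), rot=0)
ax.set_xlabel("activity_level")
ax.set_ylabel("count")
```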

• edited August 2020

I generated my own random data with this code:

import pandas as pd

My data:

first_name  last_name   age sex height  weight  income  activity_level
0   Retha   Reese   41  female  63.91   166.41  24811   moderate
1   Felicia Hamm    41  female  61.88   152.91  11829   high
2   Lauren  Poindexter  22  female  69.36   219.97  4259    none
3   Erin    Davis   29  female  60.52   224.19  19351   moderate
4   Minnie  Bouie   20  female  69.10   140.45  12624   high

I got my value counts with the code:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

...and got:

activity_level
none    37
high    35
moderate    28

My proportion of people with a high activity level is .35

value_counts['high']/len(df)

I got my bar chart with this code:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Here is my data set:

import pandas as pd

Here are the value counts, which are the data points for my bar chart:

value_counts = df['activity_level'].value_counts()
value_counts

Here is my bar chart for activity level, obtained by

value_counts.plot.bar(figsize=(12,7), rot = 0);

I then computed the proportion of folks whose activity level is high, which is .33

value_counts['high']/len(df)

# Output: 0.33

• edited August 2020
import pandas as pd
df.tail()

value_counts = df['activity_level'].value_counts()
value_counts

high 39
none 31
moderate 30
Name: activity_level, dtype: int64

value_counts['high']/len(df)

0.39

value_counts.plot.bar(figsize=(12,7), rot = 0);

• Data:

import pandas as pd
df.tail()

first_name last_name age sex height weight income activity_level
95 Robert Session 20 male 71.36 175.13 2783 high
96 Christine Powers 20 female 65.73 172.88 6692 high
97 Brandon Griffin 23 male 69.00 168.12 31407 moderate
98 Mollie Donohue 39 female 68.95 174.77 15755 high
99 Manuel Hammond 32 male 68.79 146.79 238400 moderate

I also created a table for activity level:

value_counts = df['activity_level'].value_counts()
value_counts

none 35
high 34
moderate 31
Name: activity_level, dtype: int64

Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

The proportion of people who were at a high activity level is 0.34

value_counts['high']/len(df)

• edited August 2020

Data Set:

import pandas as pd
df.tail()

Value Counts Code:

value_counts = df['activity_level'].value_counts()
value_counts
none        37
high        33
moderate    30

The proportion of high activity level is 0.33

value_counts['high']/len(df)

Bar Chart:

• edited August 2020

My Data Set

import pandas as pd
df.tail()

first_name  last_name    age    sex height  weight  income  activity_level
95  James   Theden   24 male    64.49   135.56  20767   moderate
96  Michael Williams     33 male    68.77   199.40  65051   high
97  Phillip Yamashiro   33  male    69.81   215.21  32861   none
98  Douglas Mims    47  male    72.60   172.08  102766  high
99  Beatrice    Cox 23  female  61.08   153.27  73659   none

The Value Counts of My Data

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level
high    35
none    33
moderate    32

The Proportion of My Data Set

value_counts['high']/len(df)

0.35

The Bar Plot of My Data Set

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data set:

import pandas as pd
df.tail()

first_name last_name   age sex height  weight  income  activity_level
95 Cara    Rogers  22  female  59.93   153.81  9555    moderate
96 Craig   Brandt  29  male    65.45   169.31  28619   moderate
97 Herbert Carolina    25  male    74.51   141.63  6421    moderate
98 Patricia    Guarnieri   22  female  69.52   150.90  228591  none
99 Stacy   Davis   24  male    67.87   217.32  12139   moderate

Value counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level
none   47
moderate   31
high   22

Proportion whose activity is high:

value_counts['high']/len(df)
0.22

Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data Set:

import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   Barbara Dolan   57  female  57.01   177.34  6631    high
1   Caren   Walters 22  female  63.84   157.23  8015    high
2   Wesley  Avery   39  male    66.90   204.50  2201    moderate
3   Michael Numbers 41  male    67.09   164.74  5184    none
4   Bruce   Williams    37  male    67.07   180.17  9517    none

Table for the activity level:

value_counts = df['activity_level'].value_counts()
value_counts

none        37
high        33
moderate    30
Name: activity_level, dtype: int64

My Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0)

The proportion of people who were at a high activity level is 0.33

value_counts['high']/len(df)

• edited August 2020

Data:

import pandas as pd
df.tail()

first_name  last_name   age sex height  weight  income  activity_level
95  Richard Schmidt 40  male    69.04   184.16  425 moderate
96  Paul    Fleury  29  male    64.96   206.06  11212   moderate
97  Agnes   Pollard 39  female  61.78   233.80  12416   moderate
98  Diane   Morrison    42  female  61.50   179.02  17823   none
99  Frances Horn    38  female  60.43   132.64  336 moderate

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts

moderate    44
high        34
none        22
Name: activity_level, dtype: int64

Proportion of people with high activity level: .34

value_counts['high']/len(df)

Bar chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data:

import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   John    Griffin 53  male    70.83   101.54  7675    moderate
1   Dorthy  Brown   43  female  60.14   193.71  3096    moderate
2   Mary    Geiger  36  female  59.62   160.14  34931   high
3   Ann Diaz    40  female  62.30   200.72  1526    none
4   Steve   Washington  39  male    70.73   145.43  1626    high

To find the value counts for activity_level:

value_counts = df['activity_level'].value_counts()
value_counts.to_markdown()

activity_level
moderate    39
none    34
high    27

Then I put the data into a bar graph:

value_counts.plot.bar(figsize=(12,7), rot = 0);

I then computed the proportion of people with a high activity level and got 0.27:

value_counts['high']/len(df)

• First I gather the data:

import pandas as pd

Then I take the value counts from the activity_level variable:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

To get the bar chart, I just enter the following code to produce one:

value_counts.plot.bar()

Finally, for the proportion, I enter this line, which returned 0.34:

value_counts['high']/len(df)

And that's all I need to put down.

• edited August 2020
import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   Benjamin    Davis   58  male    75.86   151.92  2977    high
1   Cody    Hicks   38  male    69.02   204.74  16815   high
2   Elizabeth   Hall    31  female  61.41   167.96  17624   none
3   Dean    Hunt    36  male    69.04   185.04  1823    none
4   Christine   Valle   40  female  62.50   112.04  4153    none

value counts

value_counts = df['activity_level'].value_counts()
value_counts

high        36
moderate    33
none        31
Name: activity_level, dtype: int64

proportion

value_counts['high']/len(df)

0.36

• Data
import pandas as pd
first_name last_name age sex height weight income activity_level
0 Petrina Jose 37 female 63.10 182.08 22898 high
1 Amanda Abrams 28 female 65.39 126.48 13397 moderate
2 Sandra Howard 55 female 60.88 156.78 31135 moderate
3 Elmer Lim 49 male 66.48 163.43 693 moderate
4 Albert Smith 24 male 72.84 179.31 4438
Variable chart
value_counts = df['activity_level'].value_counts()
value_counts.to_frame()
activity_level
moderate 34
none 34
high 32
Bar Chart
value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data:

import pandas as pd
df.tail()

first_name  last_name   age sex height  weight  income  activity_level
95  Shirley Garcia  25  female  62.52   119.57  5010    none
96  Danny   Hymes   26  male    73.81   153.20  57164   high
97  April   Lebrecque   34  female  69.20   141.89  45640   moderate
98  Karen   Babcock 21  female  61.40   172.48  44646   moderate
99  Tony    Benedict    26  male    69.80   174.12  22386   high

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level
moderate    35
high    34
none    31

Proportion of folks whose activity level is high is: 0.34

value_counts['high']/len(df)
0.34

Bar:

• import dataset:

import pandas as pd

value counts for desired variable:

value_counts = df['activity_level'].value_counts()
value_counts

high        46
none        29
moderate    25
Name: activity_level, dtype: int64

finding proportion whose activity level is high:

value_counts['high']/len(df)

0.46

generate bar graph:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data Set:

import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   David   Krause  21  male    74.85   152.97  7391    moderate
1   Karen   Liebsch 27  female  70.06   116.25  4694    moderate
2   Dorothy Hill    35  female  59.56   153.08  11234   high
3   Brenda  Bott    26  female  66.71   181.44  105242  none
4   Maria   Flournoy    50  female  67.07   189.09  7789    none

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_markdown()

"High" Proportion:

value_counts['high']/len(df)

Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• Data:

import pandas as pd

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

Proportion of those with a high activity level: 0.35

value_counts['high']/len(df)

Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0)

• edited August 2020

Data Set

import pandas as pd
df.tail()

My Data
first_name  last_name   age sex height  weight  income  activity_level
95  Harold  Ogle    26  male    64.84   173.37  37549   high
96  Vivian  Diaz    35  female  66.05   119.89  34276   high
97  Florinda    Alston  35  female  68.45   186.53  8963    none
98  Bobbie  Hillis  32  female  62.81   153.58  25578   moderate
99  Errol   Ricketts    42  male    70.43   183.30  30  none

Value Counts

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()
activity_level
moderate    35
none    33
high    32

Proportion of people with a high activity level: .32

value_counts['high']/len(df)

• edited August 2020

Data Set:

import pandas as pd
df.tail()

first_name  last_name   age sex height  weight  income  activity_level
95  Stewart Horton  29  male    68.29   180.10  1616    none
96  Catherine   Maddox  34  female  65.84   187.60  7534    moderate
97  Margaret    Martin  28  female  66.34   205.22  229127  high
98  Robert  Armstrong   21  male    65.95   157.64  55894   none
99  Sheila  Wallis  41  female  63.27   124.36  4767    moderate

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_markdown()

|          |   activity_level |
|:---------|-----------------:|
| high     |               40 |
| none     |               31 |
| moderate |               29 |
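A note on displaying this output: `to_markdown()` returns an ordinary string, so a notebook cell that just echoes it shows the escaped form with literal `\n` characters, while `print` puts each row on its own line. A small sketch, with the string typed in by hand to mirror the counts above (the exact column padding is my assumption):

```python
# Hand-typed stand-in for the string that value_counts.to_markdown() returns;
# the column padding here is assumed, the counts are from the post above.
table = (
    "|          |   activity_level |\n"
    "|:---------|-----------------:|\n"
    "| high     |               40 |\n"
    "| none     |               31 |\n"
    "| moderate |               29 |"
)

# What a notebook shows when the string is the last expression in a cell.
echoed = repr(table)

# What print shows: one table row per line.
print(table)
```

So `print(value_counts.to_markdown())` is probably the more readable way to paste the table.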

High Proportion:

value_counts['high']/len(df)
.4

Bar Chart:
value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data Set:

import pandas as pd
df.tail()

# Output:
first_name  last_name   age     sex     height  weight  income  activity_level
95  William     Pearson     33  male    67.46   147.26  6501    none
96  Heather     Stephens    39  female  67.72   138.53  70003   none
97  Sheri   Lepard  58  female  62.66   172.13  27956   moderate
98  Stacie  Leray   43  female  62.64   156.19  4555    moderate
99  Elizabeth   Peele   41  female  63.79   175.92  19106   moderate

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level
high    37
moderate    33
none    30

High Proportion:

value_counts['high']/len(df)

.37

Bar Plot:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• Data:

import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   Crystal Menard  21  female  66.10   184.02  6515    none
1   Amy Wise    38  female  57.31   203.48  1683    none
2   Judith  Monk    30  female  63.31   186.16  1156    moderate
3   Humberto    Ray 50  male    70.73   211.53  1183    high
4   Claude  Baker   22  male    70.68   151.08  12819   none

Value counts:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level
none    40
high    33
moderate    27

Activity level "High":

value_counts['high']/len(df)
0.33

Bar graph:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

My data set:

import pandas as pd

first_name last_name   age sex height  weight  income  activity_level
0     Carl  Gerard  36  male    69.45   215.64  2663    moderate
1     Lila  Johnson 49  female  61.57   175.93  25959   none
2     Brenda    Mossien 40  female  61.94   188.08  14445   high
3     Gary  Lafave  49  male    67.95   169.63  76683   high
4     Jefferey  Johnson 23  male    71.37   190.35  33960   none

Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts

high 37
moderate 33
none 30
Name: activity_level, dtype: int64

High Proportion:

value_counts['high']/len(df)

0.37

My Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

Data set:

import pandas as pd

Value Counts:

value_counts = df['activity_level'].value_counts()

value_counts.to_frame()

moderate 38
high 35
none 27

value_counts['high']/len(df)

High proportion:

.35

value_counts.plot.bar(figsize=(12,7), rot = 0);

• Data:

first_name last_name   age sex height  weight  income  activity_level
95  Marilyn Parris  29  female  64.68   150.69  115310  high
96  Robert  Flores  51  male    71.18   136.76  24487   moderate
97  Daniel  Bohne   36  male    72.07   177.61  833 none
98  Richard Schubbe 32  male    68.57   145.97  47000   high
99  David   Smith   41  male    69.95   176.52  3249    high

import pandas as pd
df.tail()

Value Count

high 36
none 33
moderate 31

value_counts = df['activity_level'].value_counts()

High proportion:

0.36

value_counts['high']/len(df)

Bar Chart

value_counts.plot.bar(figsize=(12,7), rot = 0);

• My Data Set Is:

import pandas as pd

first_name  last_name   age sex height  weight  income  activity_level
0   John    Sanchez 22  male    68.97   174.75  1675    none
1   Dennis  Palmer  22  female  67.04   168.24  176113  high
2   Jennifer    Walker  32  female  66.34   192.27  2180    moderate
3   Shannon Clay    38  female  62.52   169.43  3165    high
4   Leona   Benson  31  female  62.58   211.73  1969    moderate

My Value Counts:

value_counts = df['activity_level'].value_counts()
value_counts

none        39
high        32
moderate    29
Name: activity_level, dtype: int64

Proportion of people with high activity: .32

value_counts['high']/len(df)

Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

• edited August 2020

My data set

import pandas as pd
df.tail()

My output

first_name  last_name   age sex height  weight  income  activity_level
95  Michael Vogel   31  male    65.23   145.48  894 moderate
96  Karl    Cornely 27  male    70.21   215.22  3908    moderate
97  Phyllis Hoffman 52  female  63.34   162.52  35325   none
98  Patricia    Winslow 28  female  67.92   185.47  3091    high
99  Lamont  Morales 40  male    72.57   173.11  4561    high

Value Counts

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

Output

activity_level
high    36
none    34
moderate    30

Proportion of those whose activity level is high: 0.36

value_counts['high']/len(df)

Bar Chart

value_counts.plot.bar(figsize=(12,7), rot = 0);

• import pandas as pd

value_counts = df['activity_level'].value_counts()
value_counts

value_counts.plot.bar(figsize=(12,7), rot = 0);

• Data set:

import pandas as pd
df.tail()

first_name  last_name   age sex height  weight  income  activity_level
95  Kathleen    Allen   30  female  60.70   139.41  520 none
96  Sarah   Roche   35  female  68.96   183.76  32649   high
97  Christine   Griffin 29  female  63.36   159.37  7969    moderate
98  Pamela  Small   30  female  67.62   174.56  5076    high
99  Betty   Overton 44  female  61.65   168.26  159 high

Value Count:

value_counts = df['activity_level'].value_counts()
value_counts.to_frame()

activity_level:
moderate: 40
none: 36
high: 24

High Proportion Level:

value_counts['high']/len(df)

0.24

My Bar Chart:

value_counts.plot.bar(figsize=(12,7), rot = 0);

This discussion has been closed.