Bar plot and proportion

edited August 17 in Assignments

(10 points)

Using your own random data from last Friday's forum question, let's use Colab to examine the activity_level variable. Specifically:

  • Generate a bar chart for activity_level and
  • Compute the proportion of folks whose activity level is high.

Note that the creative burden is higher in this lab than in the last, since the Colab link above leads to a blank notebook. Nonetheless, you can find sample code that should help in our class presentation on Categorical Data.
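
As a rough illustration of the two tasks, here is a minimal sketch. It assumes a tiny made-up column of activity levels in place of the downloaded data, so the numbers are illustrative only and do not come from anyone's data set.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: 10 made-up rows.
df = pd.DataFrame({
    "activity_level": ["high", "none", "moderate", "high", "none",
                       "high", "moderate", "none", "high", "moderate"]
})

# Count how many times each category occurs.
value_counts = df["activity_level"].value_counts()

# Proportion of rows whose activity level is high.
proportion_high = value_counts["high"] / len(df)
print(proportion_high)
```

With your own data, `df` would come from `pd.read_csv` with your username in the URL, and `value_counts.plot.bar()` would draw the chart.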


Comments

  • edited August 17

    Data Set

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=taylordurall')
    df.tail()
    

    Value Counts

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
              activity_level
    high                  41
    none                  31
    moderate              28
    

    Proportion of folks whose activity level is high is: 0.41

    value_counts['high']/len(df)
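
A possible shortcut, sketched under the assumption that a column rebuilt from the counts reported above (41, 31, 28) behaves like the real one: `value_counts` accepts `normalize=True`, which returns proportions instead of raw counts.

```python
import pandas as pd

# Stand-in column rebuilt from the counts in this comment (an assumption,
# not the original download): high 41, none 31, moderate 28.
levels = ["high"] * 41 + ["none"] * 31 + ["moderate"] * 28
df = pd.DataFrame({"activity_level": levels})

# normalize=True divides each count by the total number of rows.
proportions = df["activity_level"].value_counts(normalize=True)
print(proportions["high"])
```

This gives the proportion for every level at once, which can be handy for checking that they add up to 1.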
    

    Bar Chart

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=wheadri1')
    df.tail()
    

    Value count

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    moderate    37
    none        37
    high        26
    

    The proportion whose activity level is high is .26

    Bar chart

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    I generated my own random data with this code:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=pkdimond')
    df.head()
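
Stray spaces inside the URL string are an easy mistake when pasting. One way to guard against them, sketched here with a hypothetical helper named `data_url` (not part of the class materials), is to build the query string with the standard library:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.marksmath.org/cgi-bin/random_data.csv"

def data_url(username):
    """Hypothetical helper: build the data URL, trimming stray spaces."""
    return BASE_URL + "?" + urlencode({"username": username.strip()})

url = data_url("  pkdimond ")
print(url)
```

The result could then be passed along as `pd.read_csv(data_url('pkdimond'))`.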
    

    My data:

    first_name  last_name   age sex height  weight  income  activity_level
    0   Retha   Reese   41  female  63.91   166.41  24811   moderate
    1   Felicia Hamm    41  female  61.88   152.91  11829   high
    2   Lauren  Poindexter  22  female  69.36   219.97  4259    none
    3   Erin    Davis   29  female  60.52   224.19  19351   moderate
    4   Minnie  Bouie   20  female  69.10   140.45  12624   high
    

    I got my value counts with the code:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    

    ...and got:

    activity_level
    none    37
    high    35
    moderate    28
    

    My proportion of people with a high activity level is .35

    value_counts['high']/len(df)
    

    I got my bar chart with this code:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
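
By default the bars appear in order of frequency. A minimal sketch of putting them in the natural order of the levels instead, using a Series built by hand from the counts reported in this comment (the `Agg` backend line is an assumption for running outside a notebook and is not needed in Colab):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import pandas as pd

# Counts as reported in this comment, entered by hand.
value_counts = pd.Series({"none": 37, "high": 35, "moderate": 28},
                         name="activity_level")

# Reorder from "most frequent first" to the natural order of the levels.
ordered = value_counts.reindex(["none", "moderate", "high"])

# Draw the bars and label the axes.
ax = ordered.plot.bar(figsize=(12, 7), rot=0)
ax.set_xlabel("activity_level")
ax.set_ylabel("count")
```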
    

  • edited August 17

    Here is my data set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=jaallmer27')
    df.head()
    

    Here are the value counts, which are the data points for my bar chart:

    value_counts = df['activity_level'].value_counts()
    value_counts
    

    Here is my bar chart for activity level, obtained by

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

    I then computed the proportion of folks whose activity level is high, which is .33

    value_counts['high']/len(df)
    
    # Output: 0.33
    
  • edited August 17
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=JakeDodd')
    df.tail()
    

    value_counts = df['activity_level'].value_counts()
    value_counts
    

    high 39
    none 31
    moderate 30
    Name: activity_level, dtype: int64

    value_counts['high']/len(df)
    

    0.39

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=Jackson_L')
    df.tail()
    

    first_name last_name age sex height weight income activity_level
    95 Robert Session 20 male 71.36 175.13 2783 high
    96 Christine Powers 20 female 65.73 172.88 6692 high
    97 Brandon Griffin 23 male 69.00 168.12 31407 moderate
    98 Mollie Donohue 39 female 68.95 174.77 15755 high
    99 Manuel Hammond 32 male 68.79 146.79 238400 moderate

    I also created a table for activity level:

    value_counts = df['activity_level'].value_counts()
    value_counts
    

    none 35
    high 34
    moderate 31
    Name: activity_level, dtype: int64

    Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

    The proportion of people who were at a high activity level is 0.34

    value_counts['high']/len(df)
    
  • edited August 17

    Data Set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=AudreyAlt')
    df.tail()
    

    Value Counts Code:

    value_counts = df['activity_level'].value_counts()
    value_counts
    none        37
    high        33
    moderate    30
    

    The proportion of high activity level is 0.33

    value_counts['high']/len(df)
    

    Bar Chart:

  • edited August 17

    My Data Set

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=hyoung1')
    df.tail()
    
    first_name  last_name    age    sex height  weight  income  activity_level
    95  James   Theden   24 male    64.49   135.56  20767   moderate
    96  Michael Williams     33 male    68.77   199.40  65051   high
    97  Phillip Yamashiro   33  male    69.81   215.21  32861   none
    98  Douglas Mims    47  male    72.60   172.08  102766  high
    99  Beatrice    Cox 23  female  61.08   153.27  73659   none
    

    The Value Counts of My Data

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    
              activity_level
    high                  35
    none                  33
    moderate              32
    

    The Proportion of My Data Set

    value_counts['high']/len(df)
    

    0.35

    The Bar Plot of My Data Set

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    Data set:

     import pandas as pd
     df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=amanda')
     df.tail()
    
     first_name last_name   age sex height  weight  income  activity_level
     95 Cara    Rogers  22  female  59.93   153.81  9555    moderate
     96 Craig   Brandt  29  male    65.45   169.31  28619   moderate
     97 Herbert Carolina    25  male    74.51   141.63  6421    moderate
     98 Patricia    Guarnieri   22  female  69.52   150.90  228591  none
     99 Stacy   Davis   24  male    67.87   217.32  12139   moderate
    

    Value counts:

     value_counts = df['activity_level'].value_counts()
     value_counts.to_frame()
    
     activity_level
     none   47
     moderate   31
     high   22
    

    Proportion whose activity is high:

     value_counts['high']/len(df) 
     0.22
    

    Bar Chart:

     value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    Data Set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=lmusial')
    df.head()
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   Barbara Dolan   57  female  57.01   177.34  6631    high
    1   Caren   Walters 22  female  63.84   157.23  8015    high
    2   Wesley  Avery   39  male    66.90   204.50  2201    moderate
    3   Michael Numbers 41  male    67.09   164.74  5184    none
    4   Bruce   Williams    37  male    67.07   180.17  9517    none
    

    Table for the activity level:

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    none        37
    high        33
    moderate    30
    Name: activity_level, dtype: int64
    

    My Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0)
    

    The proportion of people who were at a high activity level is 0.33

    value_counts['high']/len(df) 
    
  • edited August 17

    Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=amberc')
    df.tail()
    
    first_name  last_name   age sex height  weight  income  activity_level
    95  Richard Schmidt 40  male    69.04   184.16  425 moderate
    96  Paul    Fleury  29  male    64.96   206.06  11212   moderate
    97  Agnes   Pollard 39  female  61.78   233.80  12416   moderate
    98  Diane   Morrison    42  female  61.50   179.02  17823   none
    99  Frances Horn    38  female  60.43   132.64  336 moderate
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    moderate    44
    high        34
    none        22
    Name: activity_level, dtype: int64
    

    Proportion of people with high activity level: .34

    value_counts['high']/len(df)
    

    Bar chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=janeturlington')
    df.head()
    
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   John    Griffin 53  male    70.83   101.54  7675    moderate
    1   Dorthy  Brown   43  female  60.14   193.71  3096    moderate
    2   Mary    Geiger  36  female  59.62   160.14  34931   high
    3   Ann Diaz    40  female  62.30   200.72  1526    none
    4   Steve   Washington  39  male    70.73   145.43  1626    high
    

    To find my specific data for activity_level:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_markdown()
    
        activity_level
    moderate    39
    none    34
    high    27
    

    Then I put the data into a bar graph:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

    And I found the proportion whose activity level is high with this and got 0.27:

    value_counts['high']/len(df)
    
  • First I gather the data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=jsouther')
    df.head()
    

    Then I take the value counts from the activity_level variable:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    

    To get the bar chart, I just enter the following code to produce one:

    value_counts.plot.bar()
    

    Finally, for the proportion, I just enter this line, which returned the value 0.34:

    value_counts['high']/len(df)
    

    And that's all I need to put down.

  • edited August 19
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=bdubose')
    df.head()
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   Benjamin    Davis   58  male    75.86   151.92  2977    high
    1   Cody    Hicks   38  male    69.02   204.74  16815   high
    2   Elizabeth   Hall    31  female  61.41   167.96  17624   none
    3   Dean    Hunt    36  male    69.04   185.04  1823    none
    4   Christine   Valle   40  female  62.50   112.04  4153    none
    

    value counts

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    high        36
    moderate    33
    none        31
    Name: activity_level, dtype: int64
    

    proportion

    value_counts['high']/len(df)
    
    0.36
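
Another way to reach the same number without going through the value counts is to take the mean of a True/False column. This sketch assumes a stand-in column rebuilt from the counts shown above (36, 33, 31) rather than the original download:

```python
import pandas as pd

# Stand-in column rebuilt from the counts in this comment (an assumption).
levels = ["high"] * 36 + ["moderate"] * 33 + ["none"] * 31
df = pd.DataFrame({"activity_level": levels})

# True where the level is high, False otherwise.
is_high = df["activity_level"] == "high"

# The mean of a boolean column is the fraction of True values.
proportion_high = is_high.mean()
print(proportion_high)
```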
    

  • Data

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=alexa')
    df.head()

    first_name last_name age sex height weight income activity_level
    0 Petrina Jose 37 female 63.10 182.08 22898 high
    1 Amanda Abrams 28 female 65.39 126.48 13397 moderate
    2 Sandra Howard 55 female 60.88 156.78 31135 moderate
    3 Elmer Lim 49 male 66.48 163.43 693 moderate
    4 Albert Smith 24 male 72.84 179.31 4438

    Variable chart

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()

    activity_level
    moderate 34
    none 34
    high 32

    Bar Chart

    value_counts.plot.bar(figsize=(12,7), rot = 0);

  • edited August 17

    Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ebrady2')
    df.tail()
    
    first_name  last_name   age sex height  weight  income  activity_level
    95  Shirley Garcia  25  female  62.52   119.57  5010    none
    96  Danny   Hymes   26  male    73.81   153.20  57164   high
    97  April   Lebrecque   34  female  69.20   141.89  45640   moderate
    98  Karen   Babcock 21  female  61.40   172.48  44646   moderate
    99  Tony    Benedict    26  male    69.80   174.12  22386   high
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    
        activity_level
    moderate    35
    high    34
    none    31
    

    Proportion of folks whose activity level is high is: 0.34

    value_counts['high']/len(df)
    0.34
    

    Bar:

  • value_counts.plot.bar(figsize=(12,7), rot = 0);

  • import dataset:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=sterlings')
    df.head()
    

    value counts for desired variable:

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    high        46
    none        29
    moderate    25
    Name: activity_level, dtype: int64
    

    finding proportion whose activity level is high:

    value_counts['high']/len(df)
    
    0.46
    

    generate bar graph:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    Data Set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ljohns13')
    df.head()
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   David   Krause  21  male    74.85   152.97  7391    moderate
    1   Karen   Liebsch 27  female  70.06   116.25  4694    moderate
    2   Dorothy Hill    35  female  59.56   153.08  11234   high
    3   Brenda  Bott    26  female  66.71   181.44  105242  none
    4   Maria   Flournoy    50  female  67.07   189.09  7789    none
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_markdown()
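
A note on `to_markdown()`: it returns a string, so a notebook shows it with literal `\n` characters unless it is printed. The sketch below uses made-up counts purely for illustration. It also assumes that `tabulate`, the optional package pandas uses for Markdown output, may be missing, and falls back to a plain-text table in that case.

```python
import pandas as pd

# Hypothetical counts, only for illustration.
value_counts = pd.Series({"none": 40, "moderate": 35, "high": 25},
                         name="activity_level")

try:
    # to_markdown() returns a string; it relies on the optional tabulate package.
    table = value_counts.to_markdown()
except ImportError:
    # Fallback when tabulate is not installed: a plain-text table.
    table = value_counts.to_string()

# print() renders the line breaks instead of showing them as \n.
print(table)
```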
    

    "High" Proportion:

    value_counts['high']/len(df)
    

    Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=hburnett777')
    df.head()
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    

    Proportion of those with a high activity level: 0.35

    value_counts['high']/len(df)
    

    Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0)
    

  • edited August 17

    Data Set

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=eallen4')
    df.tail()
    
    
    My Data
    first_name  last_name   age sex height  weight  income  activity_level
    95  Harold  Ogle    26  male    64.84   173.37  37549   high
    96  Vivian  Diaz    35  female  66.05   119.89  34276   high
    97  Florinda    Alston  35  female  68.45   186.53  8963    none
    98  Bobbie  Hillis  32  female  62.81   153.58  25578   moderate
    99  Errol   Ricketts    42  male    70.43   183.30  30  none
    

    Value Counts

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    activity_level
    moderate    35
    none    33
    high    32
    

    Proportion of people with a high activity level: .32

    value_counts['high']/len(df)
    

  • edited August 17

    Data Set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=bgillen')
    df.tail()
    
    first_name  last_name   age sex height  weight  income  activity_level
    95  Stewart Horton  29  male    68.29   180.10  1616    none
    96  Catherine   Maddox  34  female  65.84   187.60  7534    moderate
    97  Margaret    Martin  28  female  66.34   205.22  229127  high
    98  Robert  Armstrong   21  male    65.95   157.64  55894   none
    99  Sheila  Wallis  41  female  63.27   124.36  4767    moderate
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_markdown()
    

    |          |   activity_level |
    |:---------|-----------------:|
    | high     |               40 |
    | none     |               31 |
    | moderate |               29 |

    High Proportion:

    value_counts['high']/len(df)
    .4
    

    Bar Chart:
    value_counts.plot.bar(figsize=(12,7), rot = 0);

  • edited August 17

    Data Set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=cmcmahan')
    df.tail()    
    
    # Output:
        first_name  last_name   age     sex     height  weight  income  activity_level   
    95  William     Pearson     33  male    67.46   147.26  6501    none
    96  Heather     Stephens    39  female  67.72   138.53  70003   none
    97  Sheri   Lepard  58  female  62.66   172.13  27956   moderate
    98  Stacie  Leray   43  female  62.64   156.19  4555    moderate
    99  Elizabeth   Peele   41  female  63.79   175.92  19106   moderate
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    
    activity_level
    high    37
    moderate    33
    none    30
    

    High Proportion:

    value_counts['high']/len(df)
    

    .37

    Bar Plot:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • Data:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=Salem')
    df.head()
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   Crystal Menard  21  female  66.10   184.02  6515    none
    1   Amy Wise    38  female  57.31   203.48  1683    none
    2   Judith  Monk    30  female  63.31   186.16  1156    moderate
    3   Humberto    Ray 50  male    70.73   211.53  1183    high
    4   Claude  Baker   22  male    70.68   151.08  12819   none
    

    Value counts:

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    
    activity_level
    none    40
    high    33
    moderate    27
    

    Activity level "High":

    value_counts['high']/len(df)
    0.33
    

    Bar graph:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 17

    My data set:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ksims')
    df.head()
    
     first_name last_name   age sex height  weight  income  activity_level
    0     Carl  Gerard  36  male    69.45   215.64  2663    moderate
    1     Lila  Johnson 49  female  61.57   175.93  25959   none
    2     Brenda    Mossien 40  female  61.94   188.08  14445   high
    3     Gary  Lafave  49  male    67.95   169.63  76683   high
    4     Jefferey  Johnson 23  male    71.37   190.35  33960   none
    

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts
    

    high 37
    moderate 33
    none 30
    Name: activity_level, dtype: int64

    High Proportion:

    value_counts['high']/len(df)
    

    0.37

    My Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • edited August 24

    Data set:

    import pandas as pd
    

    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=amcbride')
    df.head()

    Value Counts:

    value_counts = df['activity_level'].value_counts()
    

    value_counts.to_frame()

    moderate 38
    high 35
    none 27

    value_counts['high']/len(df)
    

    High proportion:

    .35

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • Data:

     first_name last_name   age sex height  weight  income  activity_level
    95  Marilyn Parris  29  female  64.68   150.69  115310  high
    96  Robert  Flores  51  male            71.18   136.76  24487   moderate
    97  Daniel  Bohne   36  male            72.07   177.61  833         none
    98  Richard Schubbe 32  male            68.57   145.97  47000   high
    99  David   Smith   41  male           69.95    176.52  3249    high
    
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ltipton')
    df.tail()
    

    Value Count

    high 36
    none 33
    moderate 31

    value_counts = df['activity_level'].value_counts()
    

    High proportion:

    0.36

    value_counts['high']/len(df)
    

    Bar Chart

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    
  • My Data Set Is:

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ccross2') 
    df.head()
    
    first_name  last_name   age sex height  weight  income  activity_level
    0   John    Sanchez 22  male    68.97   174.75  1675    none
    1   Dennis  Palmer  22  female  67.04   168.24  176113  high
    2   Jennifer    Walker  32  female  66.34   192.27  2180    moderate
    3   Shannon Clay            38    female    62.52   169.43  3165    high
    4   Leona   Benson  31  female  62.58   211.73  1969    moderate
    

    My Value Counts:

    value_counts = df['activity_level'].value_counts()
    value_counts
    
    none        39
    high        32
    moderate    29
    Name: activity_level, dtype: int64
    

    Proportion of people with high activity: .32

    value_counts['high']/len(df)
    

    Bar Chart:

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    

  • import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=BrionMcLaurin')
    df.tail()  
    

    none 38
    high 34
    moderate 28

  • edited August 17

    My data set

    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=cbehan')
    df.tail()
    

    My output

    first_name  last_name   age sex height  weight  income  activity_level
    95  Michael Vogel   31  male    65.23   145.48  894 moderate
    96  Karl    Cornely 27  male    70.21   215.22  3908    moderate
    97  Phyllis Hoffman 52  female  63.34   162.52  35325   none
    98  Patricia    Winslow 28  female  67.92   185.47  3091    high
    99  Lamont  Morales 40  male    72.57   173.11  4561    high
    

    Value Counts

    value_counts = df['activity_level'].value_counts()
    value_counts.to_frame()
    

    Output

    activity_level
    high    36
    none    34
    moderate    30
    

    Proportion of those whose activity level is high: 0.36

    value_counts['high']/len(df)
    

    Bar Chart

    value_counts.plot.bar(figsize=(12,7), rot = 0);
    
