Here's a data set that contains 21 variables (pieces of information) on nearly 4000 emails:
email_data = read.delim('https://www.marksmath.org/classes/Summer2017Stat185/data/email.txt')
dim(email_data)
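As a quick aside, dim() returns the number of rows followed by the number of columns, while nrow() and ncol() return each piece separately. Here's a minimal sketch on a small made-up data frame (the name toy and its values are hypothetical, not part of the email data):

```r
# Hypothetical stand-in for email_data: 4 rows, 3 columns of made-up values
toy <- data.frame(id = 1:4,
                  num_char = c(10, 20, 30, 40),
                  line_breaks = c(5, 8, 12, 20))

dims <- dim(toy)     # rows first, then columns
n_rows <- nrow(toy)  # same as dims[1]
n_cols <- ncol(toy)  # same as dims[2]
print(dims)
```

The same calls work unchanged on the real data, e.g. nrow(email_data).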
If I enter email_data by itself, my computer will attempt to display all 3921 rows and 21 columns. I can get a sense of the data by just looking at the first few rows, though.
head(email_data)
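By default head() shows the first six rows; its n argument changes that, and tail() does the same from the other end. A small sketch, again on a hypothetical toy data frame rather than the real email data:

```r
# Hypothetical 10-row data frame standing in for email_data
toy <- data.frame(id = 1:10, num_char = seq(5, 50, by = 5))

first_three <- head(toy, n = 3)  # rows 1-3
last_two <- tail(toy, n = 2)     # rows 9-10 (original row names are kept)
print(first_three)
print(last_two)
```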
Can you guess what's going on here? Here's a list of all fields (or names) associated with the data.
names(email_data)
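names() lists the columns, but it can also help to see what type of data each column holds. One way to do that (a sketch, not the only approach; str(email_data) gives a similar overview) is to apply class() to every column with sapply(). The data frame below is made up for illustration:

```r
# Hypothetical columns: a decimal, an integer count, and a text label.
# stringsAsFactors = FALSE keeps text as character even in R versions before 4.0.
toy <- data.frame(num_char = c(1.5, 2.5, 3.5),
                  line_breaks = c(10L, 20L, 30L),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

col_types <- sapply(toy, class)  # named character vector: column name -> class
print(col_types)

# names() also lets you check that a column exists before using $ on it
print("num_char" %in% names(toy))
```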
Here's a histogram of the lengths of the emails (the num_char column), using about 100 bins:
hist(email_data$num_char, breaks = 100)
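The second argument to hist() is breaks, and when it's a single number R treats it only as a suggestion, choosing "pretty" break points near that many bins. To see how many bins were actually used without drawing anything, you can set plot = FALSE and inspect the returned object. A sketch on made-up skewed values (x is hypothetical, not num_char):

```r
# Hypothetical right-skewed values standing in for email_data$num_char
x <- (1:500)^2 / 1000

h <- hist(x, breaks = 100, plot = FALSE)  # compute the histogram, don't draw it
n_bins <- length(h$counts)  # bins R actually chose (need not be exactly 100)
print(n_bins)
print(sum(h$counts))        # each value falls in exactly one bin
```

On the real column the same idea is hist(email_data$num_char, breaks = 100, plot = FALSE).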
And here's a scatter plot showing the (not so surprising) relationship between the number of characters and the number of line breaks:
plot(email_data$num_char, email_data$line_breaks)
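To put a number on that relationship, the Pearson correlation from cor() is a natural companion to the scatter plot, and abline() with lm() overlays a least-squares line. This is a sketch on made-up values; with the real data you'd pass email_data$num_char and email_data$line_breaks instead:

```r
# Made-up values standing in for email_data$num_char and email_data$line_breaks
chars    <- c(1.2, 3.5, 0.8, 7.9, 4.4)
n_breaks <- c(20, 61, 15, 140, 80)

r <- cor(chars, n_breaks)  # Pearson correlation, always between -1 and 1
print(r)

plot(chars, n_breaks, xlab = "Number of characters", ylab = "Line breaks")
abline(lm(n_breaks ~ chars))  # add the least-squares line to the scatter plot
```

If either column contained missing values, cor() would return NA unless you pass use = "complete.obs".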
Here's a second data set, this one containing US population figures by year:
us_population_data = read.delim('https://www.marksmath.org/classes/Summer2017Stat185/data/us_population.txt')
us_population_data
Since the data is indexed by year, it makes sense to visualize it with a time series plot. Setting type='b' draws both the points and the lines connecting them:
plot(us_population_data$year, us_population_data$pop, type='b')
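Beyond the picture, it can be useful to compute how much the population grew from one row of the table to the next. Here's a minimal sketch with diff(), using hypothetical numbers; on the real data you'd use us_population_data$year and us_population_data$pop, and the spacing between rows is whatever the file contains:

```r
# Hypothetical values; the real columns are us_population_data$year and $pop
year <- c(1900, 1910, 1920)
pop  <- c(100, 110, 121)

# Percent change from each row to the next; the result has one fewer element
pct_growth <- 100 * diff(pop) / head(pop, -1)
print(pct_growth)  # 10 10

# On a logarithmic y axis, steady percentage growth shows up as a straight line
plot(year, pop, type = "b", log = "y")
```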