# Engaging students: Approximating data by a straight line

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. In other words, the point of the assignment was not to devise a full-blown lesson plan on this topic. Instead, I asked my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This student submission again comes from my former student Caroline Wick. Her topic, from Algebra: approximating data to a straight line.

B1. Curriculum

How can this topic be used in your students’ future courses in mathematics or science?

Though approximating data by a straight line is a subject that is brought up in Algebra 2, it is something that students will need to use in a number of subjects down the line. Probably the most obvious subject would be statistics. Finding an approximate trend line is extremely important for a statistician so that they can predict future, unobserved data. Another example that might not be as readily noticeable would be anthropology. Anthropology is the study of humans in various parts of life. In this case, according to Brian Hopkins, anthropology can be used by stores to figure out what types of products they should stock on their shelves during different times of the year. They do this by collecting the data, then approximating the trend lines to predict how the product will sell during the same season of the next year. For example, orange juice and tissues are known to sell better during the winter, so stores know to stock up on orange juice and tissues during the colder season each year.

A1. Applications

What interesting (i.e., uncontrived) word problems using this topic can your students do now?
Using the data given below:
(a) plot the points on a graph
(b) Then, using a ruler, do your best to approximate a trend line that fits the points
(c) Write an equation (y=mx+b) that best fits the trend line
(d) Approximate the next four numbers on the line using the equation you created.

Population growth in squirrels in TX from 1950-1980 (in millions)*

| Year (x) | 1950 | 1955 | 1960 | 1965 | 1970 | 1975 | 1980 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pop. (y) | 12 | 12.7 | 13.1 | 13 | 13.6 | 13.7 | 14 |

From here the student would create his/her graph with the plotted points and find a line that best fits the points, with roughly equal numbers of points above and below the line. They would then use the data and the line to find an equation that best fits the scatter plot data that they graphed. Finally, they would find the approximate squirrel population for 1985, 1990, 1995, and 2000.
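For a teacher who wants an answer key to compare against the students’ ruler-drawn lines, here is a minimal Python sketch that computes the least-squares line for the fabricated squirrel data. The helper name `fit_line` is my own, and least squares is only one reasonable notion of “best fit”; a student’s hand-drawn line will differ somewhat and can still be a perfectly good answer.

```python
# Least-squares check for the squirrel exercise.
# The data are the fabricated values from the problem (in millions).
years = [1950, 1955, 1960, 1965, 1970, 1975, 1980]
pops = [12, 12.7, 13.1, 13, 13.6, 13.7, 14]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


m, b = fit_line(years, pops)
print(f"y = {m:.4f}x + ({b:.2f})")

# Part (d): extend the line to the next four five-year marks.
for year in (1985, 1990, 1995, 2000):
    print(year, round(m * year + b, 2))
```

The slope comes out to about 0.06 million squirrels per year, so the four predictions climb from roughly 14.4 million to roughly 15.3 million.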

This could be either an assignment or it could turn into a project for students with different sets of data. Students could even collect their own data to formulate the graph and equation.

*not real data, fabricated for this problem specifically.

C1. Culture
How has this topic appeared in pop culture (movies, TV, current music, video games, etc.)?

The approximation of data through trend lines has been used in pop culture since the birth of popular culture in the mid-twentieth century. More relevantly, it is used to map certain cultural trends. When a new movie is coming out, statisticians use previous data from people who watched/reviewed the movie before its release to map out how they believe it will be received by the public. A movie that did well before its release will likely have a positive trend line that continues upward at a somewhat steady rate. It will sell more tickets at the box office than a movie that was not as well liked, which might have a less steep slope. Statisticians use this same trend approximation with TV shows, to decide whether they should run another season, or with music, when a song hits the top of the charts. The more people listen to a song, the more likely it is to be listened to by other people, so the trend continues upward until it slowly dies off.

Take for instance, Taylor Swift’s “Look What You Made Me Do” that was released August 25th of this year. From its release and popularity, statisticians were able to track the data and predict that the song would be number 1 on the top 100 just a few weeks after its release.

# Finding the Regression Line without Calculus

Last month, my latest professional article, Deriving the Regression Line with Algebra, was published in the April 2017 issue of Mathematics Teacher (Vol. 110, Issue 8, pages 594-598). Although linear regression is commonly taught in high school algebra, the usual derivation of the regression line requires multidimensional calculus. Accordingly, algebra students are typically taught the keystrokes for finding the line of best fit on a graphing calculator with little conceptual understanding of how the line can be found.

In my article, I present an alternative way that talented Algebra II students (or, in principle, Algebra I students) can derive the line of best fit for themselves using only techniques that they already know (in particular, without calculus).

For copyright reasons, I’m not allowed to provide the full text of my article here, though subscribers to Mathematics Teacher should be able to read the article by clicking the above link. (I imagine that my article can also be obtained via inter-library loan from a local library.) That said, I am allowed to share a macro-enabled Microsoft Excel spreadsheet that I wrote that allows students to experimentally discover the line of best fit:

http://www.math.unt.edu/~johnq/ExploringTheLineofBestFit.xlsm

I created this spreadsheet so that students can explore (which is, after all, the first E of the 5-E model) the properties of the line of best fit. In this spreadsheet, students can enter a data set with up to 10 points and then experiment with different slopes and $y$-intercepts. As they experiment, the spreadsheet keeps track of the current sum of the squares of the residuals as well as the best guess attempted so far. After some experimentation, the spreadsheet can also provide the correct answer so that students can see how close they got to the right answer.
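For readers who do not use Excel, the bookkeeping described above can be sketched in a few lines of Python. This is not the spreadsheet’s macro code; the names `sum_squared_residuals` and `GuessTracker` and the three-point data set are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
def sum_squared_residuals(points, slope, intercept):
    """Sum of squared vertical distances from the points to y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


class GuessTracker:
    """Remembers the best (lowest sum of squared residuals) guess tried so far."""

    def __init__(self, points):
        self.points = points
        self.best = None  # (ssr, slope, intercept) of the best guess so far

    def try_guess(self, slope, intercept):
        ssr = sum_squared_residuals(self.points, slope, intercept)
        if self.best is None or ssr < self.best[0]:
            self.best = (ssr, slope, intercept)
        return ssr


# Hypothetical data set: three points that lie exactly on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5)]
tracker = GuessTracker(points)
first = tracker.try_guess(1, 1)   # residuals 0, 1, 2, so the sum of squares is 5
second = tracker.try_guess(2, 1)  # every residual is 0
print(first, second, tracker.best)
```

Each new guess either beats the stored best or leaves it alone, which is the same feedback loop that lets students home in on the line of best fit by experiment.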

# My Favorite One-Liners: Part 52

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them. Today’s story is a continuation of yesterday’s post.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

$y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x})$,

we obtain the equation

$y = 69 + 0.5(x-69) = 0.5x + 34.5$,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall. As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. Then, I’ll tell my class:

However, to the psychological comfort of us short people, tall fathers tend to have sons who are not quite as tall, and short fathers tend to have sons who are not quite as short.

This was first observed by Francis Galton (see the Wikipedia article for more details), a particularly brilliant but aristocratic (read: snobbish) mathematician who had high hopes for breeding a race of super-tall people with the proper use of genetics, only to discover that the laws of statistics naturally prevented this from occurring. Defeated, he called this phenomenon “regression toward the mean,” and so we’re stuck with calling fitting data to a straight line “regression” to this day.

# My Favorite One-Liners: Part 51

In this series, I’m compiling some of the quips and one-liners that I’ll use with my students to hopefully make my lessons more memorable for them.

When I teach regression, I typically use this example to illustrate the regression effect:

Suppose that the heights of fathers and their adult sons both have mean 69 inches and standard deviation 3 inches. Suppose also that the correlation between the heights of the fathers and sons is 0.5. Predict the height of a son whose father is 63 inches tall. Repeat if the father is 78 inches tall.

Using the formula for the regression line

$y = \overline{y} + r \displaystyle \frac{s_y}{s_x} (x - \overline{x})$,

we obtain the equation

$y = 69 + 0.5(x-69) = 0.5x + 34.5$,

so that the predicted height of the son is 66 inches if the father is 63 inches tall. However, the prediction would be 73.5 inches if the father is 78 inches tall.
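For anyone who wants to check the arithmetic, here is a small Python sketch of the regression-line formula applied to the two fathers in the problem (63 and 78 inches). The function name `predict` and its parameter names are just my labels for the symbols in the formula.

```python
def predict(x, x_bar, y_bar, s_x, s_y, r):
    """Point on the regression line y = y_bar + r*(s_y/s_x)*(x - x_bar)."""
    return y_bar + r * (s_y / s_x) * (x - x_bar)


# Means of 69 inches, standard deviations of 3 inches, correlation 0.5.
for father in (63, 78):
    son = predict(father, x_bar=69, y_bar=69, s_x=3, s_y=3, r=0.5)
    print(father, son)
```

In standard units, a father who is 2 standard deviations below average is paired with a son predicted to be only 1 standard deviation below average, and a father 3 standard deviations above average with a son 1.5 standard deviations above; the correlation of 0.5 halves the distance from the mean.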

To make this more memorable for students, I’ll observe:

As expected, tall fathers tend to have tall sons, and short fathers tend to have short sons. For example, my uncle was 6’6″. His two sons, my cousins, were 6’4″ and 6’5″ and were high school basketball stars.

My father was 5’3″. I became a math nerd.

# Deceiving with Statistics

I really enjoyed a recent Math With Bad Drawings post on how descriptive statistics can be used to deceive. For example:

See the rest of the post for similar pictures for mean, median, mode, and variance (equivalent to standard deviation); I’ll be using these in my future statistics classes.

# Regression

Source: http://www.xkcd.com/1725/

# Engaging students: Fitting data to a quadratic function

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. In other words, the point of the assignment was not to devise a full-blown lesson plan on this topic. Instead, I asked my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This student submission again comes from my former student Loc Nguyen. His topic, from Algebra: fitting data to a quadratic function.

A1. What interesting (i.e., uncontrived) word problems using this topic can your students do now?

To engage students on this topic, I will give them real-life word problems so they can see the usefulness of quadratic regression for making predictions. The problem asks for the estimated number of AIDS cases diagnosed in 2006. The data only run from 1999 to 2003, so it will be the students’ job to make the prediction. I will provide the instructions for this task, and I will also walk them through the process of finding the curve that best fits the given data. That best-fit curve will give us the estimate. Here is what the instructions look like:

In the end, students will obtain the parabola that fits the given data. By working through real-life problems, they will be able to understand why mathematics is important and see how this concept is useful in their lives.

B2. How does this topic extend what your students should have learned in previous courses?

C1. How has this topic appeared in pop culture (movies, TV, current music, video games, etc.)?

At the beginning of class, I would like to show students a short video of a football incident.

This incident was really interesting. The Titans’ punt went so high that it hit the scoreboard in the Cowboys’ stadium. Surprisingly, this was the Cowboys’ new stadium. There were many questions about what the architects had been thinking when they built this stadium. It was supposed to be great. This incident revealed errors in predicting how high the scoreboard needed to be. The data collected in past years may have been incorrect. I want to incorporate this incident into the concept of quadratic regression. I will pose several questions such as:

Was the Titans’ punter really that powerful? What really went wrong in this situation?

When the architects built this stadium, did they ever think that the ball would reach the ceiling?

Why did the architects fail to measure the height of the ceiling? Did they just assume the stadium was tall enough?

What was the path of the ball?

Students will eagerly respond to these questions, and I will slowly bring in the importance of quadratic regression. I will then explain how quadratic regression helps us to predict the height based on data collected in past years.
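To make the idea concrete, here is a rough Python sketch of quadratic regression applied to the path of a ball. The heights below are not real punt data; they are generated from a made-up parabola, h = -16t² + 80t + 3 (feet and seconds), purely for illustration, and the helper name `fit_quadratic` is hypothetical. Because the made-up data contain no measurement noise, the fit simply recovers the parabola that produced them; real measurements would only be matched approximately.

```python
def fit_quadratic(ts, hs):
    """Least-squares coefficients (a, b, c) of h = a*t**2 + b*t + c."""
    # Sums needed for the 3x3 normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]                  # sums of t^0 .. t^4
    r = [sum(h * t ** k for t, h in zip(ts, hs)) for k in range(3)]  # sums of h*t^0 .. h*t^2
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Made-up, noise-free heights (feet) at half-second intervals.
times = [0, 0.5, 1, 1.5, 2, 2.5, 3]
heights = [-16 * t * t + 80 * t + 3 for t in times]

a, b, c = fit_quadratic(times, heights)
peak_height = c - b * b / (4 * a)  # height at the vertex of the fitted parabola
print(a, b, c, peak_height)
```

The vertex of the fitted parabola gives the predicted maximum height of the ball, which is exactly the quantity that matters when deciding how high to hang a scoreboard.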

# Engaging students: Approximating data by a straight line

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. In other words, the point of the assignment was not to devise a full-blown lesson plan on this topic. Instead, I asked my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This student submission again comes from my former student Esmerelda Sheran. Her topic, from Algebra: approximating data by a straight line.

A.2) How could you as a teacher create an activity or project that involves your topic?

If I created an activity for my class on approximating data with a straight line, I would make sure the type of data they use is relevant or interesting in the students’ lives. I would have the students work in pairs and choose the data they would work with out of three sets of data I have chosen. Examples of the choices of data would be the relationships between interceptions and wins for NFL teams, car accidents and age, or attendance and GPA (in college/universities). Using the data they chose, the students would first make an educated guess about what the graph would look like, draw the scatter plot associated with the data, and compare their guess to the actual graph. At that point the students would try to identify the parent function (xb+c, mx+b, ab, ln(x) etc.) that the data is most similar to, or whether the data has any correlation at all. They would then draw what they believed the best fit line would look like on the scatterplot, which they would compare to the linear regression once they calculated it on a graphing calculator. I would hope that this activity would be interesting because the data are real and relatable and because it connects parent functions with statistical data.

D.1) What interesting things can you say about the people who contributed to the discovery and/or the development of this topic?

Two of the main contributors to linear regression are Sir Francis Galton and Karl Pearson. Galton was the discoverer of linear regression, and Pearson further elaborated on Galton’s ideas. Linear regression actually came to be because of sweet peas: Galton was studying heredity in sweet peas and formulated linear regression to aid him in studying the relations he found. Galton was much more than a hereditist; he was a geologist, meteorologist, tropical explorer, founder of differential psychology, inventor of fingerprint identification, and an author. A few more interesting things about Galton are that he was knighted, he was accused of promoting eugenics, he was British, and he was a half cousin of Charles Darwin. If you were wondering what “eugenics” is, it is the idea of planned breeding of humans through selective breeding and sterilization. Galton once said, “… I object to pretensions of natural equality.” Given that Galton studied heredity, it is no wonder that he felt that some physical/mental/emotional attributes were superior and that humans would benefit from having the “best” genes. Unfortunately for Galton, eugenics was frowned upon and he was attacked for promoting it. I think that students would find Galton extremely interesting because of his wide variety of interests.

Karl Pearson, although not as complex as Galton, had a few attributes that I feel would interest students. Pearson did not have a childhood that would be considered normal today. Pearson was homeschooled until he turned nine, and then he went to London alone to study at University College School. After he received his degrees and studied physics, metaphysics and Darwinism, Pearson developed his own views on social Darwinism. The social beliefs he developed led him to change his name from Carl to Karl.

E.1) How can technology be used to effectively engage students with this topic?

Technology in the classroom has been and always will be an effective way to engage students if used correctly. To engage my students in learning how to approximate data with a straight line, I would use Excel, a smartboard, or the Khan Academy website. Excel is a useful piece of technology that is underappreciated by the average Joe. With a set of data, you can record the relationships, use the tools to create a scatterplot, and then find the linear regression line on the graph.

Using a smartboard in the classroom is effective because it is new technology that is very special and kind of rare. Using a smartboard to graph the points of data and then draw an approximated regression line is highly kinesthetic and gives hands-on experience instead of just typing in numbers and getting a calculated result that required almost no brain power. Kinesthetically moving their arms up, down, or side to side helps the students get a feel for the variation and relations in the data, and drawing a best fit line themselves helps the students understand the data on a different level. The Khan Academy website is a great resource for being introduced to, and even mastering, the concept of linear regression because of the different activities available. For visual and auditory learners, there is a series of videos that explain approximating data by linear regression as well as how to be the most accurate when approximating. Similarly, there is an activity for kinesthetic learners in which they can move a line around to see which line seems most like the best fit line. It is beneficial for an instructor to use this website to help students of all learning types.

# What Happens if the Explanatory and Response Variables Are Sorted Independently?

From the category “I Can’t Believe What I Just Read,” the following question was posed to a question-and-answer statistics board last month:

Suppose we have data set $(X_i,Y_i)$ with $n$ points. We want to perform a linear regression, but first we sort the $X_i$ values and the $Y_i$ values independently of each other, forming data set $(X_i,Y_j)$. Is there any meaningful interpretation of the regression on the new data set? Does this have a name?

I imagine this is a silly question so I apologize, I’m not formally trained in stats. In my mind this completely destroys our data and the regression is meaningless. But my manager says he gets “better regressions most of the time” when he does this (here “better” means more predictive). I have a feeling he is deceiving himself.

Your intuition is correct: the independently sorted data have no reliable meaning because the inputs and outputs are being randomly mapped to one another rather than what the observed relationship was.

There is a (good) chance that the regression on the sorted data will look nice, but it is meaningless in context.

And:

If you want to convince your boss, you can show what is happening with simulated, random, independent x,y data. With R:
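The same demonstration can be sketched in Python. This is an illustrative stand-in rather than the code from the quoted answer; the sample size, the seed, and the helper name `correlation` are my own assumptions.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
n = 100
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs

r_raw = correlation(xs, ys)                     # honest pairing
r_sorted = correlation(sorted(xs), sorted(ys))  # each list sorted on its own
print(r_raw, r_sorted)
```

The honest pairing should show a correlation near zero, while the independently sorted lists show a correlation close to 1, even though the two variables have nothing to do with each other. Sorting manufactures the relationship; it does not reveal one.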

And:

This technique is actually amazing. I’m finding all sorts of relationships that I never suspected. For instance, I would have not have suspected that the numbers that show up in Powerball lottery, which it is CLAIMED are random, actually are highly correlated with the opening price of Apple stock on the same day! Folks, I think we’re about to cash in big time. 🙂

The sad end of the story, from the original poster:

Thank you for all of your nice and patient examples. I showed him the examples by @RUser4512 and @gung and he remains staunch. He’s becoming irritated and I’m becoming exhausted. I feel crestfallen. I want my work to mean something. I will probably begin looking for other jobs soon.

# Engaging students: Approximating data by a straight line

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. In other words, the point of the assignment was not to devise a full-blown lesson plan on this topic. Instead, I asked my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This student submission again comes from my former student Delaina Bazaldua. Her topic, from Algebra: approximating data to a straight line.

How has this topic appeared in pop culture (movies, TV, current music, video games, etc.)?

One of my favorite shows to watch is How I Met Your Mother. I specifically chose this topic for this class because of how it relates to an episode of the show. A piece of the episode that I’m referring to is shown in the YouTube video:

Barney, one of the main characters, describes the graph as the Crazy/Hot Scale. According to him, a girl cannot be crazier than hot which means she has to be above the diagonal straight line. This relates to the topic because one can approximate data by the straight line that Barney gives the viewer. I think the students will be able to relate to this and also find it humorous. Because this video has both of these characteristics, they will be able to remember the concept for upcoming homework and tests which is ultimately the most important part of math: understanding it and being able to recall it.

How has this topic appeared in the news?

Most lines are drawn for the purpose of seeing whether there is a relationship between the x and y variables and whether data can be approximated from the straight line that is drawn. Graphs like this are found all over the news, and they often relate to natural disasters. For example, this linear regression, http://d32ogoqmya1dw8.cloudfront.net/images/quantskills/methods/quantlit/bestfit_line.v2.jpg, describes floods. The page where the picture is found, http://serc.carleton.edu/mathyouneed/graphing/bestfit.html, describes more activities that can be used to create a linear regression and draw the corresponding straight line. These straight lines can be used to estimate data that aren’t necessarily shown among the points that are plotted. The examples the website gives are: flood frequency curves, earthquake forecasting, meteorite impact prediction, earthquake frequency vs. magnitude, and climate change. This is beneficial for math because it allows students to realize that math isn’t abstract like it is often perceived to be, but rather, it is used for something very important and something that occurs several times a year, such as natural disasters and weather.

How can this topic be used in your students’ future courses in mathematics or science?

One of the purposes of teaching is for students to learn what they need for the following year so they can be successful in the particular topic. When it comes to approximating data with a straight line, the knowledge a student learns in algebra will carry them through statistics, physics, and other higher math and science classes. Linear regression is shown in statistics, as one can see on this statistics website: http://onlinestatbook.com/2/regression/intro.html, while physics is represented on this physics website: http://dev.physicslab.org/Document.aspx?doctype=3&filename=IntroductoryMathematics_DataAnalysisMethods.xml. A lot can be predicted from these straight lines, which is why these graphs aren’t foreign to upper level math and science classes. As I stated before, the line can be used to predict values that aren’t among the plotted data points, which allows students to anticipate what would occur at a particular x or y value. A background in this area can help students through the rest of school and perhaps even the rest of their life in some cases.
