Education is not Moneyball

I initially embraced value-added methods of teacher evaluation, figuring that they could revolutionize education in the same way that sabermetricians revolutionized professional baseball. Over time, however, I realized that this analogy was somewhat flawed. There are lots of ways to analyze data, and the owners of baseball teams have a real motivation — they want to win ball games and sell tickets — to use data appropriately to ensure their best chance of success. I’m not so sure that the “owners” of public education — the politicians and ultimately the voters — share this motivation.

An excellent editorial contrasting the use of statistics in baseball and in education appeared in Education Week: http://www.edweek.org/tm/articles/2014/08/27/fp_eger_valueadded.html?cmp=ENL-TU-NEWS1 I appreciate the tack that this editorial takes: the author is not philosophically opposed to sabermetric-like analysis of education but argues forcefully that, pragmatically, we’re not there yet.

Both the Gates Foundation and the Education Department have been advocates of using value-added models to gauge teacher performance, but my sense is that they are increasingly nervous about the accuracy and fairness of the new methodology, especially as schools transition to the Common Core State Standards.

There are definitely grounds for apprehensiveness. Oddly enough, many of the reasons that the similarly structured WAR [Wins Above Replacement] works in baseball point to reasons why teachers should be skeptical of value-added models.

WAR works because baseball is standardized. All major league baseball players play on the same field, against the same competition with the same rules, and with a sizable sample (162 games). Meanwhile, public schools aren’t playing a codified game. They’re playing Calvinball—the only permanent rule seems to be that you can’t play it the same way twice. Within the same school some teachers have SmartBoards while others use blackboards; some have spacious classrooms, while others are in overcrowded closets; some buy their own supplies while others are given all they need. The differences across schools and districts are even larger.

The American Statistical Association released a brief report on value-added assessment that was devastating to its advocates. The ASA set out some caveats on the use of value-added measurement (VAM) which should give education reformers pause. Some quotes:

VAMs are complicated statistical models, and they require high levels of statistical expertise. Sound statistical practices need to be used when developing and interpreting them, especially when they are part of a high-stakes accountability system. These practices include evaluating model assumptions, checking how well the model fits the data, investigating sensitivity of estimates to aspects of the model, reporting measures of estimated precision such as confidence intervals or standard errors, and assessing the usefulness of the models for answering the desired questions about teacher effectiveness and how to improve the educational system.

VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.

Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.

VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
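To make the discussion above concrete, here is a minimal sketch, in Python with simulated data, of the simplest form a value-added model can take: regress students’ current test scores on their prior scores plus indicator variables for their teachers, and read the teacher coefficients as value-added estimates. This is my own toy illustration, not the ASA’s model or any district’s actual system, and every number in it (teacher counts, score scales, noise levels) is invented; real VAMs are considerably more elaborate, which is part of the ASA’s point.

```python
import numpy as np

# Toy value-added model on simulated data: regress current scores on prior scores
# plus teacher indicators, and treat the teacher coefficients as "value-added".
# All quantities below are invented for illustration only.

rng = np.random.default_rng(0)
n_teachers, students_per_teacher = 5, 30

teacher = np.repeat(np.arange(n_teachers), students_per_teacher)
prior = rng.normal(50, 10, size=teacher.size)             # prior-year scores
true_effect = rng.normal(0, 2, size=n_teachers)           # hypothetical teacher effects
noise = rng.normal(0, 8, size=teacher.size)               # everything the model cannot see
current = 5 + 0.9 * prior + true_effect[teacher] + noise  # current-year scores

# Design matrix: intercept, prior score, and one dummy per teacher (teacher 0 as baseline).
dummies = (teacher[:, None] == np.arange(1, n_teachers)).astype(float)
X = np.column_stack([np.ones(teacher.size), prior, dummies])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

print("estimated value-added (vs. teacher 0):", np.round(coef[2:], 2))
print("true effects (vs. teacher 0):         ", np.round(true_effect[1:] - true_effect[0], 2))
```

Even in this toy setting, with classroom-sized samples and moderately noisy scores, the estimated teacher effects can wander noticeably from the true ones, which is a small-scale illustration of why the ASA insists on standard errors, sensitivity checks, and caution in high-stakes use.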


Mathematics and College Football

For years, various algorithms (derisively called “the computers” by sports commentators) have been used to rank college football teams. The source of the derision is usually simple: most of these algorithms are too complicated to explain in layman’s terms, and so they are mocked.

For both its simplicity and its ability to provide reasonable rankings, my favorite algorithm is “Random Walker Rankings,” published at http://rwrankings.blogspot.com. Here is a concise description of this ranking system (quoted from http://rwrankings.blogspot.com/2003_12_01_archive.html):

We’ve all experienced befuddlement upon perusing the NCAA Division I-A college football Bowl Championship Series (BCS) standings, because of the seemingly divine inspiration that must have been incorporated into their determination. The relatively small numbers of games between a large number of teams makes any ranking immediately suspect because of the dearth of head-to-head information. Perhaps you’ve even wondered if a bunch of monkeys could have ranked the football teams as well as the expert coaches and sportswriters polls and the complicated statistical ranking algorithms.

We had these thoughts, so we set out to test this hypothesis, although with simulated monkeys (random walkers) rather than real ones.

Each of our simulated “monkeys” gets a single vote to cast for the “best” team in the nation, making their decisions based on only one simple guideline: They periodically look up the win-loss outcome of a single game played by their favorite team, and flip a weighted coin to determine whether to change their allegiance to the other team. In order to make this process even modestly reasonable, this random decision is made so that there is higher probability that the monkey’s allegiance and vote will go with the team that won the head-to-head contest. For instance, the weighting of the coin might be chosen so that 75% (say) of the time the monkey changes his vote to go with the winner of the game, meaning only a 25% chance of voting for the loser.

The monkey starts by voting for a randomly chosen team. Each monkey then meanders around a network which describes the collection of teams, randomly changing allegiance from one team to another along connections representing games played between the two teams that year. This network is graphically depicted in the figure here, with the monkeys—okay, technically one is a gorilla—not so happily lent to us by Ben Mucha (inset). It’s a simple process: if the outcome of the weighted coin flip indicates that he should be casting his vote for the opposing team, the monkey stops cheerleading for the old team and moves to the site in the network representing his new favorite team. While we let the monkeys change their minds over and over again—indeed, a single monkey voter will forever be changing his vote in this scheme—the percentage of votes cast for each football team quickly stabilizes. We thereby obtain rankings each week of the season and at the end of the season, based on the games played to that point of the season, by looking at the fraction of monkeys that vote for each team…

The virtue of this ranking system lies in its relative ease of explanation. Its performance is arguably on par with the expert polls and (typically more complicated) computer algorithms employed by the BCS. Can a bunch of monkeys rank football teams as well as the systems in use now? Perhaps they can.
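For readers who want to experiment, here is a minimal sketch of the random-walker idea in Python. The game results below are invented placeholders (a real run would use the full season’s schedule), and this is my own simulation of the method as described in the quote above, not the authors’ code.

```python
import random
from collections import Counter

# Hypothetical game results as (winner, loser) pairs; a real run would use every game played.
games = [
    ("Stanford", "UCLA"), ("Oregon", "Stanford"), ("LSU", "Oregon"),
    ("Alabama", "LSU"), ("LSU", "Alabama"), ("Oklahoma State", "Oklahoma"),
    ("Oklahoma", "Texas"), ("Stanford", "Oklahoma"),
]

P_WINNER = 0.75     # probability a walker's vote goes to the winner of the game it inspects
NUM_WALKERS = 5000  # number of simulated "monkeys"
NUM_STEPS = 200     # how many games each walker inspects before we record its vote

# For each team, the list of games it played.
teams = sorted({t for g in games for t in g})
schedule = {t: [g for g in games if t in g] for t in teams}

def final_vote(start_team):
    """Let one walker wander the game network and return the team it ends up voting for."""
    team = start_team
    for _ in range(NUM_STEPS):
        winner, loser = random.choice(schedule[team])
        # Weighted coin flip: usually switch allegiance to the winner, occasionally to the loser.
        team = winner if random.random() < P_WINNER else loser
    return team

votes = Counter(final_vote(random.choice(teams)) for _ in range(NUM_WALKERS))
for team, count in votes.most_common():
    print(f"{team:15s} {count / NUM_WALKERS:.3f}")
```

The vote fractions this simulation settles into are the stationary distribution of a random walk on the season’s game network, which is why the percentages stabilize even though individual walkers never stop changing their minds.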

Using this algorithm, here’s the ranking of college football teams as of today. (With great pride, I note that Stanford is ranked #4.) These rankings certainly don’t exactly match the latest AP poll or BCS rankings, but they’re still reasonable and defensible.

[Image: RWFL2011 — random walker rankings of college football teams]

Engaging students: Solving one-step and two-step inequalities

In my capstone class for future secondary math teachers, I ask my students to come up with ideas for engaging their students with different topics in the secondary mathematics curriculum. The point of the assignment is not to devise a full-blown lesson plan on each topic; instead, I ask my students to think about three different ways of getting their students interested in the topic in the first place.

I plan to share some of the best of these ideas on this blog (after asking my students’ permission, of course).

This first student submission comes from my former student Jesse Faltys (who, by the way, is the one who prompted me to start this blog in the first place). Her topic: how to engage students when teaching one-step and two-step inequalities.


A. Applications – How could you as a teacher create an activity or project that involves your topic?

  1. Index Card Game: Make two sets of cards. The first should consist of different inequalities. The second should consist of the matching graphs. Put your students in pairs and distribute both sets of cards. The students will then practice solving their inequalities and determine which graph illustrates which inequality.
  2. Inequality Friends: Distribute index cards with simple inequalities to a handful of your students (four or five different inequalities), and pass out cards that contain only numbers to the rest of the students. Have your students rotate around the room and determine whether their numbers and inequalities are compatible. If students know that their number satisfies an inequality, they should become “members” and form a group. Once all the students have formed their groups, they should present to the class how they solved their inequality and why all their numbers are “members” of that group.

Both activities allow for a quick assessment by the teacher. Having the students initially work in pairs to explore an inequality and its matching graph lets them make discoveries on their own, while ending the class with a group activity lets the teacher make an individual assessment of each student.


B. Curriculum: How does this topic extend what your students should have learned in previous courses?

In a previous course, students learned to solve one- and two-step linear equations. The process for solving a one-step equation is similar to the process for solving a one-step inequality. Properties of Inequalities are used to isolate the variable on one side of the inequality. These properties are listed below. The students should have knowledge of these from the previous course and therefore should not be overwhelmed with new rules.

Properties of Inequality

1. When you add or subtract the same number from each side of an inequality, the inequality remains true. (Same as previous knowledge with solving one-step equations)

2. When you multiply or divide each side of an inequality by a positive number, the inequality remains true. (Same as previous knowledge with solving one-step equations)

3. When you multiply or divide each side of an inequality by a negative number, the direction of the inequality symbol must be reversed for the inequality to remain true. (THIS IS DIFFERENT)

There is one obvious difference when working with inequalities: multiplying or dividing by a negative number changes the direction of the inequality symbol. Pointing out to students that they are using what they already know, with just one adjustment to the rules, can help ease their minds about the new subject matter.
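A short worked example makes the adjustment concrete. To solve the two-step inequality -3x + 5 \le 20, first subtract 5 from each side to get -3x \le 15 (Property 1, exactly as with an equation); then divide each side by -3, remembering to reverse the symbol, to get x \ge -5 (Property 3). The arithmetic is identical to solving the equation -3x + 5 = 20; the only new step is flipping the inequality sign.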


C. Culture – How has this topic appeared in pop culture?

Amusement Parks – If you have ever been to an amusement park, you are familiar with the height requirements on many of the rides. The chart below shows Disney rides that require riders to be a certain height (35 inches or taller) to ride. What rides will you ride?

(Height of Student \ge  Height restriction)

Blizzard Beach Summit Plummet 48″
Magic Kingdom Barnstormer at Goofy’s Wiseacres Farm 35″
Animal Kingdom Primeval Whirl 48″
Blizzard Beach Downhill Double Dipper 48″
DisneyQuest Mighty Ducks Pinball Slam 48″
Typhoon Lagoon Bay Slide 52″
Animal Kingdom Kali River Rapids 38″
DisneyQuest Buzz Lightyear’s AstroBlaster 51″
DisneyQuest Cyberspace Mountain 51″
Epcot Test Track 40″
Epcot Soarin’ 40″
Hollywood Studios Star Tours: The Adventures Continue 40″
Magic Kingdom Space Mountain 44″
Magic Kingdom Stitch’s Great Escape 40″
Typhoon Lagoon Humunga Kowabunga 48″
Animal Kingdom Expedition Everest 44″
Blizzard Beach Cross Country Creek 48″
Epcot Mission Space 44″
Hollywood Studios The Twilight Zone Tower of Terror 40″
Hollywood Studios Rock ‘n’ Roller Coaster Starring Aerosmith 48″
Magic Kingdom Splash Mountain 40″
Magic Kingdom Big Thunder Mountain Railroad 40″
Animal Kingdom Dinosaur 40″
Epcot Wonders of Life / Body Wars 40″

Sports – Zdeno Chara is the tallest person who has ever played in the NHL. He is 206 cm tall and is allowed to use a stick that is longer than the NHL’s maximum allowable length. The official rulebook of the NHL states limits for the equipment players can use. One of these rules states that no hockey stick can exceed 160 cm. (Hockey stick \le 160 cm) The world’s largest hockey stick and puck are in Duncan, British Columbia. The stick is over 62 m in length and weighs almost 28,000 kg. Is your equipment legal?


Weather – Every time the news is on, our culture references inequalities through the range of temperatures over the day. For example, the most extreme change in temperature in Canada took place in January 1962 in Pincher Creek, Alberta. A warm, dry wind, known as a chinook, raised the temperature from -19 °C to 22 °C in one hour. Represent the temperature during this hour using a double inequality. (-19 < the temperature < 22) What double inequality would describe today’s weather compared with the weather in 1962?