Thursday, April 12, 2012 at 1:01 PM
Chess enthusiasts watch World Chess champion Garry Kasparov on a television monitor as he holds his head in his hands at the start of the sixth and final match in May 1997 against IBM's Deep Blue computer in New York.
Sure, Deep Blue beat Kasparov, but can a computer score an essay as well as a human?
The results demonstrated that, overall, automated essay scoring could produce scores similar to human scores for extended-response writing items, performing equally well on both source-based and traditional writing genres.
That matters because in 2014, most states will move from their current, mostly paper-and-pencil standardized tests to new online tests. Those tests will largely replace the K-12 standardized tests states now use for state and federal accountability purposes. The idea is that switching from human to computer graders could make administering these tests cheaper for states nationwide.
Ohio state schools chief Stan Heffner has said Ohio could save as much as 40 percent on state testing costs each year by using software instead of humans to score tests.
The study was authored by University of Akron researcher Mark Shermis and Ben Hamner, from a company that provides a platform for statistical and analytics competitions. It fed student essays from six states into nine different essay-scoring software programs and compared the programs' scores with those produced by human graders.
Eight of the nine programs were commercial; the other program was a free, open-source software package developed at Carnegie Mellon University. Together they represent nearly all of the available automated essay-scoring options, according to the study.
The study is scheduled to be presented Monday at the annual conference of the National Council on Measurement in Education.
But the study notes that just because the computer programs agree with the humans doesn't mean they're right:
Agreement with human ratings is not necessarily the best or only measure of students’ writing proficiency (or the evidence of proficiency in an essay)... The limitation of human scoring as a yardstick for automated scoring is underscored by the human ratings used for some of the tasks in this study, which displayed strange statistical properties and in some cases were in conflict with documented adjudication procedures.
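The article doesn't name the study's agreement measure, but in automated essay scoring, human-machine agreement is commonly quantified with a chance-corrected statistic such as quadratic weighted kappa, which penalizes large score disagreements more than small ones. A minimal sketch, with invented scores for illustration:

```python
# Illustrative only: quadratic weighted kappa (QWK), a common
# human-vs-machine agreement statistic in essay scoring. This is not
# necessarily the study's exact metric, and all scores below are invented.

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement between two raters on an ordinal score scale.
    1.0 = perfect agreement, 0.0 = chance-level agreement."""
    n = max_score - min_score + 1
    # Observed confusion matrix of (human, machine) score pairs.
    observed = [[0.0] * n for _ in range(n)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
    total = len(human)

    # Expected matrix from each rater's marginal score distribution.
    h_marg = [sum(row) for row in observed]
    m_marg = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    expected = [[h_marg[i] * m_marg[j] / total for j in range(n)]
                for i in range(n)]

    # Quadratic weights: a 2-point disagreement costs 4x a 1-point one.
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)
            num += w * observed[i][j]
            den += w * expected[i][j]
    return 1.0 - num / den

# Invented example: human vs. machine scores on a 1-6 essay scale.
human_scores = [4, 3, 5, 2, 4, 6, 3, 4]
machine_scores = [4, 3, 4, 2, 5, 6, 3, 4]
print(round(quadratic_weighted_kappa(human_scores, machine_scores, 1, 6), 3))
```

A kappa near 1.0 means the software tracks human graders almost exactly; near 0.0 means it does no better than guessing from the overall score distribution, which is why the study's caveat above matters: high agreement with flawed human ratings is still flawed.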
And one of the study's authors notes that computer software hasn't yet caught up with humans when it comes to identifying creativity:
But while fostering original, nuanced expression is a good goal for a creative writing instructor, many instructors might settle for an easier way to make sure their students know how to write direct, effective sentences and paragraphs. “If you go to a business school or an engineering school, they’re not looking for creative writers,” Shermis says. “They’re looking for people who can communicate ideas. And that’s what the technology is best at” evaluating.
[documentcloud url=https://www.documentcloud.org/documents/335765-contrasting-state-of-the-... format=normal sidebar=false ]