Electronic Scoring of Essays

by Alexander Russo

While new to elementary and secondary education, electronic scoring of written exams is already used for a variety of postsecondary admissions and placement tests.

The “E-rater,” developed by a for-profit subsidiary of the Educational Testing Service, is used to score business school GMAT exams, though human graders are used to check the computer score and resolve any differences.

There are four major essay-scoring programs, each with a different approach. In essence, these programs read and score student writing, including responses to open-ended essay questions.

To learn how to rate each essay question appropriately, the E-rater is fed examples of work of varying quality that have already been graded by human scorers. In this way, the computer learns to mirror the scoring rubrics used by human readers. Use of specific vocabulary, along with rhetorical complexity, syntax and organization, is also part of what is measured.
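For readers who want a concrete picture of the approach, the sketch below shows, in very simplified form, how such a scorer can be trained. It is not ETS's actual E-rater: the features (word count, vocabulary variety, sentence length) and the tiny training set are illustrative stand-ins for the far richer measures a commercial system would use.

```python
# A minimal sketch (not ETS's E-rater) of training an automated essay scorer:
# extract simple surface features from essays that human readers have already
# scored, then fit a regression model that learns to predict those scores.
# Feature choices and the toy data below are purely illustrative.
from sklearn.linear_model import LinearRegression


def extract_features(essay: str) -> list[float]:
    """Turn an essay into a small numeric feature vector."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    word_count = len(words)
    avg_word_len = sum(len(w) for w in words) / max(word_count, 1)
    vocab_ratio = len(set(w.lower() for w in words)) / max(word_count, 1)
    avg_sentence_len = word_count / max(len(sentences), 1)
    return [word_count, avg_word_len, vocab_ratio, avg_sentence_len]


# Toy training set: (essay text, score already assigned by a human reader).
training = [
    ("The book was good. I liked it.", 2.0),
    ("The novel explores ambition and regret through a layered narrative "
     "that rewards careful reading and invites comparison to earlier works.", 5.0),
    ("It was fine I guess. Stuff happened in the story.", 2.5),
    ("The author balances vivid description with restrained prose, so the "
     "argument builds steadily toward a persuasive conclusion.", 4.5),
]

X = [extract_features(text) for text, _ in training]
y = [score for _, score in training]

model = LinearRegression()
model.fit(X, y)  # the model learns to mirror the human readers' scores

new_essay = "The story moves quickly but the characters feel thin and underdeveloped."
predicted = model.predict([extract_features(new_essay)])[0]
print(f"Predicted score: {predicted:.1f}")
```

Real systems rely on hundreds of features and far larger sets of human-graded essays, but the basic pattern, learning to imitate scores that human readers have already assigned, is the same.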

Lower cost is one obvious benefit of having computers score essays instead of human readers. Pennsylvania estimates that it will cut the $1.6 million cost of scoring its writing exams in half, to about $800,000, by moving to a computer-scored version. An estimated 35,000 student essays in grades 3, 5, 8 and 11 from more than 100 districts were scored electronically this December as the third in a series of pilot tests being conducted by the state. Officials are still considering whether to use electronic scoring for all of the Pennsylvania state writing tests, which are required for a diploma as of 2003.

More states could follow soon. Though not yet used for formal assessment purposes, a diagnostic version of ETS's electronic essay-scoring technology has been used by more than 100,000 students in the past year, the first year it has been available. In most cases, teachers use the program to give students more practice writing and to familiarize them with how standardized writing tests are graded, says Richard Swartz, president of ETS Technologies. Available to both schools and districts, the program has been widely adopted in Montgomery County, Md., and Knox County, Tenn.

Proponents argue that computers arrive at the same scores as human graders but at much lower cost. Electronic scoring also potentially shortens the turnaround time for scoring essays from months to minutes. However, at least some systems use a combination of human and computer scores, and before it commits, Pennsylvania is also having its pilot essays scored by human readers.

Critics question whether any computer program at this point is capable of measuring substance and quality in complex essays. Given that, until recently, computers could not match humans even at well-defined tasks such as playing chess, these questions remain legitimate.

Familiarization and testing are important processes, says Swartz. “Online familiarity programs (Websites that explain computer-based testing and provide practice simulations) are going to be an important part of the transition.”

One other factor that has to be taken into consideration is that human readers tend to be slightly tougher when looking at typed essays, according to Swartz: the errors are more apparent, and the essays appear shorter. Another factor is that some students can type much faster than others, raising questions about how much weight writing time and essay length should be given in scoring.