22 September 2007

Wikipedia notches 2 million English language articles... and the debate goes on.

The 2 millionth English language Wikipedia article has been posted: it's about a popular Spanish TV show (the one-millionth was about a railway station in Glasgow).

And this is only part of the story: the ABC informs us that the online encyclopedia now has over 8 million articles in a scarcely believable 250 languages.

This is a magnificent achievement and confirms that many people are willing to share their knowledge (and opinions) voluntarily for a common good, even if their goodwill and industry are diminished by the actions of some contributors who grind axes relentlessly or, for their own reasons, butcher the bona fide efforts of others.

To deal with the alleged vandals, new methods of monitoring and regulating content are about to be implemented. New Scientist reports:

News of the plans came to light last August when Wikipedia co-founder Jimmy Wales announced changes to the editing restrictions on the German-language version. However, implementing those changes turned out to be more difficult than anticipated and has still not happened. Now New Scientist has learned that Wikimedia plans to start the first trial of the changes this month.

The shift is a dramatic one for the encyclopedia. For now, edits to an entry can be made by any user and appear immediately to all readers. In the new version, only edits made by a separate class of "trusted" users will be instantly implemented.

To earn this trusted status, users will have to show some commitment to Wikipedia, by making 30 edits in 30 days, say. Other users will have to wait until a trusted editor has given the article a brief look, enough to confirm that the edit is not vandalism, before their changes can be viewed by readers.

This is sure to ease some readers' doubts. Most malicious edits involve crude acts of vandalism, such as the deletion of large chunks of text. Now such changes will rarely make it into articles.

These benefits will come at a price, though. New users could be deterred from participating, since they will lose the gratification that comes from seeing their edit instantly implemented. That could reduce the number of editors as well as creating a class system that divides frequent users from readers. The trusted editors, likely to number around 2000, may also find that articles are being changed too fast for them to monitor.

Not all versions of the encyclopedia will follow this route, says Erik Moller of the Wikimedia Foundation. While editors on the German version are happy with a hierarchy of contributors, the English editors favour a more egalitarian approach. So English readers are likely to continue to see the latest version of an entry, with a page that has been certified as vandalism-free by trusted editors available via a link.

For edits that are more subtly inaccurate, perhaps because they have been designed to promote an agenda, another tool is in store. It allows select groups of editors, probably associated with specific subject areas, to vote on whether an article should be flagged as high quality. Readers would still see the latest version of an article by default, but a link to a high-quality version, if it exists, would also be available.

As well as relying on trusted editors, Wikipedia's upgrade will involve automatically awarding trust ratings to chunks of text within a certain article. Moller says the new system is due to be incorporated into Wikipedia within the next two months, as an option for the different language communities.

The software that will do this, created by Luca de Alfaro and colleagues at the University of California, Santa Cruz, starts by assigning each Wikipedia contributor a trust rating using the encyclopedia's vast log of edits, which records every change to every article and the editor involved. Contributors whose edits tend to remain in place are awarded high trust ratings; those whose changes are quickly altered get a low score. The rationale is that if a change is useful and accurate, it is likely to remain intact during subsequent edits, but if it is inaccurate or malicious, it is likely to be changed. Therefore, users who make long-lasting edits are likely to be trustworthy. New users automatically start with a low rating.
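To make the idea in that last paragraph concrete, here is a minimal sketch of how a rating based on edit survival might work. It is not the Santa Cruz software, whose details the article does not give: the function name `trust_ratings`, the parameters `initial`, `gain` and `penalty`, and the update rule itself are all my own assumptions, chosen only to illustrate "long-lasting edits raise trust, quickly altered edits lower it, new users start low".

```python
from collections import defaultdict


def trust_ratings(edit_log, initial=0.1, gain=0.1, penalty=0.2):
    """Assign each contributor a trust rating between 0 and 1.

    edit_log is an iterable of (editor, survived) pairs, oldest first.
    survived is True if the edit stayed in place through later revisions
    and False if it was quickly altered or reverted.
    All parameter values are illustrative assumptions.
    """
    # New users automatically start with a low rating.
    ratings = defaultdict(lambda: initial)
    for editor, survived in edit_log:
        rating = ratings[editor]
        if survived:
            # A long-lasting edit moves the rating part of the way towards 1.
            rating += gain * (1.0 - rating)
        else:
            # A quickly altered edit shrinks the rating towards 0.
            rating *= 1.0 - penalty
        ratings[editor] = rating
    return dict(ratings)


# Hypothetical log: alice's edits survive, bob's are quickly changed.
log = [
    ("alice", True), ("bob", False),
    ("alice", True), ("bob", False),
    ("alice", True), ("bob", False),
]
for editor, rating in sorted(trust_ratings(log).items()):
    print(editor, round(rating, 4))
```

In this toy version a chunk of text would simply inherit the rating of whoever wrote it. The real system presumably does something more refined.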

The Australian, via The Times, also reports this. The Times has also printed an update:

Jimmy Wales said that changes to the online encyclopedia which meant it would now be overseen by a group of 'trusted editors' did not mean that ordinary users weren't free to edit the site, only that they had to have been registered for 4 days before making a change.

This would hopefully lead to a reduction in the number of high profile pages - such as George W. Bush's - that suffered from spontaneous vandalism, he said, as well as improve the reliability of the site, which has been shown to be untrustworthy on several occasions of late.

"There are no plans to restrict anybody's status," Mr Wales said. "Anyone can make an edit, but if the user hasn't been registered for at least 4 days, then it would have to be approved by someone who has been registered before going live."

Mr Wales acknowledged that Wikipedia's reliability had come into question following the discovery that some organisations, including political parties, had been tweaking their entries to improve their image.

But he dismissed the idea that the changes - which will initially only affect the German site - were a response to a planned competitor to Wikipedia, 'Citizendium', which will solicit entries from the public but be edited by a group of experts to root out inaccuracies.

Mr Wales said that it was Wikipedia's aim to "protect the public from goofballs doing bad things" whilst at the same time allowing the "spontaneous acts of goodwell", which were a valuable feature of the site.

I assume that "goodwell" should be "goodwill", but what would a Wikipedia article (or an article about Wikipedia) be without at least one typo?

I should also mention John Quiggin's recent post. Quiggin is a longstanding, albeit not uncritical, supporter of Wikipedia. What he says about the encyclopedia (is it time to come up with another word that describes the scope, and now the scale, of the enterprise?) is worth reading. Extract:

The most obvious change in the past eighteen months is the way attention has shifted from the extensive margin (more articles) to the intensive margin (work on existing articles, metacontent such as categorization and classification schemes, and internal process such as the development and enforcement of policies on biographies of living persons, prompted by embarrassments like the Siegenthaler hoax and by the increasing propensity of politicians and others to edit their own entries).

There’s a natural economic logic here. With two million entries already, the typical new entry (ignoring the many short-lived attempts such as this one) is going to be something like List of state leaders in 1390s BC or Kitaōji Station. The marginal benefit of adding an entry is declining, though certainly not zero. On the other hand, the demand for internal improvements builds on itself. A stroll through Wikipedia using the Random entry function shows that the great majority of entries are tagged as needing improvement of some kind.

This process of cumulative improvement is resource-intensive, but not nearly as much as the dialectical processes that operate for controversial entries (and on Wikipedia, anything and everything can be controversial). Edits are made, reverted, reverted, tagged as needing support or violating some Wikipolicy or other, until a single sentence can consume dozens of hours of work. Still, the result is often a drastic improvement in quality compared to a starting point in which one point of view or another is taken for granted. One obvious manifestation of this is the vast increase in referencing of claims, and the increasing pickiness of policy regarding sources for such claims. Blogs have been a particular victim, with only a handful of expert-written blogs being accepted as reliable sources on particular topics. Despite the merits of the process, it’s easy to get burned out defending an article like Global warming controversy against the sustained efforts of delusionists to include lengthy and uncritical presentations of their latest talking points.

One thing is clear though. Complaining about Wikipedia now is like complaining about the Internet. There isn’t going to be any alternative for quite some time to come.

I'm not (having myself made some extremely modest contributions to the genre) one of those who think that there are too many articles about railway stations, but I question the justification for entries like Port Road pub crawl, which have no substantial basis in fact (or in more than one person's opinion) and are riddled with errors and omissions. As it stands today, the article is a bag of chaff with a few grains (which could easily be transferred to other articles) scattered throughout.

Oh, and don't expect the Port Road pub crawl article to remain in its present form for long (and don't call me a vandal if I amend and rename it).

