Vigilant Internetism

The Internet home of Alexander Furnas
Internet Researcher and Open Net Advocate. Occasional Writer.

My Recent Collaboration with Gabe Lezra at The Atlantic

So, it has been a while since I posted here. Things have been busy - I finished up my Master’s degree. I figured I would post my most recent Atlantic piece below, to get the ball rolling again.

This was a fun piece to write, because it was a collaboration with my great friend (and new housemate!) Gabe. It was a nice melding of our interests.

The original Atlantic piece is here.

Here is the text:

This month Manchester City, the younger brother (and rival) of the better-known Manchester United, announced that it will release detailed data about the team for public consumption.

The club’s press release noted that “the speed of growth for the discipline of performance analytics is essentially in the clubs’ hands — it is they who have bought the data at significant cost and the rest of the analytics community simply do not have access to the data at the same level … [But while] there are many people in the analytics community right now who have the skills, desire and vision to make a difference in the performance analytics space…those people have no significant data to work with.” By opening up this data and making it available to those within the analytics community Manchester City hopes to “encourage and inspire the next generation of analytics.”

This move, while essentially unprecedented in the soccer world, fits clearly within larger cross-sector trends of making data open to harness the distributed human capital and innovative potential of hobbyists, enthusiasts, and geeks with pro-level skills. The history of success of making data available to the wonks who want to use it bodes well for the future of soccer analytics; we may be at a watershed moment.

The move to promote innovation through openness is premised on the idea that innovation is often about cost. In particular, entry costs are important. For a pool of potential innovators (in basically any sector) the less costly the inputs required to begin innovating, the more likely it is that potential innovators will become actual innovators. If more equipment, materials, special skills or privileged information is required, fewer people will experiment, tinker, and discover. It follows that the more people are experimenting and trying to innovate, the more valuable innovation is likely to happen. This dynamic implies that in sectors in need of innovation, it is useful to assess the costs of entry and try to lower them.

A common explanation for the radically innovative tech scene in recent decades is that the Internet lowered barriers to market entry, as basically anyone with a computer and enough time could write some killer code. Yochai Benkler, a scholar at Harvard’s Berkman Center for Internet and Society, has made a career of looking at how radically low barriers to entry in labor markets can change the cost structures and organizations of production. This trend is nowhere more evident than in the Open Data movement. This movement, which gets its philosophical inspiration from the older Open Source movement, holds that data should be freely available to anyone without restriction.

In knowledge discovery in datasets, the major barrier to entry is access to the data. When corporations, governments or other private firms jealously guard their proprietary data, the number of people playing with the data and trying to discover valuable things, or putting that data to good use, will remain small. When data is made public, anyone can put that data to work. In recent years governments have begun making large troves of their data publicly accessible. The U.S. government’s open-data project, data.gov, for example, has begotten over 200 citizen-developed apps. Similarly, the city of Vancouver, an early mover in the municipal open-data space, opened up its data in 2009, spawning valuable mashups of transit data, the water grid, and common spaces.

A common adage in open-source development known as Linus’ Law states that “with enough eyeballs, all bugs are shallow,” indicating that if you can get enough people involved, hard problems become easier. This is what open data does for knowledge discovery and innovation. When looking for a needle in the haystack of data, it helps to have more people looking. The best way to get more people looking is to make it cheap to look.

Lowering the cost to look, and thus enabling more people to get involved, is precisely what Manchester City has begun to do. Opening the data up promises to lower barriers to entry for experimenting with new data-driven ways of understanding the game. With more eyeballs, this problem can become shallow.

Normally “the only data you can get [publicly] is the really basic stuff: goals, assists, cards… [which is] nothing you can really work from,” says Graham MacAree, SBNation’s soccer editor, and one of the leaders in the field of public soccer analytics.

According to the club, some data will be entirely available for public consumption, but the most detailed data —“a time coded feed that lists all player action events within the game with a player, team, event type, minute and second for each action, together with the x/y/z co-ordinates for each event” — will be sent to analysts who present a project submission that is approved by the club and their data provider Opta, the leaders in soccer data mining.

This more detailed data will be useful for experts like MacAree, a veteran of baseball’s statistical revolution known as “sabermetrics” (think Moneyball), because it contains so much more information than can be gleaned from traditional soccer analysis, which has focused on individual actions in a vacuum — that is, without context: Player X passes, Player Y dribbles, and Player Z shoots and scores.

“The most important thing for me is knowing where the ball is at all times, and where all the players are at all times,” MacAree explains. “And City are proposing to release not just the what, but the where and when of the data. We’re talking very much about space and time, which are very difficult to get out of the data set we’ve already had.”

This is a foundational moment for the soccer-analytics community. The field of study, despite all the bluster about a soccer Moneyball or Jamesian moment (after the godfather of the sabermetric movement, baseball writer Bill James), has yet to progress past the equivalent of a box score. Large-scale advanced metrics are years of research away, especially because data has been so scarce. Most of the cutting-edge analytics have been painstakingly developed by hand. Previously, researchers without access to the kind of data Manchester City is making available have had to record every event in a match, watching frame-by-frame, then transcribe it to Excel, and write the code themselves to analyze it. Single-match analyses like MacAree’s radial-passing maps take more than a day of labor-intensive work to assemble.

In this data environment, researchers have little hope of coming up with testable, verifiable, predictive metrics.

“If you look at baseball, the sabermetric revolution came about because data was available before it was valuable,” MacAree explains. In this environment the costs of entry to innovate were low, and Bill James, among others, was able to experiment. But “now that we know how valuable data is, there’s no reason for it to be [freely] given to us… but our contribution [community analysts’] can also be valuable. And we’ve always been about showing that we’re worth giving that data to.”

This is what is so unique about Manchester City’s decision to, at least partially, open up one of their most valuable assets to the public. They have decided to embrace the open-source nature of baseball’s Jamesian revolution, and bring it, at least partially, to soccer.

Their press release speaks directly to the analytics community, describing areas of performance analysis that City would “like to discuss with you”: “We will work directly with those of you who came up with good concepts, and also connect you to others who are working in the same research area,” they crow.

There is a long way to go in soccer analytics, and this is but a small first step into a larger world. City’s data is only for one year; for predictive models to be valuable, they must be based off, and tested against, various years of data. And this type of scientific peer review, based off years of data, will only be feasible if teams and organizations continue in City’s footsteps. But City’s move to begin opening up their detailed data represents a strong first step in capitalizing on the power of peer-production and decentralized expertise that we have seen yield meaningful results in other sectors. If the public proves that they can make something — be it a real predictive model, or even an interesting concept — worthy of investment with this data, it seems likely that other teams will follow City’s lead.

And that’s a challenge that MacAree, and others, are more than ready for.

Knight News Challenge Round 2: A Customizable Fisheye View of the News: a novel aggregation platform providing transparency and user control

Below is the application I submitted today for the Knight News Challenge (Round 2). I would love for people to go to the page and leave their thoughts. We will be applying for funding at a variety of places, so we are eager to learn how to present our project better.

newschallenge:

1. What do you propose to do? [20 words]

Create a contextualizing news aggregation platform that provides user agency over personalization, and foster self-reflective curation of users’ information intake.

2. How will your project make data more useful? [50 words]
Data must be filtered…

(Source: newschallenge2)

Alexander Furnas and Nora Young on CBC's Spark Episode #184

—Interview on Pre-Crime Screening Technology

From the episode description on Spark’s page:

It sounds like something from the movie Minority Report, but the US Department of Homeland Security is researching ‘pre-crime’ technology to screen for people who may be about to commit a terrorist act. Alexander Furnas is a journalist who has given the technology a lot of thought. He considers whether technology like this could actually work. (Runs 8:11)

(Perhaps) my favorite paragraph I have ever written:

Below is the final paragraph of the Undergraduate Honors Thesis (College of Social Studies, Wesleyan University) that I wrote last Spring. The thesis was a gargantuan 258 pages even after extensive editing - I am too verbose for my own good - in part because it was a wide-ranging meditation on how technology can (and should) be used to make 21st Century Democracy more participatory and responsive. I drew on research ranging from the organizational faculties of ants (stigmergy), to machine learning, discourse analysis, Computer-Supported Cooperative Work and Incentive-Based Design; from game theory and philosophy to more standard political science. It was a type of syncretic interdisciplinarity made possible only by the unprecedented access I had (and we all now have) to information and research. I think it turned out pretty well despite its ludicrous scope - itself a form of hubris I have since learned to moderate to some degree as a better-trained researcher. The full thing, not that anyone other than those the university required to grade it will ever slog all the way through, is available here.

But why am I posting this now, more than a year after I wrote it? In the last few days/weeks I have been looking at the incredibly exciting new Smart Congress project at the New America Foundation’s Open Technology Institute. The project’s goals are incredibly close to the theoretical goals I was investigating in that thesis, which got me thinking about that past year of work. I have moved away from this line of inquiry in the last year - for methodological reasons - but I miss it. I want it back. We need to figure out a way to reinvigorate democratic legitimacy, and I think harnessing smart and well-designed technical platforms is part of the way to do it. I am not a determinist here - technology is not a silver bullet; but it may be the tool we need.

The paragraph below, I think, illustrates well the broad scope of the issue. I think the Huntington parallel is apt. Check it out:

In 1968 Samuel Huntington argued in Political Order in Changing Societies that as modernization increased people’s social and economic mobility, their political aspirations and expectations grew in parallel. In the absence of adaptable governmental institutions to meet these expanding aspirations, they turn to social frustration and ultimately political instability. I believe that Huntington’s hypothesis can be generalized to the notion that transformations in economic and social conditions lead to transformations in political expectations and aspirations, to which governments must respond. The networked information revolution is a historical inflection point. The magnitude of the transformation that ubiquitous networked computing has wrought is on par with that of modernization, and we are only beginning to feel its social and economic effects. These economic and social changes, I believe, are likely to engender a radical transformation in political expectations and aspirations in the coming years. To meet the expanding political aspirations of a networked public, democratic governance will need to be integrated with networked computing if it is to survive.

Straight White Male: The Lowest Difficulty Setting There Is

This is a pretty good analogy for discussing privilege. It nicely gets at the fact that privilege (e.g., being a straight white male) has an impact on outcomes, while still showing that it doesn’t mean that successful privileged people didn’t have to work to get where they are.

On the margin, things are just easier.

It makes sense that the salience of these two issues would be related. Anti-piracy laws and countermeasures tend to violate traditional privacy norms - indeed they are perhaps the biggest threat to our online privacy these days. The Google insight chart below shows the relative volume of ‘privacy’ and ‘piracy’ in news headlines since 2008. What we see here is that often after an upward blip in the public salience of piracy, there is a corresponding upward blip in the public salience of privacy (there are, however, spikes in privacy that are seemingly unrelated to piracy salience). This fits with a conception that the public salience of privacy (as measured by media attention) is driven by privacy advocates responding to specific campaigns (like anti-piracy measures) which are threats to privacy.

It is not totally clear what the time lag is here; however, I adjusted the data to account for a two-month lag in privacy salience to quick-check the hypothesis. This leads to a fairly notable correlation (.28) that is statistically significant (p < 0.0001). A quick regression tells us that, essentially, for every one-point increase in the salience of piracy now, we can expect a .44-point increase in the salience of privacy two months down the road.
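For anyone who wants to try this kind of quick check themselves, here is a minimal sketch of the lag-then-correlate-then-regress step in plain Python. It is not the code or the data I actually used: the function names (`lagged_pairs`, `pearson_r`, `ols_slope`) and the monthly numbers are made up for illustration, and it does not compute a p-value.

```python
from math import sqrt


def lagged_pairs(driver, response, lag):
    """Pair driver[t] with response[t + lag], dropping months with no partner."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = min(len(driver), len(response) - lag)
    if n < 2:
        raise ValueError("not enough overlapping observations for this lag")
    return driver[:n], response[lag:lag + n]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope of the least-squares line ys = intercept + slope * xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical monthly salience scores, NOT the Google data discussed above.
piracy = [10, 30, 20, 50, 40, 60, 30, 20]
privacy = [15, 12, 15, 25, 20, 35, 30, 40]

# Line up piracy in month t with privacy in month t + 2.
x, y = lagged_pairs(piracy, privacy, lag=2)
r = pearson_r(x, y)
slope = ols_slope(x, y)
print(f"lagged correlation: {r:.2f}, slope: {slope:.2f}")
```

The toy series is built so that privacy two months later is exactly half of piracy plus ten, which is why the sketch reports a perfect correlation and a slope of 0.5. Real salience data is far noisier, and trying several values of `lag` would be a reasonable way to probe the unclear time lag.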

[This is a cross post from Rough Consensus - cheers.]

On the Thom Hartmann Show

I meant to post this right after it happened, but it ended up sliding down the to-do list, as non-critical things are wont to do.

After I wrote about the Department of Homeland Security’s Future Attribute Screening Technology (FAST) and everything that is wrong with it, I was contacted by a producer from the Thom Hartmann show (a progressive news/talk radio show). They were kind enough to have me on to talk about the project and why I think it won’t work. I didn’t get to talk about everything I wanted to - there are some points in the article that weren’t addressed - but it was a fun conversation.

Here is a video of me talking with Thom. Sorry for the barely passable connection; I called in by phone, and I have a terrible phone with terrible reception…