Yesterday I ran a workshop called "The Art of Guerrilla Research" for ELESIG, along with Tony Hirst. I'll blog it later, but basically it was about what sort of research you can do without permission or funding, e.g. asking questions of open data (hence Tony describing the things he does).
One issue that was raised a few times was the ethics of it. The assumption has long been that anything openly available is fair game. So for instance there is a lot of research that uses travel blogs as its data source, and the researchers don't require the bloggers' permission to analyse or interpret them. In general, this is my stance too, but thinking through the types of things Tony does with data led me to come up with a scenario which would raise ethical issues. I offer it just as an example of how openly available data isn't quite as clear cut as you may think.
Let us imagine that there is a heinous crime we can all agree is very bad: puppy murders (I'm using a silly example so people won't get distracted by a specific crime, but you can replace puppy murders with a small or large crime or immoral act of your choice). Tony makes an FOI request to find all the people convicted of puppy murders over the past decade. He then finds which of these have Facebook pages that are openly available. He creates an interest graph of their listed interests, and shows that puppy murderers tend to have a number of interests in common. He blogs this, just out of interest.
Someone else then comes along and finds all the people on Facebook who also have these interests, and publishes a list of 300 people who have 'puppy murderer' type interests. One of these people, although entirely innocent of any puppy mistreatment, is attacked by a mob who accuse him of being a puppy murderer.
Now, this has used only openly available data, publicly and knowingly shared by the individuals. But by taking it and creating a new interpretation of that data, new knowledge has been generated which the original posters could not have foreseen. The new form of this knowledge then carries an ethical dimension. This is obviously an extreme example, but it illustrates the potential complexity of assuming all open data is fair game.