
The darkish side of open licences

A few weeks ago Eamon Costello highlighted that Taylor and Francis/Routledge had sold Microsoft the rights to harvest their content for AI training. I, like many others, felt a sense of outrage (although I haven’t published with them for quite a few years). I think it is the sense of powerlessness that is frustrating: they can do this, and make even more money from your content, without any consultation. However, I also reflected that, as someone who ran an open access journal and publishes under open access licences, Microsoft (and any other AI harvester) could have been doing this happily already, without any need to consult me.

I remember a while ago some authors being affronted that their articles were appearing on Course Hero or Chegg. But these were openly licensed works, and CC-BY rather than CC-BY-NC, so there was really no comeback. In the radio conference session on blogs last week, Jim Groom also said he felt uncomfortable about all his blog content being used to train AI, but accepted that it was an aspect of openness.

This is not to say “suck it up buttercup”, or to present a pro-AI case, but for those of us who have advocated for open access and open scholarship, it does represent a tension. The point of CC-BY was that it allowed the greatest freedom of reuse. People used to get upset about even adding the Non-Commercial (NC) clause because it would restrict some usage. The unpredictability of reuse was part of the appeal. But that also encompasses usage you may not always like. I am not a lawyer, but I think unless someone is using your content to misrepresent you, there is no legal recourse over usage by AI or sites such as Course Hero – you can ask for it to be removed, I guess, but I don’t think there is a legal obligation to comply, since they haven’t breached the copyright licence. There is no “CC-BY for uses I like” licence. (Someone please correct me if I am wrong about any of the legal aspects.)

This presents a quandary for open scholars – do you continue to advocate for open access for everyone, and at the same time accept that you are feeding the machine? Do you accept AI as inevitable and hope your content in some way adds to its quality (I mean, I’m not sure what my random metaphors on here will do to the learning models)? Or do you seek to control content with more specific licences that might prohibit harvesting by AI but allow human access?

I think I’m coming down on the side of accepting that openness means content being used by AI or whatever sites see fit to harvest it. That may not be ideal, but the benefits of open access still outweigh the downsides for me, though I accept others may come to a different stance.

Plus, I obviously like playing with AI a bit myself, as the image above shows, and hey, maybe I can get AI to write the next version of Metaphors of Ed Tech while I sit on the beach sipping cocktails. That’s how it works, right?

[UPDATE] – Coincidentally, about an hour after I posted this I got a newsletter from Creative Commons, specifically addressing AI issues. In it they state that they have “started exploring the possibilities of preference signals as a means for individuals to indicate the terms (attribution, non-commercial, research purposes etc.) by which their work can or cannot be used for AI training. A tool like this would function much like the CC licenses do!” So, maybe this post is redundant! I’m not sure how such a tool might work, but it seems like the right direction. They provide more info here.
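(For the technically minded, here is one purely speculative sketch of how such a preference signal might work, loosely modelled on the way robots.txt works: a site publishes a small machine-readable statement of its AI-training preferences, and a well-behaved harvester checks it before ingesting anything. To be clear, none of this is anything CC has actually proposed – the file name, fields, and values below are all my own invention.)

# Purely hypothetical sketch of a "preference signal" check, loosely
# modelled on robots.txt. The file name (ai-preferences.json) and its
# fields are invented for illustration; they are not part of any actual
# Creative Commons proposal.
import json
from urllib.request import urlopen

def may_train_on(site: str, purpose: str = "commercial") -> bool:
    """Check a site's (hypothetical) AI-training preference file."""
    try:
        with urlopen(f"https://{site}/ai-preferences.json", timeout=5) as resp:
            prefs = json.load(resp)
    except (OSError, ValueError):
        # No signal published (or unreadable): fall back to whatever
        # the content licence itself allows.
        return True
    # Expected shape (again, invented): {"ai_training": {"allowed": true,
    #   "purposes": ["research"], "attribution_required": true}}
    training = prefs.get("ai_training", {})
    if not training.get("allowed", True):
        return False
    purposes = training.get("purposes")
    return purposes is None or purpose in purposes

# A compliant harvester would call may_train_on("example.org", "research")
# before adding a site's pages to a training corpus.

Whether CC’s actual tool ends up looking anything like this I have no idea, but the appeal is the same as the licences themselves: a simple, standard way to state your terms once and have everyone’s tooling respect them.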

6 Comments

  • Laura

    This feels like a growing area of debate, and raises interesting questions about open vs public domain vs commons… Will ‘open’ stay as it was and we’ll get new ways of talking about sharing? Or will open evolve somehow?

In recent conversations (with Bill Thompson amongst others) I’ve wondered how well the early wave of open (2008-2015ish?) succeeded. Most of the movement wasn’t “open for the sake of open” – open was a means to an end or ends: empowerment, education, fighting corruption, and so on. Did we… achieve that? Should we look back with pride or challenge ourselves to learn from mistakes?

    As the new waves of related activism (AI ethics, trust and safety, say) are building, what should be taken forward and what should be left behind?

    (No answers, not even sure these are the right questions, just ponderings.)

    • mweller

Hi Laura! These are all good questions, and like you I don’t have the answers. I think we sort of achieved some of that empowerment, but there’s always a crappy side to these things too. In the update I note that CC are looking at new types of licences, so maybe that will help.

  • Autumm Caines

    Hi Martin,

Eamon Costello pointed me to this post after I posted to bsky asking for reactions from open educators about OpenStax partnering with Google to integrate the whole catalog into their Gemini model.

    I’m thinking about all this prepping for an upcoming keynote at the Michigan OER conference. I’m grateful to find your post here and will likely end up citing it.

Through this research I’m coming to realize that it is not so much the AI training on our articles that is the problem as the occasional verbatim regurgitation of content. I used to think this was not possible, but it turns out it is happening more than we realized. Companies say it’s a bug and they’re working on it.

Apparently here in the US we have something called “non-expressive use”, which is the difference between training on your text (copyrighted or open), which has been established as fair use, and output that contains your text unattributed, which would be an infringement.

    Anyway thanks for the post. This is helpful.
