Episode 1: transcript
[INTRODUCTION]
[00:00:00] PC: We’re getting to a point of software development where it’s not so easy to put things in buckets anymore as to what a human wrote, or what a machine wrote. The concept may be easy, but I think the application might get really complicated.
[00:00:20] SM: Welcome to Deep Dive: AI, a podcast from the Open Source Initiative. We’ll be exploring how artificial intelligence impacts free and open-source software, from developers to businesses, to the rest of us.
[SPONSOR MESSAGE]
[00:00:34] SM: Deep Dive: AI is supported by our sponsor, GitHub. Open-source AI frameworks and models will drive transformational impact into the next era of software; evolving every industry, democratizing knowledge and lowering barriers to becoming a developer. As this evolution continues, GitHub is excited to engage and support OSI’s deep dive into AI and open-source and welcomes everyone to contribute to the conversation.
[INTERVIEW]
[00:01:02] SM: I’m Stefano Maffulli. I’m the Executive Director of the Open Source Initiative. Today, I’m talking to Pamela Chestek, a lawyer with extensive experience in open-source and a board member of the Open Source Initiative. She also practices in trademark, copyright, advertising and marketing law. Thanks, Pam, for joining. Let’s jump right into it, Pam. From our virtual hallway conversations, I know that you have some very clear opinions about copyright on materials that have been created by machines. Can you share more about your thoughts on this front?
[00:01:35] PC: I just want to start off by saying that I’m speaking from the perspective of a United States copyright lawyer, under United States law. I think this is an area that may turn out to be quite different in different jurisdictions. I’m just speaking about what I know. The US has been pretty clear about what works are subject to copyright. They have been very clear for many, many years, long before computers, that copyright only exists when a work was created by a human author. This goes back quite some time. Probably the most famous example that people might be familiar with was the monkey selfie, where a photographer claimed that a monkey grabbed his camera and took this really charming photo of this big grin by a monkey.
Then when he filed a copyright application to register the copyright in that photograph, the copyright office rejected it, because there had been so much publicity that this was the work of the monkey, not of the person. The story changed over time, where the person claims to have contributed more copyrightable content than in the story as originally told. Actually, Wikipedia took him to task on it. Wikipedia did a great deal of investigation on this, and reached the conclusion that it was not copyrightable, because the photo was taken by a monkey.
Another example was someone who wanted to register a copyright in a work, where they said that they had not written it, but instead, the Holy Spirit had channeled through them to write this copyrightable work. The copyright office rejected it and said, “No, I’m sorry. It’s not written by a human author. We can’t register the copyright in it.” I forget whether these are copyright office decisions, or lawsuits that the copyright office has now incorporated into its guidance. Don’t hold me to it if I had it backwards.
[00:03:20] SM: To be clear, is the Bible out of copyright, because of that?
[00:03:28] PC: The Bible, because of the time. Actually, I don’t know enough about the Bible to say that. Certainly, it’s out of copyright because of the time lapse, because of the time period. I don’t know how many chapters were dictated by God, versus someone’s retelling of what God told them.
[00:03:42] SM: There’s still God involved. Definitely, computers are now gods in this case. I was also thinking about machines, programs, like paintings done by swinging a pendulum with a bucket of paint on it. I mean, at that point, there is the person pushing the bucket.
[00:04:01] PC: Yeah. Actually, the copyright office, there also is this – The standard for protection by copyright, as set by the Supreme Court, requires originality and creativity. The copyright office will refuse registrations if they don’t believe that the work has sufficient creativity and originality. I have personally experienced this when I was trying to register a copyright for a monumental sculpture, a site-specific monumental sculpture. The copyright office refused to register it. This was actually really a quite famous sculpture. The copyright office refused to register it, saying it wasn’t creative enough.
The copyright office does find itself, as much as they claim not to, they do end up in the role of arbiter of what is an artistic work and what is not. That’s another facet in that example: if I just push a pendulum and it goes on its own after that, is that creative? I could talk a long time about this, because I believe the standards are quite different depending on what work it is. Photographs are very easily considered copyrightable works, even though you just push a button. Just the development of the law around photographs protects those quite easily and other works, not.
You alluded to maybe – there is this issue of this complexity that we use computers to create works of art. It can’t be simply that a machine was involved. That can’t be the dividing line on whether or not something is copyrightable, because I use Inkscape, or GIMP, to create works. The copyright office does have guidance on where the dividing line is. I’m going to read a long paragraph, and please forgive the reading and the length of the paragraph. This is actually based on a statement made by the copyright office in 1966. Think about that. This is from the copyright office’s own guidance on how to do registration, called the Copyright Compendium.
It says, “The office will not register works produced by a machine, or mere mechanical process that operates randomly, or automatically, without any creative input or intervention from a human author. The crucial question is, whether the work is basically one of human authorship, with the computer or other device merely being an assisting instrument, or whether the traditional elements of authorship in the work (literary, artistic, or musical expression, or elements of selection, arrangement, etc.) were actually conceived and executed not by a man, but by a machine.”
That’s theory, right? I think that’s the theory. It sounds bright line, maybe. Where is that line between what is the human being doing versus what the machine is doing? This is going to be the battleground for the copyrightability of works that are self-modifying based on input, based on the machine learning work.
[00:06:58] SM: Got it. Something like a tool that has been lately in the news, this software from the OpenAI organization called DALL·E. Basically, you feed it text, a description of something, like a sunset on a beach, and the machine is capable of generating art based on that text, representing something that looks like a sunset on a beach. I’ve seen experiments of Twitter bios represented as art by this software called DALL·E, and they’re beautiful, to the point where there was a conversation on Hacker News from a young artist. He wondered like, “With the output that I’ve seen from this machine, I’m probably going to be out of business.” It’s pretty clear that the art produced by DALL·E is not copyrightable, right? That’s very easy.
[00:07:52] PC: Yeah. Yeah, I think so. Yeah.
[00:07:54] SM: Now, what’s interesting for me is what happens behind the scenes, like for DALL·E to be considered, or something like DALL·E to be considered open-source. Now, that is a question that fascinates me. Because certainly, in order for DALL·E to be trained, to read and to generate art, it had to look at a lot of art, like interpret, look in a weird way as a computer. By doing that, it needed to – the algorithms that end up generating the DALL·E output are an output of machine learning, training machines, learning by themselves. Is that copyrightable or not?
[00:08:35] PC: Yeah. This is, I think, where the complexity comes in. To walk through how this all comes about, how the software develops, someone wrote a software program about taking input, and then either analyzing that input, creating rules, or creating some model. Certainly, the software that a human being wrote in order to create the ultimate DALL·E system, that certainly was copyrightable. To the extent that we can also divide it into, well, “Here’s the software and here’s the data,” algorithms acting on data produce a result. Running data through an algorithm and producing a result, I don’t think the result is going to be considered copyrightable.
Where I think it gets interesting is where the software itself is modified as a result of what it has learned from the data that it has been given. As I understand it, and I’m not a software engineer, we’re getting to a point of software development where it’s not so easy to put things in buckets anymore as to what a human wrote, or what a machine wrote. The concept, it may be easy, but I think the application might get really complicated.
[00:09:53] SM: Right. Yeah, that is exactly my fascination. I’m not a software engineer either. I’m a mere architect and I’ve been an observer of this world for a long time. I do remember at one point, my very little diving into, or using AI from a more advanced perspective has been when I was putting together a mail server in the past. I was installing SpamAssassin. I never really thought about it. SpamAssassin is a fairly simple machine learning system, where the software itself, developed by the Apache Software Foundation, is packaged by Debian and it’s fairly – it’s simple to install. It’s apt-get install spamassassin.
Then what you do is you feed it with your set of good emails, the ‘ham’, and the bad emails, the ‘spam.’ Then there are some other components. Fundamentally, that’s what it is. You train the model, you train SpamAssassin to tell your set of emails that are good from the ones that you don’t want to be approved. Then it creates rules. Based on those rules, it will apply the filters. Fairly simple. It’s in Debian.
Now, in that context, I do understand that the machine, after being fed the spam and the ham, generates a model. That model is generated by the machine. Is that copyrightable or not? Usually, you don’t package those in Debian, because everybody has their own spam. It’s fairly simple. I never thought about it. It can be simple to reproduce in any case.
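[EDITOR’S NOTE]
To make the ham-and-spam training step concrete, here is a minimal sketch of a word-count (naive Bayes style) filter. It is an illustration only and is not SpamAssassin’s actual code; the function names, the whitespace tokenizer and the sample messages are all assumptions made for this note. What it tries to show is the split discussed in the interview: a human writes the program and picks the training emails, and the “model” is then just a table of counts computed mechanically from those emails.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude, hypothetical tokenizer)."""
    return text.lower().split()


def train(ham_messages, spam_messages):
    """Count word occurrences per class. The returned dict is the 'model'.

    Assumes both lists are non-empty.
    """
    model = {
        "ham": Counter(),
        "spam": Counter(),
        "n_ham": len(ham_messages),
        "n_spam": len(spam_messages),
    }
    for msg in ham_messages:
        model["ham"].update(tokenize(msg))
    for msg in spam_messages:
        model["spam"].update(tokenize(msg))
    return model


def spam_score(model, text):
    """Return log-odds that text is spam; a positive value leans toward spam."""
    vocab = set(model["ham"]) | set(model["spam"])
    ham_total = sum(model["ham"].values())
    spam_total = sum(model["spam"].values())
    # Start from the prior: how much spam versus ham the human supplied.
    score = math.log(model["n_spam"] / model["n_ham"])
    for word in tokenize(text):
        if word not in vocab:
            continue  # Words never seen in training carry no evidence.
        # Add-one smoothing avoids log(0) for words seen in only one class.
        p_spam = (model["spam"][word] + 1) / (spam_total + len(vocab))
        p_ham = (model["ham"][word] + 1) / (ham_total + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


# The only human choices here are the program above and these examples.
ham = ["lunch meeting moved to noon", "please review the attached patch"]
spam = ["win free money now", "free prize claim now"]
model = train(ham, spam)

print(spam_score(model, "claim your free money") > 0)    # leans spam
print(spam_score(model, "please review the patch") < 0)  # leans ham
```

Changing the training emails changes the counts and therefore the scores, which is why each mail administrator ends up with a different model from the same software.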
[00:11:27] PC: From your description, the models, I think the models would tend to fall in the line of not copyrightable. Because the creative aspect of the work is in the software. You now then feed that software data, and then it spits out – and then the machine figures out, the software figures out what the model should be. You aren’t making any artistic, or creative choices, or active choices. Where, I guess, as an aggressive lawyer, there is a concept under US copyright law, and this applies to databases or collections of information, which is that there can be copyrightability in the selection, coordination and arrangement of information.
To give an analogue example, if I choose to publish an anthology of a poet’s works, and I want it to be a complete anthology, I do not have a copyright in the anthology, in the overarching work. I don’t own the copyright in the poetry, but I also don’t own a copyright in the selection, coordination and arrangement, because there was no creative choice there. I simply identified every single work of the author and included it. Now if instead, I had said, “Well, I want to choose works of this author that are of a specific — that all talk about, say, sadness.” I go through and I select all of the poems that I think fit the selection on the creative – this creative choice that I’ve made.
Then I put them in a certain order. I don’t necessarily put them in chronological order. I put them in an order of happiest to saddest, or something. That may cross a line where that is considered copyrightable. Because there is creativity in that selection, that coordination, selection and arrangement. That applies to databases, so there is some protection for databases in the United States. The reason I’m hesitating on the model is, my argument would be, “Well, I made a creative choice in selecting what ham and spam I was going to use for training.” Therefore, this model as a result of that training is back to this concept of, was the work done by the machine, or was the work done by the human? I would say it’s on the human side of the line, because I chose the spam and ham to use to train.
[00:13:33] SM: Got it.
[00:13:34] PC: That’s the argument I would make. I don’t know how successful it would be, but I’d make it.
[00:13:39] SM: That makes sense. Because if it is not covered by copyright, then what happens? Is that considered completely public domain?
[00:13:48] PC: It’s just not protected by copyright. It’s interesting, we’ve reached, I think, a place in our society where we have this copyright maximalism going on, where there is this belief of, if I created it, therefore, I have some exclusive rights to it. That just isn’t true. There are works that just, they’re not protected by any regime at all. You may have created it, but everybody gets to use it, because it just doesn’t for whatever reason, it’s not subject to copyright protection.
This is where I think it’s going to – it also gets really interesting and maybe counterintuitive and difficult for people to accept and maybe will change over time. The Supreme Court has been very clear that what they call the ‘sweat of the brow’ is not enough for a work to be copyrighted. It doesn’t matter how hard you work on it, or how much time, money and effort you put into it, if there is not this creativity and originality. Those are the hallmarks of a copyrightable work. Sweat of the brow, putting a lot of time and effort into it, is not enough. That’s where I think it gets really interesting, because obviously, a lot of time and energy is spent on machine learning, on tweaking the models.
I mean, we know from experience now that the image generation, or image recognition software has a very bad database that it started the training with, and that’s causing problems. Just throw into the pot, also, this concept that sweat of the brow is not enough to make it copyrightable. No matter how hard you work on it, that doesn’t make it copyrightable.
[00:15:20] SM: Got it. No, that’s very interesting. You touched on a very important point, because in the end, if something is not copyrightable, then we may not have an easy way to understand whether it’s open-source or not.
[00:15:32] PC: If I could just jump on that. Because I think that’s an interesting point, which is, it forces a real examination of what is open-source? What are our priorities? What are we trying to achieve here? The fact that something is not protected by copyright, does that get us where we wanted to be in the first place anyway? If we think of the open-source licenses and particularly copyleft licenses as being a hack on copyright that was necessary, because maybe software shouldn’t be copyrighted at all, then is it possible that not having copyright protection for these works actually is the best solution? It’s actually a great outcome for us. Then, you also don’t have that license as an instrument of control for purposes of good. You’ve given up control.
[00:16:19] SM: It’s a very interesting conversation, because of the dichotomy that’s been hard to explain. I remember having conversations with the early European Pirate Party members, who were completely against copyright. We had that tension: not having copyright applied to software means that copyleft also becomes without teeth. Going back to open-source for artificial intelligence, one of the things that I noticed is that, for example, in Debian, there are some conversations going on inside the Debian community about whether they need rules to decide what packages they can import into the Debian archives.
Because on one hand, it’s fairly simple to say PyTorch, TensorFlow, NumPy, the basic software pieces that implement some interesting algorithms, or natural language processing, text processing, and things, and computer vision. Then, there are some models that are necessary in order for science to progress. Some of these models are not available with licenses that can be easily interpreted. There are conversations about even the basic feed, the big data sets that go into training models.
There are conversations about whether we need a definition, or we need some help to understand what can go and can be shipped into a Debian package. Do you have any thoughts on those?
[00:17:48] PC: I have been confronted with these questions. They’re starting to pop up more. What I’m actually finding more troubling is the data sets that are used to start. I actually have people who say, “I don’t care about their models,” or “We’re going to do our own modeling, so we don’t need those models as much.” We’re going to do our own modeling. Some of this data is going to be copyrightable content. First question is, do I have permission from the copyright owner to use that data in this way?
As an example of copyrightable content, photographs. I don’t know where all of the data is, where all of these photo sets are coming from that are used for training. The subject matter that is being used for training is subject to copyright. Is all of that data allowed to be used? Was that used with permission? Then the question that follows after that is, if not, what does that mean about the model? If I use data that I shouldn’t have used, that I didn’t have permission to use to model, does that taint my model?
Let’s say, the model was put under an MIT license or something, that it’s freely available. Is that okay? Is that just because the model has been sanitized enough that I can use the model? If I don’t know the quality of the content that it was – If I don’t know the provenance of the content it was trained on? That’s where I start to get my head. I can’t get past the dataset.
[00:19:16] SM: Absolutely. Because one of the things that I learned doing some research for this series is that the European Union has a new right, has introduced a new right, the right to data mining. They have turned it on by default, which is surprising. This was in response, from what I’ve read so far, and we’ll have guests explaining this to us a little bit more. It looks like the European Commission was convinced by researchers that some of the – think of the archive of images on Flickr, which implemented the Creative Commons Licenses early on. There is a very wide array of pictures with lots of metadata and tags and with freely available licenses. Only, it wasn’t clear whether the data mining was included in there.
As a human being, as a citizen of countries, I think of myself – like my face now is up there. It can be used for nefarious uses, not just to identify a white man in a picture. There are those implications. That I think, are tied to the Open Source Initiative and the Open Source Definition somehow. Because in many aspects, even though we don’t – we try to be as an organization, we try to be neutral, we do have organizations that rely on open-source to set also a stage of technology that can be implemented without discrimination. Now, we have artificial intelligence systems that are capable of deciding whether someone gets out of jail or not.
In the past, we would have said, “You have to make that code open-source, because it’s public. It’s used by the public. You have to make the code open-source, so we need to be able to inspect it as the public and we should be able also to demand fixes.” Now, with AI systems, things get a little bit more nebulous.
[00:21:10] PC: I don’t know if they get more nebulous or not. One of the things when you mentioned the data mining, and correct me if I’m wrong, somewhere in the back of my head is that the permission under the EU law for data mining, though, is for non-commercial use only. The OSI is very clear that we do not discriminate against commercial uses or non-commercial uses. As you’ve just explained, there’s a reason for that, which is the line drawing gets very difficult. The good and evil question, it’s insoluble as far as I’m concerned. We have to take this position of, “We’re not making value judgments on how this stuff is used.”
Knowing in particular that there are problems with models that were created from flawed databases and how harmful that’s going to be, there certainly is a big part of me that wishes that we could say, “No, it is not consistent with our belief system, that these should be used.” We have these non-discrimination principles. I’ve always been very clear that OSI software can be used for evil. Maybe I’m too rigid in my thinking, but I just don’t see any way to draw a different line for models.
[00:22:22] SM: I don’t think that it’s the role of the OSI necessarily to be the judge of that. We definitely have been helping people who have been involved in policymaking and policy discussions. I’m thinking like, EFF, or other organizations like that, for whom having the software available with an open-source license is some baseline to accept for filing taxes, for example. Or the Free Software Foundation Europe has this campaign that has been going on for a long time: Public Money, Public Code. If it’s funded by taxes, then any software development should be free as open-source. We want to have that conversation about what is an open-source AI, or at least some groups will want to hear that. Maybe at one stage, we’ll have to have that conversation.
[SPONSOR MESSAGE]
[00:23:16] SM: Deep Dive: AI is supported by our sponsor, DataStax. DataStax is the real-time data company. With DataStax, any enterprise can mobilize real-time data and quickly build the smart, highly-scalable applications required to become a data-driven business and unlock the full potential of AI. With Astra DB and Astra Streaming, DataStax uniquely delivers the power of Apache Cassandra, the world’s most scalable database, with the advanced Apache Pulsar streaming technology in an open data stack available on any cloud.
DataStax leads the open-source cycle of innovation every day in an emerging AI everywhere future. Learn more at datastax.com.
[INTERVIEW CONTINUED]
[00:23:56] SM: You mentioned you have clients working on machine learning. What kind of issues are they running into?
[00:24:02] PC: I don’t want to share too much. One commercial situation, what I have found interesting in the commercial – for commercial clients is it is a significant one. I have a client who’s a service provider for another company doing some machine learning work for them. There is this question of ownership and who’s going to own it and reuse, and reuse of data. For example, the customer might say, “Well, if you’re using my data, here’s my dataset that I want you to evaluate, and come up with some modeling. But you can’t use this dataset for anyone else.” Because they’re trying to get a commercial edge, right? They’re trying to get a market differentiator for themselves. They think that they can do that by limiting the dataset.
It actually is, I think, very similar to commercial software development, versus an open-source software development model, where it’s proprietary development, and you’re going to keep doing the same thing over and over and over and over again, if you’re not going to share your work product, or you’re not going to use other people’s work product. I think that the same thing is going to happen here: okay, I’ll just retrain. I’ll do the same thing with somebody else’s dataset, which may look really, really similar to your dataset.
Now, I guess, there are implications there, too. Maybe, are we better off using more data, rather than restricting data? Are we going to come up with better models if we use more data, rather than having to reinvent the wheel every time? Yeah, that I found interesting.
[00:25:21] SM: Yeah. It’s exactly the early conversations we had when free software and open-source software were spreading. Why are you reinventing the wheel? Why is everybody working on a different kernel and different Unixes, with variations and dialects? Why don’t you just collaborate and put all of your energy into one and build it faster? We’re probably going to get to that point. Are you hopeful about AI being a force for good and a way to progress faster with open collaboration, or are you more on the scary front of Robocops and Skynet?
[00:25:59] PC: That’s a really great question. I don’t have an answer for it, because my level of trust would come down to who’s doing it. We’ve seen people of goodwill who understand the problems and are cautious of the problems. I remember there was one – there was a Twitter bot that turned racist in about eight hours. In almost no time, it was degrading racist slurs, and they had to take it down. That, of course, gives me great pause. I think we do see that these tools are being used prematurely, being relied on in ways that are harmful to us by the police, or by the prison system.
These are tools that haven’t been adequately tested. We think that Minority Report will exist, and we can make predictions about people, about whether they’re going to commit crimes before they commit them. That part of it is terrifying. There are a lot of people who recognize that these problems exist. We’re still at early stages, and we’ll see what happens.
[00:26:57] SM: I agree with you. It all depends on who’s going to be able to guide and gain the trust. So far, I’m a little bit nervous, because from what I’ve seen, AI at the level of the tools that we have seen so far, like DALL·E and the most awe-inspiring, the ones that really make you go “Wow” require an amount of data and an amount of processing power that is really not available for the Debian developer. The kinds of software development that we used to do used to be accessible 20 years ago to create a full distribution, perfectly Unix capable machines and servers doesn’t seem to be readily available in the same way with AI systems.
I’m also hoping that some of these conversations that we’re going to have in the next few weeks will be revealing some hope and some path forward. Because I really like to see the light around academic evolution, for example, all the research. I had very interesting reads, and I recommend them if you haven’t read them: the papers that have been released by the Free Software Foundation on Copilot analysis. Some of them are extremely thoughtful and eye-opening. At least, they were for me since I’m so new to the field.
[00:28:19] PC: Something occurred to – I was thinking about this yesterday and the day before, when I received a document for review that was written in French. This is a legal document that was written in French, and had simply been put through a machine translator for the English translation. The fact that we have reached this point, because I remember, and this is within my memory, when at the very earliest, Babel Fish was a website where you could – That was phenomenal. It’s phenomenal that you could actually get anything, as unintelligible as it was.
We’re to a point now where we rely on the machine – the machine learning, I think, is probably always the first step in any translation at this point. Then it may be reviewed by a human to make sure that it’s cogent and understandable. Sometimes not. Sometimes in the work that I do, it’s probably close enough. That there may be some syntax problems, but I get the gist. I just think of that as an example of where we may be going with machine learning. We’re still at that very early stage right now of, “Yeah, I can get the gist of it. It’s not great” but we will be getting to a point where it’s just going to be part of the ordinary fabric of our lives as to rely on all of this machine generated content, or decision-making by machines is very interesting.
[00:29:34] SM: Like that young person was saying this morning on a forum online, “I’m a designer. I see myself going out of business soon because of DALL·E.” Transcribers, people who transcribe text, and translators, they’re basically already out of business as of today. GPT-3, the text processing from OpenAI, and other huge projects from them are capable of also summarizing text and writing very basic marketing copy. A lot of the low-level jobs in creative fields can go away. It’s a fascinating world.
[00:30:16] PC: The Associated Press uses some machine-written content. For simple reporting on, say, a company’s earnings, they use machine-generated copy for those.
[00:30:26] SM: The LA Times has a bot that writes little snippets about earthquakes.
[00:30:33] PC: I mean, I still have hope that we will always be able to tell the difference, that there is a difference between what a machine will generate and what a human will generate. Maybe there is only a slight difference, but that we’ll always have the upper hand.
[00:30:45] SM: Again, with DALL·E, in some of the readings that I’ve done over the weekend, people were noticing little things that would let you tell whether it’s generated, or an artist did it. But nothing that you can’t just touch up in Photoshop to fix.
[00:31:03] PC: Yeah. I think it’s also, and back to the subject matter of copyright is original and creative, is it’s sometimes referred to as the creative spark that the artist has this creative spark. By definition, a machine is not going to have a creative spark. There’s hope for us, I think.
[00:31:20] SM: Is there anything we should be talking about, you think?
[00:31:24] PC: I talked about the concept — this conflict between there is this huge amount of work being done, that it appears it may not be protected by copyright, or certainly, there are arguments that it is not protected by copyright. Yet, it’s a substantial amount. It will be a substantial value proposition to a company to have that. It doesn’t look like, where the law currently stands, I would say, they may not have exclusive rights to that.
What does that say about their business model? How are they going to make money on it? Having worked at Red Hat, a question I was asked all the time, and I still get asked. I haven’t worked there in many years. When I say I worked at Red Hat, the first question out of everybody’s mouth is, “How do they make money selling free software?” Red Hat has figured out a pretty good business model to make a fair amount of money doing it. Because Microsoft was built – early software companies were built on exclusivity of copyright. You have to pay them to get to use their copyright. That’s a business model.
Those of us who have been in open-source have been thinking creatively about business models, because that one’s not available to a company that’s – Although, there are very few purists, right? Most companies are doing a combination. They’re doing this open-core, which is a loss leader on the open-source, but then they’ll sell you a license to proprietary widgets. A true, pure open-source play is very uncommon and very difficult and challenging. It may be, when all this work is not protected by copyright, people will be scrambling for how do I monetize this? What is my business model around this thing for which I have no exclusive rights?
Those of us who have been thinking about that for decades might be able to help them out with that. Maybe they’ll come up with new models. Maybe they’ll come up with stuff that we’ve never thought of, which would be really great, too. I think that’s going to be really challenging for people. How do I monetize this?
Access is, if you don’t have copyright, the second way you do it is access. This, for example, is the museum. There’s no copyright. You can’t take pictures in our galleries. There’s no copyright. You’re not infringing the copyright in most works, other than more modern ones. What they do is, it’s a condition of permission to access the work. It’s a condition on your entry into the museum that you do not take photographs. That is a gatekeeping that is used in open-source business models: we’re not going to give you the executable until you pay us, then we’ll give it to you.
I expect our reliance on some gate, access gate will be one way. Now with cloud, where you don’t have to give people a copy of the software. You just give them a portal to access it. Then that access is much easier to control.
[00:34:07] SM: Like the OpenAI model seems to me like what they’re doing, they have built this great machine and then they charge by API access, maybe. There is another area of algorithms shaping things and moving stuff. It’s been fascinating to me. It’s less tied to the open-source part, but more in the general public conversations, and that’s the algorithms in Twitter, or Facebook, or LinkedIn that decide what items you’re interested in. Recently, there have been some conversations again about Twitter having this pie in the sky, new project called Bluesky. They want to open-source their algorithm in a very wide sense. Do you have any opinion?
[00:34:54] PC: I think, maybe, whether by open-source, whether we mean simply visibility into what the algorithm is, versus – I mean, maybe they would be willing to let other platforms use their algorithm. That would be very interesting. Particularly when we talk about, “Do I trust Twitter to filter content correctly?” I hold a lot of skepticism. If that model were to be shared publicly amongst all of the social media platforms, so that they could each tweak it, to come out with a better model? Under open-source development theory, that would be a better model, right? Because we’re not relying on just one person’s or one entity’s judgment. We’re getting judgments from – consensus from a lot of people. Yeah, that is interesting. If they really, truly mean open-source process, which is probably not what it is.
[00:35:42] SM: Well, yeah. Exactly. I don’t think that there is a very good definition, or a very good understanding of what they want to achieve. I’ve read some of their papers and they seem to be well-intentioned. Also, thinking about distributing it so that no one entity owns access to the information, or the algorithm itself. We’ll see. It’s something worth keeping an eye on.
[00:36:05] PC: It brings me back to this concept, when I was talking about, is the model copyrightable? Is it under a license? Which is, what happens is we tend to put a license on – if we don’t know whether or not the content is protected by copyright, but we want other people to use it, we put a license on it, because that’s very clear. I just throw that out there, because there’s a downside to that, which is by putting licenses on everything, then we reduce the pool of what we’re going to assume is freely available for everyone to use. It’s in the public domain. There’s no copyright, for whatever reason.
By putting a license on something, you’re saying, I think this is copyrightable, and you need a license to do that. There is a negative consequence to doing that. There’s a reason it’s done, but there’s also this negative consequence. I’m just going to throw that out there as the same concept of algorithm as they would put a license on it, and that would make everybody happy, but we’ve just now made a public statement that says, we believe this algorithm is copyrightable, owned by one entity.
[00:37:06] SM: Thank you so much.
[00:37:08] PC: It’s such a pleasure.
[END OF INTERVIEW]
[00:37:10] SM: Thanks for listening. Thanks to our sponsor, Google. Remember to subscribe on your podcast player for more episodes. Please review and share. It helps more people find us. Visit deepdive.opensource.org, where you find more episodes, learn about these issues, and you can donate to become a member. Members are the only reason we can do this work. If you have any feedback on this episode, or on Deep Dive: AI in general, please email contact@opensource.org.
This podcast was produced by the Open Source Initiative, with help from Nicole Martinelli. Music by Jason Shaw of audionautix.com, under a Creative Commons Attribution 4.0 International license. Links in the episode notes.
[END]