‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

  • kibiz0r@lemmy.world
    link
    fedilink
    English
    arrow-up
    22
    arrow-down
    8
    ·
    6 months ago

    I’m dumbfounded that any Lemmy user supports OpenAI in this.

    We’re mostly refugees from Reddit, right?

    Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: Youtube, a random Wordpress blog, a Github project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.

    But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.

    At the same time, Reddit slowly moved from a place where something may get posted by the author of the original thing to a place where you’ll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original instance, reformated to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced of whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.

    This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroborous of re-posts of re-posts. Automated remixes of automated remixes.

    There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. It was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.

    And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in-between the two, and they squeeze as hard as they desire, wringing it of every drop of profit, keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back to the community.

    This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.

    But.

    If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.

    • flamingarms@feddit.uk
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      2
      ·
      6 months ago

      And yet, I believe LLMs are a natural evolutionary product of NLP and a powerful tool that is a necessary step forward for humanity. It is already capable of exceptionally quickly scaffolding out basic tasks. In it, I see the assumptions that all human knowledge is for all humans, rudimentary tasks are worth automating, and a truly creative idea is often seeded by information that already exists and thus creativity can be sparked by something that has access to all information.

      I am not sure what we are defending by not developing them. Is it a capitalism issue of defending people’s money so they can survive? Then that’s a capitalism problem. Is it that we don’t want to get exactly plagiarized by AI? That’s certainly something companies are and need to continue taking into account. But researchers repeat research and come to the same conclusions all the time, so we’re clearly comfortable with sharing ideas. Even in the Writer’s Guild strikes in the States, both sides agreed that AI is helpful in script-writing, they just didn’t want production companies to use it as leverage to pay them less or not give them credit for their part in the production.