Source: https://front-end.social/@fox/110846484782705013

Text in the screenshot from Grammarly says:

We develop data sets to train our algorithms so that we can improve the services we provide to customers like you. We have devoted significant time and resources to developing methods to ensure that these data sets are anonymized and de-identified.

To develop these data sets, we sample snippets of text at random, disassociate them from a user’s account, and then use a variety of different methods to strip the text of identifying information (such as identifiers, contact details, addresses, etc.). Only then do we use the snippets to train our algorithms, and the original text is deleted. In other words, we don’t store any text in a manner that can be associated with your account or used to identify you or anyone else.

We currently offer a feature that permits customers to opt out of this use for Grammarly Business teams of 500 users or more. Please let me know if you might be interested in a license of this size, and I’ll forward your request to the corresponding team.
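
For readers wondering what the steps described in the statement (random sampling, disassociation from the account, stripping identifiers) might look like in practice, here is a rough Python sketch. It is not Grammarly's actual method, which the statement does not detail. The record shape, the function names `scrub` and `build_training_snippets`, and the two regex patterns are all assumptions made for illustration, and real de-identification is much harder than matching e-mail addresses and phone numbers. Deleting the original text is not modelled here.

```python
import random
import re

# Hypothetical patterns: two regexes are nowhere near real de-identification.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def build_training_snippets(records, k, seed=None):
    """Sample k records at random, drop the account link, scrub the text.

    `records` is a list of (account_id, text) pairs (an assumed shape).
    """
    rng = random.Random(seed)
    sampled = rng.sample(records, k)
    # Only the scrubbed text is returned; the account_id is discarded here.
    return [scrub(text) for _account_id, text in sampled]
```

Names, street addresses, and context clues ("my boss at the bakery on Elm Street") would pass straight through a filter like this, which is part of why claims of full anonymization are hard to verify from the outside.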

  • Jaded@lemmy.dbzer0.com
    1 year ago

    Models need vast amounts of data. Paying individual users isn’t feasible, and like you said, most of it can be scraped.

    The only way I see this working is if scraped content is a no-go and you then pay the website, publishing house, record company, etc., which kills any open source solution and doesn’t really help any of the users or creators that much. It also paves the way for certain companies to own a lot of our economy as we move towards an AI-driven society.

    It’s definitely a hot mess, but the way I see it, the more restrictive we are with it, the more gross monopolies we create for no real gain.

    • harmonea@kbin.social
      1 year ago

      Paying individual users isn’t feasible

      Sounds like their problem to solve, not mine.

    • kibiz0r@midwest.social
      1 year ago

      I don’t see why those are the only two options.

      We could update licenses like the GPL, CC, etc. so that they specify whether the author intends to allow their work to be used for LLM training. And you could still put a non-commercial or share-alike constraint on it.

      Hooray, open source is saved while greedy grubby hands are thwarted.