If I were to create my own instance federated with all the other instances, as of today, how much data would I be storing, since I would make a copy of all the content?

I know this will vary a lot, but I’m looking for a ballpark figure to have an idea. I don’t think it would be a lot, but I can’t find an estimate anywhere.

Reposted from https://lemmy.world/post/55030 as I think this community is probably a better fit

  • poVoq@slrpnk.net
    8 upvotes · 1 year ago (edited)

    Our smaller instance, which has been federating for a bit more than a year now (started in March 2022), is now at 2.4 GB for the database and 7 GB for image storage (which probably needs some clean-up from previous image spam waves).
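Taking these figures (2.4 GB database plus 7 GB images) at face value, a naive linear extrapolation gives a very rough ballpark for a small instance. This is a minimal sketch under loud assumptions: growth is treated as linear, and the federation period is taken as 12 months, although the comment only says "a bit more than a year". Other commenters in this thread expect growth to pick up, so this is a back-of-the-envelope number, not a forecast.

```python
# Naive linear storage projection for a small Lemmy instance.
# The sizes come from the comment above; the 12-month period is an
# assumption ("a bit more than a year" in the original comment).


def monthly_growth_gb(total_gb: float, months: float) -> float:
    """Average storage growth per month, assuming linear growth."""
    if months <= 0:
        raise ValueError("months must be positive")
    return total_gb / months


def project_gb(current_gb: float, rate_gb_per_month: float, extra_months: float) -> float:
    """Storage after `extra_months` more months at a constant rate."""
    return current_gb + rate_gb_per_month * extra_months


if __name__ == "__main__":
    database_gb = 2.4  # database size quoted in the comment
    images_gb = 7.0    # image storage quoted in the comment
    total = database_gb + images_gb

    rate = monthly_growth_gb(total, months=12)  # assumed federation period
    print(f"~{rate:.2f} GB/month")
    print(f"after one more year: ~{project_gb(total, rate, 12):.1f} GB")
```

With these inputs it works out to roughly 0.78 GB per month, or about 18.8 GB after one more year at the same pace.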

    • utzer [Friendica]@social.yl.ms
      2 upvotes · 1 year ago

      @poVoq @ndr this will surely grow a lot. If you look at other ActivityPub-compatible systems, you’ll see huge growth. It depends on the retention of “old” posts and media: if you just store everything for a year, you might keep it smaller, but if you want to replace #Reddit or the like, it would be better to keep things a bit longer. On the other hand, the #Fediverse is probably not meant to store things long term.

      On my Friendica node I have a rather short retention period for foreign posts and media, and my storage is only about 47 GB. Most of the media is stored in the database as well (easier and faster to back up, much slower to retrieve), and it is a single-user instance with just a handful of bots besides the account I write from.

      • Petri@sh.itjust.works
        0 upvotes · 1 year ago

        Keeping more of the history is probably a good thing if we want to replace Reddit. Think of all the homelab/server posts you’ve used that are over a year old. Good info can last a while.

        • utzer [Friendica]@social.yl.ms
          1 upvote · 1 year ago

          @sedawk probably… maybe threads can be closed and then some data can be removed.

          This is a weakness of the Fediverse platforms: they’re unlikely to keep all data forever (as some services do, at least until they’re gone for good, like Google Wave). Storage costs money, and the need will grow forever. Maybe some more cost-effective storage could be used for old data/posts/threads/media, just like the Internet Archive does: it doesn’t use fast storage, so it takes seconds to load old website versions. But that also seems like a big leap for amateur technology enthusiasts (which most admins of Fediverse systems are).

  • hal@lemmy.one
    6 upvotes · 1 year ago

    I’m gonna wait before I host a public instance. This is the first time most instances are dealing with a genuinely large user base.

    I would suggest that people don’t just willy-nilly start an instance without long-term planning for database growth and media storage.

    • Drew Got No Clue@lemmy.world (OP)
      4 upvotes · 1 year ago

      Oh yeah, I’m still learning and planning when/if I do it.

      OTOH, we shouldn’t scare people off doing it either LOL

      • hal@lemmy.one
        2 upvotes, 1 downvote · 1 year ago (edited)

        You are right. But giving users a bad experience is also not good for future reputation. Users are quick to attach a negative stigma to something they had a bad experience with.

    • JustinA
      1 upvote · 1 year ago

      Especially right now, when users and communities don’t have the ability to migrate between instances. If an instance dies, it takes them all with it.

      People should only create public instances if they’re 100% sure that they’re going to keep running it and paying for it.

  • cablepick@lemmy.cablepick.net
    6 upvotes · 1 year ago (edited)

    Do you plan on allowing other users? If so every image they upload anywhere will be hosted from your instance. You would need a long term plan for continued storage if you do.

    If it’s only yourself, then not much. You would need space for your uploaded images and the database. Worst case, you have to purge communities to free up DB space and re-add them. It only tracks communities from the moment you add them; it doesn’t pull the entire history (and the associated DB size) into your instance.

    I can pull the real numbers when I’m not on my phone, but my database is maybe 150 MB now. You can see all the communities I follow at https://lemmy.cablepick.net/communities to get a relative idea. My instance has been up for 4 days. I should start tracking DB size growth to give others an idea of what to expect.
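Tracking database growth, as suggested in the comment above, can be as simple as sampling the size on a schedule and fitting a rate. A minimal sketch, assuming you collect the samples yourself: on PostgreSQL, which Lemmy uses, `SELECT pg_database_size('lemmy');` returns the size in bytes, where the database name `lemmy` is an assumption about your setup. The `Sample` type and the sample values are hypothetical, not measurements from any real instance. A least-squares slope is used rather than just last-minus-first so that a noisy sample does not dominate.

```python
from dataclasses import dataclass

# To collect samples, one could run this PostgreSQL query periodically
# (the database name "lemmy" is an assumption about your setup):
#   SELECT pg_database_size('lemmy');   -- size in bytes


@dataclass
class Sample:
    day: float      # days since the instance started federating
    size_mb: float  # database size at that moment, in MB


def growth_mb_per_day(samples: list[Sample]) -> float:
    """Least-squares slope of database size over time, in MB per day."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_day = sum(s.day for s in samples) / n
    mean_size = sum(s.size_mb for s in samples) / n
    num = sum((s.day - mean_day) * (s.size_mb - mean_size) for s in samples)
    den = sum((s.day - mean_day) ** 2 for s in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


if __name__ == "__main__":
    # Hypothetical samples, not measurements from any real instance.
    samples = [Sample(0, 0), Sample(2, 80), Sample(4, 150)]
    rate = growth_mb_per_day(samples)
    print(f"~{rate:.1f} MB/day, ~{rate * 30 / 1000:.2f} GB per 30 days")
```

Early numbers like these mostly reflect the initial burst of subscribing to communities, so a rate measured over a few days will likely overstate long-term growth.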

    • Drew Got No Clue@lemmy.world (OP)
      2 upvotes · 1 year ago

      If I follow through with this, it won’t be very soon. I’m just gathering information to see whether it’s worth it.

      Anyway, I was thinking of keeping it only for me and a few more users, maybe (definitely not open for everyone).

    • JustinA
      1 upvote · 1 year ago (edited)

      One thing to note is that pict-rs supports storing your images in object storage (e.g. AWS S3 or Backblaze B2). That can save a ton of money on storage costs: 1 TB of images on B2 only costs $5/month.
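The arithmetic behind that claim is easy to check for your own image volume. A minimal sketch, assuming the $5 per TB per month rate quoted in the comment (prices change, so check the provider's current pricing) and decimal units (1 TB = 1000 GB). It ignores any free tier as well as download and API-call charges, which can matter for an image-heavy instance.

```python
# Rough object-storage cost estimate for pict-rs media.
# PRICE_PER_TB_MONTH is the figure quoted in the comment; treat it as an
# assumption that may be outdated.

PRICE_PER_TB_MONTH = 5.0  # USD per TB per month


def monthly_storage_cost(gb: float, price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Monthly storage-only cost in USD, using decimal units (1 TB = 1000 GB)."""
    if gb < 0:
        raise ValueError("size cannot be negative")
    return gb / 1000 * price_per_tb


if __name__ == "__main__":
    for gb in (7, 100, 1000):
        print(f"{gb:>5} GB -> ${monthly_storage_cost(gb):.2f}/month")
```

For the 7 GB of images mentioned earlier in the thread, storage alone comes to a few cents a month at that rate.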

    • animist@lemmy.one
      0 upvotes · 1 year ago

      Can one set up their instance to delete posts and associated media after x number of days?

      • cablepick@lemmy.cablepick.net
        3 upvotes · 1 year ago

        You would have to run raw database commands to clear followed communities’ posts. It would be possible with some experimenting. The only local media associated with non-local communities is generated thumbnails, so those would have to be identified and purged from pict-rs.
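The selection logic such a cleanup would need can be worked out before touching a real database. A minimal sketch, assuming hypothetical `Post` records with `community_local`, `published`, and `thumbnail` fields; these are illustrative names, not Lemmy's actual schema, so any real purge should be written against the actual tables and tried on a backup first. It picks posts from remote communities that are older than a retention window and collects their generated thumbnails, which would still have to be purged from pict-rs separately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class Post:
    id: int
    community_local: bool     # hypothetical field: is the community hosted on this instance?
    published: datetime       # timezone-aware publish time
    thumbnail: Optional[str]  # hypothetical field: alias of a locally generated thumbnail


def select_for_purge(
    posts: List[Post], retention_days: int, now: Optional[datetime] = None
) -> Tuple[List[int], List[str]]:
    """Pick posts from remote communities older than the retention window.

    Returns the post ids to delete and the thumbnail aliases that would
    still have to be purged from pict-rs.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    expired = [p for p in posts if not p.community_local and p.published < cutoff]
    post_ids = [p.id for p in expired]
    thumbnails = [p.thumbnail for p in expired if p.thumbnail]
    return post_ids, thumbnails
```

Local communities are deliberately excluded: your instance is the origin for that content, so deleting it would remove it for everyone who federates with you.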

  • Rick@thesimplecorner.org
    6 upvotes · 1 year ago (edited)

    EDIT:

    • A whopping 29 MB of database
    • Container volumes: 695 MB

    I looked at my instance last night; it only subscribes to communities on other instances. It had been running for 2.5 days at that point. My VPS with Ubuntu 22.04 is at 5 GB total. Next time I SSH in, I’ll look at the database size. I think I can confirm that pictures are still loading from the source instance (lemmy.ml went down briefly last night, and all my images from it stopped loading).

  • Knighthawk 0811@lemmy.one
    6 upvotes · 1 year ago

    It’s my understanding that you won’t make a full copy of everything. You’ll only copy the communities that are added, and you won’t be copying their full history.

    I’m unsure how far back it goes, but it might be just the one day. I saw talk of changing it to go back as far as someone actually scrolled and pressed next, but that was just an idea at this point.

        • pimeys@lemmy.nauk.io
          1 upvote · 1 year ago

          I actually kind of like how Akkoma does this. It creates a constant-size proxy with nginx, and all the images come from a predetermined host instead of from all over the net. It’s a good mixture of not using tens of gigabytes of space and still spreading the load a bit between the instances.
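The idea of a constant-size media proxy can be illustrated with a byte-bounded least-recently-used cache: remote images are fetched once, served locally afterwards, and the oldest-used entries are evicted once a size budget is exceeded. This is a sketch of the idea under assumptions, not Akkoma's or nginx's actual code (in nginx this is handled by the proxy cache configuration, e.g. the `max_size` parameter of `proxy_cache_path`); `fetch` is a hypothetical callback standing in for the HTTP request to the origin instance.

```python
from collections import OrderedDict
from typing import Callable


class MediaCache:
    """Byte-bounded LRU cache for proxied remote media."""

    def __init__(self, max_bytes: int, fetch: Callable[[str], bytes]):
        self.max_bytes = max_bytes
        self.fetch = fetch  # hypothetical: downloads a URL from its origin instance
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0

    @property
    def size(self) -> int:
        """Total bytes currently cached."""
        return self._size

    def get(self, url: str) -> bytes:
        if url in self._items:
            self._items.move_to_end(url)  # mark as most recently used
            return self._items[url]
        data = self.fetch(url)  # cache miss: go to the origin instance
        if len(data) > self.max_bytes:
            return data  # too large to ever fit; serve without caching
        self._items[url] = data
        self._size += len(data)
        while self._size > self.max_bytes:  # evict least recently used first
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)
        return data
```

Disk use stays at or below `max_bytes` no matter how much media flows through, while popular images are served locally and the origin instances only see cache misses.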

          • utzer [Friendica]@social.yl.ms
            1 upvote · 1 year ago

            @pimeys @knighthawk0811 yes, that sounds more privacy-friendly. #Friendica also has a proxy function. It is also important that the proxy is only accessible to logged-in users… so it gets more complicated. Otherwise bad actors will post media somewhere and link to it from scam pages, emails, and the like.
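Restricting the proxy to logged-in users means checking the session before fetching or serving anything, so proxy links copied into scam pages or emails fail for anonymous visitors. A minimal sketch, assuming a hypothetical set of valid session tokens and a hypothetical `fetch_media` callback; this is not Friendica's actual proxy code, and a real deployment would also validate the target URL and rate-limit requests.

```python
from typing import Callable, Optional, Set, Tuple


def proxy_request(
    session_token: Optional[str],
    url: str,
    valid_sessions: Set[str],
    fetch_media: Callable[[str], bytes],
) -> Tuple[int, bytes]:
    """Return (HTTP status, body); anonymous requests are rejected before any fetch."""
    if session_token is None or session_token not in valid_sessions:
        # Refuse early so the proxy cannot be used as an anonymous media host.
        return 403, b"login required"
    return 200, fetch_media(url)
```

Rejecting before the fetch matters: it keeps unauthenticated traffic from pulling new media through the instance at all, not just from seeing it.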