A tiny mouse, a hacker.

  • 0 Posts
  • 36 Comments
Joined 9 months ago
Cake day: December 24th, 2023



  • It’s not. It just doesn’t get enough hits for that 86k to matter. Fun fact: most AI crawlers hit /robots.txt first; they get served the Bee Movie script instead, fail to interpret it, and leave without crawling further. If I’d let them crawl the entire site, that’d result in about two megabytes of traffic. By serving an 86 kB file that doesn’t parse as robots.txt and has no links, I actually save bandwidth: not on a single request, but by preventing a hundred others.



  • That would result in those fediverse servers theoretically requesting 333333 * 114 MB ≈ 38 terabytes of traffic.

    On the other hand, if the linked site did not serve garbage, and fit into ~1 MB like a normal site, this would only be ~333 GB, and while that’s still a lot, it’s not the end of the world. If it’s a site that actually puts effort into being optimized, and a request fits in ~300 kB (still a lot, in my book, for what is essentially a preview, with only tiny parts of the actual content loaded), then we’re looking at ~100 GB.

    If said site puts effort into making its previews reasonable, and serves ~30 kB, then that’s ~10 GB. It’s 3190 in the Year of Our Lady of Discord. A potato can serve that.
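
    Spelled out, the totals behind those figures (decimal units, rounded):

    \begin{aligned}
    333{,}333 \times 114\ \mathrm{MB} &\approx 38\ \mathrm{TB} \\
    333{,}333 \times 1\ \mathrm{MB}   &\approx 333\ \mathrm{GB} \\
    333{,}333 \times 300\ \mathrm{kB} &\approx 100\ \mathrm{GB} \\
    333{,}333 \times 30\ \mathrm{kB}  &\approx 10\ \mathrm{GB}
    \end{aligned}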


  • I only serve bloat to AI crawlers.

    # Flag known AI crawler user agents: $badagent becomes 1 for matches.
    map $http_user_agent $badagent {
      default     0;
      # list of AI crawler user agents in "~crawler 1" format
    }

    # Internally rewrite every request from a flagged crawler to /gpt…
    if ($badagent) {
      rewrite ^ /gpt;
    }

    # …and hand them the Bee Movie script instead of the real content.
    location /gpt {
      proxy_pass https://courses.cs.washington.edu/courses/cse163/20wi/files/lectures/L04/bee-movie.txt;
    }


    …is a wonderful thing to put in my nginx config. (You can try curl -Is -H "User-Agent: GPTBot" https://chronicles.mad-scientist.club/robots.txt | grep content-length: to see it in action ;))




  • A lot of people do. Especially on GitHub, where you can just browse a random repository, find a file you want to change, hit the edit button, and edit it right there in the browser (it does the forking for you behind the scenes). For people unfamiliar with git, that’s huge.

    It’s also a great boon when you don’t want to clone the repo locally! For example, when I’m on a slow, metered connection, I have no desire to spend 10+ minutes (and half of my data cap) waiting for a repo to clone, just so I can fix a typo. With the web editor, I can accomplish the same thing with very little network traffic, in about a minute.

    While normally I prefer the comfort of my Emacs, there are situations where a workflow that happens entirely in the browser is simply more practical.



  • I was in a similar situation (my server is running Debian, as it has been for the past two decades), and am going to rebuild it on something else. I chose NixOS, which I recently switched to on my desktop, because it lets me configure the entire system declaratively, even the containers. The major advantage of a declarative configuration is that it will never be out of date.

    My main reason for switching is that I’ve been running the server for a good few years, initially maintained via Ansible, but that quickly turned into a hellish bash-in-YAML soup that never quite worked right. So I started making changes directly instead. And then I forgot why I made a change, or ended up with the same thing copy & pasted all over the place. Today, it’s a colossal mess. With NixOS, I can’t make such a mess, because the entire system is declared in one single place: my configuration.

    Like you, I also planned to use containers for almost everything, but… I eventually decided not to. There are basically two things that I will run in a container: Wallabag (because it’s not so well integrated into NixOS at the moment), and my Mastodon instance (which runs glitch-soc, which is considerably easier to deploy via the official containers). The rest will run natively. I’ll be hardening them via systemd’s built-in sandboxing, which gives me comparable isolation without the overhead of containers. Running things natively helps a lot with declarative configuration too, a nice bonus.
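
    To give an idea of what that systemd hardening looks like, here is a minimal, hypothetical sketch in NixOS terms (the service and binary are stand-ins, not from my actual config; the options are standard systemd sandboxing directives):

    { pkgs, ... }:
    {
      systemd.services.example = {
        description = "Hypothetical natively-run service, sandboxed via systemd";
        wantedBy = [ "multi-user.target" ];
        serviceConfig = {
          ExecStart = "${pkgs.hello}/bin/hello";  # stand-in binary for the example
          DynamicUser = true;           # run as a throwaway, unprivileged user
          ProtectSystem = "strict";     # /, /usr and /etc are read-only for the unit
          ProtectHome = true;           # /home is invisible to the unit
          PrivateTmp = true;            # the unit gets its own private /tmp
          PrivateDevices = true;        # no access to physical devices
          NoNewPrivileges = true;       # no privilege escalation via setuid & friends
          CapabilityBoundingSet = "";   # drop all capabilities
          RestrictAddressFamilies = [ "AF_INET" "AF_INET6" "AF_UNIX" ];
        };
      };
    }

    Running systemd-analyze security example afterwards gives a quick score of how much attack surface such a unit still exposes.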

    For reference, you can find my (work in progress) server configuration here. It might feel a bit overwhelming at first, because it’s written in a literate programming style using Org mode & org-roam. I found this structure works great for me, because my configuration is thoroughly documented: the whys, the hows, and the whats.




  • Or one could buy any of the existing pre-built splits. That might be more expensive, but it does not involve something one very explicitly said they don’t want to do.

    I’d rather spend twice as much on a well-built keyboard with a warranty than try to solder one together myself, botch it up, and have it suddenly cost more than if I had just bought a pre-built one.



  • There’s a very easy solution that lets you rest assured that your instance is how you want it to be: don’t do open registration. Vet the people you invite, and the job is done. If you want to be even safer, don’t post publicly: followers only. If you require follower approval, you can do some basic checks to see that whoever sends a follow request is someone you’re okay interacting with. This works quite well on the microblogging side of the Fediverse today.

    What I’m trying to say is that requiring admin approval for registrations gets you 99% of the way there, without needing anything more complex than that.





  • Fair bias notice: I am a Forgejo contributor.

    I switched from Gitea to Forgejo when Forgejo was announced, and it was as simple as changing the binary/docker image (see the sketch below). It remains that simple today, and will remain so for the foreseeable future, because Forgejo cherry-picks most of Gitea’s changes on a weekly basis. Until the codebases diverge, Forgejo will remain a drop-in replacement, until such time as we decide not to pick a feature or change. Even then, if you don’t rely on said feature, it’s still a drop-in replacement. (So far, we have a few things that are implemented differently in Forgejo, but still in a compatible way.)
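
    For illustration, here is roughly what that swap can look like if the forge runs as a container under NixOS (the container name, ports, volumes, and version tags here are made up for the example, not a recommendation):

    {
      virtualisation.oci-containers.containers.forge = {
        # before: image = "gitea/gitea:1.21";
        image = "codeberg.org/forgejo/forgejo:1.21";  # everything else stays the same
        ports = [ "3000:3000" "2222:22" ];            # web UI and SSH, illustrative
        volumes = [ "/var/lib/forge:/data" ];         # existing data keeps working
      };
    }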

    Let me offer a few reasons to switch:

    • Forgejo - as of today, and for the foreseeable future - includes everything in Gitea, but with more tests and more features on top. A few features Forgejo has that Gitea does not:
      • Forgejo makes it possible to have any signed-in user edit wikis (like GitHub); Gitea restricts it to collaborators only. (Forgejo defaults to that too, but the default can be changed.) Mind you, this is not in a Forgejo release yet; it will be coming in the next release, probably in April.
      • Gitea has support for showing an Action status badge. Forgejo has badges for action statuses, stars, forks, issues, and pull requests.
      • …there are numerous other features being developed for Forgejo that will not make it into Gitea unless they cherry-pick them (they don’t do that) or reimplement them (wasting a lot of time, and potentially introducing bugs).
    • Forgejo puts a lot of effort into testing. Every feature developed for Forgejo needs to have a reasonable amount of tests, and for most of the things we cherry-pick from Gitea, we write tests if they don’t have any.
    • Forgejo is developed in the open, using free tools: we use Forgejo to host the code, issues, and releases, Forgejo Actions for CI, and Weblate for translations. Gitea uses GitHub to host the code, issues, and releases, GitHub Actions for CI, and Crowdin for translations (all of them proprietary platforms).
    • Forgejo accepts contributions without requiring copyright assignment; Gitea does not.
    • Forgejo routinely cherry-picks from Gitea; Gitea does not cherry-pick from Forgejo (they do tend to reimplement things we’ve done, though, which is a huge waste of time if you ask me).
    • Forgejo isn’t going anywhere anytime soon; see the sustainability repo. There are people committed to working on it, there are people paid to work on it, and there’s a fairly healthy community around it already.