Hi! I’m trying to learn Rust. As a little project, I’m building a web scraper that will scrape some content and either rebuild it with a static site generator or use it for making POST requests.

I’m still at a very early stage and don’t know much yet; the simplest error handling strategy I know is using match with Result.

To my eyes, this syntax looks correct, but it also seems like a lot of lines for a simple HTTP request.

I know the reqwest docs suggest handling errors with the ? operator, which I haven’t learned yet, so I’m just using what I know for now.

fn get_document(permalink: String) -> Html {
    let html_content_result = reqwest::blocking::get(&permalink);
    let html_content = match html_content_result {
        Ok(response) => response,
        Err(error) => panic!("There was an error making the request: {:?}", error),
    };

    let html_content_text_result = html_content.text();
    let html_content_text = match html_content_text_result {
        Ok(text) => text,
        Err(error) => panic!(
            "There was an error getting the html text from the content of response: {:?}",
            error
        ),
    };

    let document = Html::parse_document(&html_content_text);

    document
}

As I understand it, this is what I’m doing here: I make an HTTP request; if I get a Response, I try to get the text out of the response body, otherwise I handle the error by panicking with a custom message. Getting the text out of the response body is another step that requires error handling, so I use the match expression again to get the text out and handle the possible error. (In what circumstances can extracting the text of a response body fail?)

Then I can finally parse the document and return it!

I wonder if this is a correct and understandable way of doing what I have in mind.

Do you think this would be a suitable project for someone who is at chapter 7 of the Rust book? I feel like I actually need to build something before I keep going with the theory!

  • AernaLingus [any]@hexbear.net
    21 days ago

    I’ll preface this by saying I’m working my way through the Rust book, too–just a bit further along–so don’t take my word as gospel.

    This exact scenario is what the ? operator was designed for: returning early with the Err if one is received[1], otherwise unpacking the Ok. As you’ve discovered, it’s a common pattern, so using the ? operator greatly cuts down on the boilerplate code. If you wanted to do the equivalent of what you have here (panicking instead of returning the Err for calling code to potentially handle, albeit without your custom panic messages[2]), you could achieve this with unwrap() instead of ?:

    let html_content_text = reqwest::blocking::get(&permalink).unwrap().text().unwrap();
    

    Both of these will be covered in chapter 9.
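
As a rough sketch of the difference, here is the same fallible step written three ways. It uses `str::parse` from the standard library as a stand-in for the reqwest calls so that it runs without any dependencies; the function names are made up for illustration.

```rust
use std::num::ParseIntError;

// Hypothetical helper: explicit match, panicking on error
// (the pattern from the original post).
fn parse_with_match(input: &str) -> i32 {
    match input.parse::<i32>() {
        Ok(number) => number,
        Err(error) => panic!("There was an error parsing the number: {:?}", error),
    }
}

// Hypothetical helper: unwrap() also panics on Err, but with a generic message.
fn parse_with_unwrap(input: &str) -> i32 {
    input.parse::<i32>().unwrap()
}

// Hypothetical helper: with `?`, an Err is returned early to the caller
// instead of panicking, so the return type has to become a Result.
fn parse_with_question_mark(input: &str) -> Result<i32, ParseIntError> {
    let number = input.parse::<i32>()?;
    Ok(number)
}

fn main() {
    println!("{}", parse_with_match("42"));
    println!("{}", parse_with_unwrap("42"));
    println!("{:?}", parse_with_question_mark("42"));
    // The caller decides what to do with a failure.
    println!("{}", parse_with_question_mark("not a number").is_err());
}
```

Note that the `?` version cannot keep the plain return type: the function signature has to change to a Result, which is why it is not a drop-in replacement for the match in the original post.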

    If you want to avoid those constructs until later, the only thing I’d say is that some of the intermediate variables seem unnecessary since you can match on the function call directly:

    fn get_document(permalink: String) -> Html {
        let html_content = match reqwest::blocking::get(&permalink) {
            Ok(response) => response,
            Err(error) => panic!("There was an error making the request: {:?}", error),
        };

        let html_content_text = match html_content.text() {
            Ok(text) => text,
            Err(error) => panic!(
                "There was an error getting the html text from the content of response: {:?}",
                error
            ),
        };

        let document = Html::parse_document(&html_content_text);

        document
    }
    

    You could also eliminate the final let statement and just stick the parse_document call at the end, but that’s a matter of preference–I know having an intermediate variable before a return can sometimes make debugging easier.

    As for whether you should build something now or wait till you learn more: go with your gut! The most important thing is that you stay actively engaged with the material, and many people find that diving into projects as soon as possible helps them learn and stay motivated. You could also use rustlings and/or Rust by Example as you go through the book, which is what I’ve been doing (specifically rustlings). It’s not as stimulating as writing a project from scratch, but it does let you write some relevant code. And if you’re not already, I highly recommend using the Brown version of the Rust Book, which includes interactive quizzes sprinkled throughout. I’ve found them particularly helpful for understanding the quirks of the borrow checker, a topic the book continues to revisit throughout.


    1. There’s also some error type conversion involved (? converts the error via the From trait), but that’s beyond the scope of your question ↩︎

    2. edit: you can use expect to get the custom messages as covered in another comment–not sure how I forgot that ↩︎

  • semi [he/him]@lemmy.ml
    21 days ago

    This will work in general. One point of improvement: right now, if the request fails, the panic will cause your whole program to crash. You could change your function to return a Result<Html, SomeErrorType> instead, and handle errors more gracefully in the place where your function is called (e.g. ignoring pages that returned an error and continuing with the rest).

    Look into anyhow for an easy-to-use error handling crate; it allows you to return an anyhow::Result<Html>.
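
A minimal sketch of that idea, under some loud assumptions: `fetch_page` is a made-up stand-in that fails for some inputs instead of doing a real request, it returns a `String` rather than `Html`, and `Box<dyn Error>` from the standard library stands in for `SomeErrorType` so the example has no dependencies. The point is only that the caller, not the function, decides what a failure means:

```rust
use std::error::Error;

// Hypothetical stand-in for the real request: "fails" for any URL
// that does not start with "https://".
fn fetch_page(url: &str) -> Result<String, Box<dyn Error>> {
    if url.starts_with("https://") {
        Ok(format!("<html>content of {}</html>", url))
    } else {
        Err(format!("unsupported URL: {}", url).into())
    }
}

// Hypothetical caller: skips pages that returned an error
// and continues with the rest.
fn fetch_all(urls: &[&str]) -> Vec<String> {
    let mut pages = Vec::new();
    for url in urls {
        match fetch_page(url) {
            Ok(page) => pages.push(page),
            Err(error) => eprintln!("Skipping {}: {}", url, error),
        }
    }
    pages
}

fn main() {
    let urls = [
        "https://example.com/a",
        "ftp://example.com/b",
        "https://example.com/c",
    ];
    let pages = fetch_all(&urls);
    println!("Fetched {} of {} pages", pages.len(), urls.len());
}
```

This version still uses only match, so it does not depend on the ? operator.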

    • explore_broaden@midwest.social
      21 days ago

      They did say they haven’t learned the ? operator (that’s chapter 9 of the Rust book), so this approach might be better saved for once they get there.

  • nous@programming.dev
    21 days ago

    For someone only on chapter 7, this is OK. I would not call it idiomatic, but you have not gotten to the error handling in chapter 9 yet. I would probably hold off on getting feedback on error handling until you have gotten to that point.

    But the TLDR is that Rust has two forms of errors: unrecoverable errors in the form of panic, and recoverable ones in the form of returning a Result. In this case you have opted for panicking, which IMO is the wrong choice for something that is expected to fail, and HTTP requests and parsing external data are expected to fail (even if only some of the time). Networks fail all the time, servers go down or send back wrong responses, and many other things can go wrong.

    Do you really want to crash your program every time that happens? Probably not, at least not at this level. Instead, you likely want to return an error from this function and let the caller deal with it, since the caller will likely have more context about what to do with it than the leaf function where the error originates.


    But all that is probably for once you have read through chapter 9. For now it is good to know that when you have the pattern

    match foo {
        Ok(value) => value,
        Err(err) => panic!("it broke! {}", err),
    }
    

    You can generally replace that with a call to expect instead:

    foo.expect("it broke")
    

    Or just unwrap it if you don’t need to add more context for whatever reason.
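
To illustrate, here is a small sketch using `str::parse` as a stand-in for `foo` (the helper names are made up). All three versions panic on `Err`; they differ only in the panic message:

```rust
// Hypothetical helpers showing three spellings of "give me the value or panic".
fn with_match(input: &str) -> u16 {
    match input.parse::<u16>() {
        Ok(value) => value,
        Err(err) => panic!("it broke! {}", err),
    }
}

fn with_expect(input: &str) -> u16 {
    // On Err, panics with "it broke" followed by the error's Debug output.
    input.parse::<u16>().expect("it broke")
}

fn with_unwrap(input: &str) -> u16 {
    // On Err, panics with a generic message plus the error's Debug output.
    input.parse::<u16>().unwrap()
}

fn main() {
    println!(
        "{} {} {}",
        with_match("8080"),
        with_expect("8080"),
        with_unwrap("8080")
    );

    // catch_unwind is used here only to show that a bad input panics
    // without taking down this demo.
    let result = std::panic::catch_unwind(|| with_expect("not a port"));
    println!("panicked: {}", result.is_err());
}
```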