HTML parsing returns a partial result

Hi,
I'm trying to extract all the kanji from an HTML page.
I tried the code below, but the program seems to stop after the first iteration of the body's for_each.
How can I collect the results into a vector and return it?

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use std::str;
use futures::{Future, Stream};
use hyper::Client;
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let client = Client::new(&core.handle());
    
    let uri = "http://www3.nhk.or.jp/news/easy/k10011158211000/k10011158211000.html".parse().unwrap();
    let mut buf = Vec::new();

    let work = client.get(uri).and_then(|res| {
        let mut kanjis = Vec::new();
        res.body().for_each(|chunk| {
            buf.extend(chunk.to_vec());
            let text = match std::str::from_utf8(&buf) {
                Ok(r) => r,
                Err(_) => {""}
            };
            println!("{}", text);
            for c in text.chars() {
                let val = c as u32;
                if val > 0x4e00 && val < 0x9faf { 
                    kanjis.push(c);
                }
            };
            Ok(())
        }).poll();
        Ok(kanjis)
    });
    match core.run(work) {
        Ok(vec) => println!("{:?}", vec),
        Err(_) => println!(""),
    }
}

I would keep the Vec outside the futures chain. Something like this:

let mut buf = Vec::new();
let mut kanjis = vec![];
// Put the work in its own scope so that its mutable borrow of `kanjis` ends before we print it at the end
{
    let work = client.get(uri).and_then(|res| {
        res.body().for_each(|chunk| {
            buf.extend(chunk.to_vec());
            let text = match std::str::from_utf8(&buf) {
                Ok(r) => r,
                Err(_) => {""}
            };
            println!("{}", text);
            for c in text.chars() {
                let val = c as u32;
                if val > 0x4e00 && val < 0x9faf {
                    kanjis.push(c);
                }
            };
            futures::future::ok(())
        })
    });
    match core.run(work) {
        Ok(_) => println!("done"),
        Err(_) => println!(""),
    }
}

println!("result = {:?}", kanjis);
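One caveat with the snippet above: `buf` grows with every chunk and is re-scanned from the start each time, so the same kanji can be pushed more than once, and a chunk boundary that falls inside a multi-byte character makes `from_utf8` fail for that iteration. A minimal std-only sketch of decoding once, after the whole body has arrived, might look like the following. The helper names `join_chunks` and `extract_kanji` are made up for illustration, and the range U+4E00..=U+9FFF (the CJK Unified Ideographs block) is an assumption that is slightly wider than the bounds used in the thread.

```rust
/// Hypothetical helper: joins body chunks into one buffer before any decoding.
fn join_chunks(chunks: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::new();
    for chunk in chunks {
        buf.extend_from_slice(chunk);
    }
    buf
}

/// Hypothetical helper: decodes the complete body once and keeps the characters
/// in the CJK Unified Ideographs block (U+4E00..=U+9FFF, an assumed range).
fn extract_kanji(body: &[u8]) -> Vec<char> {
    // Lossy decoding replaces invalid bytes with U+FFFD instead of
    // throwing away the whole text on the first error.
    let text = String::from_utf8_lossy(body);
    text.chars()
        .filter(|c| ('\u{4E00}'..='\u{9FFF}').contains(c))
        .collect()
}
```

In the thread's code this would roughly correspond to only filling `buf` inside `for_each`, then calling something like `extract_kanji(&buf)` after `core.run(work)` has returned. For example, `extract_kanji("日本語のニュース".as_bytes())` yields `['日', '本', '語']`, because the kana fall outside the range.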

When you're using futures (as opposed to implementing one), you shouldn't find yourself calling poll(). In this case you polled the ForEach future once, so it may have processed no items (if none were available yet) or one item. Any subsequent items are never processed, because the ForEach future is dropped right after that point instead of being returned from and_then and handed back to the event loop.
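To make the difference concrete, here is a toy model of polling in plain Rust. It is deliberately simplified and is not the futures crate's API: `Step`, `ToyForEach`, `poll_once` and `run_to_completion` are hypothetical names, and the model assumes exactly one chunk becomes available per poll. It only illustrates why a single manual poll followed by a drop loses the rest of the body, while an executor that keeps polling sees everything.

```rust
// Toy model only: these types are illustrative and are NOT the futures crate's API.
#[derive(Debug, PartialEq)]
enum Step {
    Done,
    NotReady,
}

/// Runs `f` on each chunk, handling at most one chunk per call to `poll`.
struct ToyForEach<F: FnMut(&str)> {
    chunks: std::vec::IntoIter<&'static str>,
    f: F,
}

impl<F: FnMut(&str)> ToyForEach<F> {
    fn poll(&mut self) -> Step {
        match self.chunks.next() {
            Some(chunk) => {
                (self.f)(chunk);
                Step::NotReady // more work may remain; the caller must poll again
            }
            None => Step::Done,
        }
    }
}

/// Mirrors the question's code: poll a single time, then drop the future.
fn poll_once(chunks: Vec<&'static str>) -> Vec<String> {
    let mut seen = Vec::new();
    {
        let mut fut = ToyForEach {
            chunks: chunks.into_iter(),
            f: |c: &str| seen.push(c.to_string()),
        };
        let _ = fut.poll();
    } // `fut` is dropped here, so the remaining chunks are never processed
    seen
}

/// Mirrors what an event loop does: keep polling until the future is done.
fn run_to_completion(chunks: Vec<&'static str>) -> Vec<String> {
    let mut seen = Vec::new();
    {
        let mut fut = ToyForEach {
            chunks: chunks.into_iter(),
            f: |c: &str| seen.push(c.to_string()),
        };
        while fut.poll() != Step::Done {}
    }
    seen
}
```

With the chunks `["a", "b", "c"]`, `poll_once` collects only `"a"`, whereas `run_to_completion` collects all three. Returning the ForEach future from and_then plays the role of `run_to_completion` here, since it lets `core.run` do the repeated polling.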
