I guess this code isn't quite ready for a proper review, but it's not an explicit help question either. Any suggestions or comments would be welcome.
Background
The mdbook has a template for creating preprocessors over here.
The goal is to fetch remote raw-markdown files from anywhere and place them in a book. There is already a crate for this, but this is just for learning.
For now it only works for raw-markdown text. [1]
Snippet
Preprocessors receive the mdbook's chapters one by one, each as a String, and can transform them. My current idea is:
/// Replaces the URL to markdown-content by the content itself.
/// Could be used for other formats eventually.
fn urls_to_content(content: &str) -> String {
    regex_replace_all!(
        // `\.md` (escaped dot) so only a literal ".md" suffix matches.
        r"\s(\{\{\s*#remote\s+([^\s}{]{5,200}\.md)\s*\}\})",
        content,
        |_, _whole, url| {
            let body = reqwest::blocking::get(url).unwrap().text().unwrap();
            body
        }
    )
    .to_string()
}
The way it works, with a bit more context, is:
- The user adds {{ #remote <BASE_URL>/path/to/file.md}} inside their markdown book (see snippet above).
- The remote markdown content is placed in the book where the placeholder above was.
- Should be saved (to be done) so it does not download the same files every time.
- Should create a Client to share between all GET requests, but for now I've only written down the idea.
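The last two points in the list could be handled together. What follows is only a sketch under assumptions: it caches bodies in memory for a single run (saving to disk would need more work), `CachedFetcher` is a hypothetical name, and the closure stands in for the real HTTP call so the example needs no network access.

```rust
use std::collections::HashMap;

/// Remembers downloaded bodies by URL so each URL is fetched at most once.
/// `F` is a hypothetical stand-in for the real HTTP call.
struct CachedFetcher<F>
where
    F: FnMut(&str) -> Result<String, String>,
{
    fetch: F,
    cache: HashMap<String, String>,
}

impl<F> CachedFetcher<F>
where
    F: FnMut(&str) -> Result<String, String>,
{
    fn new(fetch: F) -> Self {
        Self {
            fetch,
            cache: HashMap::new(),
        }
    }

    /// Returns the cached body, calling the fetcher only on a cache miss.
    /// Failed downloads are not cached, so they are retried on the next call.
    fn get(&mut self, url: &str) -> Result<&str, String> {
        if !self.cache.contains_key(url) {
            let body = (self.fetch)(url)?;
            self.cache.insert(url.to_owned(), body);
        }
        Ok(self.cache[url].as_str())
    }
}
```

In the preprocessor, the closure could wrap one `reqwest::blocking::Client` created once in `run`, for example `|url| client.get(url).send().and_then(|r| r.text()).map_err(|e| e.to_string())`.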
Something I couldn't figure out is how to re-use the regex (i.e. r"\s(...)") in the different places where it is needed, rather than copying it.
The longer snippet is
use lazy_regex::regex_replace_all;
use mdbook_preprocessor::{
    Preprocessor, PreprocessorContext,
    book::{Book, Chapter},
    errors::Result,
};

/// Preprocessor that fetches remote markdown files
pub struct Fetch;

impl Fetch {
    pub fn new() -> Fetch {
        Fetch
    }
}

impl Preprocessor for Fetch {
    fn name(&self) -> &str {
        "fetch"
    }

    /// Modify chapters replacing `{{#remote URLs}}` by the .md content.
    fn run(
        &self,
        ctx: &PreprocessorContext,
        mut book: Book,
    ) -> Result<Book> {
        // book.toml option for this preprocessor.
        let option = "preprocessor.fetch.disable";
        match ctx.config.get::<bool>(option) {
            // Ok(None) is field unset.
            Ok(None) | Ok(Some(false)) => {
                book.for_each_chapter_mut(include_markdown);
                Ok(book)
            }
            Ok(Some(true)) => Ok(book),
            Err(err) => Err(err.into()),
        }
    }

    /// Run when rendering to HTML,
    /// But operate on markdown files.
    fn supports_renderer(&self, renderer: &str) -> Result<bool> {
        Ok(renderer == "html")
    }
}

/// Write markdown to book.
/// This function is separate so we can test the replacement.
fn include_markdown(chapter: &mut Chapter) {
    chapter.content = urls_to_content(&chapter.content)
}

/// Replaces the URL to markdown-content by the content itself.
/// Could be used for other formats eventually.
fn urls_to_content(content: &str) -> String {
    regex_replace_all!(
        // `\.md` (escaped dot) so only a literal ".md" suffix matches.
        r"\s(\{\{\s*#remote\s+([^\s}{]{5,200}\.md)\s*\}\})",
        content,
        |_, _whole, url| {
            let body = reqwest::blocking::get(url).unwrap().text().unwrap();
            body
        }
    )
    .to_string()
}

#[cfg(test)]
mod test {
    use lazy_regex::{regex, regex::Match};
    use super::*;

    #[test]
    fn test_regex() {
        let input_str: &str = r#"some text and even more but now
// Should fail: blank in `// a.`
{{ #remote https:// abc.def.g/mypath/to.md }}
// Should pass
{{ #remote https://abc.def.g/mypath/to.md }}
// Should pass
{{#remote https://abc.def.ga.b.c/mypath/to.md}}
// Should pass: `http` is accepted
{{ #remote http://this.is.insecure/fails/to.md }}
// Should pass:
{{#remote https://github.com/rvben/rumdl/blob/main/docs/markdownlint-comparison.md}}
//"#;
        fn find_markdown_urls(str_file: &str) -> Vec<&str> {
            // I did not find out a way to use the same regex
            // since `regex!` and `regex_replace_all!` need a
            // literal. And using `static reg=..` was too hard.
            let found: Vec<&str> =
                regex!(r"\s(\{\{\s*#remote\s+([^\s}{]{5,200})\s*\}\})")
                    .find_iter(str_file)
                    .map(|m: Match| m.as_str())
                    .collect();
            found
        }
        let result = find_markdown_urls(input_str);
        assert_eq!(result.len(), 4)
    }

    #[test]
    fn test_url_replacement() {
        let content = r"safgdsafgdsaf
hello world
{{#remote https://raw.githubusercontent.com/rust-lang/mdBook/7b29f8a7174fa4b7b31536b84ee62e50a786658b/README.md}}
";
        let new_doc = urls_to_content(&content);
        assert!(new_doc.starts_with("safgd"));
        assert!(
            new_doc
                .contains("mdBook is a utility to create modern online books from Markdown files.")
        )
    }
}
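
Both `unwrap()` calls in `urls_to_content` panic if a download fails, which stops the whole build. As a hedged sketch of one alternative, assuming it is acceptable to leave the placeholder untouched and print a warning to stderr, the choice of replacement text could live in a small helper. `replacement_for` and its parameters are hypothetical names, and the error type is simplified to a `String`.

```rust
/// Chooses the text that replaces one placeholder.
/// `placeholder` is the full `{{#remote ...}}` text and `fetched` is the
/// outcome of the download (both names are hypothetical).
fn replacement_for(placeholder: &str, fetched: Result<String, String>) -> String {
    match fetched {
        // Download worked: the body takes the placeholder's position.
        Ok(body) => body,
        Err(reason) => {
            // Report the problem on stderr but keep the book buildable
            // by leaving the placeholder text as it was.
            eprintln!("fetch: could not resolve {placeholder}: {reason}");
            placeholder.to_owned()
        }
    }
}
```

Inside the replacement closure this could be called with the whole match and the mapped result of the request, so a single unreachable URL no longer aborts the run.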