"Rebuilding" some content in a file

I need to convert these kinds of links in wiki markup, [[content]] and [[content|rename]], into these:
[content](content) and [rename](content). Which approach is more sensible: writing a parser for those links, or just replacing the relevant parts with a regex? Sample implementations in an answer would be welcome.

Wiki markup is quite a complicated (I should say convoluted) format which certainly isn't regular (in the technical sense, i.e. it's not a Chomsky type-3 language). Hence a regular expression is already theoretically unsuited for this job.

The literal first Google hit for "rust wiki parser" for me is this crate. It seems to be the most comprehensive attempt at correctly parsing wikimedia markup – certainly worth a shot instead of trying to roll your own.

Though regexes aren't powerful enough to do a fully correct job, they may be good enough for your use case: I suspect that very few lines in your input contain [[ and ]] without denoting a link, and similarly that nested links are unlikely (if they're even legal). Given that, a regex solution might be the cheaper option in the short run, at the expense of a few undetected mistranscriptions.

If this is just one transformation of many, though, or if you need higher-precision output, the regex solution will quickly become a liability. Whether that technical debt is worth it depends on external factors like your available development time, the expected longevity of this code, and how accurate you really need the output to be.
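
For the quick-and-dirty route, here is a minimal sketch. It uses plain string scanning from the standard library rather than the regex crate, so that it compiles without dependencies, but it makes the same assumptions a simple regex would: links are never nested, `[[` always opens a link, and the first `|` separates target from label. The function name `convert_links` is made up for illustration.

```rust
/// Rewrites `[[target]]` and `[[target|label]]` into Markdown links.
/// Hypothetical helper: assumes no nesting and no escaping rules.
fn convert_links(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("[[") {
        // Copy everything before the opening brackets unchanged.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("]]") {
            Some(end) => {
                let inner = &after_open[..end];
                // Split on the first `|`: left is the target, right is the label.
                let (target, label) = match inner.split_once('|') {
                    Some((t, l)) => (t, l),
                    None => (inner, inner),
                };
                out.push_str(&format!("[{}]({})", label, target));
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated link: keep the remaining text as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    // Copy whatever follows the last link.
    out.push_str(rest);
    out
}
```

Calling `convert_links("See [[Home]] and [[Main_Page|the main page]].")` should give `See [Home](Home) and [the main page](Main_Page).` Anything outside those assumptions (nested brackets, targets containing spaces, literal `[[` in code samples) passes through or gets rewritten without any warning, which is exactly the kind of undetected mistranscription mentioned above.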

I think you might have misunderstood me. I need a regex for only two cases, and I don't think it would be unsuitable for that.

I did not misunderstand you – you want to rewrite some wiki-formatted markup to Markdown. That will involve parsing the original text and emitting the markdown. Contrary to what you might believe, it is not a trivial text replacement problem, because both formats have internal structure. In some of the simplest cases, it might seem to work, but as soon as you have something more complicated, the naïve approach will fail silently.
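
To illustrate the "parse, then emit" structure, here is a rough sketch that only knows about the two link forms from the question. The `Segment` type and the `parse` and `emit` functions are invented for this example and are nowhere near a real wiki parser; the point is only that the intermediate representation gives you a place to add further cases later.

```rust
/// One piece of parsed input (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    Text(&'a str),
    Link { target: &'a str, label: Option<&'a str> },
}

/// Splits the input into plain text and `[[...]]` links.
/// Assumes links are not nested; an unterminated `[[` stays plain text.
fn parse(input: &str) -> Vec<Segment<'_>> {
    let mut segments = Vec::new();
    let mut rest = input;
    while let Some(start) = rest.find("[[") {
        let len = match rest[start + 2..].find("]]") {
            Some(len) => len,
            None => break, // no closing brackets: treat the remainder as text
        };
        if start > 0 {
            segments.push(Segment::Text(&rest[..start]));
        }
        let inner = &rest[start + 2..start + 2 + len];
        let (target, label) = match inner.split_once('|') {
            Some((t, l)) => (t, Some(l)),
            None => (inner, None),
        };
        segments.push(Segment::Link { target, label });
        rest = &rest[start + 2 + len + 2..];
    }
    if !rest.is_empty() {
        segments.push(Segment::Text(rest));
    }
    segments
}

/// Renders the parsed segments as Markdown.
fn emit(segments: &[Segment<'_>]) -> String {
    let mut out = String::new();
    for seg in segments {
        match seg {
            Segment::Text(t) => out.push_str(t),
            Segment::Link { target, label } => {
                // Fall back to the target when no label was given.
                let label = label.unwrap_or(*target);
                out.push_str(&format!("[{}]({})", label, target));
            }
        }
    }
    out
}
```

With this split, `emit(&parse("a [[b|c]] d"))` should produce `a [c](b) d`, and each new construct you need to handle becomes another `Segment` variant plus a branch in `emit`, instead of another pattern layered on top of the previous ones.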

If your input is normal Markdown, a high-quality parser library already exists.

There is a very famous answer on StackOverflow where the author asks something very similar.

The crux of the matter is that a regular expression just isn't powerful enough to properly do what you want.

Sure, it may work in the one or two simple cases you write tests for, but the moment you leave that happy path (and users are well known for generating unexpected input), your regular expression will start silently missing cases and leave you with a broken wiki.

I've had to do something similar in the past, and it's much easier to do it properly once than to do it poorly once and have to rewrite your code to do it properly anyway.
