How To Fix Broken (Unclosed) BBCode With PHP

Over the past week or so I’ve been trying to import an old phpBB forum into bbPress, and because the importer struggles with incomplete BBCode, I’ve had to come up with a little PHP script to sort it out.

Initially I had just wiped out ALL bracketed text from the posts, and whilst this of course worked, it resulted in the new forum looking a bit rubbish… Or “broken”, should I say.

For example, where posts were once quoted (and shown in blockquote formatting), they were now just repeated without formatting, which made it look like an error.

So I had to go back to the drawing board and think of something that could solve the problem of incomplete BBCode, but still keep most of it intact where possible.

And I figured I’d share my solution here for anybody who needs to use something similar, for whatever reason.

It’s worth pointing out that I haven’t tidied these functions whatsoever… They are just quickly written (and hacky), but they do the job all the same.

Anyway, without further ado, here are the functions with explanations of what they do:

Function 1 – Clean Up The BBCode

For some reason, in my database, I noticed that a lot of the BBCode wasn’t “clean”. For example, what should have been just [b][/b] was often something like [b:23432432][/b].

That makes life difficult when repairing it, so the first step I decided to take was to clean it up, with this function:


function bbCodeRepair($the_text) {
    // Fix URL tags (hacky function to avoid them getting erased when fixing underline tags)
    $the_text = preg_replace("/\[url.*?\]+/", "!==tempurl==!", $the_text);
    $the_text = preg_replace("/\[\/url.*?\]+/", "!==/tempurl==!", $the_text);
    // Fix bold text tags
    $the_text = preg_replace("/\[b.*?\]+/", "!==b==!", $the_text);
    $the_text = preg_replace("/\[\/b.*?\]+/", "!==/b==!", $the_text);
    // Fix underline text tags
    $the_text = preg_replace("/\[u.*?\]+/", "!==u==!", $the_text);
    $the_text = preg_replace("/\[\/u.*?\]+/", "!==/u==!", $the_text);
    // Fix italic text tags (the lookahead stops [img] tags being mistaken for [i] tags)
    $the_text = preg_replace("/\[i(?![a-z]).*?\]+/", "!==i==!", $the_text);
    $the_text = preg_replace("/\[\/i(?![a-z]).*?\]+/", "!==/i==!", $the_text);
    // Remove font color
    $the_text = preg_replace("/\[color.*?\]+/", "", $the_text);
    $the_text = preg_replace("/\[\/color.*?\]+/", "", $the_text);
    // Remove font size
    $the_text = preg_replace("/\[size.*?\]+/", "", $the_text);
    $the_text = preg_replace("/\[\/size.*?\]+/", "", $the_text);
    // Fix quoted text
    $the_text = preg_replace("/\[quote.*?\]+/", "!==quote==!", $the_text);
    $the_text = preg_replace("/\[\/quote.*?\]+/", "!==/quote==!", $the_text);
    // Fix list tags
    $the_text = preg_replace("/\[list.*?\]+/", "!==list==!", $the_text);
    $the_text = preg_replace("/\[\/list.*?\]+/", "!==/list==!", $the_text);
    // Fix email tags
    $the_text = preg_replace("/\[email.*?\]+/", "!==email==!", $the_text);
    $the_text = preg_replace("/\[\/email.*?\]+/", "!==/email==!", $the_text);
    // Fix image tags
    //$the_text = preg_replace("/\[img.*?\]+/", "!==img==!", $the_text);
    //$the_text = preg_replace("/\[\/img.*?\]+/", "!==/img==!", $the_text);
    // Now remove ALL other bracketed text
    $the_text = preg_replace("/\[.*?\]+/", "", $the_text);
    // Now put all our temporary bbcode back into bbcode brackets
    // Fix URL tags
    $the_text = str_replace("!==tempurl==!", "[url]", $the_text);
    $the_text = str_replace("!==/tempurl==!", "[/url]", $the_text);
    // Fix bold text tags
    $the_text = str_replace("!==b==!", "[b]", $the_text);
    $the_text = str_replace("!==/b==!", "[/b]", $the_text);
    // Fix underline text tags
    $the_text = str_replace("!==u==!", "[u]", $the_text);
    $the_text = str_replace("!==/u==!", "[/u]", $the_text);
    // Fix italic text tags
    $the_text = str_replace("!==i==!", "[i]", $the_text);
    $the_text = str_replace("!==/i==!", "[/i]", $the_text);
    // Fix quoted text
    $the_text = str_replace("!==quote==!", "[quote]", $the_text);
    $the_text = str_replace("!==/quote==!", "[/quote]", $the_text);
    // Fix list tags
    $the_text = str_replace("!==list==!", "[list]", $the_text);
    $the_text = str_replace("!==/list==!", "[/list]", $the_text);
    // Fix email tags
    $the_text = str_replace("!==email==!", "[email]", $the_text);
    $the_text = str_replace("!==/email==!", "[/email]", $the_text);
    // Fix image tags
    //$the_text = str_replace("!==img==!", "[img]", $the_text);
    //$the_text = str_replace("!==/img==!", "[/img]", $the_text);
    return $the_text;
}

(the “code” BB tags should be in there too, but my WordPress plugin doesn’t like me adding them as it thinks I’m closing the tags to display the snippet).

Notice that I’ve commented out the IMG tags too. You’ll see why in a moment.
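If you would rather keep the tags as they are and only strip the colon suffix (the [b:23432432] part), something like the sketch below might be a gentler starting point. The name stripBbcodeUid is made up for this example, and it assumes the suffix is always one or more colon-separated runs of letters and digits, so check that against your own data before relying on it.

```php
<?php
// Hypothetical helper (not part of the original script): strip ":suffix" parts
// from BBCode tags, e.g. [b:23432432] becomes [b].
// Assumption: the suffix is one or more colon-separated alphanumeric chunks.
function stripBbcodeUid($the_text) {
    // Group 1 captures the tag name, including the slash of a closing tag
    return preg_replace('/\[(\/?[a-z]+)(?::[a-z0-9]+)+\]/i', '[$1]', $the_text);
}

echo stripBbcodeUid("[b:23432432]Hello[/b:23432432]"), "\n"; // prints [b]Hello[/b]
```

Tags that carry a value, such as [url=http://example.com], don’t match this pattern and are left untouched, so it is not a full replacement for the function above.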

Function 2 – Count The Tags & Delete Unclosed Ones

Trying to fix the tags is a mammoth task, so it’s easier just to count the opening and closing tags of each type and, if the total is an odd number, remove ALL tags of that type from the individual post.

Here’s the function to do that:


function bbcodeIncompleteFix($the_text) {
    // The tags that were kept by the clean-up function
    foreach (array("url", "b", "u", "i", "quote", "list", "email") as $tag) {
        $open = "[" . $tag . "]";
        $close = "[/" . $tag . "]";
        // An odd combined count means at least one tag of this type is unpaired
        if ((substr_count($the_text, $open) + substr_count($the_text, $close)) % 2 != 0) {
            // Remove every opening and closing tag of this type from the post
            $the_text = str_replace(array($open, $close), "", $the_text);
        }
    }
    return $the_text;
}
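One thing the odd/even check can miss is a post with, say, two opening tags and no closing tag, because the total is still even. Here is a minimal sketch of a stricter check that compares the two counts directly. The function name removeUnbalancedTags and its second parameter are my own invention for this example, not part of the script above.

```php
<?php
// Hypothetical stricter variant: a tag type is only kept when it has exactly
// as many opening tags as closing tags.
function removeUnbalancedTags($the_text, array $tags) {
    foreach ($tags as $tag) {
        $open  = "[" . $tag . "]";
        $close = "[/" . $tag . "]";
        if (substr_count($the_text, $open) !== substr_count($the_text, $close)) {
            // Counts differ, so drop every tag of this type from the post
            $the_text = str_replace(array($open, $close), "", $the_text);
        }
    }
    return $the_text;
}

$tags = array("url", "b", "u", "i", "quote", "list", "email");
echo removeUnbalancedTags("[b]one [b]two", $tags), "\n"; // prints: one two
echo removeUnbalancedTags("[i]fine[/i]", $tags), "\n";   // prints: [i]fine[/i]
```

It still only counts tags, so a post where the closing tag comes before the opening one would slip through.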

Function 3 – Fix The IMG Tags

In my opinion, images are one of the most crucial components of the posts, so losing them is the last thing you want to do. Therefore, rather than try to repair the IMG tags, I decided to remove them completely & build a separate function that detects image URLs and wraps them in fresh IMG tags.

So here’s the last function, which does exactly that:


function codeimageURLs($the_text) {
    // Find every URL in the post
    preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $the_text, $links);
    foreach (array_unique($links[0]) as $link) {
        // parse_url() returns null when the URL has no path, so cast it to a string for pathinfo()
        $ext = strtolower(pathinfo((string) parse_url($link, PHP_URL_PATH), PATHINFO_EXTENSION));
        // Wrap the URL in IMG tags if it points at an image file
        if (in_array($ext, array("jpg", "jpeg", "gif", "png"), true)) {
            $the_text = str_replace($link, "[img]".$link."[/img]", $the_text);
        }
    }
    return $the_text;
}
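A possible edge case with str_replace() here: if one image URL is the start of another (the same address with a query string added, for instance), the shorter one gets wrapped inside the longer one. The sketch below is one way around that, wrapping each URL where it is found with preg_replace_callback(). The name wrapImageUrls and the simplified URL pattern are assumptions of mine, not a drop-in copy of the function above.

```php
<?php
// Hypothetical variant: wrap each image URL in place instead of using str_replace().
// Assumption: a URL runs until the next whitespace or square bracket.
function wrapImageUrls($the_text) {
    $pattern = '/(?:https?|ftps?):\/\/[^\s\[\]]+/i';
    return preg_replace_callback($pattern, function ($match) {
        // parse_url() can return null or false, so cast to string before pathinfo()
        $path = (string) parse_url($match[0], PHP_URL_PATH);
        $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, array("jpg", "jpeg", "gif", "png"), true)) {
            return "[img]" . $match[0] . "[/img]";
        }
        return $match[0]; // not an image, leave the URL as it was
    }, $the_text);
}

echo wrapImageUrls("see http://example.com/a.png and http://example.com/page"), "\n";
```

Because this pattern stops at square brackets, it also picks up image URLs that sit directly inside other tags, which may or may not be what you want.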

So if you happen to find yourself in a similar situation & want to fix BBCode, then hopefully my snippets above help you out. Just take note that they should be run in the order that I’ve listed them.
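To make the order explicit, here is a rough sketch of how the three functions could be chained. The helper runCleanupSteps is a name I have made up for this example, and the demo call only runs if the three functions from this post have already been loaded.

```php
<?php
// Hypothetical helper: apply a list of clean-up functions to a post, in order.
function runCleanupSteps($the_text, array $steps) {
    foreach ($steps as $step) {
        $the_text = call_user_func($step, $the_text);
    }
    return $the_text;
}

// The order matters: clean the tags, drop unpaired ones, then re-add the images.
$steps = array("bbCodeRepair", "bbcodeIncompleteFix", "codeimageURLs");

// Only run the demo if all three functions from this post are defined.
if (count(array_filter($steps, "function_exists")) === count($steps)) {
    echo runCleanupSteps("[b:23432432]Hello[/b:23432432]", $steps), "\n";
}
```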
