Strip HTML from a String in JavaScript

If the HTML is well-formed, a single regex replace is enough:

let cleanText = strInputCode.replace(/<\/?[^>]+(>|$)/g, "")

The explanation as per this posting is as follows:

This regex looks for <, an optional slash /, one or more characters that are not >, then either > or $ (the end of the string, since the m flag is not set). The g flag removes every such match rather than only the first.

Examples:

'<div>Hello</div>' ==> 'Hello'
 ^^^^^     ^^^^^^
'Unterminated Tag <b' ==> 'Unterminated Tag '
                  ^^

But it is not bulletproof:

'If you are < 13 you cannot register' ==> 'If you are '
            ^^^^^^^^^^^^^^^^^^^^^^^^
'<div data="score > 42">Hello</div>' ==> ' 42">Hello'
 ^^^^^^^^^^^^^^^^^^          ^^^^^^
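
The cases above can be reproduced with a short script. This is only a sketch: stripTags is a hypothetical helper name, and the regex inside it is the one from this answer, unchanged.

```javascript
// Hypothetical helper; wraps the regex from this answer without changing it.
function stripTags(input) {
  return input.replace(/<\/?[^>]+(>|$)/g, "");
}

// The two well-behaved inputs followed by the two problem inputs.
const samples = [
  "<div>Hello</div>",
  "Unterminated Tag <b",
  "If you are < 13 you cannot register",
  '<div data="score > 42">Hello</div>',
];

// Print each input next to its stripped result, quoted so that
// leading and trailing spaces stay visible.
for (const sample of samples) {
  console.log(JSON.stringify(sample), "==>", JSON.stringify(stripTags(sample)));
}
```

The last two inputs show why the regex only suits markup you control: a bare < in the text or a > inside an attribute value is enough to throw it off.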

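Another method is to let the browser parse the markup and read back the text. It has caveats (see this posting); in particular, assigning to innerHTML can run event-handler attributes such as onerror, so use it only on trusted input: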

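const html = '<p>Some <em>HTML</em></p>';
// Let the browser parse the markup inside a detached element.
const div = document.createElement("div");
div.innerHTML = html;
// textContent is the standard property; innerText is a fallback for old browsers.
const text = div.textContent || div.innerText || '';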
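
For input that is not trusted, a commonly suggested variation is to parse the string with DOMParser rather than assigning it to innerHTML. The following is a browser-only sketch, not part of the original answer: htmlToText is a hypothetical name, and the assumption is that the document returned by parseFromString is separate from the page, so scripts in it should not execute.

```javascript
// Browser-only sketch; htmlToText is a hypothetical helper name.
function htmlToText(html) {
  // Parse into a separate document instead of an element of the live page.
  const doc = new DOMParser().parseFromString(html, "text/html");
  // body.textContent concatenates the text nodes and leaves the tags out.
  return doc.body.textContent || "";
}
```

In a browser, htmlToText('<p>Some <em>HTML</em></p>') should return 'Some HTML'. DOMParser is not available in plain Node.js, so this version is for browser code only.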