Protecting Against Cross-Site Scripting with Output Filtering and Sanitization

Cross-site scripting (XSS) is a serious and widespread security problem. Not sure what XSS is? You can get some background here.

The crux of fighting XSS attacks? In general, don't trust input that comes from your visitors.

Form Input

If you ever take form input that comes from a visitor and echo it back to them on a website, you're potentially vulnerable. Let's take a look at an example. Here's a search field:

<form method="get" action="<?=$view->action('search')?>">
    <input type="text" name="query" />
    <button type="submit">Search</button>
</form>

Then here's the relevant single page controller method:

public function search()
{
    // Actual search logic snipped
    // ...
    // Send the search query back into the results section so we can print out what
    // the user searched for
    $this->set("query", $this->request->query('query'));
}

And finally, here's the search results section of the view:

<h1>Search Results</h1>
You searched for: "<?=$query?>"

The $query variable comes from the search() controller method. This code looks fine and will certainly run fine, but it is vulnerable to XSS: it blindly trusts that the query variable is free of malicious intent. If a user enters JavaScript into the search field, or follows a crafted link that carries it in the query string, the site will output it as part of the page. Once an attacker can force arbitrary JavaScript to run on a host website, they can do all sorts of damage to that site, including exploiting forms that aren't properly hardened against CSRF attacks. It's best to make sure that your site is completely free of these problems.
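To make the risk concrete, here is a minimal standalone sketch of what the view ends up printing when the query is echoed without escaping. The payload string and the variable names are hypothetical, and the string concatenation only imitates the "You searched for" line of the view; it is not Concrete's rendering code.

```php
<?php
// Hypothetical payload a visitor could type into the search box
// or embed in the query string of a crafted link.
$query = '<script>alert("XSS")</script>';

// Imitates the view line that prints the query without any escaping.
$html = 'You searched for: "' . $query . '"';

// The script tag reaches the page intact, so a browser would execute it.
echo $html, PHP_EOL;
```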

Fortunately, this is easy to fix. You could use PHP's built-in htmlspecialchars() function, but Concrete CMS has an even easier way of doing this: the h() function. Just change this:

$this->set("query", $this->request->query('query'));

To this:

$this->set("query", h($this->request->query('query')));

Any potentially malicious HTML/JavaScript contained within the request will be encoded so it can't cause any damage when it's output in the display.
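As a rough illustration of what that encoding does, the sketch below uses PHP's own htmlspecialchars() outside of Concrete. It assumes h() behaves broadly like htmlspecialchars() with ENT_QUOTES, which is an assumption made here for illustration; the payload and the UTF-8 charset are likewise hypothetical choices.

```php
<?php
// Hypothetical malicious input: closes the surrounding quote, then injects a script tag.
$query = '"><script>alert(document.cookie)</script>';

// ENT_QUOTES encodes both double and single quotes; UTF-8 is an assumed page charset.
$encoded = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

// The angle brackets and quotes are now HTML entities, so the browser
// displays the payload as literal text instead of executing it.
$safe = 'You searched for: "' . $encoded . '"';
echo $safe, PHP_EOL;
```

The special characters come out as entities (&quot;, &lt;, &gt;), so the page shows exactly what the visitor typed without giving it any meaning as markup.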

Attributes

Form input isn't the only untrusted input that gets displayed to site visitors. If you're running a community website, you may have user input that is displayed on user profiles or member pages. This input needs to be sanitized as well! In general, we don't sanitize this input as it comes into Concrete; instead, we save it as it comes in, and make sure that we sanitize the output display of attributes that are known to come from untrusted sources. The Concrete attributes system makes this easy to do:

Displaying an Attribute

This code will display an attribute that comes from a user:

$ui = \UserInfo::getByName('andrew');
print $ui->getAttribute('bio');

However, who knows what this attribute might contain! If you're displaying untrusted attributes from untrusted users, it's best to run the attribute display through the sanitization functions:

print $ui->getAttribute('bio', 'displaySanitized');

How does this work? If an optional second parameter is passed to the getAttribute() method, the attribute type first attempts to call a method named after that parameter: "get", then the parameter with its first letter capitalized, then "Value". If no such method is available, it falls back to getDisplayValue() and eventually to getValue(). So in this example, assuming bio is a rich_text attribute, it will first call

getDisplaySanitizedValue()

and then

getDisplayValue()

This is great, because the Rich Text attribute actually contains a special getDisplaySanitizedValue method that runs the entire content of the attribute through an HTML sanitizer, if the attribute supports HTML. This way you can have complex attributes – some of which may even support a limited subset of HTML – while still allowing them to be sanitized on output.
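The lookup order described above can be sketched in plain PHP. Everything in this block is hypothetical and exists only for illustration: SketchRichTextValue and resolveAttributeValue() are not Concrete classes or functions, strip_tags() with a small allow list stands in for a real HTML sanitizer, and falling straight to getValue() when no second parameter is given is an assumption.

```php
<?php
// Hypothetical stand-in for a rich text attribute value; not Concrete's real class.
class SketchRichTextValue
{
    private $html;

    public function __construct($html)
    {
        $this->html = $html;
    }

    public function getValue()
    {
        return $this->html;
    }

    public function getDisplayValue()
    {
        return $this->html;
    }

    // Stand-in sanitizer: removes every tag that is not on a small allow list.
    // A real HTML sanitizer does more, such as cleaning attributes on allowed tags.
    public function getDisplaySanitizedValue()
    {
        return strip_tags($this->html, '<p><b><i>');
    }
}

// Hypothetical resolver that mirrors the fallback order described in the text.
function resolveAttributeValue($value, $displayMode = null)
{
    $candidates = [];
    if ($displayMode !== null) {
        // 'displaySanitized' becomes 'getDisplaySanitizedValue'
        $candidates[] = 'get' . ucfirst($displayMode) . 'Value';
        $candidates[] = 'getDisplayValue';
    }
    $candidates[] = 'getValue';

    // Call the first candidate method that the value object actually provides.
    foreach ($candidates as $method) {
        if (method_exists($value, $method)) {
            return $value->$method();
        }
    }
    return null;
}

$raw = '<p>Hi, I am <b>Andrew</b></p><img src="x" onerror="alert(1)">';
$bio = new SketchRichTextValue($raw);

// Resolves to getDisplaySanitizedValue(), so the img tag is stripped.
echo resolveAttributeValue($bio, 'displaySanitized'), PHP_EOL;
```

In this sketch a display mode that has no matching method falls through to getDisplayValue(), which is why an attribute type that does not define a sanitized variant would still return something when asked for 'displaySanitized'.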

The moral of the story? If you're displaying user attributes on profile pages, display them with 'displaySanitized' as your second parameter.