What is XSS

XSS (cross-site scripting) is an attack in which someone injects their own JavaScript code into a site through its forms (feedback, checkout, etc.). That code is then executed in the browser of the admin, site manager, or other users, where it can read data and act on the victim's behalf.

How it works

Let's take a simple feedback form: two fields to fill in and a submit button.

<form method="POST">
    <p>Title: <input name="title"></p>
    <p>Text: <textarea name="content"></textarea></p>
    <p><button name="form">Submit!</button></p>
</form>

Add a PHP handler to this form that simply prints the title and text to the screen:

<?php
// Vulnerable on purpose: user input is printed exactly as it was submitted.
if (isset($_POST['form'])) {
    echo 'Title:', $_POST['title'], '<br>';
    echo 'Text:', $_POST['content'];
}
?>
<form method="POST">
    <p>Title: <input name="title"></p>
    <p>Text: <textarea name="content"></textarea></p>
    <p><button name="form">Submit!</button></p>
</form>

Great. The code works, and the entered text is displayed above the form.

Now, instead of plain text, we will enter some JavaScript code, for example <script>alert('hello')</script>.

When the form is submitted, the browser executes this code. This is the security hole. In practice, though, we usually do not display the text immediately after the form is submitted. We save it to the database first and then display it on different pages of the site, where it runs for everyone who opens those pages.
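
To make the "save first, display later" case concrete, here is a minimal sketch. The array standing in for the database and the function names saveFeedback and renderFeedbackUnsafe are made up for illustration; a real site would use an actual database.

```php
<?php
// Hypothetical stand-in for a database table (illustration only).
$feedbackTable = [];

// Step 1: the form handler stores the submitted text as-is.
function saveFeedback(array &$table, string $title, string $content): void
{
    $table[] = ['title' => $title, 'content' => $content];
}

// Step 2: some other page prints the stored rows without escaping (vulnerable).
function renderFeedbackUnsafe(array $table): string
{
    $html = '';
    foreach ($table as $row) {
        $html .= '<h3>' . $row['title'] . '</h3><p>' . $row['content'] . '</p>';
    }
    return $html;
}

saveFeedback($feedbackTable, 'Hi', "<script>alert('hello')</script>");
$page = renderFeedbackUnsafe($feedbackTable);

// The payload reaches the page unchanged, so a browser would run it.
var_dump(strpos($page, '<script>') !== false); // bool(true)
```

The point of the sketch is that the script does not have to run for the person who submitted it. It runs for whoever later opens the page that prints the stored row.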

Why XSS is dangerous

Once the script is injected, the attacker has access to the entire HTML page and can read and change it at will.

In addition, the attacker gains access to the user's cookies. Of course, only those that belong to the current site and are readable from JavaScript (that is, not marked HttpOnly). The attacker can steal the cookies responsible for user authorization and put them into their own browser.

This lets the attacker log in to the site under someone else's account without a username and password. Of course, this works only if the site has no other checks, such as matching the browser or the IP address, although these can also be spoofed if desired.

How to protect yourself from XSS

Fortunately for us, there is a simple tool for output into HTML: the htmlspecialchars() function, or its less commonly used counterpart htmlentities().

Here is how it works. HTML has so-called entities (also called mnemonics). When I write a specific sequence of characters directly into HTML, for example &copy;, the browser displays the symbol that corresponds to it, in this case the copyright sign ©.

Try it yourself:


<div>
Pilcrow (paragraph mark) &para; <br>
Inverted question mark &iquest; <br>
Multiplication sign (cross) &times; <br>
Left arrow &larr; <br>
Typographic cross &dagger;
</div>

That is the whole idea. When we run htmlspecialchars(), it takes our string and replaces certain characters in it (the ampersand, quotes, and angle brackets) with entities, so that the browser displays the string as text without trying to execute it as code.

That is, when we enter the text <script>alert('hello')</script> into our form, htmlspecialchars() called with ENT_QUOTES will turn it into &lt;script&gt;alert(&#039;hello&#039;)&lt;/script&gt;. Of course, the browser will no longer treat this as JavaScript and will simply display it as is.
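
A quick way to see the conversion without a form is to run the function on the payload directly. This is just a sketch; the variable names are arbitrary.

```php
<?php
// The payload from the example, as a plain PHP string.
$input = "<script>alert('hello')</script>";

// ENT_QUOTES escapes both double and single quotes.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;
// &lt;script&gt;alert(&#039;hello&#039;)&lt;/script&gt;
```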

Let’s check:

<?php
// Same handler, but every value is escaped right before it is printed.
if (isset($_POST['form'])) {
    echo 'Title:', htmlspecialchars($_POST['title'], ENT_QUOTES, 'UTF-8'), '<br>';
    echo 'Text:', htmlspecialchars($_POST['content'], ENT_QUOTES, 'UTF-8');
}
?>
<form method="POST">
    <p>Title: <input name="title"></p>
    <p>Text: <textarea name="content"></textarea></p>
    <p><button name="form">Submit!</button></p>
</form>

Now, whatever JavaScript code we try to inject, the browser will simply display it as text.
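
The ENT_QUOTES flag in the handler is there for a reason. Here is a small sketch of what can happen when a value ends up inside a single-quoted attribute; the input string and the <input> markup are invented for the example.

```php
<?php
// Invented input that tries to close the attribute and add an event handler.
$input = "x' onmouseover='alert(1)";

// ENT_NOQUOTES leaves quotes alone, so the value breaks out of the attribute.
$unsafe = "<input value='" . htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8') . "'>";

// ENT_QUOTES turns ' into &#039;, so the value stays inside the attribute.
$safe = "<input value='" . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . "'>";

echo $unsafe, PHP_EOL; // <input value='x' onmouseover='alert(1)'>
echo $safe, PHP_EOL;   // <input value='x&#039; onmouseover=&#039;alert(1)'>
```

In the first line the browser would see a second attribute, onmouseover. In the second it sees one harmless value.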

When is the best time to process a string

There is often debate on the Internet about when it is best to process text: before writing it to the database, or when displaying it on screen.

Processing before writing to the database has several disadvantages:

  • We cannot know the real length of the string: LLC "Three Cats" is 16 characters, but once escaped, LLC &quot;Three Cats&quot; is already 26 characters.
  • Besides HTML, data from the database sometimes has to go somewhere else, for example into Word, Excel, or PDF files, where entities may not exist at all. We would have to decode all the data back to its original form.
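
Both drawbacks are easy to check. The sketch below uses an ASCII company name, so strlen() (which counts bytes) matches the character count. For non-ASCII text you would likely want mb_strlen() instead.

```php
<?php
$raw = 'LLC "Three Cats"';
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $escaped, PHP_EOL;          // LLC &quot;Three Cats&quot;
echo strlen($raw), PHP_EOL;      // 16
echo strlen($escaped), PHP_EOL;  // 26

// Text escaped before storage has to be decoded again for non-HTML output.
$restored = htmlspecialchars_decode($escaped, ENT_QUOTES);
var_dump($restored === $raw);    // bool(true)
```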

In general, this is inconvenient. I recommend that you always save the source text entered by the user into the database, and process it when it is displayed.

How to simplify string handling

So do we now have to write out this long function call every time?

<div>
    <?= htmlspecialchars($_POST['title'], ENT_QUOTES, 'UTF-8') ?>
</div>

No thanks. Better to write a separate function for this case:

function e($string)
{
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

You could just as well use echo instead of return inside the function; that is a matter of taste. Now it's much easier to process strings:


if (isset($_POST['form'])) {
    echo 'Title:', e($_POST['title']), '<br>';
    echo 'Text:', e($_POST['content']);
}

If you've heard of templating engines like Twig or Blade, they use their own syntax for outputting variables, so that values are processed by default. In Blade, for example:

<div>{{ $title }}</div> <!-- With processing -->
<div>{!! $content !!}</div> <!-- No processing -->
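
To show the "escape by default" idea in plain PHP, here is a toy renderer. It is not how Twig or Blade are implemented. The function name renderToy and the bare variable names inside the braces are assumptions made for this sketch.

```php
<?php
// Toy renderer (illustration only): {{ name }} is escaped, {!! name !!} is raw.
function renderToy(string $template, array $vars): string
{
    // One pass over the template, so substituted values are never re-scanned.
    $pattern = '/\{\{\s*(\w+)\s*\}\}|\{!!\s*(\w+)\s*!!\}/';

    return preg_replace_callback($pattern, function (array $m) use ($vars): string {
        if ($m[1] !== '') {
            // {{ name }}: value is escaped for HTML.
            return htmlspecialchars((string) ($vars[$m[1]] ?? ''), ENT_QUOTES, 'UTF-8');
        }
        // {!! name !!}: value is inserted unchanged.
        return (string) ($vars[$m[2]] ?? '');
    }, $template);
}

$html = renderToy('<div>{{ title }}</div><div>{!! content !!}</div>', [
    'title'   => '<b>Hi</b>',
    'content' => '<b>Hi</b>',
]);

echo $html, PHP_EOL;
// <div>&lt;b&gt;Hi&lt;/b&gt;</div><div><b>Hi</b></div>
```

The design choice worth noticing is that the short, convenient syntax is the safe one, while raw output requires the more awkward form, so forgetting to escape becomes harder than remembering.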
