What is SQL Injection

SQL injection is the substitution of such data into an SQL query that changes the structure of this query. An attacker could exploit the vulnerability to execute arbitrary SQL.

Let’s imagine a typical task – displaying articles on a website. When you go to the address /index.php?id=15 , an article should be displayed, the ID of which in the database is 15.

How beginner developers usually write a database query:

$query = 'SELECT * FROM `articles` WHERE `id` = ' . $_GET['id'];

The developer expects $ _GET [‘id’] to contain a number and the final request will be like this:

SELECT * FROM `articles` WHERE `id` = 15

But instead, an attacker can pass the string -1 OR 1 = 1 :

SELECT * FROM `articles` WHERE `id` = -1 OR 1=1

When running this query, all records will be selected instead of one, since records with negative identifiers are most likely not in the database, and the condition 1 = 1 is always true.

But the point is different. After fragment 1 = 1, an attacker can supplement the query with any arbitrary SQL code.

What can an attacker do?

It depends on the specific request, as well as how you run it.

If the query is not executed through the mysqli_multi_query () function , which supports multiqueries (multiple queries separated by semicolons), then the attacker has no way of executing a completely arbitrary query like this:

SELECT * FROM `articles` WHERE `id` = 1; DROP TABLE `articles`

This will not work, since multiple queries are not supported by default.

But an attacker can do something bad. For example, using UNION you can get any data from any tables.

Let’s imagine we have an articles table with 4 fields: id | title | content | created_at and also a users table with 3 fields: id | login | password .

Since UNION allows you to combine data from tables with only the same number of columns, an attacker can specify 2 columns he needs, and fill the remaining 2 with any values, for example, units:

SELECT * FROM `articles` WHERE `id` = -1 UNION SELECT 1, `login`, `password`, 1 FROM `users`

As a result, instead of title and content, the page will display the login and password of one of the users. And this is just one of dozens of possible hacking options.

Escaping quotes

Before moving on to the existing methods of protection, I want to separately explain what shielding is in general and why it is needed.

Let’s take this example:

$name = 'Ram';
$query = "UPDATE `users` SET `name` = '$name'";

This request is fine, it will execute as we expect:

UPDATE `users` SET `name` = 'Ram'

But what if $ name contains a single quote?

$name = "Ram Mohan";

Then the SQL query becomes like this:

UPDATE `users` SET `name` = 'Mohan'

Attempting to execute this query will result in a syntax error. To avoid it, the second quotation mark must be escaped, i.e. add a backslash to it.

We will analyze the methods of escaping and their reliability a little below, but for now, for simplicity, let’s take addslashes () :

<?php
$name = "Mohan";
$name = addslashes($name);
$query = "UPDATE `users` SET `name` = '$name'";

What will happen in the end:

UPDATE `users` SET `name` = 'Mohan'

Done, the query will run even if the quotes are present.

Quotes are not the only thing to escape. Different functions can escape different characters, we’ll talk about this in detail a little later.

And now an important point. Some developers believe that escaping is sufficient to completely protect against SQL injection.

Okay, let’s take another look at the very first example with SQL injection:

$_GET['id'] = '-1 OR 1=1';
$query = 'SELECT * FROM `articles` WHERE `id` = ' . $_GET['id'];
SELECT * FROM articles WHERE id = -1 OR 1=1

There are no quotes in this query. But there is vulnerability. Hence, we conclude that escaping does not guarantee protection against SQL injection .

Ineffective ways to protect against SQL injection

Obviously, the worst option is to have no protection against SQL injection and pass the data received from the user directly to the SQL query.

$query = 'SELECT * FROM `users` WHERE `id` = ' . $_GET['id'];

Never do that! Any data must be filtered and / or validated before being substituted into an SQL query.

1. The htmlspecialchars () function

From time to time I come across articles where authors use the htmlspecialchars () function to escapes data:

$name = "Mo'han";
$name = htmlspecialchars($name);
$query = "UPDATE `users` SET `name` = '$name'";

Is it dangerous! The thing is that the htmlspecialchars () function passes dangerous characters without escaping: \ (slash), \ 0 (nul-byte) and \ b (backspace).

Here is a complete example code to demonstrate the vulnerability:

$mysqli = new mysqli('localhost', 'root', 'password', 'database');

$login = '\\';
$password = ' OR 1=1 #';

$login = htmlspecialchars($login, ENT_QUOTES, 'UTF-8');
$password = htmlspecialchars($password, ENT_QUOTES, 'UTF-8');

$sql = "SELECT * FROM `users` WHERE `login` = '$login' AND `password` = '$password'";

$items = $mysqli->query($sql) or die($mysqli->error);

while($item = $items->fetch_assoc())
{
    var_dump($item);
    echo '<br>';
}

As a result, the SQL query will be like this:

SELECT * FROM `users` WHERE `login` = '\' AND `password` = ' OR 1=1 #'

/ Is used to escape the quote immediately after $ login. `login` = ‘$ login’ turns into` login` = ‘\’ AND `password` = ‘ . After that, any code that we write will be executed, in our case it’s just OR 1 = 1 . Add # (comment) at the end to hide the last quote.

2. Filtering by the black list of characters

For some reason I do not understand, there are still developers using blacklists of characters:

$disallow = ['~', '\'', '"', '<', '>', '.', '%'];
$name = 'Вася';

$name = str_replace($disallow, '', $name);

$query = "SELECT * FROM `users` WHERE `name` = '$name'";

All characters included in the blacklist are removed from the string before being inserted into the base.

I do not want to say that this approach will not work, but its application is a big question:

  • Why make up any lists at all when there are simpler and more reliable ways of protection?
  • You need to know all the potentially dangerous symbols.
  • What if you want to allow users to use any characters from the list?

Also, I think filtering in SQL queries is a bad idea. If there are invalid characters in the line, it is better to inform the user about them and ask them to fix them, and not just cut off some of the content.

For example, a user wants to use the login ~! Mega_! Pihar! _! 9000! ~ , And after registration it turns out that his nickname has turned into MegaPihar9000 .

I think it is better to check with the user if he likes this filtered login or if he would like to change something. In short, I’m in favor of whitelisting instead of black filtering.

3. The stripslashes () function

Rarely, but there is code that uses stripslashes () before writing to the database. Since newbies are still copying this code into their projects, I’ll explain why this function is needed.

Previously, PHP had such a thing as magic quotes ( Documentation ). If this directive was included, then all data contained in $ _GET, $ _POST and $ _COOKIE was automatically escaped.

This was done to protect newcomers who substituted data directly into SQL queries. In practice, this was not the best solution:

  • It is not very convenient when all data is escaped by default, because often they are needed in their original form.
  • Ideally, escaping should take into account the encoding of the database connection, which we’ll talk about a little later. Because of this, developers had to remove the escaping with the stripslashes () function and then escaping the data again with more appropriate functions, in the case of MySQL it was mysql_real_escape_string () .

This is why stripslashes () can be found in older tutorials. To un-escape characters and get the original string.

Since PHP 5.4, the magic quotes functionality has been removed, so there is no point in using stripslashes () before writing to the database.

4. addslashes () function

In some books, you can also find recommendations to escape data with the addslashes () function .

This function is more reliable than htmlspecialchars () because it escapes both backslash and nul bytes. However, this function is worse than mysql_real_escape_string because it does not respect the encoding of the current database connection.

Therefore, even in the documentation it is directly written that this function should not be used to protect against SQL injection.Excerpt from the documentation about the addslashes () function

Effective ways to protect

1. Function mysql (i) _real_escape_string

This function works in approximately the same way as addslashes () , only it takes into account the current encoding of the database connection.

There are two important details that you should know when using this feature.

First, you must always quote escaped data. If you do not do this, there will be no sense in escaping:


// Wrong, escaped first!
$ query = 'SELECT * FROM `articles` WHERE` id` ='. $ _GET ['id'];

// Escape
$ id = $ mysqli-> real_escape_string ($ _ GET ['id']);

// Also wrong, no quotes
$ query = 'SELECT * FROM `articles` WHERE` id` ='. $ id;

// Right
$query = "SELECT * FROM `articles` WHERE `id` = '$id'";

The second danger lies in wait for those who use some specific encodings like GBK. In this case, you definitely need to specify the encoding when establishing a connection to the base.

You can read about the problem here (blog of the developer who discovered the error), here and in more detail with examples there .

2. Casting to a number

A simple and effective way to protect numeric fields is to cast the data to a number. Example:

$_POST['id'] = '15';

$id = (int) $_POST['id'];

// Or like this:
$ id = intval ($ _ POST ['id']);
// Or for fractional numbers:
$ id = (float) $ _POST ['id'];
$query = 'SELECT * FROM `users` WHERE `name` = ' . $id;

Quotation marks are not required here, since a number will be substituted into the query anyway.

There is one caveat. As I wrote above, I do not really like the idea of ​​data filtering and here it can go sideways in terms of SEO.

Let’s say you have an online store where product page URLs look like / product / 15 , where 15 is the product ID.

If the article search algorithm is that we take the second part of the URL and convert it to a number like this:

$segments[2] = '15';
$id = (int) $segments[2];

Then you can write any characters after the number 15 (only one next character must be non-numeric), for example / product / 15abcde13824_ahaha_lol and the system will still display an article with id = 15.

3. Prepared queries

One of the best ways to protect against SQL injection. The bottom line is that the SQL query is first “prepared”, and then data is transferred to it separately.

$stmt = $db->prepare('SELECT * FROM `users` WHERE `name` LIKE ?');
$stmt->execute([$_GET['name']]);

This approach guarantees the absence of SQL injection at the time of data substitution, since the query is already “prepared” and cannot be changed.

But, as usual, the details spoil everything.

If you heroically read it all the way (no), there is an interesting claim – that prepared-query PDO can also have an encoding vulnerability.

To avoid it, you need to either turn off the emulation of prepared statements, or use only reliable encodings (for example, UTF-8), or be sure to specify the connection encoding (via $ mysqli-> set_charset ($ charset) or DSN for PDO, but not via SQL query SET NAMES).

Second detail. You need to understand that protection against SQL injection will only work if we do not substitute any data directly into the query. If the developer decides to do this:

$stmt = $db->prepare("SELECT * FROM `users` WHERE `name` = '$_POST[name]'");
$stmt->execute();

Then no prepared queries will save him.

And the third detail. Column and table names cannot be substituted into prepared queries.

// Так делать нельзя
$stmt = $pdo->prepare('SELECT ? FROM ?);

Perfectly. And now what i can do?

One of the most common options is whitelisting. Simple example:


$ _POST ['product'] = [
    'title' => 'Product name',
    'article' => 'Item number',
    'content' => 'Product Description'
];

$ allowed = ['title', 'article', 'content'];

foreach ($ _ POST ['product'] as $ k => $ v)
{
    if (! in_array ($ k, $ allowed, true))
        die ('Invalid field:'. $ k);
}

If there are a lot of fields and you do not want to drive in all of them with handles, you can simply get them all from the database ( SHOW COLUMNS FROM `products`) .

Another logical option is to validate the column names, for example, allowing only letters, numbers, and underscores.

In general, again you need to finish something manually, come up with your own query generation functions. Not comme il faut. I recommend doing otherwise.

4. Ready libraries

The developers of popular libraries are probably much smarter and more experienced than us. They thought of everything for a long time and tested it on tens of thousands of programmers. So why not?

For simple projects, Medoo or RedBeanPHP is enough , for medium projects I recommend (and always use) Eloquent , but for large projects the powerful and harsh Doctrine is best suited .

Leave a Reply

Your email address will not be published. Required fields are marked *

Leave a Reply

Your email address will not be published. Required fields are marked *