Opened 15 years ago

Closed 12 years ago

Last modified 12 years ago

#11442 closed Bug (fixed)

Postgresql backend casts inet types to text, breaks IP operations and IPv6 lookups.

Reported by: eide Owned by: Sasha Romijn
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: ipv6 postgres inet dceu13
Cc: Jernej Kos, mmitar@…, asandroq Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Ticket #708 describes a problem with LIKE operations on inet types in postgresql. The solution was to cast inet to text using the HOST() function.

But by casting inet to text none of the network operations in postgresql will work, and IPv6 lookups are pretty much broken. In the database I'm currently using, doing a HOST() on a IPv6 address will always produce a compressed URL. So if I'm checking against a fullsize address in django the lookup will fail, even though they are the same address.

Here's an example of what I'm talking about:

my_db=# CREATE TABLE my_ips (ip inet);
CREATE TABLE
                            ^
my_db=# INSERT INTO my_ips VALUES ('2001:0db8:0000:0000:0000:0000:0000:0001');
INSERT 0 1

my_db=# SELECT * FROM my_ips WHERE ip = '2001:0db8:0000:0000:0000:0000:0000:0001';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE ip = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

So far so good, but when you throw HOST() into the picture, this happens:

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:db8::1';
     ip      
-------------
 2001:db8::1
(1 row)

my_db=# SELECT * FROM my_ips WHERE HOST(ip) = '2001:0db8:0000:0000:0000:0000:0000:0001';
 ip 
----
(0 rows)

2001:db8::1 and 2001:0db8:0000:0000:0000:0000:0000:0001 are the same address, just displayed on different forms.

Currently I always make sure that I pass a compressed IP to the models with IPAddressFields. That does however assume that all postgresql databases will always return IPv6 addresses on the compressed form, and I do not know if that's correct.

The correct solution would be to not cast inet to text.

Also, the postgresql documentation on Network Address Functions and Operators states that:

The host, text, and abbrev functions are primarily intended to offer alternative display formats.

So using HOST() for lookups is acctually kind of wrong in the first place.

Change History (22)

comment:1 by morten.brekkevold@…, 15 years ago

Not only will using HOST yield wrong results, there are also severe performance implications to using the HOST function call in lookups, as it fails to utilize indexes on INET type fields. See the following example:

nav=# select count(*) from arp;   
  count  
---------
 6391765
(1 row)

nav=# explain analyze select * from arp where ip = '2001:700::1';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using arp_ip_btree on arp  (cost=0.00..905.80 rows=232 width=67) (actual time=0.021..0.021 rows=0 loops=1)
   Index Cond: (ip = '2001:700::1'::inet)
 Total runtime: 0.051 ms
(3 rows)

nav=# explain analyze select * from arp where HOST(ip) = '2001:700::1';
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on arp  (cost=0.00..200239.38 rows=32911 width=67) (actual time=9410.175..9410.175 rows=0 loops=1)
   Filter: (host(ip) = '2001:700::1'::text)
 Total runtime: 9410.196 ms
(3 rows)

nav=# 

comment:2 by kristian.klette@…, 15 years ago

Morten: The perfomance issue can be "solved" by adding a host(ip)-index on the table;

CREATE INDEX arp_host_ip ON arp (host(ip));
klette=# SELECT ip from ips where host(ip) = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 1781.635 ms
klette=# CREATE INDEX ips_host_ip_index ON ips ( host(ip));
CREATE INDEX
Time: 31937.661 ms
klette=# SELECT ip from ips where host(ip)::text = '2001:700:300:1800::b';
         host         
----------------------
 2001:700:300:1800::b
(1 row)

Time: 0.805 ms

Doesn't really solve the bug though, but boost performance at least.

comment:3 by Russell Keith-Magee, 15 years ago

Triage Stage: UnreviewedAccepted

comment:4 by Russell Keith-Magee, 15 years ago

Resolution: duplicate
Status: newclosed

On second thought - closing as a dupe of #811. IPv6 support is spotty across the board - if we're going to fix it, it isn't just a Postgres issue.

comment:5 by bobrobertson, 15 years ago

Resolution: duplicate
Status: closedreopened
Version: 1.01.1

The resolution assumes this is just an IPv6 problem, and completely ignores the enormous performance problem introduced by casting every inet record in the database to a string. This is understandable for LIKE queries, but it even uses HOST() on exact match queries.

These two queries return the same results. The first is how Django currently runs this query, and is roughly 2000x slower than the second. (Yes, I restarted Postgres between tests and flushed the OS buffers, so it is a fair comparison.)

The difference is performing n inet->string casts vs. performing 1 string->inet cast.
This also fixes the original IPv6 problem in this ticket.

Takes ~30.0 sec:

SELECT ip from ips where host(ip) = '10.0.0.1'

Takes ~0.15 sec:

SELECT ip from ips where ip = inet '10.0.0.1'

in reply to:  5 comment:6 by bobrobertson, 15 years ago

Replying to bobrobertson:

Excuse my typo.

Takes ~0.15 sec:

should have been:

Takes ~0.015 sec:

comment:7 by David Mandelberg, 14 years ago

This also breaks ordering of IPv4 addresses, making some querysets return completely incorrect results.

foo=# select inet '127.0.0.3' < inet '127.0.0.10';
 ?column?
----------
 t
(1 row)

foo=# select '127.0.0.3' < '127.0.0.10';
 ?column?
----------
 f
(1 row)

comment:8 by Julien Phalip, 14 years ago

Severity: Normal
Type: Bug

comment:9 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:10 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:11 by Jernej Kos, 13 years ago

Cc: Jernej Kos added

This also unexpectedly breaks any custom fields that have the inet database type. For example, when performing a lookup for a subnet, this HOST() conversion will produce no results, since the prefix length is stripped. And it is very annoying since the only way to work around it seems to be writing a custom manager that specifies a custom WhereNode implementation for the query set (or an ugly way is to add an extra space into the returned db_type so it doesn't match and is not mangled in the backend). This decision of casting every inet-thing via HOST() was really unwise and should be fixed.

Last edited 13 years ago by Jernej Kos (previous) (diff)

comment:12 by Mitar, 13 years ago

Cc: mmitar@… added

comment:13 by asandroq, 12 years ago

Cc: asandroq added

comment:14 by Aymeric Augustin, 12 years ago

Status: reopenednew

comment:15 by HM, 12 years ago

This is biting me right now. I need to lookup an ip-address in a range of ip-addresses, that is:

select * from ip_range where ipaddr_start <= '62.79.79.3' and ipaddr_end >= '62.79.79.3';

Which with the current test-database (postgresql 9.1.4) returns

 id | ipaddr_start |  ipaddr_end 
----+--------------+-------------
 47 | 62.79.79.0   | 62.79.79.255

As expected.

On Django 1.5.1:

>>> a = '62.79.79.3'
>>> print IPRange.objects.filter(ipaddr_start__lte=a, ipaddr_end__gte=a).query
'SELECT "ip_range"."id", "ip_range"."ipaddr_start", "ip_range"."ipaddr_end" 
FROM "ip_range" WHERE (HOST("ip_range"."ipaddr_start") <= 62.79.79.3  
AND HOST("ip_range"."ipaddr_end") >= 62.79.79.3 ) 
ORDER BY "ip_range"."ipaddr_start" ASC, "ip_range"."ipaddr_end" ASC;'

This yields a syntax error when pasted in to psql, on the first occurence of the unquoted "62.79.79.3".

If the ip-addresses are quoted and HOST() removed, it works.

The model is

class IPRange(models.Model):
    ipaddr_start = models.GenericIPAddressField(protocol='both',
            verbose_name='Start of IP-range', unpack_ipv4=True)
    ipaddr_end = models.GenericIPAddressField(protocol='both',
            verbose_name='End of IP-range', unpack_ipv4=True, blank=True,
            null=True, default=None)

    class Meta:
         db_table = 'ip_range'
         unique_together = ('ipaddr_start', 'ipaddr_end')

Psql reports the table as:

                                    Table "public.ip_range"
    Column    |          Type          |                       Modifiers
--------------+------------------------+--------------------------------------------------------
 id           | integer                | not null default nextval('ip_range_id_seq'::regclass)
 ipaddr_start | inet                   | not null
 ipaddr_end   | inet                   | 

I'm on a deadline and will therefore have to write my own field, will add to this ticket when done.

comment:16 by Aymeric Augustin, 12 years ago

The query and the parameters are sent separately to PostgreSQL. In .query the parameters are substituted for readability, but that obviously results in invalid SQL, and it isn't the query that's actually sent to PostgreSQL.

comment:17 by HM, 12 years ago

Still have to get rid of HOST(). LIKE isn't defined for 'inet' anyway, it has its own subnet-operators: '<<', '<<=', '>>', '>>='. The operators "contains" and "icontains" should map to postgres ">>=".It seems the backend-system isn't designed for adapting its operators to the data types so I think I'm better off using raw SQL right now :/.

comment:18 by Sasha Romijn, 12 years ago

Owner: changed from nobody to Sasha Romijn
Status: newassigned

comment:19 by Sasha Romijn, 12 years ago

Keywords: dceu13 added
Version: 1.1master

comment:20 by Sasha Romijn, 12 years ago

Has patch: set

I can see the need for more native support of INET, but the problem is that other databases do not support this type. If we would not treat INET as a string, using HOST(), the behaviour across databases would become inconsistent. Therefore, I don't think we can remove the usage of HOST().

That said, I do understand the desire to use this in custom fields. Currently, the PostgreSQL backend applies HOST() on all queries for INET columns, making custom fields difficult. I have made a patch that instead decides on HOST() based on the get_internal_name() of the field. Only when that is (Generic)IPAddressField, HOST() is used. This means behaviour of built-in fields remains consistent across database (and the same as in previous Django versions), but it is now possible to define custom fields that use INET but are not wrapped with HOST().

Tests successfully run on PostgreSQL 9.2.4 and SQLite, but overall the patch isn't very risky as no queries have changed. I have also added an additional test for #708, the reason we originally started using HOST():

PR: https://github.com/django/django/pull/1160

comment:21 by Erik Romijn <eromijn@…>, 12 years ago

Resolution: fixed
Status: assignedclosed

In 60d94c2a80a68861021526c0fef7fc40e648e81f:

Fixed #11442 -- Postgresql backend casts all inet types to text

comment:22 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

In f7467181aa56e30c946ff55190c49792d6e9a72f:

Merge pull request #1160 from erikr/host-inet-postgres2

Fixed #11442 -- Postgresql backend casts all inet types to text

Note: See TracTickets for help on using tickets.
Back to Top