Opened 12 years ago

Last modified 3 years ago

#18887 new Cleanup/optimization

LineString array method (property) returns different data type without and with NumPy installed

Reported by: mal Owned by:
Component: GIS Version: 1.4
Severity: Normal Keywords: GIS, NumPy, LineString
Cc: deprince@…, Ubercore Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Apparently LineString (and probably other geometry types) return data in different types depending on whether NumPy is installed.

Simple test case (reproducible in Django 1.3, 1.3.1 and 1.4.1).

Clean virtual env with only Django installed (and ipython).
First - no NumPy installed.

In [1]: from django.contrib.gis.geos import LineString
In [2]: line = LineString((0, 0), (3, 3))
In [3]: line.array
Out[3]: [(0.0, 0.0), (3.0, 3.0)]

Now - install NumPy and try again.

In [1]: from django.contrib.gis.geos import LineString
In [2]: line = LineString((0, 0), (3, 3))
In [3]: line.array
Out[3]:
array([[ 0.,  0.],
       [ 3.,  3.]])

[(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.)

This is rather serious issue.

Change History (17)

in reply to:  description ; comment:1 by Claude Paroz, 12 years ago

Replying to mal:

This is rather serious issue.

Maybe you could add a real use case demonstrating why you consider this as a serious issue?

in reply to:  1 comment:2 by anonymous, 12 years ago

Replying to claudep:

Replying to mal:

This is rather serious issue.

Maybe you could add a real use case demonstrating why you consider this as a serious issue?

Because it returns two different values depending on if a module is installed or not? Any conditionals on the result will be dependant on if numpy is installed or not.

comment:3 by Claude Paroz, 12 years ago

Triage Stage: UnreviewedAccepted

Accepting on the base that this should be at least documented.

I see the potential issues, but apart documentation, do you see any possible improvements?

comment:4 by deprince, 12 years ago

Cc: deprince@… added

Would it be an option to make numpy a dependency for the installation of gis?

comment:5 by Adam DePrince <deprince@…>, 12 years ago

Scratch that ... you don't want numpy to be required to install django. It seems if the user wants a list they would call list(LineString(...)) instead of LineString(...).array. Perhaps we should issue a warning when the user calls .array without numpy installed?

>>> line = LineString((0,0), (3,3))
>>> line.array
array([[ 0.,  0.],
       [ 3.,  3.]])
>>> list(line) 
[(0.0, 0.0), (3.0, 3.0)]

comment:6 by Ubercore, 12 years ago

Owner: changed from nobody to Ubercore

comment:7 by Ubercore, 12 years ago

I've confirmed this, going to look into it a little more closely to see if we can make the behavior predictable somehow, without losing the ability to take advantage of numpy if installed.

comment:8 by Ubercore, 12 years ago

Cc: Ubercore added
Triage Stage: AcceptedDesign decision needed

Yeah, the only reasonable options I see are the ones outlined: warning when array is called, and/or noting this in the docs. The behavior here seems to be intended, not a bug. Marking as design decision needed.

comment:9 by Adam DePrince <deprince@…>, 12 years ago

Here's a proposed fix if a warning in the absence of numpy is desired.

https://github.com/adamdeprince/django/commit/d02431ac5f15616df4cb3d39592651e59fdde17f

comment:10 by Ubercore, 12 years ago

Has patch: set
Status: newassigned
Triage Stage: Design decision neededAccepted

I've submitted a pull request for the following commit, which adds a setting: https://github.com/peterlandry/django/commit/0c9839355a005e8805467aa09d4fcf9795965b76

comment:11 by Ubercore, 12 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by Adam DePrince <deprince@…>, 12 years ago

No, this doesn't capture what is really happening.

There is a bug with this: [(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.?)

This snippet doesn't mean

{{
not ([(0.0, 0.0), (3.0, 3.0)] =! array( 0., 0.],[ 3., 3.?))
}}

it means these types are completely incomparable.

If you compare these two you get completely different behavior than if you compared two lists.

numpy.array([(0.0, 0.0), (3.0, 3.0)]) == [(0.0, 0.0), (3.0, 3.0)]
array([[ True, True],

[ True, True]], dtype=bool

There will be two populations that use this API. One will have numpy installed on their dev machines and one won't. When they release extensions and share code those they share it with may not have the same numpy status.

It seems pretty clear that if you want a list you should call list(LineString(...)) and if you actually want a numpy array you could call .array and expect it. If the system will silently break that promise you should have an exception, but we don't want to break existing code, so I'm proposing we raise a warning instead. This diff doesn't do this. My proposed code does.

We can't have two APIs that depend on the module you have installed; I really feel strongly that .array without numpy installed should warn.

comment:14 by Claude Paroz, 12 years ago

Triage Stage: Ready for checkinAccepted

Please do not mark your own patch as RFC.

comment:15 by Preston Holmes, 12 years ago

Needs documentation: set
Patch needs improvement: set
Type: BugCleanup/optimization

I'm inclined to agree with claudep here that this is primarily a documentation issue - an admonition probably.

The Numpy related LineString methods aren't documented at all.

A warning would become annoying if you've accepted the non-numpy behavior in your code, and don't plan on changing it. This is as valid as the Numpy approach as there has been no documentation. So why should that user now suffer eternal warnings?

The comment on the property suggests that .array was to return a numpy array, but all the other properties are commented as returning a "list or numpy array"

If the numpy difference is documented clearly - anyone wanting to right portable/sharable code can check from django.contrib.gis.geos.base import numpy and make sure they standardize the data.

I'm closing the current pull requests as the wrong direction - not RFC.

comment:16 by Sergey Fedoseev, 9 years ago

Owner: changed from Ubercore to Sergey Fedoseev

I don't see any benefit in returning numpy array from Django, because Django gets the list at first and then just creates numpy array with that list, users can do it easily by themselves. So I propose to deprecate array property in favor of list(LineString).

comment:17 by Mariusz Felisiak, 3 years ago

Owner: Sergey Fedoseev removed
Status: assignednew
Note: See TracTickets for help on using tickets.
Back to Top