Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#22943 closed Bug (fixed)

Some validators defined in django.core.validators can't be serialized

Reported by: antialiasis@… Owned by: nobody
Component: Core (Other) Version: 1.7-rc-1
Severity: Release blocker Keywords: validators, migrations
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: yes UI/UX: no

Description

In django.core.validators, validate_slug and validate_ipv4_address are defined as RegexValidator instances constructed with precompiled regexes. Since the migrations framework can't serialize compiled regexes (as explained only in the release notes for 1.7; shouldn't this also be documented in the validators reference?), it is impossible to use these validators on model fields in a migrated app - the migration will fail with ValueError: Cannot serialize: <_sre.SRE_Pattern object at 0x0274AAD0>. Presumably Django's own built-in validators should work with migrations out of the box.

I would assume all that needs to be done here is pass the regex arguments as strings instead of compiled regexes to the RegexValidator constructor where these validators are defined.

Attachments (4)

validator_patch.diff (6.2 KB ) - added by antialiasis@… 10 years ago.
Use strings rather than precompiled regexes for built-in RegexValidator instances.
validator_patch_2.diff (11.8 KB ) - added by antialiasis@… 10 years ago.
Allow MigrationWriter to serialize compiled regex objects.
validator_patch_2.2.diff (12.4 KB ) - added by antialiasis@… 10 years ago.
Rebase and cache compiled regex type.
regex_py3_wip.diff (3.9 KB ) - added by antialiasis@… 10 years ago.
Fix Python 3 regex serialization, but a test currently fails (see comment below)

Download all attachments as: .zip

Change History (21)

comment:1 by anonymous, 10 years ago

Keywords: migrations added

by antialiasis@…, 10 years ago

Attachment: validator_patch.diff added

Use strings rather than precompiled regexes for built-in RegexValidator instances.

comment:2 by antialiasis@…, 10 years ago

Has patch: set

Well, I decided to try my own hand at a patch for this. First-time contributor, so please excuse if it's a bit shaky; in particular, I'm not sure if it's prudent to import from django.db.migrations in the validator tests or if dependencies to other parts of Django should be kept to a minimum (and if so how it would be best to solve this). Ran the entire test suite in case the change caused problems somewhere else, and everything passed (bar expected failures).

comment:3 by Simon Charette, 10 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Hi antialiasis,

Unfortunately defining those patterns as string instead of compiled re might be breaking backward compatibility since they're expected to be compiled.

I think the correct way of solving this issue would be to either:

  1. Instruct the migration framework how to deal with compiled re instance. You can have a look at how django.db.migrations.writer.MigrationWriter work for this;
  2. Implement RegexValidator.deconstruct instead of using the deconstructible decorator to make sure the compiled self.regex attribute is deconstructed to return regex=self.regex.pattern and flags=self.regex.flags kwargs.

comment:4 by Simon Charette, 10 years ago

You might have to rely on ducktyping to implement the first option since it seems there's no way to reliably detect that a given object is a compiled pattern.

Version 0, edited 10 years ago by Simon Charette (next)

comment:5 by Claude Paroz, 10 years ago

There is also another option: passing slug_re.pattern, ipv4_re.pattern, etc. to the validator instances. But the solutions proposed by Simon have the advantage of being more general.

comment:6 by Claude Paroz, 10 years ago

As for the tests, I think it would be better to add them in tests/migrations/test_writer.py, as there are already some validator-related tests in test_serialize (just create a separate test for all validator tests).

by antialiasis@…, 10 years ago

Attachment: validator_patch_2.diff added

Allow MigrationWriter to serialize compiled regex objects.

comment:7 by antialiasis@…, 10 years ago

claudep's solution is more along the lines of what I meant to do here (it stupidly slipped my mind that slug_re etc. were public API and not just intermediate variable names for cleaner constructor calls). Then again, the reason that's what I meant to do was that I had assumed actually making validators with compiled regexes serializable was impossible, or at least a lot trickier than it sounded, given it hadn't been done and the release notes were telling people to just stop passing precompiled regexes to RegexValidator. I've attached my attempt at making compiled regexes serializable (plus factoring out the validator tests and removing that item from the documentation); if it really is that simple, then of course that's a more general and useful solution.

comment:8 by Simon Charette, 10 years ago

Cc: Simon Charette added
Patch needs improvement: set

Patch is looking good! Two nitpicks:

  1. I would try to cache type(re.compile('')) instead of computing it on every instance check;
  2. Could you rebase your patch on the master branch, I can't apply it with patch -p0 < validator_patch_2.diff.

by antialiasis@…, 10 years ago

Attachment: validator_patch_2.2.diff added

Rebase and cache compiled regex type.

comment:9 by antialiasis@…, 10 years ago

Fixed the patch, should work now.

comment:10 by Simon Charette, 10 years ago

Hmm I still couldn't apply your patch on top of the lastest master so I had to manually merge all the changes.

I created a PR with it. Could you sure make I didn't miss anything?

comment:11 by Simon Charette, 10 years ago

I updated the patch to work under Py3. Some tests were failing because of the fact that re.compile(u'').flags == 32 on Py3 and 0 on Py2.

I would appreciate if someone else could make sure the changes I made make sense.

comment:12 by Simon Charette <charette.s@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 35c2c3704185dfdf6498ccf625872fa28bd7ece1:

Fixed #22943 -- Correctly serialize compiled regexes.

Thanks to antialiasis at gmail dot com for the patch.

comment:13 by Andrew Godwin <andrew@…>, 10 years ago

In f751109cb240290cb946f8d4def0c7a5fb92569b:

Merge pull request #2881 from charettes/ticket-22943-compiled-regex-deconstruction

Fixed #22943 -- Correctly serialize compiled regexes.

comment:14 by Andrew Godwin, 10 years ago

I reviewed the patch and it looks fine to me; I've merged it and backported it to the 1.7 stable branch too.

comment:15 by Andrew Godwin <andrew@…>, 10 years ago

In 2f0cc4f5fbdddbecbd9c2093baf851fd92dbe205:

[1.7.x] Fixed #22943 -- Correctly serialize compiled regexes.

Thanks to antialiasis at gmail dot com for the patch.

by antialiasis@…, 10 years ago

Attachment: regex_py3_wip.diff added

Fix Python 3 regex serialization, but a test currently fails (see comment below)

comment:16 by antialiasis@…, 10 years ago

Sorry to disappear from the discussion; I was on a trip. Thanks for correcting and accepting the patch. However, I believe charettes' Py3 correction is incomplete - the patch still checks if regex.flags is truthy to determine if it should include the flags parameter in the serialization, so if one created a regex with re.compile('', 0) on Py3, it would currently write out just re.compile('') (which Py3 will treat as re.compile('', 32)).

I've attached a version that checks if regex.flags is equal to the default flags instead, but this causes a test to fail, because even though re.compile('', re.U).flags == re.compile('').flags (on Py3), re.compile('', re.U) != re.compile('') (and the latter is what comes out of the serialization of the former). Is there any chance this could potentially cause problems in practice or should we just change the test?

comment:17 by antialiasis@…, 10 years ago

Okay, I looked into it a bit better. To correct/clarify: the real issue is that the flags attribute on a compiled regex does not simply contain the flags passed into re.compile; it contains those flags, plus any inline flags defined within the pattern itself, plus (on Python 3) the implicit Unicode flag. This means if we naïvely serialize regexes based on the flags attribute, the serialization may explicitly specify flags that were not actually passed into re.compile when the regex was created. While as far as I can tell this should not affect how the resulting regex functions, it does mean it won't compare equal to the original. Could this cause problems?

I believe this means my patch above is obsolete. We could compare the flags on the actual regex to re.compile(regex.pattern).flags to see if we can omit the flags argument, but the serialized-deserialized regex would then still compare unequal to the original if the user did explicitly specify a flag such as re.U. If this is a problem, we may need to rethink this a bit.

Note: See TracTickets for help on using tickets.
Back to Top