Changes between Version 18 and Version 19 of new_meta_api
- Timestamp:
- Jul 11, 2014, 9:41:03 AM (10 years ago)
new_meta_api
v18 v19
3 3
4 4 As of my 2014 Summer of Code project, my second deliverable is a refactored working implementation of the Options API.
- 5 The Options API is at the core of Django, it enables introspection of Django Models with the rest of the system. This includes lookups, queries, forms, admin to understand the capabilities of every model. The Options API is hidden under the _meta attribute of each model class.
- 6 Options has always been a private API, but Django developers have always been using it in their projects in a non-official way. This is obviously very dangerous because, as there are no official endpoints, Options could change breaking other people's implementation. Options did not have any unit-tests, but the entire system uses it and relies on it to work correctly.
- 7 My Summer of Code project is all about understanding and refactoring Options to make it a testable and official API that Django and any other developer can use.
+ 5 The Options API is at the core of Django, it enables introspection of Django Models with the rest of the system. This enables lookups, queries, forms, admin to understand the capabilities of every model. The Options API is hidden under the _meta attribute of each model class.
+ 6 Options has always been a private API, but Django developers have always been using it in their projects in a non-official way. This is obviously very dangerous because, as there is no official API, Options could change breaking other people's implementation.
+ 7 Options also did not have any unit-tests, but the entire system uses it and relies on it to work correctly.
+ 8
+ 9 My Summer of Code project is all about understanding and refactoring Options to make it a testable and official API that Django and any other developers can use.
8 10
9 11 === Current state of the API
- 10 I now have a working and tested implementation of Options, I have managed to simplify 20+ functions and reduce them to 2 main endpoints, that are the main API.
Because Options needs to be very fast, I necessarily had to add some accessors on Options for the most common calls (although both endpoints are cached, we can increase speed by avoiding function calls). Each accessor is a cached property and is computed, using the new API, on first access.
- 11
- 12 For this reason, I am planning to release in attached PR:
+ 12 I now have a working and tested implementation of Options, I have managed to reduce it to 2 main endpoints.
+ 13 Because Options needs to be very fast, I necessarily had to add some accessors for the most common calls (although both endpoints are cached, we can increase speed by avoiding function calls). Each accessor is a cached property and is computed, using the new API, on first access.
+ 14
+ 15 I am planning to release in the attached PR:
13 16 - Unit tests for the new Meta API
14 17 - The new Meta API
15 18 - The implementation of the new API throughout django and django.contrib
+ 19 - Documentation
+ 20
16 21
17 22 === Concepts
- 18
19 23
20 24 ==== Field types
… …
27 31 {{{
28 32 class Person(models.Model):
- 29     # DATA field
30 33     data_abstract = models.CharField(max_length=10)
31 34 }}}
… …
37 40 {{{
38 41 class Person(models.Model):
- 39     # M2M fields
40 42     friends = models.ManyToManyField('self', related_name='friends', symmetrical=True)
41 43 }}}
… …
52 54     city = models.ForeignKey(City)
53 55 }}}
- 54 In this case, City has a related object from Person (as you can access person_set)
+ 56 In this case, City has a related object from Person
55 57
56 58 ===== Related M2M
… …
68 70
69 71 ===== Virtual
- 70 Virtual fields do not necessarily have an entry on the database, they are "Django fields" such as a GenericRelation
+ 72 Virtual fields do not necessarily have an entry on the database, they are "Django fields" such as a GenericForeignKey
+ 73
71 74 {{{
72 75 class Person(models.Model):
… …
75 78     item = GenericForeignKey('content_type', 'object_id')
76 79 }}}
- 77 GenericForeignKey uses content_type and object_id to keep track of
what model type and id is set by item, but item itself does not have a concrete presence on the database.
+ 80
+ 81 GenericForeignKey uses 'content_type' and 'object_id' to keep track of what model type and id is set to item, but item itself does not have a concrete presence on the database.
78 82 In this case, item is a virtual field.
79 83
… …
82 86
83 87 ===== Local
- 84 A local field is one that is defined on the queries model and is not derived from inheritance.
- 85 Fields from models that directly inherit from abstract models or proxy classes are still local
+ 88 A local field is when is not derived from inheritance. Fields from models that directly inherit from abstract models or proxy classes are still local
86 89
87 90 {{{
… …
97 100 ===== Hidden
98 101 Hidden fields are only referred to related objects and related m2m. When a relational model (such as ManyToManyField, or ForeignKey) specifies a related_name that starts with a "+", it tells Django to not create a reverse relation.
+ 102
99 103 {{{
100 104 class City(models.Model):
… …
105 109 }}}
106 110
- 107 In this case,City has a related hidden object from Person (as you can't access person_set)
+ 111 City has a related hidden object from Person (as you can't access person_set)
108 112
109 113 ===== Concrete
… …
111 115
112 116 ===== Proxied relations
- 113 Proxied relations are when concrete models inherit all related from their proxies.
+ 117 Proxied relations are relations that point to a proxy of a model.
114 118
115 119 {{{
… …
137 141 }}}
138 142
- 139 get_fields takes a set of flags as parameters, and returns a tuple of field instances that match those parameters. All possible combinations of
+ 143 get_fields takes a set of flags as parameters, and returns a tuple of field instances. All possible combinations of
140 144 options are possible here, although some will have no effect (such as include_proxy combined with data or m2m by itself).
- 141 get_fields is internally cached for speed and a recursive function that collects fields from each parent of the model.
+ 145 get_fields is internally cached for speed and it is a recursive function that collects fields from each parent of the model.
142 146 An example of every (sane) combination of flags will be available in the model_meta test suite that I will ship with the new API.
- 143 The 'export_map' key is only used internally (by get_field) and is not part of the public API. 'export_map=True' will return an OrderedDict with fields
- 144 as keys and a tuple of strings as values. While the keys map exactly to the same output as 'export_map=False', the tuple of values will contain all
- 145 possible lookup names for that field. This is used to build a fast lookup table for get_field and to avoid re-iterating over every field to pull
- 146 out every possible name.
+ 147 The 'export_map' key is only used internally (by get_field) and is not part of the public API. 'export_map=True' will return an OrderedDict with fields as keys and a tuple of strings as values. While the keys map exactly to the same output as 'export_map=False', the tuple of values will contain all possible lookup names for that field. This is used to build a fast lookup table for get_field and to avoid re-iterating over every field to pull out every possible name.
147 148
148 149 {{{
… …
176 177 }}}
177 178
- 178 'get_field' returns a field_instance from a given field name. field_name can be anything from name, attname and related_query name.
- 179 get_field is recursive by default and does not include any hidden or proxied relations. There has still not been any reason to add these
- 180 and they can be derived from 'get_fields'.
+ 179 'get_field' returns a field_instance from a given field name. field_name can be anything from name, attname and related_query_name.
+ 180 get_field is recursive by default and does not include any hidden or proxied relations.
181 181 If a given name is not found, it will raise a FieldDoesNotExist error.
182 182 'get_field' is internally cached and gets all field information from 'get_fields' internally.
183 183
184 184 NOTE: There is an inconsistency between the defaults of get_field and get_fields. 'get_fields' by default enables only data fields
- 185 while 'get_field' by default enables data and m2m. This is because of backwards-compatibility issues ( get_field already existed).
+ 185 while 'get_field' by default enables data and m2m. This is because of backwards-compatibility issues (read more below).
186 186
187 187 {{{
… …
209 209 ==== Using bitfields as flags
210 210
- 211 get_field and get_fields were originally designed to work with bits. The main choice for this decision was because there were many options and,
- 212 in order to avoid providing multiple flags, it would be better to provide bits.
+ 211 get_field and get_fields were originally designed to work with bits. The main choice for this decision was because there were many options and to avoid providing too many flags.
213 212 The original API for bits is:
214 213
… …
239 238
240 239 The decision taken was to port 'get_field' and 'get_fields' to flags.
- 241 A port of the old implementation lies here if you are interested: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade/django/db/models/options.py
+ 240 A port of the old implementation still lies here if you are interested: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade/django/db/models/options.py
242 241
243 242 ==== Removed direct, m2m, model
… …
246 245 the attributes (there is only 1 place where m2m is used).
247 246
- 248 The decision taken was to drop direct, m2m, model in the return type and only keep field_instance. All the rest will be derived .
+ 247 The decision taken was to drop direct, m2m, model in the return type and only keep field_instance. All the rest will be derived if needed.
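The "derived if needed" idea in the line above can be sketched in a few lines: given only a field instance, the dropped model, direct and m2m values can be recomputed by the caller. This is a minimal illustration, not the code from the PR. FieldStub and its attribute names (model, is_reverse, many_to_many) are hypothetical stand-ins, and legacy_tuple is a hypothetical helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for a field instance. The attribute names are
# assumptions made for this sketch, not the attributes of the real API.
FieldStub = namedtuple("FieldStub", ["name", "model", "is_reverse", "many_to_many"])


def legacy_tuple(field):
    """Rebuild a (field, model, direct, m2m) tuple from a field instance alone."""
    direct = not field.is_reverse  # a reverse relation lives on another model
    m2m = field.many_to_many
    return field, field.model, direct, m2m


# A forward many-to-many field and a reverse relation created by a ForeignKey.
friends = FieldStub("friends", "Person", is_reverse=False, many_to_many=True)
person_set = FieldStub("person_set", "City", is_reverse=True, many_to_many=False)

field, model, direct, m2m = legacy_tuple(friends)
```

Because each value is a cheap attribute read on the instance, callers that still need them can compute them on demand instead of every lookup paying to build the tuple.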
249 248
250 249 ==== Removed all calls "with_model"
… …
252 251
253 252 ==== Removed the need of multiple maps
- 254 The previous implementation relied on many different cache maps internally. This is somewhat necessary, but tends to increase bug-risk
- 255 when cache-expiry happens. For this reason, my implementation relies only on 2 cache tables, and I have added a specific function to do
- 256 cache expiry (called _expire_cache) that will wipe out all memory.
- 257 The downsides if this aspect is that we cache a bit more naively (there are less layers of caching) but benchmarks show this does not
- 258 decrease performance.
+ 253 The previous implementation relied on many different cache maps internally. This is necessary, but tends to increase bug-risk when cache-expiry happens. For this reason, my implementation relies on only 2 cache tables, and I have added a specific function to do
+ 254 cache expiry easily (_expire_cache). The downsides of this aspect is that we cache a bit more naively (there are less layers of caching) but benchmark shows no real decrease of performance.
259 255
260 256 ==== Used internal caching instead of lru_cache
- 261 Our first approach to caching was to use functools.lru_cache. lru_cache is a simple decorator that provides cache and an expiry function
- 262 built-it. It worked correctly with the new API but cProfile quickly showed how a lot of computing time was done inside lru_cache itself.
- 263
- 264 The decision taken was to do very caching with simple try / catch and a dictionary for memoizing. This is also because we really don't need
- 265 the 'lru' part of 'lru_caching': there are only a finite number of combinations that can be called.
- 266
- 267 ==== Used internal caching instead of lru_cache
- 268 Our first approach to caching was to use functools.lru_cache. lru_cache is a simple decorator that provides cache and an expiry function
- 269 built-it.
It worked correctly with the new API but cProfile quickly showed how a lot of computing time was done inside lru_cache itself.
- 270
- 271 The decision taken was to do very caching with simple try / catch and a dictionary for memoizing. This is also because we really don't need
- 272 the 'lru' part of 'lru_caching': there are only a finite number of combinations that can be called.
+ 257 Our first approach to caching was to use 'functools.lru_cache'. 'lru_cache' is a simple decorator that provides cache and an expiry function built-it. It worked correctly with the new API but cProfile quickly showed how a lot of computing time was done inside lru_cache itself.
+ 258
+ 259 The decision taken was to drop 'lru_cache' in favour of a simpler caching strategy. This is also because we really don't need the lru part of 'lru_caching'. there are only a finite number of combinations that can be called.
273 260
274 261 ==== Use cached_properties when possible
275 262 Function calls are expensive in Python, All sensible attributes with no arguments have been transformed into cached_properties.
- 276 A cached property is a read-only property that is calculated on demand and automatically cached. If the value has already been calculated,
- 277 the cached value is returned. Cached properties avoid a new stack and are used for fast-access to fields, concrete_fields,
+ 263 A cached property is a read-only property that is calculated on demand and automatically cached. If the value has already been calculated, the cached value is returned. Cached properties avoid a new stack and are used for fast-access to fields, concrete_fields,
278 264 local_concrete_fields, many_to_many, field_names
279 265
… …
292 278
293 279 This was done for 2 reasons:
- 294 1) We managed to squash 2 functions (get_field and get_field_by_name) in 1 single call
- 295 2) I could not find any reason for the many_to_many flag to exist! there can never be data and m2m fields with the same name.
So this looked
- 296 like a legacy parameter that didn't have any effect (because turning it off did not break any tests)
- 297
- 298 Finally, the reason the many_to_many flag existed was for a special validation case that was not documented anywhere. Russell helped me in
- 299 looking for edge cases and finally I came up with a failing test case: https://github.com/django/django/pull/2893. The test case would fail on the
- 300 new API but succeed on master.
- 301
- 302 Our final iteration was to add all the field types as flags to get_field. By making m2m as first parameter, we avoid breaking existing implementations
- 303 and maintain a similarity with the 'get_fields' API.
+ 280 - 1) We managed to squash 2 functions (get_field and get_field_by_name) in 1 single call.
+ 281 - 2) I could not find any reason for the many_to_many flag to exist! there can never be data and m2m fields with the same name. So this looked like a legacy parameter that was never removed (because turning it off did not break any tests).
+ 282
+ 283 The reason the many_to_many flag existed was for a special validation case that was not documented anywhere. Russell helped me in looking for edge cases and finally I came up with a failing test case: https://github.com/django/django/pull/2893. The test case would fail on the new API but succeed on master.
+ 284
+ 285 Our final iteration was to add all the field types as flags to get_field. By making m2m as first parameter, we avoid breaking existing implementations and maintain a similarity with the 'get_fields' API.
304 286
305 287 === Performance
- 306 Throughout my project I have always kept an eye on performance. Throughout the development of my API I have refactored often and always looked for
- 307 bottlenecks using cProfile. I am happy to say no major decrease in speed has happened, and the new implementation does a couple of optimizations
- 308 that were not present in the old system. Said this, I prefer to not comment on performance but just to show the benchmarks.
It will be the core
- 309 team to decide if this is feasible or not.
+ 288 Throughout my project I have always kept an eye on performance. I have always looked for bottlenecks using cProfile and other benchmarking tools. I am happy to say no major decrease in speed has happened, actually the new implementation does a couple of optimizations that were not present in the old system. Said this, I prefer to not comment on performance but just show the benchmarks. It will be the core team to decide if this is feasible or not.
310 289
311 290 === Main optimization points
312 291
313 292 ==== Compute inverse relation map on first access
- 314 In order to find related objects, the current implementation does the following
+ 293 In order to find related objects, the current implementation does the following:
315 294
316 295 {{{
… …
323 302 REF: https://github.com/django/django/blob/master/django/db/models/options.py#L488
324 303
- 325 This tends to be expensive depending on the setup, but results in a O(models * fields) complexity. We can increase performance by
- 326 computing a inverse relation map on first access. This is done only **once**, not once per model
- 327
- 328 REF: https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/apps/registry.py#L176
- 329
- 330 In this way we have a map of model -> [related_object, related_object, ..] and computing a hash lookup is O(1).
- 331
- 332 https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/db/models/options.py#L423
- 333
- 334 Now, only 1 much smaller loop is needed.
+ 304 This tends to be expensive, it results in a O(models * fields) complexity. We can increase performance by computing an inverse relation map on first access. This is done only **once**, not once per model (https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/apps/registry.py#L176).
+ 305
+ 306 In this way we have a map of { model : [related_object, related_object, ..]
} and computing a hash lookup is O(1) (https://github.com/PirosB3/django/blob/soc2014_meta_refactor_upgrade_flags_get_field/django/db/models/options.py#L423).
335 307
336 308
337 309 ==== Benchmarks
- 338 Here is a benchmarks table. It is benchmarking soc2014_meta_refactor_upgrade_flags_get_field (68dc11708eb2170540729b71db6bcaf4c46d6504)
- 339 against django/master
+ 310 Here is a benchmark results table. It is benchmarking soc2014_meta_refactor_upgrade_flags_get_field (68dc11708eb2170540729b71db6bcaf4c46d6504) against django/master.
340 311
341 312 Djangobench: each number was picked as median of 2000 trials.
… …
344 315 ==== Backwards compatibility
345 316 All previous _meta functions will be backwards-compatible, with a DeprecationWarning.
- 346
347 317
348 318 ==== Next Steps