
I am an author, speaker, and loud-mouth on the design of enterprise software. I work for ThoughtWorks, a software delivery and consulting company. Martin is a DZone MVB and is not an employee of DZone and has posted 78 posts at DZone.

Martin Fowler: User Defined Field

07.24.2013

A common feature in software systems is to allow users to define their own fields in data structures. Consider an address book - there's a host of things that you might want to add. With new social networks popping up every day, users might want to add a new field for a Bunglr id to their contacts.

For in-memory purposes, often the best way to do this is to allow classes to include a hashmap field for user-defined fields (a pattern Kent Beck calls Variable State).

# ruby
class Contact
  attr_accessor :firstname, :lastname

  def initialize
    @data = {}
  end

  def [] arg
    return @data[arg]
  end
  def []= key, value
    @data[key] = value
  end
end

aCustomer = Contact.new
aCustomer.firstname = "Martin"
aCustomer[:bunglrId] = 'fowl'

With a setup like this you can add affordances to your UI to allow users to attach new fields to objects. If you want common user-defined fields, you can use a class variable to keep a list of common keys for the hashmap. There is some awkwardness in that regular fields are accessed differently to user-defined fields, but depending on your language even this can be overcome. If your language supports Dynamic Reception (http://martinfowler.com/dslCatalog/dynamicReception.html) then you can use this to access the hashmap with regular field access.

class Contact...
  def method_missing(meth, *args)
    if @data.has_key? meth
      return @data[meth]
    else
      super
    end
  end
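
As a fuller, runnable sketch of the same idea: the class name ContactWithDynamicFields is made up for this example, and the respond_to_missing? override is an addition not shown in the excerpt above. It is there so that respond_to? agrees with what method_missing will actually answer.

```ruby
# Sketch only: ContactWithDynamicFields is a hypothetical name.
class ContactWithDynamicFields
  attr_accessor :firstname, :lastname

  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  # Route argument-less calls for known keys to the hash; everything else
  # falls through to the default behaviour, which raises NoMethodError.
  def method_missing(meth, *args)
    if args.empty? && @data.key?(meth)
      @data[meth]
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(meth, include_private = false)
    @data.key?(meth) || super
  end
end

contact = ContactWithDynamicFields.new
contact.firstname = "Martin"
contact[:bunglrId] = "fowl"
puts contact.bunglrId                 # user-defined field read like a regular one
puts contact.respond_to?(:bunglrId)
```

Only readers are handled here; supporting `contact.bunglrId = ...` would need extra handling for method names ending in `=`.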

Often the trickiest part of this is figuring out how to persist this. If you're using a schemaless database, then it's usually straightforward - you just add the user-defined keys to your application defined ones. The trickiness comes from a database with a storage schema, particularly a relational database.

Usually the best option is to use a Serialized LOB, essentially creating a large text column into which you store the user-defined fields as a JSON or XML document. Many databases these days offer pretty nice support for this approach, including support for indexing and querying based on the data structure within the LOB. However, such support, if available, is usually more awkward than using fields. [1]
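
A minimal sketch of the serialization side, using Ruby's standard json library. The helper names are hypothetical, and actually writing the string to a text column is left to whatever database library you use.

```ruby
require "json"

# Hypothetical helpers: turn the user-defined fields hash into a string
# suitable for a single large text column, and back again.
def dump_user_fields(fields)
  JSON.generate(fields)
end

def load_user_fields(lob)
  # Treat a NULL or empty column as "no user-defined fields".
  return {} if lob.nil? || lob.empty?
  JSON.parse(lob, symbolize_names: true)
end

lob = dump_user_fields({ bunglrId: "fowl", shoeSize: 9 })
restored = load_user_fields(lob)
puts lob
puts restored[:bunglrId]
```

Note that JSON only round-trips simple values (strings, numbers, booleans, arrays, hashes); richer types such as dates would need their own encoding.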

Another route is using some kind of attribute table. A table might look something like this.

CREATE TABLE ContactAttributes (
  contactId   INTEGER, 
  attribute   TEXT, 
  value       TEXT, 
  PRIMARY KEY (contactId, attribute))

Again, querying and indexing are awkward. Queries can involve a good many extra joins that can get rather messy.
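
To make the join problem concrete, here is a sketch that builds a parameterised query against the ContactAttributes table above. The function name is hypothetical, and it only generates SQL text and bind values; it does not talk to a database. Each additional criterion costs another self-join.

```ruby
# Hypothetical helper: find contacts that match every attribute/value pair.
# Returns [sql, bind_values] for use with a parameterised query API.
def contacts_matching_sql(criteria)
  raise ArgumentError, "need at least one criterion" if criteria.empty?

  pairs = criteria.to_a
  aliases = pairs.each_index.map { |i| "a#{i}" }

  # One self-join per criterion after the first.
  joins = aliases.drop(1).map do |a|
    "JOIN ContactAttributes #{a} ON #{a}.contactId = a0.contactId"
  end
  conditions = aliases.map { |a| "#{a}.attribute = ? AND #{a}.value = ?" }
  binds = pairs.flat_map { |attribute, value| [attribute.to_s, value.to_s] }

  sql = (["SELECT a0.contactId FROM ContactAttributes a0"] + joins +
         ["WHERE " + conditions.join(" AND ")]).join("\n")
  [sql, binds]
end

sql, binds = contacts_matching_sql({ bunglrId: "fowl", city: "Boston" })
puts sql
puts binds.inspect
```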

Pre-defined custom fields offer another system. Here you set the schema up with fields like custom_field_1 (and perhaps custom_field_1_name). You are limited to the number of custom fields per instance that you have pre-defined. As usual, indexing and querying are awkward.
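
One way to picture the bookkeeping this requires is a small allocator that hands out the pre-defined columns as users name new fields. This is an illustrative sketch only; the class name, the in-memory mapping, and the custom_field_N naming are assumptions.

```ruby
# Hypothetical allocator for a table with columns custom_field_1 ..
# custom_field_N. A real system would persist the mapping somewhere.
class CustomFieldSlots
  def initialize(slot_count)
    @slot_count = slot_count
    @columns = {} # user-visible field name => column name
  end

  # Returns the column backing a field, claiming the next free slot the
  # first time a field name is seen.
  def column_for(field_name)
    @columns.fetch(field_name) do
      raise ArgumentError, "no free custom field slots" if @columns.size >= @slot_count
      @columns[field_name] = "custom_field_#{@columns.size + 1}"
    end
  end
end

slots = CustomFieldSlots.new(2)
puts slots.column_for("bunglrId")
puts slots.column_for("shoeSize")
```

Asking for a third field here raises an error, which is the hard limit described above.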

When using an attribute table or pre-defined custom fields you may choose to have different columns for different SQL data types. So pre-defined fields might be integer_1, integer_2, text_1…, or an attribute table might have multiple value fields (text_value, integer_value).
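
For the attribute-table variant, the application has to decide which value column a given value goes into. A minimal sketch, assuming just the two columns named above and a hypothetical helper name:

```ruby
# Hypothetical helper: choose the typed value column for one attribute row.
# Anything that is not an Integer is stored as text.
def typed_value_columns(value)
  case value
  when Integer then { integer_value: value, text_value: nil }
  else              { integer_value: nil, text_value: value.to_s }
  end
end

puts typed_value_columns(42).inspect
puts typed_value_columns("fowl").inspect
```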

Dynamic schema is an approach that's often overlooked. To do this you set things up so that when someone adds a field, you use an ALTER TABLE statement to add that field to the table. Our Mingle team does this and has been happy with how it's worked out. [2] The new fields can be indexed and queried just the same as application-defined fields. This does mean all instances get all fields, so it is less handy if you get a lot of variance between instances.
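
A sketch of what generating that statement might look like. This is not how Mingle does it; the helper name, the identifier rule, and the short list of allowed types are all assumptions, and the exact ALTER TABLE syntax varies between databases. Because the column name comes from a user and identifiers cannot be passed as bind parameters, the sketch validates it before building the SQL.

```ruby
# Hypothetical helper: build the DDL for adding a user-defined field.
def add_field_sql(table, field, type = "TEXT")
  allowed_types = %w[TEXT INTEGER]

  # Only plain identifiers are accepted, to keep user input out of the DDL.
  [table, field].each do |name|
    unless name.match?(/\A[A-Za-z_][A-Za-z0-9_]*\z/)
      raise ArgumentError, "unsafe identifier: #{name.inspect}"
    end
  end
  raise ArgumentError, "unsupported type: #{type}" unless allowed_types.include?(type)

  "ALTER TABLE #{table} ADD COLUMN #{field} #{type}"
end

puts add_field_sql("Contact", "bunglrId")
```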

Your persistence scheme choices will be affected by what you use for relational mapping. User-defined fields aren't the most well-trod parts of the relational mapping problem, so there's a lot of variation in support from different relational mapping libraries.

User-defined fields are a similar problem to non-uniform types [3]. Both problems lead to the need for a more flexible schema, or indeed a truly schemaless approach (although remember that schemaless doesn't mean you don't have a schema). If you have non-uniform types that aren't changing at the users' behest, then one of the inheritance-oriented patterns may make sense (Single Table Inheritance, Class Table Inheritance, or Concrete Table Inheritance).

Notes

1: Bret Taylor describes a scheme for indexing fields in such a scheme by building separate index tables for each indexable field.

2: Mingle's approach is actually a bit more involved than just adding fields to an existing table. Mingle's central record type is a card (which represents stories, tasks, etc.). The fields on a card vary by project and you can have many projects in the same database. So rather than use a single card table, Mingle creates a new table for each project's cards. It then adds fields dynamically to this table as users desire.

3: Non-uniform types are types where instances use a small and very different selection of fields. Sometimes these are called sparse tables, because if you look at the whole table each row only uses a small number of a large list of columns. The difference between non-uniform types and user-defined fields is that non-uniform types have all the potential fields known to developers, while user-defined fields allow fields to be created that developers will never know about.

Published at DZone with permission of Martin Fowler, author and DZone MVB. (source)


Comments

Jaime Metcher replied on Wed, 2013/07/31 - 5:01pm

Thanks Martin - another concise and balanced exposition. Characterising the "dynamic schema" option as "often overlooked" particularly resonated with me. I've frequently been tasked with writing a custom query on a relational database, only to find that the developer has completely obfuscated any schema they might have had in mind (and I say "might" advisedly) by storing everything in a forest of JSON structs. There are tools for dealing with this, but all of a sudden the job no longer looks like a quick data extraction from a query browser.

Do developers do this because relations with the DBA have completely broken down? Maybe, but also a lot of devs are scared by SQL, and even more so by DDL. I've been waiting for the penny to drop that SQL is really an elegant and sophisticated DSL (and, moreover, one that is supported by hugely optimized runtimes), but from what I see in the wild I'll be waiting a while yet.
