Senior Software Developer at Citrix Systems. Luis is a DZone MVB, is not an employee of DZone, and has posted 10 posts at DZone. You can read more from them at their website.

Multitenancy in Google AppEngine

03.27.2012
Multitenancy is a topic that has been discussed for many years, and there are many excellent references that are readily available, so I will just present a brief introduction.

Multitenancy is a software architecture where a single instance of the software runs on a server, serving multiple client organizations (tenants). With a multitenant architecture, an application can be designed to virtually partition its data and configuration (business logic), and each client organization works with a customized virtual application instance.

It suits SaaS (Software as a Service) cloud computing very well; however, multitenant systems can be very complex to implement. The architect must be aware of security, access control, etc.

Multitenancy can exist in several different flavors:

Multitenancy in Deployment

  1. Fully isolated business logic (dedicated server customized business process)
  2. Virtualized Application Servers (dedicated application server, single VM per app server)
  3. Shared virtual servers (dedicated application server on shared VM)
  4. Shared application servers (threads and sessions)

This spectrum of different installations can be seen here:



Multitenancy and Data

  1. Dedicated physical server (DB resides in isolated physical hosts)
  2. Shared virtualized host (separate DBs on virtual machines)
  3. Database on shared host (separate DB on same physical host)
  4. Dedicated schema within shared databases (same DB, dedicated schema/table)
  5. Shared tables (same DB and schema, segregated by keys - rows)
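
To make the last flavor concrete, here is a minimal sketch of shared tables segregated by a tenant key. It uses Python's built-in sqlite3 module purely for illustration; the users table, the tenant_id column, and the users_for_tenant helper are all hypothetical names of my own, not part of any GAE API.

```python
import sqlite3

# Hypothetical schema: one shared table, with tenant_id as the segregating key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (tenant_id, name) VALUES (?, ?)",
    [("acme", "Alice"), ("acme", "Bob"), ("globex", "Carol")],
)


def users_for_tenant(connection, tenant_id):
    """Return the names visible to one tenant, always filtering by tenant_id."""
    rows = connection.execute(
        "SELECT name FROM users WHERE tenant_id = ? ORDER BY name", (tenant_id,)
    )
    return [name for (name,) in rows]


print(users_for_tenant(conn, "acme"))
print(users_for_tenant(conn, "globex"))
```

Every query has to carry the tenant filter, which is exactly the kind of bookkeeping that is easy to forget in a shared-table design.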





Before jumping into the APIs, it is important to understand how Google's internal data storage solution works. Introducing Google's BigTable technology:

It is a storage solution for Google’s own applications such as Search, Google Analytics, Gmail, AppEngine, etc.

BigTable is NOT:
  • A database
  • Horizontally sharded data
  • A distributed hash table


It IS: a sparse, distributed, persistent multidimensional sorted map. In basic terms, it is a hash of hashes (map of maps, or a dict of dicts). AppEngine data is in one "table" distributed across multiple computers. Every entity has a Key by which it is uniquely identified (Parent + Child + ID), but there is also metadata that tells which GAE application (appId) an Entity belongs to.
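
As a rough mental model of the "map of maps" idea, the sketch below nests plain Python dicts: row key, then column, then timestamp. It is a toy of my own and leaves out the sorted, distributed, and persistent parts entirely; the write and read_latest helpers are hypothetical names.

```python
from collections import defaultdict

# Toy model only: row key -> column name -> {timestamp: value}.
# Rows and columns that were never written simply do not exist (sparse).
table = defaultdict(lambda: defaultdict(dict))


def write(row_key, column, timestamp, value):
    """Store one version of a cell."""
    table[row_key][column][timestamp] = value


def read_latest(row_key, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table.get(row_key, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


write("user:1", "name", 1, "Luis")
write("user:1", "name", 2, "Luis A.")
print(read_latest("user:1", "name"))
```

Reading a missing row does not create it, which keeps the map sparse.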




As the graph above shows, BigTable distributes its data in units called tablets, which are basically slices of the data. These tablets live on different servers in the cloud. To index into a specific record (record and entity mean pretty much the same thing) you use a string of up to 64KB, called a Key. This key has information about the specific row and column value you want to read from. It also contains a timestamp to allow for multiple versions of your data to be stored. In addition, records for a specific entity group are located contiguously, which makes scanning for them efficient.
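
To see why contiguous, sorted keys make scanning cheap, consider this small sketch. The key strings and the scan_prefix helper are made up for illustration, and a sorted Python list stands in for a single tablet; it is not how BigTable is actually implemented.

```python
import bisect

# Toy data: keys kept in sorted order, as a stand-in for one tablet.
sorted_keys = sorted([
    "group-a/user-1",
    "group-a/user-2",
    "group-b/user-1",
    "group-b/user-7",
    "group-c/user-3",
])


def scan_prefix(keys, prefix):
    """Return all keys starting with prefix using two binary searches."""
    start = bisect.bisect_left(keys, prefix)
    # "\uffff" sorts after every character used in these keys,
    # so prefix + "\uffff" marks the end of the range.
    end = bisect.bisect_right(keys, prefix + "\uffff")
    return keys[start:end]


print(scan_prefix(sorted_keys, "group-b/"))
```

Because related keys sit next to each other, the scan is one contiguous slice rather than a lookup per record.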

Now we can dive into how Google implements Multitenancy.

Implemented in release 1.3.6 of App Engine, the Namespace API (see resources) is designed to be very customizable, with hooks into your code that you can control, so you can set up multi-tenancy tailored to your application's needs.

The API works with all of the relevant App Engine APIs (Datastore, Memcache, Blobstore, and Task Queues).

In GAE terms,

namespace == tenant

At the storage level of datastore, a namespace is just like an app-id. Each namespace essentially looks to the datastore as another view into the application’s data. Hence, queries cannot span namespaces (at least for now) and key ranges are different per namespace.

Once an entity is created, its namespace does not change, so doing a

namespace_manager.set_namespace(...)

will have no effect on its key.

Similarly, once a query is created, its namespace is set. Same with
memcache_service()
and all other GAE APIs. Hence it's important to know which objects have which namespaces.

In my mind, since all GAE users' data lives in BigTable, it helps to visualize a GAE Key object as:


Application ID | Ancestor Keys | Kind Name | Key Name or ID


All these values provide an address to locate your application's data. Similarly, you can imagine the multitenant key as:

Application ID | Namespace | Ancestor Keys | Kind Name | Key Name or ID
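
One way to picture this address is as a plain tuple. The TenantKey type below is purely illustrative: its field names mirror the layout above and are my own, and it is not the actual datastore Key class.

```python
from collections import namedtuple

# Illustrative only: the field names mirror the layout described above,
# not the actual datastore Key class.
TenantKey = namedtuple(
    "TenantKey", ["app_id", "namespace", "ancestors", "kind", "name_or_id"]
)

acme_user = TenantKey("my-app", "tenant-acme", (), "User", 42)
globex_user = TenantKey("my-app", "tenant-globex", (), "User", 42)

# Same kind and ID, different namespace: two distinct addresses.
print(acme_user == globex_user)
```

Since the namespace is part of the address, two tenants can reuse the same kind and ID without colliding.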


Now let's briefly discuss the API (Python):

Function Name | Arguments | Description
get_namespace | None | Returns the current namespace, or returns an empty string if the namespace is unset.
set_namespace | namespace: A value of None unsets the default namespace value. Otherwise, the value must match ([0-9A-Za-z._-]{0,100}). | Sets the namespace for the current HTTP request.
validate_namespace | value: string containing the namespace being evaluated. exception=BadValueError | Raises the BadValueError exception if the namespace string is not valid, that is, if it does not match ([0-9A-Za-z._-]{0,100}).
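
If you want to check a tenant name before it ever reaches the SDK, the pattern quoted above is easy to apply yourself. The check_namespace helper below is a hypothetical stand-in of mine, not the SDK's validate_namespace; it raises ValueError rather than BadValueError so it can run without App Engine installed.

```python
import re

# Pattern quoted in the API description above. The SDK raises BadValueError;
# this stand-in raises ValueError so it runs without App Engine installed.
_NAMESPACE_RE = re.compile(r"[0-9A-Za-z._-]{0,100}")


def check_namespace(value):
    """Raise ValueError if value is not a string matching the namespace pattern."""
    if not isinstance(value, str) or _NAMESPACE_RE.fullmatch(value) is None:
        raise ValueError("invalid namespace: %r" % (value,))
    return value


print(check_namespace("tenant-42"))
```

Note that the pattern accepts the empty string, which corresponds to the unset (default) namespace.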


Here is a quick example:

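from google.appengine.api import namespace_manager

# getTenant() is an application-specific lookup of the current tenant id
tid = getTenant()

# Save the current namespace so it can be restored afterwards
namespace = namespace_manager.get_namespace()

try:
    namespace_manager.set_namespace('tenant-' + str(tid))

    # Any datastore operations done here use the tenant's namespace.
    # Assumes User is a db.Model with first_name and last_name properties.
    user = User(first_name='Luis', last_name='Atencio')
    user.put()

finally:
    # Restore the saved namespace
    namespace_manager.set_namespace(namespace)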

The important thing to notice here is the pattern that GAE provides. It is exactly the same for the Java APIs. The finally block is immensely important because it restores the namespace to what it was originally (before the request). Omitting the finally block leaves the namespace set for the remainder of the request, which means that any later API access, whether datastore queries or Memcache retrieval, will use the namespace previously set.
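
Since the save, set, and restore pattern is so easy to get wrong, one option is to wrap it in a context manager. This is only a sketch under my own assumptions: InMemoryNamespaceManager is a hypothetical stand-in that lets the example run outside App Engine, and tenant_namespace is a helper name I made up. In a real application you would pass in something exposing the same get_namespace and set_namespace calls.

```python
from contextlib import contextmanager


class InMemoryNamespaceManager(object):
    """Hypothetical stand-in exposing get_namespace/set_namespace."""

    def __init__(self):
        self._namespace = ""

    def get_namespace(self):
        return self._namespace

    def set_namespace(self, namespace):
        # None unsets the namespace, mirroring the API description above.
        self._namespace = namespace or ""


@contextmanager
def tenant_namespace(manager, tenant_id):
    """Switch to the tenant's namespace, restoring the previous one on exit."""
    saved = manager.get_namespace()
    manager.set_namespace("tenant-" + str(tenant_id))
    try:
        yield
    finally:
        # Runs even if the body raises, just like the finally block above.
        manager.set_namespace(saved)


manager = InMemoryNamespaceManager()
with tenant_namespace(manager, 7):
    print(manager.get_namespace())
print(repr(manager.get_namespace()))
```

The restore step lives in one place, so individual request handlers cannot forget it.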

Furthermore, to query for all the namespaces created, GAE provides some meta queries, as such:

from google.appengine.ext.db.metadata import Namespace

def get_tenants(limit, start_ns=None, end_ns=None):
    # Optionally restrict the query to a range of namespace names
    q = Namespace.all()
    if start_ns:
        q.filter('__key__ >=', Namespace.key_for_namespace(start_ns))
    if end_ns:
        q.filter('__key__ <=', Namespace.key_for_namespace(end_ns))

    results = q.fetch(limit)
    # Reduce the namespace objects into a list of namespace names
    return [ns.namespace_name for ns in results]

 

Published at DZone with permission of Luis Atencio, author and DZone MVB. (source)


Comments

Andy Jefferson replied on Tue, 2012/03/27 - 5:43am

FWIW, using the latest code for GAE JDO/JPA plugin you can also make use of multitenancy within an application scope in a way that is likely to be consistent with the JDO/JPA specs upcoming support for multitenancy. This route adds a discriminator property to any persisted Entity, and makes use of the discriminator on query/find. So hence an application can be run with the same datasource but with separate PMF/EMF for different tenants.
