home

Current Status

Alto is in its infancy and should be considered highly experimental. Two active (proprietary) projects will be contributing to Alto as they build out their architectures. Once both projects are in production and stable, we'll release Alto 1.0. Until then, feel free to ask questions or contribute, but don't expect things to "work" or be "supported" yet.

The Basic Idea

Alto is a set of tools for refactoring traditional J2EE web applications to run in a virtual hosting or multi-tenant environment. It is (or will be) designed to deal with situations where an application written to run in a traditional back office environment - meaning one instance, one customer - needs to migrate into more of a Software as a Service model where a single instance of the application must now support any number of individual customers or subscribers. This is necessary to make efficient use of cloud computing resources and to support autoscaling and load balancing in an economical way.

In concrete terms, a one-instance/one-customer J2EE application may need the following set of services to work in an autoscaling virtual hosting environment.

Virtual Host Identification

A virtual host aware system needs a technique for determining which customer or virtual host a given HTTP request (or message on a message queue, or scheduled task) is intended for. One way of doing this is to use the HTTP Host header, but there could be others.
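
As a rough illustration of the Host header approach, the sketch below normalizes a header value into a lookup key and keeps it in a per-thread holder for the duration of a request. The class names HostHeaderResolver and VirtualHostContext are hypothetical and are not part of Alto; in a servlet application the resolver would most likely be called from a filter.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: derives a virtual host key from an HTTP Host header value. */
final class HostHeaderResolver {
    private HostHeaderResolver() {}

    /** Turns "Customer-A.example.com:8080" into "customer-a.example.com". */
    static Optional<String> resolve(String hostHeader) {
        if (hostHeader == null) {
            return Optional.empty();
        }
        String host = hostHeader.trim();
        // Strip an optional ":port" suffix; a bracketed IPv6 literal keeps its colons.
        int colon = host.lastIndexOf(':');
        if (colon > host.lastIndexOf(']')) {
            host = host.substring(0, colon);
        }
        return host.isEmpty()
                ? Optional.empty()
                : Optional.of(host.toLowerCase(Locale.ROOT));
    }
}

/** Hypothetical per-thread holder for the virtual host in scope for the current request. */
final class VirtualHostContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private VirtualHostContext() {}

    static void set(String virtualHost) { CURRENT.set(virtualHost); }

    static Optional<String> get() { return Optional.ofNullable(CURRENT.get()); }

    /** Must be called when the request ends so pooled threads do not leak a stale host. */
    static void clear() { CURRENT.remove(); }
}
```

A thread-local holder only works when a request is handled on a single thread; asynchronous processing would need the value passed along explicitly.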

Virtual Host Meta-Data

Once the virtual host is identified, a virtual host aware system will need a means of discovering any configuration meta-data associated with it. This meta-data might include queue names, JDBC configuration or properties that are otherwise injected through JNDI. It might come through static configuration (XML or JSON) or through a remote database or web service call. This project will define a REST API for discovering virtual host meta-data and provide a client implementation for it - although developers could develop any other means of resolving meta-data.
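
Since the REST API is not defined yet, the sketch below only illustrates the shape such a lookup might take: a small resolver contract plus a static, in-memory implementation standing in for XML or JSON configuration. The names VirtualHostMetadataResolver and StaticMetadataResolver, and the use of flat string key/value pairs, are assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical contract: looks up configuration meta-data for a virtual host. */
interface VirtualHostMetadataResolver {
    Optional<Map<String, String>> resolve(String virtualHost);
}

/** Static implementation backed by an in-memory map, standing in for XML or JSON configuration. */
final class StaticMetadataResolver implements VirtualHostMetadataResolver {
    private final Map<String, Map<String, String>> byHost = new ConcurrentHashMap<>();

    StaticMetadataResolver register(String virtualHost, Map<String, String> metadata) {
        byHost.put(virtualHost, Map.copyOf(metadata)); // immutable defensive copy
        return this;
    }

    @Override
    public Optional<Map<String, String>> resolve(String virtualHost) {
        if (virtualHost == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byHost.get(virtualHost));
    }
}
```

A remote implementation (database or web service) could satisfy the same interface, which is what keeps the rest of the application indifferent to where the meta-data comes from.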

JDBC Datasource Switching

One way of migrating a one-instance/one-customer application to a virtual hosting environment would be adding extra database columns to top level tables that identify the customer or virtual host. That's one way, but it's often not realistic in practice. Another way would be to give the application the ability to switch between different schemas and database servers depending on the virtual host. This project will define a new connection pool system that takes care of ensuring database connections are pointed at the server and schema appropriate to the virtual host of the given HTTP request - and without modifying any code that gets connections from the database.
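
To make the switching idea concrete, here is a minimal sketch that uses a JDK dynamic proxy to hand application code a single DataSource whose calls are forwarded to the pool registered for the current virtual host. It is not Alto's connection pool: RoutingDataSources is a hypothetical name, and the per-host DataSource map and the currentHost supplier are assumed to be provided elsewhere.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;
import javax.sql.DataSource;

/** Hypothetical factory for a DataSource that routes by virtual host. */
final class RoutingDataSources {
    private RoutingDataSources() {}

    /**
     * Returns a DataSource that forwards every call, including getConnection(),
     * to the target registered for the virtual host reported by currentHost.
     */
    static DataSource routing(Map<String, DataSource> targets, Supplier<String> currentHost) {
        return (DataSource) Proxy.newProxyInstance(
            DataSource.class.getClassLoader(),
            new Class<?>[] { DataSource.class },
            (proxy, method, args) -> {
                String host = currentHost.get();
                DataSource target = host == null ? null : targets.get(host);
                if (target == null) {
                    throw new IllegalStateException("No DataSource for virtual host: " + host);
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // surface the target's own exception, e.g. SQLException
                }
            });
    }
}
```

Code that calls getConnection() on the returned object does not change, which matches the goal of leaving connection-acquiring code untouched. Applications already on Spring may prefer its AbstractRoutingDataSource, which covers the same ground.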

Cache Segmentation

In systems that use caching, moving to a virtual host environment runs the risk of key collisions or cache hits returning data for the wrong virtual host or customer instance. Since customer A seeing customer B's information is "bad", any virtual host aware system will need a means of segmenting the cache by virtual host. In practice, most J2EE applications that may be migrated to a cloud environment were never designed to run in a distributed multi-node environment and will also need to ensure cache consistency between nodes, particularly ensuring that cache writes and cache invalidations are quickly visible to all nodes. The most common cache in use today in J2EE applications is Ehcache (originally the Easy Hibernate Cache). This library will provide or use third party solutions for segmenting Ehcache by virtual host and adapting it to use AWS ElastiCache or memcached. (Tools for adapting Ehcache to Terracotta already exist.)
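
One simple way to segment a cache is to namespace every key with the virtual host before it reaches the underlying store. The sketch below shows that technique over a plain in-memory map; SegmentedCache is a hypothetical name, and a real deployment would put Ehcache or memcached behind the same key scheme. It does not address cross-node consistency.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical wrapper that namespaces every cache key with the current virtual host. */
final class SegmentedCache<V> {
    private final Map<String, V> backing = new ConcurrentHashMap<>();
    private final Supplier<String> currentHost;

    SegmentedCache(Supplier<String> currentHost) {
        this.currentHost = currentHost;
    }

    // Length-prefixing the host keeps (host "a", key "b:c") distinct from (host "a:b", key "c").
    private String segment(String key) {
        String host = currentHost.get();
        if (host == null) {
            throw new IllegalStateException("No virtual host in scope");
        }
        return host.length() + ":" + host + ":" + key;
    }

    V get(String key) { return backing.get(segment(key)); }

    void put(String key, V value) { backing.put(segment(key), value); }

    void remove(String key) { backing.remove(segment(key)); }
}
```

Failing loudly when no virtual host is in scope is a deliberate choice here: silently falling back to a shared segment is exactly how one customer's data ends up in front of another.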

Cache Abstraction

The realities of scaling usually include a lot of trial and error with cache implementations. Flex Alto provides a simple abstraction called AltoCache designed to sit on top of various cache implementations without the complexity of JSR-107. We also use this abstraction for a Spring Factory bean, making it easy to use property injection to select from different cache implementations and configurations. For example, in a local traditional back office environment, you may wish to use Ehcache. In a big cloud/clustered environment you may wish to use memcached.

Distributed Locking

Mutex locks in a clustered environment require either a centralized lock server or an atomic distributed locking system. Flex Alto provides an abstraction that can be used with any locking system, along with several implementations, including a proprietary telnet-based protocol called lockd.

Lockd is an atomic mirrored locking mechanism that supports atomic or non-atomic operation. Lockd can be run as a standalone service or embedded in your application as a Spring bean. You can opt to use a lockd client with an external lockd cluster. We also plan to provide a transaction manager based on our locking framework.

Sessionless Sessions

Maintaining user sessions in a multi-server environment is usually handled via replication. As the size of the cluster grows, the replication overhead grows along with the memory footprint required to support it. An alternate architecture eschews the use of sessions and session variables altogether, opting instead for persistent or cached user and authorization information in a common store where all servers can access it with a unique session token.

The security package includes a set of filters and repositories designed for use with Spring Security. They can be used to support login persistence in a large cluster (in a multi-tenant environment), session fixation protection and session concurrency rules - all without the use of traditional Java HTTP sessions.
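
The token-based architecture can be sketched without any servlet or Spring Security types. In the example below, TokenSessionStore is a hypothetical class, and its in-memory map stands in for the common store (cache or database) that every server can reach; session fixation and concurrency rules are left out.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical token store; the map stands in for a store shared by all servers. */
final class TokenSessionStore {
    /** What is persisted per login instead of an HTTP session. */
    static final class Login {
        final String virtualHost;
        final String username;
        final Instant expiresAt;

        Login(String virtualHost, String username, Instant expiresAt) {
            this.virtualHost = virtualHost;
            this.username = username;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Login> store = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Records a login and returns an unguessable token to hand back to the client. */
    String login(String virtualHost, String username, Instant expiresAt) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        store.put(token, new Login(virtualHost, username, expiresAt));
        return token;
    }

    /** Accepts a token only if it exists, has not expired, and belongs to this virtual host. */
    Optional<Login> authenticate(String token, String virtualHost, Instant now) {
        Login login = token == null ? null : store.get(token);
        if (login == null) {
            return Optional.empty();
        }
        if (!now.isBefore(login.expiresAt)) {
            store.remove(token); // expired tokens are dropped on first use
            return Optional.empty();
        }
        return login.virtualHost.equals(virtualHost) ? Optional.of(login) : Optional.empty();
    }

    void logout(String token) { store.remove(token); }
}
```

Because any server can validate a token against the shared store, no per-node session state needs to be replicated.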

Property Injection

Traditional J2EE systems may have any number of properties injected via JNDI, such as email server configuration, etc. A virtual host aware system will need a means of ensuring that these properties, when accessed, are correct for the virtual host in scope for the transaction. This library will assume that properties are set via dependency injection (Spring, Guice, etc.) and will provide tools for creating AOP proxies around objects that use properties and intercept method calls to inject the correct value for the transaction's virtual host.
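
The interception idea can be shown with a plain JDK dynamic proxy rather than a full AOP framework. In this simplified sketch the proxy stands in for the property holder itself: each String getter is answered from the property set of the virtual host currently in scope. MailSettings and VirtualHostProperties are hypothetical names, and the mapping from getSmtpHost() to a property called smtpHost is an assumption of the example.

```java
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical settings interface that application beans would depend on. */
interface MailSettings {
    String getSmtpHost();
}

/** Hypothetical factory for proxies whose getters resolve per virtual host. */
final class VirtualHostProperties {
    private VirtualHostProperties() {}

    static <T> T proxy(Class<T> type,
                       Map<String, Map<String, String>> propertiesByHost,
                       Supplier<String> currentHost) {
        Object instance = Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] { type },
            (self, method, args) -> {
                String name = method.getName();
                if (method.getDeclaringClass() == Object.class) {
                    // equals, hashCode and toString are answered by the proxy itself.
                    switch (name) {
                        case "equals": return self == args[0];
                        case "hashCode": return System.identityHashCode(self);
                        default: return type.getSimpleName() + " (per-virtual-host proxy)";
                    }
                }
                if (!name.startsWith("get") || name.length() < 4
                        || method.getParameterCount() != 0
                        || method.getReturnType() != String.class) {
                    throw new UnsupportedOperationException("Not a String getter: " + name);
                }
                // getSmtpHost -> "smtpHost"
                String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                String host = currentHost.get();
                Map<String, String> properties = host == null ? null : propertiesByHost.get(host);
                if (properties == null) {
                    throw new IllegalStateException("No properties for virtual host: " + host);
                }
                return properties.get(key);
            });
        return type.cast(instance);
    }
}
```

The proxy can be injected once as a singleton; the value it returns changes with the virtual host of each call, so the consuming bean needs no changes.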

Message Queues

Many J2EE systems rely on asynchronous processing coordinated via message queues. Any virtual host system will need tools for ensuring that any given message is processed in the correct virtual host context. This can be done by using AOP proxies to intercept message enqueuing and dequeuing: the enqueue operation wraps the message with virtual host information, and the dequeue operation unwraps the message and sets the virtual host context. Another technique would be to use different queues for different virtual hosts, based on some kind of namespace approach for queue names. In situations where the application has the ability to dynamically define queues (or the out of band process for creating new virtual hosts has that ability), a different kind of AOP proxy can set the virtual host context based on queue names.
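
The wrap-on-enqueue, unwrap-on-dequeue technique can be sketched independently of JMS. Below, plain Consumer functions stand in for the queue producer and the message handler, and a thread-local stands in for the virtual host context; TenantEnvelope and TenantMessaging are hypothetical names.

```java
import java.util.function.Consumer;

/** Hypothetical envelope carrying the virtual host alongside the original payload. */
final class TenantEnvelope<T> {
    final String virtualHost;
    final T payload;

    TenantEnvelope(String virtualHost, T payload) {
        this.virtualHost = virtualHost;
        this.payload = payload;
    }
}

/** Hypothetical helpers that decorate enqueue and dequeue operations. */
final class TenantMessaging {
    /** Stand-in for the application's virtual host context. */
    static final ThreadLocal<String> CURRENT_HOST = new ThreadLocal<>();

    private TenantMessaging() {}

    /** Decorates an enqueue operation so each payload is stamped with the sender's virtual host. */
    static <T> Consumer<T> wrapping(Consumer<TenantEnvelope<T>> queue) {
        return payload -> {
            String host = CURRENT_HOST.get();
            if (host == null) {
                throw new IllegalStateException("No virtual host in scope for enqueue");
            }
            queue.accept(new TenantEnvelope<>(host, payload));
        };
    }

    /** Decorates a handler so the virtual host is set before processing and restored afterwards. */
    static <T> Consumer<TenantEnvelope<T>> unwrapping(Consumer<T> handler) {
        return envelope -> {
            String previous = CURRENT_HOST.get();
            CURRENT_HOST.set(envelope.virtualHost);
            try {
                handler.accept(envelope.payload);
            } finally {
                if (previous == null) {
                    CURRENT_HOST.remove();
                } else {
                    CURRENT_HOST.set(previous);
                }
            }
        };
    }
}
```

The handler itself never sees the envelope, so existing message-processing code can stay as it is while the decorators carry the context across the queue.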

A related issue is adapting JMS queues to the types of message queue environments provided by cloud computing providers, such as Amazon's SQS. This library will include a JMS adapter for SQS.

Database Updates

Many applications or libraries in widespread use include tools for automatically updating the database schema during software version updates. This includes Hibernate's hbm2ddl tool and any number of internal systems. This library will include a mechanism for running database updaters against all defined virtual hosts on application startup, including hbm2ddl specifically. The method will be based on message queues.

This project will also provide techniques for discovering and setting virtual host up/down state information. For example, it may be desirable to show a given virtual host as unavailable until its database update has been completed. This project will provide the techniques for setting and reading the state. Individual applications can then make use of that information to redirect users to the appropriate messaging.

Task Specific Routing

This library will also include a technique for routing requests fitting a configurable set of characteristics to special servers for processing, providing a more precise form of scaling an application than merely adding more nodes to a load balanced cluster. We call this a Pattern Proxy; it is a simple way of redirecting resource-intensive tasks to another server in order to maintain stable response times for the core production cluster. For example, you might define a pattern proxy with a regular expression matching the application's URL for generating reports and proxy that request to a server or cluster designated for report processing. The pattern proxy architecture should also support redirecting (rather than proxying) requests matching the pattern. Like virtual host resolution, the process for discovering configured proxies should be abstract enough to support static configuration via XML or JSON or an external service or database mechanism.
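
The matching half of a pattern proxy can be illustrated with nothing more than java.util.regex. The sketch below keeps an ordered list of regular expressions and returns the upstream for the first one that matches a request path; PatternProxyTable and the upstream URL in the usage note are hypothetical, and the actual proxying or redirecting of the request is out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical routing table mapping request path patterns to upstream servers. */
final class PatternProxyTable {
    private final List<Map.Entry<Pattern, String>> routes = new ArrayList<>();

    /** Registers a route; earlier registrations take precedence over later ones. */
    PatternProxyTable route(String regex, String upstream) {
        routes.add(Map.entry(Pattern.compile(regex), upstream));
        return this;
    }

    /** Returns the upstream of the first pattern that matches the whole request path. */
    Optional<String> upstreamFor(String requestPath) {
        for (Map.Entry<Pattern, String> route : routes) {
            if (route.getKey().matcher(requestPath).matches()) {
                return Optional.of(route.getValue());
            }
        }
        return Optional.empty(); // no match: handle on the core cluster as usual
    }
}
```

For instance, registering "/reports/.*" against a reporting cluster address would send /reports/monthly there while leaving every other path on the core cluster.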

Scheduled Tasks

This library will include specialized tools for using the Quartz Task Scheduler within a virtual host aware context in such a way that any task fired via Quartz knows its virtual host context prior to execution.

NoSQL Databases

This library includes a simple ORM framework for working with NoSQL databases. This is built around the hash and range key approach to NoSQL databases developed by Google for BigTable and implemented in Cassandra and Amazon's DynamoDB. We also provide an implementation for Amazon SimpleDB under the assumption that SimpleDB might be an intermediate (and cheaper) step on the way to DynamoDB.
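
For readers unfamiliar with the hash and range key model, the sketch below mimics it in memory: the hash key selects a partition, and items inside a partition are kept sorted by range key so that range queries are cheap. HashRangeTable is a hypothetical teaching aid, not Alto's ORM and not a client for any real database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for a table addressed by a hash key plus a sorted range key. */
final class HashRangeTable<V> {
    private final Map<String, NavigableMap<String, V>> partitions = new ConcurrentHashMap<>();

    void put(String hashKey, String rangeKey, V item) {
        partitions
            .computeIfAbsent(hashKey, k -> new ConcurrentSkipListMap<String, V>())
            .put(rangeKey, item);
    }

    /** Point lookup: both keys must be supplied. Returns null when absent. */
    V get(String hashKey, String rangeKey) {
        NavigableMap<String, V> partition = partitions.get(hashKey);
        return partition == null ? null : partition.get(rangeKey);
    }

    /** Items in one partition with from <= rangeKey < to, in range-key order. */
    List<V> query(String hashKey, String from, String to) {
        NavigableMap<String, V> partition = partitions.get(hashKey);
        if (partition == null) {
            return List.of();
        }
        return new ArrayList<>(partition.subMap(from, true, to, false).values());
    }
}
```

Queries never cross partitions in this model, which is why the choice of hash key (a customer identifier, say) tends to drive the whole table design.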

NoSQL-Relational Write-Through

Moving relational tables to NoSQL databases can have the ripple effect of breaking reports that depend on relational databases. In order to support legacy reports or business intelligence tools, we'll provide a message queue based technique for writing NoSQL changes into relational database tables. The assumption is that NoSQL will be the primary means of retrieving and manipulating the data, with the relational tables used only for reporting.
