Thursday, November 04, 2010

Toplink Cache and Weblogic Cluster

JPA cache

JPA (Java Persistence API) implementations use a level 2 (L2) cache, a shared cache that sits behind the per-session cache (the unit of work cache, or persistence context). This cache is mainly consulted by EntityManager's find operation and by queries for entities by primary key. It is also used to initialize the members of an entity's collection when that collection is loaded.

JPA cache in an application server cluster such as Weblogic

When a JPA application is deployed on a single application server node and nothing else accesses the database, the cache is really a boon. However, as soon as there are external writes to the database, cache invalidation becomes a problem. The external writes could come from another application writing to the same database, or from the application itself when it is deployed in an application server cluster.

For example, consider an application that manages a car distributorship. It queries for the cars in the inventory that are on sale and updates the inventory when a car is sold. When such a query runs, the Car entities may be loaded into the L2 cache of the JPA implementation. If a purchase is then processed on one node of the cluster and the cached Car on the other node is neither refreshed nor invalidated, that other node is working with stale data.
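To make the stale-data problem concrete, here is a toy model in plain Java, with no Toplink involved. HashMaps stand in for the shared database and for each node's private L2 cache; the Node class and its find/update methods are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (not Toplink code): two cluster nodes share one "database",
// but each keeps its own private L2 cache with no coordination between them.
class StaleCacheDemo {

    static class Node {
        private final Map<Long, String> database;
        private final Map<Long, String> l2Cache = new HashMap<>();

        Node(Map<Long, String> database) { this.database = database; }

        // Like EntityManager.find(): use the L2 cache if possible, else load and cache.
        String find(long id) {
            return l2Cache.computeIfAbsent(id, database::get);
        }

        // Commit an update: the database and only THIS node's cache see it.
        void update(long id, String status) {
            database.put(id, status);
            l2Cache.put(id, status);
        }
    }

    // Returns what node A and node B see for car 1 after a sale on node A.
    static String[] sellOnNodeA() {
        Map<Long, String> database = new HashMap<>();
        database.put(1L, "IN_INVENTORY");
        Node nodeA = new Node(database);
        Node nodeB = new Node(database);
        nodeA.find(1L);                 // both nodes now cache car 1
        nodeB.find(1L);
        nodeA.update(1L, "SOLD");       // purchase processed on node A
        return new String[] { nodeA.find(1L), nodeB.find(1L) };
    }

    public static void main(String[] args) {
        String[] seen = sellOnNodeA();
        System.out.println("node A: " + seen[0]);   // SOLD
        System.out.println("node B: " + seen[1]);   // IN_INVENTORY (stale)
    }
}
```

Node B keeps returning IN_INVENTORY for the sold car until its cached copy is refreshed or invalidated, which is what the strategies below address.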

To handle such situations, some cache synchronization technique is needed. In this entry, I document three strategies that can be used with Oracle Toplink in a Weblogic cluster environment:

1. Disable the cache

2. Use Toplink Cache Coordination

3. Use Toplink Grid (Oracle Coherence integration)

Disabling the Toplink L2 cache

The L2 cache can be disabled per entity or as a whole. To disable the cache for all entities, add the following property to persistence.xml:

<property name="eclipselink.cache.shared.default" value="false"/>

To disable the cache per entity, use the following entry in eclipselink-orm.xml:

<cache shared="false" />

For example:

<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="2.1"
    xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/orm"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <entity class="mypackage.MyEntity">
        <cache shared="false" />
    </entity>
</entity-mappings>

However, note the bug in the current implementation, https://bugs.eclipse.org/bugs/show_bug.cgi?format=multiple&id=304868; the workaround suggested in the bug report needs to be applied.

Use Toplink Cache Coordination

Cache coordination is an EclipseLink (Toplink) mechanism that allows the JPA caches on the individual nodes to communicate and synchronize changes. The communication can use the following transports –

1. JMS

2. RMI

3. CORBA

JMS also allows for asynchronous coordination.

The following strategies can be employed to synchronize the changes in the cache –

1. SEND_OBJECT_CHANGES – This is the default. It sends update events only for changes to the attributes of an entity; newly created objects (for example, a new member added to a collection) are not propagated.

2. INVALIDATE_CHANGED_OBJECTS – This option invalidates the entity on the peer cache whenever it changes.

3. SEND_NEW_OBJECTS_WITH_CHANGES – This option extends the first one by also sending newly created entities, so additions of a member to a collection are refreshed as well.

4. NONE – No updates are sent.
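The effect of each coordination type on a peer node can be sketched with a toy model. The enum constants mirror the EclipseLink setting names, but the receive method is a hypothetical simplification, not EclipseLink's implementation: for example, this sketch simply removes an invalidated entry, whereas EclipseLink marks the object as invalid; either way the next read goes to the database.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of how a peer node's L2 cache reacts to a remote commit under each
// coordination type. Illustration only, not EclipseLink code.
class CoordinationDemo {
    enum CoordinationType {
        SEND_OBJECT_CHANGES, INVALIDATE_CHANGED_OBJECTS,
        SEND_NEW_OBJECTS_WITH_CHANGES, NONE
    }

    // Apply a remote commit (changed objects + newly created objects) to the peer cache.
    static void receive(Map<Long, String> peerCache, CoordinationType type,
                        Map<Long, String> changed, Map<Long, String> created) {
        switch (type) {
            case SEND_OBJECT_CHANGES:
                // merge changes into objects the peer already caches; new objects are not sent
                changed.forEach((id, v) -> peerCache.computeIfPresent(id, (k, old) -> v));
                break;
            case INVALIDATE_CHANGED_OBJECTS:
                // drop changed objects so the next find() reloads them from the database
                changed.keySet().forEach(peerCache::remove);
                break;
            case SEND_NEW_OBJECTS_WITH_CHANGES:
                changed.forEach((id, v) -> peerCache.computeIfPresent(id, (k, old) -> v));
                peerCache.putAll(created);      // new objects travel with the changes
                break;
            case NONE:
                break;                          // nothing is sent
        }
    }

    // Peer cache after a commit that sold car 1 and created car 2 on another node.
    static Map<Long, String> peerCacheAfter(CoordinationType type) {
        Map<Long, String> peer = new HashMap<>();
        peer.put(1L, "IN_INVENTORY");
        receive(peer, type, Map.of(1L, "SOLD"), Map.of(2L, "NEW_ARRIVAL"));
        return peer;
    }

    public static void main(String[] args) {
        for (CoordinationType t : CoordinationType.values()) {
            System.out.println(t + " -> " + peerCacheAfter(t));
        }
    }
}
```

In this model only SEND_NEW_OBJECTS_WITH_CHANGES ends up with the newly created car 2, and only INVALIDATE_CHANGED_OBJECTS leaves the peer cache empty.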

To set up cache coordination, two things need to be configured –

1. Set up the coordination transport

2. Set up the coordination type for the entities

To set up the cache coordination transport, edit the persistence.xml and add the following properties –

<property name="eclipselink.cache.coordination.protocol" value="rmi"/>
<property name="eclipselink.cache.coordination.rmi.multicast-group" value="231.1.1.1"/>
<property name="eclipselink.cache.coordination.rmi.multicast-group.port" value="9872"/>
<property name="eclipselink.cache.coordination.jndi.user" value="weblogic"/>
<property name="eclipselink.cache.coordination.jndi.password" value="Welcome1"/>
<property name="eclipselink.cache.coordination.propagate-asynchronously" value="false"/>
<property name="eclipselink.cache.coordination.naming-service" value="jndi"/>
<property name="eclipselink.cache.coordination.rmi.url" value="t3://localhost:7004"/>
<property name="eclipselink.cache.coordination.packet-time-to-live" value="4"/>

This sets up the configuration for RMI. Please note that the RMI URL can point to any of the Weblogic managed servers since the JNDI tree is replicated in a cluster. Alternatively, the “port” can also be left out.
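The same transport settings can also be supplied programmatically. The sketch below collects a subset of the properties above into a map; such a map can be passed as the second argument of javax.persistence.Persistence.createEntityManagerFactory(String, Map), which overrides the values in persistence.xml. That call is left as a comment so the sketch compiles with the JDK alone, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: RMI cache coordination settings as a property map (values are the
// example values from this post). Hypothetical helper, not an EclipseLink API.
class CoordinationProperties {
    static Map<String, String> rmiCoordination(String rmiUrl) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("eclipselink.cache.coordination.protocol", "rmi");
        p.put("eclipselink.cache.coordination.rmi.multicast-group", "231.1.1.1");
        p.put("eclipselink.cache.coordination.rmi.multicast-group.port", "9872");
        p.put("eclipselink.cache.coordination.naming-service", "jndi");
        p.put("eclipselink.cache.coordination.jndi.user", "weblogic");
        p.put("eclipselink.cache.coordination.jndi.password", "Welcome1");
        // Any managed server's t3:// URL works, since the JNDI tree is replicated.
        p.put("eclipselink.cache.coordination.rmi.url", rmiUrl);
        p.put("eclipselink.cache.coordination.propagate-asynchronously", "false");
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> props = rmiCoordination("t3://localhost:7004");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
        // With JPA on the classpath ("MyPU" is a placeholder unit name):
        // EntityManagerFactory emf = Persistence.createEntityManagerFactory("MyPU", props);
    }
}
```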

To set up the cache coordination type, edit the eclipselink-orm.xml and add the following –

<cache coordination-type="INVALIDATE_CHANGED_OBJECTS" />

For example –

<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="2.1"
    xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/orm"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <entity class="mypackage.MyEntity">
        <cache coordination-type="INVALIDATE_CHANGED_OBJECTS" />
    </entity>
</entity-mappings>

However, note the bug in the current implementation, https://bugs.eclipse.org/bugs/show_bug.cgi?id=312909; the workaround suggested in the bug report needs to be applied.

Use Toplink Grid

Toplink Grid is the integration of Toplink with Oracle Coherence. Toplink Grid is part of Active Cache which also includes Coherence Web.

Toplink Grid provides three strategies to integrate a JPA (Toplink) application with Coherence –

1. Grid Cache

2. Grid Read

3. Grid Write

Grid Cache is the simplest and least intrusive option for a vanilla JPA application. It ties the Toplink L2 cache to Coherence, so that every read from the JPA cache results in a get from Coherence and, similarly, every write to the JPA cache results in a put to Coherence.

Grid Read and Grid Write require code changes and allow Toplink to read through or write through Coherence. In exchange, they realize the full benefit of the data grid.
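The Grid Cache idea, where the L2 cache of every node delegates its gets and puts to one shared store, can be sketched as below. A ConcurrentHashMap stands in for the Coherence cache (an assumption made so the sketch runs without Coherence), and the class names are hypothetical; this is not the Toplink Grid API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of Grid Cache: every node's L2 cache reads and writes one shared store.
class GridCacheDemo {
    static class Node {
        private final Map<Long, String> grid;       // shared by all nodes (stand-in for Coherence)
        private final Map<Long, String> database;

        Node(Map<Long, String> grid, Map<Long, String> database) {
            this.grid = grid;
            this.database = database;
        }

        // read from the JPA cache -> get from the grid; on a miss, load from the database and put
        String find(long id) {
            return grid.computeIfAbsent(id, database::get);
        }

        // write to the JPA cache -> put to the grid (after the database commit)
        void update(long id, String status) {
            database.put(id, status);
            grid.put(id, status);
        }
    }

    // Returns what node A and node B see for car 1 after a sale on node A.
    static String[] sellOnNodeA() {
        Map<Long, String> grid = new ConcurrentHashMap<>();
        Map<Long, String> database = new HashMap<>();
        database.put(1L, "IN_INVENTORY");
        Node nodeA = new Node(grid, database);
        Node nodeB = new Node(grid, database);
        nodeB.find(1L);                 // node B loads car 1 into the shared cache
        nodeA.update(1L, "SOLD");       // purchase processed on node A
        return new String[] { nodeA.find(1L), nodeB.find(1L) };
    }

    public static void main(String[] args) {
        String[] seen = sellOnNodeA();
        System.out.println("node A: " + seen[0] + ", node B: " + seen[1]);
    }
}
```

Because both nodes go through the same shared map, node B sees SOLD as soon as node A commits the sale.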

In this entry, the configuration for Grid Cache is described. The following steps need to be performed –

1. Create Coherence Cache configuration and refer to this from the JPA application

2. Configure Coherence Cluster and refer to this from the JPA application

3. Set up related shared libraries in Weblogic and refer to these libraries from the JPA application

4. Configure JPA entities to use Grid Cache

Coherence Cache configuration

Create a coherence-cache-config.xml file in a known location, say D:\, and add the cache configuration to it.

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
    <caching-scheme-mapping>
        <cache-mapping>
            <cache-name>*</cache-name>
            <scheme-name>eclipselink-distributed</scheme-name>
        </cache-mapping>
    </caching-scheme-mapping>
    <caching-schemes>
        <distributed-scheme>
            <scheme-name>eclipselink-distributed</scheme-name>
            <service-name>EclipseLinkJPA</service-name>
            <serializer>
                <class-name>oracle.eclipselink.coherence.integrated.cache.WrapperSerializer</class-name>
            </serializer>
            <backing-map-scheme>
                <local-scheme>
                    <high-units>10000</high-units>
                    <eviction-policy>LFU</eviction-policy>
                </local-scheme>
            </backing-map-scheme>
            <autostart>true</autostart>
        </distributed-scheme>
    </caching-schemes>
</cache-config>

Then package this file in a JAR file, add the JAR as a shared library (targeted to all relevant servers) in the Weblogic console, and refer to this shared library from MyApp.ear\META-INF\weblogic-application.xml as follows –

<library-ref>
    <library-name>coherence-cache-config</library-name>
</library-ref>

Coherence cluster configuration

In the Weblogic console, find “Coherence Clusters” under Services and create a new Coherence Cluster. Specify the following –

Name: CoherenceCluster

Unicast Listen Address: localhost

Unicast Listen Port: Unique port number

Unicast Port Auto Adjust: true

Multicast Listen Address: 231.1.1.1

Multicast Listen Port: Unique port number

Refer to the above Coherence Cluster in MyApp.ear\META-INF\weblogic-application.xml as follows –

<coherence-cluster-ref>
    <coherence-cluster-name>CoherenceCluster</coherence-cluster-name>
</coherence-cluster-ref>
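Two details of these settings are easy to get wrong, so here is a small sanity check in Java. It verifies that the multicast listen address is in the IPv4 multicast range (224.0.0.0 to 239.255.255.255), and compares the Weblogic multicast address and port with the values passed to the standalone cache server through -Dtangosol.coherence.clusteraddress and -Dtangosol.coherence.clusterport, which (among other settings) need to agree for both to join one cluster. The helper names are hypothetical, and 7777 is only an assumed choice for the "unique port number".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sanity-check sketch, not a Weblogic or Coherence API.
class ClusterSettingsCheck {
    static boolean isMulticast(String address) {
        try {
            // a literal IP address is parsed without a DNS lookup
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Weblogic "Coherence Cluster" multicast settings vs. the cache server's -D values.
    static boolean sameCluster(String wlsAddress, int wlsPort, String serverAddress, int serverPort) {
        return wlsAddress.equals(serverAddress) && wlsPort == serverPort;
    }

    public static void main(String[] args) {
        System.out.println(isMulticast("231.1.1.1"));                          // true
        System.out.println(sameCluster("231.1.1.1", 7777, "231.1.1.1", 7777));  // true
    }
}
```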

Related Library configurations

Create shared libraries (target to all relevant servers) for the following in Weblogic console –

1. D:\Oracle\Middleware11.1.1.3\wlserver_10.3\common\deployable-libraries\active-cache-1.0.jar

2. D:\Oracle\Middleware11.1.1.3\wlserver_10.3\common\deployable-libraries\toplink-grid-1.0.jar

3. D:\Oracle\Middleware11.1.1.3\coherence_3.5\lib\coherence.jar

Refer to the above shared libraries from MyApp.ear\META-INF\weblogic-application.xml by adding the following elements –

<library-ref>
    <library-name>active-cache</library-name>
</library-ref>
<library-ref>
    <library-name>toplink-grid</library-name>
</library-ref>
<library-ref>
    <library-name>coherence</library-name>
</library-ref>

Note that the library-ref for the cache configuration (coherence-cache-config) must appear above the library-ref for coherence.

Configure JPA entities to use Grid Cache

To set up Grid Cache, edit eclipselink-orm.xml and register the GridCacheCustomizer for the entity.

For example –

<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings version="2.1"
    xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/orm"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <entity class="mypackage.MyEntity">
        <customizer class="oracle.eclipselink.coherence.integrated.config.GridCacheCustomizer"/>
    </entity>
</entity-mappings>

Running Coherence

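To start a Coherence cache server, run the following command (on a single line). The cluster address and port should match the multicast settings of the Coherence Cluster configured in Weblogic.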

D:\>java -server -Xms512m -Xmx512m -javaagent:D:\Oracle\Middleware11.1.1.3\modules\org.eclipse.persistence_1.0.0.0_2-0.jar -cp D:\Oracle\Middleware11.1.1.3\coherence_3.5\lib\coherence.jar;D:\Oracle\Middleware11.1.1.3\modules\javax.persistence_1.0.0.0_1-0-2.jar;D:\Oracle\Middleware11.1.1.3\modules\com.oracle.toplink_1.0.0.0_11-1-1-3-0.jar;D:\Oracle\Middleware11.1.1.3\modules\com.oracle.toplinkgrid_1.0.0.0_11-1-1-3-0.jar;MyApp.jar -Dtangosol.coherence.cacheconfig=d:\coherence-cache-config.xml -Dtangosol.coherence.management.remote=true -Dtangosol.coherence.distributed.localstorage=true -Dtangosol.coherence.clusterport=7777 -Dtangosol.coherence.clusteraddress=231.1.1.1 com.tangosol.net.DefaultCacheServer
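For readability, here is the same command assembled as an argument list in Java, using File.pathSeparator so the classpath separator also works outside Windows. The jar names and system properties are the ones from the command above; the build method and its parameters are hypothetical.

```java
import java.io.File;
import java.util.List;

// Sketch: the cache-server command as an argument list. mwHome, appJar and
// cacheConfig are parameters to adapt to your installation.
class CacheServerCommand {
    static List<String> build(String mwHome, String appJar, String cacheConfig) {
        String sep = File.separator;
        String modules = mwHome + sep + "modules" + sep;
        // ";" on Windows, ":" elsewhere
        String classpath = String.join(File.pathSeparator,
                mwHome + sep + "coherence_3.5" + sep + "lib" + sep + "coherence.jar",
                modules + "javax.persistence_1.0.0.0_1-0-2.jar",
                modules + "com.oracle.toplink_1.0.0.0_11-1-1-3-0.jar",
                modules + "com.oracle.toplinkgrid_1.0.0.0_11-1-1-3-0.jar",
                appJar);
        return List.of("java", "-server", "-Xms512m", "-Xmx512m",
                "-javaagent:" + modules + "org.eclipse.persistence_1.0.0.0_2-0.jar",
                "-cp", classpath,
                "-Dtangosol.coherence.cacheconfig=" + cacheConfig,
                "-Dtangosol.coherence.management.remote=true",
                "-Dtangosol.coherence.distributed.localstorage=true",
                "-Dtangosol.coherence.clusterport=7777",
                "-Dtangosol.coherence.clusteraddress=231.1.1.1",
                "com.tangosol.net.DefaultCacheServer");
    }

    public static void main(String[] args) {
        List<String> cmd = build("D:\\Oracle\\Middleware11.1.1.3", "MyApp.jar",
                "d:\\coherence-cache-config.xml");
        System.out.println(String.join(" ", cmd));
        // To actually launch the server: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```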



Wednesday, November 03, 2010

Java Garbage collection

Garbage collection

The garbage collection strategy and configuration chosen for a Java application have a significant impact on its behavior, particularly for server-side enterprise applications. Two aspects drive the choice –

1. Memory usage pattern of the application

2. Type of application

The garbage collection configuration needed for an application that creates a lot of short-lived objects differs from that of an application that creates more long-lived objects.

Similarly, the type of application determines which GC to use. Real time or near real time applications cannot tolerate long application pauses caused by GC processing.

Garbage collection strategies

There are two parts to garbage collection –

1. Process of identifying stale objects and marking them

2. Process of garbage collection itself

Most modern collectors use either reference counting or object traversal (tracing) to identify stale objects. Object traversal is more popular: the collector starts from a set of well-known root objects (such as references on thread stacks and in static fields), traverses the object graph, and treats any object that cannot be reached from the roots as garbage.
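The object traversal idea can be sketched with a toy heap in which each object id maps to the ids it references (a simplified model, not how a JVM represents objects). The sketch marks everything reachable from the roots and reports the rest as garbage. The sample heap includes a cycle (4 and 5 reference each other) to show that tracing reclaims cyclic garbage, which plain reference counting cannot do because the counts never drop to zero. Class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy tracing collector: mark everything reachable from the roots.
class TracingMark {
    static Set<Integer> reachable(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (marked.add(obj)) {                              // first visit: follow its references
                work.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    // Everything not marked is garbage.
    static Set<Integer> garbage(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> dead = new HashSet<>(heap.keySet());
        dead.removeAll(reachable(heap, roots));
        return dead;
    }

    // 1 -> 2 -> 3 hangs off the root; 4 <-> 5 is a cycle nothing else references.
    static Map<Integer, List<Integer>> sampleHeap() {
        return Map.of(1, List.of(2), 2, List.of(3), 3, List.of(),
                      4, List.of(5), 5, List.of(4));
    }

    public static void main(String[] args) {
        System.out.println("garbage: " + garbage(sampleHeap(), List.of(1)));
    }
}
```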

Following are some of the garbage collection algorithms –

1. Mark and Sweep - In this strategy, the GC sweeps through the heap and frees the memory occupied by the objects identified as garbage. It leaves the memory fragmented and is not well suited to workloads that create lots of new objects.

2. Mark, Sweep and Compact - In this strategy, the GC not only frees the memory of garbage objects but also consolidates the heap space, so that contiguous blocks of free memory are available. This avoids fragmentation, but it is still expensive when large numbers of new objects are created.

3. Incremental - This strategy (the train algorithm) breaks the heap into small blocks called cars, grouped into trains, and collects one car at a time, so that each GC pause stays short.

4. Copy - In this strategy, the heap space is divided into two semi-spaces, the from-semi-space and the to-semi-space. All new memory allocation is performed in the from-semi-space. When a threshold is reached, the garbage collector copies all live objects to the to-semi-space, which then becomes the new from-semi-space. Stale objects are simply left behind and are overwritten later. This strategy works very well when many new objects are created. However, if many objects stay alive, every collection has to copy them again, which adds to the cost.
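The copy strategy can be illustrated with a toy semi-space collector in the style of Cheney's algorithm. Each space is a list, an object's "address" is its index, and live objects reachable from the roots are copied breadth-first into the empty space, which then becomes the new from-semi-space. This is a teaching model with hypothetical class and method names, not HotSpot code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy semi-space copying collector (Cheney-style breadth-first copy).
class SemiSpace {
    static final class Obj {
        final String name;
        final List<Integer> refs;               // addresses of referenced objects
        Obj(String name, List<Integer> refs) { this.name = name; this.refs = refs; }
    }

    List<Obj> from = new ArrayList<>();         // current allocation space

    int allocate(String name, List<Integer> refs) {
        from.add(new Obj(name, refs));
        return from.size() - 1;                 // "address" of the new object
    }

    // Copies everything reachable from roots; returns the roots' new addresses.
    List<Integer> collect(List<Integer> roots) {
        List<Obj> to = new ArrayList<>();
        Map<Integer, Integer> forwarding = new HashMap<>();   // old address -> new address
        List<Integer> newRoots = new ArrayList<>();
        for (int r : roots) newRoots.add(copy(r, to, forwarding));
        // Scan copied objects in order, copying their targets and rewriting references.
        for (int scan = 0; scan < to.size(); scan++) {
            Obj o = to.get(scan);
            List<Integer> fixed = new ArrayList<>();
            for (int ref : o.refs) fixed.add(copy(ref, to, forwarding));
            to.set(scan, new Obj(o.name, fixed));
        }
        from = to;                              // roles swap: to-space becomes the new from-space
        return newRoots;
    }

    private int copy(int oldAddr, List<Obj> to, Map<Integer, Integer> forwarding) {
        Integer done = forwarding.get(oldAddr);
        if (done != null) return done;          // already copied
        to.add(from.get(oldAddr));              // references are fixed up when scanned
        forwarding.put(oldAddr, to.size() - 1);
        return to.size() - 1;
    }

    List<String> names() {
        List<String> n = new ArrayList<>();
        for (Obj o : from) n.add(o.name);
        return n;
    }

    static List<String> demo() {
        SemiSpace heap = new SemiSpace();
        int c = heap.allocate("c", List.of());
        heap.allocate("garbage", List.of());    // unreachable, will be left behind
        int b = heap.allocate("b", List.of(c));
        int a = heap.allocate("a", List.of(b));
        heap.collect(List.of(a));
        return heap.names();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After the collection the live objects a, b and c sit in consecutive slots with their references rewritten, and the unreachable object is simply not copied, which is also why a copying collector leaves the heap compacted.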

Garbage collectors could use any of the above collection algorithms and execute in the following modes –

1. Stop the world - In this mode, the garbage collector stops all other JVM threads while it runs. This causes intermittent pauses in application processing. It is generally optimized for application throughput.

2. Parallel - In this mode, the garbage collector uses multiple threads (often one per CPU, although the degree of parallelism can generally be controlled) to share the garbage collection load. Mostly, this also stops the other JVM threads and causes pauses in application processing, albeit shorter ones. This approach also generally targets application throughput.

3. Concurrent - In this mode, the garbage collector threads run alongside the other JVM threads. The collection work is broken into phases, and application threads are paused only for a couple of them. This allows near real time application processing, usually at the cost of some throughput.

So, from an application perspective, it is not really possible to get both maximum throughput and near real time performance; this is a compromise that the people designing and deploying the application have to make.

GC strategies in Hotspot

Hotspot breaks the heap space into three areas -

1. New or Young generation area – This area uses the copy strategy discussed above. It is optimized for allocation and is dedicated to newly created objects and objects with a short life cycle.

This area is further subdivided into Eden and two survivor spaces (a from-semi-space and a to-semi-space).

Eden is where all new objects are created. When Eden fills up, a minor collection copies the live objects in Eden, together with the live objects in the occupied (from) survivor space, into the empty (to) survivor space. The two survivor spaces then swap roles, so the to-space becomes the new from-space.

Each time an object survives a minor collection, its age increases. An object that is still alive after a number of these copies is said to have tenured and is then moved (promoted) to the old generation area.

2. Old generation area - This is the area of the heap space dedicated for long standing objects. Typically this area uses Mark, Sweep and Compact collectors.

3. Permanent area - This area is used by JVM for storing permanent objects such as Classes and Methods.
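On a running HotSpot JVM, the areas described above can be inspected through the standard java.lang.management API. The sketch lists the memory pools by type; pool names depend on the JVM version and collector (for example "PS Eden Space", "PS Survivor Space" and "PS Old Gen" with the parallel collector), and since Java 8 the permanent area has been replaced by Metaspace, which shows up as a non-heap pool. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Lists the memory pools of the running JVM, grouped by heap / non-heap.
class HeapAreas {
    static List<String> pools(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + pools(MemoryType.HEAP));
        System.out.println("non-heap pools: " + pools(MemoryType.NON_HEAP));
    }
}
```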

Configuring GC in Hotspot

http://download.oracle.com/javase/1.5.0/docs/tooldocs/windows/java.html

http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

The total heap size in Hotspot can be configured using -Xms (initial size, default is 2M) and -Xmx (max size, default is 64M).

Note that the total heap size covered by the above configuration includes only the young and old generation areas and EXCLUDES the permanent area. To configure the size of the permanent area, use -XX:PermSize (initial size of the permanent space, default is 4M) and -XX:MaxPermSize (max size of the permanent space, default is 32M). It is generally advisable to set both to the same, suitably high value, because every resize of the permanent area causes a full GC run.

The young and old generation sizes can be further controlled with the ratio -XX:NewRatio (ratio of the old generation size to the young generation size; the default ranges from 2 to 12 depending on the processor and on whether the client or server VM is used). A value of 2 means the young generation is half the size of the old generation, i.e. 1/3rd of the total heap. If more control is needed, -XX:NewSize (initial size of the young generation) and -XX:MaxNewSize (max size of the young generation) can be used.

To control the sizes of Eden and the survivor areas, use -XX:SurvivorRatio (default is 8). This is the ratio of Eden to one of the survivor semi-spaces; the default of 8 means each survivor space is 1/8th the size of Eden (1/10th of the young generation).
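The ratio arithmetic is easy to get backwards, so here is a worked example in Java for an assumed 1200 MB heap with -XX:NewRatio=2 and -XX:SurvivorRatio=8. The helper names are hypothetical, integer division rounds down, and the JVM also aligns sizes, so real values can differ slightly.

```java
// Worked arithmetic for -XX:NewRatio and -XX:SurvivorRatio (sizes in MB).
// NewRatio = old / young, so young = heap / (NewRatio + 1).
// SurvivorRatio = eden / one survivor space; the young generation holds eden
// plus two survivor spaces, so survivor = young / (SurvivorRatio + 2).
class GenerationSizes {
    static long young(long heapMb, int newRatio) { return heapMb / (newRatio + 1); }

    static long survivor(long youngMb, int survivorRatio) { return youngMb / (survivorRatio + 2); }

    static long eden(long youngMb, int survivorRatio) {
        return survivor(youngMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        long heap = 1200;                       // e.g. -Xms1200m -Xmx1200m
        long young = young(heap, 2);            // -XX:NewRatio=2
        System.out.println("young=" + young + " old=" + (heap - young));          // young=400 old=800
        System.out.println("eden=" + eden(young, 8)
                + " survivor=" + survivor(young, 8) + " (x2)");                   // eden=320 survivor=40 (x2)
    }
}
```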

To control how full the survivor spaces are allowed to get, use -XX:TargetSurvivorRatio (default is 50). This is the desired percentage of a survivor space that is occupied after a minor collection; when more than that survives, HotSpot lowers the tenuring threshold so that objects are promoted to the old generation sooner. So, by default, objects start being promoted early once a survivor space is more than half full. For large heap spaces, this should be higher, at 80 or 90, so that the survivor spaces are used more fully.

To control how long objects stay in the survivor spaces before they are tenured, use -XX:MaxTenuringThreshold. It specifies the maximum number of times an object is copied between survivor spaces before it is promoted to the old generation.
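A simplified model of tenuring in Java: each minor collection copies live objects from Eden and the occupied survivor space into the empty one, adding one to their age, and promotes an object instead once its age has reached the threshold (the maxTenuringThreshold field plays the role of -XX:MaxTenuringThreshold). It ignores details such as HotSpot lowering the threshold adaptively or promoting early when a survivor space overflows, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of minor collections and tenuring in the young generation.
class Tenuring {
    static final class Obj {
        final String name;
        final int age;                          // minor collections survived so far
        Obj(String name, int age) { this.name = name; this.age = age; }
    }

    private List<Obj> eden = new ArrayList<>();
    private List<Obj> survivor = new ArrayList<>();   // the occupied ("from") survivor space
    private final List<String> oldGen = new ArrayList<>();
    private final int maxTenuringThreshold;

    Tenuring(int maxTenuringThreshold) { this.maxTenuringThreshold = maxTenuringThreshold; }

    void allocate(String name) { eden.add(new Obj(name, 0)); }

    // One minor collection; 'live' names the objects that are still referenced.
    void minorGc(Set<String> live) {
        List<Obj> to = new ArrayList<>();       // the empty ("to") survivor space
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivor);
        for (Obj o : candidates) {
            if (!live.contains(o.name)) continue;            // garbage is simply left behind
            if (o.age >= maxTenuringThreshold) {
                oldGen.add(o.name);                          // tenured: promoted to the old generation
            } else {
                to.add(new Obj(o.name, o.age + 1));          // copied, one collection older
            }
        }
        eden = new ArrayList<>();               // Eden is empty after every minor collection
        survivor = to;                          // the survivor spaces swap roles
    }

    // Old generation contents after n minor collections with a threshold of 2.
    static List<String> oldGenAfter(int minorGcs) {
        Tenuring heap = new Tenuring(2);        // like -XX:MaxTenuringThreshold=2
        heap.allocate("session");               // long-lived
        heap.allocate("tmp");                   // already garbage at the first collection
        for (int i = 0; i < minorGcs; i++) heap.minorGc(Set.of("session"));
        return heap.oldGen;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println("after " + n + " minor GCs, old gen = " + oldGenAfter(n));
        }
    }
}
```

With a threshold of 2, the long-lived object is copied twice and reaches the old generation on the third minor collection, while the short-lived one is never copied at all.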

Apart from the above heap size configuration, new garbage collectors were introduced in JDK 1.4 -

1. Low pause collector – A parallel copy collector is used on the new generation area, together with the concurrent mark-sweep (CMS) collector on the old generation. To choose this strategy, use -XX:+UseParNewGC (for the parallel copy collector) and -XX:+UseConcMarkSweepGC (for the concurrent mark-sweep collector).

By default, the parallel copy collector starts as many threads as there are CPUs on the machine; if the degree of parallelism needs to be controlled, it can be specified with -XX:ParallelGCThreads=<number of threads>.

This collector gives the application relatively short pauses.

2. Throughput collector – Here a parallel copy collector is used on the new generation area only. To enable it, use -XX:+UseParallelGC.
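To confirm which collectors a JVM actually ended up with after setting these flags, the standard GarbageCollectorMXBean API can be queried as below. Reported names vary by JVM and version: for example "PS Scavenge" and "PS MarkSweep" for the throughput collector, and "ParNew" and "ConcurrentMarkSweep" for the low pause combination. Newer JDKs have removed some of these collectors, so the flags above may be rejected there. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Describes each collector the running JVM uses, with its counters and pools.
class ActiveCollectors {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms, pools "
                    + String.join(", ", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Start it with the GC flags under test (for example -XX:+UseParallelGC) and check that the reported collector names match the intended configuration.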

Configuring GC on JRockit

JRockit typically divides the heap into two areas –

Nursery – This area is meant for newly created objects; typically, objects that survive two collections are tenured into the old area.

Old area – This area is meant for longer-lived, more persistent objects.

The heap size can be configured using -Xms (initial size of the heap) and -Xmx (max size of the heap). The nursery size can be configured using -Xns.

JRockit's collection strategy can be either dynamic or static. In dynamic mode (the default), the -XgcPrio flag specifies its priority, that is, whether it should optimize for near real time performance or for application throughput. The values are pausetime and throughput.

To take control of the GC strategy, switch to static mode with the -Xgc flag. Its values are singlepar, genpar, singlecon and gencon.

1. Single heap area with parallel collector (singlepar) – Uses a single heap (not partitioned into nursery and old area) with parallel collectors (with stop the world semantics) that use the mark, sweep and compact algorithm. For applications that don't allocate a lot of short-lived objects, this improves memory utilization and throughput, at the cost of possibly long GC pauses.

2. Single heap area with concurrent collector (singlecon) – Uses a single heap with concurrent collectors that run alongside the application threads. GC pauses are shorter, but application throughput suffers. It is not well suited to applications that generate lots of short-lived objects.

3. Generational heap area with parallel collector (genpar) – Uses parallel collectors on a partitioned heap (nursery and old area) with stop the world semantics. This is best for throughput-oriented applications that may allocate large numbers of short-lived objects, though pause times may be longer.

4. Generational heap area with concurrent collector (gencon) – Uses concurrent collectors on a partitioned heap. It suits applications that need near real time behavior and also create many short-lived objects.
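The four static modes differ along two axes: heap layout (single or generational) and collector (parallel or concurrent). The hypothetical helper below merely encodes the guidance from the list above as a lookup; it is a reading aid, not a JRockit API.

```java
// Maps the two questions from the list above to a value for JRockit's -Xgc flag.
class JRockitGcMode {
    static String mode(boolean manyShortLivedObjects, boolean shortPausesNeeded) {
        String layout = manyShortLivedObjects ? "gen" : "single";  // a nursery pays off for short-lived objects
        String collector = shortPausesNeeded ? "con" : "par";      // concurrent = shorter pauses, parallel = throughput
        return layout + collector;
    }

    public static void main(String[] args) {
        System.out.println("value for -Xgc: " + mode(true, true));  // gencon
    }
}
```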
