This page last changed on Jun 28, 2009 by smaddox.

Directory caching can be used to give fast, repeated access to user, group and role data for a particular directory.

This page describes caching that can be configured on the Crowd server, to store user and group information from a Crowd-connected LDAP directory. For an overview of the other types of caching offered by Crowd, please refer to Overview of Caching.

Features of LDAP Caching in Crowd

Where the LDAP directory supports it, Crowd will keep an up-to-date cache of user, group and role information retrieved from the LDAP directory. Using the cache should improve performance, particularly for directories that are large, slow or off site.

Summary of the caching features:

  • The cache uses lazy loading where possible, storing only the information that is required rather than loading the entire directory into the cache.
  • Crowd ensures that the cache remains up to date by monitoring the LDAP directory for updates. When a change occurs, Crowd updates the server-side cache incrementally. The monitoring mechanism depends on the type of LDAP directory, as described in the list of supported directories below.
  • The caches are held in memory on the Crowd server machine and can grow very large. When necessary, the cache will overflow to disk.
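
The lazy loading and incremental update behaviour summarised above can be illustrated with a minimal sketch. This is not Crowd's implementation: the class name, its methods and the dictionary standing in for the LDAP directory are all hypothetical, chosen only to show the idea.

```python
from typing import Callable, Dict, Optional


class LazyDirectoryCache:
    """Hypothetical sketch of a lazily loaded cache; not Crowd's actual code."""

    def __init__(self, load_entry: Callable[[str], Optional[dict]]) -> None:
        # load_entry stands in for a lookup against the LDAP directory (assumption).
        self._load_entry = load_entry
        self._entries: Dict[str, dict] = {}
        self.directory_lookups = 0

    def get(self, name: str) -> Optional[dict]:
        # Lazy loading: fetch an entry only the first time it is requested.
        if name not in self._entries:
            self.directory_lookups += 1
            entry = self._load_entry(name)
            if entry is None:
                return None
            self._entries[name] = entry
        return self._entries[name]

    def apply_change(self, name: str, entry: Optional[dict]) -> None:
        # Incremental update: touch only the changed entry, never the whole cache.
        if entry is None:
            self._entries.pop(name, None)  # entry was deleted in the directory
        elif name in self._entries:
            self._entries[name] = entry    # refresh an entry that is already cached
        # Entries that were never requested stay unloaded.

    def flush(self) -> None:
        # Remove all cached elements; they reload lazily on the next request.
        self._entries.clear()


# A plain dictionary plays the role of the directory in this example.
directory = {
    "alice": {"mail": "alice@example.com"},
    "bob": {"mail": "bob@example.com"},
}
cache = LazyDirectoryCache(directory.get)
cache.get("alice")
cache.get("alice")  # served from the cache, no second directory lookup
cache.apply_change("alice", {"mail": "alice@example.org"})
cache.apply_change("bob", {"mail": "bob@example.org"})  # bob was never cached
```

In this sketch only 'alice' ever occupies memory, because 'bob' is never requested by a client.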

The diagram below gives a conceptual overview of the caches supported by Crowd, including the LDAP caching discussed on this page.

Supported LDAP Directories

Directory                         Monitoring Mechanism
ApacheDS version 1.5.0 and later  Listening, via 'persistent search'. See details below.
Microsoft Active Directory        Polling, via 'uSNChanged'. See details below.

LDAP Persistent Search

For supported LDAP directories, Crowd monitors changes via the LDAP change notification feature known as 'persistent search'. The word 'persistent' means that the search remains active forever, once initiated. Crowd performs the initial search on the LDAP directory and receives the results. From that point on, whenever an entry in the result set is updated, the LDAP directory sends Crowd a new copy of that entry.
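
The flow described above (one initial result set, followed by a fresh copy of every entry that later changes) can be sketched with an in-memory stand-in for the directory. The FakeDirectory class and its methods are hypothetical and exist only for illustration; a real persistent search runs over an LDAP connection.

```python
from typing import Callable, Dict, List

EntryCallback = Callable[[str, dict], None]


class FakeDirectory:
    """In-memory stand-in for an LDAP server with change notification (assumption)."""

    def __init__(self, entries: Dict[str, dict]) -> None:
        self._entries = dict(entries)
        self._open_searches: List[EntryCallback] = []

    def persistent_search(self, on_entry: EntryCallback) -> None:
        # Initial search: deliver every current entry once.
        for dn, entry in self._entries.items():
            on_entry(dn, dict(entry))
        # The search stays open: keep the callback for later changes.
        self._open_searches.append(on_entry)

    def modify(self, dn: str, entry: dict) -> None:
        # A change in the directory pushes a new copy of the entry to each open search.
        self._entries[dn] = dict(entry)
        for on_entry in self._open_searches:
            on_entry(dn, dict(entry))


search_cache: Dict[str, dict] = {}


def store_in_cache(dn: str, entry: dict) -> None:
    search_cache[dn] = entry


fake_directory = FakeDirectory({"cn=alice,ou=people": {"mail": "alice@example.com"}})
fake_directory.persistent_search(store_in_cache)
fake_directory.modify("cn=alice,ou=people", {"mail": "alice@example.org"})
```

The client never asks again after the initial search; the directory pushes each change, which is why the table above describes this mechanism as 'listening'.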

Microsoft Active Directory Change Notification

Crowd sends a request to AD at regular intervals, asking to be notified of changes made since the last request, via the uSNChanged attribute. You can configure the time interval on Crowd's Directory Connector screen for your Active Directory. Details are in MSDN.
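
As a rough illustration of polling via uSNChanged, the sketch below remembers the highest value seen so far and asks only for entries above it. The function names and the list of dictionaries standing in for AD entries are assumptions made for this example; the exact query that Crowd sends is not documented on this page.

```python
from typing import Dict, List, Tuple


def changes_filter(last_usn: int) -> str:
    # LDAP search filter matching entries changed since the last poll.
    return f"(uSNChanged>={last_usn + 1})"


def poll(entries: List[Dict], last_usn: int) -> Tuple[List[Dict], int]:
    """Return the entries changed since last_usn and the new high-water mark."""
    # In-memory equivalent of running changes_filter(last_usn) against AD.
    changed = [entry for entry in entries if entry["uSNChanged"] > last_usn]
    # If nothing changed, the high-water mark stays where it was.
    new_high = max((entry["uSNChanged"] for entry in changed), default=last_usn)
    return changed, new_high


# Hypothetical directory contents for the example.
ad_entries = [
    {"cn": "alice", "uSNChanged": 10},
    {"cn": "bob", "uSNChanged": 20},
    {"cn": "carol", "uSNChanged": 30},
]
changed_entries, high_water_mark = poll(ad_entries, last_usn=15)
```

Each poll starts from the high-water mark returned by the previous one, so only changes made since the last request are transferred.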

Configuring the Cache

Screen snippet: Cache Configuration for Microsoft Active Directory



You can enable or disable the cache for each directory via the Crowd Directory Connector screen, provided that the directory supports LDAP caching. Below are descriptions of and guidelines for the configuration settings.

Setting the Maximum Number of Elements in Memory

When configuring the cache, you can set the 'Max Cache Elements in Memory'. This number is proportional to the number of users/groups that can be stored in memory before overflowing to disk. If your JVM memory is limited, you can set the number to a lower value. Note that loading from disk can be significantly slower than loading from memory.

Crowd uses Ehcache for its cache implementation. The maximum number of elements in memory is one of the configuration settings allowed by Ehcache. Each cached directory consists of about 20 internal caches. The maximum number of elements in memory corresponds to the maximum elements per internal cache.

The largest internal cache maps DNs to principals/groups/roles. Therefore the maximum number of elements in memory should approximate the sum of principals + groups + roles in the scope of your configured LDAP subtrees.

We recommend leaving this value at the default 50,000 unless you experience memory problems. This setting means that if the number of principals + groups + roles exceeds 50,000, then some of the entities will overflow to a disk-based cache.

We have tested with up to 8,000 users and 8,000 groups. We used 512 MB of JVM memory (-Xmx) and set the maximum number of elements in memory to 50,000.
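
The sizing guideline can be checked with simple arithmetic. The helper below is a hypothetical sketch. It assumes that the largest internal cache holds one element per principal, group and role, as described above, and that anything beyond the configured maximum overflows to disk.

```python
def entities_overflowing_to_disk(
    principals: int,
    groups: int,
    roles: int,
    max_elements_in_memory: int = 50_000,  # the default value quoted on this page
) -> int:
    """Estimate how many entries of the largest internal cache would not fit in memory."""
    total = principals + groups + roles
    return max(0, total - max_elements_in_memory)


# The tested configuration: 8,000 users and 8,000 groups fit comfortably.
tested_overflow = entities_overflowing_to_disk(8_000, 8_000, 0)

# A hypothetical larger directory: 40,000 users and 15,000 groups.
larger_overflow = entities_overflowing_to_disk(40_000, 15_000, 0)
```

With the default setting, the tested configuration uses 16,000 of the 50,000 available elements, while the hypothetical larger directory would push about 5,000 entries to the slower disk-based cache.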

Setting the Polling Interval for Cache Updates

When configuring the cache for Microsoft Active Directory, you can set the 'Polling Interval'. This is the time interval (number of seconds) that Crowd will wait between its requests for updates from AD.

The length of your polling interval depends on how long you can tolerate stale data. If you poll more frequently, then your data will be more up to date. The downside of polling more frequently is that you may overload your AD server with requests.

A good value for the polling interval would take into account the performance of your AD server and the bandwidth of the network connection between the AD server and the Crowd server.

If in doubt, we recommend that you start with an interval of 5 or 10 minutes (300 or 600 seconds) and reduce the value incrementally. You will need to experiment with your setup. You can use Crowd's performance profiling feature to see the performance of your setup.

Very short and very long intervals:

  • We have tested polling intervals as short as 15 seconds, when the AD server is on the local network. Any shorter would mean that the time taken to perform the poll operation exceeds the polling interval. Each poll consists of 6 operations: searching for created or updated principals/groups/roles (3), and searching for deleted principals/groups/roles (3).
  • Alternatively, if you are confident that most of the AD updates will be done via Crowd and that there will be very few changes on the AD server that do not originate from Crowd, then you can set the polling interval to be something much larger, e.g. 2 hours. Note that in this case, any change made to the AD server outside of Crowd may take up to two hours to appear in Crowd.
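
To make the trade-off concrete, the sketch below works out the load on the AD server for a few polling intervals. It uses the figure of 6 search operations per poll given above. The function name is hypothetical, and the staleness figure ignores the time taken by the poll itself.

```python
SECONDS_PER_DAY = 86_400
OPERATIONS_PER_POLL = 6  # 3 searches for created/updated entries, 3 for deletions


def polling_load(interval_seconds: int) -> dict:
    """Rough daily load on the AD server for a given polling interval."""
    polls_per_day = SECONDS_PER_DAY // interval_seconds
    return {
        "polls_per_day": polls_per_day,
        "searches_per_day": polls_per_day * OPERATIONS_PER_POLL,
        # An external change can go unseen for roughly one full interval.
        "max_staleness_seconds": interval_seconds,
    }


shortest_tested = polling_load(15)   # the 15-second interval
ten_minutes = polling_load(600)      # a suggested starting point
two_hours = polling_load(7_200)      # the 2-hour example
```

Under these assumptions, a 15-second interval produces 5,760 polls (34,560 searches) per day, a 10-minute interval 144 polls (864 searches), and a 2-hour interval only 12 polls (72 searches).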

Inspecting and Flushing the Cache

You can view directory information via the Directory Browser. The 'View Directory' screen allows you to:

  • See basic cache information — View the number of users, groups, and roles cached and the date on which the cache was last updated.
  • Flush the cache — Click the 'Flush Cache' button to remove all cached elements. Crowd will lazily reload data when the data is next requested by a client application.


Screenshot: Inspecting and flushing the cache

Limitations

Limitations for All Directories

The following comments apply to all directory types, including Microsoft Active Directory.

  1. Only specified directories are supported. Crowd can only support caching for directories which provide a suitable mechanism. See the list of supported directories above.
  2. Memory usage is higher. Because of the memory requirements imposed by the caching, we recommend increasing the amount of heap allocated to Crowd to at least 512 MB (-Xmx512m).
  3. Delegated Authentication directories are not supported. These directories are not cached, because only the authentication is delegated to the directory, and authentication itself is not cached.
  4. Posix/NIS schema is not supported. LDAP directories using the Posix/NIS schema (RFC 2307) will not be cached, because the scheme used to fetch group memberships does not support caching.
  5. Externally moving objects out of scope causes problems. Do not use the external LDAP directory interface to move objects out of the scope of the sub-tree, as defined on Crowd's Directory Connector screen. This will result in an inconsistent cache. If you do need to make structural changes to your LDAP directory, flush the directory cache after you have made the changes to ensure cache consistency.
  6. Nested groups will cause a single large performance hit when finding a user's memberships. If you are using nested groups, you will notice a single (possibly huge) performance hit on the first call to find the memberships of a user. This one-off hit will occur every time the cache is flushed, such as when Crowd is restarted or when you manually flush the cache.
  7. Unique entities are required, with respect to entity type and name. There can be only one entity with a specific entity name and type. For example, you cannot have two groups with the same name in the visible group tree.
  8. DN mapping must be unique. Directory entities must be mapped in such a way that their DN mapping is unique. For example, you cannot have an entity with a DN that could correspond to both a group and a role.
  9. Renaming objects is not supported. If the DN of an object is changed externally, the cache will be out of date until flushed.
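
Before enabling the cache, it may help to check an export of your directory against the uniqueness requirement in item 7. The sketch below is a hypothetical helper, not a Crowd feature. It assumes each entity is represented as a dictionary with 'type' and 'name' keys and that names are compared exactly as written.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_duplicate_entities(entities: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return every (type, name) pair that occurs more than once."""
    counts = Counter((entity["type"], entity["name"]) for entity in entities)
    return sorted(pair for pair, count in counts.items() if count > 1)


# Hypothetical export: two groups named 'staff' in different sub-trees.
exported_entities = [
    {"type": "group", "name": "staff", "dn": "cn=staff,ou=sydney,dc=example,dc=com"},
    {"type": "group", "name": "staff", "dn": "cn=staff,ou=london,dc=example,dc=com"},
    {"type": "role", "name": "staff", "dn": "cn=staff,ou=roles,dc=example,dc=com"},
    {"type": "principal", "name": "alice", "dn": "cn=alice,ou=people,dc=example,dc=com"},
]
duplicates = find_duplicate_entities(exported_entities)
```

Here the two 'staff' groups are reported, while the 'staff' role is not, because it has a different entity type.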

Additional Limitations for Microsoft Active Directory

In addition to the general limitations listed above, please take note of these comments which apply specifically to Microsoft Active Directory (AD).

  1. Syncing between AD servers is not supported. Microsoft Active Directory does not replicate the uSNChanged attribute across instances. For that reason, Crowd does not support connecting to different AD servers for syncing. (You can of course define multiple different directories in Crowd, each pointing to its own respective AD server.)
  2. You must flush the directory cache after restoring AD from backup. When an AD server is restored from backup, the uSNChanged values revert to what they were at the time of the backup. To avoid the resulting inconsistency, you will need to flush the directory cache after an Active Directory restore operation.
  3. Obtaining object deletions requires Administrator access. Active Directory stores deleted objects in a special container called cn=Deleted Objects. By default, to access this container you need to connect as an Administrator and so, for Crowd to be aware of deletions, you must use Administrator credentials. Alternatively, it's possible to change the permissions on the cn=Deleted Objects container. If you wish to do so, please see this Microsoft KB Article.

Error Handling

If Crowd detects a connection timeout or an error, Crowd will automatically flush the caches and re-start the directory monitors. To manually flush the cache and re-start the monitors, use the 'Flush Cache' button.
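
The recovery behaviour described above can be pictured as follows. This is a hypothetical sketch, not Crowd's code: the function name, the callables passed in and the choice of exception types are all assumptions made for the example.

```python
from typing import Callable, Dict


def run_monitor_step(
    poll_once: Callable[[], None],
    cache: Dict[str, dict],
    restart_monitor: Callable[[], None],
) -> bool:
    """Run one monitoring step; on a timeout or connection error, flush and restart."""
    try:
        poll_once()
    except (TimeoutError, ConnectionError):
        cache.clear()      # flush: cached data can no longer be trusted
        restart_monitor()  # start monitoring again from a clean state
        return False
    return True


restarts = []
demo_cache = {"alice": {"mail": "alice@example.com"}}


def failing_poll() -> None:
    raise TimeoutError("directory did not respond")


step_ok = run_monitor_step(failing_poll, demo_cache, lambda: restarts.append("restarted"))
```

After the failed step the cache is empty, so data is reloaded lazily the next time a client application requests it.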

RELATED TOPICS

Crowd Documentation


Document generated by Confluence on Jul 30, 2009 01:29