If your instance of Endur/Findur is struggling with performance due to a high number of Operation Services, this article is for your eyes only.
On Her Majesty’s Secret Operation Services
Operation Services (OPS) are a key part of ION Group’s Openlink product. They enable users to configure business rules that tailor the solution to their specific needs. OPS have a number of triggers, from deal booking to static data configuration. They are typically used to run bespoke checks, such as blocking changes unless given criteria are met, or to initiate a downstream process such as interfacing data out of the system.
Whilst having 20 to 30 OPS is commonplace, environments with 50 or more are not unheard of. They are typically implemented over time; most are built during the initial implementation, whilst some are added post go-live as new requirements arise. As with anything built incrementally, this can lead to inefficiencies. Growing volumes also mean processing takes longer, straining the system, impacting user experience, and even causing sporadic showstoppers.
Before diving into the potential shortfalls of an OPS implementation, we need to understand that OPS come in two forms: Pre and Post. Both mostly operate through plugins, so good coding practice is paramount, but there are clear differences in when and how each is triggered, and where the potential pitfalls lie. Bearing in mind the earlier comment about showstoppers, if your firm finds itself with 50+ OPS, now is the time to take a serious look.
Pre-OPS trigger before a change is saved to the database, and typically have a low footprint in the application.
These are predominantly used for pre-validation checks and small-scale automation (autofill etc.). Pre-OPS run on user sessions, so having too many can have a direct impact on user experience. How they are built also matters: running huge SQL queries or complex processes is not recommended. Pre-OPS are a good starting point for a review if there are complaints from the user base about poor system responsiveness. Clients looking to reduce customizations can convert plugins into Expressions, a viable alternative for simple checks, written in pseudo-code via a built-in editor.
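To make the idea concrete, here is a minimal sketch of the kind of lightweight, in-memory pre-validation a pre-OPS typically performs. This is illustrative Python, not Openlink's plugin API or Expression syntax; the deal fields and messages are assumptions. The point is that the check inspects data already held in the session and blocks the save with a message, without touching the database:

```python
# Hypothetical sketch (NOT Openlink API): a pre-OPS-style validation that
# runs before the change is saved. It works purely on in-memory data --
# no SQL, no heavy processing -- and returns whether to allow the save.

def pre_validate(deal: dict) -> tuple[bool, str]:
    """Return (allow_save, message) for the deal being saved."""
    if deal.get("notional", 0) <= 0:
        return False, "Notional must be positive"
    if deal.get("counterparty") is None:
        return False, "Counterparty is mandatory"
    return True, ""

ok, msg = pre_validate({"notional": 5_000_000, "counterparty": "ACME"})
print(ok, msg)  # True, no blocking message
```

Because checks like this run on every save in the user's session, keeping them this small is exactly why pre-OPS stay cheap; the trouble starts when plugins at this trigger point start issuing large queries.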
Post-OPS are more intricate. These run after a change has been saved into the database. They are typically more complex and their processing heavier. Unlike pre-OPS, for which the data required is usually available in-memory, post-OPS often need to query the database to retrieve the required data. This can lead to inefficiencies; because each OPS runs on an object basis (e.g., for each deal being booked) the database ends up being queried repeatedly.
As post-OPS trigger after changes are registered in the system, performance issues do not directly impact the user experience. Nevertheless, too many and/or poorly optimized post-OPS can eventually strain the system and cause issues further down the line.
License to Thrill
Openlink’s Post Process Service allows the distribution of OPS to the grid for parallel processing, addressing performance considerations and allowing clients to scale up as volumes grow. Not all clients have licensed it, but in recent years ION has been pushing its Cloud + Grid-Only offering, which entails using the Post Process Service and putting post-OPS onto the grid.
Taking processing out of user sessions makes for more robust operations: user sessions can crash, whilst the Grid has built-in redundancy. Any switch should nevertheless be carefully planned, as there can be hidden pitfalls. The Post Process Service alone does not solve all inefficiencies: as post-OPS run on an object-by-object basis, running them locally or on the grid still results in the database being hit by the same queries over and over. And if post-OPS need to run in a specific order, this may no longer happen once they are distributed to run in parallel.
It may just be a matter of time before most clients have to obtain the license for the Post Process Service.
Whilst the service is a good offering with numerous advantages, clients should still carefully review what this means for their environment. In some cases, moving OPS to the grid can equate to burying one’s head in the sand, as poor design could mean performance issues are simply shifted to one side rather than solved. Parallel processing could also inadvertently create problems for processes that need to run in a specific order.
Whether a client has already acquired this license or not, a review of their OPS implementation could reveal inefficiencies and highlight routes to improving system performance and stability.
A (re)View to a Kill
This is where a holistic review of one's OPS implementation can help: an analysis can identify whether there are performance gains to be made from a database-load standpoint and assess whether there are any unforeseen risks regarding the sequencing of operations.
For example, some post-OPS might be better suited to a scheduled task that processes objects in bulk rather than individually. Take a simple use case: a client needs to push data to a user table when a new deal is booked. Designed as an OPS, 100 deals would trigger 100 database calls to retrieve the necessary data, then another 100 to insert into the user table. Distributed to the grid or not, the jobs still run individually and result in 200 database calls. A scheduled task (e.g., running every five minutes) that retrieves any booked deals not yet processed into the user table could bulk-query the data and process it in just two database calls, regardless of how many deals are entered between runs.
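The arithmetic above can be demonstrated with a self-contained sketch. This is generic Python over an in-memory SQLite database, standing in for whichever database and plugin framework a client actually runs; the table names and schema are assumptions. It counts round-trips for the per-deal (OPS-style) approach versus the bulk (scheduled-task) approach:

```python
import sqlite3

# Hypothetical illustration (NOT Openlink API): compare database round-trips
# for per-deal processing vs a bulk scheduled task pushing deals to a user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (deal_id INTEGER PRIMARY KEY, notional REAL)")
conn.execute("CREATE TABLE user_table (deal_id INTEGER, notional REAL)")
conn.executemany("INSERT INTO deals VALUES (?, ?)",
                 [(i, 100.0 * i) for i in range(1, 101)])

calls = 0  # count of database round-trips

def per_deal(deal_ids):
    """OPS-style: one SELECT plus one INSERT per deal -> 2N round-trips."""
    global calls
    for deal_id in deal_ids:
        row = conn.execute("SELECT notional FROM deals WHERE deal_id = ?",
                           (deal_id,)).fetchone()
        calls += 1
        conn.execute("INSERT INTO user_table VALUES (?, ?)", (deal_id, row[0]))
        calls += 1

def bulk():
    """Scheduled-task style: one bulk SELECT plus one bulk INSERT -> 2 round-trips."""
    global calls
    rows = conn.execute(
        "SELECT deal_id, notional FROM deals "
        "WHERE deal_id NOT IN (SELECT deal_id FROM user_table)").fetchall()
    calls += 1
    conn.executemany("INSERT INTO user_table VALUES (?, ?)", rows)
    calls += 1

per_deal(range(1, 101))
print(calls)  # 200 round-trips for 100 deals

calls = 0
conn.execute("DELETE FROM user_table")  # reset so bulk() sees unprocessed deals
bulk()
print(calls)  # 2 round-trips, regardless of how many deals are pending
```

The gap only widens with volume: the per-deal figure scales linearly while the bulk figure stays constant, which is why grid distribution alone (which parallelizes the per-deal jobs but does not merge their queries) cannot close it.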
Where the sequence of events is paramount, the grid presents a risk: the order in which parallel jobs complete is not deterministic. TPM could also be considered, as it can enforce steps to run in a specific order.
TPM allows user interactions (e.g., assignments for exception handling) and has a visual interface to view how post processes unfolded. It is grid-enabled, allowing jobs to be distributed to the grid; however, clients should be mindful of their grid sizing, as TPM might not prove as efficient as the Post Process Service, which is dedicated to post processing.
Aggregation of OPS and refactoring of code is also a viable approach in some cases, for example merging several OPS that trigger in the same situations (e.g., deals booked to Validated) but were built individually.
Aggregating OPS into one can also help ensure events follow a strict sequence and improve performance. Just as William Faulkner advised writers to kill their darlings, getting rid of unnecessary OPS can lead to processing and performance improvements.
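The aggregation idea can be sketched as follows. Again this is illustrative Python, not Openlink's plugin framework; the check names and deal fields are assumptions. Three checks that used to be separate OPS on the same trigger are merged into one plugin that runs them in an explicit, guaranteed order:

```python
# Hypothetical sketch (NOT Openlink API): three checks formerly implemented
# as separate OPS on the same trigger (e.g., deal booked to Validated),
# merged into one plugin with a deterministic sequence.

def check_limits(deal):
    return deal["notional"] <= 10_000_000

def check_static(deal):
    return deal["counterparty"] in {"ACME", "GLOBEX"}

def push_downstream(deal):
    return True  # e.g., stage the deal for an outbound interface

# Explicit ordering: one OPS, one pass over the deal, no race between jobs.
PIPELINE = [check_limits, check_static, push_downstream]

def run_aggregated_ops(deal: dict) -> bool:
    for step in PIPELINE:
        if not step(deal):
            return False  # fail fast; later steps never fire out of order
    return True

print(run_aggregated_ops({"notional": 1_000_000, "counterparty": "ACME"}))
```

Because the sequence lives in one plugin rather than across several independently triggered OPS, it survives grid distribution: the grid schedules one job, not three whose relative order it cannot guarantee.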
Tomorrow Never Dies
As usual, there is no one-size-fits-all answer. OPS are usually implemented to solve a specific business problem, and these problems come in different shapes and sizes. Performance is driven by the volumes of your operations and the specifics of the implementation.
A review of your OPS implementation would, however, help you assess the impact of switching to Openlink's Post Process Service if you have not done so yet. It is also a good place to start for those clients whose user base is complaining about system responsiveness. Whilst moving post-OPS to the Grid is a firm step towards better performance, we recommend extensive testing to ensure performance lines up with your expectations.
If you would like our opinion on your Operation Services implementation, feel free to reach out!