Remove text index from mongo ticket repository #6221

davidgelhar · 2024-11-26T14:17:09Z

Issue to solve

Currently, the MongoDbTicketRegistry creates a "text" mongo index on the full json contents of the ticket. That index is large and expensive to maintain: in our environment the index is larger than the ticket storage itself, and load testing shows a 50% reduction in tickets-generated-per-second throughput when the index is present. (Updating this index is the bottleneck in how many authentication transactions CAS can perform.)

The only use of the text index is in MongoDbTicketRegistry.getSessionsFor(), which does a TextQuery of the ticket document to search by principal. Since principal is already a field in the ticket document in mongo (with its own index!), there is no need to do a full-text search to find it.
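The difference between the two lookups can be sketched with a small in-memory model. This is only an illustration, not the CAS or MongoDB API: the class `TicketLookupSketch`, the `TicketDoc` type, and the ticket ids are hypothetical, a substring scan stands in (crudely) for the text search, and a map keyed by principal stands in for the existing principal index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the ticket collection (not CAS code).
final class TicketLookupSketch {

    // A stored ticket: id, the principal field, and the serialized json body.
    static final class TicketDoc {
        final String id;
        final String principal;
        final String json;

        TicketDoc(String id, String principal, String json) {
            this.id = id;
            this.principal = principal;
            this.json = json;
        }
    }

    private final List<TicketDoc> docs = new ArrayList<>();
    // Plays the role of the index that already exists on the principal field.
    private final Map<String, List<TicketDoc>> byPrincipal = new HashMap<>();

    void save(TicketDoc doc) {
        docs.add(doc);
        byPrincipal.computeIfAbsent(doc.principal, k -> new ArrayList<>()).add(doc);
    }

    // Old approach, modeled as a search over the whole serialized body.
    List<TicketDoc> findByTextSearch(String principalId) {
        List<TicketDoc> hits = new ArrayList<>();
        for (TicketDoc doc : docs) {
            if (doc.json.contains(principalId)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // New approach: exact match on the principal field through its own index.
    List<TicketDoc> findByPrincipal(String principalId) {
        return new ArrayList<>(byPrincipal.getOrDefault(principalId, Collections.emptyList()));
    }

    public static void main(String[] args) {
        TicketLookupSketch store = new TicketLookupSketch();
        store.save(new TicketDoc("TGT-1", "alice", "{\"id\":\"TGT-1\",\"principal\":\"alice\"}"));
        store.save(new TicketDoc("TGT-2", "bob", "{\"id\":\"TGT-2\",\"principal\":\"bob\"}"));
        System.out.println("text search hits: " + store.findByTextSearch("alice").size());
        System.out.println("principal lookup hits: " + store.findByPrincipal("alice").size());
    }
}
```

In this model both lookups find the same ticket, but only the text search needs a second structure covering the entire json body, which is the part that has to be updated on every ticket write.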

Summary of changes

in getSessionsFor, query the principal directly instead of a text search of the json
remove code that creates the json text index (IDX_JSON_TYPE_ID)
remove IDX_JSON_TYPE_ID from list of supported indexes
integration tests


mmoayyed · 2024-11-26T15:08:54Z

Hello @davidgelhar, your changeset looks very good. We'll do some internal testing, and then it should be good to merge. I'll follow up with an update as time allows.

One point; you do mention:

and load testing shows a 50% reduction in tickets-generated-per-second throughput when the index is present

Can you please share more details on how you load test: under what parameters, which load testing tool you used, and what the response times, etc. looked like before and after? Please be as systematic as possible, so that we might be able to build such tests into the software itself and catch issues and performance regressions later on.

Thank you.

davidgelhar · 2024-11-26T19:48:18Z

The performance numbers mentioned occurred in the context of an end-to-end load test of a web application, using Puppeteer scripts to control a headless browser.

The test script runs in a continuous loop: go to app URL, get redirected to CAS, POST username/password to log in, verify app main screen loaded. This is all wrapped up in a Docker image that's run in a kubernetes deployment in AWS. The deployment is scaled up to generate the desired load: in this case 100 pods were running simultaneously logging in as fast as possible. Both the web application and CAS were also scaled horizontally in kubernetes: 6 instances of CAS were running in parallel. In this architecture, what can't scale out is mongo: there's only 1 mongo primary node.

Throughput was measured by counting all the "ticket issued" log messages across all CAS instances. Without the text index, about 3700 logins per minute were seen; with the index present, it dropped to 1500 logins per minute.
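As a rough sketch of the arithmetic behind these figures: the snippet below counts log lines containing a marker string and computes the relative drop between the two measured rates. The class and method names are hypothetical, and the marker is whatever text the "ticket issued" log message actually contains in a given deployment.

```java
import java.util.List;

// Hypothetical helper for turning log counts into throughput figures.
final class ThroughputSketch {

    // Counts the log lines that contain the given marker text.
    static long countMatching(List<String> logLines, String marker) {
        long count = 0;
        for (String line : logLines) {
            if (line.contains(marker)) {
                count++;
            }
        }
        return count;
    }

    // Relative drop from the baseline rate to the degraded rate, in percent.
    static double relativeDropPercent(double baselinePerMinute, double degradedPerMinute) {
        return (baselinePerMinute - degradedPerMinute) / baselinePerMinute * 100.0;
    }

    public static void main(String[] args) {
        double withoutIndex = 3700.0; // logins per minute, from the measurement above
        double withIndex = 1500.0;    // logins per minute, from the measurement above
        System.out.printf("without index: %.1f logins/s%n", withoutIndex / 60.0);
        System.out.printf("with index: %.1f logins/s%n", withIndex / 60.0);
        System.out.printf("relative drop: %.1f%%%n", relativeDropPercent(withoutIndex, withIndex));
    }
}
```

With the numbers above, (3700 - 1500) / 3700 works out to a drop of roughly 59%, which is in the same range as the approximately 50% reduction quoted in the description.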

We had not detected this as a problem using the JMeter tests packaged with CAS, perhaps because a single workstation running the test didn't have the CPU power to drive a heavy enough load.

davidgelhar added 2 commits November 25, 2024 10:25

use IDX_PRINCIPAL instead of expensive text index in getSessionsFor(principalId)

c452572

remove json text index from mongo ticket registry; add tests

4efc078

apereocas-bot added this to the 7.2.0-RC3 milestone Nov 26, 2024

apereocas-bot added MongoDb Configuration labels Nov 26, 2024
