Remove text index from mongo ticket repository #6221

Open: wants to merge 2 commits into master


davidgelhar (Contributor)

Issue to solve

Currently, MongoDbTicketRegistry creates a MongoDB "text" index on the full JSON contents of each ticket. That index is large and expensive to maintain: in our environment the index is larger than the ticket storage itself, and load testing shows a 50% reduction in tickets-generated-per-second throughput when the index is present. Updating this index is the bottleneck that limits how many authentication transactions CAS can perform.

The only use of the text index is in MongoDbTicketRegistry.getSessionsFor(), which runs a TextQuery against the ticket documents to search by principal. Since principal is already a field in the ticket document in Mongo (with its own index!), there is no need for a full-text search to find it.
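To make the maintenance cost concrete, here is a deliberately simplified in-memory model in plain Java. It is not CAS or MongoDB code: the class, record and method names are hypothetical, and the crude tokenizer stands in for MongoDB's real text analysis (which also stems words and drops stop words). MongoDB's documentation describes a text index as holding one entry per unique post-stemmed word in each indexed document, so every ticket write fans out into many index entries, whereas a single-field index on principal adds exactly one. That single-field lookup is all getSessionsFor needs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model (not CAS or MongoDB code) contrasting a
 * full-text index over each ticket's JSON with a single-field principal index.
 */
public class TicketIndexModel {

    /** Hypothetical stand-in for a ticket document as stored in Mongo. */
    public record TicketDocument(String ticketId, String principal, String json) {}

    // Text-index stand-in: word -> ids of tickets whose JSON contains that word.
    private final Map<String, Set<String>> textIndex = new HashMap<>();
    // Single-field index stand-in: principal -> ids of that principal's tickets.
    private final Map<String, Set<String>> principalIndex = new HashMap<>();

    /** Crude tokenizer (assumption): lower-cases, splits on non-alphanumerics, de-duplicates. */
    public static Set<String> tokenize(String json) {
        Set<String> words = new TreeSet<>();
        for (String w : json.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /**
     * Indexes a document and returns how many text-index entries the write added
     * (0 when the text index is disabled). The principal index always gets one entry.
     */
    public int save(TicketDocument doc, boolean withTextIndex) {
        principalIndex.computeIfAbsent(doc.principal(), k -> new HashSet<>()).add(doc.ticketId());
        if (!withTextIndex) {
            return 0;
        }
        Set<String> words = tokenize(doc.json());
        for (String w : words) {
            textIndex.computeIfAbsent(w, k -> new HashSet<>()).add(doc.ticketId());
        }
        return words.size();
    }

    /** Direct lookup on the principal field's own index, as the PR proposes. */
    public Set<String> sessionsByPrincipal(String principal) {
        return new TreeSet<>(principalIndex.getOrDefault(principal, Set.of()));
    }

    public static void main(String[] args) {
        TicketIndexModel model = new TicketIndexModel();
        // Hypothetical, heavily trimmed ticket JSON; real tickets serialize far more fields.
        String json = "{\"@class\":\"TicketGrantingTicketImpl\",\"id\":\"TGT-1\",\"principal\":\"casuser\"}";
        int added = model.save(new TicketDocument("TGT-1", "casuser", json), true);
        System.out.println("text-index entries added by one write: " + added);
        System.out.println("sessions for casuser: " + model.sessionsByPrincipal("casuser"));
    }
}
```

Even this trimmed JSON yields seven text-index entries for a single write; a real serialized ticket, with its authentication attributes and history, produces many more.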

Summary of changes

  • in getSessionsFor, query the principal field directly instead of doing a text search of the JSON
  • remove the code that creates the JSON text index (IDX_JSON_TYPE_ID)
  • remove IDX_JSON_TYPE_ID from the list of supported indexes
  • integration tests

@mmoayyed (Member)

Hello @davidgelhar, your changeset looks very good. We'll do some internal testing, and then it should be good to merge. I'll follow up with an update as time allows.

One point: you mention

and load testing shows a 50% reduction in tickets-generated-per-second throughput when the index is present

Can you please share more details on how you load test: under what parameters, with which load testing tool, and what the response times etc. looked like before and after? Please be as systematic as possible, so that we might be able to build such tests into the software itself and catch performance regressions later on.

Thank you.

@davidgelhar (Contributor, Author)

The performance numbers mentioned occurred in the context of an end-to-end load test of a web application, using Puppeteer scripts to control a headless browser.

The test script runs in a continuous loop: go to the app URL, get redirected to CAS, POST username/password to log in, and verify that the app's main screen loaded. This is all packaged in a Docker image that runs as a Kubernetes deployment in AWS. The deployment is scaled up to generate the desired load: in this case, 100 pods were running simultaneously, logging in as fast as possible. Both the web application and CAS were also scaled horizontally in Kubernetes, with 6 CAS instances running in parallel. In this architecture, the one component that can't scale out is MongoDB: there is only one Mongo primary node.

Throughput was measured by counting all the "ticket issued" log messages across all CAS instances. Without the text index, we saw about 3700 logins per minute; with the index present, throughput dropped to 1500 logins per minute.
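As a quick sanity check, the arithmetic below converts these figures into per-second rates and a percentage drop. The inputs (3700 and 1500 logins per minute, 100 client pods, 6 CAS instances) come from this comment; the per-pod and per-instance breakdown assumes the load was spread evenly, which is an assumption rather than something that was measured. Going from 3700 to 1500 logins per minute works out to a drop of roughly 59%, somewhat more than the 50% quoted in the PR description.

```java
/**
 * Back-of-the-envelope arithmetic on the load-test figures quoted in the PR
 * discussion (3700 vs 1500 logins/minute, 100 client pods, 6 CAS instances).
 */
public class LoadTestMath {

    /** Percentage drop when throughput falls from baseline to degraded. */
    public static double reductionPercent(double baseline, double degraded) {
        return (baseline - degraded) / baseline * 100.0;
    }

    /** Converts a per-minute rate into a per-second rate. */
    public static double perSecond(double perMinute) {
        return perMinute / 60.0;
    }

    /** Average seconds between logins for one client, assuming each gets an even share. */
    public static double secondsPerLogin(double totalPerMinute, int clients) {
        return 60.0 / (totalPerMinute / clients);
    }

    public static void main(String[] args) {
        double withoutIndex = 3700; // logins/minute, text index removed
        double withIndex = 1500;    // logins/minute, text index present
        int pods = 100;             // Puppeteer client pods
        int casInstances = 6;       // CAS instances sharing one Mongo primary

        System.out.printf("without index: %.1f logins/s%n", perSecond(withoutIndex));
        System.out.printf("with index:    %.1f logins/s%n", perSecond(withIndex));
        System.out.printf("throughput drop: %.1f%%%n", reductionPercent(withoutIndex, withIndex));
        System.out.printf("per pod: one login every %.1f s vs %.1f s%n",
                secondsPerLogin(withoutIndex, pods), secondsPerLogin(withIndex, pods));
        // Even-distribution assumption: total rate divided equally across CAS instances.
        System.out.printf("per CAS instance: %.1f vs %.1f logins/s%n",
                perSecond(withoutIndex / casInstances), perSecond(withIndex / casInstances));
    }
}
```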

We had not detected this as a problem using the JMeter tests packaged with CAS, perhaps because a single workstation running the test didn't have the CPU power to drive a heavy enough load.
