XTDB Query Level Optimizations
The bad
While digging into database optimization, I recently grappled with a colossal dynamic database, built on XTDB and used as a rule management system. The software leverages XTDB’s bitemporal capabilities and Datalog engine, creating a temporal playground where data validity dances to the tune of specific timespans, which makes it a powerful dynamic data generator.
However, our system hit a snag when users subtly shifted their condition usage patterns.
This seemingly innocuous change triggered a document size explosion in our compute system, turning our once-nimble queries into lumbering behemoths. The result? A software system teetering on the brink of unusability, with query execution times stretching into eternity.
The approach
The naive first approach was to cache redundant queries that share the same parameters. However, this doesn’t fit a bitemporal database, where data can change from one query time to the next. So the query engine had to be broken down into pieces and dissected: my approach was to find the building blocks whose query input stays the most constant, and to put a thin layer of memoization on top of them.
Assuming:
(xt/q db '{:find [(count ?doc)]
           :where [[?doc :document/type :foo]]}) ;; => 380291
This call, made for example on every iteration of the engine, can be replaced by:
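;; Assumed aliases: xt = xtdb.api (XTDB 1.x), log = clojure.tools.logging
(require '[clojure.core.memoize :as memo]
         '[clojure.tools.logging :as log]
         '[xtdb.api :as xt])
;; args-fn rest: the cache key is built from the query only, not from db
(defn- ^{:clojure.core.memoize/args-fn rest} xt-q* [db query]
  (log/trace "computing xtdb memoized query" query)
  (xt/q db query))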
(defn mk-memoized-xtdb-query
  "Creates a memoized version of an XTDB query function,
  ignoring the db parameter."
  {:added "1.23.0"}
  [ttl-ms]
  (memo/ttl #'xt-q* :ttl/threshold ttl-ms))
;; Use it from your state/lifecycle manager, e.g. Integrant
(defonce generic-ttl 60000)
(defonce xt-q (mk-memoized-xtdb-query generic-ttl))
(time (xt-q db '{:find [(count ?doc)]
                 :where [[?doc :document/type :foo]]}))
;; "Elapsed time: 29026.990211 msecs"
(time (xt-q db '{:find [(count ?doc)]
                 :where [[?doc :document/type :foo]]}))
;; "Elapsed time: 0.104053 msecs"
The trick is to drop the first argument of the XTDB query function, so that db, your database snapshot, plays no part in the memoization key. Since good practice is to take a fresh database snapshot for each engine query, including it in the key would create a new cache entry on every call, and the cache would never be hit.
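To make the effect of ignoring the first argument concrete, here is a minimal sketch that runs without XTDB. The names count-by-type, cached-count-by-type, snapshot-1 and snapshot-2 are hypothetical stand-ins of my own (a plain vector plays the role of the db snapshot), and it assumes clojure.core.memoize 0.7.0 or later is on the classpath.

```clojure
(require '[clojure.core.memoize :as memo])

;; Counts real invocations, so cache hits become observable.
(def calls (atom 0))

;; Hypothetical stand-in for xt/q: `snapshot` plays the role of `db`.
;; The args-fn metadata makes core.memoize key the cache on (rest args),
;; i.e. on doc-type only.
(defn ^{:clojure.core.memoize/args-fn rest} count-by-type
  [snapshot doc-type]
  (swap! calls inc)
  (count (filter #(= doc-type (:document/type %)) snapshot)))

;; Pass the var (#'...) so the metadata attached by defn is visible.
(def cached-count-by-type
  (memo/ttl #'count-by-type :ttl/threshold 60000))

(def snapshot-1 [{:document/type :foo}])
(def snapshot-2 [{:document/type :foo} {:document/type :foo}])

;; First call computes the result.
(def first-result (cached-count-by-type snapshot-1 :foo))   ; 1

;; Different snapshot, same query: served from the cache, so still 1.
(def stale-result (cached-count-by-type snapshot-2 :foo))   ; 1
(def calls-before-clear @calls)                             ; 1

;; Clearing the cache forces a recomputation on the new snapshot.
(memo/memo-clear! cached-count-by-type)
(def fresh-result (cached-count-by-type snapshot-2 :foo))   ; 2
```

The second call shows the trade-off: within the TTL window a cached answer can lag behind the newest snapshot, so the TTL should be no longer than the staleness the engine can tolerate. When an immediate refresh is needed, memo/memo-clear! empties the cache.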
Note: be sure to use a recent version of clojure.core.memoize in your project's dependencies. Support for the args-fn annotation is only available from version 0.7.0 onwards.