XTDB Query Level Optimizations

The bad

Diving into the depths of database optimization, I recently grappled with a colossal, dynamic XTDB database used as a rule management system. The software leverages XTDB’s bitemporal capabilities and Datalog engine: every piece of data is valid only for a specific timespan, which turns the database into a powerful dynamic data generator.

However, our system hit a snag when users subtly shifted their condition usage patterns.

This seemingly innocuous change triggered a document size explosion in our compute system, turning our once-nimble queries into lumbering behemoths. The result? A software system teetering on the brink of unusability, with query execution times stretching into eternity.

The approach

Naively, the first idea was to cache redundant queries that share the same parameters. However, this doesn’t fit a bitemporal database, where the data a query sees can change depending on when it is run. So the query engine had to be broken down into pieces and dissected: my approach was to find the building blocks whose input queries stay the most constant, and to put a thin layer of memoization on them.

Assuming:

(xt/q db '{:find [(count ?doc)]
           :where [[?doc :document/type :foo]]}) ;; => #{[380291]}

This call, made for example on each iteration of the engine, can be replaced by:

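(require '[clojure.core.memoize :as memo]
         '[clojure.tools.logging :as log] ; assumed logging library behind `log`
         '[xtdb.api :as xt])              ; assumed XTDB 1.x API namespace behind `xt`

;; The args-fn metadata makes core.memoize build the cache key from
;; (rest args), i.e. from the query alone, without the db snapshot.
(defn- ^{:clojure.core.memoize/args-fn rest} xt-q* [db query]
  (log/trace "computing xtdb memoized query" query)
  (xt/q db query))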

(defn mk-memoized-xtdb-query
  "Creates a memoized version of an XTDB query function,
  ignoring the db parameter."
  {:added "1.23.0"}
  [ttl-ms]
  (memo/ttl #'xt-q* :ttl/threshold ttl-ms))
  
;; Use in the state/lifecycle manager, like integrant
(defonce generic-ttl 60000)
(defonce xt-q (mk-memoized-xtdb-query generic-ttl))
  
(time (xt-q db '{:find [(count ?doc)]
                 :where [[?doc :document/type :foo]]}))
;; "Elapsed time: 29026.990211 msecs"

(time (xt-q db '{:find [(count ?doc)]
                 :where [[?doc :document/type :foo]]}))
;; "Elapsed time: 0.104053 msecs"

The trick here consists of discarding the first parameter of the XTDB query function, so that db, your database snapshot, is not taken into account in the memoization key. Since good practice is to take a fresh database snapshot for each engine query, including it in the key would create a new cache entry every time and the cache would never be hit. The trade-off is that a cached result can be up to one TTL old (60 seconds here), so this only suits building blocks whose results change slowly.
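To see the cache-key behaviour without a running XTDB node, here is a minimal, dependency-free sketch. It is a hand-rolled stand-in for illustration, not how clojure.core.memoize is implemented: ttl-memoize-ignoring-first, fake-q and the {:snapshot n} maps are hypothetical names that replace memo/ttl, xt/q and real database snapshots.

```clojure
(defn ttl-memoize-ignoring-first
  "Hypothetical helper: memoizes f for ttl-ms milliseconds, keyed on every
  argument except the first one."
  [f ttl-ms]
  (let [cache (atom {})]                      ; key -> {:value v, :at millis}
    (fn [& args]
      (let [k     (rest args)                 ; drop the db snapshot from the key
            now   (System/currentTimeMillis)
            entry (get @cache k)]
        (if (and entry (< (- now (:at entry)) ttl-ms))
          (:value entry)                      ; fresh entry: reuse it
          (let [v (apply f args)]             ; missing or expired: recompute
            (swap! cache assoc k {:value v :at now})
            v))))))

;; Counts how many times the underlying "query" really runs.
(def calls (atom 0))

(defn fake-q
  "Stand-in for xt/q: records the invocation and returns a fixed result."
  [_db _query]
  (swap! calls inc)
  #{[380291]})

(def cached-q (ttl-memoize-ignoring-first fake-q 60000))

(def query '{:find [(count ?doc)]
             :where [[?doc :document/type :foo]]})

(cached-q {:snapshot 1} query) ; first call with this query: computed
(cached-q {:snapshot 2} query) ; different snapshot, same query: cache hit
@calls ;; => 1
```

Unlike core.memoize, this sketch does not guard against two threads computing the same missing entry at once, which is one more reason to prefer the library in real code.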

Note: make sure your project depends on a recent version of clojure.core.memoize. The args-fn metadata is only supported from version 0.7.0 onwards (see CMEMOIZE-20, linked below).

  1. https://clojure.atlassian.net/browse/CMEMOIZE-20