Architecture guide

Relationship to Datasette

datasette-ui-extras is a third-party Datasette plugin. In many cases, it's possible to extend Datasette using its documented and supported plugin hooks.

In some cases—mainly around the UI—extension via hooks is not yet supported. For these cases, we act more like a parasite than a plugin, wildly monkey-patching Datasette and fundamentally changing what it can do.

Some parts of datasette-ui-extras might migrate into core Datasette over time, or at least be possible via officially-sanctioned plugin hooks.

The implications:

Try to add features in a modular way. Limit their interaction with other features, so their blast radius is small if they stop working due to a breaking change in Datasette. This also makes it easier to upstream the feature into Datasette core.
Try to follow Datasette's practices:
- State should be represented in URLs, not cookies. For example, we use the _dux_show_filters query parameters to toggle visibility of filters on the table page.
- Choose reasonable defaults, but always consider having a hook for the user to customize behaviour.

JavaScript

One area where we differ from Datasette is our willingness to use JavaScript.

In particular, all editing controls are currently implemented via JavaScript and the Write API.

How we communicate data to JavaScript

Most of our UI changes are achieved by self-contained JavaScript modules that run on the page. These modules sometimes need configuration information from the Python code. This information is passed via global variables:

__dux_pks is a list of the primary keys of the table, or ["rowid"] if no explicit primary key is set. This is set on the row page.
__dux_permissions is a map containing the set of permissions that the current user has. This is set on database, table and row pages.
__dux_facet_suggestions is a map suggesting candidate facets for each column. This is set on the table page.
__dux_facets is a map listing which facets are enabled. This is set on the table page.

JavaScript and CSS extensions

A given feature like lazy-load facets or edit rows may have CSS and JavaScript components.

The CSS and JavaScript should be placed in the /static/ folder and registered in the module's __init__.py.

Follow the existing examples to ensure that it's safe for your CSS and JavaScript to be included on any page. For example, you should use selectors that target body.edit-row if your code is only intended for the edit row screen.

The `dux_column_stats` index tables

Many of our features require knowing the shape of data contained in a table, or the set of eligible values to be used in a given column.

This is achieved by an indexing process that runs in the background. It maintains state in each database in a handful of hidden tables.

`dux_ids`

dux_ids assigns a unique ID to every table name and column name in your schema. This allows us to save storage space, as many of the other tables repeat these names.

Browse an example dux_ids table here.

`dux_column_stats_ops`

dux_column_stats_ops tracks the queue of indexing work to be done.

Indexing happens in batches, on a per-column, per-table database. This allows us to make progress while not locking up the Datasette instance for other users.

We use keyset pagination to remember our location in the queue and efficiently fetch the next set of items to index.

Browse an example dux_column_stats_ops table here.

`dux_column_stats`

dux_column_stats tracks high-level summary stats about the values in a column: minimum, maximum, how many were NULL, how many were JSON objects, etc.

This lets the plugin quickly propose what sort of edit control might be appropriate for a given column.

Browse an example dux_column_stats table here.

`dux_column_stats_values`

dux_column_stats_values tracks all the individual values in columns containing string-like values.

This lets us offer per-column autosuggest.

Browse an example dux_column_stats_values table here.

`dux_pending_rows`

dux_pending_rows tracks rows that have been updated and need their columns' summary stats updated.

Once we've finished the initial backfill of stats, we try to keep stats mostly up-to-date. To achieve this, every table gets an INSERT, UPDATE and DELETE trigger. The trigger creates a row in dux_pending_rows with a JSON object that represents the old and new values of the modified row.

The background indexing thread consumes these rows and updates the stats columns.

There are some caveats: we only do updates that can be completed in fixed time. For example, we can update counts, and expand the min and max values when a new value is smaller or larger than older values.

However, deleting a row that previously contained the min or max value cannot be efficiently handled in fixed time. To know what the new min or max value should be, we'd need to scan all the rows of the table. Instead, we leave the old value in place. In practice, for the purposes for which we use stats, and the usage patterns of most databases, this works fine.

Browse an example dux_pending_rows table here.

Relationship to Datasette

How we communicate data to JavaScript

JavaScript and CSS extensions

The dux_column_stats index tables

dux_ids

dux_column_stats_ops

dux_column_stats