The proposal for Squidsway funds two things:
- The Squidsway tool: a chain indexer with rich data ingestion modules, for testing and quickly iterating hypotheses, and for generating actionable insights about user behaviour.
- Governance insight reports, published roughly every 3 months (and on shorter timescales case-by-case).
Modules will be continually added to the tool to support investigations and to generate insights.
The tool will be open source, for any dev (eg, ecosystem product teams) to use, and future work includes an LLM-based frontend for non-devs to query it.
The project will be funded by the community on an ongoing basis, so will be focused on live, open questions that the community is discussing at any given time. There will be a mechanism for the community to request data on issues of interest.
This proposal funds only the first three months. If the community likes what it sees, then subsequent proposals will fund ongoing work.
GOVERNANCE FAILURES ARE A TREASURY ISSUE.
SQUIDSWAY WILL SOLVE THOSE FAILURES FASTER.
I want to improve Polkadot governance because I'm a cypherpunk and I think Polkadot can lead the world, not in just governance of blockchains, but in blockchain-based governance of the offchain world.
Governance is a product on Polkadot. It's a field we are leading in, and we should invest in growing that lead - make it something to showcase.
But you, dear tokenholder, should fund improving Polkadot governance because right now we iterate OpenGov based on vibes.
That wastes time and costs money.
The alternative to iterating based on vibes is data.
Squidsway is a proposal to collect and compile specific bespoke data, targeted at objectively assessing how OpenGov users respond to everything we do in OpenGov - and to generate insights from these assessments, in order to inform how we continue to iterate OpenGov.
Sprints and Milestones; Proposals and Funding
Deliverables
This first proposal is for $8k USDC, to fund 80 (= 40 + 40) hours over around 3 months:
the development of an MVP, followed by the first half of the validation phase.
At the end of the work funded by this proposal, the tool should consist of:
- modules to:
  - ingest relevant governance events from chain data
  - ingest structured/quantitative offchain data (e.g. from Polkassembly)
  - curate data (using queries to assign tags, e.g. "whale", "shrimp")
- an indexer capable of reindexing based on these types of data.
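To make the curation module concrete, here is a minimal sketch of what tag assignment could look like. Everything here is illustrative: the `BalanceSnapshot` type, the `sizeTag`/`curate` names and the DOT thresholds are my placeholders, not the tool's actual schema or a proposed definition of "whale".

```typescript
type BalanceSnapshot = { account: string; block: number; free: bigint };

// Assign a size tag based on free balance in planck (1 DOT = 10^10 planck).
// Thresholds are illustrative placeholders, not a proposed definition.
function sizeTag(free: bigint): "whale" | "dolphin" | "shrimp" {
  if (free >= 1_000_000n * 10n ** 10n) return "whale";
  if (free >= 10_000n * 10n ** 10n) return "dolphin";
  return "shrimp";
}

// Curate: map balance snapshots to tagged rows ready for reindexing.
function curate(snapshots: BalanceSnapshot[]) {
  return snapshots.map(s => ({ account: s.account, block: s.block, tag: sizeTag(s.free) }));
}
```

Because balances change over time, the real module would run per-block (or per-sample-point), which is exactly why curated tags need their own indexing pass.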
At the end of the work funded by this proposal, I expect the outputs I report to be just sample outputs demonstrating that the tool is functioning - concrete, but probably boring and uncontentious, observations.
Don't worry - the plan is for the insights to become more insightful over time, as the tool grows able to ingest and compile more awkwardly structured data!
The second proposal would fund the second half of the validation phase.
By the end of that work, I intend that the tool will be ingesting qualitative (natural language) data and outputs would begin to demonstrate what is possible with the tool. I should also have some basic benchmarking to flag up any feasibility questions and potential non-labour costs for the future.
At the end of each funded period, I will report the hours spent on each sprint or other labour.
Overspends in each funding period will be added on to the next proposal for retrospective funding.
Underspends will be subtracted from the next proposal or, in the case of the project winding down (i.e. if a referendum fails), returned to treasury.
I am proposing to work via sprints, each being 20-80 hours, at $100/hr.
I am proposing, initially, to submit individual treasury referenda, each funding upfront around 2 months of work (40-160 hours), with their own proposals written as updates to this original proposal.
When the work and delivered outputs settle into a more steady rhythm (i.e. timing, expectations and amount to request become predictable), I plan to switch to the Treasury Guardian model (scheduled funding).
After about a year, the need to code modules to ingest new data sources should have reduced significantly, leaving the compilation of data (ie reindexing and querying) as the largest labour cost (which would also reduce if the LLM frontend becomes popular).
I would hope that, a year after the validation phase, multiple people in the community will be proficient in using the tool, so that compiling the governance report would be less about the project generating insights, and more like curating insights generated by the community using the tool.
The methodology is intended to be very, very agile.
The idea of generating insights is to tell us something we didn't know, rather than setting out to prove or disprove a pre-defined set of hypotheses.
Central to that is the ability to, in investigative terms, 'pull on threads' - or, in software terms, to 'rapidly iterate'. This means that, for each sprint and each proposal, the treasury will be funding work whose exact content is not known in advance.
This agile way of working is necessary because:
- 1 - We need to go where the evidence takes us
- 2 - It's likely that many of the small technical steps that make up a milestone can only be identified once a previous step is complete, so identifying and costing out these steps in advance would either lead to wasted labour or force investigations down an inflexible path.
The fact that, in the base case, a Squidsway funding referendum asks the treasury to fund something unknown is mitigated by the ongoing nature of the project, and by the fact that each 'milestone' (ie funding period) is a small amount.
Any Questions?
Two parts: insights and tool
The tool is a backend, not a frontend
I can haz dashboard?
How is it different from, say, Dune Analytics?
What do we get from these governance insights?
What kind of 'user behaviour' are we trying to encourage?
What are these 'iterations' of OpenGov?
Can you investigate <insert issue> ?
WTF is 'rich data' / 'chain indexer'?
Contact
Two parts: insights and tool
The proposal covers building and deploying the tool, and generating insights (the 'project') to be published.
The tool being open source and modular means others can generate insights too.
Most OpenGov users will benefit only from the Squidsway project - regular insight reports to be published, and the improved community-wide decision-making these are intended to bring about.
Insights will, when relevant, focus specifically on live questions of interest regarding OpenGov, governance, incentive alignment in voter behaviour, and similar questions which are the focus of stakeholder (eg W3F, DAOs) consideration, or are simply live issues of interest to the community.
(Funds disbursement is relevant only insofar as it affects these kinds of questions, but is otherwise out of scope - insights on where funds go would be more relevant to the Treasury Report.)
However, anybody with basic JS skills can run a Squidsway tool instance to generate their own insights in their area of focus.
As the Squidsway project will require a maximally flexible, rapid iteration reindexer, this is what the Squidsway tool will be.
It will be open source, and designed to make it easy to insert quick, user-built modules.
I can haz dashboard?
No, but there'll be an LLM-based query engine later.
But why not?
The tool is a backend, not a frontend
The strength of the Squidsway project is insight, not just monitoring. That strength is based on the Squidsway tool being able to rapidly (as in 'rapid prototyping') integrate the widest range of data sources, and answer a wide range of questions. The tool’s inputs and outputs will not remain static, but will iterate quickly.
Sure, everybody loves a dashboard - but a dashboard is not the appropriate way to interact with data structures which change frequently. So the proposal is (for now) for a backend tool, with the data accessible through database tools such as GraphQL Explorer.
Once the validation phase is complete and the tool is producing valuable insights, I'll add an LLM instance (with a UI) capable of generating queries based on user requirements, so that non-dev users can also use the tool.
How is it different from, say, Dune Analytics?
Analytics dashboards provide monitoring of the metrics that we already know can be useful (though these useful metrics are usually mere proxies to use in place of what we really want to know but lack the tools to measure).
Dune is not able to query the kinds of contextual data that Squidsway modules can - offchain, curated or compiled data.
compiled data:
DuneSQL is able to generate what I am calling 'compiled data', ie the results of queries combining datasets.
But it is not able to then use these results as context for further queries. You would need to combine all your queries into a single graph-like query and hope that it doesn't bust Dune's compute limits.
Example: ___
curated data:
Dune also cannot use 'curated' data at index time.
The nearest it could get is, if you have defined the curated data as subsets, to start from those subset/s, and then apply a query to them.
Example: the definition of "whale" or "shrimp" over the life of an account would be curated data (ie requires a separate indexing, since balances change over time).
Dune would need to start from this curated data and then perform a query on each record to generate compiled data. Squidsway would be able to compile data using multiple rich data sources (rather than the one which Dune needed to start with). It would also be able to reindex based on this new compiled data.
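A minimal sketch of what "starting from curated data and compiling across multiple sources" could look like. The types (`Tagged`, `Vote`, `ForumPost`) and field names are illustrative stand-ins, not the tool's actual schema; the point is that curated tags, onchain votes and offchain posts are joined in one pass, producing compiled rows that could themselves be reindexed.

```typescript
// Illustrative types only - not the tool's actual schema.
type Tagged = { account: string; tag: string };                     // curated
type Vote = { account: string; referendum: number; aye: boolean };  // onchain
type ForumPost = { account: string; referendum: number };           // offchain

// Compile whale voting activity against a second source (forum posts),
// joining two rich data sources in one pass; the output is new compiled
// data that the indexer could reindex against.
function compileWhaleActivity(tags: Tagged[], votes: Vote[], posts: ForumPost[]) {
  const whales = new Set(tags.filter(t => t.tag === "whale").map(t => t.account));
  return votes
    .filter(v => whales.has(v.account))
    .map(v => ({
      ...v,
      discussed: posts.some(p => p.account === v.account && p.referendum === v.referendum),
    }));
}
```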
offchain data:
Dune does not ingest bespoke offchain data - its only offchain data sources are those that Dune have already decided to index for general use.
Bespoke offchain data will be a major use case of the Squidsway tool.
What do we get from these governance insights?
The aim at all times - whether in the field of governance, UX, trading or some other user behaviour - is to create actionable insights that can explain mass trends in user behaviour from the perspective of the 'individual', modellable user, so that we can more effectively encourage the outcomes we want.
This is something which is simply not possible with the level of detail (or depth of reindexing) available in a tool like Dune Analytics.
The examples in this doc are just for illustration, they might not be of burning importance to OpenGov users.
But we are currently spending a lot of time discussing questions like the effectiveness of Decentralised Voices, the procedural quality of proposals, and the effectiveness of purported KPIs for large treasury spends - let alone the toxicity we have been through relating to big-ticket spends like the various marketing spends and bounties, and allegations of corrupt collusion and improper conflicts of interest.
Objective data on the actual questions at stake can get us a long way towards solving each of these problems.
Using Dune Analytics to model user behaviour may be like trying to hit a target blindfolded, but OpenGov sometimes feels like a mass brawl in a group of people who are all wearing VR headsets, each showing a different reality.
Folks, we need all the objective data we can get.
What kind of user behaviour are we trying to encourage?
Defining and encouraging the desired outcomes is a question for OpenGov or for the teams making use of Squidsway.
Squidsway is not the part which incentivises or encourages user behaviours - it's the part which identifies where the opportunities are to do that.
'user behaviour':
On a technical level, though, the outputs of the Squidsway tool will be the same class of measurables as its inputs - therefore, not limited to onchain actions.
'encourage':
Already, we seek to change user behaviours all the time - incentivising adoption and liquidity, using social norms to encourage delegation and voting, working on UX to reduce friction and using (some pretty blunt) technical instruments to encourage the adoption of procedures and norms for proposer and delegate behaviours.
The mechanisms we are using - game theory, finely targeted incentivisation and the like - are powerful but we are often applying them amateurishly, iterating our processes and mechanisms based on just guessing what works.
The idea behind Squidsway is that we encourage these kinds of user behaviours by more empirical (ie more reliable) means.
And more specifically?
It would be possible on a technical level to use offchain data for measurable outputs. A simple example would be SEO scores; a more complex one would combine Twitter data and semantic analysis to measure outcomes like "proposals that started a Twitter flame war", so the 'user behaviour' would be "not starting flame wars". (Note: it's important for the Squidsway project to stay credible and objective, so it would not work towards subjective data like that as insights - but it would be technically possible with the tool, and anyone is free to run their own instance and do so.)
More likely and less contentiously, 'user behaviour' could be defined in terms of compiled data - so, for example, to flag outliers in the level of correlation between whale wallets' votes and those of DVs, or to establish whether power users (eg, wallets that use many Polkadot features) are treasury conservative, or tend to vote for pro-marketing or pro-development treasury proposals.
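To make the whale-vs-DV correlation example concrete, here is a hedged sketch of one way such a comparison could be computed. The `VoteRecord` type and `agreementRate` function are my illustrative inventions, not part of the tool; a real implementation would run as a query over the indexed database rather than over in-memory arrays.

```typescript
type VoteRecord = { account: string; referendum: number; aye: boolean };

// Agreement rate between one wallet and a DV across referenda both voted on.
// Wallets tagged "whale" with unusually high or low rates would be the
// outliers the insight report flags up.
function agreementRate(wallet: VoteRecord[], dv: VoteRecord[]): number {
  const dvByRef = new Map(dv.map(v => [v.referendum, v.aye]));
  const shared = wallet.filter(v => dvByRef.has(v.referendum));
  if (shared.length === 0) return NaN;
  const agree = shared.filter(v => dvByRef.get(v.referendum) === v.aye).length;
  return agree / shared.length;
}
```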
As I envision it now, the Squidsway project is focused towards governance-related questions like these. Product teams will no doubt wish to run their own Squidsway instances to measure success on their own adoption metrics. And it may be that the community indicates that the Squidsway project should orient towards some other measurable area than governance, such as feature adoption or defi behaviours.
Can you investigate <insert issue> ?
Short answer: Yes (probably).
The Squidsway project will be reliant, to a degree, on community members feeding in to help set priorities for what to investigate.
This doesn't mean that every request will be technically suitable for the tool, especially in the earlier phases of development. Many applications of the tool would more properly be done by product teams running their own instance. And after a few sprints, I will develop a policy against bad-faith directions and anything that compromises the credible neutrality of the project.
WTF is 'rich data' / 'chain indexer'?
A chain indexer is a tool that indexes and stores, in greater or lesser detail, a blockchain’s data. Most relevant data in a blockchain (even data as basic as account balances) is not accessible unless you consult a node or an indexer. The RPCs under the hood of polkadot.js or your wallet software connect to full nodes, but data applications - like most block explorers and data dashboards - use chain indexers on their backend.
Applications that process blockchain data usually index and store the information which is easiest to obtain, and when they want to combine different data sources (such as comparing the voting frequency of wallets against those wallets' balances), they combine already-indexed datasets. This is faster, but limits the complexity of the combination.
More complex data applications such as Chainalysis's perform some degree of multi-step indexing, allowing them to retrieve additional data during index time, so they can treat their datasets as graph data (meaning the indexer can follow trails at index time).
The Squidsway tool takes this a couple of steps further with what I'll call 'compiled' and 'curated' data.
compiled data is just data that has been indexed and combined through multi-step context-aware indexing. It could be, for example, "average conviction" for each account (across the accounts' lifetimes), or "voted on with higher/ lower/ usual conviction" for each proposal.
curated data uses tags for fast reindexing of commonly used conclusions - for example, accounts could be tagged with categories from "whale" to "shrimp" despite the fact that account balances change over time.
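The "average conviction" example of compiled data can be sketched in a few lines. The `ConvictionVote` type and `averageConviction` function are illustrative only; the real computation would happen during a reindexing pass over the tool's own database, not over an in-memory array.

```typescript
type ConvictionVote = { account: string; conviction: number }; // 0..6, as onchain

// Compiled data: average conviction per account over its whole voting history.
// The result can then be used as context in further queries or reindexes.
function averageConviction(votes: ConvictionVote[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const v of votes) {
    const s = sums.get(v.account) ?? { total: 0, n: 0 };
    s.total += v.conviction;
    s.n += 1;
    sums.set(v.account, s);
  }
  return new Map([...sums].map(([a, s]) => [a, s.total / s.n]));
}
```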
In addition to these, the third and most powerful kind of rich data the Squidsway tool will index is
offchain data
Since the tool will reindex multiple times, there is less need for its data sources to be fast.
This opens up the possibility to make use of (API-based) web data and, at a higher processing cost, scraped web data and LLM outputs.
For example, the tool will ingest discussions from Polkassembly, Subsquare and the Polkadot Forum, and process the natural language in those discussions to generate tags for sentiment, contentiousness, compliance with each norm, etc.
I hope that this particular feature will help proposers avoid creating proposals that fail for predictable reasons, and create a healthier environment in online governance discussions in general.
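To show the shape of such a tagging module's output, here is a deliberately trivial sketch. The keyword heuristic is purely a stand-in for the planned LLM processing (a real module would call an LLM, not match keywords), and the `Discussion` type, threshold and word list are all my illustrative inventions.

```typescript
type Discussion = { referendum: number; comments: string[] };

// Stand-in for LLM-based processing: a trivial keyword heuristic, purely
// illustrative of the output tag shape - not how the tool would judge tone.
function contentiousnessTag(d: Discussion): "calm" | "contentious" {
  const heat = ["scam", "corrupt", "shill", "rug"];
  const hits = d.comments.filter(c =>
    heat.some(w => c.toLowerCase().includes(w))
  ).length;
  // Tag as contentious if more than 20% of comments contain heated language.
  return hits / Math.max(d.comments.length, 1) > 0.2 ? "contentious" : "calm";
}
```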
How does the tool work technically?
The Subsquid/SQD SDK uses Extract-Transform-Load (ETL) pipelining to populate a database from a structured data store (archive node).
The skeleton of the Squidsway tool will extend this in two ways:
To add reindexing - so that the indexing process can be set to stop at some block and restart using external data sources and/ or the contents of its own database for the 'Extract' stage.
To add alternatives to Subsquid's 'Extract - Transform' logic:
'Extract - Cache - Transform' and 'Extract - Transform - Cache - Transform', for dealing with slower (ie external) data sources. This is to retain a pipeline-like code structure while dealing with data sources that are both error-prone and many times slower. This will require durable storage, which is likely to be file-based.
These two things will be structured so that the syntax is similar to that of Subsquid's BatchProcessors.
The tool's modules will have a standardised interface for each of ETL, ECTL and ETCTL so that they can be written by users with a short learning curve.
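A possible shape for that standardised interface, sketched under assumptions: the names (`EtlModule`, `EctlModule`, `extract`, `cache`, etc.) are illustrative, not the Subsquid SDK's or the tool's actual types, and a real module would stream batches rather than return whole arrays.

```typescript
// Hypothetical ETL module interface - names are illustrative only.
interface EtlModule<Raw, Row> {
  extract(fromBlock: number, toBlock: number): Promise<Raw[]>;
  transform(raw: Raw): Row;
  load(rows: Row[]): Promise<void>;
}

// ECTL adds a durable cache between extract and transform, so slow,
// error-prone external sources are fetched once and replayed on reindex.
interface EctlModule<Raw, Row> extends EtlModule<Raw, Row> {
  cache(raw: Raw[]): Promise<void>;  // durable (likely file-based) storage
  readCache(): Promise<Raw[]>;
}

// Minimal in-memory example of the ETL shape.
const loaded: string[] = [];
const demo: EtlModule<number, string> = {
  async extract(fromBlock, toBlock) { return [fromBlock, toBlock]; },
  transform(n) { return `block-${n}`; },
  async load(rows) { loaded.push(...rows); },
};
```

The idea is that a user writing a new integration only implements these few methods, and the skeleton handles batching, reindex stop/restart points and retries.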
Modules will be the integrations with, usually, external sources like APIs, Selenium-based scraping of standard webpages, RAG-LLM scraping of other online sources, and LLM processing of scraped sources.
Additionally, common or complex queries of the tool's own database will be modules.
Contact
squidsway@daouse.com , tg or, probably better: the comments in the most recent proposal or forum discussion.