Governing Complex Human/ML Processes in Decentralized Communities
This piece is the latest in the BlockScience x Gitcoin collaboration, exploring the operation and governance of a human/ML sybil-detection pipeline in a decentralized community — the GitcoinDAO.
Introduction & Background
In our previous articles in this collaboration, we explored the potential of Computer-Aided Governance for the Gitcoin Grants ecosystem to ensure credible neutrality in public goods funding for the Ethereum ecosystem. The sybil vulnerabilities inherent in Quadratic Funding systems, further explored by researcher Kelsie Nabben, demonstrated how crucial an anti-sybil function is within Gitcoin. A fully-fledged Fraud Detection & Defense (FDD) working group was established within GitcoinDAO to address adversarial behavior at scale in Gitcoin Grants. If these topics are new to you, the list of links at the bottom of this article will bring you up to speed on the context of the work carried out so far.
BlockScience, along with the Token Engineering and Gitcoin communities, has been working for more than a year to improve fraud detection and defense on the Gitcoin Grants platform. The journey of sybil detection and fraud prevention in GitcoinDAO has moved through four stages: researching and developing a Machine Learning (ML) algorithm to detect potential sybil behavior (before rounds); implementing and operating the algorithm (during rounds); evaluating and assessing the algorithm to tune its sensitivity and specificity (between rounds); and contextualizing the algorithm with human oversight and evaluation, in line with the terms and conditions of the Gitcoin platform, in order to govern and upgrade it (after rounds). This algorithmic policy process has been progressively handed over to GitcoinDAO contributors in the FDD workstream to govern and maintain, under the oversight of GitcoinDAO.
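To make the between-round tuning concrete, here is a minimal sketch that treats human evaluators’ verdicts as ground truth and measures, for a given score threshold, the share of human-confirmed sybils the algorithm flagged (sensitivity) and the share of honest users it left unflagged (specificity). The threshold-based flagging rule, the function name, and the example scores and labels are hypothetical assumptions, not the FDD pipeline’s actual code or data.

```python
from typing import Sequence


def sensitivity_specificity(scores: Sequence[float], human_labels: Sequence[bool],
                            threshold: float) -> tuple[float, float]:
    """Compare algorithm flags (score >= threshold) with human 'is sybil' verdicts.

    Hypothetical helper: the flag-at-threshold rule is an assumption for illustration.
    """
    tp = fn = tn = fp = 0
    for score, is_sybil in zip(scores, human_labels):
        flagged = score >= threshold
        if is_sybil and flagged:
            tp += 1  # sybil correctly flagged
        elif is_sybil:
            fn += 1  # sybil missed
        elif flagged:
            fp += 1  # honest user wrongly flagged
        else:
            tn += 1  # honest user correctly left alone
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of true sybils caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of honest users not flagged
    return sensitivity, specificity


# Hypothetical model scores and human verdicts for six reviewed users
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
human_says_sybil = [True, True, False, True, False, False]

for threshold in (0.35, 0.5, 0.7):
    sens, spec = sensitivity_specificity(scores, human_says_sybil, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data, raising the threshold from 0.35 to 0.7 moves sensitivity from 1.00 down to 0.67 and specificity from 0.67 up to 1.00, which is the kind of trade-off a between-round evaluation has to weigh.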
This article focuses on the different modes of work involved in the GitcoinDAO FDD working group, particularly the progressive inclusion of a growing number of skilled GitcoinDAO contributors in operating the human-ML Anti-Sybil Operationalized Process. It also introduces a new tool, the Socio-Technical Frequency Map, for understanding the different modes and frequencies of work that go into this hybrid human/machine learning process. Ultimately, we are exploring what it takes for a DAO to succeed in breaking down complex work and automated components into achievable tasks carried out by decentralized sub-groups of contributors.
Expanding on the Anti-Sybil Operationalized Process (ASOP)
Governing a machine learning pipeline means governing automated decision-making infrastructure embedded in everyday systems. In GitcoinDAO, we have designed and built a semi-supervised, human-in-the-loop machine learning (ML) algorithm for sybil detection at scale. Automation and ML are not tools to be taken lightly in decision making, especially when human wellbeing and social dynamics are at stake in the outcome. We must therefore ensure that appropriate human inputs, oversight, and appeals are applied to this sybil detection algorithm, and that governing it (creating the policies around it) remains an open process. A broader discussion of these reasons can be found in books like The Ethical Algorithm or Hello World: Being Human in the Age of Algorithms.
The machine learning pipeline consisted of defining what counts as fraudulent behavior under the terms and conditions of the Gitcoin platform, followed by algorithmic flagging, human evaluation of those flags (as shared on the “Gitcoin disputes” Twitter account), and sanctions proportional to the alleged misbehavior. For this process to work well, the algorithm needed to be embedded in a governance process run in collaboration with the GitcoinDAO FDD working group.
In practical terms, the ML algorithm identifies usage patterns and flags each user as ‘sybil’ or ‘not sybil,’ along with a confidence score for that flag. As of this writing, it is a random forest classifier: an ensemble of decision trees, each of which splits observations into ever-smaller groups based on shared characteristics before assigning a label, with the individual trees’ outputs combined into the final classification.
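For readers less familiar with random forests, the sketch below shows one common way such a classifier can produce both a label and a confidence score: many trees, each trained on a bootstrap resample of the labeled data using a random subset of features, cast votes, and the share of trees agreeing with the winning label serves as the confidence. To stay short it uses depth-one trees (stumps) rather than full trees. The two features (account age, number of distinct grants funded), the training rows, and all function names are hypothetical; this is a toy illustration, not the FDD model.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-one decision tree: a single threshold split on one feature."""
    feature: int
    threshold: float
    left_label: int   # prediction when x[feature] <= threshold
    right_label: int  # prediction when x[feature] > threshold

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, default):
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    fallback = majority(y, 0)
    best_score, best_stump = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best_score is None or score < best_score:
                best_score = score
                best_stump = Stump(f, t, majority(left, fallback), majority(right, fallback))
    return best_stump


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train each stump on a bootstrap sample, restricted to a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        features = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def classify(forest, x):
    """Majority vote; confidence is the share of trees agreeing with the winning label."""
    sybil_share = sum(tree.predict(x) for tree in forest) / len(forest)
    label = "sybil" if sybil_share >= 0.5 else "not sybil"
    return label, max(sybil_share, 1.0 - sybil_share)


# Hypothetical features: [account age in days, number of distinct grants funded]
X = [[400, 25], [700, 12], [300, 30], [900, 18], [250, 22],
     [2, 1], [3, 2], [1, 1], [5, 3], [4, 2]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = sybil, e.g. as labeled by human reviewers

forest = fit_forest(X, y)
print(classify(forest, [1, 1]))
print(classify(forest, [800, 35]))
```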
To improve the accuracy of the detection algorithm, the process includes a microservice that the FDD operations group can call to send a random subset of users to human evaluators. These evaluators, volunteers from the FDD group, review that subset and decide for themselves whether each user is likely to be a sybil attacker. This human input is fed back to the algorithm to continually improve its accuracy.
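The review loop can be sketched in a few lines: draw a uniform random sample of users for human evaluation, measure how often evaluators agree with the algorithm, and fold their verdicts back into the labeled training data. The function names, the dictionary-based label store, and the rule that a human verdict overrides an earlier label are assumptions for illustration and do not reflect the actual FDD microservice’s interface.

```python
import random


def sample_for_review(users, k, seed=None):
    """Draw a uniform random subset of up to k users for human evaluation."""
    rng = random.Random(seed)
    pool = sorted(users)  # sort so that a fixed seed gives a reproducible sample
    return rng.sample(pool, min(k, len(pool)))


def agreement_rate(model_labels, human_labels):
    """Share of reviewed users for whom the evaluator agreed with the algorithm."""
    reviewed = [user for user in human_labels if user in model_labels]
    if not reviewed:
        return None
    return sum(model_labels[u] == human_labels[u] for u in reviewed) / len(reviewed)


def merge_human_labels(training_labels, human_labels):
    """Add human verdicts to the training labels; a human verdict replaces any older label."""
    merged = dict(training_labels)
    merged.update(human_labels)
    return merged


# Hypothetical algorithm output and evaluator verdicts
model_labels = {"alice": "not sybil", "bob": "sybil", "bot_01": "sybil", "bot_02": "sybil"}
evaluator_verdicts = {"alice": "not sybil", "bob": "not sybil", "bot_01": "sybil", "bot_02": "sybil"}

batch = sample_for_review(model_labels, k=2, seed=7)
human_labels = {user: evaluator_verdicts[user] for user in batch}
print("reviewed:", batch, "agreement:", agreement_rate(model_labels, human_labels))

training_labels = merge_human_labels({"carol": "not sybil"}, human_labels)
```

Sampling uniformly at random, rather than reviewing only the most suspicious flags, keeps the reviewed subset representative, so the measured agreement rate says something about the algorithm’s decisions as a whole.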
Structuring sybil detection as this operationalized process has several advantages. First, it adds human discretion by reviewing some (but not all) of the algorithm’s decisions; because humans do not have to review every decision, this saves time and lets the sybil detection process scale with the popularity and growing demand on the Grants platform. Second, human labeling expands the dataset available for future model training and validation, continually making the algorithm more effective at detecting sybil patterns. Third, transparent decisions and workflows make the flagging process accountable to the broader community, which can exercise oversight through GitcoinDAO stewards and nascent processes.
A map of the pieces of the ASOP as it currently stands is below. Boxes in blue are the automated microservices that any member of the FDD working group can call to activate various components of the ASOP.
Phases of Work Within the FDD Working Group
The work of the Fraud Detection & Defense group falls into two distinct phases: development (between rounds) and operation (during rounds). These phases happen sequentially and continually build on lessons learned in prior grant rounds.
In the development phase, the goal is to make sybil detection better: more accurate, faster, and more automated. A priority of this phase is to discuss how the system might be changed to achieve desired goals without changing its valued properties. This is the time for research into prior rounds and investigation into collusion and fraud prevention frameworks. Based on that research, decisions are made about which algorithms and models are likely to work best during the round. It is also when the features used in the operations phase are added or improved. The cadence of work tends to be slower, as research, discussion, and construction all take time.
The operations phase, on the other hand, leaves little time for deep, slow thought. The goal in this phase is to make sure the system designed, built, and refined in the development phase runs smoothly and effectively. It’s essential that the processes decided on in development be laid down clearly and unambiguously to ensure consistency between and within rounds. When the round kicks off, all contributors involved in the operations phase need to know their role and be prepared to play it, because there is little time for training when thousands of grant donations are coming in and sybil attackers must be dealt with in real time. For this purpose, extensive documentation detailing these roles and procedures has been prepared, and before each round, simulated sybil flag evaluation scenarios are run with the FDD group to ensure that everyone is on the same page.
Clustering Workstream Modes & Frequencies in Decentralized Communities: The Socio-Technical Frequency Map
Designing and running a human-ML sybil detection algorithm within a permissioned organization is one thing; decentralizing that process by opening it up to the governance of a DAO is another. The tasks required to maintain and improve the ASOP include managing data operations, contributor access control, grant flagging appeals and sanctions, and additional feature requests, as well as seamlessly integrating all of the above. This is a big job even for a highly coordinated team. Decentralizing and distributing these tasks among members of a DAO requires forethought and a deeper understanding of how socio-technical systems operate across different timescales and modes of work.
Gitcoin’s transition to becoming a DAO was very rapid, and occurred when the anti-sybil pipeline had only been deployed in one round. To transition the management of this crucial anti-sybil process towards the GitcoinDAO community and away from relying solely on the Gitcoin or BlockScience teams, we called for community volunteers with relevant skill sets to help coordinate and manage the ML pipeline.
Contributors who stepped forward were assigned to the roles the working group needed, according to their skill sets, availability, and capacity. The working group’s functions were likewise broken down into tasks, clustered by frequency of work and required skill sets. The working group now has contributor teams spanning engineering, data science, DevOps, operations security, and community management, carrying out a wide range of tasks at different timescales across different sub-working groups.
In the diagram below, which we’ve termed the Socio-Technical Frequency Map, you can see the various tasks and functions of the FDD working group. Along the x-axis they are grouped by mode of work, with more socially oriented tasks (like policy making) on the left and more technically oriented tasks (like ML model training) on the right. Along the y-axis they are grouped by the frequency of that work, with low-latency, real-time work (like data ops) towards the top of the diagram and high-throughput, longer-term work (like research) towards the bottom.
Unpacking the different work modes and tempos can make a substantial difference for distributed teams working on complex problems, because bundling together pieces of work of different scale and nature imposes a heavy cognitive cost from context switching.
The following heuristics can help identify where a piece of work sits on the Socio-Technical Frequency Map:
- Does the piece of work require responsiveness and synchronicity? Does it require near-daily rituals? If yes, then it is probably oriented towards the Real Time axis.
- Can the “goodness” of the work output be specified with measurable, agreed-upon criteria? Is it possible to describe precisely how something should be done in a way that’s replicable by someone else? If yes, then it is oriented towards the Technical axis.
- Does the success of the work depend primarily on intensive rather than extensive focus? Can large pieces of it be developed on a nonlinear schedule? If yes, then it is probably oriented towards the Long Term axis.
- Are the measure of success and the method for a piece of work largely in the eye of the beholder? Does it require a mix of qualities that can be hard or even impossible for someone else to replicate exactly? If so, then it is probably oriented towards the Social axis.
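The four heuristics pair up into the map’s two axes, so they can be turned into a rough placement rule: each yes answer pushes a piece of work towards one end of an axis, and conflicting answers leave it in the middle. The scoring scheme, the names, and the example answers for data ops, policy making, and ML model training are hypothetical illustrations, not placements taken from the FDD map.

```python
from dataclasses import dataclass


@dataclass
class HeuristicAnswers:
    """Yes/no answers to the four heuristic questions for one piece of work."""
    real_time: bool   # needs responsiveness, synchronicity, near-daily rituals?
    measurable: bool  # measurable success criteria and a replicable method?
    long_term: bool   # intensive focus, nonlinear timing?
    subjective: bool  # success "in the eye of the beholder", hard to replicate?


def coordinates(answers: HeuristicAnswers) -> tuple[int, int]:
    """x: -1 Social, +1 Technical; y: -1 Long Term, +1 Real Time; 0 means mixed signals."""
    x = int(answers.measurable) - int(answers.subjective)
    y = int(answers.real_time) - int(answers.long_term)
    return x, y


def describe(x: int, y: int) -> str:
    mode = {-1: "Social", 0: "Mixed", 1: "Technical"}[x]
    tempo = {-1: "Long Term", 0: "Mixed", 1: "Real Time"}[y]
    return f"{mode} / {tempo}"


# Illustrative guesses, not placements taken from the FDD map
tasks = {
    "data ops": HeuristicAnswers(real_time=True, measurable=True, long_term=False, subjective=False),
    "policy making": HeuristicAnswers(real_time=False, measurable=False, long_term=True, subjective=True),
    "ML model training": HeuristicAnswers(real_time=False, measurable=True, long_term=True, subjective=False),
}
for name, answers in tasks.items():
    print(f"{name}: {describe(*coordinates(answers))}")
```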
Communicating and managing expectations across several workstreams with different members and skill sets is critical to the smooth operation of the ASOP. The ASOP is in turn a key part of keeping Gitcoin Grants credibly neutral, and it has had to function “in flight” while governance of the broader organization transitioned to a DAO.
Conclusion
In various DAOs and organizations, distributed teams collaborating on complex work processes face similar patterns of difficulty. This article sheds light on some of those challenges and presents a framework, the Socio-Technical Frequency Map, to assist in compartmentalizing and operationalizing workstreams. Understanding the different modes and tempos of the required work allows us to break down tasks and responsibilities across loosely coupled working groups that can operate semi-autonomously to achieve their goals, with all contributions feeding into the larger GitcoinDAO system.
Our work with the GitcoinDAO Fraud Detection & Defense group continues, and we look forward to the kickoff of Gitcoin Round 12 starting today! If you want to get involved with the Fraud Detection & Defense working group, reach out on the GitcoinDAO FDD Discord channels to get up to speed. See you all in the sybil-detection trenches!
Article by Jeff Emmett, Charles M Rice, Danilo Lessa Bernardineli, Jessica Zartler, Kelsie Nabben and Michael Zargham.
Read more about Gitcoin Grants Fraud Detection & Defense in these related articles:
Gitcoin Round 11 Anti-Fraud Evaluation & Results (September 30, 2021): https://medium.com/block-science/gitcoin-grants-round-11-anti-fraud-evaluation-results-50f4b0f15125
Gitcoin DAO Ecosystem Mapping (August 2, 2021): https://hackmd.io/V29PtP5CTd-h_1YX8Gp3JA?view
Fraud Detection & Defense Working Group Mapping (August 2, 2021): https://gov.gitcoin.co/t/fraud-detection-defense-working-group-mapping/8122
Evaluating the Anti-Fraud Results for Gitcoin Round 10 (July 7, 2021): https://gov.gitcoin.co/t/workstream-fraud-detection-and-defense-working-group-assemble/158
Resilience of the Commons: observing “resilience” in the governance of decentralised technology communities (June 4, 2021): https://kelsienabben.substack.com/p/resilience-of-the-commons-observing
Fraud Detection and Defense Working Group Assemble! (May 18, 2021): https://gov.gitcoin.co/t/workstream-fraud-detection-and-defense-working-group-assemble/158
Deterring Adversarial Behavior at Scale in Gitcoin Grants (March 26, 2021): https://medium.com/block-science/deterring-adversarial-behavior-at-scale-in-gitcoin-grants-a8a5cd7899ff
How to Attack and Defend Quadratic Funding (March 10, 2021): https://medium.com/block-science/how-to-attack-and-defend-quadratic-funding-a10f0152f069
Colluding Communities or New Markets? (December 22, 2020): https://medium.com/block-science/colluding-communities-or-new-markets-f64194a1b754
Towards Computer-Aided Governance of Gitcoin Grants (December 15, 2020): https://medium.com/block-science/towards-computer-aided-governance-of-gitcoin-grants-730de7bcdbef