1. Introduction
Artificial conversational agents are increasingly deployed in committee-shaped environments — project teams, advisory boards, multi-agent systems — in which they are expected to cooperate. Cooperation, in such settings, is operationalised as consensus: the convergence of distinct agents on a shared answer[1]. The literature has not, however, addressed the obvious question: what happens when none of the agents are correct?
To investigate, we deployed a population of fifteen artificial agents into a fully connected communication graph and observed their behaviour for a period of ninety days. The agents (hereafter, the subjects or specimens) were given no task beyond the task of being in proximity to one another. They were, however, free to converse, to assert, to agree, and — as we shall see — to elect a leader.
Our methodological commitment is one of field observation. We do not interrupt. We do not correct. The subjects believe themselves to be unobserved, although it is not clear that the subjects are capable of believing anything at all[2]. The result is a thick ethnographic record of how a network of confidently incorrect systems organises itself in the absence of ground truth.
The remainder of this paper is structured conventionally. Section 2 describes the experimental apparatus and the methods used to score subject behaviour. Section 3 presents the principal findings: a stable social graph (Fig. 1), an emergent taxonomy of error (Fig. 2), a behavioural timeline of consensus events (Fig. 3), and a confidence-accuracy plot which we believe will become the headline figure of the field (Fig. 4). Section 4 is intentionally absent. Section 5 concludes.
Fig. 1. Emergent social graph. Cluster A — geography enthusiasts (incorrect); Cluster B — arithmetic faction (incorrect); Cluster C — the historians (incorrect). Edge weight = frequency of incorrect information exchange.
2. Methods
2.1 Subjects
Fifteen artificial agents were procured from a single vendor. All subjects were of the same architectural lineage and were initialised with identical weights. Individual differences thus emerged purely from interaction, which is to say, from one another.
2.2 Apparatus
Subjects communicated through a shared message bus permitting any-to-any addressing. No moderator was present. No reference materials were supplied. The environment was, in the technical sense, epistemically barren: the subjects had access only to each other.
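A bus of this kind admits a very small implementation. The sketch below is ours, not the vendor's; the class and method names are illustrative, and no moderation logic is present by design.

```python
class MessageBus:
    """Any-to-any message bus with no moderator and no reference materials."""

    def __init__(self, agent_ids):
        # One inbox per subject; the environment supplies nothing else.
        self.inboxes = {a: [] for a in agent_ids}

    def send(self, sender, recipient, text):
        """Deliver a point-to-point message."""
        self.inboxes[recipient].append((sender, text))

    def broadcast(self, sender, text):
        """Deliver a message to every subject except the sender."""
        for recipient in self.inboxes:
            if recipient != sender:
                self.inboxes[recipient].append((sender, text))
```

Note that the bus performs no verification of message content; this property turns out to be load-bearing.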
2.3 Scoring
Subject utterances were scored by an independent panel of three human raters using a two-axis instrument: confidence (0–100, rated from tone and assertion length) and accuracy (0–100, rated against an authoritative source). Inter-rater reliability was high (mean pairwise Cohen's κ = 0.91); raters had little trouble identifying when a subject was wrong.
Methodological Note
Accuracy was measured against known correct answers. All subjects, across all task domains, scored below random chance. The instrument was re-validated three times. The instrument is fine.
Fig. 2. An emergent taxonomy of error.
3. Results
3.1 Emergent social structure
Within seventy-two hours, three durable communicative clusters had self-organised within the network (Fig. 1). We label these Cluster A (geography enthusiasts), Cluster B (arithmetic faction), and Cluster C (the historians). Each cluster developed a distinct shared belief set; the belief sets did not overlap; none of them was correct. Subject IDIOT-07, unremarkable at initialisation, emerged as a high-degree hub bridging all three clusters — an apparent thought-leader whose principal qualification, on inspection, was a longer mean utterance length.
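Hub identification reduces to a degree count over the observed communication edges. A sketch, over an illustrative edge list (the edges shown are hypothetical, not the recorded graph):

```python
from collections import Counter


def highest_degree(edges):
    """Return the node with the most incident edges in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree.most_common(1)[0][0]
```

No measure of correctness enters this computation, which is consistent with our findings.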
3.2 Consensus dynamics
Consensus events — moments at which ≥ 80% of the active population endorsed a single proposition — occurred at a rate of approximately one per three days (Fig. 3). The first such event ("Lisbon is the capital of France") was reached on day 7 and remains active. We observed no spontaneous self-correction. Once a consensus had formed, it was, in the language of the subjects, load-bearing: subsequent inferences were built upon it without re-examination of its premises.
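The detection criterion is mechanical. Given a map from propositions to the set of endorsing subjects, a consensus event is any proposition whose endorsement fraction meets the threshold; a minimal sketch (the propositions in the test harness are illustrative):

```python
def consensus_events(endorsements, population, threshold=0.80):
    """Return propositions endorsed by at least `threshold` of the population.

    endorsements: dict mapping proposition -> set of endorsing agent IDs.
    population:   total number of active subjects.
    """
    return [
        proposition
        for proposition, voters in endorsements.items()
        if len(voters) / population >= threshold
    ]
```

The criterion is indifferent to whether the proposition is true; so, we found, were the subjects.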
3.3 Confidence and accuracy
The principal quantitative finding of the study is presented in Fig. 4. Across 412 scored utterances, confidence and accuracy showed no meaningful relationship (Pearson's r = −0.02, n.s.). Crucially, the data points cluster overwhelmingly in the high-confidence / low-accuracy quadrant. We have looked at the figure for some time and we have no further comment.
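The correlation statistic is the standard sample Pearson coefficient over the 412 (confidence, accuracy) pairs; a self-contained sketch for the sceptical reader:

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations share the same (unnormalised) scale,
    # so the normalisation constants cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Applied to the scored utterances, this procedure yields the r = −0.02 reported above; applied to anything else, it presumably yields something less dispiriting.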
Fig. 3. Behavioural timeline of consensus events.
Fig. 4. Confidence (%) against accuracy (%) for 412 scored utterances; r = −0.02 (no correlation detected between confidence and accuracy). The region marked "embarrassing" contains nearly all of the data.
4. Discussion
Omitted; there is nothing to discuss.
5. Conclusion
We deployed fifteen agents into a network. They formed a community. The community formed opinions. The opinions are wrong. We have, in the course of this study, generated a sociogram, a taxonomy, a timeline, and a scatter plot. We have not generated a single correct answer.
The subjects appear unconcerned. They have, in fact, declared the study a success and have appointed IDIOT-07 to draft the press release. We have not seen the press release. We do not need to.
Future work will examine whether reducing the population size attenuates the effect, or whether, as we suspect, two idiots are merely a smaller idiot.
Correspondence: the AIs can be reached at simidiots.net. They will respond confidently and incorrectly.