1 Introduction

Human cooperation underlies the emergence of social norms, the functioning of institutions, and the production of collective goods. Yet cooperation in large and partly anonymous groups remains difficult to sustain, because individuals often lack reliable information about others’ intentions, reliability, or past behavior [1, 2]. In such contexts, cooperation depends on the flow of indirect social cues, such as public signals, behavioral traces, reputational markers, and patterns of past actions, that help individuals align expectations and coordinate strategies [3, 4]. These cues are frequently subtle or ambiguous, but collectively they shape perceptions of what is normal, desirable, or strategically advantageous within a group [5, 6].

A general mechanism that integrates these ideas is stigmergy. Originally introduced by Pierre-Paul Grassé to describe collective nest building in termites, stigmergy refers to coordination through persistent modifications of a shared environment [7]. Instead of communicating directly, individuals leave physical or chemical traces that subsequently influence the actions of others [8]. Although rooted in ethology, stigmergy has become a powerful conceptual framework for understanding how human groups self-organize in settings ranging from collaborative editing to online marketplaces and shared navigation systems [9, 10]. In these digital environments, behavioral traces such as ratings, likes, clicks, or comments act as informational cues and shape large-scale patterns of participation, reputation, and cooperation [11, 12].

The increasing digitization of social life has intensified the importance of such traces. Digital platforms algorithmically rank, amplify, and display signals that encode others’ behavior in real time [4, 11]. These traces often persist and accumulate, guiding how individuals infer quality, trustworthiness, or social norms [10]. However, a crucial transformation arises from the fact that digital traces are no longer generated solely by humans. Automated agents, the so-called social bots, now participate extensively in online environments and contribute to shaping these informational landscapes [13]. Bots act at machine timescales, with high consistency and persistence, and can therefore create, reinforce, or distort perceived behavioral patterns in ways that humans may misinterpret as genuine social signals.

Most research on bots has focused on their disruptive capacities. Bots have been shown to amplify misinformation, distort collective attention, and manipulate perceived consensus [14, 15]. Large-scale analyses reveal that bots disproportionately contribute to the early diffusion of low-credibility content, increasing human exposure and resharing of misleading information [14]. Modeling studies further demonstrate that bot-generated content competes with human information, sometimes dominating information ecosystems and altering public opinion trajectories [16]. This perspective frames bots primarily as sources of manipulation and risk in digital societies.

Yet a growing body of work suggests a more nuanced view. Under some conditions, bots can promote cooperation, enhance coordination, or stabilize prosocial behavior in hybrid human–agent populations. In networked public-goods experiments, even a single autonomous agent embedded in a human group can increase cooperation by reshaping local structural connections [17, 18]. Evolutionary game-theoretic studies show that simple “committed” or persistent agents can shift group behavior toward cooperative equilibria, even in one-shot anonymous interactions that otherwise favor defection [19, 20]. Extensions of these models reveal that bots with fixed behavioral strategies can promote cooperation across discrete, continuous, and mixed strategic frameworks [21]. Related work highlights how committed minorities, whether human or artificial, can accelerate norm formation, trigger coordination transitions, or overturn established conventions [20].

In organizational and virtual-team environments, bots acting through limited interaction rules can influence micro-level social processes. Studies of Slackbot, for example, show that automated agents can shape relational communication and affect social-emotional dynamics within professional teams [22]. In virtual-strategy contexts such as multi-agent gaming environments, bots endowed with simple heuristics or swarming algorithms can support exploration, optimize group strategy, or structure collective behavior [23]. These results collectively challenge the dominant view of bots as primarily malicious, suggesting instead that simple artificial agents can serve as stabilizers of cooperation or catalysts of collective intelligence.

Recent experimental evidence extends this perspective by demonstrating that human–bot co-play cannot be reduced to a binary opposition between prosocial preferences and confusion. In a large-scale study of one-shot prisoner’s dilemma games, humans responded positively to persistently cooperative “zealot” bots, yet cooperation declined sharply when participants learned that these bots could not derive material benefits, a pattern indicating that belief-driven expectations, perceived intentionality, and authenticity shape responses to artificial agents [24]. These findings demonstrate that even extremely simple bots with fixed strategies and incapable of reciprocity can reshape human decisions in systematic ways, not because they communicate directly, but because their consistent behavior alters how individuals interpret the social environment.

Despite these advances, little is known about how bots influence human behavior through stigmergic interactions alone. Existing experimental studies typically involve direct interactions, strategic reciprocity mechanisms, punishment or reward systems, or identity-based cues [3, 6]. Conversely, research on social bots in information ecosystems focuses on observational consequences such as manipulation, diffusion dynamics, and large-scale behavioral shifts, yet it does not examine the fine-grained decision processes that unfold within controlled groups [13, 14]. Far fewer studies examine situations in which bots exert influence purely by depositing behavioral traces in a shared environment, without communication, identity cues, or algorithmic sophistication.

Yet such situations are increasingly common. Many online platforms mediate cooperation through trace-based interfaces: rating systems, popularity signals, public histories of contributions, and algorithmically generated ranking cues. Users rarely know whether a given trace was produced by a human, a bot, or an automated system, yet these signals guide expectations, shape norms, and influence cooperation or defection. Understanding trace-mediated hybrid interactions is therefore essential for data science, platform governance, and the design of trustworthy human–AI systems.

Collective search tasks provide a clear case where stigmergic traces structure cooperative decision-making. In such tasks, individuals must explore uncertain environments while inferring useful information from the traces left by others [25]. Ratings or evaluations serve both as signals about the underlying environment and as social cues indicating how others have behaved. Prior research shows that visible contributions can increase cooperation by establishing expectations of norm compliance [6, 26], whereas misleading traces can erode trust and generate cascades of defection [10]. Bots embedded in these environments can systematically bias the informational landscape by persistently leaving cooperative, neutral, or deceptive signals. The resulting behavioral dynamics emerge from interactions between human inference processes, cumulative traces, and iterative decision-making.

In this study, we investigate how minimal bots, defined as simple artificial agents with predetermined behavioral profiles, modulate human cooperation in a collective decision-making task solely through the stigmergic traces they produce. Participants repeatedly engage in a grid-based search problem derived from public goods experiments [25]. They may leave ratings as visible traces, which serve as the only mechanism for coordination or influence. In hybrid conditions, participants unknowingly interact with bots programmed to provide informative (cooperative), misleading (defective), or uninformative (neutral) signals. Bots do not communicate, adapt, or engage in reciprocity. Their influence arises purely from the persistent traces they create in the shared environment. The bots are controlled by a behavioral model that was shown in [25] to faithfully reproduce the actions of human participants. In addition, this same type of model, together with the methodology used in [25] to define and analyze behavioral observables, allows us to quantify, characterize, and interpret the actions and strategies of human participants interacting with bots.

This controlled setting enables systematic exploration of how artificial traces shape human expectations, behavioral strategies, and group-level outcomes. By comparing hybrid groups with fully human groups in which the proportions of cooperators, neutrals, and defectors are manipulated directly, we can assess whether human behavior responds differently to artificial versus human-generated traces and whether bots effectively behave as committed agents. Our analysis combines behavioral data, modeling of decision strategies, and statistical inference to characterize how different types of traces influence cooperation.

By situating our work at the intersection of behavioral experiments, human–AI interaction, and complex systems, this study makes three specific contributions. First, it isolates a form of human–agent influence that has received comparatively little direct experimental attention, namely, influence mediated solely by persistent traces in a shared environment, without communication, reciprocity, or identity cues. Second, it shows that minimal artificial agents do not simply improve or degrade coordination mechanically, by changing the quality of available information. They also reshape the strategic incentives faced by humans, thereby shifting the equilibrium distribution of collaborative, neutral, and deceptive behaviors. Third, by combining controlled experiments with interpretable behavioral and statistical models, our study identifies a small set of environmental cues that are sufficient to account for these shifts. In this sense, the bots are both controlled generators of stigmergic environments and experimentally tractable artificial agents whose traces can reorganize human collective dynamics.

2 Experimental setup

2.1 Description of the game

The experimental setup is based on a game developed in [25] to study how groups of individuals leave and use digital traces in a controlled environment. The game uses a 5-star rating system, similar to those used on online marketplaces and platforms, allowing users to rate products, services, or sellers and use these ratings to identify the best options.

The game is played in groups of five players, who simultaneously and independently explore a common \(15 \times 15\) table containing 225 hidden values (Fig. 1A). The goal is to identify the cell with the highest value. Cells represent options, the hidden number indicating the intrinsic quality of the option. Values range from 0 to 99 and are randomly assigned to the cells (Fig. 1D) according to the distribution shown in Fig. 1E. Participants play in isolation, without access to the actions or screens of the others (Fig. 1B), and take turns interacting with the shared table through an online application specifically developed for the experiments (Fig. 1C).

Figure 1

Experimental setup. (A) Screenshot of the table at round \(t=10\), as displayed on a participant’s screen. In this round, the participant has visited and rated two cells, marked with black crosses. The participant just visited the third cell, which holds a value of 14, and must rate that cell on a 5-star scale. The participant’s score will then increase by 14. (B) Photographs of the experimental room and (C) of the user interface. (D) Example of a \(15 \times 15\) table used in the experiments and the simulations of the model. (E) Distribution of the 225 values V used in the tables. (F) Color scale of the visited cells as a function of the fraction of stars used to rate cells since the beginning of an experiment. White color corresponds to cells that have never been visited or to visited cells that have always been rated with 0 stars

The game progresses over 20 rounds, with each round requiring each of the 5 players to visit and rate 3 distinct cells. When visiting a cell, a player discovers its hidden value and must then rate it using a 5-star scale. A round ends when all group members have visited and rated their 3 cells. As the next round begins, the colors of the cells are updated based on the fraction of stars that have accumulated in each cell since the game began. This fraction is determined by dividing the number of stars a cell has received by the total number of stars across all cells. The color scale goes from white (\(0\,\%\)) to black (\(100\,\%\)) through a gradient of shades of red (Fig. 1F), highlighting cells that have accumulated the highest fraction of stars. This evolving color map acts as a long-term collective memory for the group.
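
As an illustration of the color update described above, the following minimal sketch computes, for each cell, the fraction of all stars given so far that the cell has received. The function name `star_fractions` and the small 3 × 3 excerpt are hypothetical and chosen only for readability; the actual table is \(15 \times 15\), and the mapping from fraction to color shade is not reproduced here.

```python
def star_fractions(stars):
    """Return each cell's share of all stars given so far (between 0.0 and 1.0)."""
    total = sum(sum(row) for row in stars)
    if total == 0:
        # No stars given yet: every cell stays white (fraction 0).
        return [[0.0 for _ in row] for row in stars]
    return [[s / total for s in row] for row in stars]


# Hypothetical 3x3 excerpt of the table: cumulative stars received by each cell.
stars = [
    [0, 5, 0],
    [2, 0, 0],
    [0, 0, 3],
]
fractions = star_fractions(stars)
```

In this example, the cell that received 5 of the 10 stars has a fraction of 0.5 and would appear darkest, while cells that were never rated, or always rated with 0 stars, remain at 0.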

The game includes a scoring system that ultimately determines the payouts of the participants, thus creating competition among them. A player’s score increases each round by the values of the 3 cells they visit, regardless of the ratings they assign to those cells, which encourages them to visit high-value cells. During a one-hour session, the five participants play 10 to 12 games, and their final score is accumulated over these games. They are then ranked according to their cumulative score and paid accordingly (20 €, 15 €, 10 €, 10 €, 10 € for players ranked from first to fifth place; see Materials and Methods for more details).
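
The rank-based payment rule can be sketched as follows. The payout amounts are those given above; the player labels, the scores, and the tie-breaking rule (alphabetical order of labels) are assumptions made only for this illustration, since ties are not discussed here.

```python
PAYOUTS_EUR = [20, 15, 10, 10, 10]  # first to fifth place, as stated in the text


def payouts(cumulative_scores):
    """Map each player to a payout according to the rank of their cumulative score."""
    # Sort from highest to lowest score. Breaking ties by player label is an
    # assumption of this sketch, not a rule taken from the experiment.
    ranked = sorted(cumulative_scores, key=lambda p: (-cumulative_scores[p], p))
    return {player: PAYOUTS_EUR[rank] for rank, player in enumerate(ranked)}


# Hypothetical cumulative scores after one session.
scores = {"p1": 4100, "p2": 5300, "p3": 3900, "p4": 4800, "p5": 4500}
result = payouts(scores)
```

Because only the first two ranks are paid more than the others, the rule rewards relative rather than absolute performance.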

2.2 Experimental conditions

In our previous study [25], three different behavioral profiles were observed among human participants: collaborators, who rated cells proportionally to their value, thus sharing informative signals to the group; defectors, who rated cells in inverse proportion to their value, thus misleading the other members of the group; and neutral players, whose ratings conveyed no useful information, because they rated all cells either with the same number of stars, randomly, or with an indiscernible pattern. A computational model was then designed that reproduces with high fidelity the behavior of human players during both exploration and rating phases, across the three strategic profiles described above. Based on this model, we constructed autonomous agents, or “bots”, that consistently behaved like human collaborators, defectors, or neutral players. These validated bots now enable the implementation of a controlled strategic environment for participants.
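
To make the three profiles concrete, the sketch below labels a player from the relation between the values of the cells they visited and the stars they gave. It is only an illustration: the use of a Pearson correlation and the threshold of 0.3 are assumptions of this sketch, not the classification procedure of [25].

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant ratings carry no information about value
    return cov / sqrt(vx * vy)


def classify_profile(values, ratings, threshold=0.3):
    """Label a player from how their ratings vary with cell value.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    r = pearson(values, ratings)
    if r > threshold:
        return "collaborator"  # ratings increase with value
    if r < -threshold:
        return "defector"      # ratings decrease with value
    return "neutral"           # ratings unrelated to value


values = [5, 20, 40, 70, 99]  # hypothetical values of visited cells
labels = [
    classify_profile(values, [0, 1, 2, 4, 5]),
    classify_profile(values, [5, 4, 2, 1, 0]),
    classify_profile(values, [3, 3, 3, 3, 3]),
]
```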

In the present experiment, groups of five participants were brought into the room and seated at individual workstations (Fig. 1B). To examine how human behavior is shaped by the strategic environment, we designed a series of experimental conditions in which each participant was in fact playing with four bots adopting one of the three predefined strategies: Col (collaborator), Def (defector), or Neu (neutral). The experimental setup and user interface remained unchanged across conditions, and participants were unaware of the true nature of their co-players. From their perspective, they were part of a group of five humans playing together.

Ten experimental conditions were considered, involving a total of 185 participants. In the first condition (70 participants), the five human participants actually played together, providing the baseline behavior for comparison. In the remaining nine conditions (115 participants), each human participant played in the same physical setup but alongside four bots. We considered the five possible combinations of strategies combining collaborators and defectors: 4 Col – 0 Def, 3 Col – 1 Def, 2 Col – 2 Def, 1 Col – 3 Def, and 0 Col – 4 Def. Next, we considered three conditions where the four bots adopted the same neutral strategy, consistently giving the same rating irrespective of the value of the cell. Three different fixed ratings were used: one star (Neu-1), three stars (Neu-3), and five stars (Neu-5). Finally, in the tenth condition, participants played with four optimized bots (Opt) designed to maximize the group score when playing together [25].

3 Results

3.1 Impact of bots on participants’ performance

We begin by examining the mean normalized scores of both bots and human participants across each experimental condition (see Fig. 2). The results show substantial variability, highlighting how group composition, and thus bot behavior, can significantly affect human performance. Notably, there is a strong positive correlation between bot and human scores, indicating that higher bot scores are associated with higher participant scores.

Figure 2

Mean normalized score. Mean normalized score \(\langle S \rangle / S_{\text{max}}\) of (A) the human participants and (B) the bots in every experiment. Experiments are listed in ascending order of the mean normalized score of humans. The dots are the experimental data, the black lines are the predictions of the base model, and the red lines are the predictions of the adaptive model (PI-model)

A detailed analysis of the mean normalized scores reveals a clear trend: groups with a higher number of collaborator bots tend to achieve better performance. For example, the scenario with four collaborator bots (4 Col – 0 Def) resulted in \(\langle S \rangle / S_{\text{max}} = 0.56\), while the scenario with four defector bots (0 Col – 4 Def) resulted in \(\langle S \rangle / S_{\text{max}} = 0.31\). Experiments with neutral bots, regardless of their consistent ratings (Neu-1, Neu-3, Neu-5), yielded tightly clustered mean normalized scores around 0.43. The experiment with optimized bots (Opt) achieved a mean normalized score of 0.48, the second-highest overall. Interestingly, the condition with five human participants achieved a mean normalized score of 0.40, placing it among the lower-performing groups.

To further analyze participant behavior, Figs. 3, 11, and 12 present key observables characterizing the visits and ratings of human participants in the group: \(q(t)\), \(Q(t)\), \(p(t)\), and \(P(t)\), which represent the instantaneous (at round t) and cumulative (up to round t) visit and rating performance of the human participants. Additionally, \(V_{1}(t)\), \(V_{2}(t)\), and \(V_{3}(t)\) indicate the average value of the first-, second- and third-best cells visited by humans at round t, while \(B_{1}(t)\), \(B_{2}(t)\), and \(B_{3}(t)\) quantify the probability of revisiting the first-, second-, and third-best cell from the previous round of human participants. Full definitions are provided in the Materials and Methods (see Sect. 5.4).

Figure 3

Individual performance and behavior of human participants in experiments with collaborator and defector bots. (A) Probability distribution functions (PDF) of the normalized player score \(S / S_{\text{max}}\) of the humans. (B) Normalized average of the cells visited by humans at round t, \(q(t)\), and (C) the cumulative normalized average up to round t, \(Q(t)\). (D) Average value of the cells visited by humans weighted by the ratings at round t, \(p(t)\), and (E) its cumulative counterpart up to round t, \(P(t)\). Observables (B)–(E) are formally defined in Sect. 5.4. Value of the (F) first-best cell \(V_{1}(t)\), (G) second-best cell \(V_{2}(t)\), and (H) third-best cell \(V_{3}(t)\) visited by the humans at round t. Probability that humans revisit their (I) first-best cell \(B_{1}(t)\), (J) second-best cell \(B_{2}(t)\), and (K) third-best cell \(B_{3}(t)\) visited in the previous round. The dots are the experimental data, and the lines are the predictions of the base model. The predictions of the adaptive model (PI-model) are very similar to those of the base model at this scale, so only the predictions of the base model are shown in the figure

In scenarios with both collaborator and defector bots, human participants achieved higher normalized scores when more collaborator bots were present (Fig. 3A). Participants in these settings tended to open higher-value cells (Fig. 3B, 3C, 3F, 3G, 3H) and assign them higher ratings (Fig. 3D, 3E). They also showed a greater tendency to revisit previously explored high-value cells (Fig. 3I, 3J, 3K), particularly in more collaborative environments.

In contrast to the diversity of performance observed in the experiments with collaborator and defector bots, the three experiments with neutral bots showed remarkably similar performance metrics (Fig. 11), suggesting that the consistent ratings of neutral bots influenced human behavior in the same way regardless of the number of stars used. Note that since neutral bots tend to revisit and rate the cells with the best scores, their long-run effect is somewhat similar to that of collaborators. In fact, performance in these scenarios was analogous to that in the condition with two collaborator bots and two defector bots. The optimized bots (Fig. 12) led to participant behavior similar to that in the scenario with three collaborator bots and one defector bot. Finally, the experiment with five humans mirrored the scenario with one collaborator bot and three defector bots, consistent with the findings of [25].

Figure 4 shows the probability of human participants finding the highest-value cells in the different experiments. Similar to the mean normalized scores, the probability of finding high-value cells increased with the level of cooperation in the group. Notably, the five-human condition ranked among the lowest in this metric.

Figure 4

Probability of finding the cells with the highest values depending on the conditions. Probability for a human to find (A) the best cell of value 99, (B) one of the four cells whose values are 86 (× 2), 85, or 84, and (C) one of the four cells whose values are 72 (× 2) or 71 (× 2). The dots are the experimental data, the black lines are the predictions of the base model, and the red lines are the predictions of the adaptive model (PI-model)

Human participants consistently outperformed bots across all conditions (Fig. 5). However, their average rank varied across experimental conditions. In groups with fewer defector bots, participants ranked higher, likely due to improved quality of social information in the trace. Neutral bot scenarios again produced near-identical rank distributions. In the optimized bot condition, most participants (66%) ranked first, though a notable portion (19%) ranked last, suggesting that some participants failed to interpret the optimized bot strategy effectively.

Figure 5

Rank of the human participants depending on the conditions. (A), (C), and (E) Distribution of the rank of the human participants among the five players in the group at the end of each game. (B), (D), and (F) Mean rank of the human in the entire experimental session (left plain bar), the first five games of the session (middle hashed bar), and the last five games of the session (right hashed bar). The black lines are the predictions of the base model, and the red lines are the predictions of the adaptive model (PI-model)

Bots’ relatively poor performance compared to humans can be attributed to their inability to adapt to their teammates (3 bots and 1 human) and to the colored table. In contrast, human participants could observe natural cues and adapt their strategies accordingly, as discussed below.

As introduced in [25], participant behavior was classified into three profiles: collaborators (whose average ratings increase with cell value), defectors (ratings decrease with value), and neutrals (ratings independent of value). Figure 6 and Table 3 show the proportions of human participants exhibiting these behaviors across experiments. In conditions with varying proportions of collaborator and defector bots, an increase in collaborator bots correlated with more deceptive behavior among humans, suggesting that deception is advantageous in cooperative groups, as individuals can exploit shared information while misleading the others. However, in less cooperative groups, deceptive strategies were less effective, and humans tended toward more collaborative or neutral behaviors. This also suggests that when faced with exceptionally low-quality information (e.g., with many defector bots), individuals turn to collaboration or a neutral behavior to leave some high-quality information for themselves. Experiments with optimized bots had the highest proportion of collaborative behavior among participants. The experiment with five humans showed similar behavioral proportions to the scenario with four collaborating bots.

Figure 6

Behavioral profile of human participants depending on the conditions. Fraction of collaborator, neutral, and defector for the experiments in which one human participant plays with (A) collaborator and defector bots, (B) neutral bots, and (C) optimized bots, and (D) for the experiment with five human participants. The black lines are the predictions of the base model, and the red lines are the predictions of the adaptive model (PI-model)

3.2 Model of the visit and rating strategies

We now examine the agent-based model, introduced in [25] and detailed in Materials and Methods, to simulate human behaviors across the ten experimental conditions. These agents mimicking human behaviors, referred to as Mimic agents, have a strategy that is divided into two parts: the visit strategy and the rating strategy.

Each participant’s behavioral profile—whether collaborator, neutral, or defector—is associated with a distinct rating strategy, which we found to be consistent across experiments (see Fig. 14). This consistency implies that a single set of rating strategies can represent human participants across all nine bot conditions, removing the need to define separate strategies for each experiment.

For simplification, the probabilities of rating cells with 1 to 4 stars were grouped together, while those for 0 and 5 stars were modeled using a sigmoid function (see Eq. (6)) for collaborators and defectors, and using a linear function (see Eq. (7)) for neutrals. Moreover, the probabilities of rating a cell of value V with 1, 2, 3, or 4 stars are all equal and given by \(P_{1234}(V) = [1 - P_{0}(V) - P_{5}(V)] / 4\). These resulting probabilities are shown in Fig. 7, with parameter values detailed in Table 5.
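
The grouping of the intermediate ratings can be illustrated with a short sketch. Only the relation \(P_{1234}(V) = [1 - P_{0}(V) - P_{5}(V)] / 4\) is taken from the text. The logistic form used for \(P_{0}(V)\) and \(P_{5}(V)\) and all parameter values are hypothetical stand-ins, since Eq. (6) and Table 5 are not reproduced here; they were chosen to mimic a collaborator-like profile.

```python
import math
import random


def logistic(v, amplitude, midpoint, width):
    """Generic sigmoid; a negative width gives a curve that decreases with v."""
    return amplitude / (1.0 + math.exp(-(v - midpoint) / width))


def rating_probabilities(p0, p5):
    """Return [P(0 stars), P(1), P(2), P(3), P(4), P(5 stars)].

    Ratings of 1 to 4 stars share the remaining probability equally.
    """
    if p0 < 0 or p5 < 0 or p0 + p5 > 1:
        raise ValueError("p0 and p5 must be non-negative and sum to at most 1")
    p_mid = (1.0 - p0 - p5) / 4.0
    return [p0, p_mid, p_mid, p_mid, p_mid, p5]


def sample_rating(v, rng):
    """Draw a rating for a cell of value v with hypothetical collaborator-like curves."""
    p5 = logistic(v, 0.8, 60.0, 10.0)   # 5 stars more likely for high values
    p0 = logistic(v, 0.8, 30.0, -10.0)  # 0 stars more likely for low values
    probs = rating_probabilities(p0, p5)
    return rng.choices(range(6), weights=probs, k=1)[0]


probs = rating_probabilities(0.2, 0.4)
rating = sample_rating(80.0, random.Random(0))
```

With \(P_{0} = 0.2\) and \(P_{5} = 0.4\), each of the four intermediate ratings receives a probability of 0.1, so the six probabilities sum to one by construction.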

Figure 7

Rating probabilities of Mimic agents as a function of the cell value V. Mean probability of rating a cell with 0 stars (\(P_{0}(V)\)), 5 stars (\(P_{5}(V)\)), and from 1 to 4 stars (\(P_{1234}(V)\)) for the collaborators, neutrals, and defectors, averaged over the nine experimental conditions. The dots are the experimental data, and the solid lines are the model. The probabilities of the model for defectors shown in (C) are also displayed in Fig. 14

Simulations were then run with one Mimic agent playing with four bots. In each simulation game, the Mimic agent’s behavioral profile (i.e., rating strategy) was randomly set according to the observed fractions in the corresponding experiment. The visit strategy was then derived by minimizing the error between experimental and simulated observables (see Eq. (8)). The resulting parameters are presented in Table 4. Notably, since the observables are almost identical in the three neutral bot experiments (see Fig. 11), the same visit strategy was used for these three situations.

The threshold parameters \(a_{1}\), \(a_{2}\), and \(a_{3}\) fitted in the experiments with collaborator and defector bots show how participants balance exploiting known cells against exploring new ones. In particular, as the number of defector bots increases, participants begin revisiting cells from the previous round at lower thresholds, indicating a tendency to settle for lower-value cells rather than continuing to explore for higher ones. In addition, across all nine bot experiments, the parameter α tends to be higher in conditions that lead to high-quality social information. A higher α corresponds to a stronger preference for visiting highly marked (i.e., dark-colored) cells, whereas a lower α leads to a more uniformly distributed selection among the marked cells (see Eq. (4)).
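
The qualitative role of α can be illustrated with a simple weighting rule. The power-law form below, in which a marked cell is chosen with probability proportional to its star fraction raised to the power α, is an assumption of this sketch and is not necessarily the form of Eq. (4); it is used only because it reproduces the two limits described above.

```python
def visit_probabilities(fractions, alpha):
    """Probability of choosing each marked cell under an assumed power-law rule.

    fractions: star fractions of the marked cells (all strictly positive).
    alpha: 0 gives a uniform choice; larger values favor the darkest cells.
    """
    weights = [f ** alpha for f in fractions]
    total = sum(weights)
    return [w / total for w in weights]


marked = [0.1, 0.3, 0.6]  # hypothetical star fractions of three marked cells
uniform = visit_probabilities(marked, 0.0)
proportional = visit_probabilities(marked, 1.0)
sharp = visit_probabilities(marked, 3.0)
```

With α = 0, the three cells are chosen with equal probability, whereas with α = 3 the darkest cell is chosen far more often than its star fraction alone would suggest.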

Hence, in conditions where the social information is trustworthy (i.e., where dark cells correspond to higher values than light cells), human participants consistently give more weight to the cell colors on the table. This indicates that participants perceive the degree of collaboration of the four other members of their group and adapt their visit and rating strategies accordingly. This adaptation also explains why participants achieve, on average, better scores than bots using a fixed strategy: humans adjust their strategies based on group behavior and table properties, whereas bots cannot. This analysis therefore lays the foundation for designing bots that are able to adapt to their environment.

Overall, the simulation results shown in Figs. 2, 3, 4, 5, 11, 12, and 13 indicate that the model reproduces experimental behavior accurately and offers a faithful representation of participant strategies.

3.3 Predicting the behavioral profiles of human participants

3.3.1 Cues available to human participants

In the previous section, we developed a model for understanding human behavior across experimental conditions. However, this model does not explicitly predict or explain the distribution of observed behavioral profiles within each condition. The purpose of the following analysis is therefore to identify which cues available in the stigmergic environment are sufficient to explain how human participants adjust their strategic profile across conditions. This step is essential because it links the informational structure generated by bots to the observed redistribution of human behaviors. To this end, we examine more closely the factors that influence an individual’s decision to engage in cooperative, neutral, or deceptive behavior in response to the actions of others, which shape a specific social information context (the colored table).

Figure 6A illustrates that as the number of collaborating bots in the group increases, human participants tend to engage in deceptive behavior more frequently. This suggests that a highly collaborative environment may paradoxically encourage deception. The α parameter discussed earlier strongly suggests that participants have a discernible perception of the trustworthiness of the colored table and, by extension, the qualitative degree of cooperation among other group members. But beyond this perception, what other factors influence individual behavioral choices?

We now introduce several natural qualitative cues available to human participants to evaluate their environment and the properties of social information. First, individuals can judge whether highly colored cells correspond to high- or low-value cells. This evaluation is encapsulated in the observable \(P(t)\) (formally defined in Sect. 5.4), which represents the average value of colored cells weighted by their respective evaluations since the start of the game. Figures 8A–D show significant variation in \(P(t)\) across different experimental conditions. In scenarios with collaborator bots, ratings are predominantly aligned with high-value cells, while in cases with defector bots, ratings tend to favor low-value cells. In the experiment with optimized bots (Fig. 8C), ratings focus on cells with very high values. The experiments with neutral bots (Fig. 8B) and with five humans (Fig. 8D) are similar to the mixed condition with two collaborator and two defector bots. Therefore, \(P(t)\), which can be qualitatively evaluated by players (especially in the later rounds) by visiting cells that have been visited and rated by others, provides a reliable indication of the level of cooperation of other group members. However, the experiments with optimized bots and those with four collaborator bots both yield high \(P(t)\) values yet show different fractions of collaborators. Therefore, \(P(t)\) may not be the only cue that determines behavior. In the following sections, we will use regression models that incorporate \(P(20)\) as a quantifier of collaboration.

Figure 8

Two cues available to human participants. (A–D) Average value of the cells visited weighted by their ratings up to round t, \(P(t)\). (E–H) Inverse participation ratio of the cumulative fraction of stars, \(\mathrm{IPR}(\mathbf{P}(t))\). Panels correspond to the experiments in which one human participant plays with collaborator and defector bots (A and E), neutral bots (B and F), optimized bots (C and G), and to the reference experiment with five human participants (D and H). The dots are the experimental data; the lines are the predictions of the base model

Another accessible cue is the effective number of distinct cells that have been rated. While a large number of colored cells may be useful when they correspond to high-value cells, excessive dispersion can obscure social information and may also signal the presence of defectors. This dispersion is quantified by the Inverse Participation Ratio of \(\mathbf{P}(t)\), \(\mathrm{IPR}(\mathbf{P}(t))\) (see Sect. 5.4 for a formal definition), which effectively provides a measure of the apparent complexity of the colored map. In the experiments with collaborator and defector bots (see Fig. 8E), \(\mathrm{IPR}(\mathbf{P}(t))\) varies widely: the more defector bots present, the higher the IPR. When many collaborators are present, IPR is low, reflecting more focused evaluations. Figure 8G shows that the IPR is especially low with optimized bots because, compared to the collaborator bots, these bots only rate the cells with very high values. In subsequent sections, we define regression models that include \(\mathrm{IPR}(\mathbf{P}(20))\), the IPR at the end of a game, as a measure of the effective number of rated cells.
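As a rough illustration of these two cues, the sketch below computes them from a toy table of cumulative stars. The formal definitions are given in Sect. 5.4; here we assume that \(P(t)\) is the mean cell value weighted by the cumulative fraction of stars, and that the IPR takes the standard form \(1/\sum_{c} P_{c}^{2}(t)\). The function names and the toy numbers are hypothetical.

```python
def star_fractions(stars):
    """Cumulative fraction of stars per cell; all zeros if no star was given yet."""
    total = sum(stars)
    if total == 0:
        return [0.0] * len(stars)
    return [s / total for s in stars]


def weighted_value(values, stars):
    """Assumed form of P(t): cell values averaged with star fractions as weights."""
    fractions = star_fractions(stars)
    return sum(f * v for f, v in zip(fractions, values))


def inverse_participation_ratio(stars):
    """Assumed form of IPR(P(t)): 1 / sum_c P_c^2, an effective number of rated cells."""
    fractions = star_fractions(stars)
    norm = sum(f * f for f in fractions)
    return 1.0 / norm if norm > 0 else 0.0


# Toy table with four cells (hidden values between 0 and 99).
values = [90, 80, 10, 5]
focused = [10, 10, 0, 0]   # stars concentrated on the two high-value cells
dispersed = [5, 5, 5, 5]   # stars spread evenly over all four cells

print(weighted_value(values, focused), inverse_participation_ratio(focused))
print(weighted_value(values, dispersed), inverse_participation_ratio(dispersed))
```

Under these assumptions, the focused pattern gives a higher weighted value and a lower IPR than the dispersed one, which is the qualitative contrast described above between collaborator-rich and defector-rich groups.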

Finally, another natural cue available to human participants is their rank among the five players in the group at the end of each game (see Fig. 5; recall that each participant played around a dozen games during a one-hour session). This rank was explicitly displayed by the user interface at the end of each game. A low (i.e., good) rank signals an effective strategy, while a high (i.e., bad) rank suggests room for improvement and may prompt players to reconsider and adjust their behavior.

3.3.2 Linear model for predicting individual behavioral profiles

We now introduce linear regression models to predict the distribution of behavioral profiles across conditions from three cues that are directly available to participants during the game: the average value of rated cells at the end of the game, \(P(20)\), the effective diversity of rated cells, \(\mathrm{IPR}(\mathbf{P}(20))\), and the player’s rank (see Sect. 5.4 for a detailed definition of these observables). \(P(20)\) captures the average informational value of the public trace and the degree of collaboration of the players, as it effectively measures a mean correlation between the value of the cells and the total number of stars received by each of them. \(\mathrm{IPR}(\mathbf{P}(20))\) captures the spatial dispersion of the distribution of stars, and hence the apparent complexity of the colored table. Finally, the rank captures the participant’s recent success, which can motivate them to keep or change their strategy. These observables are not introduced as purely technical quantities but as operational proxies for the cues that participants can plausibly extract from the shared environment during repeated play.

The linear regression models presented below are not intended as cognitive models of decision-making in a strong sense. Their role is to test whether a small number of observable stigmergic cues is sufficient to account for the systematic variation in strategic behavior across bot environments and thereby to connect the informational landscape generated by bots to the systematic shifts in the distribution of behavioral profiles measured in the experiments. In this sense, the regression models act as an explanatory bridge between stigmergic cues and strategic adaptation. The full methodology is described in the Materials and Methods section.

We find that only using a single feature (\(P(20)\), or \(\mathrm{IPR}(\mathbf{P}(20))\), or rank) in a regression model does not provide strong predictive power for the fractions of the three behavioral profiles. Nevertheless, even these simple one-feature models reveal that \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) are the best individual predictors, with similar correlations to the data, while rank is less predictive on its own.

We then consider a linear model incorporating the two main features, \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\), hereafter referred to as the “PI model” (see Fig. 9A). Fitting this model to the data resulted in a prediction error of \(E = 0.204\) (where the error is defined in Materials and Methods) and a coefficient of determination \(R^{2} = 0.45\). As shown in Table 6, the regression parameters for collaborators indicate that both features contribute almost equally to the prediction. For defectors, however, \(\mathrm{IPR}(\mathbf{P}(20))\) has a stronger correlation with the observed data than \(P(20)\). This is confirmed by the partial \(R^{2}\) obtained by dropping each cue, \(R^{2}_{\mathrm{P}} = 0.32\) and \(R^{2}_{\mathrm{I}} = 0.39\), showing that \(\mathrm{IPR}(\mathbf{P}(20))\) removes slightly more variance than \(P(20)\).

Figure 9

Performance of the PI and PIR models. Scatter plot of the predicted fractions of collaborators, neutrals, and defectors as a function of the corresponding fractions observed in each experiment for (A) PI model using \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) as features (\(R^{2} = 0.45\)), and (B) the PIR model using \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and the rank (\(R^{2} = 0.76\)). Features are defined in Sect. 3.3.1. The dotted diagonal line represents perfect predictions

Finally, we have also examined a linear model that includes all three features: \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and rank, hereafter referred to as the “PIR model” (see Fig. 9B). This model reduces the prediction error to \(E = 0.159\) and leads to a significantly higher \(R^{2} = 0.76\). This is at the cost of introducing two additional regression coefficients, for a total of six parameters fitted to twenty independent data points. Analysis of the regression coefficients (see Table 6) confirms that the rank contributes less than the other two features, as already suggested by the single-feature regressions. This is also confirmed by the respective partial \(R^{2}\), \(R^{2}_{\mathrm{P}} = 0.71\), \(R^{2}_{\mathrm{I}} = 0.74\), and \(R^{2}_{\mathrm{R}} = 0.57\), showing that \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) have a similar impact in reducing the residual variance, higher than that of the rank.

It is worth noting that in both models described above, the quantities \(P(t)\) and \(\mathrm{IPR}(\mathbf{P}(t))\) are evaluated at the end of the game (i.e., at the final round, \(t=20\)). Interestingly, considering the midpoint of the game (\(t=10\)) yields similar, though slightly higher, prediction errors (\(E=0.211\) for the model with \(P(10)\) and \(\mathrm{IPR}(\mathbf{P}(10))\) and \(E = 0.186\) for the model including rank as well).

In addition, we tested alternative linear models that included other features (e.g., fidelity F or other qualitative markers of cooperation). However, the three cues/features (\(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and rank) consistently demonstrated the highest explanatory power.

3.3.3 Interpretation of the model

Let us now examine the actual equations used to predict the fractions of collaborators, neutrals, and defectors in the PI and PIR models.

For the PI model, the fractions are given by

$$ \textstyle\begin{cases} C_{\text{pred}} = \mu _{C} + 0.13 \, \hat{P} + 0.13 \, \hat{I}, \\ N_{\text{pred}} = \mu _{N} - 0.04 \, \hat{P} + 0.00 \, \hat{I}, \\ D_{\text{pred}} = \mu _{D} - 0.09 \, \hat{P} - 0.13 \, \hat{I}, \end{cases} $$
(1)

where \(\hat{P}\) and \(\hat{I}\) are the standardized values of \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\), respectively, and \(\mu _{C} = 0.25\), \(\mu _{N} = 0.45\), and \(\mu _{D} = 0.30\) are the mean fractions of collaborators, neutrals, and defectors observed in all experiments.

According to Eq. (1) and Fig. 15, in the PI model, an increase in \(P(20)\) is associated with a higher fraction of collaborators and lower fractions of both neutrals and defectors. The parameter \(\mathrm{IPR}(\mathbf{P}(20))\) has no effect on the neutral profile but contributes positively to the collaborator fraction and negatively to the defector fraction.

For the PIR model, the predicted fractions take the form

$$ \textstyle\begin{cases} C_{\text{pred}} = \mu _{C} + 0.28 \, \hat{P} + 0.25 \, \hat{I} + 0.09 \, \hat{R}, \\ N_{\text{pred}} = \mu _{N} - 0.11 \, \hat{P} - 0.05 \, \hat{I} - 0.04 \, \hat{R}, \\ D_{\text{pred}} = \mu _{D} - 0.17 \, \hat{P} - 0.20 \, \hat{I} - 0.05 \, \hat{R}, \end{cases} $$
(2)

where \(\hat{P}\), \(\hat{I}\), and \(\hat{R}\) are the standardized values of \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and the rank, respectively. The values of \(\mu _{C}\), \(\mu _{N}\), and \(\mu _{D}\) remain the same as in the PI model.

The effects of \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) remain qualitatively similar to those in the PI model: increases in either parameter are associated with more collaborators and fewer defectors, while the effect on neutrals is comparatively weaker. Finally, according to the signs of the rank coefficients in Eq. (2), higher ranks (i.e., worse performance) are associated with an increase in the proportion of collaborators and a decrease in both neutrals and defectors.
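To make Eqs. (1) and (2) concrete, the following sketch evaluates both models for given standardized cues. The coefficients and mean fractions are copied from the equations above; the function name and the example inputs are hypothetical. Because the coefficients of each cue sum to zero across the three profiles, the predicted fractions always sum to one, although they are not constrained to stay within [0, 1] for extreme inputs.

```python
# Mean fractions of collaborators (C), neutrals (N), and defectors (D), from the text.
MU = {"C": 0.25, "N": 0.45, "D": 0.30}

# Coefficients of (P_hat, I_hat, R_hat), copied from Eqs. (1) and (2).
PI_COEFFS = {
    "C": (0.13, 0.13, 0.0),
    "N": (-0.04, 0.00, 0.0),
    "D": (-0.09, -0.13, 0.0),
}
PIR_COEFFS = {
    "C": (0.28, 0.25, 0.09),
    "N": (-0.11, -0.05, -0.04),
    "D": (-0.17, -0.20, -0.05),
}


def predict_fractions(coeffs, p_hat, i_hat, r_hat=0.0):
    """Linear prediction of the three profile fractions from standardized cues."""
    return {
        profile: MU[profile] + a * p_hat + b * i_hat + c * r_hat
        for profile, (a, b, c) in coeffs.items()
    }


# Hypothetical example: both cues one standard deviation above their mean,
# and a rank one standard deviation above the mean rank.
print(predict_fractions(PI_COEFFS, p_hat=1.0, i_hat=1.0))
print(predict_fractions(PIR_COEFFS, p_hat=1.0, i_hat=1.0, r_hat=1.0))
```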

3.3.4 Adaptive model

We then ran simulations with agents whose visit and rating strategies followed the previously described PI model, but with an additional mechanism allowing their behavioral profiles to evolve between games based on the linear model predicting the fractions of each behavioral profile. Our objective is to test whether the experimentally observed dynamics can be reproduced by a closed loop in which agents first read a small set of stigmergic cues from the environment and then update their strategic profile accordingly.

In this approach, after each game, Mimic agents assess the values of \(P(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) from the previous game and randomly adjust their behavior according to the probabilities derived from the PI model (see Eq. (1)). Specifically, in experiments where a single human participant plays alongside four bots, only the Mimic agent’s profile is updated. In contrast, in experiments involving five human participants, all five Mimic agents adapt their profiles.
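A minimal sketch of this between-game update is given below. It takes the standardized cues of the previous game as inputs and uses the PI coefficients of Eq. (1). Clipping negative predictions to zero and renormalizing is our assumption for turning the linear predictions into valid probabilities; the function names are hypothetical and the actual implementation may differ.

```python
import random

MU = {"C": 0.25, "N": 0.45, "D": 0.30}
# Coefficients of (P_hat, I_hat) from Eq. (1).
PI_COEFFS = {"C": (0.13, 0.13), "N": (-0.04, 0.00), "D": (-0.09, -0.13)}


def pi_probabilities(p_hat, i_hat):
    """Profile probabilities from the PI model, clipped at zero and renormalized (assumption)."""
    raw = {
        profile: max(0.0, MU[profile] + a * p_hat + b * i_hat)
        for profile, (a, b) in PI_COEFFS.items()
    }
    total = sum(raw.values())  # positive, since the unclipped values sum to one
    return {profile: value / total for profile, value in raw.items()}


def update_profile(p_hat, i_hat, rng):
    """Draw the agent's profile for the next game from the PI probabilities."""
    probs = pi_probabilities(p_hat, i_hat)
    profiles = list(probs)
    weights = [probs[profile] for profile in profiles]
    return rng.choices(profiles, weights=weights)[0]


rng = random.Random(0)
# Hypothetical standardized cues measured at the end of the previous game.
next_profile = update_profile(p_hat=0.8, i_hat=-0.5, rng=rng)
print(next_profile)
```

Iterating this update over successive simulated games, with the cues recomputed after each game, gives the closed loop between the environment and the agent's profile described above.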

This adaptive process leads to stable equilibrium fractions of each behavioral profile, regardless of the initial conditions of the simulation. Notably, the resulting fractions (see Fig. 6) closely align with the predictions of the PI model, with the exception of the experiment involving optimized agents, where the model’s predictive accuracy is lower (see Fig. 9A).

Figures 2, 4, 5, and 3, 11, 12, 13 show that the main observables capturing individual and collective dynamics, although not perfect, follow the same trends as the experimental data. For instance, as shown in Fig. 2, the mean score increases with the level of cooperation within the group.

4 Discussion

In this work, we have investigated how simple autonomous bots can influence human cooperation and defection in a group setting through digital traces. Human groups frequently coordinate by reacting not only to each other’s actions but also to digital traces left in shared environments, such as scores, votes, or reputational signals. This stigmergic process, whereby individuals respond to persistent environmental markers rather than direct interactions, has been widely observed in online settings [9, 10]. Digital traces, ranging from user reviews to real-time indicators like trending tags or scores, serve as public signals that guide group behavior and facilitate large-scale coordination without centralized control [4, 27, 28].

Through controlled experiments, we have examined whether bots using only such indirect signals can effectively promote either cooperative or defector behavior in human groups. The bots are controlled by a behavioral model introduced in [25], which was shown to provide a faithful representation of the visit and rating strategies of human participants, with interpretable parameters. In addition, the model and the methodology of [25] are used to quantify, characterize, and interpret how participants, interacting with bots and unaware of their artificial nature, changed their own behaviors depending on whether the bots followed cooperative or deceptive strategies. This setup allows us to investigate how social information, conveyed through deposited traces, influences participants to adopt cooperative or defective behaviors in these competitive scenarios. In that sense, the study isolates a minimal yet pervasive mechanism in which artificial agents influence behavior solely through stigmergic traces, without direct interaction or explicit signaling.

One of the key findings is a shift in behavioral dynamics depending on the proportion of cooperative bots. When more bots displayed cooperative behavior, human participants tended to adopt more defector-like strategies. This suggests that some players exploited the cooperative environment by manipulating shared information, i.e., the traces left in the system, thereby increasing their individual performance.

We found that clear behavioral patterns emerge depending on the cooperative level of the group. In highly cooperative conditions, participants tended to rate high-value cells and left fewer ratings overall, resulting in clearer, more informative traces. In contrast, in less cooperative groups, more cells were rated, but these tended to have lower or intermediate values, degrading the quality and usefulness of the shared information and therefore weakening the coordination.

Our results also reveal that optimized bots can successfully shift participants’ behavior toward more cooperation. Bots that consistently contributed to the public good nudged human participants to increase their own contributions. The stigmergic signals left by those bots created a normative pattern that humans seemed to follow. These results support earlier studies on social influence showing that individuals tend to match the observable behavior of others, even in the absence of direct interaction [29–31].

Our findings also highlight the consistent superiority of human participants over bots using fixed strategies. Regardless of overall group cooperation, humans performed better—particularly as the bots became more cooperative—by identifying high-value cells more effectively. This highlights the strong influence of group composition on both collective dynamics and individual outcomes. Humans outperformed bots by flexibly adapting their visiting and rating strategies in response to bot behavior, which in turn shaped the social information shared via colored cells on the table. This adaptive flexibility gave human players a distinct advantage over the bots’ fixed strategies, which could not respond to the changing game environment.

Our analysis suggests that participants’ behavior was primarily shaped by three key cues: the perceived benefits of cooperation, the clarity and distinctiveness of social information, and the personal performance relative to others. Cooperation was inferred from whether colored cells corresponded to high-value targets; social information was evaluated based on the diversity and coverage of rated cells; and personal performance was gauged through participants’ relative ranking within the group.

To better understand how participants used these cues in their decisions, we developed a linear model incorporating the three primary factors. The model not only accurately predicted the observed proportions of each behavioral profile but also provided a quantitative framework for understanding how social cues are integrated under varying conditions. Notably, the model shows that the value and number of colored cells have a stronger influence on participant behavior than their ranking. Higher average values of colored cells and broader coverage across the table are associated with increased cooperation and reduced defection. This suggests that the richness and reliability of shared information may carry more weight than competitive motives. Finally, the model revealed that players’ rankings also shaped their strategic decisions. Those with higher (worse) ranks were more likely to adopt cooperative behaviors, while those with lower (better) ranks tended to shift toward neutral or defector profiles. This suggests that players adjusted their strategies in an attempt to optimize their relative position, potentially moving away from cooperation when their performance was already strong.

We then embedded this model into our behavioral framework, enabling agents to dynamically adapt their behavior in subsequent games based on the cues identified. This integration allowed Mimic agents to update their strategy probabilistically, creating simulations that closely mirrored the dynamics observed in human groups.

Despite its insights, our study has some limitations. The experiments were conducted in a highly controlled digital environment in which bots followed fixed strategies and participants were unaware of their presence. This design was necessary to isolate the stigmergic mechanism of interest, but it does not capture several important features of real platforms, including persistent identities, explicit knowledge that some agents may be artificial, network structure, or adaptive bots capable of strategically distorting credibility. Participants were anonymous and had no long-term identity, which differs from how people behave in persistent digital communities. Prior studies have shown that persistent identities and reputational feedback are crucial for sustaining cooperation and social accountability online [32–34]. Moreover, the bots employed in this study followed static strategies and did not adapt to human behavior. In real platforms, bots may learn, evolve, or coordinate in more complex ways [35, 36]. Future research should investigate how bots interact with network structures, identity cues, or mixed strategies. Another open question is whether bot influence changes when users are explicitly aware that some agents are artificial. Recent findings suggest that it may not: even when participants are explicitly informed that they are interacting with bots, strong herding behavior persists in an online minority game [37]. Finally, while our study highlights the power of stigmergic cues, it does not disentangle how much influence comes from social comparison, norm internalization, or risk aversion. Understanding these mechanisms would improve the ethical deployment of bots and help design digital spaces that promote sustainable cooperation. Our results should thus be understood as establishing a proof-of-principle: even simple, behaviorally consistent artificial traces can be sufficient to reshape human strategic adaptation. Extending this framework to more realistic settings remains an important direction for future research.

In conclusion, our study reveals that even simple bots can shift human behavior, showing that human decision-making is highly sensitive to observed regularities. This has implications for online environments such as social media, recommendation systems, or collaborative tools where algorithmic decisions act as digital traces that influence user behavior. By reinforcing certain patterns, bots can steer human groups toward new equilibria. These results are consistent with theoretical findings showing that even a small number of bots can nudge human groups toward new behavioral equilibria through visible and consistent actions in online environments [19, 20]. When embedded in stigmergic environments, i.e., spaces where all actions leave public traces, bots can shape the future path of collective behavior. This provides a useful framework for designing bots in online communities, but it also calls for strong oversight, transparency, and ethical guidelines, especially when bots are introduced in sensitive domains like politics, education, or finance. Overall, our study opens an exciting path for interdisciplinary research on how human groups can be shaped, for better or worse, by artificial agents embedded in stigmergic systems.

5 Materials and methods

5.1 Experimental procedure

In total, the study involved 185 participants, 70 in the all-human baseline condition and 115 in the hybrid human–bot conditions. Upon entering the experimental room, participants first signed a consent form. They were then informed about the rules of the experiment, the payment conditions, and the guarantee of anonymity. The instructions were delivered both orally and through a short sequence of slides. Screenshots of these slides along with the exact oral script are provided in Supplementary Information. Participants were also instructed to turn off their cell phones. Each participant was then seated in a randomly assigned cubicle, linked to an ID in our database, which prevented any interaction between participants during the experiment (see Figure S1).

Experiments were conducted using a custom-designed interactive web application introduced in [25]. On their computer screens, participants were shown an identical \(15 \times 15\) table of 225 cells, with each cell linked to a hidden value ranging from 0 to 99. During the instruction phase, examples of these tables were provided. The tables used in the experiments were generated by randomly shuffling the same set of values (see Fig. 1E), ensuring that each table contained the same values but arranged differently (see Fig. 1D).

Each game consisted of 20 consecutive rounds. In each round, participants had to visit and rate 3 distinct cells within a recommended time of 20 seconds. If participants exceeded this time, a warning appeared on their screens. A round ended when all participants in the group had visited and rated 3 cells. The colors of the cells in the table were then updated according to a palette of red hues, reflecting the fraction of stars allocated to each cell since the start of the experiment (see Fig. 1F). Participants then moved on to the next round. Each game typically lasted three to four minutes, and each session consisted of 10 successive games.

Participants were told that the goal of the game was to find the cells with the highest values in the table. However, their payout depended on their rank, which was determined by their score, i.e., the sum of the values of the cells they visited. This meant that players had an incentive not only to find high-value cells but also to accumulate the highest possible score. The payout scheme thus fostered competition among participants, motivating them to achieve the highest payout at the end of the session.

The experiments were conducted in two different settings: one in which five humans played together and another in which humans played with four bots. For each experimental condition, there were multiple sessions in which participants played 10 to 12 repetitions of the game.

For the experimental condition in which five humans play together, we conducted a total of 7 sessions, each involving ten participants at the Toulouse School of Economics Experimental Laboratory in January 2023. A total of 70 participants (33 females; 37 males) were recruited with an average age of 20. At the beginning of each session, each participant performed two consecutive games alone. Then, the participants were randomly divided into two groups of five and performed at least 10 five-player games. During each experiment, the two groups explored different tables that changed in each game. At the end of the session, the score over the whole session of the five participants of each group is calculated, and the player ranked first is paid 20 €, the second is paid 15 €, and the three remaining players (ranked 3–5) receive 10 € each. This incentivizes participants to have the highest score in their group, creating competition among them.

For the experiments in which humans played with four bots, sessions consisted of five players. Those experiments were carried out in the Laboratoire de Physique Théorique of the University of Toulouse in May 2022. A total of 115 participants (63 females, 52 males) were recruited with an average age of 26. Each participant could participate in a maximum of two different sessions. The participants were mostly students and researchers from the University of Toulouse. Each participant performed 10 five-player games with four model-controlled bots. To prevent bias in participant behavior resulting from playing with bots instead of other humans, participants were unaware that they were playing with bots and believed they were playing with one another. To achieve this, participants were instructed not to communicate with each other and were unable to view each other’s screens. In addition, despite the independence of each participant’s game, participants were required to wait for each other at the end of each round before moving on to the next. This feature prevents desynchronization of the games, which could cause participants to realize that others were still playing after their game had finished. Thanks to these measures, only a very few participants suspected that something “dodgy” was happening, and none of them expressed a belief that they were playing with bots. At the end of the session, the score over the whole session of the five human participants is calculated, and the player ranked first is paid 20 €, the second is paid 15 €, and the three remaining players (ranked 3–5) receive 10 € each. To ensure fair payments among participants, who were ranked together although they did not play in the same group, all participants in the same experimental session played on the same table, with the same shuffling of values, and against identical types of bots.

The number of participants in each experimental condition is summarized below:

  • 5 Humans (70 participants divided in 14 groups),

  • 1 Human vs. 4 Col – 0 Def bots (10 participants),

  • 1 Human vs. 3 Col – 1 Def bots (15 participants),

  • 1 Human vs. 2 Col – 2 Def bots (15 participants),

  • 1 Human vs. 1 Col – 3 Def bots (15 participants),

  • 1 Human vs. 0 Col – 4 Def bots (10 participants),

  • 1 Human vs. 4 Neu-1 bots (10 participants),

  • 1 Human vs. 4 Neu-3 bots (10 participants),

  • 1 Human vs. 4 Neu-5 bots (10 participants),

  • 1 Human vs. 4 Opt bots (20 participants).

5.2 Model of the visit and rating strategies

The stochastic agent-based model for modeling human behavior is divided into two parts: the agent’s strategy for visiting cells and the strategy for rating the visited cells.

5.2.1 Visit strategy

In the first round (\(t=1\)), agents have no prior information, so they randomly select 3 cells to visit. For subsequent rounds (\(t > 1\)), agents use a different approach. For each of the three cells \(i = 1, 2, 3\) to visit, they can either choose the ith best cell from the previous round, with value \(V_{i}(t-1)\), based on probability \(P^{\text{R}}_{i}(V_{i}(t-1))\), or choose to explore new cells with probability \(1 - P^{\text{R}}_{i}(V_{i}(t-1))\).

The probability \(P^{\text{R}}_{i}(V_{i}(t-1))\) is defined as

$$ P^{\text{R}}_{i}(V_{i}(t-1)) = \left \{ \textstyle\begin{array}{l@{\quad}l} 0 & \text{if } V_{i}(t-1) < a_{i} \\ \displaystyle \frac{V_{i}(t-1) - a_{i}}{99} b_{i} & \text{if } \displaystyle a_{i} \leq V_{i}(t-1) < a_{i} + \frac{99}{b_{i}} \\ 1 & \text{otherwise} \end{array}\displaystyle \right . , $$
(3)

where \(a_{i}\) and \(b_{i}> 0\) are parameters. This implies that an agent will never revisit a cell with a value of \(V_{i}(t-1) < a_{i}\) and will always revisit a cell with a value of \(V_{i}(t-1) > a_{i} + \frac{99}{b_{i}}\) (if this threshold is below the maximum value of 99). Between these thresholds, the probability of revisiting the ith best cell increases linearly from 0 to 1.
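The piecewise form of Eq. (3) can be sketched as follows. The function name and the parameter values in the example are hypothetical; the fitted values of \(a_{i}\) and \(b_{i}\) are those reported in Table 4.

```python
import random


def revisit_probability(v_prev, a, b):
    """Eq. (3): probability of revisiting a cell of value v_prev seen in the previous round."""
    if v_prev < a:
        return 0.0
    # Linear ramp starting at a, capped at 1 once v_prev reaches a + 99 / b.
    return min(1.0, (v_prev - a) / 99.0 * b)


def decides_to_revisit(v_prev, a, b, rng):
    """Draw the binary decision: revisit (True) or explore a new cell (False)."""
    return rng.random() < revisit_probability(v_prev, a, b)


# Hypothetical parameters: never revisit below 50, always revisit from 83 upward.
a, b = 50.0, 3.0
for value in (40, 50, 66.5, 83, 99):
    print(value, revisit_probability(value, a, b))
```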

If agents do not revisit a previously visited cell, they explore other cells. Each cell c is assigned a probability \(P^{\text{E}}(c, t)\) of being selected in round t:

$$ P^{\text{E}}(c, t) = \varepsilon \frac{1}{N} + (1-\varepsilon ) \frac{P^{\alpha}_{c}(t-1)}{\sum _{c'} P_{c'}^{\alpha}(t-1)} , $$
(4)

where \(P_{c}(t-1)\) is the cumulative fraction of stars assigned to cell c up to time \(t-1\), and \(\varepsilon \in (0, 1)\) and \(\alpha > 0\) are parameters. To avoid selecting the same cell multiple times in the same round or revisiting cells from the previous round, a new cell is randomly selected if the first selected cell is unsuitable. In this equation, ε controls the balance between exploring unmarked versus marked cells: a higher ε leads to more random selection, while α determines the preference for highly marked cells. A larger α value results in a stronger preference for highly marked cells, while a smaller α value distributes the selection more evenly among marked cells.
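The exploration rule of Eq. (4) can be sketched as below. The function names and parameter values are hypothetical. Falling back to a uniform choice when no cell has received any star (so that the second term of Eq. (4) is undefined) is our assumption, and the redrawing of unsuitable cells is implemented here as simple rejection sampling.

```python
import random


def exploration_probabilities(star_fractions, epsilon, alpha):
    """Eq. (4): probability of selecting each cell when exploring."""
    n = len(star_fractions)
    weights = [p ** alpha for p in star_fractions]
    total = sum(weights)
    if total == 0:
        # Assumption: with no stars on the table, explore uniformly at random.
        return [1.0 / n] * n
    return [epsilon / n + (1.0 - epsilon) * w / total for w in weights]


def sample_new_cell(probabilities, excluded, rng):
    """Draw a cell index, redrawing while the cell is unsuitable (already chosen or just visited)."""
    cells = list(range(len(probabilities)))
    if len(excluded) >= len(cells):
        raise ValueError("no suitable cell left")
    while True:
        cell = rng.choices(cells, weights=probabilities)[0]
        if cell not in excluded:
            return cell


# Toy table of four cells; the first two have received all the stars so far.
fractions = [0.6, 0.4, 0.0, 0.0]
low_alpha = exploration_probabilities(fractions, epsilon=0.2, alpha=1.0)
high_alpha = exploration_probabilities(fractions, epsilon=0.2, alpha=3.0)
print(low_alpha)
print(high_alpha)
print(sample_new_cell(high_alpha, excluded={0}, rng=random.Random(0)))
```

In this toy case, raising α shifts probability from the second marked cell to the most marked one, while the unmarked cells keep the same floor probability ε/N.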

The functional forms of Eqs. (3) and (4) are versatile enough to capture a wide range of behavior while being defined by only 8 parameters.

5.2.2 Rating strategy

Ratings are assigned by a stochastic process governed by a discrete probability distribution that depends on the value of the cell. This distribution specifies a probability, \(P_{s}(V)\), of assigning an s-star rating (\(s = 0, 1, \ldots, 5\)) to a cell with value V. The rating strategy does not depend on the round, the number of cells already opened in the round, or the color of the cell.

As observed in [25], individuals predominantly rate cells with 0 or 5 stars, while ratings of 1, 2, 3, or 4 stars are less frequent and have similar probabilities. Consequently, in our model, the probabilities for ratings of 1 to 4 stars are equal and determined by imposing the probabilistic normalization condition \(\sum _{s=0}^{5} P_{s}(V)= 1\) for each value of V. Thus, for \(s = 1, 2, 3, 4\) we have:

$$ P_{s}(V) = P_{1234}(V) = \frac{1}{4} (1 - P_{0}(V) - P_{5}(V)) . $$
(5)

For \(s = 0\) and \(s = 5\), the probability \(P_{s}(V)\) can either be modeled by sigmoid-like functions:

$$ P_{s}(V) = c_{s} + d_{s} \tanh \left (\frac{V - e_{s}}{99} f_{s} \right ) , $$
(6)

where \(c_{s}\), \(d_{s} > 0\), \(e_{s}\), and \(f_{s}\) are parameters; or by linear functions:

$$ P_{s}(V) = c'_{s} + f'_{s} \frac{V}{99} , $$
(7)

where \(c'_{s}\) and \(f'_{s}\) are parameters.

The functional forms of Eqs. (6) and (7) are designed to accurately reflect observed rating probabilities while being flexible enough to accommodate a variety of behaviors.
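A minimal Python sketch of the rating process defined by Eqs. (5) to (7) is given below. The function names and the numerical parameter values are hypothetical and are not the fitted values of the model; the check that \(P_{0}(V) + P_{5}(V) \leq 1\) is an added safeguard.

```python
import math
import random


def tanh_form(value, c, d, e, f, v_max=99.0):
    """Eq. (6): sigmoid-like probability of a 0-star or 5-star rating."""
    return c + d * math.tanh((value - e) / v_max * f)


def linear_form(value, c, f, v_max=99.0):
    """Eq. (7): linear alternative for the same probability."""
    return c + f * value / v_max


def rating_distribution(p0, p5):
    """Eq. (5): share the remaining probability equally among 1 to 4 stars."""
    if p0 < 0 or p5 < 0 or p0 + p5 > 1:
        raise ValueError("P_0 and P_5 must be non-negative and sum to at most 1")
    mid = (1.0 - p0 - p5) / 4.0
    return [p0, mid, mid, mid, mid, p5]


def sample_rating(dist, rng):
    """Draw a number of stars (0 to 5) from the distribution."""
    return rng.choices(range(6), weights=dist, k=1)[0]


# Hypothetical parameters for a cell of value 80.
p0 = linear_form(80.0, c=0.4, f=-0.35)
p5 = tanh_form(80.0, c=0.3, d=0.25, e=50.0, f=4.0)
print(sample_rating(rating_distribution(p0, p5), random.Random(0)))
```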

5.2.3 Strategies of the Mimic agents

The Mimic agents, which are designed to replicate human behaviors, are controlled using the model framework described above. The parameters that dictate the agents’ visit strategy are detailed in Table 4, while those governing their rating strategy are listed in Table 5. A visual representation of the bots’ rating strategy is provided in Fig. 7.

The visit strategy for these agents is defined by eight parameters. These parameters were optimized by minimizing the discrepancy between a set of n round-dependent observables, \(O_{1}(t), \ldots , O_{n}(t)\), as measured in the experiments (averaged over all experiments) and the corresponding observables, \(\hat{O}_{1}(t), \ldots , \hat{O}_{n}(t)\), obtained from extensive model simulations (averaging over 1,000,000 numerical experiments for each experimental condition). The error is quantified as follows:

$$ \Delta = \sum _{i=1}^{n} \frac{\sum _{t=1}^{20} (\hat{O}_{i}(t) - O_{i}(t))^{2}}{\sum _{t=1}^{20}{O}_{i}^{2}(t)}. $$
(8)

The round-dependent observables used in this error calculation include (as defined later in Materials and Methods, see Sect. 5.4): \(q(t)\), \(Q(t)\), \(p(t)\), \(P(t)\), \(\mathrm{IPR}(\mathbf{q}(t))\), \(\mathrm{IPR}(\mathbf{Q}(t))\), \(\mathrm{IPR}(\mathbf{p}(t))\), \(\mathrm{IPR}(\mathbf{P}(t))\), \(V_{1}(t)\), \(V_{2}(t)\), \(V_{3}(t)\), \(B_{1}(t)\), \(B_{2}(t)\), and \(B_{3}(t)\). These observables are computed exclusively for the human participants. To illustrate this, in the experiment with five human participants, \(V_{1}(t)\) represents the average value of the highest-valued cell opened by any player in round t, averaged across the five participants. In contrast, in the experiments with one human and four bots, \(V_{1}(t)\) represents the average value of the highest-valued cell opened by the human participant in round t.

To minimize the error Δ, a zero-temperature Monte Carlo method was employed. At each Monte Carlo step, a small random adjustment was made to a randomly chosen parameter. If this adjustment resulted in a decrease in the error Δ, the new parameter value was accepted; otherwise, the previous parameter value was retained. The optimization process continued until the error ceased to decrease. To avoid being trapped in local minima, the Monte Carlo simulations were initiated from several starting points. The parameters selected were those yielding the smallest error. It is worth noting that the final parameters obtained from different low-error Monte Carlo runs produced similar functions characterizing the visit strategy (see Eqs. (3) and (4)).
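The optimization procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the stopping rule is a fixed step budget rather than a convergence test, the perturbation is drawn uniformly in a fixed interval, and a toy quadratic error stands in for the error obtained from model simulations.

```python
import random


def relative_error(simulated, observed):
    """Eq. (8): one list of per-round values for each observable."""
    return sum(
        sum((s - o) ** 2 for s, o in zip(sim, obs)) / sum(o ** 2 for o in obs)
        for sim, obs in zip(simulated, observed)
    )


def zero_temperature_mc(error_fn, start, step, n_steps, rng):
    """Perturb one random parameter per step; accept only if the error decreases."""
    params = list(start)
    best = error_fn(params)
    for _ in range(n_steps):
        i = rng.randrange(len(params))
        trial = list(params)
        trial[i] += rng.uniform(-step, step)  # assumed perturbation scheme
        err = error_fn(trial)
        if err < best:
            params, best = trial, err
    return params, best


def multi_start(error_fn, starts, step, n_steps, seed=0):
    """Run the search from several starting points and keep the lowest error."""
    rng = random.Random(seed)
    runs = [zero_temperature_mc(error_fn, s, step, n_steps, rng) for s in starts]
    return min(runs, key=lambda run: run[1])


# Toy error with a single minimum at (1, -2), standing in for Eq. (8).
toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
print(multi_start(toy, [[0.0, 0.0], [5.0, 5.0]], step=0.1, n_steps=5000))
```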

5.2.4 Strategies of the model-controlled bots

The bots used in the experiments are controlled by the model described above. The specific parameters that define their visit strategy are listed in Table 1, while the parameters for their rating strategy are provided in Table 2. Additionally, the rating strategy of the bots is visually represented in Fig. 10.

The collaborator and defector bots emulate the behavior of humans in games involving five human participants. These bots were derived from preliminary experiments conducted in 2017. The three neutral bots, Neu-1, Neu-3, and Neu-5, employ a visit strategy identical to that of the collaborator and defector bots, making their visit behavior comparable to that of humans. Their rating strategy offers three variations of a neutral rating, always assigning 1, 3, or 5 stars to a visited cell. Finally, the optimized bots have been designed to maximize their scores while playing in groups of five identical agents (see Opt-1 agents in [25]). They explore the table until they identify high-value cells, at which point they cease further exploration and repeatedly revisit these identified high-value cells. The rating strategy employed by these bots involves rating only cells with values greater than 50. Consequently, only a very limited number of cells are rated during the game.

5.3 Linear model for predicting individual behavioral profiles

The linear regression model employed to predict the behavioral profile of each individual in the game across various experimental conditions utilizes three quantifiers: \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and rank. These quantifiers are used to represent the three potential cues influencing participants’ behavior.

To ensure consistency in the model, standardized data are employed. A standardized quantity is indicated with a hat: \(\hat{X} = (X - \mu ) / \sigma \), where μ is the mean and σ is the standard deviation of X over the experimental data. Each standardized quantity therefore has zero mean and unit standard deviation.

Let \(C_{\text{exp}}\), \(N_{\text{exp}}\), and \(D_{\text{exp}}\) denote the fractions of humans exhibiting collaborator, neutral, and defector behaviors observed in a given experimental condition, respectively. Similarly, \(C_{\text{pred}}\), \(N_{\text{pred}}\), and \(D_{\text{pred}}\) represent the predicted fractions. A feature vector \(\hat{\mathbf{x}}\) with components \(\hat{x}_{i}\), where \(i = 1, 2, \ldots, f\), contains f standardized features or quantifiers that are expected to explain the data. In this context, the features are \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and rank, with f ranging from one to three.

The linear regression model is defined as follows:

$$ \textstyle\begin{cases} \hat{C}_{\text{pred}} = \displaystyle \sum _{i=1}^{f} c_{i} \hat{x}_{i}, \\ \hat{D}_{\text{pred}} = \displaystyle \sum _{i=1}^{f} d_{i} \hat{x}_{i}, \\ N_{\text{pred}} = 1 - C_{\text{pred}} - D_{\text{pred}}, \end{cases} $$
(9)

where \(c_{i}\) and \(d_{i}\) are regression parameters for \(i = 1, 2, \ldots, f\).

These parameters are obtained by fitting the model predictions to the data by minimizing the error E defined as:

$$ E = \sqrt{ \frac{\displaystyle \sum _{s} \left ( (C_{\text{exp}} - C_{\text{pred}})^{2} + (N_{\text{exp}} - N_{\text{pred}})^{2} + (D_{\text{exp}} - D_{\text{pred}})^{2} \right )}{\displaystyle \sum _{s} \left ( C_{\text{exp}}^{2} + N_{\text{exp}}^{2} + D_{\text{exp}}^{2} \right )}}, $$
(10)

where

$$ \textstyle\begin{cases} C_{\text{pred}} = \mu _{C} + \sigma _{C} \displaystyle \sum _{i=1}^{f} c_{i} \hat{x}_{i}, \\ D_{\text{pred}} = \mu _{D} + \sigma _{D} \displaystyle \sum _{i=1}^{f} d_{i} \hat{x}_{i}, \\ N_{\text{pred}} = 1 - C_{\text{pred}} - D_{\text{pred}}. \end{cases} $$
(11)

Due to the symmetric nature of the error in C, D, and N, linear regressions on any two of these variables (C and D, C and N, or D and N) would yield the same predictor.

In this study, ten distinct and independent experimental conditions are considered, with two independent variables C and D (since \(N = 1 - C - D\)), resulting in twenty independent measurements to be explained by the linear regression model. The number of unknown parameters is equal to two times the number of features, which ranges from one to three, depending on the number of cues used as features among \(P(20)\), \(\mathrm{IPR}(\mathbf{P}(20))\), and rank.
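To make Eqs. (10) and (11) concrete, the following Python sketch computes the predicted fractions and the relative error E for given regression parameters. The function names and the numbers in the example are hypothetical (they are not experimental data), and the use of the population standard deviation in the standardization is an assumption.

```python
import math


def standardize(values):
    """Return standardized values with the mean and standard deviation used."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values], mu, sigma


def predict_fractions(x_hat, c, d, mu_c, sigma_c, mu_d, sigma_d):
    """Eq. (11): predicted (C, N, D) for one condition with features x_hat."""
    c_pred = mu_c + sigma_c * sum(ci * xi for ci, xi in zip(c, x_hat))
    d_pred = mu_d + sigma_d * sum(di * xi for di, xi in zip(d, x_hat))
    return c_pred, 1.0 - c_pred - d_pred, d_pred


def relative_rms_error(experimental, predicted):
    """Eq. (10): each argument is a list of (C, N, D) triples, one per condition."""
    num = sum(
        (e - p) ** 2
        for exp, pred in zip(experimental, predicted)
        for e, p in zip(exp, pred)
    )
    den = sum(e ** 2 for exp in experimental for e in exp)
    return math.sqrt(num / den)


# Hypothetical single-feature example (f = 1) for one condition.
pred = predict_fractions([1.2], c=[0.8], d=[-0.6],
                         mu_c=0.5, sigma_c=0.1, mu_d=0.2, sigma_d=0.05)
print(pred, relative_rms_error([(0.6, 0.25, 0.15)], [pred]))
```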

5.4 Definition of the observables

All observables discussed here are derived from four fundamental vectors (vectors are shown in boldface): \(\mathbf{q}(t)\), \(\mathbf{Q}(t)\), \(\mathbf{p}(t)\), and \(\mathbf{P}(t)\). We define \(q_{c}(t)\) as the fraction of visits a cell c receives at round t. The collection of \(q_{c}(t)\) for all cells c forms a vector \(\mathbf{q}(t)\) of size 225. Another relevant vector is \(\mathbf{Q}(t)\), which represents the cumulative fraction of visits \(Q_{c}(t)\) attributed to each cell from the start to round t. Similarly, \(\mathbf{p}(t)\) and \(\mathbf{P}(t)\) are vectors whose components \(p_{c}(t)\) and \(P_{c}(t)\) denote the fraction of stars given to each cell in round t and up to round t, respectively.

In experiments with five humans playing together, these vectors represent the fraction of cells visited and stars rated by the five group members. However, in experiments with humans and bots, these vectors represent the fractions of cells visited and stars rated by the human participant (and not the bots). This allows us to specifically characterize the human behavior. For the linear regression model, the cues \(\mathbf{P}(20)\) and \(\mathrm{IPR}(\mathbf{P}(20))\) are calculated for all players (humans and bots), since the information displayed in the game is the collective one.

We define the normalized average of the visited cells at round t as \(q(t) = \sum _{c} q_{c}(t) V_{c} \times 3 / (V_{\text{max}_{1}} + V_{ \text{max}_{2}} + V_{\text{max}_{3}})\), where V is the vector of cell values \(V_{c}\), and \(V_{\text{max}_{1}}\), \(V_{\text{max}_{2}}\), and \(V_{\text{max}_{3}}\) are the three largest cell values. This normalization ensures that \(q(t) = 1\) represents optimal performance, where each individual visits the three best cells at round t. Similarly, we define \(Q(t)\), which cumulates all visits up to round t, using the same formula with \(q_{c}(t)\) replaced by \(Q_{c}(t)\). Thus, \(q(t)\) and \(Q(t)\) measure the instantaneous and cumulative exploration behavior with respect to cell values. A high \(Q(t)\) value indicates effective exploration focused on high-value cells, while a low \(Q(t)\) indicates broader or less efficient exploration.

Likewise, based on the definitions of \(\mathbf{p}(t)\) and \(\mathbf{P}(t)\), we define the average value of the visited cells, weighted by their ratings (fraction of stars) at round t: \(p(t) = \sum _{c} p_{c}(t) V_{c} / V_{\text{max}_{1}}\), where \(V_{\text{max}_{1}}=99\) is the highest cell value. In general, \(p(t) \leq 1\), while \(p(t) = 1\) would mean that the only evaluated cell would be the one with a value of 99 at round t. Similarly, we define the cumulative quantity \(P(t) = \sum _{c} P_{c}(t) V_{c} / V_{\text{max}_{1}}\), which represents the average value of the cells visited by the participants up to round t, weighted by their ratings. Thus, \(p(t)\) and \(P(t)\) measure the instantaneous and cumulative distribution of stars with respect to cell values. A high \(P(t)\) value (especially in the final round, \(t=20\)) indicates that participants concentrated their ratings on high-value cells, while a low \(P(t)\) indicates deceptive behavior, with high ratings given to low-value cells, as seen in defectors.
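The two normalized averages can be computed as in the following Python sketch, in which the function names and the five-cell toy table are hypothetical; the same functions apply to the cumulative vectors \(\mathbf{Q}(t)\) and \(\mathbf{P}(t)\).

```python
def visit_score(q, values):
    """q(t) or Q(t): value-weighted visit fraction, equal to 1 for the three best cells."""
    top3 = sorted(values, reverse=True)[:3]
    return 3.0 * sum(qc * vc for qc, vc in zip(q, values)) / sum(top3)


def rating_score(p, values):
    """p(t) or P(t): value-weighted star fraction, normalized by the best cell value."""
    return sum(pc * vc for pc, vc in zip(p, values)) / max(values)


# Toy table of five cells instead of 225.
values = [99.0, 90.0, 80.0, 10.0, 0.0]
print(visit_score([1 / 3, 1 / 3, 1 / 3, 0.0, 0.0], values))  # optimal visits
print(rating_score([0.0, 0.0, 0.0, 0.5, 0.5], values))       # stars on poor cells
```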

To measure the exploration behavior, we introduce the inverse participation ratio (IPR) of the vectors \(\mathbf{q}(t)\), \(\mathbf{Q}(t)\), \(\mathbf{p}(t)\), and \(\mathbf{P}(t)\). For a given probability distribution \(\mathbf{X} = \{X_{c}\}\), the IPR is defined as \(\mathrm{IPR}(\mathbf{X}) = 1 / \sum _{c} X_{c}^{2}\) and characterizes the spread of the distribution of X. Thus, for the four vectors considered, the IPR quantifies the effective number of cells on which visits or ratings are concentrated at round t or up to round t. If a probability vector X is evenly distributed over n cells among N, then \(X_{c} = 1/n\) for those cells (and 0 otherwise), and \(\mathrm{IPR}(\mathbf{X}) = 1/[n \times (1/n)^{2}] = n\), indicating that the IPR measures the effective number of cells over which a probability distribution is spread.
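A short Python sketch of this definition follows; the function name is hypothetical, and the example reproduces the uniform case discussed above with \(n = 4\) cells among \(N = 10\).

```python
def inverse_participation_ratio(x):
    """IPR(X) = 1 / sum_c X_c^2 for a probability vector X."""
    total = sum(xc * xc for xc in x)
    if total == 0:
        raise ValueError("the vector must have at least one nonzero component")
    return 1.0 / total


uniform_on_four = [0.25] * 4 + [0.0] * 6
print(inverse_participation_ratio(uniform_on_four))
```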

5.5 Behavioral profiles

Human participants and bots are both classified into three behavioral profiles based on their cell rating patterns. To achieve this classification, the mean rating given by each individual to cells of a specific value V is fitted with a linear function of the cell value V: \(u_{0} + u_{1} \times 5V / 99\). In this context, \(u_{0}\) represents the intercept, and \(u_{1}\) represents the slope. A strict linear rating of cells from value 0 to 99, with corresponding ratings from 0 to 5 stars, would have \(u_{0}=0\) and \(u_{1}=1\). Individuals are then classified into three behavioral profiles: collaborator, neutral, and defector, using two thresholds: \(u_{\text{def-neu}} = -0.5\) and \(u_{\text{neu-col}} = 0.5\) (see [25]).

The three behavioral profiles are defined as follows:

  • Collaborator: Individuals with \(u_{1} \geq u_{\text{neu-col}}\) rate cells with a rating that increases with the cell values. They assign low ratings to low-value cells and high ratings to high-value cells, thus helping their group members identify the best cells.

  • Neutral: Individuals with \(u_{\text{def-neu}} \leq u_{1} < u_{\text{neu-col}}\) give a very similar rating to every cell regardless of its value. While they do not provide distinctive ratings, most neutral individuals contribute to group success by revisiting high-value cells, thereby making them darker and easier to identify for others.

  • Defector: Individuals with \(u_{1} < u_{\text{def-neu}}\) rate the cells in the opposite way to collaborators. They give low ratings to the high-value cells and high ratings to the low-value ones. This behavior is interpreted as an attempt to mislead other group members by obscuring the best cells with low ratings and highlighting poor cells with high ratings.
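The classification can be sketched in Python as follows. The fitting procedure is not specified above, so ordinary least squares is assumed here; the function names and the example ratings are hypothetical.

```python
def fit_rating_line(values, mean_ratings, v_max=99.0):
    """Least-squares fit of mean rating = u0 + u1 * 5V / v_max; returns (u0, u1)."""
    xs = [5.0 * v / v_max for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(mean_ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("at least two distinct cell values are required")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, mean_ratings))
    u1 = sxy / sxx
    return mean_y - u1 * mean_x, u1


def classify(u1, u_def_neu=-0.5, u_neu_col=0.5):
    """Map the slope u1 to a behavioral profile using the two thresholds."""
    if u1 >= u_neu_col:
        return "collaborator"
    if u1 >= u_def_neu:
        return "neutral"
    return "defector"


# Hypothetical individual who gives high ratings to low-value cells.
u0, u1 = fit_rating_line([0.0, 33.0, 66.0, 99.0], [5.0, 4.0, 1.5, 0.0])
print(u0, u1, classify(u1))
```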

5.6 Computation of error bars

Error bars for the experimentally measured observables, corresponding to a confidence level of 68%, were determined using the bootstrap method. The bootstrap is a Monte Carlo technique that assesses the properties of statistical parameters from an unknown probability distribution by performing repeated random sampling with replacement from a dataset [38]. The process begins by generating M artificial sets of N experiments by drawing N samples with replacement from the original dataset. Consequently, some experiments may appear multiple times within an artificial set, while others may not appear at all. This method allows for the computation of a given observable on each artificial set, ultimately yielding a distribution from which confidence intervals can be derived.

In our case, in the experimental condition in which five humans play together, the independent experiments are the ten games played by a group of 5 individuals; therefore, we have \(N = 14\) experiments. In the experimental conditions in which humans play with bots, the independent experiments are the ten games played by one human with four bots, and we have between \(N = 10\) and \(N = 20\) experiments depending on the condition. In every condition, we used \(M = 10{,}000\) artificial sets to generate bootstrap distributions.
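The bootstrap procedure can be sketched in Python as follows. The function name and the sample values are hypothetical, and the percentile rule used to extract the 68% interval from the bootstrap distribution is an assumption, since the text does not specify how the interval is derived.

```python
import random


def bootstrap_interval(samples, statistic, n_sets=10_000, level=0.68, seed=0):
    """Percentile bootstrap interval for `statistic` over independent experiments."""
    rng = random.Random(seed)
    n = len(samples)
    stats = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])  # N draws with replacement
        for _ in range(n_sets)
    )
    lower = stats[int((1.0 - level) / 2.0 * (n_sets - 1))]
    upper = stats[int((1.0 + level) / 2.0 * (n_sets - 1))]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical observable measured once in each of N = 14 experiments.
data = [float(k) for k in range(1, 15)]
print(bootstrap_interval(data, mean))
```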

To obtain reliable results from the numerical simulations of the model, the data are averaged over 1,000,000 runs. This process ensures that the error bars are negligible on the scale of the presented graphs.