
Methodological implications of sample size and extinction gradient on the robustness of fear conditioning across different analytic strategies

Abstract

Fear conditioning paradigms are critical to understanding anxiety-related disorders, but studies use an inconsistent array of methods to quantify the same underlying learning process. We previously demonstrated that selection of trials from different stages of experimental phases and inconsistent use of averaged compared to trial-by-trial analysis can deliver significantly divergent outcomes, regardless of whether the data is analysed with extinction as a single effect, as a learning process over the course of the experiment, or in relation to acquisition learning. Since small sample sizes are frequently cited as a source of poor replicability in psychological science, in this study we aimed to investigate whether changes in sample size influence the divergences that occur when different kinds of fear conditioning analyses are used. We analysed a large data set of fear acquisition and extinction learning (N = 379), measured via skin conductance responses (SCRs), which was resampled with replacement to create a wide range of bootstrapped databases (N = 30, N = 60, N = 120, N = 180, N = 240, N = 360, N = 480, N = 600, N = 720, N = 840, N = 960, N = 1080, N = 1200, N = 1500, N = 1750, N = 2000) and tested whether use of different analyses continued to produce deviating outcomes. We found that sample size did not significantly influence the effects of inconsistent analytic strategy when no group-level effect was included, but found strategy-dependent effects when group-level effects were simulated. These findings suggest that confounds incurred by inconsistent analyses remain stable in the face of sample size variation, but only under specific circumstances, with overall robustness strongly hinging on the relationship between experimental design and choice of analyses. This supports the view that such variations reflect a more fundamental confound in psychological science—the measurement of a single process by multiple methods.

Introduction

Fear conditioning paradigms are critical to understanding and improving treatment for several psychiatric disorders, including post-traumatic stress disorder (PTSD) and anxiety [1, 2]. Fear extinction occurs when a previously conditioned fear stimulus (conditioned stimulus, CS+) is repeatedly presented without aversive reinforcement, causing new safety information to compete with pre-existing fear memory [3–5]. Patients with anxiety-related disorders show deficits in extinction learning, which is believed to facilitate disease progression and maintenance [6, 7]. The rate of an individual’s fear extinction learning can be estimated by the decrease in threat response to the unreinforced CS+ when compared to the safety signal (CS-), typically indexed via various physiological measures [8], with skin conductance responses (SCRs) being most commonly used. Extinction learning has been subject to extensive research on its neurobiological basis [9–19], and serves as a highly informative framework for investigating pharmacological and psychological adjuncts to exposure therapy for PTSD, and deficits associated with treatment outcomes [20–27].

The replicability crisis has inspired a growing movement dedicated to improving the quality of research practices in psychological science [28–32]. These issues of replicability extend to research on human fear conditioning [8]. Importantly, inconsistent research practices in fear conditioning might explain the contradictory and null outcomes identified across recent large-scale studies and meta-analyses [7, 33–36]. These limitations have been identified for several methodological domains including, but not limited to, study design [37, 38], pre-processing of psychophysiological data [39–42], and statistical analysis strategies [43–46]. It is increasingly clear that issues such as these undermine the replicability of fear conditioning research, and the subsequent translation of experimental findings to clinical outcomes.

Our previous report [54] was concerned with the effect of analytic strategy on robustness. Simply put, ‘robustness’ in psychology refers to the ability of a result to remain consistent across multiple arbitrary statistical specifications [28]. In our case, the arbitrary specifications were those associated with inconsistency in analytical strategies, and we demonstrated stark divergence of effect sizes when different statistical methods were used to index extinction [47]. Specifically, a large data set was resampled to create 40 data sets of N = 60 rows with three groups per sample. Different statistical strategies, all intending to measure extinction, were compared against each other across the 40 data sets, but varied with respect to the number of trials included, the stages of the phases the trials were drawn from, and whether the data was analysed trial-by-trial or averaged. We tested the effect of these variations on the robustness of studies that compared acquisition learning to extinction learning, change in responding during extinction (e.g., early to late extinction), and extinction treated as a single effect estimate. We showed that the rank order of these strategies varied significantly depending on the data set, which illustrates less than desirable robustness of these statistical tests [47]. However, solutions to the issue of inconsistent analytic strategy remain unexplored.

In the current study, we aimed to investigate one plausible solution—increased sample size. Increasing sample size increases the power of a study—that is, the ability to detect a specific effect size within a sample. It has been observed that many fear conditioning studies may be underpowered due to small samples [39], and it is possible that improving the precision of physiological measures through more advanced pre-processing could be sufficient to improve the robustness of fear conditioning and extinction outcomes [39–41, 48]. By increasing power, we increase the probability of detecting the effect, and it is possible that heterogeneity of outcomes can be caused by underpowered studies that do not accurately capture this effect. However, heterogeneous statistical analyses have been reported to produce misleading or false results independent of power considerations [28, 49].

To investigate whether larger samples could address the analytical issue we previously identified, we bootstrapped data from existing data sets, obtaining rank orderings of previously used statistical methods for indexing fear acquisition and extinction [47]. In our previous study, each resampled data set had a sample size of N = 60 rows, broken into three groups during analysis. Here, we resampled from our real data (N = 379) with replacement to create bootstrapped samples of N = 30, 60, 120, 180, 240, 360, 480, 600, 720, 840, 960, 1080, 1200, 1500, 1750, and 2000 observations, with each row being equivalent to one subject. These numbers were chosen to cover a broad range of plausible sample sizes used in human fear conditioning research. We performed two experiments—in the first, group allocation was randomised, and no group-level effect was anticipated. In the second, we added a group-level effect to our bootstrapped data. The group-level effect was varied across three conditions, which were roughly based on [50]: one group with high responding during acquisition and rapid extinction, another group with lower responding during acquisition and rapid extinction, and a third group that did not extinguish the CS+ response. This work represents a significant contribution above our previous study because (a) we created much larger samples spanning a wide range of simulated sample sizes; and (b) we tested the results of this and our previous work against the presence of a simulated group-level effect. Testing our hypotheses with the inclusion of simulated group-level effects is a significant contribution because most fear conditioning studies will observe group differences and our original analyses were likely not representative of these studies; hence, the effect of heterogeneity of analytical methods in studies with group effects is unknown. In this extension of our previous work, we therefore aimed to identify possible boundary conditions of an originally bleak report of the robustness of statistical analysis pipelines for fear conditioning research.

We hypothesised that larger samples would not improve the robustness of rank ordering between analytic strategies in either condition, because we believe that the issue of analytical heterogeneity is a fundamental violation of replicability that cannot be solved by increasing power alone. We also hypothesised that the type of simulated effect would alter the robustness of different statistical strategies, because some strategies are used to examine different stages of learning during fear conditioning tasks.

Methods

The current manuscript uses secondary data analysis strategies on existing datasets, and did not require further ethical approval. The original studies received ethical approval from the University of Tasmania Social Sciences Human Research Ethics Committee. The fear acquisition and extinction procedures, as well as the data set, for this study are identical to those of our previous study [47]. Briefly, six data sets gathered over seven years were resampled with replacement to form new samples. Participants reported no significant physical illnesses, no history of head trauma or loss of consciousness, no current or significant historical use of illicit substances, and no heavy alcohol use or dependence. Of the 379 participants included in this dataset, N = 51 (13.46%) had a diagnosis of PTSD (clinician diagnosed) or had a score above 40 on the PCL-IV or above 30 on the PCL-5 [51, 52]. No other psychiatric diagnoses were permitted in any of the studies. PTSD cases were retained in the sample in order to remain consistent with the previous study [47]. Since the predictor variables in these studies are the analytical strategies themselves, it is unlikely that systematic variability in participant characteristics would affect results [47].

Fear conditioning paradigm and equipment

As in our previous report [47], data was obtained from five trials of acquisition learning and ten trials of extinction learning (split into early and late extinction phases of five trials each, which were separated by an instruction screen) across a total of 379 participants across the six studies. Acquisition and extinction phases were also separated by an instruction screen, which in all cases read “In the following phase, you may or may not receive shocks. Please press any key to continue”. For each trial, a CS+ (a coloured circle) and a CS- (a different coloured circle) were presented on a computer screen for 12s with intertrial intervals of 12-21s (M = 16s). In all studies, skin conductance was recorded from the first and third fingers of the left hand in micro-Siemens (μS) using a 22 mVrms, 75 Hz constant-voltage coupler (ADInstruments). A stimulus isolator (ADInstruments) was placed on the right hand and delivered a 500ms electric shock immediately following the CS+ offset during acquisition learning. No shocks were delivered during the extinction learning phase. Skin conductance responses (SCRs) were scored using a custom-coded peak scoring method that subtracts the average skin conductance level 2s prior to CS onset from the peak conductance occurring 0.9-5s following CS onset, which scores the first-interval response; it should be noted that studies differ in how they score skin conductance responding [53]. A bidirectional Butterworth filter was applied to the raw SCR trace to reduce noise.
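The scoring rule above can be expressed compactly in code. The following R sketch is illustrative only: the object names (sc_trace, raw_trace, cs_onset), the sampling rate, and the filter settings are assumptions rather than the authors' actual custom script.

# Illustrative trough-to-peak scoring of a first-interval SCR (assumed names
# and sampling rate; not the authors' custom scoring script).
library(signal)

score_scr <- function(sc_trace, cs_onset, sample_rate = 1000) {
  # Baseline: average skin conductance level over the 2 s before CS onset
  baseline <- mean(sc_trace[(cs_onset - 2 * sample_rate):(cs_onset - 1)])
  # Response window: 0.9-5 s after CS onset (first-interval response)
  peak <- max(sc_trace[(cs_onset + 0.9 * sample_rate):(cs_onset + 5 * sample_rate)])
  peak - baseline   # SCR in microSiemens (peak minus pre-stimulus baseline)
}

# Bidirectional (zero-phase) Butterworth filtering to reduce noise;
# the filter order and cutoff here are placeholders, not the published settings
bf <- butter(2, 0.05)
filtered_trace <- filtfilt(bf, raw_trace)
scr <- score_scr(filtered_trace, cs_onset = 30000)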

Resampling procedure

Data was bootstrapped (i.e., resampled with replacement) using rows of participant data [54]. Using bootstrapping, it is possible to validate the accuracy of statistical techniques across a range of sample sizes, and this has been done in previous literature assessing the effect of sample size on correlation, factor analysis, principal components analysis, prognostic modelling, and other statistical techniques [55–58]. Data was resampled by row such that all CS+ or CS- responses from a particular phase (e.g., acquisition) were resampled together. New data sets of N = 30, N = 60, N = 120, N = 180, N = 240, N = 360, N = 480, N = 600, N = 720, N = 840, N = 960, N = 1080, N = 1200, N = 1500, N = 1750, and N = 2000 rows were created, with a ‘Group’ variable allocating each row equally but randomly to group 1, 2, or 3. Therefore, no group-level effects were expected in this analysis. Sample sizes were chosen to cover a wide range of possible study power in the simulated datasets. These sample sizes were determined arbitrarily due to current debate concerning accurate power determination of fear conditioning research using skin conductance responding [39]. Three groups were used because in our field of research (PTSD) it is typical to examine a PTSD group against both a trauma-exposed control and a non-trauma-exposed control group [59].
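As a concrete illustration, the sketch below shows one way the row-wise bootstrap and random three-group allocation could be implemented in R; scr_data (a data frame of per-participant response rows) and its layout are assumptions, not the actual analysis script.

# Sketch of the row-wise bootstrap with random three-group allocation
set.seed(1)
sample_sizes <- c(30, 60, 120, 180, 240, 360, 480, 600, 720, 840,
                  960, 1080, 1200, 1500, 1750, 2000)

bootstrap_sample <- function(scr_data, n) {
  # Resample whole rows with replacement so that all CS+ or CS- responses
  # from a phase stay together for each resampled "participant"
  boot <- scr_data[sample(nrow(scr_data), size = n, replace = TRUE), ]
  # Equal but random allocation to groups 1, 2, and 3
  boot$Group <- sample(rep(1:3, length.out = n))
  boot
}

boot_sets <- lapply(sample_sizes, function(n) bootstrap_sample(scr_data, n))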

For the second experiment, scores were modified for the third Group upon bootstrapping such that a higher but gradually decreasing CS+ response (relative to the CS- response) was expected in each phase. Data was produced that resembled the three fear conditioning trajectories reported by [50]. These trajectories were replicated in our own clinical fear conditioning data (manuscript in preparation), and group-level simulated effects were created in the data from the current report based on the difference between each of the three trajectories and our bootstrapped data that did not have a simulated group-level effect. The modifications to produce the simulated effects are described below. Scores for Group 3’s CS+ were modified to be 1, 0.8, 0.6, 0.4 and 0 standard deviations higher than their bootstrapped values during trials 1–5 of acquisition; 2, 1.5, 1, 0.8, and 0.5 standard deviations higher than their bootstrapped values during trials 1–5 of early extinction; and 1, 0.8, 0.5, 0.2, and 0 standard deviations higher than their bootstrapped values during trials 1–5 of late extinction. Two other distinct group-level effects were simulated for Group 3: CS+ scores were modified to be 0, 0.3, 0.3, 0.3, and 0.3 standard deviations higher during acquisition, 1, 1, 1, 1, and 1 standard deviations higher during early extinction, and 1.5, 1, 0.5, 0.1, and 0 standard deviations higher during late extinction than the average data, for Group 3a; and 0, 0, 0.3, 0.3, and 0.3 standard deviations higher during acquisition, 2, 1.5, 1, 0.5, and 0.3 standard deviations higher during early extinction, and 1.5, 1, 0.5, 0.1, and 0 standard deviations higher during late extinction than the average data, for Group 3b. These simulated effects were achieved by adding the same value (e.g., 2 standard deviations above the mean score for trial 1) to all scores individually within that group. Therefore, all analyses in this study were conducted three times, with Group 3 consisting of one of the three sets of simulated effects. An illustration of an example of this data is provided in Fig 1. Simulated effects were roughly based on the findings of Galatzer-Levy et al. (2017) [56], who identified three distinct trajectories during acquisition and extinction phases in fear conditioning data. In Fig 1, Group 3 is the trajectory that shows high differential acquisition and rapid extinction, Group 3a is the trajectory that shows moderate differential acquisition and rapid extinction, and Group 3b is the group that does not show extinction.
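To make the trial-wise modification concrete, the sketch below shows how the Group 3 offsets listed above could be applied; the column naming scheme and the use of sd() for trial-wise scaling are assumptions about the implementation.

# Sketch of applying the simulated group-level effect for Group 3: trial-wise
# offsets (in standard deviations) added to every Group 3 CS+ score.
# Column names such as 'acq_cs_plus_1' are assumed, not the actual variables.
offsets <- list(
  acq  = c(1, 0.8, 0.6, 0.4, 0),     # acquisition trials 1-5
  eext = c(2, 1.5, 1, 0.8, 0.5),     # early extinction trials 1-5
  lext = c(1, 0.8, 0.5, 0.2, 0)      # late extinction trials 1-5
)

add_group_effect <- function(boot, offsets) {
  g3 <- boot$Group == 3
  for (phase in names(offsets)) {
    for (trial in 1:5) {
      col <- paste0(phase, "_cs_plus_", trial)
      # The same value (offset x trial SD) is added to all Group 3 scores
      boot[g3, col] <- boot[g3, col] + offsets[[phase]][trial] * sd(boot[[col]])
    }
  }
  boot
}

boot_with_effect <- add_group_effect(boot_sets[[1]], offsets)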

Fig 1. Example of simulated group-level effects in bootstrapped data (N = 960).

SCR = Skin conductance response. Groups 3-3b have simulated effects of differing gradients to reflect possible differences in physiological expression of acquisition and extinction between participants and between studies. Error bars are 95% Confidence Intervals.

https://doi.org/10.1371/journal.pone.0268814.g001

Types of analytical strategies included in comparisons

Further analyses were conducted using base R. Analytic strategies were identical to those used previously [47] and are summarised in Table 1. As described previously, some strategies averaged trials or subtracted CS- from CS+ scores, whereas others did not. These details are described in Table 1. The goal of these strategies was to either (1) determine the change in SCRs from acquisition to extinction learning (ACQ-EXT); (2) determine a static measure of extinction learning (EXT); or (3) determine the change in SCRs across the extinction learning phase (EXT-EXT). Since the goals of the strategies differed in these ways, we divided strategies into each of these categories and compared outcomes only within each category.

Table 1. Description of different strategies for measuring extinction learning using skin conductance responses (Ney et al., 2020).

https://doi.org/10.1371/journal.pone.0268814.t001

Data analysis

For each strategy, we compared the highest-order group-level interaction via its computed partial eta squared (ηp2) effect size. For each sample size, bootstrapped (1,000 times) Kendall non-parametric rank order correlation coefficients (Tb) and associated 95% bootstrapped confidence intervals were computed between analytical strategies of each of the three categories, based on the ηp2 effect sizes generated. Therefore, each sample size (e.g., 30 “participants”) was resampled 100 times to generate a rank order (Tb) of ηp2 across the different analysis strategies, and this procedure was bootstrapped 1,000 times to generate mean Tb and 95% confidence intervals. The mean Tb and its associated confidence intervals were the average correlation between one strategy and each of the other strategies separately (e.g., creating three mean Tb values for Strategy 1 of the acquisition—extinction category). This entire procedure was completed using a custom R script that is available from the authors upon request. The data was compiled and is reported in the Supplementary Material up until N = 960. Data beyond this size is not reported because of the excessive amount of data that would need to be presented in the manuscript and because the results at N > 960 were almost identical to those obtained at N = 960. Using the average Tb effect size of each strategy, we tested whether the rank order coefficients improved with increased sample size using Pearson’s coefficient (r). This was completed for both the first (no group-level effect) and second (simulated group-level effect) experiments.
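A condensed sketch of this rank-order robustness computation is given below. It assumes a hypothetical helper, apply_strategy(), that fits one analytic strategy to one bootstrapped dataset and returns the partial eta squared of its highest-order group interaction; the resample and bootstrap counts follow the description above, and cor(..., method = "kendall") stands in for Kendall's Tb.

# Sketch: effect sizes across 100 resampled datasets, then bootstrapped
# Kendall rank correlations between each pair of strategies.
robustness_for_n <- function(n, strategies, n_datasets = 100, n_boot = 1000) {
  # Effect sizes (partial eta squared) for every strategy in each resample
  es <- replicate(n_datasets, {
    boot <- bootstrap_sample(scr_data, n)
    sapply(strategies, function(s) apply_strategy(s, boot))
  })                                        # matrix: strategies x datasets

  # Bootstrapped Kendall correlation between two strategies' effect sizes
  pair_tb <- function(i, j) {
    tb <- replicate(n_boot, {
      idx <- sample(n_datasets, replace = TRUE)
      cor(es[i, idx], es[j, idx], method = "kendall")
    })
    c(mean_tb = mean(tb), quantile(tb, c(0.025, 0.975)))   # mean and 95% CI
  }

  # All pairwise comparisons within this category of strategies
  combn(length(strategies), 2, function(p) pair_tb(p[1], p[2]))
}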

During data compilation, it was evident that there were large decreases in effect sizes with increased sample size (p < .001). As an exploratory analysis, effect sizes, averaged within each category of analytical strategy, were compared across sample sizes using Pearson’s correlations (r). To ensure that the effects observed in this exploratory test were not due to variability caused by our resampling process (where CS+ or CS- scores for each participant from only one phase were resampled), we resampled using the full data from each participant to create data sets of N = 30, N = 60, N = 120, and N = 240 rows, with three equally sized groups randomly allocated amongst these rows. Samples were not created that were larger than the number of actual participants, to avoid repeating participant data in the same sample. Again, the ηp2 effect sizes from each category of analytical strategy were averaged and compared across sample size.
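The exploratory correlation between average effect size and sample size could be computed along the following lines; results (an assumed long-format summary with columns n, category, and eta_p_sq) is illustrative, not the actual compiled object.

# Sketch: correlate mean partial eta squared with sample size within each
# category of analytic strategy.
by(results, results$category, function(d) {
  agg <- aggregate(eta_p_sq ~ n, data = d, FUN = mean)   # mean effect size per N
  cor.test(agg$n, agg$eta_p_sq, method = "pearson")      # Pearson's r and p-value
})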

Results

The overall data from the original sample (N = 379) is reported in S1 Fig. The main index that was used as an outcome in the present study was the rank order of effect sizes produced by different analytical (i.e., statistical) approaches when applied to the same dataset. To ensure that this result was robust, datasets were bootstrapped so that the analysis was repeated many times. If a low rank order effect is produced, this implies that application of different analytical approaches to the same datasets produces inconsistent effect sizes relative to the other approaches. A high rank order effect suggests that application of different approaches to the same datasets produces consistent effect sizes relative to the other approaches, which implies robustness. To assess the robustness of each analytical method within each bootstrapped dataset, Kendall’s rank correlation coefficient values (Tb) and corresponding 95% confidence intervals were computed for each of the three sets of analyses with sample size set to N = 30, N = 60, N = 120, N = 180, N = 240, N = 360, N = 480, N = 600, N = 720, N = 840, N = 960, N = 1080, N = 1200, N = 1500, N = 1750, and N = 2000 rows. Complete statistics from an exemplar of these analyses are reported in S1–S42 Tables and are summarised in Fig 2. We also entered the rank order for each analytical strategy compared to every other strategy into Pearson correlation models across each sample size. This data is visualised in Fig 2 and reported in Table 2.

Fig 2. Effect of sample size on average Kendall’s rank order effect size (Τb) between statistical strategies attempting to elicit the same construct from different data sets.

Higher Τb implies higher robustness. The top panel shows data without a simulated group-level effect; the second panel simulates a rapid decrease in differential conditioning during acquisition; the third panel simulates a gradual decrease in differential conditioning during acquisition; and the fourth panel simulates no change in differential conditioning during acquisition or early extinction.

https://doi.org/10.1371/journal.pone.0268814.g002

Table 2. Pearson’s correlation coefficient and significance of the relationship between sample size and rank order between different statistical strategies used to index static extinction (EXT), change in extinction (EXT-EXT) and acquisition to extinction (ACQ-EXT) during fear learning paradigms.

https://doi.org/10.1371/journal.pone.0268814.t002

Overall, findings for the non-simulated effect datasets are congruent with our previous findings [54], which were obtained with a sample size of N = 60. There were no significant trends in the no-effect data (summarised in Table 2), which suggests that increasing sample size did not resolve the robustness problems caused by variability in analytical strategies used to assess similar constructs in the same data.

ACQ-EXT

Strategies 1 and 3, which compared acquisition to extinction, produced high correlative values across sample sizes, whereas Strategies 2 and 4 were not similar to any other Strategies (S22–S28 Tables). This replicates our findings from our previous report at N = 60.

When a group-level effect was simulated, however, these results changed. Only Strategies 1 and 3 showed positive but increasingly weak correlations in the acquisition–extinction category with increasing sample size (Fig 2 and S1–S7 Tables), whereas combinations of other strategies were increasingly significantly and negatively correlated with increased sample sizes, meaning that they estimated fear responding in opposite directions to one another and that this pattern worsened with larger samples (Fig 2 and Table 2).

EXT

Some of the correlations between the static extinction strategies that had been supported in our previous study were not supported in the data without simulated group-level effects (S29–S35 Tables). These were mainly the correlations between Strategies 1, 4 and 5, which were not supported in the new data sets but had been correlated in our previous report. Correlations between Strategies 1 and 3; 2, 6 and 7; and 5, 3, and 4 of the static extinction strategies continued to be supported (S29–S35 Tables and Fig 2).

There were very few supported correlations (i.e., high Tb values) among the static extinction Strategies when group effects were simulated; primarily, the correlations between Strategies 1 and 3 were supported, but some of these improved with increased sample size (Fig 2, Table 2, and S8–S14 Tables). Correlations between Strategies 1, 3, and 5 for static extinction improved significantly with increased sample size, whereas correlations between Strategies 2, 4, and 5 were significantly negatively correlated with increasing sample size. Other combinations of strategies showed no change with increasing sample size (Fig 2 and Table 2).

EXT-EXT

At higher sample sizes, some of the significant correlations from our earlier study [47] within the early-late extinction strategies were no longer significant, though this did not follow a particularly consistent pattern (S40–S42 Tables). In all cases, Strategies 3 and 4 of the early-late extinction category continued to be correlated (Fig 2 and S36–S42 Tables), but this did not improve with higher sample size (Table 2).

Strategies 2 and 4, 3 and 4, as well as 2 and 3 of early-late extinction showed moderate-to-high evidence of correlation that improved logarithmically when group-level effects were simulated (Fig 2 and S15–S21 Tables). Unexpectedly, these results were not substantially affected by the type of simulated effect (Fig 2). However, after correcting for multiple comparisons using a False Discovery Rate of Q = .1, only correlations involving Strategy 1 from early-late extinction improved with increased sample size.

Sample size and average effect sizes

During data compilation, we noticed large decreases in effect sizes with increased sample size. As an exploratory analysis, we correlated the average effect size (ηp2) from each category of analytical strategies with sample size. The average effect size for each set of analyses decreased significantly as a function of sample size (all p < .001), as shown in Fig 3A. Effect sizes for all three types of analyses decreased at a similar rate. This effect was replicated when the data was resampled from full participant rows (i.e., in real data, Fig 3B).

Fig 3. Average effect size decreased as sample size increased for all types of analyses (p < .001).

Panel A is the correlation using data resampled by phase-wise responses; Panel B is the correlation using data resampled from full participant rows. Error bars are 95% Confidence Intervals.

https://doi.org/10.1371/journal.pone.0268814.g003

Discussion

In this study we investigated whether the decreased robustness that arises from inconsistent analytic strategy [54] could be remedied by increased sample sizes. To do so, we tested whether greater sample sizes improved the robustness of outcomes, indexed by lower divergence of results obtained across varied analytical strategies. Robustness did not improve when sample size was increased for any of the strategies in our analysis when no effect was simulated. However, in contrast to our hypothesis, a simulated effect resulted in several changes in robustness, particularly within strategies that examined extinction as a single index. The kind of effect that was simulated (in terms of the gradient of fear responding across trials) did not substantially affect these results. These findings have several implications for the study design and statistical analysis of fear extinction via SCRs.

Our previous study provided evidence that heterogeneity in the analytical strategies used to assess fear extinction can reduce the robustness of effects when tested across different data sets [47]. This problem has been reported in other fields, such as human neuroimaging [76, 77], and high flexibility in data analysis is an established cause of increased false positives [49]. It is possible, however, that some types of strategies produce more robust results than others. The current findings support several assertions that we made in our previous paper in this regard. First, studies that examine change in SCRs from acquisition to extinction will show varying robustness depending on which sections of acquisition and extinction are used, but robustness does not seem to be affected by small variations in the number of trials or by the use of averaged compared to maximal values. Similarly, analysing extinction on a trial-by-trial basis is inconsistent with strategies that average across trials, but both strategies are internally consistent regardless of the number of trials included, or whether differential responses (CS+ > CS-) are calculated. Finally, we found mixed evidence that the number of trials and the use of differential responding affect the robustness of strategies examining change during extinction, which were associations that we had previously identified as having moderate support [47].

The main aim of the current study was to understand whether improving power, by increasing sample size, would improve the robustness that is affected by heterogeneity of analytical strategies. As we had anticipated, the limitations imposed by varied analytic strategies hold even when applied to samples of greater size, but this appeared to apply only when no effect was present in the data. While this supports the validity of our prior study [54], it also challenges our previous findings in several ways. Firstly, in the data containing simulated group-level effects, some strategies improved markedly in terms of robustness as sample size increased. However, these cases were contradicted by several other strategies that showed weaker robustness with increasing sample size. Most importantly, not all strategies showed these patterns, and marked improvement in robustness in the static-extinction strategies was primarily observed at the higher sample sizes, which most research groups would not have the capacity to collect. Improving the robustness of strategies examining changes in fear responding from early to late extinction might be achievable by increasing sample size to an amount that is viable with respect to research resources. Critically, these results were not substantially affected by alteration of the gradient of the group-level effect. This implies that an improved set of data analysis strategies for fear extinction data could be applied robustly across the fear extinction phenotypes that were recently identified in [50]. These findings provide critical boundary conditions and caveats to our previous findings, and we strongly emphasise that not all analytical approaches that we highlighted as problematic (or robust) in our previous report will be applicable to all real-world samples. Critically, by extension, our current data also demonstrate that a statistical approach that seems unrobust in this study could be robust if the underlying effect is different.

This research is important because sample considerations are frequently the first criticism raised against experimental psychology research, and it supports the notion that further methodological innovations are required to enhance fear extinction research, beyond simply increasing study power [43]. Several research groups have begun moving towards Bayesian inference in fear extinction [78–81], and computational modelling has also been explored in assessing physiological responses to fear conditioning [45]. It is possible that these contemporary statistical frameworks may offer solutions to the deficits imposed by heterogeneous analytic strategies in extinction research. However, further research is needed to explore this as a viable alternative to conventional data analysis, particularly in terms of accessibility to non-statisticians.

While compiling the data in the current study, we observed strong effects of lower sample size resulting in higher effect sizes. It is likely that the reduced effect size we observed with increasing sample size reflects increasing precision of the effect size estimate, which is reflected in the increasingly narrow confidence intervals. The tendency for smaller samples to reach significance only with higher effect sizes is intrinsically wedded to the parameters of power analysis in null-hypothesis significance testing (NHST) [82]. In a simple case, when performing an a-priori power analysis (to determine an appropriate sample size), specifying a higher r (i.e., effect size) and a significance criterion (α) of p < 0.05 will result in a generally lower N, all things being equal [83]. The propensity for studies with small sample sizes to inflate effect sizes is well documented [84–86]. This is sometimes attributed to publication bias [85], but in the context of the current study, higher variability in our smaller samples is a likely cause, as indicated by the 95% confidence intervals. Our findings suggest that these issues are likely to be prevalent until a minimum of n = 40 participants per group is reached for a 3-group design (this threshold may vary depending on the number of groups). However, it has been reported that this estimate is improved by advanced SCR scoring methods [39]. Interestingly, the inflection point of logarithmic improvement in robustness for some of the strategies occurred at this same sample size, raising the possibility that there may be some relationship between adequately powered data and the propensity for certain strategies (mainly the early-late extinction strategies) to perform robustly. Relatedly, power analyses using single point estimates from previous fear conditioning studies are likely problematic, given that the heterogeneity of experimental parameters and the effect sizes chosen by researchers affect power calculations. Instead, it might be more useful to estimate the expected variability and build a power analysis based on the precision of the anticipated effect size (i.e., the effect size’s confidence interval) [87].
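As an illustration of this point only (the effect sizes here are arbitrary examples, not values from the present study), an a-priori calculation with the R ‘pwr’ package shows how assuming a larger correlation-type effect size yields a smaller required N at α = .05 and 80% power.

# Illustrative a-priori power calculations: larger assumed r, smaller required N
library(pwr)
pwr.r.test(r = 0.5, sig.level = 0.05, power = 0.80)  # n of roughly 29
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80)  # n of roughly 85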

Considering this finding, as well as the overall results of the current report, we suggest two implications for the enhancement of robustness in fear extinction research via SCR. First, in line with our prior report [47], it is critical that a specific analytic strategy is implemented only when the experimenter seeks to measure a specific aspect of fear extinction, one that corresponds clearly to the strategy in question. For instance, some of the analytic strategies identified in this and the prior study [54] can credibly be used to measure distinct aspects of extinction learning. For example, taking the difference between early and late extinction responses might represent a principled measure of extinction learning per se, while subtracting mean extinction responses from mean acquisition responses could represent something quite different, albeit equally worthy of investigation. Critically, if these different strategies are used, it is incumbent on the experimenter to interpret the results consistently. Labelling all different strategies under a homogenised term (i.e., ‘extinction learning’) could otherwise incur costs to robustness and, ultimately, failures to replicate. Similarly, it is important that standardised methods for comparing extinction between groups relative to acquisition learning are developed, because there is significant heterogeneity in current methods that do this [46], yet some relative estimation is essential given that the effects observed during extinction are often contingent on responses during acquisition.

Second, this study illustrates that the pervasive issue of measuring one construct via a diverse array of analyses remains an issue even in the face of some methodological changes, in this case sample size. An implication of this is that other methodological changes may also be unable to ameliorate this effect, but more critically, that future research should strive to find ways to analyse extinction learning that circumvent the effect altogether. In other words, analysing data in different ways will almost always lead to different outcomes, and reduced robustness or replicability. Therefore, rather than finding ways to homogenise different analytic strategies as a path forward, ongoing work could seek to characterise extinction via more principled quantitative approaches. It is critical to consider that fear acquisition and extinction are multifaceted processes that cannot be captured by a single parameter. In many cases, researchers will make different statistical decisions based on the type of learning process that they are interested in—for example, analysing data trial-by-trial may assess the rate of learning, whereas comparing mean responses during extinction to acquisition might assess someone’s relative performance between phases. One way of addressing the propensity for different studies to use different types of analyses is to use multiverse approaches. Multiverse analysis is an approach that assesses a statistical problem with multiple analytical methods [88]. In fear conditioning, multiverse packages have been written for R [89] and can potentially address the issues highlighted within this paper directly, by increasing the transparency of statistical decision making as well as the relative importance of a reported result [53]. In this way, not only does multiverse analysis reduce the potential for p-hacking, but it also facilitates comparison between studies that may otherwise have analysed their results in incomparable ways. Similarly, based on this and recent data, it is almost certain that different experimental designs (e.g., number of trials, induction of uncertainty via instructions, etc.) will produce different outcomes that may not be readily comparable between studies. We are aware of current work aiming to produce ‘typical’ fear conditioning experiments that may help to standardise the field, but in the meantime it is also possible that further investigation of the relationship between specific statistical analyses and experimental designs may help to improve the comparability of findings between fear conditioning studies.
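A minimal multiverse-style sketch in base R is given below; it does not reproduce the API of any particular package [88, 89], and run_spec() is a hypothetical helper that fits the model implied by one combination of analytic decisions and returns its effect size.

# Sketch of a multiverse-style analysis: run every combination of analytic
# decisions on the same data and report the spread of effect sizes, rather
# than a single value from one arbitrarily chosen pipeline.
specs <- expand.grid(
  trials      = c("all", "first_half", "last_half"),
  response    = c("cs_plus_only", "differential"),
  aggregation = c("trial_by_trial", "averaged"),
  stringsAsFactors = FALSE
)

# 'run_spec()' is a hypothetical helper returning partial eta squared for the
# effect of interest under one specification
specs$eta_p_sq <- apply(specs, 1, function(s) run_spec(s, scr_data))

summary(specs$eta_p_sq)   # distribution of results across the multiverse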

The current study is primarily limited by the possibility that our findings may not generalise to other fear extinction designs. For instance, we used a relatively low number of trials and long-duration stimuli (12 s), which is not the case for many studies. Further, these results may not be transferable to different data pre-processing methods and will need to be checked independently by groups that use those methods. One issue that we did not explicitly examine was the effect of the number of trials on statistical outcomes; however, it is probable that the number of trials included in a study presents another significant source of heterogeneity that, when analysed using similar methods, may reduce robustness. Our experimental phases were all separated by brief instruction screens, including between early and late extinction learning, and this detail may have affected the patterns observed in our results. Additionally, our sample included a small proportion of PTSD participants, though this was done to replicate our previous study [47]. While we do not anticipate that this would affect our primary outcome, some variability in the bootstrapped samples may have been due to participant characteristics such as this. Next, we only simulated one type of potential group-level effect in our data, and this may have resulted in some strategies showing greater or lesser robustness, depending on the aim of the strategy. Therefore, we cannot be prescriptive concerning which strategy may perform best with group-level effects; however, it is relevant to note that a model that best describes extinction has not been formalised, and thus it is unknown what group-level extinction data should look like. Finally, there may be many more analytical strategies in the literature that were not included in the present paper. These strategies could alter the robustness between strategies reported here. The strategies reported here were identical to those identified in the previous paper, which were based on highly cited examples; hence, it is possible that different analytical strategies are reported in less cited studies.

In conclusion, we found that larger sample size does not improve the robustness of fear extinction results assessed across heterogeneous analytical strategies when no effect is simulated, but does alter robustness under some circumstances when an effect is simulated. We also report that smaller sample sizes (less than N = 120, or n = 40 per group) result in inflated effect sizes, in both simulated and original data. Although this issue is not unique to fear extinction, formal identification of it may encourage better-powered studies and more progressive methods in the future. Future studies should examine how the robustness of fear extinction analyses can be improved and ensure that studies are adequately powered such that effect sizes are not artificially inflated.

Supporting information

S1 Table. Conditioning—Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s001

(DOCX)

S2 Table. Conditioning—Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s002

(DOCX)

S3 Table. Conditioning—Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s003

(DOCX)

S4 Table. Conditioning—Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s004

(DOCX)

S5 Table. Conditioning—Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s005

(DOCX)

S6 Table. Conditioning—Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s006

(DOCX)

S7 Table. Conditioning—Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s007

(DOCX)

S8 Table. Static Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s008

(DOCX)

S9 Table. Static Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s009

(DOCX)

S10 Table. Static Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s010

(DOCX)

S11 Table. Static Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s011

(DOCX)

S12 Table. Static Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s012

(DOCX)

S13 Table. Static Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s013

(DOCX)

S14 Table. Static Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s014

(DOCX)

S15 Table. Early—Late Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s015

(DOCX)

S16 Table. Early—Late Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s016

(DOCX)

S17 Table. Early—Late Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s017

(DOCX)

S18 Table. Early—Late Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s018

(DOCX)

S19 Table. Early—Late Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s019

(DOCX)

S20 Table. Early—Late Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s020

(DOCX)

S21 Table. Early—Late Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between effect-simulated datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s021

(DOCX)

S22 Table. Conditioning—Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s022

(DOCX)

S23 Table. Conditioning—Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s023

(DOCX)

S24 Table. Conditioning—Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s024

(DOCX)

S25 Table. Conditioning—Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s025

(DOCX)

S26 Table. Conditioning—Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s026

(DOCX)

S27 Table. Conditioning—Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s027

(DOCX)

S28 Table. Conditioning—Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes from Conditioning to extinction learning phases estimated.

https://doi.org/10.1371/journal.pone.0268814.s028

(DOCX)

S29 Table. Static Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s029

(DOCX)

S30 Table. Static Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s030

(DOCX)

S31 Table. Static Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s031

(DOCX)

S32 Table. Static Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s032

(DOCX)

S33 Table. Static Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s033

(DOCX)

S34 Table. Static Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s034

(DOCX)

S35 Table. Static Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between datasets with a static extinction learning efficacy estimated.

https://doi.org/10.1371/journal.pone.0268814.s035

(DOCX)

S36 Table. Early—Late Extinction, N = 30.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s036

(DOCX)

S37 Table. Early—Late Extinction, N = 60.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s037

(DOCX)

S38 Table. Early—Late Extinction, N = 120.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s038

(DOCX)

S39 Table. Early—Late Extinction, N = 240.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s039

(DOCX)

S40 Table. Early—Late Extinction, N = 480.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s040

(DOCX)

S41 Table. Early—Late Extinction, N = 720.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s041

(DOCX)

S42 Table. Early—Late Extinction, N = 960.

Strategy comparisons using Kendall rank correlation coefficient between datasets with changes during extinction learning estimated.

https://doi.org/10.1371/journal.pone.0268814.s042

(DOCX)

S1 Fig. Overall responding in the real data set.

https://doi.org/10.1371/journal.pone.0268814.s043

(TIF)

References

  1. 1. Craske M.G., et al., Treatment for anxiety disorders: Efficacy to effectiveness to implementation. Behaviour research and therapy, 2009. 47(11): p. 931–937. pmid:19632667
  2. 2. Lebois L.A.M., et al., Augmentation of extinction and inhibitory learning in anxiety and trauma-related disorders. Annual review of clinical psychology, 2019. 15: p. 257–284. pmid:30698994
  3. 3. Bouton M.E., Context and behavioral processes in extinction. Learn Mem, 2004. 11(5): p. 485–94. pmid:15466298
  4. 4. Bouton M.E., Context, ambiguity, and unlearning: sources of relapse after behavioral extinction. Biol Psychiatry, 2002. 52(10): p. 976–86. pmid:12437938
  5. 5. Kalisch R., et al., Context-dependent human extinction memory is mediated by a ventromedial prefrontal and hippocampal network. The Journal of neuroscience: the official journal of the Society for Neuroscience, 2006. 26(37): p. 9503–9511. pmid:16971534
  6. 6. Zuj D.V., et al., The centrality of fear extinction in linking risk factors to PTSD: A narrative review. Neurosci Biobehav Rev, 2016. 69: p. 15–35. pmid:27461912
  7. 7. Duits P., et al., Updated meta-analysis of classical fear conditioning in the anxiety disorders. Depress Anxiety, 2015. 32(4): p. 239–53. pmid:25703487
  8. 8. Lonsdorf T.B., et al., Don’t fear ’fear conditioning’: Methodological considerations for the design and analysis of studies on human fear acquisition, extinction, and return of fear. Neurosci Biobehav Rev, 2017. 77: p. 247–285. pmid:28263758
  9. 9. Ney L.J., et al., Dopamine, endocannabinoids and their interaction in fear extinction and negative affect in PTSD. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 2021. 105: p. 110118. pmid:32991952
  10. 10. Abraham A.D., Neve K.A., and Lattal K.M., Dopamine and extinction: a convergence of theory with fear and reward circuitry. Neurobiol Learn Mem, 2014. 108: p. 65–77. pmid:24269353
  11. 11. Hill M.N., et al., Integrating Endocannabinoid Signaling and Cannabinoids into the Biology and Treatment of Posttraumatic Stress Disorder. Neuropsychopharmacology, 2018. 43(1): p. 80–102. pmid:28745306
  12. 12. Ney L.J., et al., Modulation of the endocannabinoid system by sex hormones: Implications for Posttraumatic Stress Disorder. Neurosci Biobehav Rev, 2018. 94: p. 302–320. pmid:30017748
  13. 13. Lebron-Milad K., Graham B.M., and Milad M.R., Low Estradiol Levels: A Vulnerability Factor for the Development of Posttraumatic Stress Disorder Biological Psychiatry, 2012. 72: p. 6–7. pmid:22682395
  14. Gogos A., et al., Sex differences in schizophrenia, bipolar disorder and PTSD: Are gonadal hormones the link? British Journal of Pharmacology, 2019. 176(21): p. 4119–4135. pmid:30658014
  15. Merz C.J., et al., Neural Underpinnings of Cortisol Effects on Fear Extinction. Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology, 2018. 43(2): p. 384–392. pmid:28948980
  16. Stockhorst U. and Antov M.I., Modulation of Fear Extinction by Stress, Stress Hormones and Estradiol: A Review. Front Behav Neurosci, 2015. 9: p. 359. pmid:26858616
  17. Zuj D.V., et al., Endogenous cortisol reactivity moderates the relationship between fear inhibition to safety signals and posttraumatic stress disorder symptoms. Psychoneuroendocrinology, 2017. 78: p. 14–21. pmid:28135580
  18. Ney L.J., et al., BDNF genotype Val66Met interacts with acute plasma BDNF levels to predict fear extinction and recall. Behaviour Research and Therapy, 2021. 145: p. 103942. pmid:34340176
  19. Ney L.J., et al., Translation of animal endocannabinoid models of PTSD mechanisms to humans: Where to next? Neuroscience & Biobehavioral Reviews, 2022. 132: p. 76–91.
  20. Graham B.M., Callaghan B.L., and Richardson R., Bridging the gap: Lessons we have learnt from the merging of psychology and psychiatry for the optimisation of treatments for emotional disorders. Behav Res Ther, 2014. 62: p. 3–16. pmid:25115195
  21. Zuj D.V. and Norrholm S.D., The clinical applications and practical relevance of human conditioning paradigms for posttraumatic stress disorder. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 2019. 88: p. 339–351. pmid:30134147
  22. Milad M.R. and Quirk G.J., Fear extinction as a model for translational neuroscience: ten years of progress. Annu Rev Psychol, 2012. 63: p. 129–51. pmid:22129456
  23. Lange I., et al., Neural responses during extinction learning predict exposure therapy outcome in phobia: results from a randomized-controlled trial. Neuropsychopharmacology, 2020. 45(3): p. 534–541. pmid:31352467
  24. Fullana M.A., et al., Human fear conditioning: From neuroscience to the clinic. Behaviour Research and Therapy, 2020. 124: p. 103528. pmid:31835072
  25. Picó-Pérez M., et al., Common and distinct neural correlates of fear extinction and cognitive reappraisal: A meta-analysis of fMRI studies. Neuroscience & Biobehavioral Reviews, 2019. 104: p. 102–115. pmid:31278951
  26. Scheveneels S., et al., The validity of laboratory-based treatment research: Bridging the gap between fear extinction and exposure treatment. Behav Res Ther, 2016. 86: p. 87–94. pmid:27590839
  27. Vervliet B., Craske M.G., and Hermans D., Fear extinction and relapse: state of the art. Annu Rev Clin Psychol, 2013. 9: p. 215–48. pmid:23537484
  28. Simmons J.P., Nelson L.D., and Simonsohn U., False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci, 2011. 22(11): p. 1359–66. pmid:22006061
  29. Wagenmakers E.J., et al., Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). J Pers Soc Psychol, 2011. 100(3): p. 426–32. pmid:21280965
  30. Koul A., Becchio C., and Cavallo A., Cross-Validation Approaches for Replicability in Psychology. Frontiers in Psychology, 2018. 9(1117).
  31. Wingen T., Berkessel J.B., and Englich B., No Replication, No Trust? How Low Replicability Influences Trust in Psychology. Social Psychological and Personality Science, 2019. 11(4): p. 454–463.
  32. Rabeyron T., Why Most Research Findings About Psi Are False: The Replicability Crisis, the Psi Paradox and the Myth of Sisyphus. Frontiers in Psychology, 2020. 11(2468). pmid:33041926
  33. Beckers T., et al., What’s wrong with fear conditioning? Biol Psychol, 2013. 92(1): p. 90–6. pmid:22223096
  34. Pöhlchen D., et al., No robust differences in fear conditioning between patients with fear-related disorders and healthy controls. Behaviour Research and Therapy, 2020. 129: p. 103610. pmid:32302820
  35. Abend R., et al., Anticipatory Threat Responding: Associations With Anxiety, Development, and Brain Structure. Biological Psychiatry, 2020. 87(10): p. 916–925. pmid:31955915
  36. Vervliet B. and Boddez Y., Memories of 100 years of human fear conditioning research and expectations for its future. Behav Res Ther, 2020. 135: p. 103732. pmid:33007544
  37. Ryan K.M., et al., The need for standards in the design of differential fear conditioning and extinction experiments in youth: A systematic review and recommendations for research on anxiety. Behaviour Research and Therapy, 2019. 112: p. 42–62. pmid:30502721
  38. Melinscak F. and Bach D.R., Computational optimization of associative learning experiments. PLOS Computational Biology, 2020. 16(1): p. e1007593. pmid:31905214
  39. Bach D. and Melinscak F., Psychophysiological modelling and the measurement of fear conditioning. Behaviour Research and Therapy, 2020. 127: p. 103576. pmid:32087391
  40. Benedek M. and Kaernbach C., Decomposition of skin conductance data by means of nonnegative deconvolution. Psychophysiology, 2010. 47(4): p. 647–58. pmid:20230512
  41. Green S.R., et al., Development and validation of an unsupervised scoring system (Autonomate) for skin conductance response analysis. Int J Psychophysiol, 2014. 91(3): p. 186–93. pmid:24184342
  42. Jentsch V.L., Wolf O.T., and Merz C.J., Temporal dynamics of conditioned skin conductance and pupillary responses during fear acquisition and extinction. International Journal of Psychophysiology, 2020. 147: p. 93–99. pmid:31760105
  43. Ney L.J., et al., Critical evaluation of current data analysis strategies for psychophysiological measures of fear conditioning and extinction in humans. International Journal of Psychophysiology, 2018. 134: p. 95–107. pmid:30393110
  44. Krypotos A.M. and Engelhard I.M., Testing a novelty-based extinction procedure for the reduction of conditioned avoidance. J Behav Ther Exp Psychiatry, 2018. 60: p. 22–28. pmid:29486371
  45. Tzovara A., Korn C.W., and Bach D., Human Pavlovian fear conditioning conforms to probabilistic learning. PLOS Computational Biology, 2018. 14(8): p. e1006243. pmid:30169519
  46. Lonsdorf T.B., Merz C.J., and Fullana M.A., Fear extinction retention: Is it what we think it is? Biological Psychiatry, 2019. 85(12): p. 1074–1082. pmid:31005240
  47. Ney L.J., et al., Inconsistent analytic strategies reduce robustness in fear extinction via skin conductance response. Psychophysiology, 2020. 57(11): p. e13650. pmid:32748977
  48. Bach D.R., et al., Dynamic causal modeling of spontaneous fluctuations in skin conductance. Psychophysiology, 2011. 48(2): p. 252–7. pmid:20557485
  49. Ioannidis J.P., Why most published research findings are false. PLoS Med, 2005. 2(8): p. e124. pmid:16060722
  50. Galatzer-Levy I.R., et al., Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD. Transl Psychiatry, 2017. 7(3): p. e0. pmid:28323285
  51. Weathers F., et al., The PTSD Checklist (PCL): Reliability, Validity, and Diagnostic Utility, in Annual Convention of the International Society for Traumatic Stress Studies. 1993: San Antonio, TX.
  52. Weathers F., et al., The PTSD Checklist for DSM-5 (PCL-5)—Standard [Measurement instrument]. 2013.
  53. Sjouwerman R., et al., A data multiverse analysis investigating non-model based SCR quantification approaches. 2021.
  54. Johnson R.W., An Introduction to the Bootstrap. Teaching Statistics, 2001. 23(2).
  55. Mundfrom D.J., Shaw D.G., and Ke T.L., Minimum Sample Size Recommendations for Conducting Factor Analyses. International Journal of Testing, 2005. 5(2): p. 159–168.
  56. Collins G.S., Ogundimu E.O., and Altman D.G., Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Statistics in Medicine, 2016. 35(2): p. 214–226. pmid:26553135
  57. Kocovsky P.M., Adams J.V., and Bronte C.R., The Effect of Sample Size on the Stability of Principal Components Analysis of Truss-Based Fish Morphometrics. Transactions of the American Fisheries Society, 2009. 138(3): p. 487–496.
  58. Schönbrodt F.D. and Perugini M., At what sample size do correlations stabilize? Journal of Research in Personality, 2013. 47(5): p. 609–612.
  59. Ney L.J., et al., Cannabinoid polymorphisms interact with plasma endocannabinoid levels to predict fear extinction learning. Depress Anxiety, 2021. pmid:34151472
  60. Graham B.M. and Milad M.R., Blockade of estrogen by hormonal contraceptives impairs fear extinction in female rats and women. Biol Psychiatry, 2013. 73(4): p. 371–8. pmid:23158459
  61. Milad M.R., et al., The influence of gonadal hormones on conditioned fear extinction in healthy humans. Neuroscience, 2010. 168(3): p. 652–8. pmid:20412837
  62. White E.C. and Graham B.M., Estradiol levels in women predict skin conductance response but not valence and expectancy ratings in conditioned fear extinction. Neurobiol Learn Mem, 2016. 134 Pt B: p. 339–48.
  63. Grady A.K., et al., Effect of continuous and partial reinforcement on the acquisition and extinction of human conditioned fear. Behav Neurosci, 2016. 130(1): p. 36–43. pmid:26692449
  64. Milad M.R., et al., Neurobiological basis of failure to recall extinction memory in posttraumatic stress disorder. Biol Psychiatry, 2009. 66(12): p. 1075–82. pmid:19748076
  65. Zuj D.V., et al., Impaired fear extinction associated with PTSD increases with hours-since-waking. Depress Anxiety, 2016. 33(3): p. 203–10. pmid:26744059
  66. Garfinkel S.N., et al., Impaired contextual modulation of memories in PTSD: an fMRI and psychophysiological study of extinction retention and fear renewal. The Journal of Neuroscience: the official journal of the Society for Neuroscience, 2014. 34(40): p. 13435–13443. pmid:25274821
  67. Schiller D., et al., Preventing the return of fear in humans using reconsolidation update mechanisms. Nature, 2010. 463(7277): p. 49–53. pmid:20010606
  68. Milad M.R., et al., Presence and acquired origin of reduced recall for fear extinction in PTSD: results of a twin study. Journal of Psychiatric Research, 2008. 42(7): p. 515–520. pmid:18313695
  69. Milad M.R., et al., Fear conditioning and extinction: influence of sex and menstrual cycle in healthy humans. Behav Neurosci, 2006. 120(6): p. 1196–203. pmid:17201462
  70. Blechert J., et al., Fear conditioning in posttraumatic stress disorder: evidence for delayed extinction of autonomic, experiential, and behavioural responses. Behav Res Ther, 2007. 45(9): p. 2019–33. pmid:17442266
  71. Michael T., et al., Fear conditioning in panic disorder: Enhanced resistance to extinction. J Abnorm Psychol, 2007. 116(3): p. 612–7. pmid:17696717
  72. Phelps E.A., et al., Extinction learning in humans: role of the amygdala and vmPFC. Neuron, 2004. 43(6): p. 897–905. pmid:15363399
  73. Milad M.R., et al., Deficits in Conditioned Fear Extinction in Obsessive-Compulsive Disorder and Neurobiological Changes in the Fear Circuit. JAMA Psychiatry, 2013. 70(6): p. 608–618. pmid:23740049
  74. Soliman F., et al., A genetic variant BDNF polymorphism alters extinction learning in both mouse and human. Science (New York, N.Y.), 2010. 327(5967): p. 863–866. pmid:20075215
  75. Zeidan M.A., et al., Estradiol modulates medial prefrontal cortex and amygdala activity during fear extinction in women and female rats. Biol Psychiatry, 2011. 70(10): p. 920–7. pmid:21762880
  76. Carp J., The secret lives of experiments: Methods reporting in the fMRI literature. NeuroImage, 2012. 63(1): p. 289–300. pmid:22796459
  77. Botvinik-Nezer R., et al., Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 2020. 582(7810): p. 84–88. pmid:32483374
  78. Sjouwerman R. and Lonsdorf T.B., Experimental boundary conditions of reinstatement-induced return of fear in humans: Is reinstatement in humans what we think it is? Psychophysiology, 2020. 57(5): p. e13549. pmid:32072648
  79. Krypotos A.M., et al., A Primer on Bayesian Analysis for Experimental Psychopathologists. J Exp Psychopathol, 2017. 8(2): p. 140–157. pmid:28748068
  80. Krypotos A.M., Klugkist I., and Engelhard I.M., Bayesian hypothesis testing for human threat conditioning research: an introduction and the condir R package. Eur J Psychotraumatol, 2017. 8(sup1): p. 1314782. pmid:29038683
  81. Cameron G., Schlund M.W., and Dymond S., Generalization of socially transmitted and instructed avoidance. Frontiers in Behavioral Neuroscience, 2015. 9(159). pmid:26150773
  82. Hoenig J.M. and Heisey D.M., The Abuse of Power. The American Statistician, 2001. 55(1): p. 19–24.
  83. Cohen J., Statistical Power Analysis. Current Directions in Psychological Science, 1992. 1(3): p. 98–101.
  84. Button K.S., et al., Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 2013. 14(5): p. 365–376. pmid:23571845
  85. Kühberger A., Fritz A., and Scherndl T., Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size. PLOS ONE, 2014. 9(9): p. e105825. pmid:25192357
  86. Hackshaw A., Small studies: strengths and limitations. European Respiratory Journal, 2008. 32(5): p. 1141. pmid:18978131
  87. Lakens D., Sample Size Justification. 2021.
  88. Steegen S., et al., Increasing transparency through a multiverse analysis. Perspect Psychol Sci, 2016. 11(5): p. 702–712. pmid:27694465
  89. Lonsdorf T.B., et al., Multiverse analyses in fear conditioning research. 2021.