Attention networks and the intrinsic network structure of the human brain

Abstract Attention network theory distinguishes three independent systems, each supported by its own distributed network: an alerting network to deploy attentional resources in anticipation, an orienting network to direct attention to a cued location, and a control network to select relevant information at the expense of concurrently available information. Ample behavioral and neuroimaging evidence supports the dissociation of the three attention domains. The strong assumption that each attentional system is realized through a separable network, however, raises the question of how these networks relate to the intrinsic network structure of the brain. Our understanding of brain networks has advanced considerably in recent years due to the increasing focus on brain connectivity. The brain is intrinsically organized into several large-scale networks whose modular structure persists across task states. Existing proposals on how the presumed attention networks relate to intrinsic networks rely mostly on anecdotal and partly contradictory arguments. We addressed this issue by mapping different attention networks at the level of CIFTI grayordinates. The resulting group maps were compared to the group-level topology of 23 intrinsic networks, which we reconstructed from the same participants' resting-state fMRI data. We found that all attention domains recruited multiple and partly overlapping intrinsic networks and converged in the dorsal fronto-parietal and midcingulo-insular networks. While we observed a preference of each attentional domain for its own set of intrinsic networks, the implicated networks did not match well with those proposed in the literature. Our results indicate that attention network theory needs to be refined.


| INTRODUCTION
In order to react adequately and to act purposefully in a dynamic and ever-changing environment, the brain needs to prioritize information processing, for example, by anticipating when and where sensory information will appear, or by selecting more relevant over less relevant information. Attention refers to the cognitive function that guides the prioritization and selection of some at the expense of other information (Cowan, 1999;Posner & Fan, 2008). Converging evidence from single cell recordings, electrophysiology, and neuroimaging suggests that ongoing neural information processing is enhanced in a highly specific and targeted way when attention is shifted toward a certain location in the visual field (Brefczynski & DeYoe, 1999;Heinze et al., 1994;Kastner, Pinsk, De Weerd, Desimone, & Ungerleider, 1999;Luck, Chelazzi, Hillyard, & Desimone, 1997;Müller, Bartelt, Donner, Villringer, & Brandt, 2003) or toward task-relevant stimulus features (Egner & Hirsch, 2005;O'Craven, Rosen, Kwong, Treisman, & Savoy, 1997;Rees, Frith, & Lavie, 1997). While the effects of attention become apparent in increased firing rates and BOLD activity in sensory areas, which process the currently attended information, the recruitment and control of attention signals is realized by neural systems that additionally include areas upstream on the cortical processing hierarchy (Posner & Dehaene, 1994;Posner & Petersen, 1990). Attention network theory assumes three largely independent systems that realize one out of three different types of attention: the alerting system initiates a state of increased arousal in direct anticipation of upcoming stimuli, the orienting system shifts the attentional focus to locations in space, and the control system selects and amplifies relevant information when distracting or task-incompatible information is present (Posner & Petersen, 1990). 
The three systems are thought to dissociate neuroanatomically into independent "attention networks" (Posner & Rothbart, 2007). Evidence for the relative independence of the three attention systems comes from research with the attention network test (ANT; Fan, McCandliss, Sommer, Raz, & Posner, 2002). The ANT is a reaction time task that combines the flanker task (Eriksen & Eriksen, 1974), which probes the attentional selection of relevant information at the expense of irrelevant distractors, with the Posner cueing task (Posner, 1980), in which briefly presented cues carry information about when and where an upcoming target stimulus will appear. Behavioral indices of the efficiency of alerting, orienting, and control are uncorrelated, which is interpreted as an indication of independent systems. Moreover, genetic work points toward different genetic contributions and underlying susceptibility variants for the attention systems (Fan, Wu, Fossella, & Posner, 2001; Fossella et al., 2002; Reuter, Ott, Vaitl, & Hennig, 2007). Furthermore, neuroimaging work with the ANT has revealed nonoverlapping activation patterns for task contrasts that probe alerting, orienting, and attentional control (Fan, McCandliss, Fossella, Flombaum, & Posner, 2005), adding further evidence for dissociable and presumably independent systems. More recent work, however, has documented partially overlapping activations for the different attention systems (Xuan et al., 2016), which is interpreted as the neural manifestation of interactions between the attention systems. Such interactions have also been observed at the behavioral level (Callejas, Lupiáñez, & Tudela, 2004; Fan et al., 2009). Finally, it has been hypothesized that the three attention systems dissociate at the level of intrinsic connectivity networks (Petersen & Posner, 2012).
This, however, has not been addressed empirically.
Attention network theory uses the term "network" to refer to the distributed activation foci in the ANT. As the term "network" is also used to describe a set of intrinsic connectivity networks in the brain, it is imperative to clarify how attention networks and intrinsic connectivity networks relate to each other. Our understanding of brain networks has advanced considerably in recent years due to the increasing focus on brain connectivity. Several large-scale networks that delineate along functional boundaries of the brain have been identified in spontaneous intrinsic BOLD fluctuations in the task-free resting state (Fox & Raichle, 2007; Smith et al., 2013; van den Heuvel & Hulshoff Pol, 2010). Importantly, the intrinsic network architecture persists into task states and matches the topology of task-evoked activations (Cole, Bassett, Power, Braver, & Petersen, 2014; Gordon, Stollstorff, & Vaidya, 2012; Nickerson, 2018; Smith et al., 2009). If the three attention systems were actually independent networks, we would expect each system to activate a distinct intrinsic connectivity network (ICN) or a distinct group of ICN. Similar suggestions have been made previously: for instance, that the three attention networks segregate within an "extended fronto-parietal network" (Xuan et al., 2016), or that the orienting network corresponds to a dorsal and a ventral fronto-parietal network and the attention control network to a distinct fronto-parietal and an insular-opercular network (Petersen & Posner, 2012). The fronto-parietal and insular-opercular networks have also been discussed regarding their role in alerting (Sadaghiani & D'Esposito, 2014). Some of these previous propositions, however, rely only on anecdotal arguments and appear to conflict with each other.
At present, it is unclear how the idea of three separable and independent attention networks as activated by the ANT is reflected in the overall network structure of the brain. We designed the current study to directly probe the spatial correspondence between ICN and the three attention networks. Since attention networks are often equated with the activation patterns elicited by the ANT at the measurement level, such a comparison would also clarify the relationship between two distinct concepts for which the term "network" is widely used.
We first recorded resting-state fMRI data in order to delineate ICN and then recorded task fMRI data from the same participants with the most recent version of the ANT (the revised ANT, ANT-R; Fan et al., 2009; Xuan et al., 2016). We made use of recent developments by the Human Connectome Project to achieve high spatial precision through multimodal surface matching (Robinson et al., 2018) and minimal spatial smoothing (Glasser et al., 2016). We expect to replicate previous findings with the ANT: We expect behavioral indices for the efficiency of the different attention systems to be uncorrelated, and we expect significant activations at previously reported voxel locations.
We probed the relationship between ICN and different attention contrasts through separate spatial regression analyses (Gordon et al., 2012). We first ask whether attention systems dissociate at the ICN level. Since the regressions' beta weights quantify bivariate spatial correspondence, a dissociation of the attention systems at the ICN level would be reflected in a nonsignificant or significantly negative correlation of the beta weights from different attention contrasts.
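The dissociation criterion above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the analysis code used in the study: it computes the Pearson correlation between the beta-weight vectors of two attention contrasts (one weight per ICN) and the associated t statistic with n - 2 degrees of freedom. The function name `beta_correlation` and the simulated beta values are hypothetical placeholders, not results.

```python
import numpy as np

def beta_correlation(betas_a, betas_b):
    """Pearson r between two beta-weight vectors and its t statistic (df = n - 2)."""
    a = np.asarray(betas_a, dtype=float)
    b = np.asarray(betas_b, dtype=float)
    n = a.size
    r = float(np.corrcoef(a, b)[0, 1])
    # t = r * sqrt((n - 2) / (1 - r^2)); undefined when |r| == 1
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, float(t)

# Simulated (hypothetical) beta weights of 23 ICN for two attention contrasts
rng = np.random.default_rng(0)
shared = rng.normal(size=23)
betas_alerting = shared + 0.5 * rng.normal(size=23)
betas_control = shared + 0.5 * rng.normal(size=23)
r, t = beta_correlation(betas_alerting, betas_control)
print(f"r = {r:.2f}, t(21) = {t:.2f}")
```

Under the criterion stated in the text, a dissociation would show up as a nonsignificant or significantly negative r. A two-sided p-value can be obtained from the t distribution with n - 2 degrees of freedom, for example with `scipy.stats.pearsonr`, which returns r and p directly.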
Furthermore, we would expect that no single ICN contributes to all attention contrasts. Second, we ask whether certain ICN contribute specifically to any of the attention systems, in order to obtain evidence in favor of any of the previous proposals on how the attention networks relate to different ICN.

| Participants
We recruited N = 86 healthy young adults (age: M = 26.17 years, SD = 5.41 years; n = 39 females, n = 47 males) through flyer advertisements on campus, mailing lists, and announcements in undergraduate psychology classes. Participants were screened during a telephone interview for the following inclusion criteria: native-level proficiency in German, right-handedness, and age between 18 and 35 years. We targeted an equal number of male and female participants. Participants were excluded when they indicated past or present psychiatric or neurological illness, psychotropic substance use in the past 6 months, or any contraindication to MRI. All participants reported normal or corrected-to-normal vision during the experiment. Informed written consent was obtained prior to enrollment in the study. Participants were remunerated at the usual rate of 10 EUR/hr (i.e., 25 EUR for the entire study) or the equivalent in course credit, if desired by the participant. The study protocol was in accordance with the Declaration of Helsinki and approved by the ethics committee of the University Hospital Bonn.

| Attentional network test
We adapted the Attentional Network Test in its revised form (Xuan et al., 2016). The ANT-R combines a spatial cueing task with a flanker task and is the standard protocol to activate different attentional systems in the brain. We administered a total of 288 trials in four runs of 72 trials each. A typical trial sequence is shown in Figure 1. The ANT-R follows a 4 × 2 design with the factors cueing condition (no cue, double cue, valid spatial cue, and invalid spatial cue) and target (congruent flanker, incongruent flanker). Activation maps and behavioral indices for the attention networks were computed by contrasting different cue and target conditions as described below (see Task analyses and Behavioral analysis). Each run lasted 420 s, leading to a total time of around 30 min for the whole experiment.
Throughout each run, a fixation cross was presented in the middle of the screen, flanked by a rectangle on its left and right side (the rectangles subtended 4.69° of visual angle to both sides). The fixation cross and the rectangles remained visible during the whole run. In every trial, arrows were presented in one of the rectangles: An arrow in the center (target) was surrounded by two arrows each on the left and the right side (flankers). Each arrow subtended 0.58° of visual angle and the distance between arrows was 0.06° of visual angle. The arrows pointed either to the left or to the right, and the five arrows were either congruent (i.e., the target arrow pointed in the same direction as the surrounding flankers) or incongruent (i.e., the target arrow pointed in the opposite direction to the flankers). Participants were instructed to indicate the direction of the middle arrow as quickly and accurately as possible by pressing a button in either the left or the right hand. In some trials, a cue was presented before the flankers appeared via brightening of one or both of the rectangles.

FIGURE 1 Schematic overview of stimuli and stimulus timing in a typical trial sequence. Each trial started with a 100 ms presentation of either no cue, a double cue, or a spatial cue. After a cue-target interval of 0, 400, or 800 ms, five arrows were flashed for 500 ms as target stimulus. Participants indicated via button press whether the central arrow pointed to the left or to the right. Flanking arrows were either congruent or incongruent (half of the trials each). Target offset and onset of the next cue were spaced by a jittered interval (mean interval across trials: 4,000 ms, range: 2,000-12,000 ms). Targets appeared either at the cued position (valid spatial cues) or at the uncued position (invalid spatial cues). A total of 288 trials was presented in four runs.
As a spatial cue, only one of the rectangles flashed, while a brightening of both rectangles (double cue) served as a temporal cue. Spatial cues could either be valid, that is, the arrows were presented in the rectangle that brightened, or invalid, that is, the arrows were presented in the opposite rectangle. A short interval was implemented between cue and flanker presentation.
Each trial consisted of three phases: a cue phase (100 ms), a short interval (0, 400, or 800 ms, equally distributed), and a target phase (500 ms). The different conditions were spread across two blocks of 144 trials each: Of the 144 trials, one sixth each (i.e., 24 trials) were no cue, double cue, and invalid spatial cue trials, respectively. The other half, that is, 72 trials, consisted of valid spatial cues. Each cue type was followed by each other cue type equally often (Fan et al., 2009).
The 24 combinations of interval between cue and target phase, flanker type (congruent or incongruent), and target location (left or right rectangle) were randomized for each cue condition. The interval between target offset and onset of the next trial was distributed systematically between 2,000 and 12,000 ms with a mean of around 4,000 ms (for details see Fan et al., 2009). While the target was presented for only 500 ms, participants had an additional 1,200 ms after target offset to press the button, resulting in a total response window of 1,700 ms.
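The trial counts above can be checked with a short sketch that builds one 144-trial block. The three factors listed in the text (3 intervals × 2 flanker types × 2 target locations) yield only 12 combinations; to arrive at the stated 24, this sketch assumes a fourth factor, the pointing direction of the target arrow (left/right). That factor, the function names, and the plain shuffle (which ignores the first-order counterbalancing of cue types and the jittered intertrial intervals) are assumptions for illustration, not the authors' Presentation script.

```python
import itertools
import random
from collections import Counter

CTIS_MS = (0, 400, 800)                 # cue-target intervals from the text
CONGRUENCY = ("congruent", "incongruent")
LOCATIONS = ("left", "right")           # target rectangle
DIRECTIONS = ("left", "right")          # ASSUMPTION: arrow direction as 4th factor

# Trials per cue condition within one 144-trial block (from the text)
CUE_COUNTS = {"no_cue": 24, "double": 24, "invalid": 24, "valid": 72}

def cue_side(cue, location):
    """Rectangle(s) that brighten for a given cue type and target location."""
    if cue == "valid":
        return location
    if cue == "invalid":
        return "right" if location == "left" else "left"
    return "both" if cue == "double" else None

def make_block(seed=None):
    """Build one shuffled 144-trial block (no first-order cue counterbalancing)."""
    rng = random.Random(seed)
    combos = list(itertools.product(CTIS_MS, CONGRUENCY, LOCATIONS, DIRECTIONS))
    trials = []
    for cue, n_trials in CUE_COUNTS.items():
        assert n_trials % len(combos) == 0  # 24 or 72 trials -> 1 or 3 repetitions
        for cti, congruency, location, direction in combos * (n_trials // len(combos)):
            trials.append({
                "cue": cue, "cue_side": cue_side(cue, location), "cti_ms": cti,
                "congruency": congruency, "target_location": location,
                "target_direction": direction,
            })
    rng.shuffle(trials)
    return trials

block = make_block(seed=1)
print(len(block), Counter(t["cue"] for t in block))
```

Two such blocks give 2 × 144 = 288 trials, which matches the four runs of 72 trials described in the text.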
The experiment was programmed with Presentation software version 20.1 (Neurobehavioral Systems, Inc., Albany, CA) and presented via a projector in the MR scanner. The projection screen had a resolution of 1,024 × 768 px (24 × 18 cm), and the distance between screen and participants' eyes was around 62 cm. Participants completed a short training block of 10 trials outside the MR scanner to become familiar with the setup.
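The screen geometry above determines how stimulus sizes in degrees of visual angle translate into pixels. The sketch below uses the standard formula for an object centred on the line of sight, θ = 2·atan(s / 2d), with the screen width (24 cm, 1,024 px) and the approximate viewing distance (62 cm) from the text; both pixel dimensions give 24/1,024 = 18/768 cm per pixel, so square pixels are assumed. The function names are hypothetical.

```python
import math

SCREEN_WIDTH_CM = 24.0
SCREEN_WIDTH_PX = 1024
VIEWING_DISTANCE_CM = 62.0  # "around 62 cm" in the text

def size_to_degrees(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (degrees) subtended by an object centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def degrees_to_pixels(deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent in pixels of an object subtending `deg` degrees."""
    size_cm = 2.0 * distance_cm * math.tan(math.radians(deg) / 2.0)
    return size_cm * SCREEN_WIDTH_PX / SCREEN_WIDTH_CM

print(f"full screen width: {size_to_degrees(SCREEN_WIDTH_CM):.2f} deg")
print(f"0.58 deg arrow: {degrees_to_pixels(0.58):.1f} px")
```

With these numbers, the whole screen spans roughly 22° horizontally and a 0.58° arrow occupies roughly 27 px; small deviations in the true viewing distance shift both values.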

| Image acquisition
All MR images were acquired in a single session on a Siemens 3T Prisma equipped with a 32-channel head coil at the Berlin Center for Advanced Neuroimaging between March and December 2019. We adopted MR sequences from the HCP-Lifespan project (Harms et al., 2018). The following protocols were acquired in a fixed order: (a) T1-weighted structural (multiecho MPRAGE, voxel size 0.8 mm isotropic, repetition time TR = 2.4 s, echo time TE = 22 ms, flip angle 8°), (b) T2-weighted structural (SPACE, voxel size 0.8 mm isotropic, TR = 3.2 s, TE = 563 ms, flip angle 120°), (c) BOLD rfMRI (multiband echo-planar, 72 slices, 805 volumes, TR = 800 ms, voxel size 2 mm isotropic, TE = 37 ms, flip angle 52°, A-P encoding direction) including two spin echo fieldmaps (A-P and P-A encoding), (d) tfMRI in four runs with run-specific spin echo fieldmaps and the same pulse sequence as for rfMRI, and (e) diffusion-weighted images (DWI). DWI data are not part of the present report. A reference image without multiband acceleration was acquired for each functional run.

| Preprocessing
We adapted the HCP minimal preprocessing pipelines (github.com/Washington-University/HCPpipelines) for structural and functional preprocessing. Unless stated otherwise, we used version 4.1 of the pipelines, FreeSurfer 6.0.0, and FSL 6.0.1 under Linux Debian 10. Structural images (T1 and T2) were corrected for gradient distortions, aligned, brain extracted, bias field corrected, and registered to MNI space using a nonlinear transformation. Structural images were then further processed with HCP's FreeSurfer pipeline with improved brain extraction, alignment, and adjustment of the white matter surface. The FreeSurfer output was converted to NIfTI and GIFTI files and used to create a brain mask for all further analyses.
Cortical surfaces were then registered to template space based on cortical folding (MSMsulc, Robinson et al., 2018) and downsampled to the 32k_LR surface space. All functional data (rfMRI and task fMRI) and the corresponding field maps were processed with the fMRIVolume pipeline, which included correction for gradient distortions, motion, and EPI image distortions, co-registration with the T1 structural image, and normalization to MNI volumetric space. All transformations were applied in one step. Functional data were then intensity normalized to their global 4D mean and masked. The resulting volume timeseries were further processed with the fMRISurface pipeline to create individual CIFTI dense timeseries grayordinate files by resampling subcortical gray matter voxels to standard subcortical parcels and by partial-volume-weighted, cortical-ribbon-constrained mapping of cortical gray matter voxels onto standard surface vertices. In this step, we applied light volume- and surface-based smoothing with a Gaussian filter of 2 mm full width at half maximum (FWHM).
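For reference, the FWHM of a Gaussian kernel relates to its standard deviation through the standard identity below (a general property of the Gaussian, not a detail taken from the HCP pipelines); the 2 mm kernel used here therefore corresponds to a standard deviation of roughly 0.85 mm.

```latex
\sigma = \frac{\mathrm{FWHM}}{2\sqrt{2\ln 2}} \approx \frac{2\ \mathrm{mm}}{2.355} \approx 0.85\ \mathrm{mm}
```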
Resting-state timeseries were processed further to remove artifacts. Each participant's volumetric rfMRI timeseries were first run through FSL's Multivariate Exploratory Linear Optimized Decomposition into Independent Components (MELODIC) tool (v3.15) and then processed using FSL FIX (v1.06.15). We used a classifier that had been trained on the HCP young adult sample, as distributed with FIX. Automatic component classification worked well despite small differences in acquisition parameters between our data and the training data: Manual inspection indicated that no component had to be relabeled. Artifactual components were regressed out together with the six head motion parameters and their first temporal derivatives. The cleaned rfMRI timeseries were then converted to grayordinates as described above. The FIX pipeline was run on a 12-core Mac Pro (High Sierra 10.12.6) machine using R (v3.3.3), Matlab (v2018b), and HCPpipelines (v4.2.1). Relevant R packages were used in the versions specified in the FIX documentation and recompiled when needed.
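The confound removal described above can be illustrated with a minimal least-squares sketch: the timecourses of components labeled as artifacts, the six motion parameters, and their first temporal derivatives are stacked into one confound matrix and regressed out of the data. This is a simple "aggressive" regression written for illustration; FIX itself offers more than one cleanup mode, and this sketch does not reproduce its implementation. The function names and the backward-difference definition of the derivative are assumptions.

```python
import numpy as np

def temporal_derivative(x):
    """Backward difference along time; the first row is padded with zeros."""
    return np.vstack([np.zeros((1, x.shape[1])), np.diff(x, axis=0)])

def regress_out(data, noise_ts, motion):
    """Remove noise-component and motion confounds by least squares.

    data:     (T, V) timeseries (T timepoints, V voxels or grayordinates)
    noise_ts: (T, K) timecourses of components labeled as artifacts
    motion:   (T, 6) head motion parameters
    """
    confounds = np.hstack([noise_ts, motion, temporal_derivative(motion)])
    confounds = confounds - confounds.mean(axis=0)  # demeaned, so the data mean is kept
    beta, *_ = np.linalg.lstsq(confounds, data, rcond=None)
    return data - confounds @ beta
```

Because the confounds are demeaned, the cleaned data keep their original mean while becoming orthogonal to every confound column.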

| Independent component analysis
We identified ICN at the group level by running a group ICA in MELODIC after concatenating all participants' FIX-cleaned dense timeseries grayordinate files in time and reducing the data matrix to a 1,609-dimensional subspace. We requested 27 components after estimating the ICA's dimensionality in Matlab using HCP code (icaDim.m). Four artifactual noise components were identified through visual inspection, and the remaining 23 components were kept for further analyses.
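The temporal-concatenation and reduction step that precedes the group ICA can be sketched as follows. This is a simplified illustration, not MELODIC's implementation: each participant's timeseries are demeaned and variance-normalized per grayordinate, stacked in time, and reduced by a singular value decomposition whose leading right singular vectors serve as spatial eigenmaps for a subsequent spatial ICA (not shown). The dimensionality estimate (icaDim.m) is not reproduced, and the function names are hypothetical. At full scale (84 participants × 805 volumes = 67,620 timepoints by the standard 91,282 grayordinates) a direct SVD like this would need tens of gigabytes of memory, so practical pipelines use incremental or approximate reductions.

```python
import numpy as np

def normalize_subject(ts):
    """Demean and variance-normalize each grayordinate's timeseries (T, V)."""
    ts = np.asarray(ts, dtype=float)
    sd = ts.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant (e.g., empty) grayordinates unscaled
    return (ts - ts.mean(axis=0)) / sd

def concat_and_reduce(subject_ts, n_dims):
    """Temporal concatenation followed by PCA (via SVD) to n_dims spatial eigenmaps.

    Returns the (n_dims, V) eigenmaps and the fraction of variance each explains.
    """
    stacked = np.vstack([normalize_subject(ts) for ts in subject_ts])  # (sum T_i, V)
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return vt[:n_dims], explained[:n_dims]
```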
We labeled these ICN based on (a) Table T1), and (c) visual inspection and comparison with a detailed map of cortical areas (see Figure S1).

| Task analyses
First-level analyses were performed in SPM12 (www.fil.ion.ucl.ac.uk/) using a general linear model. Surface images were converted to "fake-volumetric" NIfTI images using wb_command. Condition-specific regressors were created by convolving a train of delta functions with SPM's canonical hemodynamic response function. Following the ANT's four (cues) by two (targets) design, separate regressors reflected the onsets of the following events: congruent targets following double cues, congruent targets following valid cues, congruent targets following invalid cues, congruent targets following no cues, incongruent targets following double cues, incongruent targets following valid cues, incongruent targets following invalid cues, and incongruent targets following no cues. In addition to these eight regressors, we added one regressor with the onsets of error trials, 12 regressors with the 6 head motion parameters and their temporal derivatives, and one constant per run. The final design matrix thus contained (8 + 13 + 1) × 4 = 88 columns.
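The construction of the first-level design can be sketched in a few lines. The sketch makes several assumptions that are not part of the study's SPM setup: the canonical HRF is approximated by the common double-gamma form (gamma shapes 6 and 16, undershoot ratio 1/6), onsets are placed on a 0.1 s grid before sampling at the 0.8 s TR, and each 420 s run (525 scans) occupies a contiguous block of 22 columns. SPM may order columns differently (for example, placing session constants at the end of the matrix). All function names are hypothetical.

```python
import math
import numpy as np

TR = 0.8        # s, from the acquisition protocol
N_SCANS = 525   # 420 s run / 0.8 s TR
DT = 0.1        # s, high-resolution grid for convolution (assumption)

def double_gamma_hrf(dt=DT, length=32.0):
    """Double-gamma HRF approximation (shapes 6 and 16, undershoot ratio 1/6)."""
    t = np.arange(0.0, length, dt)
    def gamma_pdf(x, shape):
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)
    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.sum()

def condition_regressor(onsets_s, n_scans=N_SCANS, tr=TR, dt=DT):
    """Delta train at the onsets, convolved with the HRF, sampled at each scan."""
    n_hr = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_hr)
    for onset in onsets_s:
        sticks[int(round(onset / dt))] = 1.0
    conv = np.convolve(sticks, double_gamma_hrf(dt))[:n_hr]
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return conv[scan_idx]

def run_design(onsets_by_condition, error_onsets, motion):
    """8 conditions + 1 error + 6 motion + 6 derivatives + 1 constant = 22 columns."""
    cols = [condition_regressor(o) for o in onsets_by_condition]
    cols.append(condition_regressor(error_onsets))
    deriv = np.vstack([np.zeros((1, 6)), np.diff(motion, axis=0)])
    return np.column_stack(cols + [motion, deriv, np.ones(N_SCANS)])

def block_diag(*mats):
    """Place per-run design matrices along the diagonal of one matrix."""
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out
```

Stacking four runs this way yields 4 × 22 = 88 columns, matching the count given in the text.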
Linear weighted contrasts were computed on the estimated beta images to derive the attention network maps (see Fan et al., 2005;Xuan et al., 2016): Alerting network: Double cue minus no cue (across target conditions). Control network: all incongruent targets minus all congruent targets (across cue conditions). The orienting network was operationalized via the Validity effect (invalid cue minus valid cue, across target conditions), which is a combination of disengaging attention from an invalid location (invalid cue minus double cue, the Disengaging effect) and moving and engaging the attentional focus to a validly cued location (valid cue minus double cue, the Moving + Engaging effect). Individual contrast images were back-converted to cifti-files and then passed on to second level group analysis.
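The contrast definitions above can be written as weight vectors over the eight condition regressors, in the order listed in the first-level model. This sketch is illustrative: it assumes each run occupies a contiguous block of 22 columns (8 conditions, then 13 nuisance regressors and the constant), which may differ from SPM's actual column order. It also makes explicit that, with these definitions, the validity contrast equals the Disengaging contrast minus the Moving + Engaging contrast. Names such as `cue_weights` and `full_contrast` are hypothetical.

```python
import numpy as np

# Order of the eight condition regressors within a run, as listed in the text
CONDITIONS = [
    ("congruent", "double"), ("congruent", "valid"),
    ("congruent", "invalid"), ("congruent", "none"),
    ("incongruent", "double"), ("incongruent", "valid"),
    ("incongruent", "invalid"), ("incongruent", "none"),
]
N_EXTRA = 13 + 1  # error + 12 motion regressors, plus the run constant
N_RUNS = 4

def cue_weights(plus, minus):
    """+1 for conditions with cue `plus`, -1 for cue `minus`, pooled over targets."""
    return np.array([(cue == plus) - (cue == minus) for _, cue in CONDITIONS], dtype=float)

def target_weights(plus, minus):
    """+1 for target type `plus`, -1 for `minus`, pooled over cues."""
    return np.array([(tgt == plus) - (tgt == minus) for tgt, _ in CONDITIONS], dtype=float)

CONTRASTS = {
    "alerting": cue_weights("double", "none"),
    "control": target_weights("incongruent", "congruent"),
    "validity": cue_weights("invalid", "valid"),
    "disengaging": cue_weights("invalid", "double"),
    "moving_engaging": cue_weights("valid", "double"),
}

def full_contrast(weights, n_runs=N_RUNS):
    """Repeat the condition weights in every run and zero-pad the nuisance columns."""
    per_run = np.concatenate([weights, np.zeros(N_EXTRA)])
    return np.tile(per_run, n_runs)
```

Each contrast sums to zero, so a constant offset shared by all conditions does not contribute; rescaling the weights (for example, dividing by the number of runs) changes the contrast estimate but not its t or z statistic.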
We used the Sandwich Estimator (SwE) Toolbox for SPM12 (Guillaume, Hua, Thompson, Waldorp, & Nichols, 2014) for group-level analyses of individual contrast images to assess activation of attentional systems across all participants. SwE's main application is longitudinal and repeated measures neuroimaging data, but SwE is also suitable for simpler designs like ours. We used the modified SwE procedure with a small sample size correction (type c) and a wild bootstrapping procedure with 999 bootstraps. The family-wise error was corrected at the cluster level (p < .05) with a cluster-forming threshold of p < .001. The thresholding of the activation maps was also done for display purposes: All follow-up analyses on spatial correspondence with ICN made use of the unthresholded maps.

FIGURE 2 Thresholded statistical maps of independent components and their grouping into ICN through hierarchical clustering of associated time courses. The components in the red cluster belong to the executive control and fronto-parietal network, the components in magenta to the default mode and language networks, the components in light green represent the (visual) occipital network, the components in blue the midcingulo-insular and dorsal fronto-parietal "attention" networks, and the components in the darker green cluster represent the somatomotor and auditory networks. The numbers of the components correspond to the order of the ICA output (ordered by variance explained); the ordinal position in the figure was determined by the clustering.
In order to establish that our adaptation of the ANT led to activations similar to those reported in previous work, we repeated the second-level analysis in SPM12 with volumetric data (4 mm smoothing kernel). We created volumetric masks for each of the five attention contrasts with 8 mm spheres around the peak voxel locations reported in Xuan et al. (2016) and controlled the family-wise error at the voxel level within these masks. The masks are shown in supplementary Figure S3 and Table T2.
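Spherical region-of-interest masks of this kind can be built by mapping every voxel centre to world (MNI) coordinates through the image affine and keeping voxels within 8 mm of any listed peak. The sketch below is an illustration under assumptions: the example grid uses the affine of the common FSL MNI152 2 mm template, and the peak coordinate is a hypothetical placeholder, not one of the peaks from Xuan et al. (2016).

```python
import numpy as np

def sphere_mask(shape, affine, center_mm, radius_mm=8.0):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm."""
    ijk = np.indices(shape).reshape(3, -1)                    # voxel indices
    ijk1 = np.vstack([ijk, np.ones((1, ijk.shape[1]))])       # homogeneous coordinates
    xyz = (affine @ ijk1)[:3]                                 # world coordinates (mm)
    dist = np.linalg.norm(xyz - np.asarray(center_mm, float)[:, None], axis=0)
    return (dist <= radius_mm).reshape(shape)

def union_mask(shape, affine, peaks_mm, radius_mm=8.0):
    """Union of spheres around several peak coordinates (one mask per contrast)."""
    mask = np.zeros(shape, dtype=bool)
    for peak in peaks_mm:
        mask |= sphere_mask(shape, affine, peak, radius_mm)
    return mask

# ASSUMPTION: FSL MNI152 2 mm grid; the peak below is a hypothetical placeholder
mni_2mm = np.array([[-2, 0, 0, 90], [0, 2, 0, -126], [0, 0, 2, -72], [0, 0, 0, 1]], float)
example = union_mask((91, 109, 91), mni_2mm, [(40.0, -50.0, 50.0)])
print(example.sum(), "voxels in the example mask")
```

On a 2 mm isotropic grid, a sphere with an 8 mm radius centred on a voxel contains 257 voxel centres.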

| Spatial regression
Our main question concerns the spatial relationship between ICN and the different attentional "networks" as activated by the ANT-R. We used a multiple spatial regression approach (Gordon et al., 2012) to predict group-level ANT activation maps from the 23 group-level ICN. Separate models were estimated for each activation map.
Unthresholded activation maps (z-images) were reshaped to column vectors that included all cortical vertices and subcortical voxels. These vectors served as the criterion in the regression analyses. On the predictor side, all 23 nonartifactual unthresholded IC maps were reshaped into a grayordinate × component matrix. Ordinary least squares regressions were fitted using Matlab's fitlm function. Possible confounds due to collinearity were ruled out by inspecting condition indices and variance decomposition proportions of the predictor matrix. Effect sizes for individual ICN were calculated as partial regression coefficients by residualizing each ICN with respect to all remaining ICN, fitting a linear regression model, and obtaining the adjusted R².
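The three steps of the spatial regression can be sketched with NumPy as a stand-in for Matlab's fitlm: an ordinary least squares fit of one activation map on the ICN maps with adjusted R², Belsley-style collinearity diagnostics (condition indices and variance-decomposition proportions from the SVD of the column-scaled predictor matrix), and a per-ICN effect size obtained by residualizing one ICN on all others and regressing the criterion on that residual (which amounts to an adjusted squared semipartial correlation). Whether the study's diagnostics included an intercept column is not stated; here they are computed on the predictors only. The function names and the simulated maps are hypothetical.

```python
import numpy as np

def fit_ols(y, X):
    """OLS of criterion y (n,) on predictors X (n, p) plus intercept.
    Returns coefficients (intercept first) and adjusted R^2."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, float(adj_r2)

def collinearity_diagnostics(X):
    """Belsley-style condition indices and variance-decomposition proportions.
    Columns are scaled to unit length; no intercept column is added here."""
    scaled = X / np.linalg.norm(X, axis=0)
    _, s, vt = np.linalg.svd(scaled, full_matrices=False)
    cond_idx = s.max() / s                      # one index per dimension
    phi = vt.T ** 2 / s ** 2                    # rows: predictors, cols: dimensions
    vdp = phi / phi.sum(axis=1, keepdims=True)  # each predictor's row sums to 1
    return cond_idx, vdp

def unique_effect(y, X, k):
    """Adjusted R^2 of y on predictor k after residualizing k on all other predictors."""
    others = np.column_stack([np.ones(X.shape[0]), np.delete(X, k, axis=1)])
    g, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
    x_resid = X[:, k] - others @ g
    _, adj_r2 = fit_ols(y, x_resid[:, None])
    return adj_r2

# Simulated placeholders: 2,000 "grayordinates", 23 "ICN maps", one "z-map"
rng = np.random.default_rng(0)
icn_maps = rng.normal(size=(2000, 23))
z_map = icn_maps[:, :3] @ [1.0, -0.5, 0.25] + rng.normal(size=2000)
beta, adj_r2 = fit_ols(z_map, icn_maps)
cond_idx, vdp = collinearity_diagnostics(icn_maps)
print(f"adj R2 = {adj_r2:.2f}, max condition index = {cond_idx.max():.2f}")
```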

| Behavioral analysis
We analyzed reaction times and error rates to compute behavioral indices of the different attention networks. For reaction time analyses, all error trials and responses outside a response window of 1,700 ms after target onset were excluded (see Xuan et al., 2016).
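The exclusion rule above, together with reaction-time difference scores of the kind summarized in Table 1, can be sketched as follows. The sign convention is an assumption, chosen so that positive values express a cue benefit or a cost in milliseconds: alerting = no cue minus double cue, validity = invalid minus valid, disengaging = invalid minus double cue, moving + engaging = double cue minus valid, and control = incongruent minus congruent. With this convention validity equals disengaging plus moving + engaging, whereas the fMRI contrasts in the text define Moving + Engaging with the opposite sign. Function names and labels are hypothetical.

```python
import numpy as np

RESPONSE_WINDOW_MS = 1700  # responses later than this after target onset are excluded

def valid_rt_mask(rt_ms, correct):
    """Keep correct trials with a response inside the response window."""
    rt = np.asarray(rt_ms, dtype=float)
    in_window = np.isfinite(rt) & (rt > 0) & (rt <= RESPONSE_WINDOW_MS)
    return np.asarray(correct, dtype=bool) & in_window

def attention_indices(rt_ms, correct, cue, congruency):
    """RT difference scores in ms (sign convention is an assumption, see above)."""
    rt = np.asarray(rt_ms, dtype=float)
    cue = np.asarray(cue)
    congruency = np.asarray(congruency)
    keep = valid_rt_mask(rt, correct)

    def mean_rt(selection):
        return float(rt[keep & selection].mean())

    return {
        "alerting": mean_rt(cue == "none") - mean_rt(cue == "double"),
        "validity": mean_rt(cue == "invalid") - mean_rt(cue == "valid"),
        "disengaging": mean_rt(cue == "invalid") - mean_rt(cue == "double"),
        "moving_engaging": mean_rt(cue == "double") - mean_rt(cue == "valid"),
        "control": mean_rt(congruency == "incongruent") - mean_rt(congruency == "congruent"),
    }
```

Applied per participant, the resulting indices can then be correlated across participants to test whether the attention systems are behaviorally independent.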
Behavioral indices were computed as differences between experimen-

| Final sample
We had to exclude two participants because of incidental findings.
The final rfMRI sample included N = 84 (mean age M = 26.34, SD = 5.35, n = 38 female, n = 46 male). Six participants were excluded from the task analysis for committing an excessive number of errors in the ANT (n = 4), large artifacts in the task fMRI data (n = 1), and incomplete task fMRI data (n = 1). The final tfMRI sample included N = 78 subjects (mean age M = 26.19, SD = 5.34, n = 35 female, n = 43 male).

| Behavioral results
Mean differences and test statistics for reaction times and committed errors are presented in Table 1. As expected, the presence of temporal and valid spatial cues led to faster reaction times while invalid cues and incongruent flankers led to slower responses. The same pattern was also visible in the error rates, except for the alerting contrast where the difference was not statistically significant.
As expected, behavioral indices for the major attention contrasts (alerting, validity, control) were not correlated (see Table 2). Significant correlations were obtained only for contrasts that shared a reference condition.
In sum, we replicated previous findings on the behavioral independence and interactions of attention networks (Fan et al., 2009; Xuan et al., 2016).

| Intrinsic connectivity networks
[…] Table T1 and Figure S1. The spring-green cluster on the very right of Figure 2 represents the larger somatomotor network (SMN): ICN #13 ("mouth" network) as well as #15 and #21 ("hand" network). ICN #23 corresponds to the auditory network. […] dorsolateral prefrontal cortex. We will therefore use the label executive control network when referring to these three components.

| Spatial regression
We fitted five linear models that regressed the spatial distribution of task-evoked activity in the five attention contrasts onto the 23 ICN.
The 23 predictor variables did not show any signs of collinearity (all condition indices < 1.61, all variance decomposition proportions < .5, see Figure S2). The adjusted coefficients of determination (R²) of the regression models indicated that the spatial brain-wide topology of

| Overlap between attentional networks
The spatial regression analysis pointed at seven ICN components that

| Alternative parcellation
We used our own ICA-based parcellation for the assessment of spatial […] represented at the network level, we reconstructed 23 independent components from high-resolution resting-state data and utilized a spatial regression approach (Gordon et al., 2012) to study the topological correspondence between ICNs and the activation of the attention networks. We did not find evidence for a dissociation at the intrinsic network level: If an ICN increased its activation during one type of attention, it was also more likely to activate during other types of attention. We also did not find a clear correspondence between attention networks and single ICNs. Instead, we observed that each attention system activates components in multiple ICNs, that the majority (around 87%) of all components contribute to at least

| Attentional networks: the extended frontoparietal network hypothesis
It has been suggested that the fronto-parietal network underlies attention (Toro et al., 2008), which is supported by correlations between network properties of the fronto-parietal network and behavioral indices of attention (Markett et al., 2014; Visintin et al., 2015). Accordingly, Xuan et al. (2016) have argued that the three attention networks activate different parts of an extended fronto-parietal network. The fronto-parietal network described in the original reports is a larger network that is hierarchically organized into separable networks: the ventral attention, dorsal attention, and fronto-parietal control networks (Fox, Corbetta, Snyder, Vincent, & Raichle, 2006; Power et al., 2011; Thomas Yeo et al., 2011; Vincent, Kahn, Snyder, Raichle, & Buckner, 2008). The ventral attention network is thought to support stimulus-driven bottom-up attention, which is conceptually similar to alerting; the dorsal attention network is thought to support top-down attention, which is conceptually similar to orienting; and the control network is thought to underlie executive functioning and cognitive control, which is conceptually similar to the attention control system (Vincent et al., 2008; Vossel, Geng, & Fink, 2014). While our network partition did yield three larger networks that involved lateral frontal and posterior parietal cortex, they do not fully match the three described networks. Our dorsal fronto-parietal network (IC #14 and #18) corresponds well to the dorsal attention network. Our executive control network (IC #16, #20, and #10) predominantly includes dorsolateral and medial prefrontal cortex and the anterior cingulate, and matches the description of the fronto-parietal control network. Our third fronto-parietal network (IC #3, #5, and #22), however, matches the description of the ventral attention network only partly.
Our network included ventro- and dorsolateral, orbitofrontal, and frontopolar cortex, as well as inferior parietal and lateral temporal cortex, but did not include the temporo-parietal junction, which represents the major posterior hub in this network (Fox et al., 2006; Vossel et al., 2014). In addition to the less than optimal correspondence of our fronto-parietal networks with the three previously described networks, we did not […] alerting. We therefore conclude that an "extended fronto-parietal network" does not capture the nature of the three attention networks well. Rather, we see major overlap of the three attention systems within the dorsal fronto-parietal network and parts of the midcingulo-insular network. If we were to define the "extended fronto-parietal network" to include these two network components, it would capture the ICN underlying attention. We could then, however, no longer conclude that the attention systems dissociate within the "extended fronto-parietal network", but only that the "extended fronto-parietal network" is the attention network.

| Orienting: the dorsal and ventral attention network hypothesis
The dorsal and ventral attention network have been proposed as two anatomically and functionally distinct networks (Corbetta & Shulman, 2002). In an attempt to incorporate the dorsal and ventral attention networks into attention network theory, the two networks have been equated to the orienting network (Petersen & Posner, 2012). We will first discuss the representation of the dorsal and ventral attention networks in our ICN partition before discussing their activation by the ANT.
The ventral attention network was initially described as right-lateralized, but later work suggests a similar organization in the left hemisphere (Vossel et al., 2014). The ventral attention network features in prominent atlases of canonical ICN (Power et al., 2011; Thomas Yeo et al., 2011), but the label "ventral attention" has been contested. Others have used the labels "salience network" (Seeley et al., 2007) or "cingulo-opercular network" (Dosenbach, Fair, Cohen, Schlaggar, & Petersen, 2008) to refer to a network with similar anatomy and function. The label ventral attention network has also been used to describe a left-lateralized network whose implicated brain regions are more suggestive of an involvement in language (Ji et al., 2019; Power et al., 2011). To assess the correspondence between ICN and the orienting network, we followed previous recommendations and distinguished between different orienting effects: neural activity associated with disengaging from an invalid spatial cue, with moving and subsequently engaging the attentional focus at a validly cued spatial location, and with the combination of the two (the validity effect), which corresponds to previously described orienting contrasts (Fan et al., 2009; Xuan et al., 2016). Our results confirm the involvement of the dorsal fronto-parietal and midcingulo-insular networks in orienting. We also observed contributions from a component that we classified as part of an executive control network. This component, however, included many cortical regions that have been […]

| Attentional control: the fronto-parietal cingulo-opercular hypothesis
Attention network theory assumes an attention control system that is involved in the detection of targets for focal and conscious processing (Posner & Petersen, 1990), guided and controlled visual search (Posner & Dehaene, 1994), and the selection of relevant over distracting information.
Ongoing control involves the maintenance of task-sets that set the context for moment-to-moment adjustments of cognitive processing: Previous studies indicate that set-maintenance is supported by the cingulo-opercular network and adjustments of the attentional focus is carried out by the frontoparietal network (Dosenbach et al., 2007(Dosenbach et al., , 2008. This has led to the proposal that the attention control network relies on these two separate ICN: A fronto-parietal network that is distinct from the dorsal attention network and the cingulo-opercular network that we labeled midcingulo-insular network (Petersen & Posner, 2012 unequivocally ascribed to the attention control system (Fan et al., 2005;Fan & Posner, 2004;Petersen & Posner, 2012). While our present results are thus consistent with previous findings, they cast doubt that attentional control relies solely on a fronto-parietal network distinct from the dorsal network and the midcingulo-insular network. Rather, attentional control seems to be implemented by the dorsal attention network with additional contribution from lateral prefrontal and anterior cingulate cortex. We also observed strong deactivations of the default mode network during attentional control. The traditional view of the default mode network is that of a task-negative network that stands in an antagonistic relationship with frontoparietal networks and de-activates unspecifically in demanding tasks (Fox, Zhang, Snyder, & Raichle, 2009;Raichle et al., 2001;Shulman et al., 1997). Newer evidence, however, suggests that deactivations in the default mode network encode spatial vision (Szinte & Knapen, 2020) which opens the possibility that the default mode network plays a more direct role in visual attention than expected. Future work is needed to address this hypothesis, but for now we content that the default mode network also contributes to the attentional control network.

| Alerting and the midcingulo-insular network
Alerting refers to a state of increased sensitivity to incoming stimuli (Posner, 2008). In addition to tonic alertness, a self-initiated state of sustained vigilance, phasic alertness can use external cues to temporarily increase vigilance in anticipation of upcoming information. While both types of alertness are thought to be realized by the same alerting network, the typical alerting contrast in the ANT uses temporal cues to induce a state of alertness and thus taps primarily into the phasic component. The midcingulo-insular network has been shown to increase its activity and functional connectivity in a task that required tonic alertness (Sadaghiani & D'Esposito, 2014), and increased prestimulus activity in the midcingulo-insular network leads to faster responses to unpredictable stimuli (Coste & Kleinschmidt, 2016). The midcingulo-insular network also activates in response to rare oddball stimuli, which implies a similar involvement in phasic alerting (Kim, 2014). We subsumed three distinct components under the midcingulo-insular network and found one of these components to be strongly involved in the alerting contrast, supporting these previous observations. But while we do find evidence for the hypothesis linking alerting to the midcingulo-insular network, we cannot conclude that this network is specifically involved in the alerting component of attention. This ICN was consistently recruited by all attention networks, and we found widespread activation overlap in its premotor part (see Figure 5). The other two components of the midcingulo-insular network, encompassing either lateral frontal and inferior parietal cortex and the temporo-parietal junction, or superior parietal and midcingulate cortices, did not contribute to alerting.
In addition to the midcingulo-insular network, we found that alerting activated the dorsal attention network and deactivated a fronto-parietal component including dorsolateral prefrontal and inferior parietal cortex, as well as the default mode network. While we confirm the role of the midcingulo-insular network in alerting, we found neither a one-to-one correspondence between this ICN and the alerting network nor evidence for a specific relationship between the midcingulo-insular network and alerting relative to the other attention systems.

| Overlap between attention networks
We found several ICN components that were involved in all three attention networks. Although different brain regions within an ICN also tend to co-activate during tasks (Smith et al., 2009), it would still have been possible for the three attention networks proposed by attention network theory to dissociate within a given ICN component. On the contrary, we found major overlap of the attention systems in posterior parietal cortex along the intraparietal sulcus (the dorsal fronto-parietal network) and in premotor cortex (the dorsal fronto-parietal and the midcingulo-insular network). Overlapping activations in premotor cortex occurred in three distinct regions: the bilateral frontal eye fields, bilateral area 6v, and the bilateral premotor eye fields. Covert spatial attention, that is, the adjustment of the attentional focus in the absence of overt eye movements, has been tightly linked to the premotor cortex (Moore, Armstrong, & Fallah, 2003; Rizzolatti, Riggio, Dascola, & Umiltá, 1987), and the frontal eye fields have been identified as the neural origin of the "attentional spotlight" that modulates activity in visual areas (Thompson, 2005). The premotor eye field has been linked to saccadic eye movements but also to attention. Area 6v lies in superior premotor cortex adjacent to the frontal eye field and can be distinguished from it by its myelin content and its response profile across tasks (Glasser et al., 2016).
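Overlap of this kind is often quantified with a minimum-statistic conjunction: a grayordinate counts as shared only if it passes the threshold in every contrast. The sketch below illustrates that idea, plus a Dice coefficient for pairwise overlap, assuming one z-map per attention contrast and a hypothetical threshold of 3.1. It is not necessarily the procedure used for the overlap maps reported here, and the toy values are illustrative, not data from this study.

```python
import numpy as np


def conjunction_mask(z_maps, threshold=3.1):
    """Minimum-statistic conjunction across contrasts.

    z_maps: shape (n_contrasts, n_grayordinates), e.g. alerting, orienting,
    and control z-maps. The default threshold is a hypothetical example value.
    Returns a boolean mask of grayordinates exceeding threshold in every contrast.
    """
    z_maps = np.asarray(z_maps, dtype=float)
    return z_maps.min(axis=0) > threshold


def dice(mask_a, mask_b):
    """Dice overlap between two boolean masks (0 = disjoint, 1 = identical)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * intersection / total if total else 0.0


# Toy z-values for four grayordinates (illustrative only)
alerting = np.array([3.5, 1.0, 4.0, -4.0])
orienting = np.array([4.0, 3.5, 3.2, -5.0])
control = np.array([3.3, 3.3, 3.4, 0.0])

shared = conjunction_mask([alerting, orienting, control])
overlap = dice(alerting > 3.1, orienting > 3.1)
```

Here only the first and third grayordinates survive the conjunction, because each of them exceeds the threshold in all three toy maps. Taking the minimum is deliberately conservative: a single sub-threshold contrast removes a grayordinate from the shared set.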
We found area 6v to be involved in the midcingulo-insular network, while the frontal and premotor eye fields belonged to the dorsal fronto-parietal network. The frontal eye fields and surrounding areas have been previously associated with different attention networks (Xuan et al., 2016), but more work is needed to directly contrast the roles of the frontal and premotor eye fields and area 6v within attention networks. We believe that these regions play a role in covert spatial attention by adjusting the attentional focus, irrespective of whether it is moved in space, activated in preparation for upcoming stimuli, or tuned to select relevant over irrelevant information. We also found that visual ICNs contributed to all attention contrasts. While visual activations have been reported previously with the ANT (Xuan et al., 2016), attention network theory dissociates the attention networks from stimulus processing areas (Petersen & Posner, 2012). We presume that the visual activations are most likely the result of top-down modulation by the attention networks; attentional modulation, for instance, has been described as early as area V1 (Luck et al., 1997). However, we cannot rule out that eye movements in reaction to the cues have led to some spurious activations in visual areas. Future work with a modified ANT is needed to dissociate the source and target of attentional modulation in early visual areas.

| Methodological considerations
In the following, we address methodological aspects regarding the definition of attention networks, our resting-state decomposition into ICNs, and the spatial regression approach.
We defined the attention networks as the sets of activated grayordinates in different contrasts of the ANT, the standard protocol proposed by the authors of attention network theory (Fan et al., 2009; Fan & Posner, 2004). This decision was motivated by previous work on attention networks (Xuan et al., 2016).
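The contrasts that define the attention networks can be written as weight vectors over condition regressors. The sketch below assumes the four cue conditions of the revised ANT (no cue, double cue, valid spatial cue, invalid spatial cue) and two flanker conditions. It also assumes the subtraction conventions disengaging = invalid minus double cue and moving/engaging = double cue minus valid cue, which are not spelled out in this section. Condition names and regressor order are hypothetical. The sketch makes explicit why the validity effect is "the combination of the two" orienting sub-effects: the double-cue weights cancel.

```python
import numpy as np

# Hypothetical regressor order for the cue factor of a revised ANT
CUE_CONDITIONS = ["no_cue", "double_cue", "valid_cue", "invalid_cue"]
# Hypothetical regressor order for the flanker factor
FLANKER_CONDITIONS = ["congruent", "incongruent"]


def contrast(conditions, minuend, subtrahend):
    """Weight vector for the subtraction `minuend - subtrahend` over `conditions`."""
    weights = np.zeros(len(conditions))
    weights[conditions.index(minuend)] += 1.0
    weights[conditions.index(subtrahend)] -= 1.0
    return weights


# RT-style effect definitions (assumed). For activation maps, the sign of a
# contrast is sometimes flipped; flipping all signs leaves the identity below intact.
alerting = contrast(CUE_CONDITIONS, "no_cue", "double_cue")
disengaging = contrast(CUE_CONDITIONS, "invalid_cue", "double_cue")
moving_engaging = contrast(CUE_CONDITIONS, "double_cue", "valid_cue")
validity = contrast(CUE_CONDITIONS, "invalid_cue", "valid_cue")
control = contrast(FLANKER_CONDITIONS, "incongruent", "congruent")

# Validity = disengaging + moving/engaging, because the double-cue weights cancel
assert np.allclose(validity, disengaging + moving_engaging)
```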
Defining a network solely on the basis of task co-activations, however, is not without criticism. The term "network" is commonly used to describe a set of nodes including their mutual relationships (Albert & Barabási, 2002). By focusing on task-evoked activations only, we omitted any information on functional interactions within the attention networks. Work on task-evoked whole-brain functional connectivity changes suggests that task activations and task connectivity carry different information (Gerchen & Kirsch, 2017) and that task connectivity can point toward important network nodes that do not show strong activation changes between task conditions (Markett, Jawinski, Kirsch, & Gerchen, 2020). While the current operationalization of attention networks is thus consistent with previous work and aids interpretability in the context of previous findings, future work should use functional connectivity methods to map attention networks in more detail. When combined with analytic approaches from network science, such approaches can also highlight the different roles of brain areas in the context of distributed systems (Zink, Lenartowicz, & Markett, 2021). While we kept the definition of attention networks consistent with previous work, our statistical model differed slightly from that of Xuan et al. (2016): we did not model the cue and target stages with separate regressors. The reason for this was the short onset asynchrony between cues and targets, which is common in the ANT and makes separate cue and target regressors highly collinear after convolution with the hemodynamic response function. Despite this difference, our model was able to reproduce the activations for the three main attention networks as described in the literature.

… (Smith et al., 2011), allows for an automated optimization of the model order parameter (Beckmann & Smith, 2004) and, most importantly, operates at the grayordinate level, which makes a direct comparison of ICN maps and task-activation maps straightforward.
Nevertheless, it should be pointed out that the choice of the model order parameter has a major impact on the results of the spatial regression analysis. We therefore repeated the analysis with a published ICN partition that follows a similar hierarchical network structure (Thomas Yeo et al., 2011). This analysis yielded similar results and supports the main conclusions.
Despite all progress in network neuroscience, the field has yet to agree on a comprehensive list of ICN and their names (Uddin et al., 2019). To a certain extent, the apparent differences between studies might arise from the rather indirect approach to neural activity inherent to functional neuroimaging, and from parameter choices for clustering and community detection. More importantly, however, they can also reflect the hierarchical structure of functional interactions in the brain, where larger networks split into several smaller networks at higher resolution levels (Betzel et al., 2013; Hilgetag & Goulas, 2020; Meunier, Lambiotte, & Bullmore, 2010). The hierarchical nature of ICN was also reflected in our present ICN partition. We found our 23 reconstructed signal components to correspond to nine larger ICN that have all been described in the literature. Importantly, we observed clear sensory (auditory and visual) and motor networks, which is an essential criterion for a valid network parcellation. Note that independent component analysis allows the different components to overlap.
We probed the relationship between ICNs and the attention network maps through a spatial regression approach as described previously (Gordon et al., 2012). Since we were interested in the spatial covariation of signals across the entire brain, which is expressed in single statistical parameters, no correction of the task activation and IC maps for multiple comparisons was required, and we submitted unthresholded maps to the regression analyses. By analyzing unthresholded maps, we also made use of the full set of grayordinates and included the full range of grayordinate loadings, which aids the interpretation of the beta weights of the spatial regressions (positive signs indicate recruitment, negative signs indicate suppression). We verified the absence of multicollinearity between the independent components, which is not only a prerequisite for the spatial regression analysis but also a confirmation that our ICA approach was successful in yielding spatially independent components. It should be noted, however, that the present approach assumes static ICNs that persist across task and resting states and are invariant across participants.
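The spatial regression can be sketched as an ordinary least-squares fit across grayordinates, with one unthresholded task map as the dependent variable and all IC maps entered simultaneously as predictors. This is a minimal illustration in the spirit of Gordon et al. (2012), not the authors' code. Z-scoring the maps, including an intercept, and all function names are assumptions. The multicollinearity check is shown as variance inflation factors (VIF), computed as the diagonal of the inverse correlation matrix of the IC maps. The section does not state which diagnostic was actually used.

```python
import numpy as np


def zscore_columns(a):
    """Z-score along axis 0 (grayordinates); works for 1-D maps and 2-D map stacks."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def spatial_regression(task_map, ic_maps):
    """Regress one unthresholded task map on all IC maps at once.

    task_map: shape (n_grayordinates,)
    ic_maps:  shape (n_grayordinates, n_components)
    Returns one beta per component (positive = recruitment, negative = suppression).
    """
    y = zscore_columns(task_map)
    X = zscore_columns(ic_maps)
    design = np.column_stack([np.ones(X.shape[0]), X])  # intercept + IC predictors
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]  # drop the intercept


def variance_inflation_factors(ic_maps):
    """VIF per component: diagonal of the inverse correlation matrix (needs >= 2 ICs)."""
    corr = np.corrcoef(np.asarray(ic_maps, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic stand-ins for IC maps and a task map (illustrative, not study data)
rng = np.random.default_rng(0)
ic_maps = rng.standard_normal((5000, 4))
task_map = ic_maps @ np.array([0.5, -0.3, 0.0, 0.2]) + 0.1 * rng.standard_normal(5000)

betas = spatial_regression(task_map, ic_maps)
vifs = variance_inflation_factors(ic_maps)
```

With these synthetic maps, the betas recover the signs of the generating weights, and the VIFs stay close to 1 because the simulated components are independent. Entering all components in one model, rather than correlating the task map with each IC separately, means that each beta reflects the unique spatial contribution of its component. This is why near-zero multicollinearity matters for interpretation.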
While these assumptions hold by and large (Cole et al., 2014; Smith et al., 2009), there is still ample evidence for subtle yet reliable variation in network structure across tasks, time, and individuals (Cole et al., 2014; Muldoon & Bassett, 2015; Seitzman et al., 2019). We hope that the present comparison between attention networks and the intrinsic network architecture will stimulate more research into the network-level representation of attention that extends the current focus to temporal dynamics and individual differences.
Unfortunately, we did not have the equipment to record eye gaze, which is a shortcoming of the present work. Participants were instructed to maintain fixation throughout each trial to encourage covert shifts of attention. Stimulus display and timing did not require eye movements, but without eye-tracking data there is no direct way to confirm that all participants followed this instruction at all times. While it appears possible to infer gaze location directly from functional MRI data, the relevant software tools were not yet publicly available at the time (Frey, Nau, & Doeller, 2021).

| Conclusions regarding attention network theory
While we found a good overall correspondence between the attention network maps and the brain's intrinsic connectivity architecture, we did not find unique relationships between any attention network map and a single ICN, challenging most previous conjectures on the representation of attention at the network level. Each attention contrast activated several ICN, and all attention networks converged within the dorsal fronto-parietal and midcingulo-insular network, pointing toward a neural resource shared between the different attention networks. Given that interactions and spatial overlap between attention networks have been described previously (Xuan et al., 2016), we argue for reconsidering the notion of separable and independent attention networks. Instead, we propose that attention is supported by a distributed network in which different subroutines of attention (alerting, orienting, and control) segregate into different subnetworks and are integrated by hubs in the dorsal fronto-parietal and midcingulo-insular network. While this proposal requires further empirical investigation, it would be well in line with several discoveries regarding the network-level representation of cognitive control and higher cognition (Braun et al., 2015; Cohen & D'Esposito, 2016; Cohen, Gallen, Jacobs, Lee, & D'Esposito, 2014; Cole et al., 2013; Zink et al., 2021). At the same time, we propose reconsidering the terminology: using the term "network" both for distributed patterns of task-evoked activations and for distributed patterns of intrinsically generated functional connectivity suggests a conceptual equivalence that is not supported by the data.

ACKNOWLEDGMENT
Open Access funding enabled and organized by Projekt DEAL.