Methods for mapping 3D chromosome architecture

Determining how chromosomes are positioned and folded within the nucleus is critical to understanding the role of chromatin topology in gene regulation. Several methods are available for studying chromosome architecture, each with different strengths and limitations. Established imaging approaches and proximity ligation-based chromosome conformation capture (3C) techniques (such as DNA-FISH and Hi-C, respectively) have revealed the existence of chromosome territories, functional nuclear landmarks (such as splicing speckles and the nuclear lamina) and topologically associating domains. Improvements to these methods and the recent development of ligation-free approaches, including GAM, SPRITE and ChIA-Drop, are now helping to uncover new aspects of 3D genome topology that confirm the nucleus to be a complex, highly organized organelle. How chromosomes are positioned and folded within the nucleus has implications for gene regulation. In this Review, Kempfer and Pombo describe and evaluate methods for studying chromosome architecture and outline the insights they are providing about nuclear organization.

The nucleus of human cells harbours 46 densely packed chromosomes. Chromosomes are folded into hierarchical domains at different genomic scales, which likely enable efficient packaging and organize the genome into functional compartments. Chromosomes occupy distinct positions within the nucleus, called chromosome territories, which are partitioned into chromosomal compartments, and further partitioned into topologically associating domains (TADs) and chromatin loops mediated by CCCTC-binding factor (CTCF) or enhancer-promoter contacts (Fig. 1). Chromatin folding is a major feature of gene regulation and dynamically changes in development and disease [1][2][3][4] . Transcriptional control is mediated through physical contacts between enhancers and target genes, which occur via loop formation between the respective DNA elements. Functional loops between regulatory regions and genes are thought to occur predominantly within TADs. The expression of genes can also be influenced by their positioning relative to spatial landmarks inside the nucleus that are enriched for specific biochemical activities, such as the nuclear lamina. The disruption of enhancer-gene contacts and alteration of nuclear subcompartments play important roles in disease, including congenital disorders and cancer. Importantly, many disease-associated mutations of the linear genomic sequence can only be understood by considering their 3D conformation in nuclear space.
Advances in our understanding of chromosome folding have been restricted by a lack of approaches that can map chromatin contacts genome-wide while simultaneously retrieving spatial information, such as molecular distances between different genomic regions or between genomic regions and distinct nuclear compartments. Until recently, studies of 3D genome folding were limited to two main technologies: imaging, particularly fluorescence in situ hybridization of DNA (DNA-FISH); and approaches based on chromosome conformation capture (3C), namely Hi-C (high-throughput chromosome conformation capture). DNA-FISH was a revolutionary approach, which allowed visualization of the spatial organization of chromosomes and genes in the nucleus 5,6. The approach provides single-cell information, but typically has a limited throughput that allows only a small number of genomic loci to be analysed at a time. 3C-based approaches, which depend on proximity ligation of DNA ends involved in a chromatin contact, have helped identify enhancer-promoter contacts. High-throughput derivatives, such as Hi-C, map chromatin contacts genome-wide at a length scale of hundreds of kilobases to a few megabases.
More recently, improvements in imaging techniques have increased the number of loci that can be analysed in parallel 7 and have extended the approach to live cells 8,9. Orthogonal ligation-free approaches have also emerged, namely genome architecture mapping (GAM) 10, split-pool recognition of interactions by tag extension (SPRITE) 11 and chromatin-interaction analysis via droplet-based and barcode-linked sequencing (ChIA-Drop) 12, which have started to reveal novel aspects of chromatin organization. GAM, SPRITE and ChIA-Drop map chromatin contacts genome-wide and identify topological domains but also robustly detect a previously unappreciated level of high-complexity chromatin contacts that involve three or more DNA fragments and uncover specific contacts that span tens of megabases.

Chromosome territories
The nuclear volumes occupied by each specific chromosome. Chromosomes tend to interact predominantly within themselves and occupy distinct regions within the interphase nucleus.

Here, we review the main approaches currently used in 3D genome research, highlighting their major advantages and caveats. To recognize the strengths of each technique, it is important to understand the principles and experimental details underlying each method, their intrinsic biases and their power to capture specific aspects of 3D genome architecture (Table 1). We discuss major features of 3D genome organization that have emerged, at the kilobase scale and above, through the application of these different technologies, and highlight discrepancies between approaches. We will not cover chromatin folding at the level of nucleosomes, which has been reviewed previously 13.

Imaging-based detection of contacts
The visualization of nuclear structures and specific genomic sequences is key to understanding how chromatin is organized in the nucleus. Various light microscopy and electron microscopy techniques can be used to identify nuclear compartments or image the physical positions of specific genomic loci in the nucleus of fixed or live cells. The most commonly used imaging technique for detecting chromatin contacts in fixed cells is DNA-FISH. Contacts can be visualized in live cells using insertions of DNA binding site arrays (such as the Lac operator-repressor 14,15, Tet operator-repressor 16

Fig. 1 | Methods for studying the major features of 3D chromatin folding across different genomic scales. a | Chromosomes occupy discrete territories in the nucleus, which were first detected using imaging techniques. The 3D-fluorescence in situ hybridization (3D-FISH) image shows the positions of the chromosome territories of chromosome 2 (red) and chromosome 9 (green) within DAPI-stained nuclei (blue) from mouse embryonic stem cells (ESCs). Chromosome territories are also detected as regions of high-frequency intrachromosomal interactions on contact maps generated by chromosome conformation capture (3C)-based methods (such as Hi-C (high-throughput chromosome conformation capture)) and ligation-free approaches (such as genome architecture mapping (GAM)). b | DNA inside the nucleus separates into hubs of active (A compartment) and inactive (B compartment) chromatin, clustering around the nucleolus, splicing speckles, transcription factories and other nuclear bodies not represented here. Electron spectroscopy imaging of the mouse epiblast shows the distribution of heterochromatin (yellow) around the nucleolus (light blue) and at the nuclear periphery. Decondensed euchromatin (dark blue) is positioned more centrally in the nucleus. Nucleic acid-based structures are stained yellow, protein-based structures blue. Hi-C and split-pool recognition of interactions by tag extension (SPRITE) contact maps of mouse chromosome 11 show the separation of chromatin into discrete contact hubs (A and B compartments), which are visible as checkerboard-like contact patterns. c | At shorter genomic length scales, chromatin folds into topologically associating domains (TADs), which overlap with domains of early and late replication, and DNA loops that arise from cohesin-mediated interactions between paired CCCTC-binding factor (CTCF) proteins. Multiplexed FISH of consecutive DNA segments in a 2-Mb region in the human genome shows the emergence of TADs in the population-average distance map. In Hi-C and GAM contact maps, TADs are represented by regions of high internal interaction frequencies and demarcated by a drop in local interactions at their boundaries. d | Contacts between a gene and its cis-regulatory elements occur via loop formation between the enhancer bound by RNA polymerase II (Pol II) and the gene promoter. These contacts can be detected by live-cell imaging; shown are contacts between the enhancer (green) and promoter (blue) of the eve gene in a Drosophila melanogaster embryo, with simultaneous imaging of eve mRNA expression (red). The circular chromosome conformation capture (4C)-sequencing track shows the interactions between the Shh gene promoter and the ZRS (a limb-specific enhancer of the Shh gene) in the anterior forelimb in mice. GAM data can be processed using the mathematical model statistical inference of co-segregation (SLICE) to extract the most significant enhancer-promoter contacts from the data set, resulting in a contact matrix with only the high-probability interactions 10. The most significant interaction at the Sox2 locus can be found between the Sox2 gene and one of its well-studied enhancers 189

FISH long-range contacts within large genomic regions, such as between TADs 10,24 or in whole chromosomes 25, can be accurately detected.
However, short-range interactions between chromosomal regions that are less than 100 kb apart are difficult to detect, making it harder to quantify fine-scale chromatin folding below the TAD level, such as enhancer-promoter interactions. High-resolution imaging of chromatin contacts can be achieved using cryo-FISH, in which standard FISH probes are hybridized to thin (~100-200 nm) cryosections from cells fixed using conditions optimized to preserve the nuclear ultrastructure; the signal is then visualized using fluorescence or electron microscopy 10,19,[25][26][27]. More recently, the short length and high specificity of fluorophore-tagged oligonucleotides known as Oligopaints 28 have made it possible to target 15-kb loci using conventional microscopy 29 or 5-kb regions using super-resolution microscopy (when combined with a second labelling step to enhance the fluorescence signal) 30. Oligopaints are not derived from cloned genomic regions but are instead generated from synthetic libraries of short (~60-100 bp) oligonucleotides, which are produced by massively parallel synthesis 31. Once generated, the library pool can be amplified in a flexible manner, using different primer pairs to give rise to different sets of FISH probes. The ease of design of Oligopaints has opened new possibilities for the study of chromatin folding, such as being able to visualize chromatin in different epigenetic states at a resolution of tens of nanometres 22. Oligopaint-based FISH has also been used in combination with high-throughput imaging to generate low-resolution contact maps (for example, at the TAD level) of whole chromosomes 7 and high-resolution (30-kb) contact maps for stretches of DNA 1.2-2.5 Mb in length 32. In addition, molecular beacon FISH probes have emerged as a way to target genomic regions as short as 2.5 kb (ref. 33).
In an unbound state, these probes form a hairpin loop that minimizes the off-target fluorescent signal by bringing together the fluorescent label and a quencher. By reducing the background signal from the unbound probe, the technique improves the visualization of small genomic regions.

CCCTC-binding factor
(CTCF). A transcription factor with 11 conserved zinc-finger (ZF) domains. This nuclear protein is able to use different combinations of the ZF domains to bind different DNA target sequences and proteins. CTCF is enriched at topologically associating domain (TAD) borders, where its binding can be important to specify TAD border definition.

Chromatin
The combination of DNA, RNA and protein that constitutes the chromosomes in eukaryotic cells. Broadly, heterochromatin is associated with transcriptional repression and euchromatin is associated with transcriptional activity.

To determine the specificity of a contact, spatial distances between interacting loci should be compared with distances between non-interacting loci in numerous cells. The distribution of distances, the mean distance and the median distance can all inform about the quality and abundance of the contact in a cell population.
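The comparison of distance distributions between a candidate contact and a non-interacting control pair, as described above, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `summarize_distances`, the 200-nm contact threshold and all distance values are assumptions invented for the example, not measurements or parameters from any study.

```python
import statistics


def summarize_distances(distances_nm, contact_threshold_nm=200.0):
    """Summarize inter-probe distances measured across many nuclei.

    The contact threshold is an illustrative assumption, not a standard value.
    """
    if not distances_nm:
        raise ValueError("need at least one measured distance")
    # Count nuclei in which the two probes lie within the chosen threshold.
    n_close = sum(1 for d in distances_nm if d <= contact_threshold_nm)
    return {
        "n": len(distances_nm),
        "mean": statistics.mean(distances_nm),
        "median": statistics.median(distances_nm),
        "contact_fraction": n_close / len(distances_nm),
    }


# Hypothetical, made-up distances in nanometres (one value per nucleus).
candidate_pair = [120, 150, 180, 210, 160, 400, 140, 170]
control_pair = [350, 420, 380, 190, 510, 460, 300, 440]

candidate = summarize_distances(candidate_pair)
control = summarize_distances(control_pair)
print("candidate:", candidate)
print("control:  ", control)
```

In practice, the two distributions would be compared over many more nuclei and with a statistical test; the summary values here only show which quantities are being contrasted.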

Live-cell imaging of nuclear structures.
Chromosome folding is a highly dynamic process that varies greatly throughout the cell cycle 34,35. Our ability to study these chromatin dynamics has been revolutionized by technologies based on genome editing that allow specific genomic loci to be targeted in live cells. Early iterations of this approach were rather laborious; cell lines needed to be created in which the target locus was tagged with DNA binding site arrays that recruit a fluorescently tagged cognate DNA binding protein (such as the Lac operator-repressor 14,15, Tet operator-repressor 16 and ANCHOR 35 systems). Now, loci can be targeted in live cells with a version of the CRISPR system that uses an endonuclease-deficient form of Cas9 (dead-Cas9 (dCas9)) fused with a fluorescent protein 36. The tagged dCas9 is recruited to the genomic locus of interest via its interactions with sequence-specific small guide RNAs (Fig. 2). For simultaneous labelling of two genomic regions, small guide RNAs can be differentially modified to act as scaffolds that bring fluorescent proteins to the target loci. For example, fusion proteins that comprise a fluorescent protein and either tandem dimer MS2 coat-binding protein (tdMCP) or tandem dimer PP7 coat-binding protein (tdPCP) can be directed to target loci by guide RNAs containing MS2 or PP7 aptamers, respectively. As both proteins have a comparably high exchange rate, which compensates for photobleaching, this approach is also well suited to long-term live-cell imaging [37][38][39]. However, most CRISPR-based methods are currently limited to the detection of repetitive sequences because they rely on a single species of guide RNA, which hybridizes to identical genomic sequences, to direct simultaneous binding of dozens of copies of the fluorescent protein to achieve a strong fluorescent signal.
A notable exception is the chimeric array of gRNA oligonucleotides (CARGO); by delivering 12 different guide RNAs into a single cell, this technique was able to efficiently label a non-repetitive 2-kb genomic region 40 .

Ligation-based detection of contacts
3C-based methods extract chromatin interaction frequencies between genomic loci via chromatin crosslinking and proximity ligation (Fig. 3). Following formaldehyde fixation to capture protein-mediated and RNA-mediated contacts, chromatin is fragmented using a restriction enzyme, and the crosslinked restriction fragments are ligated 41. The purified ligation fragments are called a 3C library. The ligation frequency between two loci of interest can be quantified by PCR using appropriate primer pairs. Thus, 3C focuses on interactions between two loci ('one versus one') and requires prior knowledge of the targets of interest. However, the 3C library contains all ligation products for the genome investigated and the 3C workflow can therefore be adapted to enable genome-wide analysis of chromatin contacts. Chromosome conformation capture-on-chip 27 or circular chromosome conformation capture 42, both called 4C, enrich for interactions of one region with the remaining genome ('one versus all'). Chromosome conformation capture carbon copy (5C) 43 captures contacts of a larger genomic stretch at high resolution ('many versus many'). Finally, Hi-C 44 captures all ligation events across the entire genome ('all versus all'). Workflows and differences between these techniques have been described elsewhere in great detail 45. Here, we focus on the most commonly used versions (Fig. 3; Table 1).

Mapping all contacts at a single locus with 4C.
A straightforward and cost-effective method to obtain additional information from a 3C library is 4C. Here, primers for a region of interest (such as a promoter) are used to amplify all ligation partners of the locus under investigation (called the 'viewpoint') (Fig. 3). The amplified ligation products are sequenced (to a depth of 1-5 million reads per library 46 ) and used to analyse genome-wide interaction partners of the region of interest at a resolution of a few kilobases. 4C has been widely used to investigate cis-regulatory landscapes of genes, especially in development and disease 47 . It is well suited for detecting short-range regulatory interactions 48 , but has also been applied to detect contacts spanning long genomic distances, including whole chromosomes 27,49 .

Mapping all contacts occurring within a large genomic region with 5C.
In 5C, large genomic regions spanning up to several megabases are amplified from the 3C library using an elegant, yet complex, mix of forward and reverse primers. For example, 5C analysis of a 4.5-Mb chromosomal region around the Xist gene revealed the presence of TADs 24 . 5C has the advantage of producing high-resolution data at an affordable sequencing depth (~60 million reads per library to obtain resolution of 15-20 kb for a 1-Mb region) 50 . However, the resolution of 5C is dependent on the ability to design forward and reverse primers for all possible restriction fragments across a given locus; in the absence of appropriate primers, some mappable fragments will be excluded from the contact map.

Mapping all contacts at one or more loci with capture-based methods.
A 3C library can be enriched for one or more genomic targets of interest using capture-based methods, such as Capture-C 51, Capture Hi-C 52 and CAPTURE 53. In these approaches, biotinylated oligonucleotides complementary to a genomic region of interest are used to pull down specific ligation products from the library, which are then amplified and sequenced. These approaches can be used to detect interactions of one viewpoint but also of entire genomic regions 47 or groups of targets 54,55.

Mapping all genome-wide contacts with Hi-C and its derivatives.
Hi-C is the most commonly used genome-wide approach to map chromatin contacts from a 3C chromatin preparation 44. In this approach, the ends of crosslinked DNA restriction fragments are labelled with biotin and then ligated. After ligation, the exonuclease activity of T4 DNA polymerase is used to remove the biotin label from the ends of unligated fragments. Ligated fragments, which retain the biotin label, are enriched using streptavidin beads to minimize the number of unligated DNA molecules in the sequencing library. Depending on the enrichment efficiency, about 50-70% of sequencing reads map to pairs of ligated restriction fragments in Hi-C libraries 56. In tethered chromosome capture (TCC) 57, an early modification of Hi-C, the detection of unspecific ligation events between non-crosslinked material is minimized by tethering the crosslinked and biotinylated chromatin to streptavidin beads before ligation. This approach detects more long-range intrachromosomal contacts and contacts between chromosomes than standard 3C-technologies 57. By contrast, genome conformation capture (GCC) 58, an approach developed at the same time as Hi-C, sequences all DNA present in the 3C library, without preselection of ligated fragments. Although currently much more expensive, especially for large genomes, GCC has the advantage of allowing direct normalization of DNA abundance, thereby controlling for biases in sequencing and for the presence of genomic alterations, such as copy number variations. Methods for detection and normalization of copy number variations have also recently been developed for Hi-C [59][60][61].

Nuclear lamina
A protein mesh, consisting of lamins and other membrane-associated proteins, at the inner nuclear membrane that contributes to nuclear structure and function. Chromatin in the proximity of the lamina tends to be heterochromatic and transcriptionally repressed.

Fluorescence in situ hybridization
A technique that can be used to visualize the location of nucleic acid sequences within the nucleus using sequence-specific fluorescent probes that hybridize to the regions of interest, combined with microscopy.

Chromosome conformation capture
(3C). A technique used to detect the frequency of interactions between any specified two loci in the genome. Interactions between loci are captured by formaldehyde fixation, followed by restriction enzyme digestion and ligation. The frequencies of interactions between loci are determined by quantitative real-time PCR.

Hi-C
(High-throughput chromosome conformation capture). A genome-wide version of chromosome conformation capture that allows all chromatin interactions in the genome to be mapped simultaneously. The frequencies of interactions between loci are determined by paired-end sequencing.

Genome architecture mapping

Proximity ligation
Fixation of cells, followed by fragmentation of chromatin and ligation of nearby, crosslinked DNA fragments.

Chromosome conformation capture (3C)-based assays measure contact frequencies of pairs of DNA loci by proximity ligation of crosslinked and fragmented chromatin. All 3C-based assays involve fixation of the chromatin, isolation of nuclei and DNA fragmentation (for example, with a restriction enzyme). The obtained crosslinked chromatin fragments are then processed for 3C, circular chromosome conformation capture (4C) or chromosome conformation capture carbon copy (5C), which map chromatin contacts for preselected regions, or for genome-wide assays, such as high-throughput chromosome conformation capture (Hi-C) and proximity ligation-assisted chromatin immunoprecipitation sequencing (PLAC-seq). In 3C, 4C and 5C, the crosslinked chromatin fragments are ligated and the DNA is purified. In 3C, the interactions between two chosen genomic regions are detected by PCR amplification with primers specific to the two regions of interest. PCR products are analysed semi-quantitatively on an agarose gel or by real-time quantitative PCR. Interactions are defined by higher ligation frequencies compared with control regions of similar genomic distance. In 4C, interactions of one viewpoint with the whole genome are measured. The ligated and purified DNA is fractionated with a secondary restriction digest, and the digested, smaller DNA fragments are circularized and amplified with primers facing outwards from the viewpoint. The PCR products are sequenced by paired-end sequencing, providing the sequence information and frequency of every chromatin contact of the viewpoint. In 5C, the ligated and purified DNA is directly amplified using primers for all restriction fragments within a consecutive genomic region, usually hundreds of kilobases up to several megabases. The PCR products are sequenced and provide information about the ligation frequencies of all fragments within the region of interest. In Hi-C and PLAC-seq, digested DNA fragments are labelled with biotin, ligated and then fragmented further by sonication. In PLAC-seq, DNA fragments bound to a protein of interest are pulled down by immunoprecipitation. Then, in PLAC-seq and Hi-C, the DNA is purified, biotinylated nucleotides are removed from unligated fragment ends and all ligated DNA fragments are pulled down with streptavidin beads. After pull-down, DNA fragments are sequenced and provide information about the interaction frequencies of all pairs of loci in the genome (Hi-C) or the interactions mediated by a protein of interest (PLAC-seq).

Many other variants of genome-wide 3C-methods have been reported, ranging from technical optimizations of the original Hi-C protocol (such as DNase Hi-C 62,63 and in situ Hi-C 64) and advances to improve resolution (such as Micro-C) [65][66][67], to protocols based on the enrichment of contacts mediated by specific proteins or open chromatin regions (open chromatin enrichment and network Hi-C (OCEAN-C) 68). Currently, the most commonly used version is in situ Hi-C. In the original Hi-C protocol, sodium dodecyl sulfate (SDS) is used to disrupt the nuclear membrane and ligation of crosslinked DNA therefore occurs partially in solution. In situ Hi-C omits this SDS step, allowing ligation of chromatin fragments within the presumably more native environment of the intact nucleus. As a result, the number of random ligation events is reduced and signal-to-noise ratios are improved, thereby reducing the sequencing depth required and enabling higher-resolution contact maps. However, detailed analyses of the nuclear fragments that contribute to contacts in the original version of Hi-C showed that large portions of the chromatin remain inside the partially digested nucleus during ligation 69. Nonetheless, the in situ Hi-C protocol is faster and easier than the original version 64, mainly because it does not require extensive dilution of the crosslinked chromatin prior to DNA ligation. Consequently, all subsequent steps can be conducted in smaller volumes, allowing more efficient ligation and DNA extraction. Easy Hi-C is another recent approach to simplify Hi-C 70. It avoids biotin enrichment and can be used with lower cell numbers than standard Hi-C (Table 1).

Mapping genome-wide contacts in single cells with single-cell Hi-C.
Standard Hi-C generates average contact maps from millions of cells, without any possibility to understand heterogeneity of the cell population. Single-cell Hi-C overcomes this limitation by allowing Hi-C contact maps to be produced from individual cells isolated during the process of generating Hi-C libraries 71,72.
This approach allows rare cell types to be studied 73 and helps chromosome structures to be determined at specific stages of the cell cycle 74 . The single-cell Hi-C protocol involves in situ proximity ligation of crosslinked and digested chromatin, followed by isolation of single nuclei from the cell suspension and generation of sequencing libraries from each nucleus 71,74 . Single-cell combinatorial indexed Hi-C (sciHi-C) adopts a different approach; instead of isolating single cells, DNA within each nucleus is tagged with a unique combination of barcodes 75 . First, cells are fixed, lysed and digested with a restriction enzyme. Then, the cell suspension of digested, but intact, nuclei is split into 96-well plates, indexed with individual barcodes, pooled and split again. After several rounds of indexing, in situ proximity ligation and library preparation are performed on pooled nuclei, allowing high-throughput generation of single-cell Hi-C libraries.
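The logic of split-and-pool indexing can be illustrated numerically. The minimal Python sketch below counts the barcode combinations available after a given number of indexing rounds in 96-well plates and estimates how often two nuclei would share a combination. It assumes uniform random allocation of nuclei to wells; the function names, the number of rounds and the number of nuclei are assumptions chosen for illustration and are not values from the sciHi-C protocol.

```python
def barcode_combinations(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after split-and-pool indexing."""
    return wells_per_round ** rounds


def expected_collision_fraction(n_nuclei: int, n_combinations: int) -> float:
    """Expected fraction of nuclei whose barcode combination is shared with
    at least one other nucleus, assuming uniform random allocation."""
    if n_nuclei < 1 or n_combinations < 1:
        raise ValueError("n_nuclei and n_combinations must be positive")
    # Probability that none of the other nuclei picked the same combination.
    p_unique = (1.0 - 1.0 / n_combinations) ** (n_nuclei - 1)
    return 1.0 - p_unique


two_rounds = barcode_combinations(96, 2)    # 96 * 96 combinations
three_rounds = barcode_combinations(96, 3)  # 96 * 96 * 96 combinations

# Hypothetical experiment with 1,000 nuclei.
shared_2 = expected_collision_fraction(1000, two_rounds)
shared_3 = expected_collision_fraction(1000, three_rounds)
print(two_rounds, three_rounds)
print(f"shared after 2 rounds: {shared_2:.4f}; after 3 rounds: {shared_3:.6f}")
```

Each additional round multiplies the number of combinations by the number of wells, which is why a small number of rounds can index many nuclei without physically isolating them.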
One of the major challenges in single-cell Hi-C is the efficient recovery of contacts: inefficient digestion and ligation and incomplete recovery of input material result in contact maps that represent only a proportion of the contacts that may exist in a single cell. Modifications of the original protocol increased the average number of contacts detected in one cell from tens of thousands to hundreds of thousands 34,74, but this remained a fraction (2-5%) of the possible contacts in the genome. Recently, the development of Dip-C (diploid chromatin conformation capture) has increased the number of detectable contacts to an average of 1 million per cell by omitting biotin incorporation and including a whole-genome amplification step in the protocol 76.

Combining 3C-based approaches with chromatin immunoprecipitation.
3C-based methods can be used to study chromatin contacts mediated by specific proteins, such as chromatin modifiers, architectural proteins, members of the transcription machinery or cell type-specific transcription factors. To explore contacts that coincide with chromatin occupancy of specific proteins, Hi-C libraries can be enriched by chromatin immunoprecipitation (ChIP) before ligation. Early methods, such as ChIP-loop 77 and enhanced 4C-ChIP (e4C) 78, required that chromatin be solubilized to enable specific immunoprecipitation before ligation. However, standard 3C conditions often do not fully solubilize chromatin, as nuclei stay mostly intact after SDS treatment 69, resulting in low signal-to-noise ratios. Other approaches, such as chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), included sonication of the nuclei, as is more typically used for ChIP 79. Although sonication allows efficient precipitation of chromatin, its influence on the outcome of the subsequent proximity ligation remains unclear. Challenges in implementing ChIA-PET have led to other strategies for combining ChIP with Hi-C, namely Hi-ChIP 80 and proximity ligation-assisted chromatin immunoprecipitation sequencing (PLAC-seq) 81. Instead of performing protein pull-down followed by ligation of DNA fragments, Hi-ChIP and PLAC-seq perform in situ Hi-C and proximity ligation before sonication and immunoprecipitation. In this order, the ligation occurs in intact nuclei under optimal conditions, before chromatin contacts specific to the protein of interest are enriched. Regardless of these increased efficiencies, the results from immunoprecipitated 3C-libraries should be interpreted carefully because of the bias introduced by enriching for genomic regions that are bound by the protein of interest 82.

Split-pool recognition of interactions by tag extension
(SPRITE). A ligation-free approach to detect chromatin interactions by tagging crosslinked chromatin complexes. The DNA (and RNA) molecules within an individual chromatin complex are identified after sequencing by their unique combination of barcodes that have been sequentially added using a split-pool strategy.

Chromatin immunoprecipitation
(ChIP). A method used to determine whether a given protein binds to, or is localized to, specific chromatin loci in vivo, detected after (native or crosslinked) chromatin purification and immunoprecipitation, followed by DNA detection by PCR, microarray hybridization or sequencing.

Genomic resolution of genome-wide 3C-methods.
A major consideration for any genome-wide technique is genomic resolution. Hi-C data represent interaction frequencies between genomic regions in a contact matrix, consisting of equally sized genomic bins. The bin size (resolution) depends almost entirely on the sequencing depth. Resolutions of 30 kb or lower are often preferred to study chromatin domains and compartments, as well as long-range contacts between large genomic regions (such as TADs); using standard Hi-C, this requires sequencing depths of approximately 200-400 million reads in mammalian genomes. However, billions of reads become necessary for high-resolution (1-kb) data sets of the human genome that can provide detailed insights into 3D genome topology 64. Recently, a computational approach, HiCPlus, applied deep learning to infer high-resolution contact matrices from low-resolution Hi-C data, which reduced the sequencing depth required to obtain a given resolution by a factor of 16 (ref. 83).
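The link between bin size and sequencing depth can be made concrete with a minimal Python sketch that assigns read pairs to equally sized bins and builds a symmetric contact matrix. The function name `bin_contacts`, the toy coordinates and the two bin sizes are assumptions made for illustration; this is not the binning procedure of any particular Hi-C pipeline.

```python
import numpy as np


def bin_contacts(pairs, region_length, bin_size):
    """Build a symmetric contact matrix from (pos_a, pos_b) read pairs.

    Positions are base-pair coordinates within one region of length
    `region_length`; each position is assigned to a bin of `bin_size` bp.
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror off-diagonal contacts
    return matrix


# Hypothetical read pairs in a 100-kb toy region (made-up coordinates).
pairs = [(5_000, 45_000), (12_000, 95_000), (31_000, 38_000),
         (61_000, 64_000), (5_500, 44_000)]

coarse = bin_contacts(pairs, region_length=100_000, bin_size=30_000)
fine = bin_contacts(pairs, region_length=100_000, bin_size=10_000)

for name, m in (("30-kb bins", coarse), ("10-kb bins", fine)):
    filled = np.count_nonzero(m) / m.size
    print(f"{name}: shape={m.shape}, fraction of non-empty cells={filled:.2f}")
```

Because the number of matrix cells grows with the square of the number of bins, the same read pairs are spread over many more cells at finer bin sizes, which is one way to see why higher resolution requires deeper sequencing.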

Ligation-free detection of contacts
The reliance of 3C-based approaches on the ligation of the ends of DNA fragments found in a cluster of contacts favours the detection of 'simple' chromatin contacts that involve two or a few genomic regions. This bias occurs because each DNA fragment can ligate with only one or two other fragments, so not all instances of every interaction in a complex cluster are detected 84. Thus, the full interactome of each DNA fragment is diluted by the choice of only one or two other fragments during ligation. Recently, three ligation-free approaches have been developed for genome-wide mapping of chromatin contacts: GAM 10, SPRITE 11 and ChIA-Drop 12. These methods are orthogonal to ligation-based approaches and are starting to provide new insights into 3D genome topology. Other ligation-free approaches, namely tyramide signal amplification (TSA-seq) 85 and DNA adenine methyltransferase identification (DamID) 86-88, map chromatin with respect to nuclear landmarks (such as the nuclear lamina or various nuclear bodies), thereby helping to define chromatin positions in 3D space.
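As a rough illustration of this dilution, the Python sketch below compares the number of pairwise contacts present in a cluster of n fragments with an upper bound on the distinct pairs that ligation could report from that cluster. It assumes, as a simplification, that each fragment has two ends and that each end ligates to at most one other fragment, so a cluster yields at most n distinct ligated pairs; the function names are hypothetical.

```python
from math import comb


def pairwise_contacts(n_fragments: int) -> int:
    """All fragment pairs that are in contact within one cluster."""
    return comb(n_fragments, 2)


def max_ligation_pairs(n_fragments: int) -> int:
    """Upper bound on distinct pairs reported by ligation from one cluster.

    Simplifying assumption: two ligatable ends per fragment, one partner
    per end, so at most n junctions (a closed chain) can form.
    """
    if n_fragments < 2:
        return 0
    return min(comb(n_fragments, 2), n_fragments)


for n in (2, 3, 5, 10, 20):
    total = pairwise_contacts(n)
    bound = max_ligation_pairs(n)
    print(f"n={n:2d}: pairs in cluster={total:3d}, "
          f"ligation upper bound={bound:2d}, fraction<={bound / total:.2f}")
```

Under this assumption the recoverable fraction falls as clusters grow, which is consistent with the bias towards simple contacts described above; real ligation efficiencies are lower than this bound.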

Mapping contacts with nuclear structures with DamID and TSA-seq.
DamID is an in vivo genome-wide method for detecting interaction sites between a protein of interest and DNA. The DNA binding domain of the protein of interest (for example, RNA polymerase II (Pol II)) is fused to the DNA adenine methyltransferase (Dam) protein from Escherichia coli [86][87][88] , which specifically methylates adenines in the sequence GATC. When the fusion protein is expressed at low levels in cells, GATC sequences within or close to DNA binding sites of the protein of interest are marked by methylation. After DNA extraction, the methylated GATC sites are cut with a methylation-sensitive restriction enzyme and adapters are added to the restriction fragments to ensure only methylated binding sites are amplified and sequenced. In an interesting adaptation called targeted DamID (TaDa) 88 , expression of the Dam fusion protein is restricted to a specific cell type of interest, using targeted expression systems (such as the Gal4-UAS system), which allows detection of DNA-protein interactions in a cell type-specific manner without prior isolation or sorting of cells. DamID has been successfully used to study DNA interactions with proteins such as Lamin B1, which resulted in the genome-wide mapping of lamina-associated domains and provided spatial information about chromatin with regard to the nuclear periphery 89,90 . However, interactions between chromatin and other nuclear compartments, such as splicing speckles, are not readily detected with DamID because most of the DNA surrounding these compartments does not directly bind to the tagged proteins 91 .
TSA-seq addresses this problem using tyramide signal amplification to measure the distances between chromatin and nuclear compartments 92 . In this approach, horseradish peroxidase (HRP) is conjugated to an antibody that binds to a protein specific to the nuclear compartment of interest, where it catalyses the production of biotin-conjugated tyramide free radicals, which diffuse and bind to nearby macromolecules, including DNA. Biotin-labelled DNA can subsequently be selected by biotin pull-down and sequenced to identify all genomic regions that were close enough to the protein of interest to be labelled. TSA-seq has been used to map, genome-wide, the distances between all genes and their nearest splicing speckle 92 .
Another recent adaptation of DamID, called DamC, detects 4C-like contacts between a target region and the surrounding DNA regions, up to distances of a few hundred kilobases 93 . In DamC, Dam is fused with the reverse tetracycline receptor (rTetR), which binds to Tet operator sites inserted at the genomic region of interest. The Dam fusion protein methylates the target and its interaction partners in vivo. When combined with high-throughput sequencing, DamC reveals chromatin contacts independently of crosslinking or ligation, but unlike 3C-based methods and the other ligation-free approaches it requires engineering of the cells of interest. Comparison of DamC data with 4C and Hi-C data showed high similarities at the level of TADs and CTCF loops at many genomic sites; however, some differences at loops and sub-TAD structures could also be observed 93 .

Genomic resolution
The size of the equally sized genomic windows (bins), often in the range of kilobases, into which sequencing reads are grouped after being mapped to the genome in most assays.

Sequencing depth
The average number of reads representing a given nucleotide in the reconstructed sequence. A 10× sequencing depth means that each nucleotide of the transcript was sequenced, on average, 10 times.

Nuclear bodies
Membrane-less compartments in the nucleus with high concentrations of DNA binding proteins, chromatin modifiers or RNAs that can be involved in shaping chromatin structure and modulating gene regulation. Nuclear bodies include the nucleolus, splicing speckles and Polycomb bodies.

Mapping all genome-wide contacts with GAM. In GAM, nuclei are sectioned in random orientations from a population of fixed and sucrose-embedded cells using ultra-thin cryosectioning (220 nm thickness). Single nuclear slices are then isolated directly from the cryosection by laser microdissection. GAM thus avoids cell extraction or sorting, both of which can disrupt cellular and nuclear structures, which can be especially important when analysing complex tissues. The DNA from every slice is extracted, whole-genome amplification is performed and indexed sequencing adapters are added before the DNA from all slices is pooled for sequencing (Fig. 4). From the sequencing data for several hundred nuclear sections, each from a single cell, chromatin contacts between pairs of DNA loci can be inferred by counting their co-segregation frequency (that is, how often the two loci are contained in the same nuclear sections). Genomic regions that are closer in 3D space are more frequently found in the same nuclear slice. To detect statistically significant interactions, GAM was combined with a mathematical model, statistical inference of co-segregation (SLICE) 10 . The most specific chromatin contacts detected with SLICE were found to contain active genomic regions, such as active enhancers and actively transcribed genes, with these contacts extending over megabases up to entire chromosomes 10 . SLICE separately models the random interactions that depend on genomic distance and the specific interactions that occur at a given physical distance (for example, below 100 nm) 10 ; it interrogates which pairs of loci co-segregate more often in the collection of slices than expected from random contacts, and quantifies the frequency of specific interaction in the cell population. GAM also allows genome-wide interactions between three or more DNA loci to be detected simultaneously, and has detected long-range contacts between TADs containing super-enhancers and highly transcribed TADs 10 . The resolution of GAM data sets depends on the number of nuclear slices collected. With 400 nuclear slices, sequenced with ~1 million reads per slice, it was possible to achieve a resolution of 30 kb for pairwise chromatin contacts 10 , comparable with a Hi-C library with similar sequencing depth 94 . Larger GAM data sets comprising a few thousand nuclear slices will help define the maximal resolution that can be practically afforded by GAM.
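The co-segregation counting that underlies GAM can be sketched in a few lines of Python. The example below scores only the raw co-segregation frequency of locus pairs across slices; it does not implement SLICE, which additionally models the contacts expected at random. The function name `cosegregation`, the locus names and the eight-slice segregation table are hypothetical.

```python
from itertools import combinations

def cosegregation(segregation):
    """Return the co-segregation frequency for each pair of loci.

    segregation: dict mapping locus name -> list of 0/1 flags, one per
    nuclear slice (1 = locus detected in that slice).
    """
    n_slices = len(next(iter(segregation.values())))
    freqs = {}
    for a, b in combinations(sorted(segregation), 2):
        # Count slices in which both loci were detected together.
        together = sum(x & y for x, y in zip(segregation[a], segregation[b]))
        freqs[(a, b)] = together / n_slices
    return freqs

# Hypothetical segregation table: three loci scored across eight slices.
table = {
    "locus_A": [1, 1, 0, 1, 0, 0, 1, 0],
    "locus_B": [1, 1, 0, 1, 0, 0, 0, 0],
    "locus_C": [0, 0, 1, 0, 1, 0, 1, 0],
}

freqs = cosegregation(table)
for pair, freq in freqs.items():
    print(pair, freq)
```

Here locus_A and locus_B share three of eight slices, whereas locus_B and locus_C never co-occur, so the first pair would be read as closer in nuclear space. With real data, several hundred slices and a statistical model are needed before such differences can be interpreted.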

Mapping all genome-wide contacts with SPRITE and ChIA-Drop.
SPRITE 11 and ChIA-Drop 12 detect chromatin interactions by tagging crosslinked chromatin complexes. Similar to 3C-based approaches, these methods rely on mild fixation and fragmentation of chromatin inside the nucleus; unlike 3C-based approaches, however, they do not use proximity ligation. Instead, in SPRITE, the crosslinked chromatin fragments are split across a 96-well plate, where each well contains a unique barcode (Fig. 4). The indexed chromatin complexes are re-pooled, followed by sequential rounds of splitting, barcoding and pooling. The DNA (and RNA) molecules within an individual chromatin complex are identified after sequencing by their unique combination of barcodes added using this split-pool strategy; only DNA fragments that were crosslinked with each other will display the same combinations of barcodes. In ChIA-Drop, crosslinked and fragmented chromatin is separated into single chromatin complexes by droplet formation using a microfluidics device. Each droplet contains reagents for barcoding and amplification, and barcoded complexes are pooled and sequenced, as in SPRITE. SPRITE detects TADs and loop domains, both of which are features of Hi-C contact maps. However, SPRITE also detects additional genome-wide features of nuclear architecture, such as the association of specific genomic regions with nucleoli and splicing speckles. The predominant chromatin hubs around these nuclear bodies contain genomic regions from different chromosomes, an observation that is in agreement with single-cell imaging 95 but that had not been made using 3C-based assays. SPRITE also detects long-range contacts between regions containing active genes and super-enhancer regions that were first recognized as being multiway-specific interactions in a study using GAM 10 .
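The logic of identifying complexes from split-pool barcodes can be sketched as follows. Reads that carry the same ordered combination of barcodes are grouped into one cluster, and clusters with more than two regions correspond to multiway contacts. This is an illustrative sketch rather than the published SPRITE pipeline: the function name `group_by_barcode`, the barcode labels, the three-round example and the genomic bins are all hypothetical, and the count of possible combinations assumes five rounds on a 96-well plate.

```python
from collections import defaultdict

def group_by_barcode(reads):
    """Group genomic regions by their full barcode combination.

    reads: iterable of (barcodes, region), where barcodes is the ordered
    sequence of well barcodes acquired over the split-pool rounds.
    """
    clusters = defaultdict(set)
    for barcodes, region in reads:
        clusters[tuple(barcodes)].add(region)
    return dict(clusters)

# Hypothetical reads after three rounds of split-pool barcoding.
reads = [
    (("A1", "C7", "F3"), "chr1:100"),
    (("A1", "C7", "F3"), "chr5:220"),
    (("A1", "C7", "F3"), "chr1:140"),
    (("B2", "C7", "F3"), "chr2:310"),
    (("B2", "C7", "F3"), "chr2:330"),
]

clusters = group_by_barcode(reads)

# Clusters with more than two regions represent multiway contacts.
multiway = {bc: regions for bc, regions in clusters.items() if len(regions) > 2}

# Distinct barcode combinations, assuming 96 wells and five rounds.
n_combinations = 96 ** 5

print(len(clusters), len(multiway), n_combinations)
```

The large number of possible combinations relative to the number of complexes is what makes it unlikely that two unrelated complexes end up with the same barcode string.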

Comparing approaches
Fundamental differences exist between current approaches for mapping 3D genome folding, including how the chromatin is fixed and prepared, their power to detect multiple chromatin contacts or contacts with different spatial distances and protein occupancy, and their ability to detect long-range contacts within the same (Fig. 5) or different chromosomes. These differences have sometimes led to observations that can be difficult to reconcile between the different approaches.

Fixation and chromatin preparation.
With the exception of live-cell methods (such as Dam-based and CRISPR-based approaches), all chromatin folding techniques start by crosslinking DNA-protein complexes to stabilize nuclear structures (Table 2). Chemical fixation using formaldehyde is the most common approach for crosslinking, but concentrations, buffers and fixation times vary widely; for example, 1% formaldehyde is typically used for 3C-based methods, 4% for most DNA-FISH experiments in whole cells and 8% for GAM or cryo-FISH in nuclear slices. Other fixatives include solvent-based precipitation using ethanol, methanol or acetone. A recent imaging study compared the effects of formaldehyde fixation and cryofixation on nuclear structure using partial wave spectroscopy 96 . It revealed that weaker fixatives (such as 4% formaldehyde in PBS) introduce larger structural distortions than stronger fixatives (such as glutaraldehyde, often used for electron microscopy). However, the distinction between condensed and decondensed chromatin remains detectable at the population level 96 , which is consistent with the ability of all current chromatin folding methods to successfully map euchromatin and heterochromatin. The effect of varying crosslinking conditions (from no fixation to 5% formaldehyde fixation) has been examined in Capture-C experiments; similar short-range interactions were detected under all conditions, but formaldehyde concentrations below 2% improved the efficiency of detection 97 . Our own previous work showed that the organization of the active form of Pol II, which marks transcription sites, can be highly disrupted with weaker fixatives, but not with the fixation regimen used for GAM or cryo-FISH 98 . In FISH, denaturation of the DNA [...] 99,100 . However, 3D-FISH preserves the organization of centromeres seen by imaging the same cells before and after hybridization 100 , and cryo-FISH retains the organization of active Pol II sites 101 . In another method, resolution after single-strand exonuclease resection (RASER)-FISH, heat denaturation of the DNA is avoided and DNA accessibility is achieved by exonuclease digestion, thereby reducing the effects of DNA denaturation 102 .

Fig. 4 | Ligation-free methods to map chromatin contacts genome-wide. a | Genome architecture mapping (GAM) measures co-segregation frequencies of genomic regions by slicing the nucleus into thin nuclear sections and sequencing the DNA content of a large number of randomly collected slices. To obtain nuclear slices, cells are fixed and cryosectioned. Single nuclear slices are isolated from the cryosection using laser microdissection. DNA is extracted from each nuclear slice by whole-genome amplification and sequenced. The sequence information is used to score the presence or absence of genomic loci in each slice. Spatial proximity of all pairs of loci in the genome is inferred from the frequency of their co-occurrence in the population of slices. b | Split-pool recognition of interactions by tag extension (SPRITE) detects chromatin interactions of multiple genomic regions by tagging single crosslinked chromatin complexes with unique combinations of identifiers before sequencing. Cells are fixed and the crosslinked chromatin is fragmented using sonication. The resulting chromatin complexes are split into wells of a 96-well plate, and DNA in each well is ligated to a unique barcoded adapter. The contents from all wells are pooled and split again, followed by adapter ligation. The process is repeated five times so that each chromatin complex is labelled with a unique combination of adapter sequences. DNA is purified and sequenced, and the adapter combination of each sequenced DNA fragment is used to identify all genomic regions that share the same combination of adapters and were, therefore, initially crosslinked together, inferring spatial proximity. c | Chromatin-interaction analysis via droplet-based and barcode-linked sequencing (ChIA-Drop) detects chromatin contacts by barcoding crosslinked chromatin complexes after cell fixation, lysis and chromatin fragmentation. Barcodes are delivered in a droplet that contains a unique identifier and reactions for adapter ligation and DNA amplification. Each chromatin complex is loaded onto a droplet in a microfluidics device and sequenced. Barcodes identify regions from the same droplet, indicating regions that were crosslinked due to spatial proximity.

Fig. 5 | [...] b | Long-range super-enhancer contacts can also be found when looking at GAM contacts without filtering for the most significant interactions (~500 million reads, 40-kb resolution, mouse ESCs, data from ref. 10 ), and although they are not readily detected in Hi-C data with an average sequencing depth (~240 million reads, 40-kb resolution, mouse ESCs, data from ref. 94 ), they start to emerge in deep-sequenced in situ Hi-C data (~800 million reads, 50-kb resolution, mouse ESCs, data from ref. 191 ). Heat maps were generated by Christophe Thieme from the published, normalized matrix files. Scores are colour-coded, where the colour-code range (maximum and minimum cutoffs) is determined by the mean value of the bin distances 1-20 and -50 to -30 from the diagonal, respectively. c | The plot shows the distribution of contact frequencies detected by high-throughput chromosome conformation capture (Hi-C), GAM and split-pool recognition of interactions by tag extension (SPRITE) (all clusters or clusters with 2-10 reads) along the linear genomic distance of chromosome 11 in mouse ESCs, scaled to the maximum observed value in each data set. The ligation-free methods GAM and SPRITE detect similar ranges of chromatin contacts, which can extend over large genomic distances. By contrast, Hi-C contacts typically extend over shorter genomic distances. However, SPRITE data can be sorted based on the number of interactions within one chromatin complex. When considering only small SPRITE clusters with fewer than 10 genomic regions in the same chromatin cluster, the range of detection between Hi-C and SPRITE is comparable, indicating that Hi-C favours less complex short-range contacts over long-range interactions involved in chromatin hubs with many interaction partners. The plot was generated using the same data for GAM (ref. 10 ) and Hi-C (ref. 94 ) as used in part b, and data provided by Sofia Quinodoz for normalized SPRITE clusters for chromosome 11, according to figure 3B of ref. 11 . Part a adapted from ref. 10 .

Multiplicity of chromatin contacts.
The dependency of 3C-based methods on DNA end ligation results in preferential detection of low-multiplicity contacts that involve only a few genomic regions 84 . However, every 3C library also includes interaction events that occur at complex clusters involving more than two DNA fragments, albeit at a low representation. Current methods to capture these higher-complexity ligation events include multi-contact 4C (MC-4C) 103 , which uses long-read sequencing (such as nanopore sequencing) of 4C libraries to capture three-way contacts of a region of interest, and chromosomal walks (C-walks) 104 , which implement multiple ligation steps followed by dilution and barcoding of the isolated ligation products. Alternatively, methods such as the concatemer ligation assay (COLA) 105 and Tri-C 106 generate 3C libraries with a restriction enzyme that cuts small DNA fragments, which increases the frequency of detecting multiple ligation events in one sequencing read. Estimates based on direct comparison of pairwise and multiway ligation events indicate that only 17% of chromatin contacts in mouse embryonic stem cells (ESCs) are pairwise contacts and, therefore, the majority of the genome is involved in higher-order contacts between more than two genomic loci 104 . These observations are supported by a recent study using SPRITE, which showed that classical ligation-dependent methods under-represent higher complexity contacts 11 (Fig. 5c). Assays that do not depend on ligation detect DNA fragments that are in spatial proximity regardless of the number of interacting genomic loci. For example, long-range multiway contacts between genomic regions harbouring super-enhancers were readily found in GAM 10 , FISH 10 and SPRITE 11 data, but had not previously been detected with Hi-C. 
Furthermore, analyses of triplet interactions between TADs in GAM showed that multiple interactions between super-enhancer regions and active genes are a common feature of genome conformation in mouse ESCs 10 .

Spatial distance between contacting genomic regions.
The spatial distance between genomic loci is thought to influence the probability of ligation irrespective of the frequency of contacts. Whereas cryo-FISH and SPRITE have readily detected abundant interchromosomal contacts in human, mouse and Drosophila cells 25,76,107,108 , 3C-based methodologies are more often used to explore specific contacts within chromosomes, with some exceptions 27,76,[109][110][111] . Recent CRISPR-Cas9 live-cell imaging of a small number of chromatin contacts within and between chromosomes showed that interchromosomal contacts display spatial distances in the range of ~280 nm, in contrast to distances of ~190 nm for intrachromosomal interactions 20 . Interestingly, only the intrachromosomal contacts could be observed in matching Hi-C data, indicating that successful proximity ligation depends on close spatial distances.

Protein-mediated interactions versus bystander contacts.
GAM and all imaging-based techniques collect all possible spatial relationships between genomic regions, regardless of their involvement in a protein-mediated interaction, and allow sampling of the whole range of spatial distances within the interphase nucleus. Thus, these methods also detect bystander contacts. However, it is possible to identify the most specific contacts through effective sampling to take into account all behaviours of all genomic regions at all linear distances across the cell population. In this regard, GAM currently has more statistical power than FISH as it samples all possible combinations, whereas FISH remains limited to the analyses of a subset of regions or chromosomes.

Levels of concordance between different methods.
The validation of results obtained by 3C-based methods often entails the use of DNA-FISH on a few selected loci. Many examples show agreement between 3C interaction frequencies and spatial distances measured by FISH, especially at large genomic distances 44,64,[112][113][114][115] . Loci in the same TAD are often closer in nuclear distance than loci in different TADs 24,94 , and interaction frequencies obtained from Hi-C correlate with spatial distances at and above the TAD level 115 . A linear relationship between Hi-C contacts and FISH distances was found by investigating the physical distances between all TADs along a chromosome 115 . An overall correlation between Hi-C interactions and the median spatial distance measured by high-throughput FISH has recently been shown for 90 pairs of loci. However, the ranges of physical distances between genomic regions containing Hi-C interactors (with high ligation frequency) and non-interactors (with low ligation frequency) overlap extensively, with about 20% of distances being closer between two non-interactors than two interactors 21 . Thus, Hi-C captures spatial proximity but Hi-C interactions are not easily translated into physical distances. Other comparisons between FISH and 3C-based methods have also found non-trivial relationships between physical distance distributions and population-average interaction frequencies 113 and show that contact frequency is distinct from average spatial distance, both in polymer simulations and in experimental data 116 . The use of FISH to validate Hi-C results has helped investigate false positives in Hi-C data, assuming FISH is correct, but is not a valid strategy for an unbiased search for contacts that are missed by Hi-C (that is, false negatives). Thus, any under-represented contacts in Hi-C data have so far not been systematically studied.
Orthogonal genome-wide ligation-free approaches, such as GAM and SPRITE, have identified new aspects of 3D genome folding that had not been detected by Hi-C but which are fully validated by FISH 10,11 . The first and relatively small GAM data set, combined with the mathematical model SLICE, identified specific long-range contacts across genomic distances that span tens of megabases, which involve active and enhancer-rich genomic regions (Fig. 5a,b). One promising outcome of the emergence of these orthogonal approaches is the development of analysis tools that use the information they generate about such long-range contacts to discover the same contacts in Hi-C data. In this regard, it is interesting to note that CTCF depletion in human cells results in the detection by Hi-C of long-range contacts between super-enhancers 117 , which raises the possibility that CTCF-mediated contacts may be preferentially detected by Hi-C in normal conditions, but once CTCF-dependent interactions are lost, other underlying folding patterns, including long-range contacts, become easier to detect.
The first SPRITE data set has also highlighted novel aspects of 3D folding that are not readily captured by Hi-C 11 . By discriminating contacts according to their multiplicity, SPRITE shows a contact decay with genomic distance that is very similar to Hi-C when considering only low-complexity SPRITE clusters (2-10 genomic regions per contact hub; Fig. 5c). By contrast, SPRITE shows a striking abundance of long-range contacts when higher-order contacts are also considered, which confirms early theoretical predictions that ligation-based approaches are biased towards the detection of simpler 3D chromatin contacts 84 . Although GAM and SPRITE are orthogonal methodologies, their frequencies of contacts relative to genomic distance are remarkably concordant 10,11 (Fig. 5c).
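How cluster size can shape a contact-versus-distance profile is illustrated by the toy Python sketch below. It enumerates all pairwise contacts within each cluster, optionally skips clusters above a size cutoff, and scales counts to the maximum value, loosely mirroring the scaling described for Fig. 5c. The function names `distance_profile` and `scale_to_max`, the 1-Mb distance bins and the invented clusters (three small local clusters and one 12-region hub) are assumptions made for illustration and do not reproduce the published analysis.

```python
from collections import Counter
from itertools import combinations

def distance_profile(clusters, bin_size, max_cluster_size=None):
    """Count pairwise contacts per genomic-distance bin.

    clusters: list of lists of positions (bp) on one chromosome.
    Clusters larger than max_cluster_size are skipped when it is set.
    """
    profile = Counter()
    for positions in clusters:
        if max_cluster_size is not None and len(positions) > max_cluster_size:
            continue
        for p, q in combinations(positions, 2):
            profile[abs(p - q) // bin_size] += 1
    return profile

def scale_to_max(profile):
    """Scale counts to the maximum observed value in the profile."""
    top = max(profile.values())
    return {distance: count / top for distance, count in profile.items()}

# Hypothetical clusters: three small local clusters and one large hub
# of 12 regions spaced 5 Mb apart.
small_clusters = [[0, 40_000], [100_000, 130_000], [200_000, 260_000]]
hub = [i * 5_000_000 for i in range(12)]
clusters = small_clusters + [hub]

low_complexity = distance_profile(clusters, 1_000_000, max_cluster_size=10)
all_clusters = distance_profile(clusters, 1_000_000)
scaled = scale_to_max(all_clusters)

print(dict(low_complexity))
print(max(all_clusters), "Mb is the longest distance bin with contacts")
```

In this invented data set, restricting the analysis to clusters of at most 10 regions leaves only contacts below 1 Mb, whereas including the hub adds contacts spanning up to 55 Mb.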

Limitations and applications of different methodologies.
Methods that use proximity ligation are limited by the low efficiency of ligation, and are also potentially affected by the local distance between, or the topology of, the two DNA ends within the cluster of contacting DNA fragments. SPRITE also depends on ligation of a small oligo to each DNA end in a contact cluster; however, it is no longer dependent on the physical distance between two DNA fragments in the cluster, which allows mapping of all contacts within one chromatin complex. In 3C-based methods and SPRITE, detection of contacts depends on the efficiency of the fragmentation step to expose the DNA end. In GAM, there is no DNA restriction digest or ligation, and the detection of DNA depends on its extractability and sequencing depth. 3C-based methods, GAM, SPRITE and FISH can be applied directly to cells, tissues or organisms, whereas insertions of DNA binding site arrays (such as the Lac operator system), CRISPR-based imaging and Dam-related methods require genetic engineering of cell lines or whole organisms, and will not be suitable for the analyses of most human biopsies.
Each of the assays discussed here has different limitations and applications, and thus contributes to our current understanding of 3D genome folding in different ways. 3C-based techniques have the advantage of providing enormous amounts of chromatin contact information in one comparably simple biochemical experiment, although they may require high-depth sequencing when aiming for high resolution. 3C-based methods, and in particular proximity ligation itself, also have important limitations that favour the detection of simpler contacts over higher-order chromatin contacts, which can lead to misunderstanding the importance and abundance of certain interactions. However, 3C-based techniques are well suited for studying local chromatin folding within the range of kilobases up to a few megabases.
Imaging and ligation-free methods have the ability to detect chromatin contacts at all scales of chromosome folding, including contacts between chromosomes. GAM and SPRITE can be readily used for sequence-unbiased genome-wide explorations, whereas detection of contacts with DNA-FISH remains limited to preselected loci and is most often used to validate findings from genome-wide techniques. Imaging fluorescently labelled chromatin loci in live cells with CRISPR-based techniques will improve our understanding of possible artefacts resulting from chromatin preparation or fixation. Other developments based on cryo-focused ion beam (cryo-FIB) milling of intact, frozen cells 118 or cryolysis 119 also hold the potential of devising fixation-free versions of GAM and SPRITE that sample fractionated frozen nuclei.

Insights into chromatin organization
Each of the techniques available for studying 3D chromatin folding has provided important structural and functional insight into the different hierarchical levels of chromatin organization.

Chromosome territories and interchromosomal contacts.
FISH imaging shows that specific chromosomes occupy discrete non-random nuclear spaces during interphase, termed chromosome territories 120 (Fig. 1). Chromosome territories show cell type-dependent preferences in terms of both their radial position within the nucleus and their position relative to other chromosomes 25,107,121 . Specific contacts can be detected at the interface between chromosome territories 20,122,123 ; overall, an estimated 20% of the volume of chromosome territories intermingles with other chromosome territories, often at their peripheries, both in human primary lymphocytes 24 and in Drosophila melanogaster cells 25,108 . The extent of intermingling between chromosome territories directly correlates with translocation probabilities upon ionizing radiation damage, highlighting that the physical proximity between chromosomes affects their stability in response to DNA damage 25,124,125 . The organization of chromosomes into discrete territories is also inferred from 3C-based and ligation-free approaches, as higher interaction frequencies are detected within chromosomes than between them (Fig. 1). 3C-based technologies have also detected contacts between chromosomes, and these have been successfully validated by imaging 52,74,78,110,122,126-128 .
Chromatin hubs and compartments. The organization of chromosomes into subchromosomal domains has been extensively studied. For example, in mammalian cells, chromatin domains were observed in relation to replication origins, which contain many replicons and maintain their domain co-association across subsequent cell cycles 129 . The compartmentalization of chromosomes into early and late replicating domains [130][131][132] was also shown to be linked to transcriptional activity, with sites of active transcription occurring predominantly in early replicating domains 133 . More recently, these observations have been largely confirmed by genome-wide assays to map replication and transcription, in which transcriptionally active and early replicating chromatin domains organize into separate subcompartments, distinct from late replicating domains [134][135][136] . Early analyses of nuclear organization by electron and confocal microscopy had shown that chromatin occurs in highly condensed (heterochromatic) and less condensed (euchromatic) states 137 , and revealed that transcription occurs in euchromatic areas of the nucleus 138 . With the emergence of whole-genome 3C-based methodologies, such as Hi-C, the mapping of active and repressed chromatin states has become possible at the genome-wide scale, providing powerful insights into how gene expression relates to chromatin compaction. Application of principal component analysis to Hi-C data revealed a strong segregation of ligation events into two distinct compartments (A and B compartments) according to the activity state of the genomic regions 44 . These compartments can also be seen in contact maps generated by ligation-free approaches 10,11 (Fig. 1). 
Comparisons with linear maps of protein occupancy on chromatin helped reveal a strong relationship between the A compartment and transcriptionally active, open chromatin, as defined by DNase hypersensitivity, and the B compartment with closed chromatin, defined by repressive epigenetic marks of heterochromatin 44 . Increased depth of Hi-C data sets has since allowed smaller subcompartments to be detected, which capture fine differences in replication timing as well as preferred associations with the nucleolus or the nuclear lamina 64 .
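The idea of separating bins into two compartments from their contact profiles can be sketched in plain Python. The example correlates the contact profiles of all bin pairs and uses the leading eigenvector of that correlation matrix, obtained by power iteration, as a stand-in for the first principal component. It is a simplified illustration under stated assumptions, not the published analysis: the toy matrix is treated as already corrected for genomic distance, the function names and the six-bin layout "AABBAB" are hypothetical, and the sign of the eigenvector is arbitrary (in practice it would be oriented using an independent feature).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(contacts):
    """Correlate the contact profile (row) of every bin with every other."""
    n = len(contacts)
    return [[pearson(contacts[i], contacts[j]) for j in range(n)]
            for i in range(n)]

def leading_eigenvector(matrix, iterations=50):
    """Leading eigenvector by power iteration (assumes a dominant eigenvalue)."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # start vector fixes the sign at bin 0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy contact matrix: bins in the same (hypothetical) compartment contact
# each other more often (2.0) than bins in different compartments (0.5).
layout = "AABBAB"
contacts = [[2.0 if a == b else 0.5 for b in layout] for a in layout]

eigenvector = leading_eigenvector(correlation_matrix(contacts))
calls = "".join("A" if value > 0 else "B" for value in eigenvector)
print(calls)
```

In this toy matrix the sign of the eigenvector recovers the layout that was used to build it. Real Hi-C maps additionally require normalization for the decay of contacts with genomic distance before such a decomposition is meaningful.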
Nuclear compartments or domains. Nuclear compartments are membrane-free organelles enriched for specific nuclear proteins and RNAs, which often have preferred associations with specific genomic regions and thereby influence the large-scale organization of chromosomes during interphase. They include the nucleolus, nuclear lamina, splicing speckles, paraspeckles, Cajal bodies, promyelocytic leukaemia bodies, Polycomb bodies, replication factories and transcription factories, which have all been described initially using microscopy 139,140 (Fig. 1). For example, active ribosomal gene clusters are localized in the nucleolus, where the large ribosomal RNAs are transcribed, processed and assembled into pre-ribosomes 141 . Splicing speckles occupy internal nuclear positions, separate from the nuclear lamina and nucleoli, and bring together gene-dense regions 91,142,143 . Association between specific genes at splicing speckles has been shown using imaging techniques, and has been confirmed at the genome-wide level with SPRITE, which revealed that regions from different chromosomes come together at the same speckles 11 . Genome-wide mapping of gene association with speckles has also recently been achieved by TSA-seq 92 . Fluorescence microscopy and electron microscopy showed that transcription itself occurs at discrete sites in the nucleus, termed transcription factories, which may organize active transcription units [144][145][146] , with only a small proportion of transcriptional activity (~5-10%) being found immediately adjacent to the most prominent splicing speckles 146 . Interestingly, the fraction of the genome that associates closely with splicing speckles has been shown by TSA-seq to contain highly transcribed genes and super-enhancers 92 , in keeping with previous imaging data 142 .
Co-expressed genes can share the same transcription factory, which may be compatible with mechanisms of coordinated gene regulation via chromatin folding 26,78,147,148 , but it remains unclear whether transcription factories are strictly specialized. Recent findings show that several factors involved in the transcription process, such as Pol II 149 or transcriptional co-activators BRD4 and MED1 (ref. 150 ), can form condensates by liquid-liquid phase separation, a process that may concentrate transcription factors and generate transcription factories. Moreover, the formation of nuclear condensates has been suggested as a general principle of nuclear body formation 151 . Clustering of distant genomic regions is not only mediated by transcription, but also occurs in the context of gene repression. Chromatin contacts at Polycomb bodies, which are repressive nuclear compartments, are a prominent example of gene clustering. In D. melanogaster, Polycomb-repressed Hox genes come together over a genomic distance of 10 Mb when they interact with a Polycomb body 152 . Other studies have reported long-range intrachromosomal and interchromosomal contacts between Polycomb-bound genes in human teratocarcinoma cells 153 and in mouse ESCs 52 .
Understanding how the preferential associations of genomic regions with specific nuclear domains relate to 3C-derived chromatin contacts remains a major challenge. Comparisons of genome-wide maps of lamina-associated domains 90 and Hi-C contacts show a strong coincidence between the transcriptionally inactive B compartment and the nuclear lamina 64,94,154 or late replication domains 136 . Repressive histone marks that define the heterochromatic B compartment are also strongly enriched at genomic regions that associate with the nucleolus 155 , suggesting that the compacted B compartment is both situated at the nuclear periphery and clustered around the more central nucleoli, separated by the active, open A compartment. However, the bimodal separation of the A and B compartments derived from 3C-technologies should not be naively interpreted as a strict division into active and silent chromatin. Genes can be activated in all areas of the nucleus, including at the nuclear lamina 156 or at centromeric regions 157 , and gene positioning at the periphery does not always lead to gene inactivation 158,159 . Heterochromatin domains also contain active sites of transcription 160 . Consequently, a strict separation of the A and B compartments, as defined by 3C-approaches, seems unlikely, especially considering that contacts between compartments can be found in Hi-C maps 94 and that they are found even more robustly using orthogonal methods, such as GAM, that do not rely on weak fixations 10 . These observations suggest that long-range gene-regulation mechanisms are complex, and not only depend on pairwise contacts between genomic regions but may also be influenced by the local nuclear environment where each region is located 95 .
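To make concrete how a two-state A/B call arises from a contact map, the following is a minimal sketch of one widely used approach: take the leading eigenvector of the correlation matrix of a distance-normalized contact matrix and orient its sign with an activity-related track. It is not necessarily the procedure used in the studies cited above. The toy matrix, the gene-density values and all function names are hypothetical, and the power iteration is a stand-in for a proper eigendecomposition.

```python
import math

def row_correlations(matrix):
    """Pearson correlation between every pair of rows of a square matrix."""
    n = len(matrix)
    centered = []
    for row in matrix:
        mu = sum(row) / n
        centered.append([value - mu for value in row])
    norms = [math.sqrt(sum(v * v for v in row)) for row in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]

def leading_eigenvector(matrix, iterations=100):
    """Power iteration; adequate for a small symmetric correlation matrix."""
    n = len(matrix)
    vector = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

def call_compartments(obs_over_exp, activity_track):
    """Label each bin A or B from the sign of the leading eigenvector.

    The eigenvector sign is arbitrary, so it is flipped where needed to make
    positive values follow the activity track (assumed here: gene density).
    """
    eigenvector = leading_eigenvector(row_correlations(obs_over_exp))
    mean_activity = sum(activity_track) / len(activity_track)
    agreement = sum(e * (a - mean_activity) for e, a in zip(eigenvector, activity_track))
    if agreement < 0:
        eigenvector = [-e for e in eigenvector]
    return "".join("A" if e > 0 else "B" for e in eigenvector)

# Hypothetical six-bin example: contacts are enriched between bins of the same type.
true_pattern = "AABBAB"
toy_matrix = [[2.0 if a == b else 0.5 for b in true_pattern] for a in true_pattern]
toy_gene_density = [9, 8, 1, 2, 7, 1]  # invented values, higher in the A-type bins
labels = call_compartments(toy_matrix, toy_gene_density)
print(labels)
```

The final step reduces a continuous eigenvector to a binary label, so any such call discards graded or mixed behaviour. This is one reason the resulting compartments are best read as contact preferences rather than as strictly active or silent chromatin.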
The ongoing challenge of disentangling the direct functional relationships between the positions of genomic regions in the nucleus, their local and long-range contacts, and the state of gene activity is being addressed by analysing chromatin contacts at the single-cell level and with allele specificity, for example, using DNA-FISH 21 or single-cell Hi-C 73,74 .
TADs and loop domains. At smaller scales, chromosomes fold into self-associating chromatin domains, termed TADs 24,94,161 (Fig. 1). Chromatin domains had been previously identified by microscopy but their detailed genomic composition was unclear. Since the discovery of TADs, the segmentation of the genome into megabase-sized domains has been extensively studied in several organisms and with different methodologies, leading to major breakthroughs in the discovery of mechanisms of disease caused by congenital genomic rearrangements 3,47,162,163 . TADs often enclose clusters of co-regulated enhancers and promoters 164,165 . Their size has been re-examined with the increasing resolution afforded by improved 3C-based assays, and found to vary from 40 kb to 3 Mb in the human genome 64 , leading to the proposal of smaller loop domains as a substructure of TADs. Loop domains had been detected by microscopy before the emergence of 3C-technologies as DNA loops between transcriptionally active regions 166 . Loop domains derived from 3C-based technologies often coincide with pairs of convergent CTCF binding sites, indicating that CTCF binding can contribute to the partition of specific regions of the genome into self-associating domains 64,[167][168][169] . Higher-order contacts between TADs have also been investigated, leading to the identification of metaTADs, which bring together distant TADs in cell type-specific patterns that relate to gene activity 154,170 .
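As an illustration of how a domain border can be located in a contact matrix, the sketch below computes a simple insulation-style score: the mean contact frequency between the bins immediately upstream and downstream of each candidate border. Minima of this score mark positions that few contacts cross. This is one common heuristic, shown under simplifying assumptions, and not necessarily the domain caller used in the studies cited above. The toy matrix, window size and function name are hypothetical.

```python
def insulation_scores(contacts, window):
    """Mean contact frequency spanning each candidate border.

    For a border placed just before bin b, the score averages contacts between
    the `window` bins upstream of b and the `window` bins from b onwards.
    """
    n = len(contacts)
    scores = {}
    for b in range(window, n - window + 1):
        upstream = range(b - window, b)
        downstream = range(b, b + window)
        total = sum(contacts[i][j] for i in upstream for j in downstream)
        scores[b] = total / (window * window)
    return scores

# Hypothetical eight-bin map with two self-associating domains (bins 0-3 and 4-7).
domain_of_bin = [0, 0, 0, 0, 1, 1, 1, 1]
toy_contacts = [[10 if a == b else 1 for b in domain_of_bin] for a in domain_of_bin]

scores = insulation_scores(toy_contacts, window=2)
boundary = min(scores, key=scores.get)  # position with the fewest spanning contacts
print(scores)
print(boundary)
```

In this toy case the score drops to its minimum at the border between the two domains. In real data the choice of bin size and window determines the scale of the domains detected, which is consistent with reported TAD sizes shifting as assay resolution improved.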
It has been debated whether TADs represent domains that exist predominantly across the cell population or represent an average of individual preferred contacts. Although interactions observed in single cells by single-cell Hi-C and imaging do not often identify whole TADs, the contacts detected frequently occur within the TAD coordinates defined by population Hi-C 32,34,72 . However, this preference might not be as strong as anticipated. Imaging of chromatin contacts in mouse ESCs and oocytes showed that in 40% of cases 3D physical distances between regions that flank TAD borders are shorter than distances between regions within TADs 73 , leading to highly variable contact clusters in individual cells that do not coincide with the positions of TADs in the cell population. This observation agrees with the detection of chromatin contacts between regions separated by TAD borders in single cells, often found at similar frequencies to regions within TADs 21 . However, it is particularly noteworthy that combining the single-cell Hi-C data results in the same TAD coordinates observed in bulk population Hi-C, which supports the idea that TADs represent contact preferences of a cell population, rather than compact domains of chromatin in single cells 73,171 .
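The difference between a population-level preference and single-cell behaviour can be shown with a short calculation. The sketch below uses invented distances for ten cells, chosen so that the fraction mirrors the 40% figure quoted above. It compares, cell by cell, the 3D distance between two loci inside one TAD with the distance between two loci that flank a TAD border. The values and names are hypothetical and are not measured data.

```python
from statistics import mean

# Hypothetical 3D distances (nm) in ten single cells; invented for illustration.
within_tad = [200, 250, 300, 220, 400, 260, 380, 240, 420, 230]
across_border = [350, 400, 280, 390, 300, 410, 290, 380, 310, 360]

def fraction_border_closer(within, across):
    """Fraction of cells in which the border-flanking pair is the closer pair."""
    closer = sum(1 for w, a in zip(within, across) if a < w)
    return closer / len(within)

fraction = fraction_border_closer(within_tad, across_border)
print(fraction)
print(mean(within_tad), mean(across_border))
```

With these numbers the within-TAD pair is closer on average, yet in four of the ten cells the border-flanking pair is the closer one. An averaged preference is therefore compatible with substantial cell-to-cell variability.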

Chromatin contacts between cis-regulatory elements. Physical contacts between enhancers and promoters are essential for the transcription of genes 85 and can occur over distances ranging from less than 1 kb up to several megabases [172][173][174][175][176] (Fig. 1). Genome-wide maps of candidate promoter-enhancer contacts can be created using high-resolution 3C-based methodologies that enrich for contacts mediated by Pol II or promoter histone marks, or that preferentially capture promoter-based contacts 52,80,81 . Direct pairwise contacts between gene promoters and enhancers have become the most prominent concept of enhancer function, possibly as a result of the increased power of 3C-based technologies to detect local pairwise contacts rather than higher-order conformations. However, other mechanisms for regulating enhancer function are also emerging, which can involve formation of chromatin hubs, tethering of genes to active chromatin or nuclear environments 156,158,177,178 and phase separation 179,180 . An interesting study in budding yeast suggests homologue pairing as a mechanism for gene activation 181 . In the diploid yeast genome, upon glucose deprivation of the cell, both copies of the genomic locus containing the gene TDA1 are relocalized to the nuclear periphery, where the homologues associate with each other and TDA1 expression is activated. A more classical concept of gene regulation can be observed at developmental loci, where cis-regulatory contacts between enhancers and promoters are thought to occur most commonly within TADs 48,162,182 . Although regulatory landscapes within TADs seem to be a common mechanism, genes themselves also contact each other across TAD boundaries over large genomic distances 10,152,153,183 .
Ligation-free methods, such as FISH, GAM and SPRITE, all detect long-range contacts across TAD borders 10,11,154 , and detailed analyses of Hi-C ligation frequencies also identify ligation events across TADs, over tens of megabases, that are statistically different from random contacts 154 . The functional relevance of these contacts is a compelling question that is beginning to be addressed by developments that allow ectopic chromatin contacts to be engineered in the cell 184,185 . The spatial and functional relationship between gene promoters that contact each other also remains poorly understood. Deletions of several gene promoters in the mouse ESC genome altered the expression of nearby genes 186 . This observation suggests that genes themselves may act as enhancers for other genes, possibly by recruiting cis-regulatory signals, and supports the concept that clustering of genes in transcription factories has regulatory functions.

Conclusions
The development of genome-wide approaches for studying 3D genome folding has revolutionized our ability to understand the regulatory content of the linear genomic sequence. Alongside 3C-based methods, the recent development of orthogonal technologies to map chromatin contacts brings us closer to uncovering 3D genome folding architectures with unprecedented detail, at all genomic scales and with single-cell resolution. The ongoing revolution in live-cell imaging, including improvements to the number of genomic loci that can be tagged simultaneously, will provide a deeper mechanistic understanding of how 3D folding structures are formed and disassembled, and how they contribute to genome stability and gene expression, and of homeostatic changes in cell states in response to stimuli. Ultimately, the ability to detect changes in chromosome topology will open new avenues for disease diagnostics, disease target discovery and many other applications.
Published online 17 December 2019