The data presented here (https://doi.org/10.18452/23552) were collected for the Master’s Dissertation “Buchillustrationen im digitalen Zeitalter: Konzept für ein Datenmodell” (Book Illustration in the Digital Age: A Concept for a Data Model) at the Humboldt University (Berlin) in 2020. In the following year, an extended version of this dissertation was published in the series “Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft” (http://dx.doi.org/10.18452/23601).

The data analysed here are records of page views in the Warburg Institute Iconographic Database, as documented by Google Analytics. In the dissertation, they were used to trace aspects of the actual use of a scholarly image database of works of art.

The Warburg Institute Iconographic Database was created in 2010; a total refit of the software is due in late 2021. Afterwards, the URLs discussed here will in all likelihood no longer be functioning. In 2019, the database comprised the following elements, each of them with its own URL:

- 2 start pages (start page, Advanced Search screen)
- c. 102,000 image records
- c. 60,000 ‘folders’ (Folders unite thematically related image records into groups; they are structured hierarchically in up to eight levels. A folder can contain either subfolders or images. Each folder is part of one and only one overarching folder, whilst an image record can be assigned to several folders.)
- Pages of search results have their own URLs that contain the search terms.

In February 2020, Rembrandt Duits, one of the curators of the Warburg Institute Photographic Collection and the creator of the database software, gave the author a spreadsheet containing over 100,000 URLs – all addresses of pages of the database for which Google Analytics recorded at least one visit in 2019.
They form the basis of the analyses attempted here, although it appears that the visits to some URLs were not recorded completely or not at all (for example, only 140 visits to the Advanced Search screen are recorded, but the URLs of search results show that this page must have been visited more than 1,000 times). Because of travel restrictions and the closure of the Warburg Institute for several months during 2020 it was impossible to go back to the Institute and do more work with the Google Analytics interface.

For each visited URL, Google Analytics gives the following information:

- URL (“https://iconographic.warburg.sas.ac.uk” has to be added at the beginning.)
- Total number of visits (labelled “Treffer”)
- Number of sessions with at least one visit to this URL (“Sitzungen”)
- Average time of visits (“Zeit”)
- Number of sessions in which this URL was the first visited page of the database (“Eingangsseite”)
- Bounce rate: percentage of sessions in which this URL was the first visited page of the database, but no other page was visited afterwards (“Bounce Rate”)
- Percentage of sessions in which this URL was the last visited page of the database (“Ausgangsrate”)

In most spreadsheets, this information appears after the URL in this order; any variants are described below.

All spreadsheets are available here both as Excel (Office 2010) files and as csv files.

##### Spreadsheet “0 - unbereinigt” [raw data]

Complete list of the URLs and accompanying information as recorded by Google Analytics, sorted alphabetically according to the URL (111,058 URLs).

##### The spreadsheets with names starting with “1 - Bereinigung” [data cleansing] contain records that were excluded, corrected, or merged with other records in a process of deduplication.

Altogether, 1,747 URLs, approximately 1.6 % of the total number, were excluded, and over 4,100 URLs, approximately 3.7 % of the total, merged with others.
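The column order listed above for each URL can be read from the csv files roughly as follows. This is only a minimal sketch: the English field names, the comma delimiter, and the absence of a header row are assumptions made here, and the sample row is invented for illustration and not taken from the data set.

```python
import csv
import io

BASE_URL = "https://iconographic.warburg.sas.ac.uk"

# Column order as described above; the English keys are labels chosen here.
FIELDS = ["url", "visits", "sessions", "avg_time",
          "entrances", "bounce_rate", "exit_rate"]


def read_records(handle, delimiter=","):
    """Yield one dict per row, with the base URL prepended to the path."""
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < len(FIELDS):
            continue  # skip blank or incomplete lines
        record = dict(zip(FIELDS, row))
        record["url"] = BASE_URL + record["url"]
        yield record


# Invented sample row (hypothetical path and values).
sample = io.StringIO("/example/path?x=1,12,10,00:00:30,4,50.00%,20.00%\n")
records = list(read_records(sample))
```

All values are kept as strings here; converting counts and percentages to numbers depends on how the individual csv file formats them.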
### Spreadsheet “1 - Bereinigung - 1 unsinning” [nonsensical]

This spreadsheet contains addresses that do not follow the correct syntax but have some characters inserted at different places; hence they do not lead to pages in the database. They were probably the result of attempts at hacking the database (471 URLs, excluded).

### Spreadsheet “1 - Bereinigung - 2 Translate”

This spreadsheet contains addresses that were apparently created by Google Translate. Here, these very long URLs were divided into two sections; the second (Column B) starts with the URL of the corresponding page of the database. These records could have been merged with others through deduplication. However, since most of them document one-off visits to individual folders, this did not appear worthwhile (197 URLs, excluded).

### Spreadsheet “1 - Bereinigung - 3 Erweiterte Suche” [Advanced Search]

This spreadsheet contains URLs created by problems in using the Advanced Search function. Here, the long URL was divided in order to show the search terms (for details of the Advanced Search function see the introduction to the spreadsheets beginning with “5 - Erweiterte Suche” below). Column B contains the number of the results page, Columns C and E the fields that were searched, and Columns D and F the search terms.

In the case of 56 URLs, the default “(any)” in the free field was not overwritten with a search string but remained at the beginning of the string. In all of these cases, there is also a record for a URL without the “(any)”. Hence, it was assumed that the user had noticed the mistake and executed a second search, and therefore these records were excluded. In 40 more cases (not listed here), there was only a search with the “(any)” at the start; these were counted like normal searches.

In eight URLs, the name of an artist appears instead of the ID of the corresponding authority record. They could only have been created by making a search and then manually changing the URL of the results page (excluded).
### Spreadsheet “1 - Bereinigung - 4 Bild für Bild” [one by one]

The database allows browsing search results one by one. When this option was used, two URLs were created for every image. One was counted as a normal visit to the image record, the second was excluded as a duplicate (1,015 URLs).

### Spreadsheet “1 - Bereinigung - 5 Facebook”

This spreadsheet contains URLs ending with what appears to be a tracker code from Facebook (434 URLs) or from another programme (6 URLs). These URLs were merged with URLs without these appendices. A list at the end of the spreadsheet indicates how many URLs appear with how many different tracker codes.

### Spreadsheet “1 - Bereinigung - 6 Ampersand”

This spreadsheet contains URLs that have the string “&amp;” instead of the character “&”. They lead to an incorrect place in the database (all from “&” onward is ignored) and were probably created automatically through a misinterpretation of links. They were joined with the correct URLs (also indicated in this spreadsheet) through deduplication (15 URLs).

### Spreadsheet “1 - Bereinigung - 7 Einfache Suche mit Duplikaten” [Simple Search, with duplicates]
### Spreadsheet “1 - Bereinigung - 8 Einfache Suche Duplikate markiert” [Simple Search, duplicates marked]
### Spreadsheet “1 - Bereinigung - 9 Einfache Suche dedupliziert” [Simple Search, deduplicated]

These three spreadsheets show the deduplication of URLs of results pages in Simple Search. It was necessary because, whilst the Simple Search process ignores capitalisation and diacritical marks, those nevertheless appear in the URL of the results page. The first spreadsheet contains the URLs of all Simple Search results pages, the second marks those that stand for the same search terms, and in the third those are merged, with the statistical information from Google Analytics adapted. Altogether, 3,677 URLs were joined to others.
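The deduplication of Simple Search queries described above can be sketched as follows. This is not the procedure actually used for the spreadsheets (which was carried out in the spreadsheets themselves), only a minimal illustration of the idea. The function names are hypothetical, collapsing leading and repeated blanks is an assumption, and only the counts of visits and sessions are summed; averages and rates would need to be recalculated separately.

```python
import unicodedata
from collections import defaultdict


def search_key(query: str) -> str:
    """Fold case and strip diacritical marks so equivalent queries compare equal."""
    decomposed = unicodedata.normalize("NFKD", query)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Assumption: leading, trailing and repeated blanks are not significant.
    return " ".join(without_marks.casefold().split())


def merge_queries(rows):
    """Sum visits and sessions of all rows whose queries share the same key."""
    merged = defaultdict(lambda: {"visits": 0, "sessions": 0})
    for query, visits, sessions in rows:
        key = search_key(query)
        merged[key]["visits"] += visits
        merged[key]["sessions"] += sessions
    return dict(merged)
```

Note that adding up sessions may count a session twice if both spellings of a query were used in it; Google Analytics data exported per URL do not allow this to be checked.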
At a first attempt, several hundred URLs in which the search started with a blank had by mistake not been included; furthermore, some errors had occurred in the recalculation of the statistical information. Unfortunately, this result had already been used for the coding of Simple Search requests (spreadsheets with names starting with “4 - Einfache Suche”). When doing this process again, by mistake a list of URLs was used in which the addresses with Facebook tracker codes had not been deduplicated. This was then done ‘again’; hence the sum of the records removed through data cleansing and deduplication is slightly larger than the difference between the numbers of records of the raw and the cleansed data.

##### Spreadsheet “2 - Bereinigt” [cleansed]

This spreadsheet contains the cleansed data (105,261 URLs). Here, the URLs of the Advanced Search were shortened: the unchanging parts were replaced with “Erweiterte Suche: ”. Column H gives the absolute number of bounces (that is, views of a page as the only page of the database in a session, calculated from the bounce rate and the number of visits to the page as first page of the session).

##### The spreadsheets whose names begin with “3 - Eingangsseiten” deal with the pages that were the first pages visited during a session.

### Spreadsheet “3 - Eingangsseiten - 1 Gesamt” [total]

This spreadsheet contains all pages for which use as start page in the database was recorded, ordered by the frequency of this use (7,458 URLs).
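The absolute number of bounces given in Column H of “2 - Bereinigt” follows from the bounce rate and the number of sessions that started on the page. A minimal sketch of this calculation, assuming the bounce rate is given as a percentage and the result is rounded to the nearest whole number (the function name is chosen here for illustration):

```python
def absolute_bounces(entrances: int, bounce_rate_percent: float) -> int:
    """Number of sessions that started on a page and visited no other page."""
    return round(entrances * bounce_rate_percent / 100)
```

For example, a page that was the first page of 40 sessions with a bounce rate of 25 % would have 10 bounces. Because Google Analytics reports the rate rounded to two decimal places, the product is rounded back to an integer.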
### The following spreadsheets contain excerpts of “3 - Eingangsseiten - 1 Gesamt”, based on the number of uses of the page as start page:

- Spreadsheet “3 - Eingangsseiten - 2 1x Eingangsseite” (4,934 URLs)
- Spreadsheet “3 - Eingangsseiten - 3 2-9x Eingangsseite” (2,144 URLs)
- Spreadsheet “3 - Eingangsseiten - 4 10-99x Eingangsseite” (316 URLs)
- Spreadsheet “3 - Eingangsseiten - 5 100x oder mehr Eingangsseite” (64 URLs)
- Spreadsheet “3 - Eingangsseiten - 6 10-12x Eingangsseite” (86 URLs)

In the last two of these spreadsheets, Column I briefly identifies the page specified by the URL. In the last spreadsheet, Column J furthermore indicates, as far as it made sense, whether the English Wikipedia contained a link to this page (in the penultimate spreadsheet that was virtually always the case and hence is not indicated).

##### The spreadsheets whose names start with “4 - Einfache Suche” analyse the URLs of the results pages of Simple Search.

### Spreadsheet “4 - Einfache Suche - 1 Gesamt” [total]

This spreadsheet contains the first results pages for Simple Search, altogether 13,037 URLs. (The search results are displayed in units of 60 records, hence there can be multiple results pages for a single search. Here all but the first results pages are omitted.) The long URLs of the results pages were divided into several columns, and some unchanging elements have been deleted. The columns contain:

- A: Number of the results page (always “1”, see above)
- B: First word of query
- C: Second word of query
- D: Third word of query
- E: Fourth word of query
- F: Fifth word of query (all following words are ignored by the database)
- G: String containing the first five words of the query, separated by blanks.

At the right of the usual information from Google Analytics (Columns I–M), there are two more columns: Column N contains a numbering of all queries, Column O its remainder modulo 10.
Because coding the search requests was rather time-consuming, the following spreadsheets only contain the rows with a remainder “0”. This sample is less representative than it could have been since the URLs were in alphabetical and not in random order in this spreadsheet.

### Spreadsheet “4 - Einfache Suche - 2 Stichprobe codiert” [coded sample]

This spreadsheet contains the sample created above. Instead of the URLs of the results pages, here only the string containing the first five words of the query (see above, here Column A) and a code (Column B) are given. These codes were manually assigned to the queries and indicate the type of information included in them. If the words of the query refer to different types of information, several codes were assigned:

- d: date
- h: manuscript (“Handschrift” in German: name of holding library, shelf mark, common name of manuscript)
- k: artist (“Künstler” in German)
- m: museum / collection
- n: medium
- o: place (“Ort” in German)
- p: depicted person or object
- s: depicted scene (several persons, or action)
- t: text
- w: common name of a work of art (“Werk” in German)

Queries with the following codes were not analysed any further (altogether 93):

- ?: meaning of search query unclear
- x: searches for collections of the Warburg Institute that are not described in this database, or for photographers or earlier owners of photographs (the latter queries were too rare to be included in an analysis).

At the end of each list of queries with the same code or combination of codes, two numbers are given: the number of queries falling into this category (Column B, beneath the numbers of page visits), and the sum of the numbers of sessions in which queries of this kind had been made (Column C, beneath the numbers of sessions).
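The way the sample was drawn from “4 - Einfache Suche - 1 Gesamt” (numbering in Column N, remainder modulo 10 in Column O, rows with remainder “0” kept) corresponds to a systematic sample. A minimal sketch, assuming that the numbering starts at 1 and follows the order of the rows; the function name is hypothetical:

```python
def systematic_sample(queries, step=10):
    """Number the queries from 1 and keep those whose number is divisible by step."""
    numbered = [(n, n % step, query) for n, query in enumerate(queries, start=1)]
    return [query for _n, remainder, query in numbered if remainder == 0]
```

Since the rows are in alphabetical order, such a sample picks every tenth query of the alphabetical list; shuffling the rows before numbering them would have produced a random sample instead.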
### Spreadsheet “4 - Einfache Suche - 3 Ergebnisse” [results]

This spreadsheet summarises the number of queries (Column B) and the sum of the numbers of sessions in which each query of this kind had been made (Column C) for every code or combination of codes. First, all combinations are listed (organised according to the type of search term in Blocks a and b, according to the number of searches in Blocks c and d), then for every code all combinations in which it occurs (Block e). At the end (Block f), it is indicated for every code in what percentage of all cases in which it was searched it was searched alone (Column D), and in what percentage of all searches, for any code, it was searched (Column E).

### Spreadsheet “4 - Einfache Suche - 4 Erfolg” [success]

This spreadsheet only deals with a part of the sample analysed above, the queries coded with “s” (“scene”, 292 URLs). All of these searches were executed again in summer 2020, and the quality of the results was evaluated. In many cases it was satisfactory (at least one retrieved record fitted the meaning of the query, which is naturally a subjective observation); in other cases nothing was found because the database did not contain any relevant records. However, it was not rare that no results at all, or only results that appeared to be misleading, were found because the search term contained typos, was in a language other than English, or used a terminology different from the one in the database. These cases are identified here (Column B); as far as possible, an alternative terminology that would have led to the correct results is given (Column C).

##### The spreadsheets whose names start with “5 - Erweiterte Suche” analyse the results pages of the Advanced Search function.

This function combines a free search with the search for specific fields (e.g., artist, or location).
Entries to those fields in the database are references to separate authority files; search terms can be chosen in the Advanced Search screen through dropdown menus. The URL of the results page contains an abbreviation of each queried field and the ID of the authority record of the selected search term. Analogous to the URLs of results pages of Simple Search (see above), here only the abbreviations of the fields and the search terms are listed, whilst the unchanging parts of the URL have been deleted.

Searches for only one term from a dropdown field can be launched via the Advanced Search screen but also from every image record: a click on a term linked to an authority record (e.g., the name of an artist) automatically starts an Advanced Search for all records that are linked to the same authority record.

The following abbreviations for fields appear in the URLs of the search results:

- var: free text (combined with a text string, not the ID of an authority record)
- cen_a: century (from …)
- cen_z: century (to …)
- art_1: artist
- plc: place (first part of the location field, denoting a town only)
- loc: location (complete location field, denoting a town and then a building or a collection)
- auc: auction date
- msn: manuscript shelf mark
- aut: author (first part of the book field)
- bk: book (complete book field, comprising author, title, place, and year)
- spc_coll: Special Collection (This field was created to label collections within the Photographic Collection, so larger bequests, or the collections of images for Adam Bartsch’s catalogue of prints. In practice, it was used for a number of purposes, including naming photographers.)
- spc_coll_no: Special Collection – Number (like “var”, this is a field for a text string, not the ID of an authority record). This field was used in the database to indicate, for instance, some catalogue numbers of collections within the Photographic Collection.
It was hardly ever used in searches. The few queries were repeated in May 2020, and since none of them produced a helpful search result, this field was ignored in the analysis.

### Spreadsheet “5 - Erweiterte Suche - 1 Gesamt” [total]

This spreadsheet contains all results pages of the Advanced Search function, both those of the searches that could only be initiated via the Advanced Search screen and those that could likewise be initiated by clicking on an entry in an individual record (together 5,278 URLs). Column A gives the number of the results page (here too, results are displayed in batches of 60), Columns B–Y the individual fields (the left of each two columns contains the abbreviation of a field, the right the corresponding search term). Columns Z–AC, AE and AG give the usual information from Google Analytics (Columns AD and AF add the absolute numbers of bounces and of last visited pages in the database, calculated from the percentages given by Google Analytics and here included in AE and AG).

### Spreadsheet “5 - Erweiterte Suche - 2 nur Menu” [search screen only]

As mentioned above, an Advanced Search can be launched not only through the Advanced Search screen, but also by clicking on an entry in an individual record. This spreadsheet contains the URLs of Advanced Search results pages corresponding to searches that could only have been started through the Advanced Search screen; hence, they search in two or more drop-down fields or include the specific free text field of the Advanced Search (1,097 URLs).

### Spreadsheet “5 - Erweiterte Suche - 3 Geordnet” [sorted]

This spreadsheet contains the first results pages of Advanced Search (the later pages are omitted). Unlike in the spreadsheet “5 - Erweiterte Suche - 1 Gesamt”, the rows are ordered according to the combinations of fields in the search. To facilitate the sorting process, the abbreviations of field names only appear in the first row; elsewhere they are all replaced by “1”.
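Ordering the searches by the combination of fields they use, as in “5 - Erweiterte Suche - 3 Geordnet”, amounts to reducing every search to the set of its field abbreviations and counting how often each set occurs. A minimal sketch under the assumption that each search is available as a list of (field, term) pairs; the function names and the IDs in the example are invented for illustration:

```python
from collections import Counter


def field_combination(pairs):
    """Return the sorted tuple of field abbreviations used in one search."""
    return tuple(sorted({field for field, _term in pairs}))


def count_combinations(searches):
    """Count how many searches use each combination of fields."""
    return Counter(field_combination(pairs) for pairs in searches)


# Hypothetical searches; the authority record IDs are not real.
example = [
    [("art_1", "101")],
    [("art_1", "202")],
    [("cen_a", "15"), ("cen_z", "16")],
    [("cen_z", "17"), ("cen_a", "16")],
]
counts = count_combinations(example)
```

Here `counts[("art_1",)]` and `counts[("cen_a", "cen_z")]` are both 2, because the order in which the fields appear within a search does not matter.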
### Spreadsheet “5 - Erweiterte Suche - 4 Ergebnisse” [results]

This spreadsheet summarises the results of the preceding one. Every row of Block a deals with a single field of the search. Columns B and C give the number of searches including this field and the number of searches for this field only, respectively. The following columns show the percentage of searches for this field only: in relation to all searches including this field (Column D), in relation to all searches for one field only (Column E), and in relation to all searches (Column F). Row 17 adds the frequently used combination of fields Cen_A and Cen_Z (thus centuries of start date and end date). Block b indicates the frequencies of searches for all single fields and combinations of fields.

##### The spreadsheets whose names begin with “6 - Mappen” analyse the visits to folders, thus to the themed groups to which the individual image records are assigned.

The URL of a folder contains the IDs of all folders this folder is part of but does not indicate whether it contains subfolders or image records (it has to be one or the other); the URL of an individual image record does not indicate to which folders it belongs. Hence, it was necessary to count manually how many subfolders or image records a folder contains. This was done in summer 2020 for two samples, images of the Zodiacal signs and illustrations of the Aeneid. There were only very small differences from the state of the database during 2019, the year from which the Google Analytics records come.

### Spreadsheet “6 - Mappen - 1 alle Adressen Zodiac” [total]
### Spreadsheet “6 - Mappen - 2 alle Adressen Aeneid” [total]

These spreadsheets contain all records from the two samples. This means that all URLs contained in the spreadsheet “Zodiac” are from folders that belong on level 4 to the folder “Zodiac” (in the URL “cat_4=41”).
Columns B–E give short titles for the connected folders on the subordinate levels 5–8; thus the rightmost filled-in cell gives the title of the actual folder. A “*” before the name of the folder indicates that it is the first in the list of subfolders of a given folder, a “#” that it is the last one. If the folder contains images, not subfolders, Column F gives their number. Columns G–L contain the usual statistical information from Google Analytics.

Correspondingly, the spreadsheet “Aeneid” contains the URLs of all folders that belong on level 4 to the folder “Aeneid” (in the URL “cat_4=970”). Columns B–D give short titles for the connected folders on the subordinate levels 5–7 (level 8 is never used in this sample), Column E the number of image records the folder contains, Columns F–K the information from Google Analytics.

### Spreadsheet “6 - Mappen - 3 Aufrufe nach Ebene Zodiac” [visits according to level]
### Spreadsheet “6 - Mappen - 4 Aufrufe nach Ebene Aeneid” [visits according to level]

These spreadsheets analyse the frequency of visits to folders depending on their position in the database, thus the number of levels one has to descend from the main folders “Zodiac” or “Aeneid” on level 4 in order to reach them. Most of the images in both samples belong to a series (of Zodiacal signs and illustrations of the Aeneid respectively); thus their image records are assigned both to a folder for the specific iconography and to a folder for the image cycle. Both types of folders are analysed separately here (iconographic folders in Blocks a and b, cycles in Blocks c and d). An additional column (M in “Zodiac”, L in “Aeneid”) gives the number of visits divided by the number of sessions with at least one visit (repeated visits in one session suggest navigation between image records, e.g. between records belonging to one cycle of images); the rightmost column (N and M respectively) indicates whether the page was visited at all in 2019.
### Spreadsheet “6 - Mappen - 5 Zyklen Zodiac” [cycles]
### Spreadsheet “6 - Mappen - 6 Zyklen Aeneid” [cycles]

These spreadsheets focus on the folders containing image cycles (see above). They are sorted according to the number of elements each cycle contains. (A “cycle” folder was normally created if an image belonging to an image cycle was catalogued, even if no other image of this cycle was in the database. Hence, whilst most Zodiac cycles contain twelve images, there are many “cycle” folders with fewer than this number, or with one image only.)

### Spreadsheet “6 - Mappen - 7 Untermappen Zodiac” [subfolders]
### Spreadsheet “6 - Mappen - 8 Untermappen Aeneid” [subfolders]

These spreadsheets compare the frequency of visits to an overarching folder with the sum of the frequencies of visits to its subfolders, and thus show whether normally only one or several subfolders were visited. Block a lists all folders with subfolders (starting with the most specialised ones, folders at level 7 with subfolders at level 8). At the end of each group of subfolders there are sums of the numbers of visits to each of them, and of sessions with at least one visit.

In the spreadsheet “Zodiac”, Block b brings together, in the same order as before, for every overarching folder the number of subfolders (Column B), the number of sessions in which the overarching folder was visited at least once (Column C), the sum of the numbers of sessions for each of the subfolders (Column D), and finally the ratio of the value of Column D to that of Column C (Column E). Block c repeats these entries, arranged according to the value of Column E. Since the spreadsheet “Aeneid” contains far fewer folders with subfolders, they are already ordered according to Column E in Block b, and there is no Block c.
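The value in Column E of Block b can be sketched as follows: the sessions of all subfolders are added up and divided by the sessions of the overarching folder. The function name and the handling of folders without recorded sessions are assumptions made for this illustration.

```python
def subfolder_ratio(parent_sessions, subfolder_sessions):
    """Ratio of the summed subfolder sessions (Column D) to the parent's sessions (Column C)."""
    if parent_sessions == 0:
        return None  # no recorded session for the overarching folder
    return sum(subfolder_sessions) / parent_sessions
```

A value above 1 suggests that several subfolders were typically opened from the overarching folder, a value below 1 that visitors often did not descend any further. Subfolders can, however, also be reached directly (for instance from search results or external links), so the ratio is only an indication.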
### Spreadsheet “6 - Mappen - 9 Blaettern Zodiac” [navigation within a folder]
### Spreadsheet “6 - Mappen - 10 Blaettern Aeneid” [navigation within a folder]

These spreadsheets organise the folders according to the number of image records (Block a) and subfolders (Block b) they contain. Here, the last column (M in spreadsheet “Zodiac”, L in “Aeneid”) indicates the ratio between the number of visits to the folder and the number of sessions with at least one visit. Thus, it can be seen how much navigation took place between folders and the individual records or subfolders they contain (if one goes from the folder to an image record and back to the folder, this results in two visits to the folder in one session).

The list starts with folders with image records, ordered by the number of images contained. They are followed by folders with subfolders, ordered by their number. At the end, the average values from Columns M and L respectively are brought together (Block c). Since the same types of image occur with different signs of the Zodiac (e.g. “Single Figure” or “with Planets”), the folders appear again, organised according to these groups, at the bottom of the spreadsheet (Block d, summary in Block e).
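The ratio in the last column of these spreadsheets can be sketched in a few lines. The function name and the treatment of folders without any recorded session are assumptions made for this illustration.

```python
def visits_per_session(visits, sessions):
    """Average number of visits to a folder per session with at least one visit."""
    if sessions == 0:
        return None  # the folder was never visited
    return visits / sessions
```

A folder opened once, left for an image record and opened again in the same session counts as two visits in one session, which gives a ratio of 2.0; a ratio close to 1 indicates that visitors rarely returned to the folder within a session.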