The following article refers to maps that you can view here.
In Hypercities: Thick Mapping in the Digital Humanities, the authors begin with the claim that “maps are visual arguments and stories; they make claims and harbor ideals, hopes, desires, biases, prejudices, and violences” (Presner et al. 14). Our digital mapping project aims to create informative visuals that render a spatial narrative legible to those who visit our Omeka collection showcasing the “Poor Sinners’ Pamphlets.” Such a representation is always subject to both gains and losses, inasmuch as maps reveal as much as they cover up; the digital offers new possibilities for revitalizing archives, but in doing so it necessitates choices that always negate some details in order to reveal others. Our project required vast amounts of behind-the-scenes data curation work and meticulous, thoughtful choices about how best to tell a story with the data, a process that is invisible in the final maps. The maps offer scholars multiple approaches to the pamphlets from a greater distance than can be achieved with a visit to the physical archive. Our goal in mapping the pamphlets is not a perfect representation of their totality but an attempt, with a recognition and acknowledgement of the limitations therein, to share these documents and the narrative they offer not only about the public spectacle of punishment in eighteenth- and nineteenth-century Europe, but also about our own ability to map the body and how the body interacts with the world surrounding it in our own place and time.
Preparing the Data
The initial data preparation for the mapping section of the “Poor Sinners’ Pamphlets” project began in an in-class lab that included all of the students in the course. Students were instructed to arrive in class with their own Bing Maps API key and were divided into groups that then used the GPS Visualizer to geocode the cities they were responsible for on the comprehensive list of publication locations (see Appendix Image A). They were also advised to note any abnormalities they came across in the location list, such as entries that included two cities, ones that listed just the country of publication, and (seemingly) obvious errors such as London, France.
After each group finished geolocating their set, the students came back together to discuss the irregularities they encountered during the process. They made a number of decisions as a group concerning how to address each of the problems. Some of the choices we made required relatively little discussion, such as simply eliminating entries with no location information from the mapping spreadsheet and investigating and correcting blatant errors ourselves. Others required more in-depth debate in order to ensure that we were aware of the potential complications we could be adding to the data. For instance, the group decided that entries with multiple cities listed for their publishing location should be doubled so that there would be two nearly identical entries for each multi-city item, the only difference being that each of the entries would list one of the cities from the original spreadsheet (see Appendix Image B). We concluded that any decision concerning the multi-city entries would cause some form of loss, whether it be, on the one hand, a more obvious connection between the specific pamphlet and its recorded locations of publication or, on the other hand, the complete elimination of one of the recorded cities. This particular group consensus allows for each entry to be geocoded and mapped while still preserving the links between the artifact and the cities it was published in.
Creating the Geocoded Datasets
In this first step, we geocoded nearly all of the locations in the dataset with the help of the whole class. This process was not only of value for students to learn about geocoding and the trials of working with messy, unedited data, but was also beneficial to the two of us working on the mapping project as it cut out a significant amount of tedious work time. Building on the group’s work, we were able to add the geocoded data into each entry quickly through OpenRefine by creating text facets based on the column that listed each city, allowing us to match the coordinates with their appropriate locations (rows).
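The matching of coordinates to rows described above can also be pictured as a simple lookup by city name. The following Python sketch is only an illustration and is not the OpenRefine procedure itself; the column names (`City`, `Latitude`, `Longitude`), the `attach_coordinates` helper, and the approximate sample coordinates are all assumptions made for the example.

```python
def attach_coordinates(rows, lookup, city_key="City"):
    """Return copies of the rows with coordinates filled in from a city lookup.

    Rows whose city is not in the lookup receive empty coordinate fields.
    """
    enriched = []
    for row in rows:
        coords = lookup.get(str(row.get(city_key, "")).strip())
        lat, lon = coords if coords else ("", "")
        enriched.append({**row, "Latitude": lat, "Longitude": lon})
    return enriched


# Hypothetical lookup table; the coordinates are approximate.
city_coordinates = {
    "Leipzig": (51.34, 12.37),
    "Nuremberg": (49.45, 11.08),
}

# Hypothetical sample entries.
pamphlets = [
    {"Title": "Pamphlet A", "City": "Leipzig"},
    {"Title": "Pamphlet B", "City": "Undetermined"},
]

geocoded = attach_coordinates(pamphlets, city_coordinates)
```

Each city is geocoded once and its coordinates are then reused for every row that names it, which is the effect the text facets achieved.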
After completing most of the geocoding with the whole class, we set out to fine-tune our data to render it usable within the CartoDB mapping program. We thus used OpenRefine to make corrections and adjustments to the data. OpenRefine allowed us to isolate all the instances of a given text value within the dataset (for example, all of the entries labeled “Undetermined”) so that we could ensure the uniformity of our dataset. We began in OpenRefine by adjusting for the multi-city entries within the data, since, as we had discussed in class, CartoDB requires a single location for each entry and entries with multiple cities would therefore be unplottable on a map. This adjustment required adding rows for each multi-city entry and separating the cities so that each appeared on its own line, then copying the other data pertaining to the individual pamphlets. In order to preserve the integrity of our data, we also added a column to denote the subsequently duplicated records. We labeled the new column “Location Type,” and it provided a place to denote other variants within the data, for example pamphlets that listed only a country for a publication site or, in many cases, no location data at all. Having divided the multi-city entries, we were again able to use OpenRefine to insert geolocation data for each city individually, rendering these pamphlets plottable via latitude and longitude coordinates.
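The row-splitting adjustment can be illustrated with a short Python sketch. It is a hypothetical reconstruction rather than the OpenRefine steps themselves: the semicolon separator, the `City` column name, the `split_multi_city` helper, and the specific “Location Type” values are assumptions chosen for the example.

```python
def split_multi_city(rows, city_key="City", sep=";"):
    """Give every city its own row and record how each row was derived.

    The separator and the "Location Type" values are illustrative
    assumptions, not the labels used in the project spreadsheet.
    """
    result = []
    for row in rows:
        cities = [c.strip() for c in str(row.get(city_key, "")).split(sep)]
        cities = [c for c in cities if c]
        if not cities:
            # No city recorded at all.
            result.append({**row, city_key: "", "Location Type": "undetermined"})
        elif len(cities) == 1:
            result.append({**row, city_key: cities[0], "Location Type": "single city"})
        else:
            # Duplicate the record once per city, copying all other fields.
            for city in cities:
                result.append({**row, city_key: city, "Location Type": "multi-city"})
    return result


# Hypothetical sample entries.
entries = [
    {"Title": "Pamphlet A", "City": "Frankfurt; Leipzig"},
    {"Title": "Pamphlet B", "City": "Nuremberg"},
]

split_entries = split_multi_city(entries)
```

Because the flag travels with each duplicated row, a later reader of the spreadsheet can still tell which points on the map belong to a single pamphlet.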
We then continued to clean up the data, ensuring that all entries in the full set had city, country, latitude, and longitude values, and that any missing data were clearly labeled as “undetermined.” We also used OpenRefine to create a dataset for the subset of 101 documents that are presented in full on our Omeka site. We added a column to our full dataset in which we inserted a “yes” for each entry that was also part of the subset. A text facet on this column then separated the entries into those that contained a “yes” and those that were left blank, making it simple to delete the superfluous data and create a ready-made, geocoded subset (see Appendix Images C and D).
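As a rough illustration of these two clean-up steps, the Python sketch below labels missing values and then filters on a “yes” flag. The column names (including `In Subset`) and both helper functions are hypothetical; the actual work was done with text facets in OpenRefine.

```python
REQUIRED_FIELDS = ("City", "Country", "Latitude", "Longitude")


def label_missing(rows, fields=REQUIRED_FIELDS, label="undetermined"):
    """Return copies of the rows with empty or absent fields set to a label."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for field in fields:
            value = new_row.get(field)
            if value is None or str(value).strip() == "":
                new_row[field] = label
        cleaned.append(new_row)
    return cleaned


def select_subset(rows, flag_key="In Subset"):
    """Keep only the rows flagged with "yes" in the subset column."""
    return [
        row for row in rows
        if str(row.get(flag_key, "")).strip().lower() == "yes"
    ]


# Hypothetical sample entries; coordinates are approximate.
full_set = [
    {"Title": "Pamphlet A", "City": "Leipzig", "Country": "Germany",
     "Latitude": 51.34, "Longitude": 12.37, "In Subset": "yes"},
    {"Title": "Pamphlet B", "City": "", "Country": "Germany",
     "Latitude": "", "Longitude": "", "In Subset": ""},
]

labeled = label_missing(full_set)
subset = select_subset(labeled)
```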
Planning the Maps
After preparing the full dataset and subset, we met to discuss our goals for the maps. We began by experimenting with our data within CartoDB to explore possible settings and outputs. We quickly realized that we were in many ways limited by our data and by the program itself. A primary problem we encountered concerned the chronology of the pamphlets. One setting within CartoDB, the “Choropleth,” allowed us to plot points on the map that showed a color gradation according to year. While exploring the Choropleth option, we had to confront a gap in our data: several years of publication were missing, some had too many digits, and others were only partially determined. This inconsistency meant that we could not label the data within the column as “year,” because the program could not recognize the column as numerical while it contained incomplete data. While keeping our original datasets, we made the decision to create copies of our datasets for use in CartoDB only, from which we removed the incompatible data (years like 18211 or 185u?). With the data now consistent (blanks were inserted for unknown years, as the program could not read text within a numerical column), it became possible to plot the pamphlets on a map using their chronological data.
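The rule applied to the year column can be stated compactly: keep a value only if it is a plausible four-digit year, and otherwise leave the cell blank. The Python sketch below is an assumption about how such a rule might be written, not the procedure actually followed; the `clean_year` helper is hypothetical, and the malformed examples are the ones quoted above.

```python
import re


def clean_year(raw):
    """Return the year as an int if it is exactly four digits, else None.

    None stands in for the blank cell inserted for unknown years.
    """
    text = str(raw).strip()
    if re.fullmatch(r"[0-9]{4}", text):
        return int(text)
    return None


raw_years = ["1821", "18211", "185u?", "", "1759"]
cleaned_years = [clean_year(value) for value in raw_years]
```

Working on a copy of the dataset, as described above, keeps the original partial years available even though the mapping copy discards them.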
Now familiar with the capabilities of CartoDB, we discussed what our data contained, and what we could show with it. Our data contained years and cities of publication for most entries, so we decided to create maps that would display the frequency of publication by city and over time. Another organizing principle for the stories contained in our archive was based on the types of crimes committed; CartoDB allowed us to add a spatial dimension to the broader look at the distribution of crime in our archive, represented by a map that would show these categories.
In order to display categories on the map, we had to again refine the data, and because of the work involved, we chose to create a map that included categories only for our subset. We began by restoring from our original dataset two columns of subject tags. We first adjusted them to match the agreed upon tagging categories used by the rest of the class, changing entries such as “Murder - Germany - 19th Century” to simply “Murder and Murderers” for clarity and consistency, as the year and location data were already represented in other columns (see Appendix Image E). This process presented some additional choices and problems, which became clear after a trial run with the category data in CartoDB. The category map could only display data from a single column, so we needed to prioritize the tags we wanted to represent, and we also realized that some tags were more or less helpful. Two categories were too similar to be read clearly: the data contained categories for both “Brigands and Robbers” and “Theft - Robbers - Thieves”. We conducted research and adjusted the “Brigands and Robbers” category to “Gang Robbery” to reflect the subtle difference. We also realized that other categories (“Capital Punishment” and “Laws - Trials - Sentences”) could apply to all the pamphlets without denoting clear distinctions. Our goal was not to remove this information from our maps, but to find a way to display categories that were distinct, informative, and still representative of the individual stories contained in our collection.
In order to create a map that presented clear, readable data while still preserving the nuances of the diverse pamphlets, we divided the available subject tag data into two columns: one column would be used by the map to organize the pamphlets according to category of crime for color-coded display, and a second column would be used to label each point with a full disclosure of our tagging data, including those categories not represented on the map itself. The result is shown most clearly in the map’s portrayal of female criminals. Since our Omeka site contained an exhibit of pamphlets related to female actors and victims, we wanted to represent this subgroup on our map. We chose to put all of the tags for “Women Perpetrators” in the first column, the one used by the map to display the data (labeled “Subject Tags I”). This ensured that the category “Women Perpetrators” would be significant enough to appear in the legend and on the map. All additional tags (such as those for women who committed poisoning or infanticide) we included in a second column, which would appear as a label on the map. By organizing the data this way, we created a balance between legibility and disclosure. Visitors to the site are able to search the data for all crimes committed by women, but are also able to hover over each particular pamphlet to find additional details about each case.
Dividing the subject data between categories for mapping and labels was not a wholly satisfactory solution; in one case insufficient data left a considerable gap in our map. We had planned to include “serial murder” in the category map, but after making the adjustments to the data, we realized that the name appeared on the legend while no corresponding pamphlets were visible on the map. We searched our data and found that each of the “serial murder” pamphlets in our subset also, ironically, had an undetermined location; even though each individual pamphlet was likely produced in a single place, the multiple locations of the serial crimes were coincidentally mirrored by the undetermined location data, making the pamphlets unplottable. The example of the “Serial Murder” pamphlets re-emphasizes the distortions created in the map by limitations in our data. While “Murder and Murderers” is one of the most significant categories we mapped, with serial cases no small portion of the larger category, these serial murder pamphlets became unrepresentable because they lacked geolocation data, creating an unavoidable gap in our project and the omission of certain stories in our collection.
Making the Maps
Once our datasets were prepped and put into a format legible to CartoDB, we uploaded both the full set and subset spreadsheets onto our shared account (see Appendix Image F). To prepare for this stage, we had previously experimented with different maps in order to explore the affordances and limitations they offered by displaying our data. Next, it was simply a matter of connecting the datasets to new maps. We thus used the “wizard” feature (CartoDB’s term for each available preset) to create maps that made a particular facet of our data legible yet also visually accessible (see Appendix Image G).
We both wanted to showcase a variety of maps, so we used four different presets in our maps: “intensity,” “simple,” “category,” and “choropleth.” “Intensity” was used to show popular printing areas of the full data set. The setting changes the color of the plot points depending on the frequency of the coordinates’ appearance in the dataset. In our map, for example, a red plot point signifies a location with a relatively high number of instances in our dataset, thus allowing viewers to understand that red plot points are more popular printing areas in our project. The “Simple” setting was used to compare our specific subset with the greater full set. It encourages the viewer to focus on the actual location of each point since it does not complicate the map with a wide variety of colors and an extensive legend, making it fitting for us to use for the more straightforward map. In order to allow viewers to easily compare the two data sets, we connected the map to both the subset and the full set data and simply changed the colors of the points to illustrate which pamphlets only occurred in the full set and which ones were also included in the subset. To map the tags that appear in the subset, we used the “Categories” setting, which automatically generates a multitude of colors to represent each classification that occurs in the selected data column and creates a corresponding legend. In our case, we used “Categories” to map the tags column in our subset that described the crime detailed in each pamphlet, allowing viewers to see which type of crime occurred where. Finally, we chose “Choropleth” to illustrate the different publication years in our subset. This setting divides the data in the selected column into “buckets” (segments) and assigns each one a particular color.
“Choropleth” appeared to be better suited for representing a range of years than “Intensity” because of this ability to create easily visible time periods through the use of “buckets” rather than overwhelming viewers with a full color spectrum. We chose to divide up the subset data into three “buckets” each encompassing approximately fifty years, which allows for viewers to quickly approximate when the specific pamphlet was published.
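To make the idea of “buckets” concrete, the Python sketch below divides a list of years into equal-width intervals. Equal-interval bucketing is an assumption chosen for illustration; it is not a description of how CartoDB computes its buckets, and the sample years and the `equal_interval_buckets` helper are hypothetical.

```python
def equal_interval_buckets(years, n_buckets=3):
    """Assign each year a bucket index from 0 to n_buckets - 1.

    The range between the earliest and latest year is cut into
    equal-width intervals; the latest year falls in the last bucket.
    """
    lo, hi = min(years), max(years)
    width = (hi - lo) / n_buckets
    if width == 0:
        # All years identical: everything belongs to the first bucket.
        return [0 for _ in years]
    return [min(int((year - lo) / width), n_buckets - 1) for year in years]


# Hypothetical publication years spanning 150 years (three 50-year buckets).
sample_years = [1700, 1749, 1750, 1800, 1849, 1850]
bucket_indices = equal_interval_buckets(sample_years)
```

With three buckets over a span of roughly one hundred and fifty years, each color on the map corresponds to about half a century, which matches the granularity described above.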
However, we did run into some difficulty with the historical map that we created, due to the limited supply of applicable maps and CartoDB’s restrictions on image size. The selection of maps of the German Empire from the eighteenth and nineteenth centuries at the Michigan State University Map Library was relatively limited. Numerous maps only showed a small portion of the area, and several others were awkwardly shaped or too large to be scanned in one piece and would thus need to be stitched together. The map we ultimately selected had never been scanned before, and the head of the Michigan State University Map Library, Kathleen Weessies, allowed us to choose the resolution quality we wanted. Because we wanted viewers to be able to see the map clearly if they zoomed in, and per Weessies’ recommendation, we chose to have the map scanned at 300 dpi. The scan of the image turned out to be exactly what we wanted, but we then encountered yet another problem with CartoDB. The website requires each image imported as a baseline map to be less than 1MB and smaller than 1024x1024 pixels, and thus would not accept our 6.2MB, 5159x3659 pixel image. Ultimately, the historic map embodies many of the trade-offs present in our entire project: our resizing efforts were successful and viewers can see what the data looks like on a map of Germany in 1759, but this map is not the high-resolution, sophisticated map that we hoped for.
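The scale of the reduction can be worked out with simple arithmetic. The Python sketch below computes the largest dimensions that fit within a square limit while preserving the aspect ratio; it addresses only the pixel limit, not the file-size limit, and the `fit_within` helper is a hypothetical illustration rather than the tool actually used for resizing.

```python
def fit_within(width, height, max_side):
    """Scale dimensions down to fit a max_side x max_side box.

    The aspect ratio is preserved and results are rounded down to
    whole pixels. Images that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


# Dimensions of the 300 dpi scan reported above.
resized = fit_within(5159, 3659, 1024)
```

For the scan described above this yields 1024x726 pixels, roughly one fifth of the original width and height, which gives a sense of how much detail the size restriction removes.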
Overall, our goal was to bring out the aesthetics and clarity of the maps, such as color choices that seemed appropriate for each category on our tags map and the number of different color categories to represent each time bracket in our years of publication map (see Appendix Image H). However, it is impossible for our visualizations to fully represent the data because of the process of selection and prioritization that is necessary in “thick mapping.” The specific implications and problems created by these maps will be discussed in the section titled “Implications of Mapping.”
Reading the Maps
Each of the maps presents a particular facet of our data. The map that layers the subset of pamphlets digitized in our project over the whole set shows the geographical distribution of the pamphlets that are included in our Omeka site relative to the larger set. While the collection contains pamphlets from throughout Germany and even some from surrounding countries and the US, this map allows us to see that the subset of pamphlets represents a smaller region, drawn largely from southern Germany, with only one pamphlet drawn from outside its borders, in France. While representing the geographic distribution, however, this map silences the chronology of the pamphlets, plotting them all simultaneously and giving the impression that they were contemporary. There is a similar challenge with the map in which we represented the pamphlets by city of publication. The map shows frequently occurring cities in a darker red color, making popular printing sites like Leipzig and Nuremberg evident on the map. Even in the data it represents, however, this map is limited. Two of the cities, Frankfurt and Leipzig, occur frequently together as dual publication sites of the same documents within the larger collection of pamphlets. Yet this frequency does not appear on the map, since none of these pamphlets with multi-city publication are included in the subset.
Implications of Mapping
The maps we created are an attempt to represent our data visually, but any representation is always only a selection of the available data. Precisely this conundrum is what Presner and his colleagues refer to and struggle with in Hypercities: our maps tell a story about the “Poor Sinners’ Pamphlets,” but they tell a partial story. CartoDB itself reflects this tension. Within the software, users can choose between “Map View,” which displays the visual story that is being created, and “Data View,” which displays the underlying data. The map attempts to represent the data, but, as the data itself never actually represents the documents themselves, it is still only a representation, a visual stand-in for another mode of information. Moreover, as we worked to render our data compatible with the program, we were also forced to continually winnow it down, leaving out things that we could not adequately represent: the “undetermined” locations, the partial years, and the serial murder pamphlets that, lacking a location, lacked the possibility of mapping at all. Indeed, digital mapping foregrounds the tension in the act of representation itself: what we gain in the creation of a visual story comes at the cost of some portion of the data.
Our maps thus not only represent the data visually, but also serve as a reminder of the gaps created by limitations within the data and within the program used to make it visible. CartoDB offers many “wizards” for representing data, but most of them rely upon single variables: for example, “Choropleth” plots chronology, “Intensity” plots quantity, and “Category” plots differentiation. Yet adding complexity comes at the cost of functionality. Plotting multiple variables requires layering, which can make data more difficult to read; one of our initial attempts to create a layered map showing the subset above an intensity-mapped full dataset proved to be unclear. Likewise, our use of the historical map required us to minimize the file size, severely restricting its manipulability. Our own data also posed problems. The dataset (both for the Criminology Collection and for the subset presented in our project) contained numerous gaps. In order to plot the data at all, we had to assign item numbers arbitrarily to each data point. Large portions of temporal data were “undetermined,” and even the sheer size of the full dataset was a deterrent, as it kept us from creating a subject-tagged map of the documents. Insofar as the maps intend to reveal the data, they also hide it. This fact is perhaps most apparent in our map layering the data subset represented on our Omeka site over the full data set. This map clearly shows that the subset is problematically more geographically homogeneous than the full set, making a widespread collection appear heavily concentrated in southern Germany. Yet this same map has silently eliminated many “undetermined” entries that might have significantly altered this conclusion, had their locations been legible.
At the same time, the maps also allow for new forms of synthesis of the information, and privilege a view of the collection unavailable in the archive, where perusal of the documents requires shuffling through hundreds of pages. Our collection goes beyond the traditional definition of the archive, which would privilege the maintenance of the Criminology Collection within its original context and as a whole. Digital mapping rearranges this context, allowing scholars to experience the collection through new matrices of relation. The ability to select a subset from the larger collection made the project possible, since the scope of the documents would have otherwise been prohibitive.
Hypercities describes “thick mapping” as “the processes of collecting, aggregating, and visualizing ever more layers of geographic or place-specific data,” layers that often complicate traditional knowledge categories (Presner et al. 16). Mapping the “Poor Sinners’ Pamphlets” illustrates the layering at work here, which both reveals and conceals information. The map organized to represent pamphlets according to category simultaneously flattens chronology. Events chronicled in our collection took place across at least two centuries, but on the map they appear plotted all in a single moment in time; reading the chronology depends on the interaction of the viewer, who must hover over the individual pamphlets. Likewise, the map that looks at publication by city appears chronologically homogeneous, unable to show whether a popular publishing city had a steady rate of publication over time or a sudden explosion over a 30-year period. Drawing on new trends in digital mapping and “thick mapping” that embrace the possibility of visualizing history and place in new, non-linear ways, the maps we generated as part of “The Poor Sinners’ Pamphlets” render a historical collection newly accessible to scholars and explore the possibility and limits of mapping the past onto the present, of experiencing the boundaries of representation itself. The pamphlets, as sensationalized documents, historically publicized the infamy of their subjects, yet through the digital collection, these individuals, many still unnamed, are being offered a new form of representation in the present. We cannot tell their stories for them, but we can tell a story that remembers them, that plots them in time and place as part of history, and now as new scholarship, as part of our present digital era.
Presner, Todd, David Shepard, and Yoh Kawano. Hypercities: Thick Mapping in the Digital Humanities. Cambridge: Harvard UP, 2014. Print.