On April 26, Lindsay Van Tine and Rachel Buurma from the Early Novels Database (END) project team joined us at WORD LAB to talk about the project’s history and future. The END project is undergraduate-driven and works to create rich metadata for specific copies of early fictional works. The students engage in 1) cataloging (data collection) and 2) project development (data analysis), and are paid for their work. They have a “40/60 day” in which they devote time to learning about the field and their work as well as to END tasks. This post covers the project in more detail, along with examples of student activities and the Q&A.
The END team involves members from many academic roles at Penn and Swarthmore, as well as Bryn Mawr, Haverford, and other area liberal arts colleges; this summer they will expand to NYU. They recruit students from all of these institutions to work for 10 weeks during the summer on the END project.
As part of the project, students create rich metadata for objects, which the team defines as “things not captured in traditional library catalogs.” The ultimate goal, as the team members put it, is “new entryways into early modern fiction.” The students enhance the metadata in MARC XML records and collaborate with traditional rare book catalogers. However, their work is not quite the same. For example, they enter information about “author claims” when the author is unclear: when no attribution is available, they record claims made within the text itself (about gender, translation, etc.), information that traditional library records would not contain. Their records are intended to be both machine- and human-readable, facilitating new possibilities for computer-assisted research without foreclosing direct human interaction with the metadata.
Allowing for students’ intellectual investment leads to rewards for everyone involved. The team tries to create an experience that students are excited about, so that they can be productive and produce high-quality data while also learning and developing themselves. The undergraduates, Rachel and Lindsay think, are perhaps the perfect candidates to bring a fresh, excited perspective to the work. The students also see that they are making an impact on a long-term project, “engaged in real scholarship,” and see the cataloging as the most fulfilling part even as they are excited about working on their individual projects.
END sees the student catalogers as full collaborators, and prides itself on being a teaching and learning endeavor. Rachel and Lindsay see student work as a laboratory — “it’s real research.” The students engage in their own projects as well as taking on cataloging tasks for the larger database. The goals include research, publications, and collaborations — not just “student work.”
For example, one student looked at discrepancies in the 1760s between what books call themselves and how we would regard them now; their titles may or may not boast that they are “novels,” and they may have been forgotten as novels over time. The student ended up creating Vine videos to promote the forgotten novels.
In terms of collaboration, Andrew Piper is interested in auto-detecting footnotes, and the END project is going to provide a data set that students have input from page images rather than from OCR. The team hopes this will result in a publication with students who are now working on their PhDs, plus other involved students. There is also the 18th-century fiction syllabus project, a team project that examines what we teach and how canonical it really is. Because END contains many more novels than are actually taught, it can help support this work. (As an aside, the project found that what is taught is less canonical than assumed, and they’re now looking at Open Syllabus Project data to see how a smaller hand-curated set of 18th-century fiction syllabi compares to the larger set.)
The team considers END to be a public service and is actively looking for new collaborations with a variety of partners and audiences. They hope to engage with historical/speculative fiction writers and fan communities in the future, and to think about what fiction is and what implications it has for the present. “How can fiction ‘live’ in the digital world?” They see a collaborative opportunity with Rosenbach digitizing their collections as well.
They have some ongoing data set and data sharing issues: they’re using a custom interface but are moving to Blacklight, and want the database to be as interconnectable as possible so that it facilitates collaboration. One big related question is how to arrive at an interoperable data format. MARC is a standard for libraries, but it’s not hierarchical and doesn’t work as well for recording additional information about copy-specific markings and multiple editions. Their records often end up with proliferating subfields, when what they want is to nest those relationships.
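To make the flat-versus-nested problem concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the subfield codes "a", "b", and "c" and the sample values are hypothetical placeholders, not END's actual schema or real MARC field definitions. It shows how a flat run of repeated subfields can be grouped into one nested entry per copy-specific marking.

```python
# Hypothetical flat, MARC-style field: an ordered list of (code, value) pairs.
# The codes and values are placeholders for illustration, not END's schema.
flat_field = [
    ("a", "inscription"),
    ("b", "front flyleaf"),
    ("a", "bookplate"),
    ("b", "inside front cover"),
    ("c", "armorial"),
]


def nest_subfields(pairs, start_code="a"):
    """Group a flat subfield run into one dict per entry.

    A new entry begins each time `start_code` appears; the subfields that
    follow are attached to that entry until the next `start_code`.
    """
    entries = []
    for code, value in pairs:
        if code == start_code or not entries:
            entries.append({})
        entries[-1].setdefault(code, []).append(value)
    return entries


nested = nest_subfields(flat_field)
for entry in nested:
    print(entry)
```

In the flat form, the only thing tying “front flyleaf” to “inscription” is the order of the subfields; the nested form states that relationship explicitly, which is the kind of structure a hierarchical format would provide directly.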
There’s also a question about the third-party services used in the project: will they continue to be supported, and how reliable are they? For example, Molly noted that Flickr appears to be on the outs with Yahoo!, and END uses Flickr for its footnotes component; in addition, they’ve been told by IT departments to “just use Dropbox” for collaboration instead of coming up with their own (or a more robust) solution. Because it’s a cross-institutional project, collaboration, sharing, and communication are really important. On top of that, everyone needs access, and the team has to decide what to preserve at a high, library-quality preservation standard. There are a lot of questions around collaboration and access that they’re still working out.
Another question came up from Molly and Brian about “objectivity” and cataloging. For example, a “normal” book is not normal for the rest of the world, and subject headings change over time; librarians are also held to a certain standard of “objectivity” in determining headings. (Molly’s example was that you can’t label a book that you think is a conspiracy theory that way, but can if it’s about conspiracy theories. This was also at the time when Congress was attempting to change a Library of Congress subject heading to “illegal aliens.”) Subject headings do require interpretation. The END team members noted that we’re not stuck with a card anymore, so they can move beyond LOC subject headings (or a limited number of them) when creating richer metadata.
“How is ‘the novel’ defined?” was another WORD LAB member question. The team members explained that while a lot of recent work has been historicizing literary genre in the 18th century, END is “a fiction project” and they are okay with using the genre “novel” to describe the works they fit into that framework. They select the works based on LOC call numbers and then make a judgment about whether each one is fiction or not. They’re currently drawing from two core collections at Penn and are interested in the composition of the collection itself, rather than an “ideal corpus.”
The END team described themselves as “a slow project” in response to a question about pedagogy and efficiency (training and getting up to speed). The less the team tries to make the students efficient, the happier and more productive the students are! The team has them spend a good deal of time with the objects and encourages learning. They’re more focused on pedagogy than on producing data and publications or “results”/“outputs.” Being at a liberal arts college like Swarthmore encourages this style. They also described the project as “valu[ing] students’ amateurism”; the students are not producing professional records even though they’re being trained by professionals. The students have a “fresh and untrained view,” use “process-based knowledge,” and make subjective decisions.
“Are we teachers or project managers?” they asked. They encourage students to spend a half day cataloging and a half day researching, so that the students can connect things more quickly. There’s also a casual blogging component, so students are making connections through writing as well. In other words, project management is part of teaching and vice versa.
We also had questions about institutional support, and the team responded that the funding and support come from different centers and institutions, which form a consortium with a wider network: in other words, it’s a collaboration. They also have significant buy-in from Penn Libraries.