A text miner’s revelation: how historians use text

As a text miner looking to the humanities as a source of interesting problems, I need to know how “humanities researchers” use text. So I went to Great Lakes THATCamp in March (2010) to find out. I had conversations with about 20 historians, anthropologists, archeologists, political scientists, archivists, librarians, and others, eavesdropped on many, many more, and managed to sort their uses of text into two broad categories.

My findings may not be surprising to humanists and social scientists, but I hope they will be informative to my fellow techies, who have only the fuzziest notions of what humanities researchers do all day. I’ve drawn heavily on my conversations with historians, because we’re now working to develop a text mining tool with some of them at UC Berkeley.

Humanities researchers use text in two ways. The first is to get an idea of what’s out there, in a way common to all researchers in all fields. The second is as evidence: what traces might an event, a personal characteristic, an impression, or anything else have left in the textual records from around that time?

In the first domain, they have the same questions of “the literature” as any other researcher: which are the good books or papers to read? Who are the people working in this area? What are the current opinions and approaches? Where did I read this idea? Where did I see this quote? This process is known as orienteering in the information-seeking literature, which my excellent advisor gives an overview of here.

Finding new and better ways to support the orienteering process is an active research area. Different aspects of the problem have been tackled by text mining, natural language processing, and information visualization. The “Previous Work” section of this paper has an overview of the high points.

In the second domain, evidence, they treat text like something out of a detective show. They have a hypothesis in mind and examine all the text they can lay their hands on for traces of evidence relevant in any way.

They might track the language around a term over time, find a change in the way a concept is discussed, observe the way people express their thoughts, or, if they are lucky, find an original document confirming or denying their hypothesis.

The second use case is more specific to the humanities than the first. Supporting it means giving researchers the ability to ask highly specific and structured natural language processing questions such as “what phrases were used to describe this entity, and how did their use change over time?”. Right now, I’m very interested in what kinds of tools we can build to help humanities researchers ask these kinds of questions of a large collection of text.
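To make that kind of question concrete for fellow techies, here is a minimal sketch of one very crude way to ask "what words appear near this term, and how does that change over time?". It is not the tool we are building. The function name `context_words_by_year`, the `(year, text)` input format, the fixed token window, and the toy sentences are all assumptions made for illustration, and a real system would need proper tokenization, stopword handling, and phrase detection.

```python
import re
from collections import Counter, defaultdict


def context_words_by_year(docs, term, window=3):
    """Count the words found within `window` tokens of `term`, grouped by year.

    `docs` is an iterable of (year, text) pairs (a hypothetical input format).
    Returns a dict mapping each year to a Counter of neighbouring words.
    """
    counts = defaultdict(Counter)
    term = term.lower()
    for year, text in docs:
        # Very rough tokenization: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == term:
                start = max(0, i - window)
                neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts[year].update(neighbours)
    return dict(counts)


# Invented toy sentences, not real historical sources.
docs = [
    (1850, "The railway was a dangerous novelty."),
    (1850, "Many feared the noisy railway."),
    (1900, "The railway was a reliable convenience."),
]

result = context_words_by_year(docs, "railway", window=3)
for year in sorted(result):
    print(year, result[year].most_common(5))
```

Comparing the counters for different years gives a first, rough picture of how the language around a term shifts. Function words such as "the" will dominate until they are filtered out.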

Update: As Lincoln Mullen pointed out, researchers tend to use secondary texts for orienteering, and primary texts for evidence. These are very different kinds of text, and secondary sources tend to be much more available in digital collections. With primary sources, OCR, encoding, and availability make getting to the “text” stage of text mining quite a bit harder.
