Jan Pokorny - Automatic Subject Indexing and Classification Using Text Recognition and Computer-Based Analysis of Tables of Contents

elpub:4607 - ELectronic PUBlishing, June 20, 2018, Connecting the Knowledge Commons: From Projects to Sustainable Infrastructure - https://doi.org/10.4000/proceedings.elpub.2018.19

Authors: Jan Pokorny 1

  • 1 ENKI, o.p.s.

This paper describes a method for the machine-based creation of high-quality subject indexing and classification for both electronic and print documents, using their tables of contents (ToCs). The technology is aimed primarily at electronic and print documents whose full text cannot be indexed for technical or licensing reasons. It would, however, also be useful for full-text documents, because analyzing the structure of ToCs could significantly improve the accuracy and relevance of subject description.
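The abstract does not spell out the algorithm. As a rough illustration of what "analyzing the structure of ToCs" might involve, here is a minimal Python sketch. It assumes ToC lines of the form `section-number title ..... page` and assumes that terms in higher-level headings deserve more weight, so each heading's words score 1 / depth. The regex, the stopword list, and the names `parse_toc` and `weighted_terms` are hypothetical, chosen only for this example. This is not the method described in the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a language-specific list.
STOPWORDS = {"and", "of", "the", "in", "for", "to", "a", "an", "on", "with"}

# Assumed ToC line shape, e.g. "2.1 Optical character recognition ..... 17":
# optional section number, title, optional dot leaders, optional page number.
TOC_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)?\.?\s*(.*?)[\s.]*(\d+)?\s*$")


def parse_toc(lines):
    """Yield (depth, title) pairs; depth = number of parts in the section number."""
    for line in lines:
        match = TOC_LINE.match(line)
        if not match or not match.group(2):
            continue  # skip blank lines and lines with no title text
        number, title = match.group(1), match.group(2)
        depth = len(number.split(".")) if number else 1
        yield depth, title


def weighted_terms(toc_lines, top_n=5):
    """Rank title words, weighting higher-level headings more (weight = 1 / depth)."""
    scores = Counter()
    for depth, title in parse_toc(toc_lines):
        for word in re.findall(r"[a-z]+", title.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                scores[word] += 1.0 / depth
    return [word for word, _ in scores.most_common(top_n)]


toc = [
    "1 Text recognition ..... 3",
    "1.1 Scanning tables of contents ..... 5",
    "1.2 Text recognition errors ..... 9",
    "2 Subject indexing ..... 15",
    "2.1 Indexing with controlled vocabularies ..... 17",
]
print(weighted_terms(toc, top_n=3))  # ['text', 'recognition', 'indexing']
```

Words that recur in top-level chapter headings outrank words that appear only once in a subsection. In a real pipeline, the ranked terms would presumably be mapped to a controlled vocabulary or classification scheme rather than used as raw keywords.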


Volume: Connecting the Knowledge Commons: From Projects to Sustainable Infrastructure
Section: Long Papers
Published on: June 20, 2018
Accepted on: June 20, 2018
Submitted on: June 20, 2018
Keywords: text mining, computer-generated subject headings, computer-generated keywords, machine learning system, library automatization, [ SHS.INFO ] Humanities and Social Sciences/Library and information sciences
