eprintid: 1993
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/00/19/93
datestamp: 2023-11-09 15:50:10
lastmod: 2023-11-09 15:50:10
status_changed: 2023-11-09 15:41:47
type: article
metadata_visibility: show
creators_name: Zamin, N.
creators_name: Oxley, A.
title: Building a corpus-derived gazetteer for named entity recognition
ispublished: pub
keywords: Computational costs; Document Representation; Entity identification; Essential component; Filtering technique; Gazetteer; Information Extraction; Named entity recognition; Natural language processing; Rule based; Stumbling blocks, Computational linguistics; Learning algorithms; Software engineering, Natural language processing systems
note: cited By 5; Conference of 2nd International Conference on Software Engineering and Computer Systems, ICSECS 2011; Conference Date: 27 June 2011 Through 29 June 2011; Conference Code:85603
abstract: Gazetteers, or entity dictionaries, are an important element for Named Entity Recognition. Named Entity Recognition is an essential component of Information Extraction. Gazetteers work as specialized dictionaries to support initial tagging. They provide quick entity identification thus creating richer document representation. However, the compilation of such gazetteers is sometimes mentioned as a stumbling block in Named Entity Recognition. Machine learning, both rule-based and look-up based approaches, are often used to perform this process. In this paper, a gazetteer developed from MUC-3 annotated data for the 'person named' entity type is presented. The process used has a small computational cost. We combine rule-based grammars and a simple filtering technique for automatically inducing the gazetteer. We conclude with experiments to compare the content of the gazetteer with the manually crafted one. © 2011 Springer-Verlag.
date: 2011
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-79960375105&doi=10.1007%2f978-3-642-22191-0_6&partnerID=40&md5=91d7dd8f8824c47f81a3a320f495ffa0
id_number: 10.1007/978-3-642-22191-0_6
full_text_status: none
publication: Communications in Computer and Information Science
volume: 180 CC
number: PART 2
place_of_pub: Kuantan
pagerange: 73-80
refereed: TRUE
isbn: 9783642221903
issn: 18650929
citation: Zamin, N. and Oxley, A. (2011) Building a corpus-derived gazetteer for named entity recognition. Communications in Computer and Information Science, 180 CC (PART 2). pp. 73-80. ISSN 18650929