An effective context search and text indexing based on compressed inverted-signature file and approximate union key

Control, Computers and Informatics


Authors

Sizikov E. V.

e-mail: sizikov@mail.ru

Abstract

The paper proposes new data structures and a text indexing algorithm for context search in large text collections. The main feature of the algorithm is efficient support for approximate string matching in texts (string matching with k errors). The algorithm achieves high search speed for queries containing complex regular expressions built from a large number of atomic patterns. A new index file type, the compressed inverted-signature file, is introduced for compact storage of index data. A new dynamic structure, the approximate union key, is proposed to solve the approximate string matching problem over the index vocabulary. Together, the compressed inverted-signature file and the approximate union key can serve as the core of next-generation text retrieval systems whose flexible query languages allow advanced text search.
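The abstract does not describe the internals of the compressed inverted-signature file or the approximate union key, so the sketch below does not attempt to reproduce them. It only illustrates the underlying task, string matching with k errors, using two standard textbook techniques chosen here as stand-ins: Sellers' dynamic-programming algorithm for finding approximate occurrences of a pattern in running text, and a brute-force scan of an index vocabulary by Levenshtein distance. The function names are hypothetical. The linear vocabulary scan is the kind of baseline that a dedicated vocabulary structure such as the approximate union key presumably aims to outperform.

```python
def k_error_end_positions(pattern: str, text: str, k: int) -> list[int]:
    """Sellers' algorithm (textbook technique, not the paper's method).

    Returns 0-based indices j such that some substring of `text` ending at j
    is within Levenshtein distance k of `pattern`.
    """
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and any
    # substring of the text that ends at the current position.
    column = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = column[0]  # old column[i-1], i.e. the diagonal cell
        column[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new_val = min(
                column[i] + 1,      # text char c is extra (insertion)
                column[i - 1] + 1,  # pattern char i is unmatched (deletion)
                prev_diag + cost,   # match or substitution
            )
            prev_diag = column[i]
            column[i] = new_val
        if column[m] <= k:
            ends.append(j)
    return ends


def edit_distance(a: str, b: str) -> int:
    """Classic two-row Levenshtein distance between whole strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def approximate_vocabulary_lookup(pattern: str, vocabulary: list[str], k: int) -> list[str]:
    """Hypothetical baseline: linear scan of the index vocabulary for words
    within k edits of the pattern. A specialized structure would avoid
    computing the distance to every vocabulary word."""
    return [w for w in vocabulary if edit_distance(pattern, w) <= k]


if __name__ == "__main__":
    print(k_error_end_positions("abc", "xabcx", 1))  # [2, 3, 4]
    vocab = ["index", "indexes", "indent", "vortex"]
    print(approximate_vocabulary_lookup("index", vocab, 2))  # ['index', 'indexes', 'indent']
```

With k = 0 the first function degenerates to exact matching (`k_error_end_positions("abc", "xabcx", 0)` gives `[3]`). Its cost is proportional to the pattern length times the text length, which is why index-based approaches are needed for large collections.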
