Control, Computers and Informatics
Authors
e-mail: sizikov@mail.ru
Abstract
This paper proposes new data structures and a text indexing algorithm for approximate context search in large text collections. The main feature of the algorithm is efficient support for approximate string matching in texts (string matching with k errors). The algorithm achieves high search speed for queries with complex regular expressions that contain a large number of atomic patterns. A new index file type, the compressed inverted-signature file, is introduced for compact storage of indexing data. A new dynamic structure, the approximate union key, is proposed to solve the approximate string matching task over the index vocabulary. The compressed inverted-signature file with the approximate union key can serve as the kernel of next-generation text retrieval systems whose flexible query language supports advanced text search.
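To make the k-errors task concrete, the following minimal sketch checks every word of a small index vocabulary against a query pattern and keeps the words within edit distance k. It is a brute-force baseline written for illustration only, not the approximate union key structure proposed in the paper; the function names `within_k_errors` and `vocabulary_lookup` and the sample vocabulary are assumptions introduced here.

```python
from typing import Iterable, List


def within_k_errors(pattern: str, word: str, k: int) -> bool:
    """Return True if the Levenshtein distance between pattern and word is <= k."""
    # A length difference larger than k already requires more than k edits.
    if abs(len(pattern) - len(word)) > k:
        return False
    # prev[j] = edit distance between the processed pattern prefix and word[:j].
    prev = list(range(len(word) + 1))
    for i, pc in enumerate(pattern, start=1):
        curr = [i]
        for j, wc in enumerate(word, start=1):
            cost = 0 if pc == wc else 1
            curr.append(min(prev[j] + 1,          # delete a pattern character
                            curr[j - 1] + 1,      # insert a word character
                            prev[j - 1] + cost))  # match or substitute
        # Row minima never decrease, so the final distance must exceed k too.
        if min(curr) > k:
            return False
        prev = curr
    return prev[-1] <= k


def vocabulary_lookup(pattern: str, vocabulary: Iterable[str], k: int) -> List[str]:
    """Hypothetical helper: scan the whole vocabulary and keep words within k errors."""
    return [w for w in vocabulary if within_k_errors(pattern, w, k)]


# Illustrative vocabulary (an assumption, not data from the paper).
vocabulary = ["index", "indexes", "inverted", "signature", "search"]
print(vocabulary_lookup("indexs", vocabulary, 1))
```

A linear scan like this costs time proportional to the vocabulary size for every atomic pattern, which is the kind of overhead a dedicated vocabulary structure would aim to avoid for queries containing many patterns.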