An Information Retrieval Model Based on Discrete Fourier Transform

Alberto Costa (Laboratoire d'Informatique de l'Ecole Polytechnique, FR) and Massimo Melucci (Department of Information Engineering, University of Padua, IT)


Abstract
 

Information Retrieval (IR) systems combine a variety of techniques stemming from logical, vector-space, and probabilistic models. These combinations have produced a significant increase in retrieval effectiveness since the early 1990s. Nevertheless, the quest for new frameworks has been no less intense than research on optimizing and experimenting with the most common retrieval models. This paper presents a new framework for IR based on the Discrete Fourier Transform (DFT). In this model, a query term is represented as a sine curve and a query as the sum of sine curves, which gives the query an elegant and sound mathematical form. The sinusoidal representation of the query is transformed from the time domain to the frequency domain through the DFT, and the result is a spectrum. Each document of the collection corresponds to a set of filters, and the retrieval operation corresponds to filtering the spectrum: for each document, the spectrum is filtered and the result is a power value. The documents are then ranked by the power of the filtered spectrum, such that the more a document decreases the power of the spectrum, the higher its rank. This paper is mainly theoretical, and the retrieval algorithm is reported to suggest the feasibility of the proposed model. Some small-scale experiments carried out to test the effectiveness of the algorithm indicate performance comparable to the state of the art.
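The pipeline described in the abstract (sine curves, DFT, filtering, ranking by remaining power) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how frequencies are assigned to terms or how a document is turned into filters, so the choices below are assumptions made only for illustration. Specifically, the j-th query term is given the integer frequency j + 1, each document is a hypothetical mapping from terms to weights in [0, 1], and a term with weight w attenuates its frequency bin by the gain 1 - w. The function names and the toy collection are likewise hypothetical.

```python
import cmath
import math


def query_signal(num_terms, num_samples):
    """Sample the sum of unit-amplitude sine curves, one per query term.

    Assumption: the j-th query term (0-based) gets the integer frequency j + 1.
    """
    return [
        sum(math.sin(2 * math.pi * (j + 1) * n / num_samples) for j in range(num_terms))
        for n in range(num_samples)
    ]


def dft(signal):
    """Plain O(N^2) discrete Fourier transform of a real-valued signal."""
    size = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * m * n / size) for n, x in enumerate(signal))
        for m in range(size)
    ]


def filtered_power(spectrum, gains):
    """Average power left after scaling each frequency bin by its gain.

    `gains` maps a frequency to a gain in [0, 1]; other frequencies pass unchanged.
    """
    size = len(spectrum)
    total = 0.0
    for m, value in enumerate(spectrum):
        frequency = min(m, size - m)  # bins m and size - m carry the same sine
        total += abs(gains.get(frequency, 1.0) * value) ** 2
    return total / size ** 2  # Parseval: equals the time-domain average power


def rank_documents(query_terms, documents, num_samples=64):
    """Return (doc_id, power) pairs, largest power decrease (lowest power) first.

    Assumption: `documents` maps doc_id -> {term: weight in [0, 1]}, and a term
    with weight w filters its frequency with gain 1 - w.
    """
    if num_samples <= 2 * len(query_terms):
        raise ValueError("num_samples must exceed twice the number of query terms")
    spectrum = dft(query_signal(len(query_terms), num_samples))
    scored = []
    for doc_id, weights in documents.items():
        gains = {j + 1: 1.0 - weights.get(term, 0.0) for j, term in enumerate(query_terms)}
        scored.append((doc_id, filtered_power(spectrum, gains)))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy collection, for illustration only.
query = ["fourier", "retrieval", "model"]
documents = {
    "d1": {"fourier": 1.0, "retrieval": 1.0},
    "d2": {"model": 0.5},
    "d3": {},
}
ranking = rank_documents(query, documents)
```

In this toy run, `d1` ranks first because it fully filters two of the three query frequencies and so removes the most power, while `d3` matches no query term, leaves the spectrum untouched, and ranks last.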
 
