Information retrieval (IR) is the science of searching for the following:
- documents
- information within documents
- metadata about documents
It also covers searching relational databases and the World Wide Web.
Information retrieval is interdisciplinary. It is based on:
- Computer Science
- Mathematics
- Library Science
- Information Science
- Information Architecture
- Cognitive Psychology
- Linguistics
- Statistics
- Physics
An information retrieval process begins when a user enters a query into the system.
WHAT ARE QUERIES?
Queries are formal statements of information needs. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, possibly with different degrees of relevance.
An object is a unit that stores information in a database. The user's queries are matched against the objects held in the database. Usually, the documents themselves are not stored directly in the IR system. Instead, each document is represented in the system by a stand-in called a document surrogate.
Many IR systems compute a numeric score for how well each object matches the query and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may be iterated if the user wants to refine the query.
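The score-and-rank step can be illustrated with a minimal sketch. The scoring rule here (counting occurrences of query terms) is only an assumption made for illustration, and the names `score`, `rank`, and `documents` are hypothetical; real systems use more elaborate scoring functions.

```python
from collections import Counter


def score(query: str, document: str) -> int:
    """Toy score: how many times the query terms occur in the document."""
    term_counts = Counter(document.lower().split())
    return sum(term_counts[term] for term in set(query.lower().split()))


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs, best match first, dropping zero scores."""
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; ties are broken by doc_id for a stable order.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical three-document collection.
documents = {
    "d1": "information retrieval ranks documents",
    "d2": "cooking recipes for pasta",
    "d3": "retrieval of information about information systems",
}
ranking = rank("information retrieval", documents)
print(ranking)  # [('d3', 3), ('d1', 2)]
```

Note that several documents match the same query with different scores, which is the "degrees of relevance" idea described above.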
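Recall is the fraction of the documents relevant to the query that are successfully retrieved. In binary classification it is called sensitivity, so it can be viewed as the probability that a relevant document is retrieved by the query.
It is trivial to achieve a recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough. One also needs to measure the number of non-relevant documents that are retrieved.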
IR systems are widely used, for example by:
- universities
- public libraries
- the web
Many different measures have been proposed for evaluating information retrieval systems. These measures require a collection of documents and a query. All of them assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a given query. In practice, queries may be ill-posed, and there may be different shades of relevance.
The fraction of the retrieved documents that are relevant to the user's information need is called PRECISION.
Precision is analogous to the positive predictive value in binary classification. Precision takes all the retrieved documents into account. It can also be evaluated at a specified cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n. Precision in information retrieval has a different meaning from accuracy and precision in other branches of science and technology.
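As a minimal sketch of these definitions, the functions below compute precision, recall, and P@n from a ranked result list and a set of relevant document IDs. The function names, the document IDs, and the convention of returning 0.0 for empty inputs are assumptions made for this example; the ranked list is assumed to contain no duplicates.

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # convention chosen for this sketch
    return sum(doc in relevant for doc in retrieved) / len(retrieved)


def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # convention chosen for this sketch
    return len(set(retrieved) & relevant) / len(relevant)


def precision_at_n(retrieved: list[str], relevant: set[str], n: int) -> float:
    """Precision computed over only the top n results (P@n)."""
    return precision(retrieved[:n], relevant)


# Hypothetical ranked results and relevance judgments.
retrieved = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d3", "d5"}

print(precision(retrieved, relevant))          # 0.5
print(recall(retrieved, relevant))             # 2 of 3 relevant documents found
print(precision_at_n(retrieved, relevant, 1))  # 1.0
print(precision_at_n(retrieved, relevant, 2))  # 0.5

# Returning the whole collection gives perfect recall but poor precision.
everything = ["d1", "d3", "d5", "d7", "d9"]
print(recall(everything, relevant), precision(everything, relevant))  # 1.0 0.6
```

The last two lines show why recall alone is not a sufficient measure: a system that returns everything scores 100% recall.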
Fall-out is the fraction of all non-relevant documents that are retrieved. It is closely related to specificity in binary classification.
To be exact: fall-out is equal to 1 minus specificity.
It can be viewed as the probability that a non-relevant document is retrieved by the query. It is trivial to achieve a fall-out of 0% by returning zero documents in response to any query.
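The relationship between fall-out and specificity can be checked with a short sketch. The function names and the six-document collection are hypothetical, and the sketch assumes the full collection is known so that the non-relevant documents can be counted.

```python
def fall_out(retrieved: set[str], relevant: set[str], collection: set[str]) -> float:
    """Fraction of the non-relevant documents that were retrieved."""
    non_relevant = collection - relevant
    if not non_relevant:
        return 0.0  # convention chosen for this sketch
    return len(retrieved & non_relevant) / len(non_relevant)


def specificity(retrieved: set[str], relevant: set[str], collection: set[str]) -> float:
    """Fraction of the non-relevant documents that were correctly left out."""
    non_relevant = collection - relevant
    if not non_relevant:
        return 1.0  # convention chosen for this sketch
    return len(non_relevant - retrieved) / len(non_relevant)


# Hypothetical collection: d1 and d2 are relevant, the rest are not.
collection = {"d1", "d2", "d3", "d4", "d5", "d6"}
relevant = {"d1", "d2"}
retrieved = {"d1", "d3"}

print(fall_out(retrieved, relevant, collection))     # 0.25
print(specificity(retrieved, relevant, collection))  # 0.75
print(fall_out(set(), relevant, collection))         # 0.0, nothing returned
```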
The F-measure is the weighted harmonic mean of precision and recall. The F1 measure, which weights precision and recall equally, is the conventional F-measure or balanced F-score.
Two other commonly used F measures are:
1. F2 measure - weights recall twice as much as precision
2. F0.5 measure - weights precision twice as much as recall
Van Rijsbergen derived the F-measure so that Fβ measures the effectiveness of retrieval for a user who attaches β times as much importance to recall as to precision.
It is based on van Rijsbergen's effectiveness measure:
E = 1 - 1 / (α/P + (1 - α)/R)
The two are related by Fβ = 1 - E,
where α = 1 / (β² + 1), P is precision, and R is recall.
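Substituting α = 1 / (β² + 1) into 1 - E gives the closed form Fβ = (1 + β²) × P × R / (β² × P + R). The sketch below computes both expressions and checks numerically that they agree; the function names and the sample values of P and R are chosen only for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention chosen for this sketch
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def effectiveness(precision: float, recall: float, beta: float = 1.0) -> float:
    """van Rijsbergen's E with alpha = 1 / (beta^2 + 1); needs P > 0 and R > 0."""
    alpha = 1 / (beta ** 2 + 1)
    return 1 - 1 / (alpha / precision + (1 - alpha) / recall)


p, r = 0.5, 0.8  # illustrative values only
for beta in (0.5, 1.0, 2.0):
    f = f_beta(p, r, beta)
    # F_beta and 1 - E should agree up to floating-point rounding.
    assert abs(f - (1 - effectiveness(p, r, beta))) < 1e-12
    print(f"F{beta:g} = {f:.4f}")
```

With recall higher than precision, as in this example, F2 comes out larger than F1 and F0.5 smaller, reflecting the different weights.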
Precision and recall are based on the whole list of documents returned by the system. Average precision emphasizes returning more relevant documents earlier. It is the average of the precision values computed after truncating the list after each of the relevant documents in turn: AveP = (sum over r = 1 to N of P(r) × rel(r)) / (number of relevant documents), where:
r indicates the rank,
N indicates the number of documents retrieved,
rel() indicates a binary function on the relevance of the document at a given rank, and
P() indicates precision at a given cut-off rank.
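A minimal sketch of average precision using the quantities just defined follows. It assumes the sum of P(r) × rel(r) over the ranks is divided by the total number of relevant documents, which is the usual normalization; the function name and the sample rankings are hypothetical.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average of P(r) over the ranks r at which a relevant document appears."""
    if not relevant:
        return 0.0  # convention chosen for this sketch
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:    # rel(r) = 1
            hits += 1
            total += hits / r  # P(r): precision of the list cut off at rank r
    # Assumed normalization: divide by the number of relevant documents.
    return total / len(relevant)


relevant = {"d1", "d2", "d3"}
late = average_precision(["d1", "d4", "d2", "d5"], relevant)
early = average_precision(["d1", "d2", "d4", "d5"], relevant)
print(round(late, 4), round(early, 4))  # 0.5556 0.6667
```

Both rankings retrieve the same two relevant documents, but the one that places them earlier receives the higher score.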