Text Analytics

I have a table docquery with columns docid,term,count.

This represents how many times a term occurs in a document denoted by docid. I pick a set of search terms from the same table and find out which of these terms occur the most number of times in other documents.

I am not excluding the document(9_txt) in which the search term occurs.

I am sure this query can be simplified.

Sample data

1_txt|industrys|1
2_txt|follow|1
3_txt|yes|1
4_txt|deferring|1
5_txt|wilf|1
6_txt|future|2
7_txt|reason|1
8_txt|dropped|2
9_txt|doesnt|2
9_txt|yes|2
select f1d,sum(s) sim FROM
(select f1d,f1t,sum(f1c) s FROM
(select f1.docid f1d,f1.term f1t,sum(f1.count) f1c
        FROM docquery f1
                WHERE f1.term IN (select term from docquery
                                        WHERE docquery.docid = '9_txt')
        GROUP BY f1.docid,f1.term)
GROUP BY f1d
ORDER BY s)
        GROUP BY f1d
        ORDER BY sim;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: