Text Analytics
July 23, 2014 Leave a comment
I have a table docquery with columns docid,term,count.
This represents how many times a term occurs in a document denoted by docid. I pick a set of search terms from the same table and find out which of these terms occur the most number of times in other documents.
I am not excluding the document(9_txt) in which the search term occurs.
I am sure this query can be simplified.
Sample data
1_txt|industrys|1 2_txt|follow|1 3_txt|yes|1 4_txt|deferring|1 5_txt|wilf|1 6_txt|future|2 7_txt|reason|1 8_txt|dropped|2 9_txt|doesnt|2 9_txt|yes|2
select f1d,sum(s) sim FROM (select f1d,f1t,sum(f1c) s FROM (select f1.docid f1d,f1.term f1t,sum(f1.count) f1c FROM docquery f1 WHERE f1.term IN (select term from docquery WHERE docquery.docid = '9_txt') GROUP BY f1.docid,f1.term) GROUP BY f1d ORDER BY s) GROUP BY f1d ORDER BY sim;