Text Analytics
July 23, 2014 Leave a comment
I have a table docquery with columns docid,term,count.
This represents how many times a term occurs in a document denoted by docid. I pick a set of search terms from the same table and find out which of these terms occur the most number of times in other documents.
I am not excluding the document(9_txt) in which the search term occurs.
I am sure this query can be simplified.
Sample data
1_txt|industrys|1 2_txt|follow|1 3_txt|yes|1 4_txt|deferring|1 5_txt|wilf|1 6_txt|future|2 7_txt|reason|1 8_txt|dropped|2 9_txt|doesnt|2 9_txt|yes|2
select f1d,sum(s) sim FROM
(select f1d,f1t,sum(f1c) s FROM
(select f1.docid f1d,f1.term f1t,sum(f1.count) f1c
FROM docquery f1
WHERE f1.term IN (select term from docquery
WHERE docquery.docid = '9_txt')
GROUP BY f1.docid,f1.term)
GROUP BY f1d
ORDER BY s)
GROUP BY f1d
ORDER BY sim;