python - Checking the language of extracted trends from Twitter
I am extracting the top hashtags from Twitter using the tweepy module in Python. There is one major problem I face: I wish to check whether a tag is in English or not. Tags that are not in English should be removed.
Example:
tags = ['askorange', 'charlestonshooting', 'replytoasong', 'uberlive', 'otecmatkasyn'] should not contain 'otecmatkasyn'.
What you need is a language detector API. One is offered by Google, but it is not free. Another option is the Language Detection API.
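Whichever detector you pick, the filtering step itself is small. Here is a minimal sketch, assuming the detector can be wrapped in a function that takes a string and returns a language code such as 'en'. The names keep_english and fake_detect are hypothetical, and fake_detect is only a stand-in so the example runs without a network call; it is not any real API.

```python
from typing import Callable, Iterable, List


def keep_english(tags: Iterable[str], detect: Callable[[str], str]) -> List[str]:
    """Return only the tags for which detect() reports English ('en')."""
    return [tag for tag in tags if detect(tag) == "en"]


def fake_detect(text: str) -> str:
    # Hypothetical stand-in for a real detector call (assumption, not a real API).
    # It flags one hard-coded tag as non-English and calls everything else English.
    return "other" if text == "otecmatkasyn" else "en"


tags = ["askorange", "charlestonshooting", "replytoasong", "uberlive", "otecmatkasyn"]
english_tags = keep_english(tags, fake_detect)
print(english_tags)
```

In real use you would replace fake_detect with a wrapper around the API you chose, returning whatever language code that service reports.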
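After you choose the API that works best for you, you'll need to parse the text so that it makes sense as a sentence. For example, the tag 'askorange' must be split to read 'ask orange'. You can iterate over each character of the string, check whether it is uppercase, and insert a space there. Note that this relies on the tag keeping its original capitalization (for example 'AskOrange'); a tag that is entirely lowercase will come out unchanged: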
    new_tags = []
    for tag in tags:
        new_word = tag
        uppercases = 0  # spaces inserted so far, in case the tag has several uppercase letters
        for i in range(1, len(tag)):
            if tag[i].isupper():
                # shift the insert position by the number of spaces already added
                new_word = new_word[:i + uppercases] + ' ' + new_word[i + uppercases:]
                uppercases += 1
        new_tags.append(new_word)

Finally, send the list of new_tags to the API to detect the language.
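As an alternative to the index bookkeeping, the same split can be sketched with a regular expression. This is only one possible way to do it, and split_camel_case is a hypothetical helper name. Like the loop, it assumes the tags still carry their original capitalization.

```python
import re


def split_camel_case(tag: str) -> str:
    """Insert a space before every uppercase ASCII letter except the first character."""
    # (?<!^) skips the start of the string; (?=[A-Z]) matches the position
    # just before an uppercase letter without consuming it.
    return re.sub(r"(?<!^)(?=[A-Z])", " ", tag)


new_tags = [split_camel_case(tag) for tag in ["AskOrange", "ReplyToASong", "askorange"]]
print(new_tags)
```

An all-lowercase tag such as 'askorange' passes through untouched, so splitting those would need a dictionary-based word segmenter instead.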