python - Checking the language of extracted trends from Twitter
I am extracting the top hashtags from Twitter using the tweepy module in Python. There is one major problem I face: I wish to check whether a tag is in English or not. Tags that are not in English should be removed.
Example:

    tags = ['askorange', 'charlestonshooting', 'replytoasong', 'uberlive', 'otecmatkasyn']

The result should not contain 'otecmatkasyn'.
What you need is a language detection API. One is offered by Google, but it is not free. Another option is the Language Detection API.
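After you choose the API that suits you best, you'd need to split the text so that it reads as a sentence. For example, the tag 'askorange' must be split to read 'ask orange'. If the tag still has its original capitalization (e.g. 'AskOrange'), you can iterate over each character of the string, check whether it is uppercase, and insert a space there: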
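    new_tags = []
    for tag in tags:
        new_word = tag
        uppercases = 0  # spaces inserted so far, in case the tag has several uppercase letters
        # start at 1 so no space is inserted before the first character
        for i in range(1, len(tag)):
            if tag[i].isupper():
                # shift the split position by the number of spaces already added
                new_word = new_word[:i + uppercases] + ' ' + new_word[i + uppercases:]
                uppercases = uppercases + 1
        new_tags.append(new_word)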
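The same splitting can also be done with a regular expression. The following is only a sketch, not part of the original answer: the function name split_camel_case is made up for illustration, and it assumes the tags still carry their capitalization. Unlike the character loop, it keeps a run of capitals such as 'LIVE' together, and it leaves an all-lowercase tag unchanged.

```python
import re

# Split points: between a lowercase letter or digit and an uppercase letter,
# or between an uppercase letter and an uppercase letter that starts a new word.
_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

def split_camel_case(tag):
    """Insert a space at each detected word boundary (hypothetical helper)."""
    return _BOUNDARY.sub(' ', tag)

example_tags = ['AskOrange', 'ReplyToASong', 'UberLIVE', 'askorange']
spaced = [split_camel_case(t) for t in example_tags]
# spaced == ['Ask Orange', 'Reply To A Song', 'Uber LIVE', 'askorange']
```

A tag that is entirely lowercase gives no hint about where the words begin, so it would have to be sent to the detector as it is or split with a dictionary-based approach.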
Finally, send the list of new_tags to the API to detect the language.
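The filtering step might look roughly like the sketch below. It does not call any real service: detect_language is a hypothetical callable that you would implement with whichever API you chose, assumed here to take a string and return a language code such as 'en'. The fake_detect function and its hard-coded answer are stand-ins so that the example runs on its own.

```python
def keep_english(tags, spaced_tags, detect_language):
    """Return the original tags whose spaced-out form is detected as English.

    detect_language is an assumed callable (text -> language code); it stands
    in for a call to a real language detection API.
    """
    english = []
    for tag, spaced in zip(tags, spaced_tags):
        if detect_language(spaced) == 'en':
            english.append(tag)
    return english

def fake_detect(text):
    # Stand-in for a real API call: hard-coded answer for the demo only.
    return 'other' if text == 'otec matka syn' else 'en'

tags = ['askorange', 'otecmatkasyn']
new_tags = ['ask orange', 'otec matka syn']
english_tags = keep_english(tags, new_tags, fake_detect)
# english_tags == ['askorange']
```

Passing the detector in as an argument keeps the filtering logic independent of the service, so switching APIs only means swapping that one function.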