Python encoding issue when trying to parse JSON tweets
I am trying to parse out the tweet and username sections of the JSON object returned from Twitter using the following code:
class listener(StreamListener):

    def on_data(self, data):
        # Each message from the stream arrives as a JSON string
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        # c is an open database cursor created elsewhere in the script
        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s,%s,%s)",
                  (time.time(), username, tweet))
        print(username, tweet)
        return True

    def on_error(self, status):
        print(status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterstream = Stream(auth, listener())
twitterstream.filter(track=["lebron james"])
But I am getting the following error. How can the code be adjusted to decode or encode the response properly?
Traceback (most recent call last):
  File "c:/users/sagars/pycharmprojects/youtube nlp lessons/twitter stream db.py", line 45, in <module>
    twitterstream.filter(track = ["lebron james"])
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
    self._start(async)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
    self._run()
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
    raise exception
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
    self._read_loop(resp)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
    self._data(next_status_obj)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
    if self.listener.on_data(data) is False:
  File "c:/users/sagars/pycharmprojects/youtube nlp lessons/twitter stream db.py", line 36, in on_data
    print (username, tweet)
  File "c:\python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>
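Unfortunately, the problem is that the tweet text contains characters (emoji, for example) that the Windows console encoding, cp1252, cannot represent. json.loads decodes the response from Twitter into Unicode strings without trouble; it is print that raises the charmap error when it tries to write them to the console. To work around that, you'll need to encode the strings yourself.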
tweet = all_data["text"].encode('utf-8')
username = all_data["user"]["screen_name"].encode('utf-8')
This will cause the loss of emoji and special characters that show up in the tweet: they will be converted to escape sequences such as \x899. If you need that information (I discard it myself) for sentiment analysis, you'll need to install a package with a pre-compiled list to convert them accordingly.
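If you would rather keep the original strings intact (for the database insert, say) and only make them safe at print time, something along these lines may work. This is a minimal sketch, not part of the original answer: console_safe is a hypothetical helper name, the sample username and tweet are made up, and cp1252 is assumed as the console encoding because that is what the traceback shows.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with every character the target encoding cannot
    represent replaced by a backslash escape, so printing will not raise."""
    # Fall back to the console's own encoding, then to UTF-8.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # backslashreplace is one of Python's built-in codec error handlers;
    # it turns unencodable characters into \xNN, \uNNNN or \UNNNNNNNN.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Hypothetical sample values standing in for all_data["user"]["screen_name"]
# and all_data["text"] from the question.
username = "some_user"
tweet = "lebron james \U0001F525"

# The original strings are untouched; only the printed copies are escaped.
print(console_safe(username, "cp1252"), console_safe(tweet, "cp1252"))
```

With this approach the emoji are still present in tweet itself, so they remain available later if you decide you do need them for sentiment analysis.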