Python encoding issue when trying to parse JSON tweets


I'm trying to parse out the tweet and username sections of the JSON object returned by Twitter, using the following code:

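import json
import time

from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

# ckey, csecret, atoken, asecret and the database cursor c are set up elsewhere (not shown)

class listener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        username = all_data["user"]["screen_name"]
        c.execute("INSERT INTO tweets (tweet_time, username, tweet) VALUES (%s, %s, %s)",
                  (time.time(), username, tweet))
        # printing to the Windows console is the call that fails in the traceback
        print(username, tweet)
        return True

    def on_error(self, status):
        print(status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterstream = Stream(auth, listener())
twitterstream.filter(track=["lebron james"])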

But I get the following error. How can the code be adjusted to decode or encode the response properly?

Traceback (most recent call last):
  File "c:/users/sagars/pycharmprojects/youtube nlp lessons/twitter stream db.py", line 45, in <module>
    twitterstream.filter(track = ["lebron james"])
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 428, in filter
    self._start(async)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
    self._run()
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
    raise exception
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
    self._read_loop(resp)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 309, in _read_loop
    self._data(next_status_obj)
  File "c:\python34\lib\site-packages\tweepy\streaming.py", line 289, in _data
    if self.listener.on_data(data) is False:
  File "c:/users/sagars/pycharmprojects/youtube nlp lessons/twitter stream db.py", line 36, in on_data
    print (username, tweet)
  File "c:\python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>
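The same error can be reproduced without Twitter or tweepy. The sketch below assumes cp1252 is the console codec (it is the one named in the traceback) and uses a made-up sample string; find_unencodable is a hypothetical helper name, not part of any library.

```python
# Minimal reproduction of the console error, independent of Twitter and tweepy.

def find_unencodable(text, encoding="cp1252"):
    """Return (start, end) of the first run of characters that `encoding`
    cannot represent, or None if the whole string encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # The same exception type print() raised inside on_data.
        return exc.start, exc.end
    return None

sample = "\u2728 LeBron \U0001F3C0"  # sparkles, plain text, basketball emoji
print(find_unencodable(sample))           # (0, 1): cp1252 has no sparkles character
print(find_unencodable(sample, "utf-8"))  # None: UTF-8 can encode any str
```

The tweet data itself is a perfectly valid Python str; the failure only happens when that str is pushed through a codec with a limited character set.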

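Unfortunately, the tweets contain characters (emoji, for example) that your console's encoding, cp1252 according to the traceback, cannot represent, which causes the charmap error. To fix that, you'll need to encode them: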

tweet = all_data["text"].encode('utf-8')
username = all_data["user"]["screen_name"].encode('utf-8')
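To see what those encode('utf-8') calls actually produce, here is a small sketch on a made-up tweet string: the result is a bytes object, so print shows it as b'...' with the emoji spelled out as UTF-8 byte escapes, and decode('utf-8') gives back the original text.

```python
# What .encode('utf-8') turns a tweet into (the sample text is made up).
tweet = "Go LeBron \U0001F3C0"

encoded = tweet.encode("utf-8")
print(type(encoded).__name__)  # bytes
print(encoded)                 # b'Go LeBron \xf0\x9f\x8f\x80'

# The bytes repr is pure ASCII, so a cp1252 console can print it;
# decoding restores the original text.
assert encoded.decode("utf-8") == tweet
```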

This means emoji and other special characters in the tweet no longer show up as such; they are converted to byte escape sequences like \x89. If you need that information for sentiment analysis (I discard it myself), you'll need to install a package with a pre-compiled list of emoji to convert them accordingly.
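As a rough, stdlib-only stand-in for such a package, the sketch below uses Python's built-in unicodedata name table to turn each character the console cannot display into a readable label. This is an assumption-laden substitute: a dedicated emoji package may use different labels. The tweet stays a str here, so this would replace the .encode('utf-8') calls rather than follow them; describe_unencodable is a hypothetical helper name.

```python
import unicodedata

def describe_unencodable(text, encoding="cp1252"):
    """Replace each character `encoding` cannot represent with its Unicode
    name wrapped in colons; all other characters are kept unchanged."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # unicodedata.name returns the default (None) for unnamed code points.
            name = unicodedata.name(ch, None)
            out.append(":{}:".format(name.lower()) if name else "U+{:04X}".format(ord(ch)))
    return "".join(out)

tweet = "Go LeBron \U0001F3C0"
print(describe_unencodable(tweet))  # Go LeBron :basketball and hoop:
```

Because every character left in the output is encodable in cp1252, printing the result no longer raises, and the label text remains available for sentiment analysis.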

