Write a Python program that extracts 1000 unique links from Twitter using Tweepy. How can I filter out all links with Twitter domains and shortened links?
Here is a Python program that collects 1000 unique links from tweets using Tweepy. Filtering out links with Twitter domains and shortened links is discussed after the code.
Extraction.py
# Import json and the Tweepy classes used below
# (Tweepy 3.x; StreamListener was removed in Tweepy 4.0)
import json
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

# Credentials for the Twitter API, found in your Twitter developer account
main_token = "YOUR_TOKEN"            # access token
main_token_secret = "SECRET_TOKEN"   # access token secret
key = "CONSUMER_KEY"
key_secret = "CONSUMER_SECRET_KEY"

# Stop after collecting this many unique links
total = 1000

class StdOutListener(StreamListener):
    """Prints each new link found in the incoming tweets."""

    def __init__(self):
        super(StdOutListener, self).__init__()
        self.seen = set()  # links printed so far, used to skip duplicates

    def on_data(self, data):
        # Each message arrives as a JSON string
        decoded = json.loads(data)
        # Links are listed under entities -> urls; some messages have none
        for url in decoded.get("entities", {}).get("urls", []):
            link = url.get("expanded_url")
            if link and link not in self.seen:
                self.seen.add(link)
                print("%d : %s" % (len(self.seen), link))
                if len(self.seen) >= total:
                    return False  # returning False disconnects the stream
        return True

    def on_error(self, status):
        # Print the HTTP status code when the connection fails
        print(status)

if __name__ == '__main__':
    # Connect to Twitter using your own account values
    l = StdOutListener()
    auth = OAuthHandler(key, key_secret)
    auth.set_access_token(main_token, main_token_secret)
    stream = Stream(auth, l)
    # Receive only tweets that match the keyword 'Twitter'
    stream.filter(track=['Twitter'])
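Filtering out links with Twitter domains and shortened links cannot be done through stream.filter. Its track argument only selects which tweets are received by keyword, and a call such as stream.filter(track=['Twitter'],[shortened extensions]) is not valid Python, because a positional argument cannot follow a keyword argument.
Instead, check the host name of each link inside on_data and skip the link when the host is a Twitter domain or a known URL shortener.
Note: you need to list the shortener domains you want filtered yourself; there is no built-in list.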
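A minimal sketch of the domain check, assuming Python 3. The helper name is_wanted_link and both domain sets are illustrative assumptions, not part of Tweepy, and the lists are deliberately incomplete, so extend them to suit your data.

```python
from urllib.parse import urlparse

# Assumed, non-exhaustive domain lists; extend them as needed
TWITTER_DOMAINS = {"twitter.com", "t.co", "twimg.com"}
SHORTENER_DOMAINS = {"bit.ly", "goo.gl", "tinyurl.com", "ow.ly", "buff.ly"}


def is_wanted_link(link, blocked=TWITTER_DOMAINS | SHORTENER_DOMAINS):
    """Return True if the link's host is neither a blocked domain nor a subdomain of one."""
    host = (urlparse(link).hostname or "").lower()
    if not host:
        return False  # no usable host name, so the link is rejected
    # Match the domain itself or any subdomain, e.g. mobile.twitter.com
    return not any(host == d or host.endswith("." + d) for d in blocked)
```

In the listener, a link would then be counted and printed only when is_wanted_link(link) returns True, so rejected links do not count toward the 1000.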