Tweet downloading class. The TweetDownloader class contains the main downloading function as well as the storing and plotting functions accessible to the user.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| credentials | str | A path pointing to the location of the TwitterAPI credentials file. | required |
| name | str, optional | The name to use when saving downloaded files and exports. Defaults to 'Project_[date]', where [date] is the current date and time in %m%d%Y_%H%M%S format. | None |
| output_folder | str, optional | Path to the folder in which saved information is stored. Defaults to the current location. | '' |
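The documented defaults for `name` and `output_folder` can be reproduced with the standard library. This is a minimal sketch only: `default_project_name` and `output_path` are hypothetical helpers, not methods of `TweetDownloader`, and the way the class actually joins the folder and file name is an assumption.

```python
import os
from datetime import datetime
from typing import Optional


def default_project_name(now: Optional[datetime] = None) -> str:
    """Hypothetical helper: build 'Project_[date]' using the documented %m%d%Y_%H%M%S format."""
    now = now or datetime.now()
    return "Project_" + now.strftime("%m%d%Y_%H%M%S")


def output_path(output_folder: str, filename: str) -> str:
    """Hypothetical helper: an empty output_folder ('' by default) keeps the path relative to the current directory."""
    return os.path.join(output_folder, filename)


name = default_project_name(datetime(2023, 3, 5, 14, 7, 9))
print(name)                               # Project_03052023_140709
print(output_path("", name + ".csv"))     # Project_03052023_140709.csv
```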
Attributes:
| Name | Type | Description |
|---|---|---|
| credentials | str | A path pointing to the location of the TwitterAPI credentials file. |
| name | str, optional | The name to use when saving downloaded files and exports. |
| output_folder | str, optional | Path to the folder in which saved information is stored. |
| tweets | list | List of pages of the tweet objects returned by Twitter API calls. |
| authors | list | List of pages of the author objects returned by Twitter API calls. |
| places | list | List of pages of the location objects returned by Twitter API calls. |
| replies | list | List of tweets that are replies to the tweets in the `tweets` attribute. |
| tweets_df | pandas.DataFrame | Table with the tweets from the `tweets` attribute. |
| authors_df | pandas.DataFrame | Table with the authors from the `authors` attribute. |
| places_df | pandas.DataFrame | Table with the georeferenced locations from the `places` attribute. |
| replies_df | pandas.DataFrame | Table containing the replies to the tweets in `tweets_df`. |
| search_args | dict | Dictionary containing the Twitter keys required to access the API. |
| timestamp | str | A string appended to the end of saved file names so that they all carry a timestamp. |
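The attributes above keep the raw API results as lists of pages and the flattened results as tables. The sketch below illustrates that step under stated assumptions: each page is taken to be a plain list of dicts (in the raw Twitter API v2 response, expanded authors and places arrive under `includes.users` and `includes.places`), and `flatten_pages` and `attach_authors` are hypothetical helpers, not part of the class.

```python
from itertools import chain
from typing import Any, Dict, List

Page = List[Dict[str, Any]]  # assumed shape: one page is a list of API objects (dicts)


def flatten_pages(pages: List[Page]) -> List[Dict[str, Any]]:
    """Concatenate the per-page lists into a single list of rows."""
    return list(chain.from_iterable(pages))


def attach_authors(tweet_pages: List[Page], author_pages: List[Page]) -> List[Dict[str, Any]]:
    """Add each author's username to the tweet rows by matching tweet['author_id'] to author['id']."""
    authors = {author["id"]: author for author in flatten_pages(author_pages)}
    rows = []
    for tweet in flatten_pages(tweet_pages):
        author = authors.get(tweet.get("author_id"), {})
        rows.append({**tweet, "username": author.get("username")})
    return rows


tweet_pages = [
    [{"id": "1", "author_id": "a", "text": "first page"}],
    [{"id": "2", "author_id": "b", "text": "second page"}],
]
author_pages = [[{"id": "a", "username": "alice"}, {"id": "b", "username": "bob"}]]
rows = attach_authors(tweet_pages, author_pages)
print([row["username"] for row in rows])  # ['alice', 'bob']
```

`pandas.DataFrame(rows)` would then turn the list of dicts into a table comparable to `tweets_df`.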
get_tweets(query, start_time=None, end_time=None, lang=None, include_retweets=False, place=None, has_geo=True, max_tweets=10, max_page=500, save_temp=True, save_final=True, save_replies=False, include_replies=False, max_replies=10, temp_replies=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Words to search for in tweets. Twitter API query operators are supported. | required |
| start_time | str | Lower bound of the time frame in which tweets are searched, in date-time format (default is the current date and time minus 24 hours) | None |
| end_time | str | Upper bound of the time frame in which tweets are searched, in date-time format (default is the current date and time) | None |
| lang | str, optional | Two-letter code of the language that retrieved tweets must be written in | None |
| include_retweets | bool | Whether to include tweets that are just retweets of a previous tweet (default is False) | False |
| place | str, optional | Two-letter code of the country or place to which the search is constrained | None |
| has_geo | bool, optional | Whether to only include tweets with a geographic reference (default is True) | True |
| max_tweets | int | The maximum number of tweets to retrieve in total (default is 10) | 10 |
| max_page | int | The maximum number of tweets per page of results (default is 500) | 500 |
| save_temp | bool | Whether to save current progress (default is True) | True |
| save_final | bool | Whether to save the final tweets dataframe after the download is over (default is True) | True |
| save_replies | bool | Whether to include the replies to the downloaded tweets (default is False) | False |
| max_replies | int | Maximum number of replies per tweet if replies are enabled (default is 10) | 10 |
| temp_replies | bool | Whether to save progress while downloading replies if replies are enabled (default is True) | True |
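The filtering parameters of `get_tweets` correspond naturally to Twitter API v2 search operators (`lang:`, `is:retweet`, `place_country:`, `has:geo`), and API v2 expects `start_time`/`end_time` as UTC timestamps of the form `YYYY-MM-DDTHH:mm:ssZ`. The sketch below shows one plausible mapping and the documented 24-hour default window; `build_query` and `default_time_window` are hypothetical helpers, and whether the class composes its query exactly this way is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Timestamp layout accepted by Twitter API v2 for start_time/end_time (ISO 8601, UTC).
API_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_query(query: str, lang: Optional[str] = None, include_retweets: bool = False,
                place: Optional[str] = None, has_geo: bool = True) -> str:
    """Hypothetical mapping of get_tweets arguments onto Twitter API v2 query operators."""
    parts = [f"({query})"]
    if lang:
        parts.append(f"lang:{lang}")
    if not include_retweets:
        parts.append("-is:retweet")          # exclude plain retweets
    if place:
        parts.append(f"place_country:{place}")
    if has_geo:
        parts.append("has:geo")              # only tweets carrying geo data
    return " ".join(parts)


def default_time_window(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (start_time, end_time) covering the 24 hours before `now`, which must be in UTC."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=24)
    return start.strftime(API_TIME_FORMAT), now.strftime(API_TIME_FORMAT)


print(build_query("flood OR storm", lang="en", place="US"))
# (flood OR storm) lang:en -is:retweet place_country:US has:geo
print(default_time_window(datetime(2023, 3, 5, 12, 0, tzinfo=timezone.utc)))
# ('2023-03-04T12:00:00Z', '2023-03-05T12:00:00Z')
```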
get_replies(max_replies=10, save_temp=True, save_final=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_replies | int | Maximum number of replies for each tweet in the original tweets dataset (default is 10) | 10 |
| save_temp | bool | Whether to save progress at each page (default is True) | True |
| save_final | bool | Whether to save final replies dataset (default is True) | True |
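In Twitter API v2, replies share the `conversation_id` of the tweet that started the conversation, so one common way to collect them is a `conversation_id:<tweet id>` search per original tweet, followed by a per-tweet cap such as `max_replies`. This is a sketch under that assumption (it also assumes the downloaded tweets start their own conversations); `reply_queries` and `cap_replies` are hypothetical helpers and may differ from what `get_replies` does internally.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def reply_queries(tweet_ids: Iterable[str]) -> List[str]:
    """One search query per original tweet, using the conversation_id operator."""
    return [f"conversation_id:{tweet_id}" for tweet_id in tweet_ids]


def cap_replies(replies: Iterable[Dict[str, Any]], max_replies: int = 10) -> List[Dict[str, Any]]:
    """Keep at most `max_replies` replies per conversation, preserving input order."""
    kept: List[Dict[str, Any]] = []
    per_conversation: Dict[str, int] = defaultdict(int)
    for reply in replies:
        conversation = reply["conversation_id"]
        if per_conversation[conversation] < max_replies:
            kept.append(reply)
            per_conversation[conversation] += 1
    return kept


print(reply_queries(["111", "222"]))  # ['conversation_id:111', 'conversation_id:222']
replies = [{"id": str(i), "conversation_id": "111"} for i in range(5)]
print(len(cap_replies(replies, max_replies=3)))  # 3
```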
tweets_from_csv(path, sep=',', save_temp=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | The path to the CSV file containing the download parameters | required |
| sep | str, optional | The separator of the CSV file (default is ,) | ',' |
| save_temp | bool, optional | Whether to save progress at each downloaded page (default is True) | True |
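The expected layout of the parameters CSV is not documented here. The sketch below assumes one row per download, with column headers named after `get_tweets` parameters, and shows how such rows could be converted into keyword arguments; `read_download_params` and the column-type sets are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

# Assumed column types; the real CSV layout expected by tweets_from_csv may differ.
BOOL_COLUMNS = {"include_retweets", "has_geo", "save_temp", "save_final"}
INT_COLUMNS = {"max_tweets", "max_page"}


def read_download_params(handle: TextIO, sep: str = ",") -> List[Dict[str, Any]]:
    """Turn each CSV row into keyword arguments for one get_tweets call."""
    calls = []
    for row in csv.DictReader(handle, delimiter=sep):
        params: Dict[str, Any] = {}
        for key, value in row.items():
            if key is None or value is None or not value.strip():
                continue  # extra or empty cell: let get_tweets fall back to its default
            value = value.strip()
            if key in BOOL_COLUMNS:
                params[key] = value.lower() in {"true", "1", "yes"}
            elif key in INT_COLUMNS:
                params[key] = int(value)
            else:
                params[key] = value
        calls.append(params)
    return calls


sample = io.StringIO("query;lang;max_tweets;has_geo\nflood;en;100;True\nstorm;;50;False\n")
for params in read_download_params(sample, sep=";"):
    print(params)
# {'query': 'flood', 'lang': 'en', 'max_tweets': 100, 'has_geo': True}
# {'query': 'storm', 'max_tweets': 50, 'has_geo': False}
```

Each resulting dict could then be unpacked into a call such as `get_tweets(**params)`.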
tweets_to_gdf(geo_type='centroids')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| geo_type | str | The type of geometry (default is centroids) | 'centroids' |
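With `geo_type='centroids'`, each tweet needs a single point. A plausible rule, sketched below as an assumption rather than the class's documented behaviour, is to use the tweet's exact coordinates when Twitter API v2 provides them (`geo.coordinates`, a GeoJSON point in [longitude, latitude] order) and otherwise fall back to the centroid of the referenced place's `bbox` ([west, south, east, north]). `bbox_centroid` and `tweet_point` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def bbox_centroid(bbox: List[float]) -> Point:
    """Midpoint of a [west, south, east, north] box (ignores boxes crossing the antimeridian)."""
    west, south, east, north = bbox
    return ((west + east) / 2, (south + north) / 2)


def tweet_point(tweet: Dict[str, Any], places: Dict[str, Dict[str, Any]]) -> Optional[Point]:
    """Prefer the tweet's exact coordinates; otherwise use the centroid of its tagged place."""
    geo = tweet.get("geo") or {}
    exact = (geo.get("coordinates") or {}).get("coordinates")
    if exact:
        return (exact[0], exact[1])
    place = places.get(geo.get("place_id"))
    if place:
        return bbox_centroid(place["geo"]["bbox"])
    return None  # tweet carries no usable location


places = {"p1": {"id": "p1", "geo": {"bbox": [-74.0, 40.0, -73.0, 41.0]}}}
print(tweet_point({"geo": {"place_id": "p1"}}, places))  # (-73.5, 40.5)
```

The resulting points could then become `shapely.geometry.Point(lon, lat)` objects passed to `geopandas.GeoDataFrame(..., geometry=..., crs="EPSG:4326")`.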
places_to_gdf(geo_type='centroids')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| geo_type | str | The type of geometry (default is centroids) | 'centroids' |
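For places, a centroid geometry can be derived from the `bbox` that Twitter API v2 returns for each place. The sketch below builds GeoJSON Point features from those centroids; `places_to_features` is a hypothetical helper, and treating the bbox midpoint as the "centroid" is an assumption about what `geo_type='centroids'` means here.

```python
from typing import Any, Dict, List


def places_to_features(places: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build GeoJSON Point features at the centre of each place's [west, south, east, north] bbox."""
    features = []
    for place in places:
        west, south, east, north = place["geo"]["bbox"]
        centroid = [(west + east) / 2, (south + north) / 2]  # GeoJSON order: [lon, lat]
        properties = {key: value for key, value in place.items() if key != "geo"}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": centroid},
            "properties": properties,
        })
    return features


sample_places = [{"id": "p1", "full_name": "Sample place", "geo": {"bbox": [-74.0, 40.0, -73.0, 41.0]}}]
print(places_to_features(sample_places)[0]["geometry"])  # {'type': 'Point', 'coordinates': [-73.5, 40.5]}
```

A list of features in this shape can be read with `geopandas.GeoDataFrame.from_features(features, crs="EPSG:4326")`.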
preview_tweet_locations()
interactive_map()
plot_heatmap(radius=20)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| radius | int | The radius of the heatmap plot (default is 20) | 20 |
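The documentation does not name the plotting library behind `plot_heatmap`. If it is a Leaflet-based tool such as `folium` (an assumption), one detail matters before `radius` is applied: the Twitter API and GeoJSON store points as [longitude, latitude], while Leaflet heatmaps expect [latitude, longitude]. `heatmap_points` below is a hypothetical helper that performs the swap.

```python
from typing import Iterable, List, Tuple


def heatmap_points(lon_lat: Iterable[Tuple[float, float]]) -> List[List[float]]:
    """Swap (lon, lat) pairs into the [lat, lon] order that Leaflet-based heatmaps expect."""
    return [[lat, lon] for lon, lat in lon_lat]


points = heatmap_points([(-73.5, 40.5), (2.35, 48.85)])
print(points)  # [[40.5, -73.5], [48.85, 2.35]]
```

With `folium` installed, `folium.plugins.HeatMap(points, radius=20)` would draw these points, with `radius` controlling how far each point spreads on the map.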
map_animation(time_unit)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time_unit | str | Time unit to aggregate by (default is 'day') | 'second' |
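Aggregating by `time_unit` amounts to truncating each tweet's timestamp to the chosen unit and counting tweets per bucket. The sketch below assumes Twitter API v2 `created_at` strings such as `'2023-03-05T14:07:09.000Z'`; `count_by_time_unit` is a hypothetical helper, and the set of supported units (beyond 'second' and 'day', which the table mentions) is an assumption.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Fields reset to zero for each unit; the accepted units are an assumption.
_TRUNCATE = {
    "second": {"microsecond": 0},
    "minute": {"second": 0, "microsecond": 0},
    "hour": {"minute": 0, "second": 0, "microsecond": 0},
    "day": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}


def count_by_time_unit(created_at: Iterable[str], time_unit: str) -> Dict[datetime, int]:
    """Count tweets per time bucket, returning buckets in chronological order."""
    if time_unit not in _TRUNCATE:
        raise ValueError(f"unsupported time_unit: {time_unit!r}")
    counts: Counter = Counter()
    for stamp in created_at:
        moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        counts[moment.replace(**_TRUNCATE[time_unit])] += 1
    return dict(sorted(counts.items()))


stamps = ["2023-03-05T14:07:09.000Z", "2023-03-05T18:30:00.000Z", "2023-03-06T01:00:00.000Z"]
print(count_by_time_unit(stamps, "day"))
# {datetime.datetime(2023, 3, 5, 0, 0): 2, datetime.datetime(2023, 3, 6, 0, 0): 1}
```

Each bucket could then become one frame of the animation.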
wordcloud(custom_stopwords=None, background_color='black', min_word_length=4, save_wordcloud=True, bar_plot=False, save_bar_plot=False)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| custom_stopwords | list | List of words to exclude from the word cloud | None |
| background_color | str | Background color of the word cloud plot | 'black' |
| min_word_length | int | Minimum length of strings to be considered for the word cloud (default is 4) | 4 |
| save_wordcloud | bool | Whether to save the plot (default is True) | True |
| bar_plot | bool | Whether to display a bar plot with word frequencies (default is False) | False |
| save_bar_plot | bool | Whether to save the bar plot (default is False) | False |
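Both the word cloud and the optional bar plot rest on a word-frequency count filtered by `custom_stopwords` and `min_word_length`. The sketch below shows one way to compute it; `word_frequencies` is a hypothetical helper, and stripping URLs and keeping only alphabetic runs are assumptions about how the class tokenizes tweet text.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def word_frequencies(texts: Iterable[str], custom_stopwords: Optional[Iterable[str]] = None,
                     min_word_length: int = 4) -> Counter:
    """Count lower-cased words, skipping URLs, stopwords and words shorter than min_word_length."""
    stopwords = {word.lower() for word in (custom_stopwords or [])}
    counts: Counter = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text)             # drop links such as t.co URLs
        for word in re.findall(r"[^\W\d_]+", text.lower()):   # runs of letters only
            if len(word) >= min_word_length and word not in stopwords:
                counts[word] += 1
    return counts


counts = word_frequencies(
    ["Flooding in the city, flooding everywhere!", "City roads closed"],
    custom_stopwords=["everywhere"],
)
print(counts.most_common(2))  # [('flooding', 2), ('city', 2)]
```

`counts.most_common(n)` gives the data for a frequency bar plot, and the `wordcloud` package's `WordCloud(background_color="black").generate_from_frequencies(counts)` can render the cloud itself, if that is the library in use.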