https://github.com/sfu-db/apiconnectors
A curated list of example code to collect data from Web APIs using DataPrep.Connector.
- Host: GitHub
- URL: https://github.com/sfu-db/apiconnectors
- Owner: sfu-db
- Created: 2019-11-28T21:23:40.000Z (over 6 years ago)
- Default Branch: develop
- Last Pushed: 2023-03-25T00:46:21.000Z (about 3 years ago)
- Last Synced: 2025-03-24T09:46:59.908Z (about 1 year ago)
- Topics: configfile, connector, datacollection, dataconnector, dataprep, example, webapis, webdata
- Language: Python
- Homepage: https://github.com/sfu-db/dataprep#connector
- Size: 1.72 MB
- Stars: 34
- Watchers: 11
- Forks: 24
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
# Data Collection From Web APIs
A curated list of example code to collect data from Web APIs using DataPrep.Connector.
## How to Contribute?
You can contribute to this project in two ways. Please check the [contributing guide](CONTRIBUTING.md).
1. Add your example code on this page
2. Add a new configuration file to this repo
## Why Contribute?
* Your contribution will benefit [~100K DataPrep users](https://github.com/sfu-db/dataprep).
* Your contribution will be recognized on [Contributors](#contributors-).
## Index
* [Art](#art)
* [Business](#business)
* [Calendar](#calendar)
* [Crime](#crime)
* [Finance](#finance)
* [Geocoding](#geocoding)
* [Jobs](#jobs)
* [Lifestyle](#lifestyle)
* [Music](#music)
* [Networking](#networking)
* [News](#news)
* [Science](#science)
* [Shopping](#shopping)
* [Social](#social)
* [Sports](#sports)
* [Travel](#travel)
* [Video](#video)
* [Weather](#weather)
### Art
#### [Harvard Art Museum](./api-connectors/harvardartmuseum) -- Collect Museums' Collection Data
Find objects that have "dog" in their titles and were made in 1990.
```python
from dataprep.connector import connect
# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('object', title='dog', yearmade=1990)
df[['title', 'division', 'classification', 'technique', 'department', 'century', 'dated']]
```
| | title | division | classification | technique | department | century | dated |
| --- | --------------------------- | --------------------------- | -------------- | -------------------- | ------------------------- | ------------ | ----- |
| 0 | Paris (black dog on street) | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990s |
| 1 | Pregnant Woman with Dog | Modern and Contemporary Art | Photographs | Gelatin silver print | Department of Photographs | 20th century | 1990 |
| 2 | Pompeii Dog | Modern and Contemporary Art | Prints | Drypoint | Department of Prints | 20th century | 1990 |
Find 10 people that are Dutch.
```python
from dataprep.connector import connect
# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('person', q='culture:Dutch', size=10)
df[['display name', 'gender', 'culture', 'display date', 'object count', 'birth place', 'death place']]
```
| | display name | gender | culture | display date | object count | birth place | death place |
| --- | ------------------------------- | ------- | ------- | -------------- | ------------ | -------------------------------- | ---------------------- |
| 0 | Joris Abrahamsz. van der Haagen | unknown | Dutch | 1615 - 1669 | 7 | Arnhem or Dordrecht, Netherlands | The Hague, Netherlands |
| 1 | François Morellon de la Cave | unknown | Dutch | 1723 - 65 | 1 | None | None |
| 2 | Cornelis Vroom | unknown | Dutch | 1590/92 - 1661 | 3 | Haarlem(?), Netherlands | Haarlem, Netherlands |
| 3 | Constantijn Daniel van Renesse | unknown | Dutch | 1626 - 1680 | 2 | Maarssen | Eindhoven |
| 4 | Dirck Dalens, the Younger | unknown | Dutch | 1654 - 1688 | 3 | Amsterdam, Netherlands | Amsterdam, Netherlands |
Find all exhibitions that take place at a Harvard Art Museums venue after 2020-01-01.
```python
from dataprep.connector import connect
# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('exhibition', venue='HAM', after='2020-01-01')
df
```
| | title | begin date | end date | url |
| --- | ------------------------------------------------------- | ---------- | ---------- | -------------------------------------------------------- |
| 0 | Painting Edo: Japanese Art from the Feinberg Collection | 2020-02-14 | 2021-07-18 | https://www.harvardartmuseums.org/visit/exhibitions/5909 |
Find 5 records for publications that were published in 2013.
```python
from dataprep.connector import connect
# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('publication', q='publicationyear:2013', size=5)
df[['title','publication date','publication place','format']]
```
| | title | publication date | publication place | format |
| --- | ------------------------------------------------- | ---------------- | ----------------- | ------------------------ |
| 0 | 19th Century Paintings, Drawings & Watercolours | January 23, 2013 | London | Auction/Dealer Catalogue |
| 1 | "With Éclat" The Boston Athenæum and the Orig... | 2013 | Boston, MA | Book |
| 2 | "Review: Fragonard's Progress of Love at the F... | 2013 | London | Article/Essay |
| 3 | Alternative Narratives | February 2013 | None | Article/Essay |
| 4 | Victorian & British Impressionist Art | July 11, 2013 | London | Auction/Dealer Catalogue |
Find 5 galleries that are on floor (Level) 2 in the Harvard Art Museums building.
```python
from dataprep.connector import connect
# You can get ”api_key“ by following https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z8MpY2rvQ/viewform
dc = connect('harvardartmuseum', _auth={'access_token': api_key})
df = await dc.query('gallery', floor=2, size=5)
df[['id','name','theme','object count']]
```
| | id | name | theme | object count |
| --- | ---- | -------------------------------------------- | ------------------------------------------------- | ------------ |
| 0 | 2200 | European and American Art, 17th–19th century | The Emergence of Romanticism in Early Nineteen... | 20 |
| 1 | 2210 | West Arcade | None | 6 |
| 2 | 2340 | European and American Art, 17th–19th century | The Silver Cabinet: Art and Ritual, 1600–1850 | 73 |
| 3 | 2460 | East Arcade | None | 2 |
| 4 | 2700 | European and American Art, 19th century | Impressionism and the Late Nineteenth Century | 19 |
### Business
#### [Yelp](./api-connectors/yelp) -- Collect Local Business Data
What's the phone number of Capilano Suspension Bridge Park?
```python
from dataprep.connector import connect
# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)
df = await conn_yelp.query("businesses", term = "Capilano Suspension Bridge Park", location = "Vancouver", _count = 1)
df[["name","phone"]]
```
| id | name | phone |
| --- | ------------------------------- | --------------- |
| 0 | Capilano Suspension Bridge Park | +1 604-985-7474 |
Which yoga store has the highest review count in Vancouver?
```python
from dataprep.connector import connect
# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 1)
# Check all supported categories: https://www.yelp.ca/developers/documentation/v3/all_category_list
df = await conn_yelp.query("businesses", categories = "yoga", location = "Vancouver", sort_by = "review_count", _count = 1)
df[["name", "review_count"]]
```
| id | name | review_count |
| --- | ------------------- | ------------ |
| 0 | YYOGA Downtown Flow | 107 |
How many Starbucks stores are in Seattle, and where are they?
```python
from dataprep.connector import connect
# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)
df = await conn_yelp.query("businesses", term = "Starbucks", location = "Seattle", _count = 1000)
# Remove irrelevant data
df = df[(df['city'] == 'Seattle') & (df['name'] == 'Starbucks')]
df[['name', 'address1', 'city', 'state', 'country', 'zip_code']].reset_index(drop=True)
```
| id | name | address1 | city | state | country | zip_code |
| --- | --------- | ------------------------ | ------- | ----- | ------- | -------- |
| 0 | Starbucks | 515 Westlake Ave N | Seattle | WA | US | 98109 |
| 1 | Starbucks | 442 Terry Avenue N | Seattle | WA | US | 98109 |
| ... | ....... | ....... | ...... | .. | .. | .... |
| 126 | Starbucks | 17801 International Blvd | Seattle | WA | US | 98158 |
What are the ratings for a list of restaurants?
```python
from dataprep.connector import connect
import pandas as pd
import asyncio
# You can get ”yelp_access_token“ by following https://www.yelp.com/developers/documentation/v3/authentication
conn_yelp = connect("yelp", _auth={"access_token":yelp_access_token}, _concurrency = 5)
names = ["Miku", "Boulevard", "NOTCH 8", "Chambar", "VIJ’S", "Fable", "Kirin Restaurant", "Cafe Medina", \
"Ask for Luigi", "Savio Volpe", "Nicli Pizzeria", "Annalena", "Edible Canada", "Nuba", "The Acorn", \
"Lee's Donuts", "Le Crocodile", "Cioppinos", "Six Acres", "St. Lawrence", "Hokkaido Santouka Ramen"]
query_list = [conn_yelp.query("businesses", term=name, location = "Vancouver", _count=1) for name in names]
results = asyncio.gather(*query_list)
df = pd.concat(await results)
df[["name", "rating", "city"]].reset_index(drop=True)
```
| ID | Name | Rating | City |
| --- | ------------------------------ | ------ | --------- |
| 0 | Miku | 4.5 | Vancouver |
| 1 | Boulevard Kitchen & Oyster Bar | 4.0 | Vancouver |
| ... | ... | ... | ... |
| 20 | Hokkaido Ramen Santouka | 4.0 | Vancouver |
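
The concurrent pattern above relies on `asyncio.gather`, which runs the awaitables concurrently and returns their results in the order they were passed in, not in the order they finish. The stdlib-only sketch below illustrates that behaviour with a stand-in coroutine. `fake_query` and its delays are hypothetical and do not call Yelp or DataPrep.

```python
import asyncio

async def fake_query(name, delay):
    """Hypothetical stand-in for conn.query(...); sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return {"name": name}

async def fetch_all(names):
    # Later names get shorter delays, so they finish first.
    tasks = [fake_query(n, 0.03 - 0.01 * i) for i, n in enumerate(names)]
    # gather returns results in argument order, regardless of completion order.
    return await asyncio.gather(*tasks)

results = asyncio.run(fetch_all(["Miku", "Boulevard", "Chambar"]))
print([r["name"] for r in results])
```

In a notebook, where an event loop is already running, use `await fetch_all(...)` instead of `asyncio.run(...)`.
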
#### [Hunter](./api-connectors/hunter) -- Collect and Verify Professional Email Addresses
Who are executives of Asana and what are their emails?
```python
from dataprep.connector import connect
# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})
df = await conn_hunter.query('all_emails', domain='asana.com', _count=10)
df[df['department']=='executive']
```
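| | first_name | last_name | email | position | department |
| --- | ---------- | --------- | ------------------- | --------------------- | ---------- |
| 0 | Dustin | Moskovitz | dustin@asana.com | Cofounder | executive |
| 1 | Stephanie | Heß | shess@asana.com | CEO | executive |
| 2 | Erin | Cheng | erincheng@asana.com | Strategic Initiatives | executive |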
What is Dustin Moskovitz's email?
```python
from dataprep.connector import connect
# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})
df = await conn_hunter.query("individual_email", full_name='dustin moskovitz', domain='asana.com')
df
```
| | first_name | last_name | email | position |
| --- | ---------- | --------- | ---------------- | --------- |
| 0 | Dustin | Moskovitz | dustin@asana.com | Cofounder |
Are the emails of Asana executives valid?
```python
from dataprep.connector import connect
# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})
employees = await conn_hunter.query("all_emails", domain='asana.com', _count=10)
executives = employees.loc[employees['department']=='executive']
emails = executives[['email']]
for email in emails.iterrows():
    # iterrows() yields (index, row); email[1][0] is the address in that row
    status = await conn_hunter.query("email_verifier", email=email[1][0])
    # Note: this assignment is aligned on the DataFrame index, so rows whose
    # index does not match the query result stay NaN
    emails['status'] = status
emails
```
| | email | status |
| --- | ------------------- | ------ |
| 0 | dustin@asana.com | valid |
| 3 | shess@asana.com | NaN |
| 4 | erincheng@asana.com | NaN |
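
When looping over rows, collecting one result per row and attaching the whole list afterwards keeps every status aligned with its email. A minimal stdlib sketch of that pattern follows. `fake_verify` and the statuses it returns are made up for illustration and only stand in for the `email_verifier` query.

```python
import asyncio

async def fake_verify(email):
    """Hypothetical stand-in for the email_verifier query; returns a made-up status."""
    await asyncio.sleep(0)
    return "valid" if email.endswith("@example.com") else "unknown"

async def verify_all(emails):
    statuses = []
    for email in emails:
        # Append one status per email so positions stay aligned.
        statuses.append(await fake_verify(email))
    return list(zip(emails, statuses))

pairs = asyncio.run(verify_all(["a@example.com", "b@example.org"]))
print(pairs)
```

With a DataFrame, a list like `statuses` can be assigned as a column in one step, provided it has exactly one entry per row.
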
How many available requests do I have left?
```python
from dataprep.connector import connect
# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})
df = await conn_hunter.query("account")
df
```
| | requests available |
| --- | ------------------ |
| 0 | 19475 |
What are the counts of each level of seniority of Intercom employees?
```python
from dataprep.connector import connect
# You can get ”hunter_access_token“ by registering as a developer https://hunter.io/users/sign_up
conn_hunter = connect("hunter", _auth={"access_token": hunter_access_token})
df = await conn_hunter.query("email_count", domain='intercom.io')
df.drop('total', axis=1)
```
| | junior | senior | executive |
| --- | ------ | ------ | --------- |
| 0 | 0 | 2 | 2 |
### Calendar
#### [Holiday](./api-connectors/holiday) -- Collect Holiday, Workday Data
What are the supported countries, their country codes, and their supported languages?
```python
from dataprep.connector import connect
# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})
df = await dc.query("country")
df
```
| | code | name | languages |
| --- | ---- | -------------------- | --------- |
| 0 | AD | Andorra | ['ca' |
| 1 | AE | United Arab Emirates | ['ar'] |
| .. | .. | ... | ... |
| 249 | ZW | Zimbabwe | ['en'] |
What are the public holidays of Canada in 2020?
```python
from dataprep.connector import connect
# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})
df = await dc.query('holiday', country='CA', year=2020, public=True)
df
```
| | name | date | public | observed | weekday |
| --- | -------------- | ---------- | ------ | ---------- | --------- |
| 0 | New Year's Day | 2020-01-01 | True | 2020-01-01 | Wednesday |
| 1 | Good Friday | 2020-04-10 | True | 2020-04-10 | Friday |
| 2 | Victoria Day | 2020-05-18 | True | 2020-05-18 | Monday |
| 3 | Canada Day | 2020-07-01 | True | 2020-07-01 | Wednesday |
| 4 | Labor Day | 2020-09-07 | True | 2020-09-07 | Monday |
| 5 | Christmas Day | 2020-12-25 | True | 2020-12-25 | Friday |
Which day is the 100th workday starting from 2020-01-01, in Canada?
```python
from dataprep.connector import connect
# You can get ”holiday_key“ by following https://holidayapi.com/docs
dc = connect('holiday', _auth={'access_token': holiday_key})
df = await dc.query('workday', country='CA', start='2020-01-01', days=100)
df
```
| | date | weekday |
| --- | --------- | ------- |
| 0 | 2020-5-22 | Friday |
### Crime
#### [JailBase](./api-connectors/jailbase) -- Collect Prisoner Data
What is the URL for the mugshot of Almondo Smith?
```python
from dataprep.connector import connect
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})
df = await dc.query('search', source_id='wi-wcsd', last_name='smith', first_name='almondo')
df['mugshot'][0]
```
'https://imgstore.jailbase.com/small/arrested/wi-wcsd/2017-12-29/almondo-smith-679063bf90e389938d70b0b49caf7944.pic1.jpg'
Who are the 10 people most recently arrested by the Wood County Sheriff's Department?
```python
from dataprep.connector import connect
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})
sources = await dc.query('sources')
department = sources[sources['name']=='Wood County Sheriff\'s Dept']
df = await dc.query('recent', source_id=department['source_id'].values[0])
df
```
| | id | name | mugshot | charges | more_info_url |
| --- | -------- | ----------------- | ------------------------------------------------- | ------- | ------------------------------------------------- |
| 0 | 23917656 | Curtis Joseph | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdcurtis-josep... |
| 1 | 23917654 | Taner Summers | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdtaner-summer... |
| 2 | 23901411 | Maryann Randolph | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdmaryann-rand... |
| 3 | 23821284 | Antonia Cinodijay | https://imgstore.jailbase.com/widgets/NoMug.gif | [[]] | http://www.jailbase.com/en/wi-wcsdantonia-cino... |
| 4 | 23821280 | Deangelo Barker | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsddeangelo-bar... |
| 5 | 23811811 | Tekeisha Faucibus | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdtekeisha-fau... |
| 6 | 23811810 | Tariq Nunoke | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdtariq-nunoke... |
| 7 | 23811808 | Sarah Jusakaja | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdsarah-jusaka... |
| 8 | 23791805 | Angela Burch | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdangela-burch... |
| 9 | 23775367 | Suzanne Nicholson | https://imgstore.jailbase.com/small/arrested/w... | [[]] | http://www.jailbase.com/en/wi-wcsdsuzanne-nich... |
How many police offices are in each US state in the JailBase system?
```python
from dataprep.connector import connect
# You can get ”jailbase_access_token“ by registering as a developer https://rapidapi.com/JailBase/api/jailbase
dc = connect('jailbase', _auth={'access_token':jailbase_access_token})
df = await dc.query('sources')
state_counts = df['state'].value_counts()
state_counts
```
North Carolina 81
Kentucky 75
Missouri 73
Arkansas 70
Iowa 67
Texas 57
Virginia 47
Florida 46
Mississippi 44
Indiana 38
New York 37
South Carolina 35
Ohio 29
Colorado 27
Tennessee 26
Alabama 26
Idaho 23
New Mexico 18
California 18
Michigan 17
Georgia 17
Illinois 14
Washington 13
Wisconsin 11
Oregon 10
Nevada 9
Arizona 9
Louisiana 8
New Jersey 7
Oklahoma 6
Utah 5
Minnesota 5
Pennsylvania 4
Maryland 4
Kansas 3
North Dakota 3
South Dakota 2
Wyoming 2
Alaska 1
West Virginia 1
Nebraska 1
Montana 1
Connecticut 1
Name: state, dtype: int64
### Finance
#### [Finnhub](./api-connectors/finnhub) -- Collect Financial, Market, Economic Data
How do I get a list of cryptocurrencies and their exchanges?
```python
import pandas as pd
from dataprep.connector import connect
# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)
df = await conn_finnhub.query('crypto_exchange')
exchanges = df['exchange'].to_list()
symbols = []
for ex in exchanges:
    # Query through the connector, not the DataFrame returned above
    data = await conn_finnhub.query('crypto_symbols', exchange=ex)
    symbols.append(data)
df_symbols = pd.concat(symbols)
df_symbols
```
| id | description | displaySymbol | symbol |
| --- | ----------------- | ------------- | ----------------- |
| 0 | Binance FRONT/ETH | FRONT/ETH | BINANCE:FRONTETH |
| 1 | Binance ATOM/BUSD | ATOM/BUSD | BINANCE:ATOMBUSD |
| ... | ... | ... | ... |
| 281 | Poloniex AKRO/BTC | AKRO/BTC | POLONIEX:BTC_AKRO |
Which IPO in the current month has the highest total share value?
```python
import calendar
from datetime import datetime
from dataprep.connector import connect
# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)
today = datetime.today()
days_in_month = calendar.monthrange(today.year, today.month)[1]
date_from = today.replace(day=1).strftime('%Y-%m-%d')
date_to = today.replace(day=days_in_month).strftime('%Y-%m-%d')
ipo_df = await conn_finnhub.query('ipo_calender', from_=date_from, to=date_to)
ipo_df[ipo_df['totalSharesValue'] == ipo_df['totalSharesValue'].max()]
```
| id | date | exchange | name | numberOfShares | ... | totalSharesValue |
| --- | ---------- | -------- | ------------------------------ | -------------- | --- | ---------------- |
| 5 | 2021-02-03 | NYSE | TELUS International (Cda) Inc. | 33333333 | ... | 9.58333e+08 |
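
The date arithmetic in the example above can be isolated into a small helper, which makes it easy to check against a fixed date. This is a stdlib-only sketch, and `month_bounds` is a name introduced here for illustration.

```python
import calendar
from datetime import date

def month_bounds(day):
    """Return the first and last day of day's month as YYYY-MM-DD strings."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(day.year, day.month)[1]
    first = day.replace(day=1)
    last = day.replace(day=last_day)
    return first.isoformat(), last.isoformat()

print(month_bounds(date(2021, 2, 3)))   # ('2021-02-01', '2021-02-28')
print(month_bounds(date(2024, 2, 10)))  # ('2024-02-01', '2024-02-29')
```
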
What are the average actual earnings over the last 4 quarters for a list of 10 popular stocks?
```python
import asyncio
import pandas as pd
from dataprep.connector import connect
# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)
stock_list = ['TSLA', 'AAPL', 'WMT', 'GOOGL', 'FB', 'MSFT', 'COST', 'NVDA', 'JPM', 'AMZN']
query_list = [conn_finnhub.query('earnings', symbol=symbol) for symbol in stock_list]
query_results = asyncio.gather(*query_list)
stocks_df = pd.concat(await query_results)
stocks_df = stocks_df.groupby('symbol', as_index=False).agg({'actual': ['mean']})
stocks_df.columns = stocks_df.columns.get_level_values(0)
stocks_df = stocks_df.sort_values(by='actual', ascending=False).rename(columns={'actual': 'avg_actual'})
stocks_df.reset_index(drop=True)
```
| id | symbol | avg_actual |
| --- | ------ | ---------- |
| 0 | GOOGL | 12.9375 |
| 1 | AMZN | 8.5375 |
| 2 | FB | 2.4475 |
| .. | ... | ... |
| 9 | TSLA | 0.556 |
What are the earnings for the last 4 quarters of a given company (e.g. TSLA)?
```python
from dataprep.connector import connect
from datetime import datetime, timedelta, timezone
# You can get ”finnhub_access_token“ by following https://finnhub.io/
conn_finnhub = connect("finnhub", _auth={"access_token":finnhub_access_token}, update=True)
today = datetime.now(tz=timezone.utc)
oneyear = today - timedelta(days = 365)
start = int(round(oneyear.timestamp()))
result = await conn_finnhub.query('earnings_calender', symbol='TSLA', from_=start, to=today)
result = result.set_index('date')
result
```
| id | date | epsActual | epsEstimate | hour | quarter | ... | symbol | year |
| :--- | :--------- | --------: | ----------: | :--- | ------: | --- | :----- | ---: |
| 0 | 2021-01-27 | 0.8 | 1.37675 | amc | 4 | ... | TSLA | 2020 |
| 1 | 2020-10-21 | 0.76 | 0.600301 | amc | 3 | ... | TSLA | 2020 |
| 2 | 2020-07-22 | 0.436 | -0.0267036 | amc | 2 | ... | TSLA | 2020 |
| .. | ... | ... | ... | ... | ... | ... | ... | ... |
| 3 | 2011-02-15 | -0.094 | -0.101592 | amc | 4 | ... | TSLA | 2010 |
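
The example above converts only the start of the window to a Unix timestamp. The sketch below converts both ends the same way, which may be useful if an endpoint expects integer timestamps for both parameters. That expectation is an assumption, not something stated here, and `unix_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

def unix_window(end, days=365):
    """Return (start, end) as integer Unix timestamps for a window ending at `end`."""
    start = end - timedelta(days=days)
    return int(start.timestamp()), int(end.timestamp())

# A fixed, timezone-aware end point keeps the result reproducible.
end = datetime(2021, 2, 3, tzinfo=timezone.utc)
start_ts, end_ts = unix_window(end)
print(start_ts, end_ts, end_ts - start_ts)
```
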
#### [CoinGecko](./api-connectors/coingecko) -- Collect Cryptocurrency Data
What are the 10 cryptocurrencies with highest market cap and their current information?
```python
from dataprep.connector import connect
conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('markets', vs_currency='usd', order='market_cap_desc', per_page=10, page=1)
df
```
| | name | symbol | current_price | market_cap | market_cap_rank | high_24h | low_24h | price_change_24h | price_change_percentage_24h | market_cap_change_24h | market_cap_change_percentage_24h | last_updated |
| ---: | :----------- | :----- | ------------: | ----------: | --------------: | -------: | ------: | ---------------: | --------------------------: | --------------------: | -------------------------------: | :----------------------- |
| 0 | Bitcoin | btc | 36811 | 6.86613e+11 | 1 | 37153 | 35344 | 1440.68 | 4.0731 | 3.10933e+10 | 4.7433 | 2021-02-03T19:24:09.271Z |
| 1 | Ethereum | eth | 1628.99 | 1.87035e+11 | 2 | 1645.73 | 1486.42 | 132.91 | 8.88404 | 1.64296e+10 | 9.63018 | 2021-02-03T19:22:32.413Z |
| .. | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9 | Binance Coin | bnb | 51.47 | 7.60256e+09 | 10 | 51.63 | 49.76 | 1.24 | 2.47631 | 1.64863e+08 | 2.21659 | 2021-02-03T19:25:45.456Z |
Which cryptocurrencies have the largest 24-hour percentage increase and decrease?
```python
from dataprep.connector import connect
conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('markets', vs_currency='usd', per_page=1000, page=1)
df = df.sort_values(by=['price_change_percentage_24h']).reset_index(drop=True).dropna()
print("Coin with the highest decreasing percentage: {}, which decreases {}%".format(df['name'].iloc[0], df['price_change_percentage_24h'].iloc[0]))
print("Coin with the highest increasing percentage: {}, which increases {}%".format(df['name'].iloc[-1], df['price_change_percentage_24h'].iloc[-1]))
```
Coin with the highest decreasing percentage: `PancakeSwap`, which decreases `-13.79622%`
Coin with the highest increasing percentage: `StormX`, which increases `101.24182%`
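
Finding the two extremes does not require a full sort. The stdlib sketch below picks them with `min` and `max` while skipping rows that have no value. The sample rows are made up, and only the field names follow the example above.

```python
def extremes(coins, key="price_change_percentage_24h"):
    """Return (biggest loser, biggest gainer), skipping entries with no value."""
    valid = [c for c in coins if c.get(key) is not None]
    return min(valid, key=lambda c: c[key]), max(valid, key=lambda c: c[key])

# Made-up sample rows for illustration only.
sample = [
    {"name": "CoinA", "price_change_percentage_24h": 4.2},
    {"name": "CoinB", "price_change_percentage_24h": None},
    {"name": "CoinC", "price_change_percentage_24h": -7.5},
    {"name": "CoinD", "price_change_percentage_24h": 12.0},
]
loser, gainer = extremes(sample)
print(loser["name"], gainer["name"])
```

On a DataFrame, `idxmin()` and `idxmax()` on the percentage column locate the same two rows.
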
Which cryptocurrencies are trending in CoinGecko?
```python
from dataprep.connector import connect
conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('trend')
df
```
| | id | name | symbol | market_cap_rank | score |
| ---: | :---------------- | :---------- | :----- | --------------: | ----: |
| 0 | bao-finance | Bao Finance | BAO | 175 | 0 |
| 1 | milk2 | MILK2 | MILK2 | 634 | 1 |
| 2 | unitrade | Unitrade | TRADE | 529 | 2 |
| 3 | pancakeswap-token | PancakeSwap | CAKE | 110 | 3 |
| 4 | fsw-token | Falconswap | FSW | 564 | 4 |
| 5 | zeroswap | ZeroSwap | ZEE | 550 | 5 |
| 6 | storm | StormX | STMX | 211 | 6 |
What are the 10 US exchanges with highest trade volume in the past 24 hours?
```python
from dataprep.connector import connect
conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('exchanges')
result = df[df['country']=='United States'].reset_index(drop=True).head(10)
result
```
| | id | name | year_established | ... | trade_volume_24h_btc_normalized |
| ---: | :--------- | :----------- | ---------------: | :--- | ------------------------------: |
| 0 | gdax | Coinbase Pro | 2012 | ... | 90085.6 |
| 1 | kraken | Kraken | 2011 | ... | 48633.1 |
| 2 | binance_us | Binance US | 2019 | ... | 7380.83 |
| .. | ... | ... | ... | ... | ... |
What are the 3 latest traded derivatives with perpetual contract?
```python
from dataprep.connector import connect
import pandas as pd
conn_coingecko = connect("coingecko")
df = await conn_coingecko.query('derivatives')
perpetual_df = df[df['contract_type'] == 'perpetual'].reset_index(drop=True)
perpetual_df['last_traded_at'] = pd.to_datetime(perpetual_df['last_traded_at'], unit='s')
perpetual_df.sort_values(by=['last_traded_at'], ascending=False).head(3).reset_index(drop=True)
```
| | market | symbol | index_id | contract_type | index | basis | funding_rate | open_interest | volume_24h | last_traded_at |
| ---: | :------------- | :--------- | :------- | :------------ | --------: | --------: | -----------: | ------------: | ----------: | :------------------ |
| 0 | Huobi Futures | MATIC-USDT | MATIC | perpetual | 0.0433357 | -0.606296 | 0.247604 | nan | 1.43338e+06 | 2021-02-03 20:14:24 |
| 1 | Biki (Futures) | 1 | BTC | perpetual | 36769.8 | -0.153111 | -0.0519 | nan | 1.00131e+08 | 2021-02-03 20:14:23 |
| 2 | Huobi Futures | CVC-USDT | CVC | perpetual | 0.178268 | -0.336302 | 0.106314 | nan | 876960 | 2021-02-03 20:14:23 |
### Geocoding
#### [MapQuest](./api-connectors/mapquest) -- Collect Driving Directions, Maps, Traffic Data
Where is Simon Fraser University? List all locations if there is more than one campus.
```python
from dataprep.connector import connect
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
BC_BBOX = "-139.06,48.30,-114.03,60.00"
campus = await conn_map.query("place", q = "Simon Fraser University", sort = "relevance", bbox = BC_BBOX, _count = 50)
campus = campus[campus["name"] == "Simon Fraser University"].reset_index()
campus
```
| id | index | name | country | state | city | address | postalCode | coordinates | details |
| ---: | ----: | :---------------------- | :------ | :---- | :-------- | :---------------------- | :--------- | :----------------------- | :------ |
| 0 | 0 | Simon Fraser University | CA | BC | Burnaby | 8888 University Drive E | V5A 1S6 | [-122.90416, 49.27647] | ... |
| 1 | 2 | Simon Fraser University | CA | BC | Vancouver | 602 Hastings St W | V6B 1P2 | [-123.113431, 49.284626] | ... |
How many KFCs are there in Burnaby? What are their addresses?
```python
from dataprep.connector import connect
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
BC_BBOX = "-139.06,48.30,-114.03,60.00"
kfc = await conn_map.query("place", q = "KFC", sort = "relevance", bbox = BC_BBOX, _count = 500)
kfc = kfc[(kfc["name"] == "KFC") & (kfc["city"] == "Burnaby")].reset_index()
print("There are %d KFCs in Burnaby" % len(kfc))
print("Their addresses are:")
kfc['address']
```
There are 1 KFCs in Burnaby
Their addresses are:
| id | address |
| ---: | ------------: |
| 0 | 5094 Kingsway |
What is the ratio of Starbucks to Tim Hortons locations in Vancouver?
```python
from dataprep.connector import connect
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
VAN_BBOX = '-123.27,49.195,-123.020,49.315'
starbucks = await conn_map.query('place', q='starbucks', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)
timmys = await conn_map.query('place', q='Tim Hortons', sort='relevance', bbox=VAN_BBOX, page='1', pageSize = '50', _count=200)
is_vancouver_sb = starbucks['city'] == 'Vancouver'
is_vancouver_tim = timmys['city'] == 'Vancouver'
sb_in_van = starbucks[is_vancouver_sb]
tim_in_van = timmys[is_vancouver_tim]
print('The ratio of Starbucks:Tim Hortons in Vancouver is %d:%d' % (len(sb_in_van), len(tim_in_van)))
```
The ratio of Starbucks:Tim Hortons in Vancouver is 188:120
What is the closest gas station to Metropolis, and how far away is it?
```python
from dataprep.connector import connect
from numpy import radians, sin, cos, arctan2, sqrt
def distance_in_km(cord1, cord2):
    # Haversine distance between two [longitude, latitude] pairs
    R = 6373.0  # approximate Earth radius in km
    lat1 = radians(cord1[1])
    lon1 = radians(cord1[0])
    lat2 = radians(cord2[1])
    lon2 = radians(cord2[0])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * arctan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return distance
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
METRO_TOWN = [-122.9987, 49.2250]
METRO_TOWN_string = '%f,%f' % (METRO_TOWN[0], METRO_TOWN[1])
nearest_petro = await conn_map.query('place', q='gas station', sort='distance', location=METRO_TOWN_string, page='1', pageSize = '1')
print('Metropolis is %fkm from the nearest gas station' % distance_in_km(METRO_TOWN, nearest_petro['coordinates'][0]))
print('The gas station is %s at %s' % (nearest_petro['name'][0], nearest_petro['address'][0]))
```
Metropolis is 0.376580km from the nearest gas station
The gas station is Chevron at 4692 Imperial St
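
The distance helper in the example above is the haversine formula. A stdlib-only version, shown here as a sketch, can be sanity-checked without any API call. `haversine_km` is a name introduced for illustration, and it keeps the same [longitude, latitude] ordering and Earth radius as the example.

```python
from math import radians, sin, cos, atan2, sqrt

EARTH_RADIUS_KM = 6373.0  # same approximate radius as the example above

def haversine_km(coord1, coord2):
    """Great-circle distance in km between two [longitude, latitude] pairs."""
    lon1, lat1 = map(radians, coord1)
    lon2, lat2 = map(radians, coord2)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))

# One degree of longitude along the equator is roughly 111 km.
print(round(haversine_km([0.0, 0.0], [1.0, 0.0]), 1))
```
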
Which cities in BC have the most grocery stores?
```python
from dataprep.connector import connect
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
BC_BBOX = "-139.06,48.30,-114.03,60.00"
GROCERY = 'sic:541105'
shop_list = await conn_map.query("place", sort="relevance", bbox=BC_BBOX, category=GROCERY, _count=500)
shop_list = shop_list[shop_list["state"] == "BC"]
shop_list.groupby('city')['name'].count().sort_values(ascending=False).head(10)
```
| city | count |
| --------------: | ----: |
| Vancouver | 42 |
| Victoria | 24 |
| Surrey | 15 |
| Burnaby | 14 |
| ... | ... |
| North Vancouver | 8 |
Where is the nearest grocery store to SFU? How many miles away is it, and what is the estimated driving time?
```python
from dataprep.connector import connect
# You can get ”mapquest_access_token“ by following https://developer.mapquest.com/
conn_map = connect("mapquest", _auth={"access_token": mapquest_access_token}, _concurrency = 10)
SFU_LOC = '-122.90416, 49.27647'
GROCERY = 'sic:541105'
nearest_grocery = await conn_map.query("place", location=SFU_LOC, sort="distance", category=GROCERY)
destination = nearest_grocery.iloc[0]['details']
name = nearest_grocery.iloc[0]['name']
route = await conn_map.query("route", from_='8888 University Drive E, Burnaby', to=destination)
total_distance = sum(float(i) for i in route['distance'])
total_time = sum(int(i) for i in route['time'])
print('The nearest grocery store to SFU is ' + name + '. It is ' + str(total_distance) + ' miles away, and the drive is expected to take ' + str(total_time // 60) + 'm' + str(total_time % 60) + 's.')
route
```
The nearest grocery store to SFU is Nesters Market. It is 1.234 miles away, and the drive is expected to take 3m21s.
| id | index | narrative | distance | time |
| ---: | ----: | :------------------------------------------------------------------- | -------: | ---: |
| 0 | 0 | Start out going east on University Dr toward Arts Rd. | 0.348 | 57 |
| 1 | 1 | Turn left to stay on University Dr. | 0.606 | 84 |
| 2 | 2 | Enter next roundabout and take the 1st exit onto University High St. | 0.28 | 60 |
| 3 | 3 | 9000 UNIVERSITY HIGH STREET is on the left. | 0 | 0 |
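
As a quick check of the totals, the route legs from the table can be summed offline. This sketch copies the `distance` (miles) and `time` (seconds) values shown above and uses `divmod` to split the total time into minutes and seconds.

```python
import pandas as pd

# Route legs copied from the table above (distance in miles, time in seconds)
route = pd.DataFrame({
    "distance": [0.348, 0.606, 0.28, 0.0],
    "time": [57, 84, 60, 0],
})

total_distance = route["distance"].astype(float).sum()
total_seconds = int(route["time"].astype(int).sum())
# divmod returns (whole minutes, remaining seconds)
minutes, seconds = divmod(total_seconds, 60)
summary = f"{total_distance:.3f} miles, {minutes}m{seconds:02d}s"
print(summary)
```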
### Jobs
#### [The Muse](./api-connectors/themuse) -- Collect Job Ads, Company Information
What are the data science jobs in Vancouver on the first page?
```python
from dataprep.connector import connect
# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})
df = await dc.query('jobs', page=1, category='Data Science', location='Vancouver, Canada')
df[['id', 'name', 'company', 'locations', 'levels', 'publication_date']]
```
| | id | name | company | locations | levels | publication_date |
| --- | ------- | -------------------------------------- | ----------------- | ------------------------------------------------- | ------------------------------------------------- | --------------------------- |
| 0 | 5126286 | Senior Data Scientist | Discord | [{'name': 'Flexible / Remote'}] | [{'name': 'Senior Level', 'short_name': 'senio... | 2021-03-15T11:10:24Z |
| 1 | 5543215 | Data Scientist-AI/ML (Remote) | Dell Technologies | [{'name': 'Chicago, IL'}, {'name': 'Flexible /... | [{'name': 'Mid Level', 'short_name': 'mid'}] | 2021-04-02T11:45:57Z |
| 2 | 4959228 | Senior Data Scientist | Humana | [{'name': 'Flexible / Remote'}] | [{'name': 'Senior Level', 'short_name': 'senio... | 2021-01-05T11:28:23.814281Z |
| 3 | 5172631 | Data Scientist - Marketing | Stash | [{'name': 'Flexible / Remote'}] | [{'name': 'Mid Level', 'short_name': 'mid'}] | 2021-03-26T23:09:33Z |
| 4 | 5372353 | Data Science Intern, Machine Learning | Coursera | [{'name': 'Flexible / Remote'}] | [{'name': 'Internship', 'short_name': 'interns... | 2021-04-05T23:04:40Z |
| 5 | 5298606 | Senior Machine Learning Engineer | Affirm | [{'name': 'Flexible / Remote'}] | [{'name': 'Senior Level', 'short_name': 'senio... | 2021-03-17T23:10:51Z |
| 6 | 5166882 | Data Scientist | Postmates | [{'name': 'Bellevue, WA'}, {'name': 'Los Angel... | [{'name': 'Mid Level', 'short_name': 'mid'}] | 2021-02-01T17:49:53.238832Z |
| 7 | 5375212 | Director, Data Science & Analytics | UKG | [{'name': 'Flexible / Remote'}, {'name': 'Lowe... | [{'name': 'management', 'short_name': 'managem... | 2021-03-31T23:17:53Z |
| 8 | 5130731 | Senior Data Scientist | Humana | [{'name': 'Flexible / Remote'}] | [{'name': 'Senior Level', 'short_name': 'senio... | 2021-01-26T11:42:44.232111Z |
| 9 | 5306269 | Director of Data Sourcing and Strategy | Opendoor | [{'name': 'Flexible / Remote'}] | [{'name': 'management', 'short_name': 'managem... | 2021-03-31T23:05:22Z |
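
The `locations` column holds a list of dicts per job, which is hard to read or filter directly. A minimal sketch of flattening it into plain strings follows; the rows are hypothetical and only mirror the shape of the column shown above.

```python
import pandas as pd

# Hypothetical rows shaped like the 'locations' column shown above
df = pd.DataFrame({
    "name": ["Senior Data Scientist", "Data Scientist"],
    "locations": [
        [{"name": "Flexible / Remote"}],
        [{"name": "Bellevue, WA"}, {"name": "Flexible / Remote"}],
    ],
})

# Join the 'name' field of every location dict into one readable string
df["location_names"] = df["locations"].apply(
    lambda locs: "; ".join(loc["name"] for loc in locs)
)
print(df[["name", "location_names"]])
```

With a flat string column, a filter such as `df[df["location_names"].str.contains("Remote")]` becomes straightforward.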
What are the senior-level data science positions at Amazon on the first page?
```python
from dataprep.connector import connect
# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})
df = await dc.query('jobs', page=1, category='Data Science', company='Amazon', level='Senior Level')
df[:10][['id', 'name', 'company', 'locations', 'publication_date']]
```
| | id | name | company | locations | publication_date |
| --- | ------- | ------------------------------------------------- | ------- | ------------------------------------ | --------------------------- |
| 0 | 5153796 | Sr. Data Architect, Data Lake & Analytics - Na... | Amazon | [{'name': 'San Diego, CA'}] | 2021-02-01T22:54:14.002653Z |
| 1 | 4083477 | Principal Data Architect, Data Lake & Analytics | Amazon | [{'name': 'Chicago, IL'}] | 2021-02-01T23:14:17.251814Z |
| 2 | 4149878 | Principal Data Architect, Data Warehousing & MPP | Amazon | [{'name': 'Arlington, VA'}] | 2021-02-01T23:15:22.017573Z |
| 3 | 4497753 | Data Architect - Data Lake & Analytics - Natio... | Amazon | [{'name': 'Irvine, CA'}] | 2021-02-01T23:15:22.439949Z |
| 4 | 4870271 | Data Scientist | Amazon | [{'name': 'Seattle, WA'}] | 2021-02-01T23:04:25.967878Z |
| 5 | 4603482 | Data Scientist - Prime Gaming | Amazon | [{'name': 'Seattle, WA'}] | 2021-02-01T23:10:37.628292Z |
| 6 | 5193240 | Data Scientist | Amazon | [{'name': 'Seattle, WA'}] | 2021-02-04T23:56:19.176327Z |
| 7 | 4678426 | Sr Data Architect - Streaming | Amazon | [{'name': 'Roseville, CA'}] | 2021-02-01T22:51:25.598645Z |
| 8 | 4150011 | Data Architect - Data Lake & Analytics - Natio... | Amazon | [{'name': 'Tampa, FL'}] | 2021-02-04T23:56:18.281215Z |
| 9 | 4346719 | Sr. Data Scientist - ML Labs | Amazon | [{'name': 'London, United Kingdom'}] | 2021-02-01T23:12:42.038111Z |
What are the top 10 companies in engineering (sorted by factors such as trendiness, uniqueness, and newness)?
```python
from dataprep.connector import connect
# You can get ”app_key“ by following https://www.themuse.com/developers/api/v2/apps
dc = connect('themuse', _auth={'access_token': app_key})
df = await dc.query('companies', industry='Engineering', page=1)
df[:10]
```
| | id | name | locations | size | publication_date | url |
| --- | ----- | -------------------- | ------------------------------------------------- | ----------- | --------------------------- | ------------------------------------------------- |
| 0 | 706 | Appian | [{'name': 'Tysons Corner, VA'}] | Medium Size | 2015-11-25T18:17:50.926146Z | https://www.themuse.com/companies/appian |
| 1 | 12168 | Bristol Myers Squibb | [{'name': 'Boudry, Switzerland'}, {'name': 'De... | Large Size | 2020-12-15T15:55:56.940074Z | https://www.themuse.com/companies/bristolmyers... |
| 2 | 11897 | McMaster-Carr | [{'name': 'Atlanta, GA'}, {'name': 'Chicago, I... | Large Size | 2020-02-10T21:57:15.338561Z | https://www.themuse.com/companies/mcmastercarr |
| 3 | 12162 | ServiceNow | [{'name': 'Santa Clara, CA'}] | Large Size | 2021-01-26T23:48:13.066632Z | https://www.themuse.com/companies/servicenow |
| 4 | 11731 | Tenaska | [{'name': 'Boston, MA'}, {'name': 'Dallas, TX'... | Large Size | 2019-03-14T14:01:54.465873Z | https://www.themuse.com/companies/tenaska |
| 5 | 11885 | Brex | [{'name': 'Flexible / Remote'}, {'name': 'New ... | Medium Size | 2020-02-05T23:16:44.780028Z | https://www.themuse.com/companies/brex |
| 6 | 1483 | Inline Plastics | [{'name': 'Shelton, CT'}] | Medium Size | 2017-09-11T14:49:24.153633Z | https://www.themuse.com/companies/inlineplastics |
| 7 | 12113 | Dematic | [{'name': 'Atlanta, GA'}, {'name': 'Banbury, U... | Large Size | 2020-09-17T20:29:19.400892Z | https://www.themuse.com/companies/dematic |
| 8 | 11967 | Kairos Power | [{'name': 'Albuquerque, NM'}, {'name': 'Charlo... | Medium Size | 2020-12-07T21:29:33.538815Z | https://www.themuse.com/companies/kairospower |
| 9 | 11913 | Siemens | [{'name': 'Munich, Germany'}] | Large Size | 2020-01-23T21:35:56.937727Z | https://www.themuse.com/companies/siemens |
### Lifestyle
#### [Spoonacular](./api-connectors/spoonacular) -- Collect Recipe, Food, and Nutritional Information Data
Which foods are unhealthy, i.e., high in both carbs and fat?
```python
from dataprep.connector import connect
import pandas as pd
dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)
df = await dc.query('recipes_by_nutrients', minFat=65, maxFat=100, minCarbs=75, maxCarbs=100, _count=20)
df["calories"] = pd.to_numeric(df["calories"]) # convert string type to numeric
df = df[df['calories']>1100] # considering foods with more than 1100 calories per serving to be unhealthy
df[["title","calories","fat","carbs"]].sort_values(by=['calories'], ascending=False)
```
| id | title | calories | fat | carbs |
| --- | --------------------------------- | -------- | --- | ----- |
| 2 | Brownie Chocolate Chip Cheesecake | 1210 | 92g | 79g |
| 8 | Potato-Cheese Pie | 1208 | 80g | 96g |
| 0 | Stuffed Shells with Beef and Broc | 1192 | 72g | 81g |
| 3 | Coconut Crusted Rockfish | 1187 | 72g | 92g |
| 4 | Grilled Ratatouille | 1143 | 82g | 88g |
| 7 | Pecan Bars | 1121 | 84g | 91g |
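
In the output above, `fat` and `carbs` are strings such as `92g`, so they cannot be sorted or compared numerically as-is. The sketch below, which copies three rows from the table, strips the unit and converts the values; it assumes every value ends in `g`.

```python
import pandas as pd

# Rows copied from the table above; 'fat' and 'carbs' arrive as strings such as '92g'
df = pd.DataFrame({
    "title": ["Brownie Chocolate Chip Cheesecake", "Potato-Cheese Pie", "Pecan Bars"],
    "calories": [1210, 1208, 1121],
    "fat": ["92g", "80g", "84g"],
    "carbs": ["79g", "96g", "91g"],
})

for col in ["fat", "carbs"]:
    # Strip the trailing unit and convert to a number (assumes every value ends in 'g')
    df[col + "_g"] = pd.to_numeric(df[col].str.rstrip("g"))

print(df.sort_values("fat_g", ascending=False)[["title", "fat_g", "carbs_g"]])
```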
Which meat dishes are rich in proteins?
```python
from dataprep.connector import connect
dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)
df = await dc.query('recipes', query='beef', diet='keto', minProtein=25, maxProtein=60, _count=5)
df = df[["title","nutrients"]]
# Each 'nutrients' cell looks like [{'title': 'Protein', 'amount': 22.3768, 'unit': 'g'}]
# Extract the protein amount (in grams) from the first entry of each cell
g = []
for i in df["nutrients"]:
    z = i[0]
    g.append(z['amount'])
df.insert(1, 'Protein(g)', g)
df[["title","Protein(g)"]].sort_values(by='Protein(g)',ascending=False)
```
| id | title | Protein(g) |
| --- | ------------------------------------------------- | ---------- |
| 3 | Strip steak with roasted cherry tomatoes and v... | 56.2915 |
| 0 | Low Carb Brunch Burger | 53.7958 |
| 2 | Entrecote Steak with Asparagus | 41.6676 |
| 1 | Italian Style Meatballs | 35.9293 |
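
The example above reads the first entry of each `nutrients` list, which works when the list holds only the requested nutrient. A slightly more defensive sketch looks the nutrient up by its `title` instead. The helper name `nutrient_amount` and the `Calories` entry are hypothetical; only the `Protein` entry follows the structure shown in the code comment above.

```python
def nutrient_amount(nutrients, title):
    """Return the amount of the nutrient with the given title, or None if absent."""
    for entry in nutrients:
        if entry.get("title") == title:
            return entry.get("amount")
    return None


# Hypothetical cell: the 'Calories' entry is made up to show an out-of-order list
cell = [
    {"title": "Calories", "amount": 510.0, "unit": "kcal"},
    {"title": "Protein", "amount": 22.3768, "unit": "g"},
]
print(nutrient_amount(cell, "Protein"))
```

It can be applied to a whole column with `df["nutrients"].apply(lambda n: nutrient_amount(n, "Protein"))`.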
Which Italian Vegan dishes are popular?
```python
from dataprep.connector import connect
dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)
df = await dc.query('recipes', query='popular veg dishes', cuisine='italian', diet='vegan', _count=20)
df[["title"]]
```
| id | Title |
| --- | ------------------------------------------------- |
| 0 | Vegan Pea and Mint Pesto Bruschetta |
| 1 | Gluten Free Vegan Gnocchi |
| 2 | Fresh Tomato Risotto with Grilled Green Vegeta... |
What are the top 5 liked chicken recipes with common ingredients?
```python
from dataprep.connector import connect
import pandas as pd
dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)
df= await dc.query('recipes_by_ingredients', ingredients='chicken,buttermilk,salt,pepper')
df['likes'] = pd.to_numeric(df['likes'])
df[['title', 'likes']].sort_values(by=['likes'], ascending=False).head(5)
```
| id | title | likes |
| --- | ------------------------------------------------- | ----- |
| 9 | Oven-Fried Ranch Chicken | 561 |
| 1 | Fried Chicken and Wild Rice Waffles with Pink ... | 78 |
| 6 | CCC: Carla Hall’s Fried Chicken | 47 |
| 2 | Buttermilk Fried Chicken | 12 |
| 0 | My Pantry Shelf | 10 |
What is the average calorie count of high-calorie Korean foods?
```python
from dataprep.connector import connect
from statistics import mean
dc = connect('spoonacular', _auth={'access_token': API_key}, concurrency=3, update=True)
df = await dc.query('recipes', query='korean', minCalories = 500)
nutri = df['nutrients'].tolist()
# The first entry of each 'nutrients' list is taken to be the calorie amount
calories = [n[0]['amount'] for n in nutri]
print('Average calories for high calorie Korean foods:', mean(calories),'kcal')
```
Average calories for high calorie Korean foods: 644.765 kcal
### Music
#### [MusixMatch](./api-connectors/musicmatch) -- Collect Music Lyrics Data
What is Katy Perry's Twitter URL?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
df = await conn_musixmatch.query("artist_info", artist_mbid = "122d63fc-8671-43e4-9752-34e846d62a9c")
df[['name', 'twitter_url']]
```
|     | name       | twitter_url                   |
| --- | ---------- | ----------------------------- |
| 0   | Katy Perry | https://twitter.com/katyperry |
What album is the song "Gone, Gone, Gone" in?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
df = await conn_musixmatch.query("track_matches", q_track = "Gone, Gone, Gone")
df[['name', 'album_name']]
```
|     | name             | album_name                          |
| --- | ---------------- | ----------------------------------- |
| 0   | Gone, Gone, Gone | The World From the Side of the Moon |
Which artist or group is the most popular in Canada?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
df = await conn_musixmatch.query("top_artists", country = "Canada")
df['name'][0]
```
'BTS'
How many genres are in the Musixmatch database?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
df = await conn_musixmatch.query("genres")
len(df)
```
362
Who is the most popular American artist named Michael?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)
df = await conn_musixmatch.query("artists", q_artist = "Michael")
df = df[df['country'] == "US"].sort_values('rating', ascending=False)
df['name'].iloc[0]
```
'Michael Jackson'
What is the genre of the album "Atlas"?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
album = await conn_musixmatch.query("album_info", album_id = 11339785)
genres = await conn_musixmatch.query("genres")
album_genre = genres[genres['id'] == album['genre_id'][0][0]]['name']
album_genre.iloc[0]
```
'Soundtrack'
What is the link to lyrics of the most popular song in the album "Yellow"?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)
df = await conn_musixmatch.query("album_tracks", album_id = 10266231)
df = df.sort_values('rating', ascending=False)
df['track_share_url'].iloc[0]
```
'https://www.musixmatch.com/lyrics/Coldplay/Yellow?utm_source=application&utm_campaign=api&utm_medium=SFU%3A1409620992740'
What are Lady Gaga's albums from most to least recent?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, update = True)
df = await conn_musixmatch.query("artist_albums", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848", s_release_date = "desc")
df.name.unique()
```
array(['Chromatica', 'Stupid Love',
'A Star Is Born (Original Motion Picture Soundtrack)', 'Your Song'],
dtype=object)
Which artists are similar to Lady Gaga?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
df = await conn_musixmatch.query("related_artists", artist_mbid = "650e7db6-b795-4eb5-a702-5ea2fc46c848")
df
```
|     | id | name | rating | country | twitter_url | updated_time | artist_alias_list |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6985 | Cast | 41 | | | 2015-03-29T03:32:49Z | [キャスト] |
| 1 | 7014 | black eyed peas | 77 | US | https://twitter.com/bep | 2016-06-30T10:07:05Z | [The Black Eyed Peas, ブラック・アイド・ピーズ, heiyandoud... |
| 2 | 269346 | OneRepublic | 74 | US | https://twitter.com/OneRepublic | 2015-01-07T08:21:52Z | [ワンリパブリツク, Gong He Shi Dai, Timbaland presents... |
| 3 | 276451 | Taio Cruz | 60 | GB | | 2016-06-30T10:32:58Z | [タイオ クルーズ, tai ou ke lu zi, Trio Cruz, Jacob M... |
| 4 | 409736 | Inna | 54 | RO | https://twitter.com/inna_ro | 2014-11-13T03:37:43Z | [インナ] |
| 5 | 475281 | Skrillex | 62 | US | https://twitter.com/Skrillex | 2013-11-05T11:28:57Z | [スクリレックス, shi qi lei ke si, Sonny, Skillrex] |
| 6 | 13895270 | Imagine Dragons | 82 | US | https://twitter.com/Imaginedragons | 2013-11-05T11:30:28Z | [イマジン・ドラゴンズ, IMAGINE DRAGONS] |
| 7 | 27846837 | Shawn Mendes | 80 | CA | | 2015-02-17T10:33:56Z | [ショーン・メンデス, xiaoenmengdezi] |
| 8 | 33491890 | Rihanna | 81 | GB | https://twitter.com/rihanna | 2018-10-15T20:32:58Z | [りあーな, Rihanna, 蕾哈娜, Rhianna, Riannah, Robyn R... |
| 9 | 33491981 | Avicii | 74 | SE | https://twitter.com/avicii | 2018-04-20T18:27:01Z | [アヴィーチー, ai wei qi, Avicci] |
What are the highest rated songs in Canada from highest to lowest popularity?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token}, _concurrency = 5)
df = await conn_musixmatch.query("top_tracks", country = 'CA')
df[df['is_explicit'] == 0].sort_values('rating', ascending = False).reset_index()
```
|     | index | id | name | rating | commontrack_id | has_instrumental | is_explicit | has_lyrics | has_subtitles | album_id | album_name | artist_id | artist_name | track_share_url | updated_time | genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5 | 201621042 | Dynamite | 99 | 114947355 | 0 | 0 | 1 | 1 | 39721115 | Dynamite - Single | 24410130 | BTS | https://www.musixmatch.com/lyrics/BTS/Dynamite... | 2021-01-15T16:40:48Z | [Pop] |
| 1 | 9 | 187880919 | Before You Go | 99 | 103153140 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-11-20T08:44:05Z | [Pop, Alternative] |
| 2 | 7 | 189704353 | Breaking Me | 98 | 105304416 | 0 | 0 | 1 | 1 | 34892017 | Keep On Loving | 42930474 | Topic feat. A7S | https://www.musixmatch.com/lyrics/Topic-8/Brea... | 2021-01-19T16:57:29Z | [House, Dance] |
| 3 | 3 | 189626475 | Watermelon Sugar | 95 | 103096346 | 0 | 0 | 1 | 1 | 36101498 | Fine Line | 24505463 | Harry Styles | https://www.musixmatch.com/lyrics/Harry-Styles... | 2020-02-14T08:07:12Z | [Music] |
What are other songs in the same album as the song "Before You Go"?
```python
from dataprep.connector import connect
# You can get ”musixmatch_access_token“ by registering as a developer https://developer.musixmatch.com/signup
conn_musixmatch = connect("musixmatch", _auth={"access_token":musixmatch_access_token})
song = await conn_musixmatch.query("track_info", commontrack_id = 103153140)
album = await conn_musixmatch.query("album_tracks", album_id = song["album_id"][0])
album
```
|     | id | name | rating | commontrack_id | has_instrumental | is_explicit | has_lyrics | has_subtitles | album_id | album_name | artist_id | artist_name | track_share_url | updated_time | genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 186884178 | Grace | 31 | 87857108 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-04-09T10:21:29Z | [Folk-Rock] |
| 1 | 186884184 | Bruises | 68 | 70395936 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2020-07-31T12:58:04Z | [Music, Alternative] |
| 2 | 186884187 | Hold Me While You Wait | 89 | 95176135 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2020-08-02T07:23:21Z | [Music] |
| 3 | 186884189 | Someone You Loved | 95 | 89461086 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2020-06-22T15:34:07Z | [Pop, Alternative] |
| 4 | 186884190 | Maybe | 31 | 95541701 | 0 | 1 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-05-20T11:41:00Z | [Music] |
| 5 | 186884191 | Forever | 67 | 95541702 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-11-18T10:46:36Z | [Music] |
| 6 | 186884192 | One | 31 | 95541699 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-05-19T04:08:23Z | [Music] |
| 7 | 186884193 | Don't Get Me Wrong | 31 | 95541698 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-12-20T08:25:26Z | [Music] |
| 8 | 186884194 | Hollywood | 31 | 95541700 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2019-05-21T08:00:54Z | [Music] |
| 9 | 186884195 | Lost on You | 31 | 73530089 | 0 | 0 | 1 | 1 | 35611759 | Divinely Uninspired To A Hellish Extent (Exten... | 33258132 | Lewis Capaldi | https://www.musixmatch.com/lyrics/Lewis-Capald... | 2020-03-17T08:35:18Z | [Alternative] |
#### [Spotify](./api-connectors/spotify) -- Collect Albums, Artists, and Tracks Metadata
How many followers does Eminem have?
```python
from dataprep.connector import connect
# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
df = await conn_spotify.query("artist", q="Eminem", _count=500)
df.loc[df['# followers'].idxmax(), '# followers']
```
41157398
How many singles does Pink Floyd have that are available in Canada?
```python
from dataprep.connector import connect
# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
artist_name = "Pink Floyd"
df = await conn_spotify.query("album", q = artist_name, _count = 500)
df = df.loc[[(artist_name in x) for x in df['artist']]]
df = df.loc[[('CA' in x) for x in df['available_markets']]]
df = df.loc[df['total_tracks'] == '1']
df.shape[0]
```
12
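
The three filters above can also be combined into a single boolean mask. The sketch below runs offline on hypothetical rows; it assumes `artist` is a list of names, `available_markets` is a comma-separated string of country codes, and `total_tracks` is stored as a string, which may not match the connector's actual output.

```python
import pandas as pd

# Hypothetical rows; the column types are assumptions, not confirmed connector output
df = pd.DataFrame({
    "album_name": ["Single A", "Album B", "Single C"],
    "artist": [["Pink Floyd"], ["Pink Floyd"], ["Some Tribute Band"]],
    "available_markets": ["CA,US", "US", "CA"],
    "total_tracks": ["1", "12", "1"],
})

mask = (
    df["artist"].apply(lambda artists: "Pink Floyd" in artists)
    # Split the market string so 'CA' is matched as a whole code, not a substring
    & df["available_markets"].apply(lambda markets: "CA" in markets.split(","))
    & (df["total_tracks"].astype(int) == 1)
)
print(df[mask]["album_name"].tolist())
```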
In the last quarter of 2020, which artist released the album with the most tracks?
```python
from dataprep.connector import connect
import pandas as pd
# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
df = await conn_spotify.query("album", q = "2020", _count = 500)
df['date'] = pd.to_datetime(df['release_date'])
df = df[df['date'] > '2020-10-01'].drop(columns = ['image url', 'external urls', 'release_date'])
df['total_tracks'] = df['total_tracks'].astype(int)
df = df.loc[df['total_tracks'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", tracks: " + str(df['total_tracks']))
```
ASOT 996 - A State Of Trance Episode 996 (Top 50 Of 2020 Special), by Armin van Buuren ASOT Radio, tracks: 172
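
Note that the filter `df['date'] > '2020-10-01'` excludes October 1 itself and sets no upper bound. A minimal sketch of a quarter filter bounded on both sides is shown below on made-up release dates; only the `release_date` column name comes from the example above.

```python
import pandas as pd

# Hypothetical release dates; only the 'release_date' column name comes from the example
df = pd.DataFrame({
    "album_name": ["A", "B", "C", "D"],
    "release_date": ["2020-09-30", "2020-10-01", "2020-12-31", "2021-01-01"],
})
df["date"] = pd.to_datetime(df["release_date"])

# Series.between is inclusive on both ends by default
start, end = pd.Timestamp("2020-10-01"), pd.Timestamp("2020-12-31")
q4 = df[df["date"].between(start, end)]
print(q4["album_name"].tolist())
```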
Who is the most popular artist among Eminem, Beyonce, Pink Floyd, and Led Zeppelin, and what are their popularity ratings?
```python
from dataprep.connector import connect
# You can get "spotify_client_id" and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
artists_and_popularity = []
for artist in ['Beyonce', 'Pink Floyd', 'Eminem', 'Led Zeppelin']:
    df = await conn_spotify.query("artist", q = artist, _count = 500)
    # Treat the search hit with the most followers as the artist's main profile
    popularity = df.loc[df['# followers'].idxmax(), 'popularity']
    artists_and_popularity.append((artist, popularity))
print(sorted(artists_and_popularity, key=lambda x: x[1], reverse=True))
```
[('Eminem', 94.0), ('Beyonce', 88.0), ('Pink Floyd', 83.0), ('Led Zeppelin', 81.0)]
Who are the top 5 artists with the most followers from the current Billboard top 100 artists?
```python
from dataprep.connector import connect
from bs4 import BeautifulSoup
import requests
# You can get ”spotify_client_id“ and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
web_page = requests.get("https://www.billboard.com/charts/artist-100")
html_soup = BeautifulSoup(web_page.text, 'html.parser')
artist_100 = html_soup.find_all('span', class_ = 'chart-list-item__title-text')
artists = {}
for artist in artist_100:
    # Keep the most popular search hit for each Billboard artist name
    df_temp = await conn_spotify.query("artist", q = artist.text.strip(), _count = 10)
    df_temp = df_temp.loc[df_temp['popularity'].idxmax()]
    artists[df_temp['name']] = df_temp['# followers']
# Sort artist names by follower count, largest first, and keep the top 5
artists_top5 = sorted(artists, key = artists.get, reverse = True)[:5]
artists_top5
```
['Ed Sheeran', 'Ariana Grande', 'Drake', 'Justin Bieber', 'Eminem']
Of the top 10 most popular albums of 2020 listed on rollingstone.com, which album is available in the most markets (countries) around the world?
```python
from dataprep.connector import connect
import asyncio
import pandas as pd
# You can get "spotify_client_id" and "spotify_client_secret" by registering as a developer https://developer.spotify.com/dashboard/#
conn_spotify = connect("spotify", _auth={"client_id":spotify_client_id, "client_secret":spotify_client_secret}, _concurrency=3)
def count_markets(text):
    # Assumes 'available_markets' is a comma-separated string of country codes
    lst = text.split(',')
    return len(lst)
album_names = ["Folklore", "Fetch the Bolt Cutters", "YHLQMDLG", "Rough and Rowdy Ways", "Future Nostalgia",
               "RTJ4", "Saint Cloud", "Eternal Atake", "What’s Your Pleasure", "Punisher"]
# Build one query coroutine per album and run them concurrently
album_list = [conn_spotify.query("album", q = name, _count = 1) for name in album_names]
combined = asyncio.gather(*album_list)
df = pd.concat(await combined).reset_index()
df = df.drop(columns = ['image url', 'external urls', 'index'])
df['market_count'] = df['available_markets'].apply(count_markets)
df = df.loc[df['market_count'].idxmax()]
print(df['album_name'] + ", by " + df['artist'][0] + ", with " + str(df['market_count']) + " available countries")
```
folklore, by Taylor Swift, with 92 available countries
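
The pattern of building several query coroutines and awaiting them together can be tried without credentials. In the sketch below, `fake_query` is a hypothetical stand-in for an awaitable query call, and `market_count` is a made-up value (the length of the album name), so the result only illustrates the control flow.

```python
import asyncio
import pandas as pd


async def fake_query(name):
    """Hypothetical stand-in for an awaitable query; returns a one-row DataFrame."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return pd.DataFrame({"album_name": [name], "market_count": [len(name)]})


async def fetch_all(names):
    # gather() runs the coroutines concurrently and returns results in input order
    frames = await asyncio.gather(*(fake_query(n) for n in names))
    return pd.concat(frames, ignore_index=True)


df = asyncio.run(fetch_all(["Folklore", "RTJ4", "Punisher"]))
print(df.loc[df["market_count"].idxmax(), "album_name"])
```

Inside a notebook, where an event loop is already running, `df = await fetch_all([...])` would replace the `asyncio.run` call.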
#### [iTunes](./api-connectors/itunes) -- Collect iTunes Data
What audio and video content by Jack Johnson is available?
```python
from dataprep.connector import connect
conn_itunes = connect('itunes')
df = await conn_itunes.query('search', term="jack+johnson")
df
```
| id | type | kind | artistName | collectionName | trackName | trackTime |
| ---: | ----: | ---: | -----------: | ------------------------------------------------: | ------------------------: | --------- |
| 0 | track | song | Jack Johnson | Jack Johnson and Friends: Sing-A-Longs and Lul... | Upside Down | 208643 |
| 1 | track | song | Jack Johnson | In Between Dreams (Bonus Track Version) | Better Together | 207679 |
| 2 | track | song | Jack Johnson | In Between Dreams (Bonus Track Version) | Sitting, Waiting, Wishing | 183721 |
| ... | ... | ... | ... | ... | ... | ... |
| 49 | track | song | Jack Johnson | Sleep Through the Static | While We Wait | 86112 |
How to compute the average track time of Rich Brian's music videos?
```python
from dataprep.connector import connect
conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="rich+brian", entity="musicVideo")
avg_track_time = df['trackTime'].mean()/(1000*60)
print("The average track time is {:.3} minutes.".format(avg_track_time))
```
The average track time is 4.13 minutes.
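
The `trackTime` values are in milliseconds, which is why the example divides by `1000*60` to get minutes. A small sketch of formatting a single track length as minutes and seconds follows; the helper name `format_track_time` is hypothetical.

```python
def format_track_time(milliseconds):
    """Format a track length in milliseconds as M:SS."""
    total_seconds = round(milliseconds / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# 208643 ms is the trackTime of "Upside Down" in the Jack Johnson table above
print(format_track_time(208643))
```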
How to get all of Ang Lee's movies that were made in the United States?
```python
from dataprep.connector import connect
conn_itunes = connect('itunes')
df = await conn_itunes.query("search", term="Ang+Lee", entity="movie", country="us")
df = df[df['artistName']=='Ang Lee']
df
```
| id | type | kind | artistName | collectionName | trackName | trackTime |
| --- | ----- | ------------- | ---------- | --------------------------- | ------------------- | --------- |
| 0 | track | feature-movie | Ang Lee | Fox 4K HDR Drama Collection | Life of Pi | 7642675 |
| 1 | track | feature-movie | Ang Lee | None | Gemini Man | 7049958 |
| ... | ... | ... | ... | ... | ... | ... |
| 11 | track | feature-movie | Ang Lee | None | Ride With the Devil | 8290498 |
### Networking
#### [IPLegit](./api-connectors/iplegit) -- Collect IP Address Data
How can I check if an IP address is bad, so I can block it from accessing my website?
```python
from dataprep.connector import connect
# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})
ip_addresses = ['16.210.143.176',
                '98.124.198.1',
                '182.50.236.215',
                '90.104.138.217',
                '61.44.131.150',
                '210.64.150.243',
                '89.141.156.184']
for ip in ip_addresses:
    ip_status = await conn_iplegit.query('status', ip=ip)
    bad_status = ip_status['bad_status'].get(0)
    if bad_status == True:
        print('block ip address: ', ip_status['ip'].get(0))
```
block ip address: 98.124.198.1
Which country do most of my website's visitors come from?
```python
from dataprep.connector import connect
import pandas as pd
# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})
ip_addresses = ['16.210.143.176',
                '98.124.198.1',
                '182.50.236.215',
                '90.104.138.217',
                '61.44.131.150',
                '210.64.150.243',
                '89.141.156.184',
                '85.94.168.133',
                '98.14.201.52',
                '98.57.106.207',
                '185.254.139.250',
                '206.246.126.82',
                '147.44.75.68',
                '123.42.224.40',
                '253.29.140.44',
                '97.203.209.153',
                '196.63.36.253']
ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))
df = pd.concat(ip_details)
df.country.mode().get(0)
```
'UNITED STATES'
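
Before spending API calls, it can help to drop strings that are not valid, publicly routable IPv4 addresses. The sketch below uses only Python's standard `ipaddress` module; the helper name `is_public_ipv4` and the entries `192.168.0.10` and `not-an-ip` are made up for illustration.

```python
import ipaddress


def is_public_ipv4(text):
    """Return True if text parses as an IPv4 address outside special-purpose ranges."""
    try:
        ip = ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return not (ip.is_private or ip.is_reserved or ip.is_loopback or ip.is_multicast)


# The first two addresses appear in the list above; the last two are hypothetical
candidates = ['98.124.198.1', '253.29.140.44', '192.168.0.10', 'not-an-ip']
print([ip for ip in candidates if is_public_ipv4(ip)])
```

Here `253.29.140.44` is filtered out because it falls in the reserved `240.0.0.0/4` block.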
Make a map showing locations of people who have visited my website.
```python
from dataprep.connector import connect
import pandas as pd
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame
# You can get ”iplegit_access_token“ by registering as a developer https://rapidapi.com/IPLegit/api/iplegit
conn_iplegit = connect('iplegit', _auth={'access_token':iplegit_access_token})
ip_addresses = ['16.210.143.176',
                '98.124.198.1',
                '182.50.236.215',
                '90.104.138.217',
                '61.44.131.150',
                '210.64.150.243',
                '89.141.156.184',
                '85.94.168.133',
                '98.14.201.52',
                '98.57.106.207',
                '185.254.139.250',
                '206.246.126.82',
                '147.44.75.68',
                '123.42.224.40',
                '253.29.140.44',
                '97.203.209.153',
                '196.63.36.253']
ip_details = []
for ip in ip_addresses:
    ip_details.append(await conn_iplegit.query('details', ip=ip))
df = pd.concat(ip_details)
# Build one point per visitor from the longitude/latitude columns
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]
gdf = GeoDataFrame(df, geometry=geometry)
# Note: gpd.datasets was removed in GeoPandas 1.0; with newer versions, read a downloaded Natural Earth file with gpd.read_file instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);
```

### News
#### [Guardian](./api-connectors/guardian) -- Collect Guardian News Data
Which news section contains the most mentions of COVID-19?
```python
from dataprep.connector import connect, info, Connector
import pandas as pd
conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, concurrency=3)
df3 = await conn_guardian.query('article', _q='covid 19', _count=1000)
df3.groupby('section').count().sort_values("headline", ascending=False)
```
| section | headline | url | publish_date |
| ---------------------------------- | -------- | --- | ------------ |
| World news | 378 | 378 | 378 |
| Business | 103 | 103 | 103 |
| US news | 76 | 76 | 76 |
| Opinion | 72 | 72 | 72 |
| Sport | 53 | 53 | 53 |
| Australia news | 49 | 49 | 49 |
| Society | 44 | 44 | 44 |
| Politics | 34 | 34 | 34 |
| Football | 28 | 28 | 28 |
| Global development | 26 | 26 | 26 |
| UK news | 26 | 26 | 26 |
| Education | 17 | 17 | 17 |
| Environment | 14 | 14 | 14 |
| Technology | 10 | 10 | 10 |
| Film | 10 | 10 | 10 |
| Science | 8 | 8 | 8 |
| Books | 8 | 8 | 8 |
| Life and style | 7 | 7 | 7 |
| Television & radio | 6 | 6 | 6 |
| Media | 4 | 4 | 4 |
| Culture | 4 | 4 | 4 |
| Stage | 4 | 4 | 4 |
| News | 4 | 4 | 4 |
| Travel | 2 | 2 | 2 |
| WEHI: Brighter together | 2 | 2 | 2 |
| Xero: Resilient business | 2 | 2 | 2 |
| Money | 2 | 2 | 2 |
| The new rules of work | 1 | 1 | 1 |
| LinkedIn: Hybrid workplace | 1 | 1 | 1 |
| Global | 1 | 1 | 1 |
| Getting back on track | 1 | 1 | 1 |
| Westpac Scholars: Rethink tomorrow | 1 | 1 | 1 |
| Food | 1 | 1 | 1 |
| All together | 1 | 1 | 1 |
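The groupby-and-sort step amounts to counting articles per section and ranking the counts. A dependency-free sketch of the same idea with `collections.Counter`, using made-up rows rather than real Guardian data:

```python
from collections import Counter

# Made-up (section, headline) rows standing in for the query result
articles = [
    ("World news", "headline a"),
    ("Business", "headline b"),
    ("World news", "headline c"),
]

# Count rows per section; most_common() sorts by descending count
section_counts = Counter(section for section, _ in articles)
ranking = section_counts.most_common()
```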
Which opinion articles discuss COVID-19 precautions?
```python
from dataprep.connector import connect, Connector
conn_guardian = connect('guardian', update = True, _auth={'access_token': API_key}, _concurrency=3)
df2 = await conn_guardian.query('article', _q='covid 19 protect', _count=100)
df2[df2.section=='Opinion']
```
| id | headline | section | url | publish_date |
| --- | ------------------------------------------------- | ------- | ------------------------------------------------- | -------------------- |
| 0 | Billionaires made $1tn since Covid-19. They ca... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-09T11:32:20Z |
| 1 | Jeff Bezos became even richer thanks to Covid-... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-13T07:30:00Z |
| 20 | Here's how to tackle the Covid-19 anti-vaxxers... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-11-26T16:02:14Z |
| 41 | Can the UK deliver on the Covid vaccine rollou... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-11T09:00:24Z |
| 68 | Covid-19 has turned back the clock on working ... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-10T14:19:27Z |
| 84 | The Guardian view on Covid-19 promises: season... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-14T18:42:10Z |
| 88 | The Guardian view on responding to the Covid-1... | Opinion | https://www.theguardian.com/commentisfree/2020... | 2020-12-30T18:58:05Z |
#### [Times](./api-connectors/times) -- Collect New York Times Data
Who is the author of the article 'Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines'?
```python
from dataprep.connector import connect
# You can get "times_access_token" by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q='Yellen Outlines Economic Priorities, and Republicans Draw Battle Lines')
df[["authors"]]
```
| id | authors |
| ---: | :---------------- |
| 0 | By Alan Rappeport |
What is the latest news from Ottawa?
```python
from dataprep.connector import connect
# You can get "times_access_token" by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="ottawa",sort='newest')
df[['headline','authors','abstract','url','pub_date']].head(1)
```
| | headline | ... | pub_date |
| ---: | :------------------------------------------------------------ | :--- | :----------------------- |
| 0 | 21 Men Accuse Lincoln Project Co-Founder of Online Harassment | ... | 2021-01-31T14:48:35+0000 |
What are the headlines of technology-section articles that mentioned Trump in the last six months of 2020?
```python
from dataprep.connector import connect
# You can get "times_access_token" by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
df = await conn_times.query('ac',q="Trump",fq='section_name:("technology")',begin_date='20200630',end_date='20201231',sort='newest', _count=50)
print(df['headline'])
print("Trump was mentioned in " + str(len(df)) + " articles")
```
| id | headline |
| ---: | :--------------------------------------------------------------------------------- |
| 0 | No, Trump cannot win Georgia’s electoral votes through a write-in Senate campaign. |
| 1 | How Misinformation ‘Superspreaders’ Seed False Election Theories |
| 2 | No, Trump’s sister did not publicly back him. He was duped by a fake account. |
| .. | ... |
| 49 | Trump Official’s Tweet, and Its Removal, Set Off Flurry of Anti-Mask Posts |
Trump was mentioned in 50 articles
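`begin_date` and `end_date` in the query above are `YYYYMMDD` strings. A hedged sketch of deriving such a pair for the second half of a year with the standard library. Note that `second_half_range` is a hypothetical helper, not part of DataPrep, and that it starts on July 1 rather than the June 30 used above.

```python
from datetime import date

def second_half_range(year):
    # Return (begin, end) as YYYYMMDD strings covering July 1 to December 31
    begin = date(year, 7, 1)
    end = date(year, 12, 31)
    return begin.strftime("%Y%m%d"), end.strftime("%Y%m%d")

begin_date, end_date = second_half_range(2020)
```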
How often was each celebrity mentioned in a headline in the latter half of 2020, ranked from most to least?
```python
from dataprep.connector import connect
import pandas as pd
# You can get "times_access_token" by following https://developer.nytimes.com/apis
conn_times = connect("times", _auth={"access_token":times_access_token})
celeb_list = ['Katy Perry', 'Taylor Swift', 'Lady Gaga', 'BTS', 'Rihanna', 'Kim Kardashian']
number_of_mentions = []
for i in celeb_list:
    df1 = await conn_times.query('ac',q=i,begin_date='20200630',end_date='20201231')
    # Keep only the articles whose headline contains the celebrity's name
    df1 = df1[df1['headline'].str.contains(i)]
    a = len(df1['headline'])
    number_of_mentions.append(a)
print(number_of_mentions)
ranking_df = pd.DataFrame({'name': celeb_list, 'number of mentions': number_of_mentions})
ranking_df = ranking_df.sort_values(by=['number of mentions'], ascending=False)
ranking_df
```
[2, 6, 3, 6, 1, 0]
| | name | number of mentions |
| ---: | :------------- | -----------------: |
| 1 | Taylor Swift | 6 |
| 3 | BTS | 6 |
| 2 | Lady Gaga | 3 |
| 0 | Katy Perry | 2 |
| 4 | Rihanna | 1 |
| 5 | Kim Kardashian | 0 |
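By default `str.contains` interprets its argument as a regular expression and matches substrings, so a short name such as `BTS` can also match inside a longer word. A minimal sketch of a stricter count, using only the standard library and made-up headlines (not data from the Times API):

```python
import re

def count_mentions(name, headlines):
    # Escape the name so it is matched literally, and require word boundaries
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return sum(1 for h in headlines if pattern.search(h))

# Made-up headlines for illustration only
headlines = [
    "BTS Tops the Charts Again",
    "WEBTSITE Launch Delayed",
    "Taylor Swift Releases Surprise Album",
]
bts_count = count_mentions("BTS", headlines)
swift_count = count_mentions("Taylor Swift", headlines)
```

Here the second headline contains the letters `BTS` but is not counted, because they sit inside a longer word.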
#### [Currents](./api-connectors/currents) -- Collect Currents News Data
How to get the latest Chinese news?
```python
from dataprep.connector import connect
# You can get "currents_access_token" by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('latest_news', language='zh')
df.head()
```
| id | title | category | ... | author | published |
| ---: | :------------------- | :------------- | :--- | :------- | :------------------------ |
| 0 | 為何上市公司該汰換了 | [entrepreneur] | ... | 經濟日報 | 2021-02-03 08:48:39 +0000 |
How to get the political news about 'Trump'?
```python
from dataprep.connector import connect
# You can get "currents_access_token" by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('search', keywords='Trump', category='politics')
df.head(3)
```
| | title | category | description | url | author | published |
| ---: | :----------------------------------------------------------------------------------------------------------- | :-------------------- | :----------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------- | :------------ | :------------------------ |
| 0 | Biden Started The Process Of Unwinding Trump's Assault On Immigration, But Activists Want Him To Move Faster | ['politics', 'world'] | "These people cannot continue to wait." | https://www.buzzfeednews.com/article/adolfoflores/biden-immigration-executive-orders-review | Adolfo Flores | 2021-02-03 08:39:51 +0000 |
| 1 | Pro-Trump lawyer Lin Wood reportedly under investigation for voter fraud | ['politics', 'world'] | A source told CBS Atlanta affiliate WGCL that Lin Wood is being investigated for allegedly voting "out of state." | https://www.cbsnews.com/news/pro-trump-lawyer-lin-wood-under-investigation-for-alleged-illegal-voting-2020-02-03/ | April Siese | 2021-02-03 08:21:25 +0000 |
| 2 | Trump Supporters Say They Attacked The Capitol Because He Told Them To, Undercutting His Impeachment Defense | ['politics', 'world'] | “President Trump told Us to ‘fight like hell,’” one Trump supporter reportedly posted online after the assault on the Capitol. | https://www.buzzfeednews.com/article/zoetillman/trump-impeachment-capitol-rioters-fight-like-hell | Zoe Tillman | 2021-02-03 07:25:34 +0000 |
How to get news about COVID-19 published on 2020-12-25?
```python
from dataprep.connector import connect
# You can get "currents_access_token" by following https://currentsapi.services/zh_CN
conn_currents = connect('currents', _auth={'access_token': currents_access_token})
df = await conn_currents.query('search', keywords='covid', start_date='2020-12-25',end_date='2020-12-25')
df.head(1)
```
| | title | category | ... | published |
| ---: | :------------------------------------------------------------------ | :---------- | :--- | :------------------------ |
| 0 | Commentary: Let our charitable giving equal our political donations | ['opinion'] | ... | 2020-12-25 00:00:00 +0000 |
### Science
#### [DBLP](./api-connectors/dblp) -- Collect Computer Science Publication Data
Who wrote this paper?
```python
from dataprep.connector import connect
conn_dblp = connect("dblp")
df = await conn_dblp.query("publication", q = "Scikit-learn: Machine learning in Python", _count = 1)
df[["title", "authors", "year"]]
```
| id | title | authors | year |
| --- | ------------------------------------------ | ------------------------------------------------- | ---- |
| 0 | Scikit-learn - Machine Learning in Python. | [Fabian Pedregosa, Gaël Varoquaux, Alexandre G... | 2011 |
How to fetch all publications of Andrew Y. Ng?
```python
from dataprep.connector import connect
conn_dblp = connect("dblp", _concurrency = 5)
df = await conn_dblp.query("publication", author = "Andrew Y. Ng", _count = 2000)
df[["title", "authors", "venue", "year"]].reset_index(drop=True)
```
| id | title | authors | venue | year |
| --- | ------------------------------------------------- | ------------------------------------------------- | ---------------- | ---- |
| 0 | The 1st Agriculture-Vision Challenge - Methods... | [Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennife... | [CVPR Workshops] | 2020 |
| ... | ... | ... | ... | ... |
| 242 | An Experimental and Theoretical Comparison of ... | [Michael J. Kearns, Yishay Mansour, Andrew Y. ... | [COLT] | 1995 |
How to fetch all publications of NeurIPS 2020?
```python
from dataprep.connector import connect
conn_dblp = connect("dblp", _concurrency = 5)
df = await conn_dblp.query("publication", q = "NeurIPS 2020", _count = 5000)
# filter non-neurips-2020 papers
mask = df.venue.apply(lambda x: 'NeurIPS' in x)
df = df[mask]
df = df[(df['year'] == '2020')]
df[["title", "venue", "year"]].reset_index(drop=True)
```
| id | title | venue | year |
| ---- | ------------------------------------------------- | --------- | ---- |
| 0 | Towards More Practical Adversarial Attacks on ... | [NeurIPS] | 2020 |
| ... | ... | ... | ... |
| 1899 | Triple descent and the two kinds of overfittin... | [NeurIPS] | 2020 |
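The displayed `venue` values such as `[NeurIPS]` suggest that each entry is a list, in which case `'NeurIPS' in x` is a list-membership test rather than a substring test. A small sketch of that behaviour with plain Python records; the titles are invented and the list-valued `venue` is an assumption based on the output above:

```python
# Invented records; 'venue' is assumed to be a list and 'year' a string
records = [
    {"title": "Paper A", "venue": ["NeurIPS"], "year": "2020"},
    {"title": "Paper B", "venue": ["NeurIPS Workshops"], "year": "2020"},
    {"title": "Paper C", "venue": ["NeurIPS"], "year": "2019"},
]

# List membership needs an exact element match, so 'NeurIPS Workshops' is excluded
neurips_2020 = [
    r for r in records
    if "NeurIPS" in r["venue"] and r["year"] == "2020"
]
```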
#### [NASA](./api-connectors/nasa) -- Collect NASA Data
What are the titles of the Astronomy Pictures of the Day from 2020-01-01 to 2020-01-10?
```python
from dataprep.connector import connect
# You can get "nasa_access_key" by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})
df = await conn_nasa.query("apod", start_date='2020-01-01', end_date='2020-01-10')
df['title']
```
| id | title |
| ---: | :------------------------------ |
| 0 | Betelgeuse Imagined |
| 1 | The Fainting of Betelgeuse |
| 2 | Quadrantids over the Great Wall |
| ... | ... |
| 9 | Nacreous Clouds over Sweden |
What is the Coronal Mass Ejection (CME) data from 2020-01-01 to 2020-02-01?
```python
from dataprep.connector import connect
# You can get "nasa_access_key" by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})
df = await conn_nasa.query('cme', startDate='2020-01-01', endDate='2020-02-01')
df
```
| id | activity_id | catalog | start_time | ... | link |
| ---: | :-------------------------- | :---------- | :---------------- | :--- | :------------------------------------------------------- |
| 0 | 2020-01-05T16:45:00-CME-001 | M2M_CATALOG | 2020-01-05T16:45Z | ... | https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15256/-1 |
| 1 | 2020-01-14T11:09:00-CME-001 | M2M_CATALOG | 2020-01-14T11:09Z | ... | https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15271/-1 |
| .. | ... | ... | ... | ... | ... |
| 4 | 2020-01-25T18:54:00-CME-001 | M2M_CATALOG | 2020-01-25T18:54Z | ... | https://kauai.ccmc.gsfc.nasa.gov/DONKI/view/CME/15296/-1 |
How many Geomagnetic Storms (GST) occurred from 2020-01-01 to 2021-01-01? When did they occur?
```python
from dataprep.connector import connect
# You can get "nasa_access_key" by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})
df = await conn_nasa.query('gst', startDate='2020-01-01', endDate='2021-01-01')
print("Geomagnetic Storms have occurred %s times from 2020-01-01 to 2021-01-01." % len(df))
df['start_time']
```
Geomagnetic Storms have occurred 1 times from 2020-01-01 to 2021-01-01.
| id | start_time |
| ---: | :---------------- |
| 0 | 2020-09-27T21:00Z |
How many Solar Flares (FLR) occurred and completed from 2020-01-01 to 2021-01-01? How long did they last?
```python
import pandas as pd
from dataprep.connector import connect
# You can get "nasa_access_key" by following https://api.nasa.gov/
conn_nasa = connect("api-connectors/nasa", _auth={'access_token': nasa_access_key})
df = await conn_nasa.query('flr', startDate='2020-01-01', endDate='2021-01-01')
df = df.dropna(subset=['end_time']).reset_index(drop=True)
df['duration'] = pd.to_datetime(df['end_time']) - pd.to_datetime(df['begin_time'])
print('Solar Flare have occurred %s times from 2020-01-01 to 2021-01-01.' % len(df))
print(df['duration'])
```
Solar Flare have occurred 3 times from 2020-01-01 to 2021-01-01.
| id | duration |
| ---: | :-------------- |
| 0 | 0 days 01:07:00 |
| 1 | 0 days 00:23:00 |
| 2 | 0 days 00:47:00 |
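Each duration is the difference between two parsed timestamps. A small standard-library sketch of that arithmetic, assuming the `YYYY-MM-DDTHH:MMZ` format seen in the `start_time` output above. The two timestamps are invented for illustration and are not values returned by the NASA API.

```python
from datetime import datetime, timedelta

def flare_duration(begin_time, end_time):
    # Parse timestamps such as '2020-09-27T21:00Z' (trailing Z kept as a literal)
    fmt = "%Y-%m-%dT%H:%MZ"
    return datetime.strptime(end_time, fmt) - datetime.strptime(begin_time, fmt)

# Invented timestamps for illustration only
duration = flare_duration("2020-05-29T07:13Z", "2020-05-29T08:20Z")
```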
What is the Solar Energetic Particle (SEP) data from 2019-01-01 to 2021-01-01?
```python
import pandas as pd
from dataprep.connector import connect
# You can get ”nasa_access_key“ by follow