NWHL Scraping Functions
Scraping
There are three ways to scrape games:
1. Scrape by Season:
Scrape games season by season. (Note: a season is referred to by the first of the two years it spans, so the 2016-2017 season is referred to as 2016.)
import hockey_scraper
# Scrapes the 2015 and 2016 seasons and stores the data in a CSV file (the two calls below are equivalent)
hockey_scraper.nwhl.scrape_seasons([2015, 2016])
hockey_scraper.nwhl.scrape_seasons([2015, 2016], data_format='Csv')
# Scrapes the 2017 season and returns the data as a Pandas DataFrame
scraped_data = hockey_scraper.nwhl.scrape_seasons([2017], data_format='Pandas')
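The season-naming note above is easy to get wrong when season labels come from elsewhere as strings. The following is a minimal sketch of converting a label such as "2016-2017" into the single year that scrape_seasons expects. The helper name season_start_year is hypothetical and is not part of hockey_scraper.

```python
def season_start_year(label: str) -> int:
    """Return the first year of a season label such as '2016-2017' or '2016-17'.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    start, _, end = label.strip().partition("-")
    if not (start.isdigit() and len(start) == 4):
        raise ValueError(f"Unrecognized season label: {label!r}")
    year = int(start)
    if end:
        if not end.isdigit() or len(end) not in (2, 4):
            raise ValueError(f"Unrecognized season label: {label!r}")
        if len(end) == 4:
            end_year = int(end)
        else:
            # Expand a two-digit second year using the century of the first year
            end_year = (year // 100) * 100 + int(end)
            if end_year < year:
                end_year += 100  # handles labels such as '1999-00'
        if end_year != year + 1:
            raise ValueError(f"Season must span consecutive years: {label!r}")
    return year


# Build the list of seasons from human-readable labels
seasons = [season_start_year(s) for s in ["2015-2016", "2016-17"]]
# The resulting list could then be passed on, for example:
# hockey_scraper.nwhl.scrape_seasons(seasons)
```

Rejecting labels whose two years are not consecutive catches typos early, before any scraping starts.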
2. Scrape by Game:
Scrape a provided list of games, identified by their game IDs.
import hockey_scraper
# Scrapes the games and stores the data in a CSV file (csv is the default data_format)
hockey_scraper.nwhl.scrape_games([14694271, 14814946, 14689491])
# Scrapes the games and returns the data as a Pandas DataFrame
scraped_data = hockey_scraper.nwhl.scrape_games([14689624, 18507470, 20575219, 22207005], data_format='Pandas')
3. Scrape by Date Range:
Scrape all games within a specified date range. All dates must be written in “yyyy-mm-dd” format.
import hockey_scraper
# Scrapes all games between 2016-10-10 and 2017-01-01 and returns a Pandas DataFrame containing the play-by-play (pbp) data
scraped_data = hockey_scraper.nwhl.scrape_date_range('2016-10-10', '2017-01-01', data_format='pandas')
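Because dates must be given in "yyyy-mm-dd" format, it can help to validate them before calling scrape_date_range. The sketch below uses only the standard library. The helper names parse_scrape_date and checked_date_range are hypothetical, and the check that the start date does not come after the end date is an assumption of this sketch, not documented library behavior.

```python
from datetime import date, datetime
from typing import Tuple


def parse_scrape_date(text: str) -> date:
    """Parse a strict 'yyyy-mm-dd' string (hypothetical helper, not in hockey_scraper)."""
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    # strptime also accepts unpadded values such as '2016-1-5'; reject those
    if parsed.isoformat() != text:
        raise ValueError(f"Date must be zero-padded yyyy-mm-dd: {text!r}")
    return parsed


def checked_date_range(from_date: str, to_date: str) -> Tuple[str, str]:
    """Validate both dates and their order, returning them unchanged."""
    start = parse_scrape_date(from_date)
    end = parse_scrape_date(to_date)
    if start > end:
        raise ValueError(f"from_date {from_date} is after to_date {to_date}")
    return start.isoformat(), end.isoformat()


from_date, to_date = checked_date_range("2016-10-10", "2017-01-01")
# The validated strings could then be passed on, for example:
# hockey_scraper.nwhl.scrape_date_range(from_date, to_date, data_format='pandas')
```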
Scrape Functions
Functions to scrape by season, games, and date range.
hockey_scraper.nwhl.scrape_functions.print_errors()
Print any scraping errors.
Returns: None
hockey_scraper.nwhl.scrape_functions.scrape_date_range(from_date, to_date, data_format='csv', rescrape=False, docs_dir=None)
Scrape games in the given date range.
Parameters:
- from_date – date you want to scrape from ("yyyy-mm-dd")
- to_date – date you want to scrape to ("yyyy-mm-dd")
- data_format – format you want the data in: csv or pandas (default is csv)
- rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir. (default is False)
- docs_dir – directory that either contains previously scraped docs or that you want them deposited in after scraping (default is None)
Returns: Dictionary with DataFrames and errors, or None
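The interaction between rescrape and docs_dir described above can be captured in a small helper that builds the keyword arguments once and reuses them across calls. This is only a sketch: scrape_options is a hypothetical name, creating the directory up front is a choice made here rather than a documented requirement, and the case-insensitive handling of data_format is an assumption based on the usage examples on this page.

```python
from pathlib import Path


def scrape_options(docs_dir=None, rescrape=False, data_format="csv"):
    """Build keyword arguments matching the documented scrape_* signatures.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    if data_format.lower() not in ("csv", "pandas"):
        raise ValueError(f"data_format must be csv or pandas, got {data_format!r}")
    options = {"data_format": data_format, "rescrape": False, "docs_dir": None}
    if docs_dir is not None:
        path = Path(docs_dir)
        # Assumption of this sketch: make sure the directory exists before scraping
        path.mkdir(parents=True, exist_ok=True)
        options["docs_dir"] = str(path)
        # rescrape is documented as only applying when a docs_dir is supplied
        options["rescrape"] = bool(rescrape)
    return options


# Example usage (commented out so that nothing is created or scraped here):
# opts = scrape_options(docs_dir="nwhl_docs", rescrape=True, data_format="pandas")
# hockey_scraper.nwhl.scrape_date_range("2016-10-10", "2017-01-01", **opts)
```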
hockey_scraper.nwhl.scrape_functions.scrape_games(games, data_format='csv', rescrape=False, docs_dir=None)
Scrape a list of games.
Parameters:
- games – list of game_ids
- data_format – format you want the data in: csv or pandas (default is csv)
- rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir.
- docs_dir – directory that either contains previously scraped docs or that you want them deposited in after scraping
Returns: Dictionary with DataFrames, or None
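Since the documented return value is either a dictionary or None, calling code should handle both cases. The sketch below summarizes whatever comes back without assuming specific dictionary keys, which this page does not list. The helper name summarize_result and the keys in the usage comment are hypothetical.

```python
def summarize_result(result):
    """Return (key, size) pairs for a scrape result, or an empty list for None.

    Hypothetical helper for illustration; not part of hockey_scraper.
    len() is used for the size, which for a Pandas DataFrame is its row count.
    """
    if result is None:
        return []
    summary = []
    for key, value in result.items():
        try:
            size = len(value)
        except TypeError:
            size = None  # value has no length (for example, None)
        summary.append((key, size))
    return summary


# Example with a stand-in dictionary; the key names here are made up:
# summarize_result({"some_frame": [1, 2, 3], "other": None})
# With a real result:
# result = hockey_scraper.nwhl.scrape_games([14694271], data_format='pandas')
# for key, size in summarize_result(result):
#     print(key, size)
```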
hockey_scraper.nwhl.scrape_functions.scrape_list_of_games(games)
Scrape an arbitrary list of games given their game IDs.
Parameters:
- games – list of game_ids to scrape
Returns: DataFrame of pbp info
hockey_scraper.nwhl.scrape_functions.scrape_seasons(seasons, data_format='csv', rescrape=False, docs_dir=None)
Scrape every season in the given list of seasons.
Parameters:
- seasons – list of seasons
- data_format – format you want the data in: csv or pandas (default is csv)
- rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir.
- docs_dir – directory that either contains previously scraped docs or that you want them deposited in after scraping
Returns: Dictionary with DataFrames and errors, or None