NWHL Scraping Functions

Scraping

There are three ways to scrape games:

1. Scrape by Season:

Scrape games season by season. (Note: a season is referred to by the first of the two years it spans, so the 2016-2017 season is 2016.)

import hockey_scraper

# Scrapes the 2015 & 2016 seasons and stores the data in a Csv file (both calls are equivalent)
hockey_scraper.nwhl.scrape_seasons([2015, 2016])
hockey_scraper.nwhl.scrape_seasons([2015, 2016], data_format='Csv')

# Scrapes the 2017 season and returns a Pandas DataFrame
scraped_data = hockey_scraper.nwhl.scrape_seasons([2017], data_format='Pandas')
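The season convention above is easy to get wrong when your own data uses labels such as "2016-2017". The following is a minimal sketch of a converter; season_start_year is a hypothetical helper written for illustration and is not part of hockey_scraper.

```python
def season_start_year(label: str) -> int:
    """Return the first year of a season label such as '2016-2017' or '2016-17'.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    first, _, second = label.strip().partition("-")
    start = int(first)
    if second:
        expected = start + 1
        # Accept both the full second year (2017) and its two-digit form (17).
        if int(second) not in (expected, expected % 100):
            raise ValueError(f"{label!r} does not span two consecutive years")
    return start


seasons = [season_start_year(label) for label in ("2015-2016", "2016-2017")]
print(seasons)  # [2015, 2016]
```

The resulting list has the same form as the one passed to scrape_seasons in the example above.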

2. Scrape by Game:

Scrape a list of games provided.

import hockey_scraper

# Scrapes games and stores the data in a Csv file
hockey_scraper.nwhl.scrape_games([14694271, 14814946, 14689491], data_format='Csv')

# Scrapes games and returns a DataFrame with the data
scraped_data = hockey_scraper.nwhl.scrape_games([14689624, 18507470, 20575219, 22207005], data_format='Pandas')

3. Scrape by Date Range:

Scrape all games within a specified date range. All dates must be written in “yyyy-mm-dd” format.

import hockey_scraper

# Scrapes all games between 2016-10-10 and 2017-01-01 and returns a Pandas DataFrame containing the pbp
scraped_data = hockey_scraper.nwhl.scrape_date_range('2016-10-10', '2017-01-01', data_format='pandas')
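Because dates must follow the “yyyy-mm-dd” format, it can help to validate them before starting a long scrape. The sketch below uses only the standard library; normalize_date and check_date_range are hypothetical helpers, and the check that from_date does not come after to_date is an assumption about sensible input rather than a documented requirement of the library.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def normalize_date(text: str) -> str:
    """Return text unchanged if it is a valid, zero-padded yyyy-mm-dd date."""
    # strptime raises ValueError for malformed or impossible dates.
    normalized = datetime.strptime(text, DATE_FORMAT).date().isoformat()
    # strptime tolerates '2016-1-5', so compare against the padded form.
    if normalized != text:
        raise ValueError(f"{text!r} is not zero-padded; did you mean {normalized!r}?")
    return normalized


def check_date_range(from_date: str, to_date: str) -> tuple:
    """Validate both dates and make sure the range is not reversed."""
    start, end = normalize_date(from_date), normalize_date(to_date)
    # ISO dates sort correctly as plain strings.
    if start > end:
        raise ValueError(f"from_date {start} is after to_date {end}")
    return start, end


print(check_date_range("2016-10-10", "2017-01-01"))
```

The validated pair can then be passed as the first two arguments of scrape_date_range.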

Scrape Functions

Functions to scrape by season, game, and date range.

hockey_scraper.nwhl.scrape_functions.print_errors()

Print any scraping errors.

Returns:	None

hockey_scraper.nwhl.scrape_functions.scrape_date_range(from_date, to_date, data_format='csv', rescrape=False, docs_dir=None)

Scrape games in given date range

Parameters:

from_date – date you want to scrape from
to_date – date you want to scrape to
data_format – format you want data in - csv or pandas (csv is default)
rescrape – Whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir. (default is False)
docs_dir – Directory that either contains previously scraped docs or one that you want them to be deposited in after scraping. (default is None)

Returns:

Dictionary with DataFrames and errors or None
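Since the return value is either a dictionary or None, calling code should handle both cases. The following is a minimal sketch of a defensive summary; summarize_result is a hypothetical helper, and because the dictionary's keys are not listed here it simply reports whatever it finds. It treats anything with a shape attribute as a table, which holds for Pandas DataFrames.

```python
def summarize_result(result):
    """Describe a scrape result without assuming which keys it contains.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    if result is None:
        return ["no data returned (None)"]

    lines = []
    for key, value in result.items():
        # DataFrames expose a (rows, columns) tuple as .shape.
        shape = getattr(value, "shape", None)
        if shape is not None:
            lines.append(f"{key}: table with {shape[0]} rows and {shape[1]} columns")
        else:
            lines.append(f"{key}: {value!r}")
    return lines


for line in summarize_result(None):
    print(line)
```

Passing the value returned by any of the scrape functions to summarize_result gives a quick overview before further processing.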

hockey_scraper.nwhl.scrape_functions.scrape_games(games, data_format='csv', rescrape=False, docs_dir=None)

Scrape a list of games

Parameters:

games – list of game_ids
data_format – format you want data in - csv or pandas (csv is default)
rescrape – If you want to rescrape pages already scraped. Only applies if you supply a docs dir.
docs_dir – Directory that either contains previously scraped docs or one that you want them to be deposited in after scraping

Returns:

Dictionary with DataFrames or None
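Game ID lists assembled from several sources often contain duplicates or IDs stored as strings. The sketch below normalizes such a list before it is handed to scrape_games; clean_game_ids is a hypothetical helper, and it assumes game IDs are integers, as they are in the examples on this page.

```python
def clean_game_ids(games):
    """Convert IDs to int and drop duplicates while keeping the original order.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(int(game) for game in games))


game_ids = clean_game_ids([14694271, "14814946", 14694271, 14689491])
print(game_ids)  # [14694271, 14814946, 14689491]
```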

hockey_scraper.nwhl.scrape_functions.scrape_list_of_games(games)

Scrape an arbitrary list of games given their game IDs

Parameters:	games – List of game IDs to scrape
Returns:	DataFrame of pbp info

hockey_scraper.nwhl.scrape_functions.scrape_seasons(seasons, data_format='csv', rescrape=False, docs_dir=None)

Scrape every game in each of the given seasons

Parameters:

seasons – list of seasons
data_format – format you want data in - csv or pandas (csv is default)
rescrape – If you want to rescrape pages already scraped. Only applies if you supply a docs dir.
docs_dir – Directory that either contains previously scraped docs or one that you want them to be deposited in after scraping

Returns:

Dictionary with DataFrames and errors or None
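The docs_dir parameter refers to a directory on disk, and this page does not say whether the library creates it when it is missing. The following is a minimal sketch that makes sure the directory exists beforehand; prepare_docs_dir is a hypothetical helper, and the temporary directory is used only so the demonstration leaves nothing behind.

```python
import tempfile
from pathlib import Path


def prepare_docs_dir(path) -> str:
    """Create the directory (and any parents) if needed and return it as a string.

    Hypothetical helper for illustration; not part of hockey_scraper.
    """
    docs_dir = Path(path).expanduser()
    docs_dir.mkdir(parents=True, exist_ok=True)
    return str(docs_dir)


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    docs_dir = prepare_docs_dir(Path(tmp) / "nwhl_docs")
    created = Path(docs_dir).is_dir()

print(created)

# With a permanent path, the result could then be passed along, for example:
# hockey_scraper.nwhl.scrape_seasons([2016], data_format='pandas', docs_dir=docs_dir)
```

According to the parameter descriptions above, rescrape only has an effect when a docs_dir is supplied, so the two arguments are normally used together.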
