NWHL Scraping Functions

Scraping

There are three ways to scrape games:

1. Scrape by Season:

Scrape games season by season. (Note: a season is referred to by the first of the two years it spans, so the 2016-2017 season is referred to as 2016.)

import hockey_scraper

# Scrapes the 2015 & 2016 seasons and stores the data in a csv file (both calls are equivalent)
hockey_scraper.nwhl.scrape_seasons([2015, 2016])
hockey_scraper.nwhl.scrape_seasons([2015, 2016], data_format='csv')

# Scrapes the 2017 season and returns the data instead of writing a csv file
scraped_data = hockey_scraper.nwhl.scrape_seasons([2017], data_format='pandas')
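Because seasons are identified by their starting year, it can help to convert between that year and the familiar "2016-2017" label. The helpers below (season_label and season_from_label) are hypothetical and not part of hockey_scraper; they are only a minimal sketch of the naming convention described above.

```python
def season_label(season: int) -> str:
    """Return the 'YYYY-YYYY' label for a season given by its first year (hypothetical helper)."""
    return f"{season}-{season + 1}"


def season_from_label(label: str) -> int:
    """Return the first year of a 'YYYY-YYYY' label, i.e. the value scrape_seasons expects."""
    start, end = (int(part) for part in label.split("-"))
    if end != start + 1:
        raise ValueError(f"{label!r} does not span two consecutive years")
    return start


print(season_label(2016))              # 2016-2017
print(season_from_label("2016-2017"))  # 2016
```

So a list such as ["2015-2016", "2016-2017"] could be turned into [2015, 2016] before being passed to scrape_seasons.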

2. Scrape by Game:

Scrape a provided list of games, identified by their game IDs.

import hockey_scraper

# Scrapes the games and stores the data in a csv file
hockey_scraper.nwhl.scrape_games([14694271, 14814946, 14689491])

# Scrapes the games and returns the data instead of writing a csv file
scraped_data = hockey_scraper.nwhl.scrape_games([14689624, 18507470, 20575219, 22207005], data_format='pandas')

3. Scrape by Date Range:

Scrape all games within a specified date range. Both dates must be written in “yyyy-mm-dd” format.

import hockey_scraper

# Scrapes all games between 2016-10-10 and 2017-01-01 and returns the play-by-play (pbp) data
scraped_data = hockey_scraper.nwhl.scrape_date_range('2016-10-10', '2017-01-01', data_format='pandas')
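Since both dates must be in “yyyy-mm-dd” format, a caller might check them before starting a long scrape. A minimal sketch, using a hypothetical helper parse_date_range that is not part of hockey_scraper; the check that from_date is not after to_date is an extra caller-side assumption.

```python
from datetime import date, datetime
from typing import Tuple


def _parse_ymd(value: str) -> date:
    """Parse a yyyy-mm-dd string, rejecting unpadded forms such as '2016-1-1'."""
    parsed = datetime.strptime(value, "%Y-%m-%d").date()
    # strptime also accepts single-digit months/days, so round-trip to enforce the exact format
    if parsed.strftime("%Y-%m-%d") != value:
        raise ValueError(f"Date {value!r} is not in yyyy-mm-dd format")
    return parsed


def parse_date_range(from_date: str, to_date: str) -> Tuple[date, date]:
    """Validate both dates (hypothetical helper) and make sure the range is not reversed."""
    start, end = _parse_ymd(from_date), _parse_ymd(to_date)
    if start > end:
        raise ValueError(f"from_date {from_date} is after to_date {to_date}")
    return start, end


start, end = parse_date_range("2016-10-10", "2017-01-01")
print((end - start).days)  # 83
```

The round-trip through strftime is there because strptime alone would accept "2016-1-1", which is not in the documented format.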

Scrape Functions

Functions to scrape by season, by game, and by date range.

hockey_scraper.nwhl.scrape_functions.print_errors()

Print any scraping errors.

Returns: None

hockey_scraper.nwhl.scrape_functions.scrape_date_range(from_date, to_date, data_format='csv', rescrape=False, docs_dir=None)

Scrape all games in the given date range.

Parameters:
  • from_date – date you want to scrape from, in “yyyy-mm-dd” format
  • to_date – date you want to scrape to, in “yyyy-mm-dd” format
  • data_format – format you want the data in: csv or pandas (csv is default)
  • rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir. (default is False)
  • docs_dir – directory that either contains previously scraped docs or one that you want them to be deposited in after scraping. (default is None)
Returns:

Dictionary with DataFrames and errors or None
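The rescrape and docs_dir parameters describe a simple on-disk cache: reuse a saved page when one exists, unless asked to rescrape. The sketch below illustrates those semantics under stated assumptions; get_page, the fetch callback, and the file_name layout are hypothetical, and this is not hockey_scraper's actual implementation.

```python
import os
from typing import Callable, Optional


def get_page(url: str, file_name: str, fetch: Callable[[str], str],
             docs_dir: Optional[str] = None, rescrape: bool = False) -> str:
    """Return page text, reusing a saved copy under docs_dir unless rescrape is True (hypothetical)."""
    if docs_dir is None:
        # No docs_dir: always fetch and never save, so rescrape has no effect
        return fetch(url)

    path = os.path.join(docs_dir, file_name)
    if os.path.exists(path) and not rescrape:
        # A previously scraped copy exists and we were not asked to rescrape
        with open(path, encoding="utf-8") as f:
            return f.read()

    # Fetch fresh text and deposit it in docs_dir for later runs
    text = fetch(url)
    os.makedirs(docs_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

This matches the documented rule that rescrape only matters when a docs_dir is supplied: without one, every call fetches again.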

hockey_scraper.nwhl.scrape_functions.scrape_games(games, data_format='csv', rescrape=False, docs_dir=None)

Scrape a list of games

Parameters:
  • games – list of game IDs
  • data_format – format you want the data in: csv or pandas (csv is default)
  • rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir. (default is False)
  • docs_dir – directory that either contains previously scraped docs or one that you want them to be deposited in after scraping. (default is None)
Returns:

Dictionary with DataFrames or None
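data_format is the second positional parameter of scrape_games and scrape_seasons, so anything passed there positionally is treated as the format. A hypothetical caller-side check (normalize_data_format is not a hockey_scraper function) that rejects non-strings and unknown formats; accepting either letter case is an assumption of this sketch.

```python
VALID_FORMATS = ("csv", "pandas")


def normalize_data_format(data_format: object = "csv") -> str:
    """Return data_format lower-cased, rejecting non-strings and unknown formats (hypothetical)."""
    if not isinstance(data_format, str):
        # e.g. a bool passed positionally where data_format was expected
        raise TypeError(f"data_format must be a string, got {type(data_format).__name__}")
    fmt = data_format.lower()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"data_format must be one of {VALID_FORMATS}, got {data_format!r}")
    return fmt


print(normalize_data_format("Pandas"))  # pandas
```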

hockey_scraper.nwhl.scrape_functions.scrape_list_of_games(games)

Scrape an arbitrary list of games given their game IDs.

Parameters: games – list of game IDs to scrape
Returns: DataFrame of play-by-play (pbp) info

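scrape_list_of_games combines many games into a single play-by-play table, and print_errors reports problems encountered while scraping. The sketch below shows one way such a loop could be structured (keep going on failure, collect failed IDs, tag each row with its game ID); combine_games, scrape_game, and report_failures are hypothetical names using plain dicts instead of DataFrames, and the library's own error handling may differ.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]


def combine_games(game_ids: List[int],
                  scrape_game: Callable[[int], List[Event]]) -> Tuple[List[Event], List[int]]:
    """Scrape each game, tag its events with the game ID, and collect IDs that failed."""
    rows: List[Event] = []
    failed: List[int] = []
    for game_id in game_ids:
        try:
            events = scrape_game(game_id)
        except Exception:
            # Record the failure and move on so one bad game doesn't stop the batch
            failed.append(game_id)
            continue
        # Add the game ID to each event so rows from different games stay distinguishable
        rows.extend({"game_id": game_id, **event} for event in events)
    return rows, failed


def report_failures(failed: List[int]) -> None:
    """Print the game IDs that could not be scraped (hypothetical stand-in for an error report)."""
    if failed:
        print("Games that could not be scraped:", ", ".join(str(g) for g in failed))
```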
hockey_scraper.nwhl.scrape_functions.scrape_seasons(seasons, data_format='csv', rescrape=False, docs_dir=None)

Scrape all games in the given list of seasons.

Parameters:
  • seasons – list of seasons, each referred to by the first of the two years it spans (e.g. 2016 for 2016-2017)
  • data_format – format you want the data in: csv or pandas (csv is default)
  • rescrape – whether to rescrape pages that were already scraped. Only applies if you supply a docs_dir. (default is False)
  • docs_dir – directory that either contains previously scraped docs or one that you want them to be deposited in after scraping. (default is None)
Returns:

Dictionary with DataFrames and errors or None

Html Schedule

Json PBP