ncaa_stats_py

Welcome!

This is the official docs page for the ncaa_stats_py Python package.

Basic Setup

How to Install

This package is available through the pip package manager and can be installed with one of the following commands in your terminal/shell:

pip install ncaa_stats_py

OR

python -m pip install ncaa_stats_py

If you are using Linux or macOS, you may need to specify python3 when installing, as shown below:

python3 -m pip install ncaa_stats_py

Alternatively, ncaa_stats_py can be installed directly from its GitHub repository with one of the following pip commands:

pip install git+https://github.com/armstjc/ncaa_stats_py

OR

python -m pip install git+https://github.com/armstjc/ncaa_stats_py

OR

python3 -m pip install git+https://github.com/armstjc/ncaa_stats_py
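After running one of the commands above, a quick way to confirm that the install worked is to ask Python which version it can see. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of ncaa_stats_py.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "ncaa_stats_py") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # pip has not installed this package into the current environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```

If this prints `None`, the package was most likely installed into a different Python environment than the one running the script.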

How to Use

ncaa_stats_py sets itself apart by doing the following whenever it retrieves data:

  1. Automatically caching any data that has already been parsed.
  2. Automatically enforcing a 5-second sleep timer on every HTML call, so that calling functions from this package won't get your IP address banned (you do not need to add your own sleep timers when calling this package's functions in a loop).
  3. Automatically refreshing cached data that hasn't been refreshed in a while.
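The three behaviors above can be pictured with a small, self-contained sketch. This is not the package's actual implementation: the function name `cached_fetch`, the JSON cache files, the `fetch` callable, and the one-day expiry are all assumptions made for illustration. Only the 5-second pause comes from the list above.

```python
import json
import time
from pathlib import Path
from typing import Callable


def cached_fetch(
    key: str,
    fetch: Callable[[], dict],
    cache_dir: Path,
    max_age_seconds: float = 86_400.0,  # hypothetical expiry: one day
    delay_seconds: float = 5.0,  # the pause described in item 2
) -> dict:
    """Return cached data for `key`, fetching again only when it is stale."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{key}.json"

    # Items 1 and 3: reuse parsed data unless the cached copy is too old.
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age_seconds:
            return json.loads(cache_file.read_text(encoding="utf-8"))

    # Item 2: pause before every real request so loops stay polite.
    time.sleep(delay_seconds)
    data = fetch()
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

In this sketch, calling `cached_fetch` twice with the same `key` runs `fetch` (and the pause) only once. ncaa_stats_py does this kind of work for you, so you do not need to write anything like it yourself.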

For example, the following code will work as-is, and the second loop will load the team rosters much faster than the first because the data is cached on the device where you're running this code.

from timeit import default_timer as timer

from ncaa_stats_py.baseball import (
    get_baseball_team_roster,
    get_baseball_teams
)

start_time = timer()

# Loads in a table with every DI NCAA baseball team in the 2024 season.
# If this is the first time you run this script,
# it may take some time to repopulate the NCAA baseball team information data.

teams_df = get_baseball_teams(season=2024, level="I")

end_time = timer()

time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Gets 5 random D1 teams from 2024
teams_df = teams_df.sample(5)
print(teams_df)
print()

# Let's send this to a list to make the loop slightly faster
team_ids_list = teams_df["team_id"].to_list()

# First loop
# If the data isn't cached, it should take 35-40 seconds to do this loop
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()

time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Second loop
# Because the data has been parsed and cached,
# this shouldn't take that long to loop through
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

Dependencies

ncaa_stats_py is dependent on the following python packages:

  • beautifulsoup4: To assist with parsing HTML data.
  • lxml: To work with beautifulsoup4 in assisting with parsing HTML data.
  • pandas: For DataFrame creation within package functions.
  • pytz: Used to attach timezone information to any date or datetime objects encountered by this package.
  • requests: Used to make HTTPS requests.
  • tqdm: Used to show progress bars for actions in functions that are known to take minutes to load.
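pip normally installs these dependencies for you. If an import fails anyway, the sketch below shows one way to see which of the listed packages Python cannot find. The helper name `missing_dependencies` is hypothetical, and the mapping relies on `beautifulsoup4` being imported under the module name `bs4`.

```python
from importlib.util import find_spec
from typing import List

# Maps each package listed above to the module name used to import it.
DEPENDENCY_MODULES = {
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "pandas": "pandas",
    "pytz": "pytz",
    "requests": "requests",
    "tqdm": "tqdm",
}


def missing_dependencies() -> List[str]:
    """Return the listed packages that cannot be found in this environment."""
    return [
        package
        for package, module in DEPENDENCY_MODULES.items()
        if find_spec(module) is None
    ]


# An empty list means every listed dependency can be imported.
print(missing_dependencies())
```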

License

This package is licensed under the MIT license. You can view the package's license at https://github.com/armstjc/ncaa_stats_py/blob/main/LICENSE.

# Raw docstring, so the `\n` escapes in the example below are kept literally.
r"""
# Welcome!
This is the official docs page for the `ncaa_stats_py` Python package.

# Basic Setup

## How to Install

This package is available through the
[`pip` package manager](https://en.wikipedia.org/wiki/Pip_(package_manager)),
and can be installed with one of the following commands
in your terminal/shell:
```
pip install ncaa_stats_py
```
OR
```
python -m pip install ncaa_stats_py
```

If you are using Linux or macOS,
you may need to specify `python3` when installing, as shown below:
```
python3 -m pip install ncaa_stats_py
```

Alternatively, `ncaa_stats_py` can be installed directly from
its GitHub repository with one of the following pip commands:
```
pip install git+https://github.com/armstjc/ncaa_stats_py
```
OR
```
python -m pip install git+https://github.com/armstjc/ncaa_stats_py
```
OR
```
python3 -m pip install git+https://github.com/armstjc/ncaa_stats_py
```

## How to Use
`ncaa_stats_py` sets itself apart by doing the following
whenever it retrieves data:
1. Automatically caching any data that has already been parsed.
2. Automatically enforcing a 5-second sleep timer on every HTML call,
    so that calling functions from this package
    won't get your IP address banned
    (you do not *need* to add your own sleep timers
    when calling this package's functions in a loop).
3. Automatically refreshing cached data
    that hasn't been refreshed in a while.

For example, the following code will work as-is,
and the second loop will load the team rosters
much faster than the first because the data is cached
on the device where you're running this code.

```python
from timeit import default_timer as timer

from ncaa_stats_py.baseball import (
    get_baseball_team_roster,
    get_baseball_teams
)

start_time = timer()

# Loads in a table with every DI NCAA baseball team in the 2024 season.
# If this is the first time you run this script,
# it may take some time to repopulate the NCAA baseball team information data.

teams_df = get_baseball_teams(season=2024, level="I")

end_time = timer()

time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Gets 5 random D1 teams from 2024
teams_df = teams_df.sample(5)
print(teams_df)
print()

# Let's send this to a list to make the loop slightly faster
team_ids_list = teams_df["team_id"].to_list()

# First loop
# If the data isn't cached, it should take 35-40 seconds to do this loop
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()

time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

# Second loop
# Because the data has been parsed and cached,
# this shouldn't take that long to loop through
start_time = timer()

for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)

end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:.3f} seconds.\n\n")

```

# Dependencies

`ncaa_stats_py` is dependent on the following Python packages:
- [`beautifulsoup4`](https://www.crummy.com/software/BeautifulSoup/):
    To assist with parsing HTML data.
- [`lxml`](https://lxml.de/): To work with `beautifulsoup4`
    in assisting with parsing HTML data.
- [`pandas`](https://github.com/pandas-dev/pandas):
    For `DataFrame` creation within package functions.
- [`pytz`](https://pythonhosted.org/pytz/):
    Used to attach timezone information to any date or datetime objects
    encountered by this package.
- [`requests`](https://github.com/psf/requests): Used to make HTTPS requests.
- [`tqdm`](https://github.com/tqdm/tqdm):
    Used to show progress bars for actions in functions
    that are known to take minutes to load.

# License

This package is licensed under the MIT license.
You can view the package's license
[here](https://github.com/armstjc/ncaa_stats_py/blob/main/LICENSE).

"""

from ncaa_stats_py.baseball import *  # noqa: F403
from ncaa_stats_py.basketball import *  # noqa: F403
from ncaa_stats_py.field_hockey import *  # noqa: F403
from ncaa_stats_py.hockey import *  # noqa: F403
from ncaa_stats_py.softball import *  # noqa: F403