ncaa_stats_py
Welcome!
This is the official docs page for the ncaa_stats_py Python package.
Basic Setup
How to Install
This package is available through the pip package manager,
and can be installed through one of the following commands
in your terminal/shell:
pip install ncaa_stats_py
OR
python -m pip install ncaa_stats_py
If you are using a Linux/Mac instance,
you may need to specify python3
when installing, as shown below:
python3 -m pip install ncaa_stats_py
Alternatively, ncaa_stats_py can be installed from
this GitHub repository with one of the following commands through pip:
pip install git+https://github.com/armstjc/ncaa_stats_py
OR
python -m pip install git+https://github.com/armstjc/ncaa_stats_py
OR
python3 -m pip install git+https://github.com/armstjc/ncaa_stats_py
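After running one of the install commands above, it can be useful to confirm that the package is visible to the Python interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper names is_installed and installed_version are hypothetical and are not part of ncaa_stats_py.

```python
from importlib import metadata, util


def is_installed(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    return util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    name = "ncaa_stats_py"
    if is_installed(name):
        print(f"{name} is installed (version: {installed_version(name)})")
    else:
        print(f"{name} was not found; try: python -m pip install {name}")
```

If the check reports that the package was not found even though pip succeeded, pip and python may point at different interpreters, which is why the python -m pip form of the command is often the safer choice.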
How to Use
ncaa_stats_py separates itself by doing the following
things when attempting to get data:
- Automatically caching any data that is already parsed
- Automatically forcing a 5-second sleep timer for any HTML call, so that function calls from this package won't get you IP banned (you do not need to add your own sleep timers when calling this package's functions in a loop).
- Automatically refreshing any cached data if the data hasn't been refreshed in a while.
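The three behaviors above can be illustrated with a small, generic sketch. This is not the package's actual implementation: the function cached_fetch, its parameters, the JSON cache files, and the one-day max_age default are all assumptions made for illustration, and the fetch argument stands in for whatever slow request-and-parse step is being wrapped.

```python
import json
import time
from pathlib import Path
from tempfile import TemporaryDirectory


def cached_fetch(key, fetch, cache_dir, delay=5.0, max_age=86_400.0,
                 sleep=time.sleep):
    """Return data for `key`, reading from disk when a fresh copy exists.

    Hypothetical helper for illustration; not part of ncaa_stats_py.
    """
    cache_file = Path(cache_dir) / f"{key}.json"

    # 1. Reuse cached data if the file exists and is younger than `max_age`.
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age:
            return json.loads(cache_file.read_text())

    # 2. Otherwise pause before the slow request, to avoid hammering the server.
    sleep(delay)
    data = fetch(key)

    # 3. Save the result so later calls skip both the request and the pause.
    cache_file.write_text(json.dumps(data))
    return data


if __name__ == "__main__":
    def fake_fetch(team_id):
        # Stand-in for a real HTTP request and HTML parse.
        return {"team_id": team_id, "players": ["A", "B"]}

    with TemporaryDirectory() as tmp:
        for attempt in ("first", "second"):
            start = time.perf_counter()
            cached_fetch(1234, fake_fetch, tmp, delay=0.2)
            elapsed = time.perf_counter() - start
            print(f"{attempt} call: {elapsed:.3f} seconds")
```

In this sketch only a cache miss or a stale file triggers the pause, so repeated calls for the same key return almost immediately until the cached copy is older than max_age.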
For example, the following script works as-is with ncaa_stats_py. Its second loop loads the same team rosters much faster than the first, because the data is already cached on the device running the code.
from timeit import default_timer as timer
from ncaa_stats_py.baseball import (
    get_baseball_team_roster,
    get_baseball_teams
)
start_time = timer()
# Loads in a table with every DI NCAA baseball team in the 2024 season.
# If this is the first time you run this script,
# it may take some time to repopulate the NCAA baseball team information data.
teams_df = get_baseball_teams(season=2024, level="I")
end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:03f} seconds.\n\n")
# Gets 5 random D1 teams from 2024
teams_df = teams_df.sample(5)
print(teams_df)
print()
# Let's send this to a list to make the loop slightly faster
team_ids_list = teams_df["team_id"].to_list()
# First loop
# If the data isn't cached, it should take 35-40 seconds to do this loop
start_time = timer()
for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)
end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:03f} seconds.\n\n")
# Second loop
# Because the data has been parsed and cached,
# this shouldn't take that long to loop through
start_time = timer()
for t_id in team_ids_list:
    print(f"On Team ID: {t_id}")
    df = get_baseball_team_roster(team_id=t_id)
    # print(df)
end_time = timer()
time_elapsed = end_time - start_time
print(f"Elapsed time: {time_elapsed:03f} seconds.\n\n")
Dependencies
ncaa_stats_py is dependent on the following Python packages:
- beautifulsoup4: To assist with parsing HTML data.
- lxml: To work with beautifulsoup4 in assisting with parsing HTML data.
- pandas: For DataFrame creation within package functions.
- pytz: Used to attach timezone information for any date/date time objects encountered by this package.
- requests: Used to make HTTPS requests.
- tqdm: Used to show progress bars for actions in functions that are known to take minutes to load.
License
This package is licensed under the MIT license. You can view the package's license at https://github.com/armstjc/ncaa_stats_py/blob/main/LICENSE.