lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics from the Lahman database/Baseball Databank.
A makefile and SQL scripts are provided that can create a database from Lahman's CSV files. Ideally, this means that as long as new releases continue (and the structure of the releases is maintained), an up-to-date database can be created. I have also done my best to normalize the data, incorporate constraints, and correct errors I've found.
How to use
Using either `export` or `env`, set the `LAHMANLITE_CSV_DIR` environment variable to the directory containing Lahman data, then run `make`. This will generate two files:
- `lahman-raw.db`, a straight import of the CSV data into SQLite.
- `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key constraints, and additional schema modifications.
If you only want the raw data, run `make lahman-raw.db` instead.
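If you want to script the build rather than type the commands, the steps above can be sketched in Python. This is only an illustration: the helper names (`build_env`, `build_command`, `run_build`) are hypothetical and not part of this project, and it assumes `make` is on your `PATH` and that `repo_dir` points at a checkout of this repository.

```python
import os
import subprocess


def build_env(csv_dir):
    """Return a copy of the current environment with LAHMANLITE_CSV_DIR set."""
    env = dict(os.environ)
    env["LAHMANLITE_CSV_DIR"] = csv_dir
    return env


def build_command(target=None):
    """Return the make invocation; with no target, make builds its default goal."""
    return ["make"] if target is None else ["make", target]


def run_build(csv_dir, target=None, repo_dir="."):
    """Run make inside repo_dir; raises CalledProcessError if the build fails."""
    subprocess.run(
        build_command(target),
        cwd=repo_dir,
        env=build_env(csv_dir),
        check=True,
    )
```

For example, `run_build("/path/to/lahman/csv", target="lahman-raw.db")` would request only the raw import, where the path is a placeholder for wherever your Lahman CSV files live.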
Corrections
Many of the corrections are simple in nature, like:
- correcting obvious typos
- changing empty cells to NULL
- deleting duplicated data
See the `sql` directory to view the exact SQL statements run for each table.
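To give a feel for the three kinds of corrections listed above, here is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database. The `demo_players` table and its rows are made up for illustration; they are not taken from the Lahman data, and the statements are not the ones this project runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table, not one of the project's actual tables.
    CREATE TABLE demo_players (player_id TEXT, birth_country TEXT, birth_state TEXT);
    INSERT INTO demo_players VALUES
        ('aaa01', 'USA', 'NY'),
        ('aaa01', 'USA', 'NY'),   -- exact duplicate row
        ('bbb01', 'UAS', ''),     -- typo and an empty cell
        ('ccc01', 'CAN', 'ON');

    -- Correct an obvious typo.
    UPDATE demo_players SET birth_country = 'USA' WHERE birth_country = 'UAS';

    -- Change empty cells to NULL.
    UPDATE demo_players SET birth_state = NULL WHERE birth_state = '';

    -- Delete duplicated rows, keeping the first copy of each.
    DELETE FROM demo_players
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM demo_players
        GROUP BY player_id, birth_country, birth_state
    );
""")

rows = conn.execute(
    "SELECT player_id, birth_country, birth_state FROM demo_players ORDER BY player_id"
).fetchall()
```

After these statements, `rows` holds one row per player, with the typo fixed and the empty state stored as NULL (`None` in Python).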