lahmanlite
lahmanlite is a project for creating a SQLite database of baseball statistics
from the Lahman database/Baseball Databank.
A Makefile and SQL scripts are provided that create a database from Lahman's CSV files. Ideally, this means that as long as new releases continue (and the structure of the releases is maintained), an up-to-date database can be created. I have also done my best to normalize the data, incorporate constraints, and correct errors I've found.
How to use
Using either export or env, set the LAHMANLITE_CSV_DIR environment
variable to the directory containing Lahman data, then run make. This will
generate two files:
- lahman-raw.db, a straight import of the CSV data into SQLite.
- lahman.db, a modified version of lahman-raw.db with data corrections, key constraints, and additional schema modifications.
If you only want the raw data, run make lahman-raw.db instead.
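For example, either of the following should work from the repository root. The path is a placeholder for wherever the Lahman CSV files were unpacked:

```shell
# Option 1: export the variable for the rest of the shell session, then build both databases
export LAHMANLITE_CSV_DIR=/path/to/lahman/csv
make

# Option 2: set the variable only for a single make invocation using env
env LAHMANLITE_CSV_DIR=/path/to/lahman/csv make

# Build only the raw import
env LAHMANLITE_CSV_DIR=/path/to/lahman/csv make lahman-raw.db
```

With export, the variable stays set for later commands in the same shell. With env, it applies only to that one make run.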
Corrections
Many of the corrections are simple, such as:
- correcting obvious typos
- changing empty cells to NULL
- deleting duplicated data
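As a rough illustration of these three kinds of fixes, here is a sketch run through the sqlite3 command-line shell against a throwaway in-memory database. It is not taken from the project's scripts: the table batting_demo, its columns, and its rows are hypothetical. Empty cells need explicit handling because SQLite's CSV import stores an empty field as an empty string, not NULL.

```shell
# Hypothetical demo table; the real tables and fixes live in the sql directory.
result=$(sqlite3 :memory: <<'SQL'
CREATE TABLE batting_demo (playerID TEXT, yearID INTEGER, teamID TEXT, HR TEXT);
INSERT INTO batting_demo VALUES
  ('smithjo01', 1901, 'BOS', '5'),
  ('smithjo01', 1901, 'BOS', '5'),   -- exact duplicate of the previous row
  ('jonesbo01', 1902, 'BSO', ''),    -- typo in teamID and an empty HR cell
  ('jonesbo01', 1903, 'BOS', '12');

-- Correct an obvious typo.
UPDATE batting_demo SET teamID = 'BOS' WHERE teamID = 'BSO';

-- Change empty cells to NULL.
UPDATE batting_demo SET HR = NULL WHERE HR = '';

-- Delete duplicated data, keeping the first imported copy of each row.
DELETE FROM batting_demo
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM batting_demo
  GROUP BY playerID, yearID, teamID, HR
);

-- Row count, non-NULL HR count, rows with the corrected team code.
SELECT COUNT(*), COUNT(HR), SUM(teamID = 'BOS') FROM batting_demo;
SQL
)
echo "$result"
```

Grouping by every column removes only exact duplicates. A real table would more likely group by its natural key (for example player, year, and stint) after deciding which copy is correct.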
See the sql directory to view the exact SQL statements run for each table.