# lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics
from [the Lahman database/Baseball Databank](https://seanlahman.com).
A makefile and SQL scripts are provided that can create a database from
Lahman's CSV files. Ideally, this means that as long as new releases continue
(and the structure of the releases is maintained), an up-to-date database can
be created. I have also done my best to normalize the data, incorporate
constraints, and correct errors I've found.
## How to use
Set the `LAHMANLITE_CSV_DIR` environment variable (using either `export` or
`env`) to the directory containing Lahman's CSV files, then run `make`. This
generates two files:
* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key
  constraints, and additional schema modifications.

If you only want the raw data, run `make lahman-raw.db` instead.
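
For scripted builds, the steps above can be driven from Python. This is only a sketch: the helper names `build_command` and `build` are hypothetical and not part of this project, and the CSV path in the usage comment is a placeholder. It sets `LAHMANLITE_CSV_DIR` for the child process, which has the same effect as `env LAHMANLITE_CSV_DIR=... make`.

```python
import os
import subprocess


def build_command(csv_dir, raw_only=False):
    """Return (argv, env) for a build; hypothetical helper, not part of lahmanlite."""
    # Copy the current environment and add the variable the makefile reads.
    env = dict(os.environ, LAHMANLITE_CSV_DIR=csv_dir)
    # Plain `make` builds both databases; the explicit target builds only the raw one.
    argv = ["make"]
    if raw_only:
        argv.append("lahman-raw.db")
    return argv, env


def build(csv_dir, raw_only=False):
    """Run make in the current directory (assumed to be the repository root)."""
    argv, env = build_command(csv_dir, raw_only)
    subprocess.run(argv, env=env, check=True)


# Usage (placeholder path):
# build("/path/to/lahman/csv")                 # lahman-raw.db and lahman.db
# build("/path/to/lahman/csv", raw_only=True)  # lahman-raw.db only
```
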
### Corrections
Many of the corrections are simple in nature, like:
* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data

See the `sql` directory to view the exact SQL statements run for each table.
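
To illustrate the kind of statements involved, here is a minimal sketch run against an in-memory SQLite database. The table `demo`, its columns, and its rows are made up for illustration and do not come from the Lahman data. The statements are not copied from the `sql` directory, which remains the authoritative source.

```python
import sqlite3

# In-memory database with a hypothetical table; not the real Lahman schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (player_id TEXT, birth_city TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [
        ("player1", "Springfield"),
        ("player2", ""),             # empty cell from the CSV import
        ("player1", "Springfield"),  # exact duplicate of the first row
    ],
)

# Change empty cells to NULL.
conn.execute("UPDATE demo SET birth_city = NULL WHERE birth_city = ''")

# Delete duplicated rows, keeping the copy with the lowest rowid.
# GROUP BY treats NULLs as equal, so rows with NULL cells are also deduplicated.
conn.execute(
    """
    DELETE FROM demo
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM demo GROUP BY player_id, birth_city
    )
    """
)

rows = conn.execute(
    "SELECT player_id, birth_city FROM demo ORDER BY player_id"
).fetchall()
print(rows)
```

The empty string and NULL are distinct values in SQLite, so the first correction matters for queries that use `IS NULL` or aggregate functions that skip NULLs.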