add readme

filifa 2024-05-04 22:22:38 -05:00
parent 3b9349d4e1
commit 9f21bbbaae
1 changed files with 27 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,27 @@
# lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics
from [the Lahman database/Baseball Databank](https://seanlahman.com).
The provided makefile and SQL scripts build the database from Lahman's CSV
files. Ideally, as long as new releases continue and keep the same structure,
an up-to-date database can be built from each one. I have also done my best to
normalize the data, add constraints, and correct errors I've found.
## How to use
Set the `LAHMANLITE_CSV_DIR` environment variable (using either `export` or
`env`) to the directory containing Lahman's CSV files, then run `make`. This
generates two files:
* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key
constraints, and additional schema modifications.
If you only want the raw data, run `make lahman-raw.db` instead.
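
As a minimal sketch of the steps above, assuming the CSV files were unpacked to
the hypothetical path `$HOME/lahman/csv` (substitute your own location) and
that the commands are run from the repository root:

```shell
# Option 1: export the variable for the current shell session, then build
# both lahman-raw.db and lahman.db.
# NOTE: the path below is a placeholder, not something the project prescribes.
export LAHMANLITE_CSV_DIR="$HOME/lahman/csv"
make

# Option 2: set the variable for a single command with env.
# Here only the raw import is built.
env LAHMANLITE_CSV_DIR="$HOME/lahman/csv" make lahman-raw.db
```

With `export`, the variable stays set for later `make` runs in the same shell;
with `env`, it applies only to that one invocation.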
### Corrections
Many of the corrections are simple, such as:
* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data
See the `sql` directory to view the exact SQL statements run for each table.
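
To give a rough idea of what two of these correction types can look like in
SQLite, here is a small sketch run against a throwaway in-memory database. The
table `demo_people` and its columns are made up for illustration and are not
taken from the project's scripts; it assumes the `sqlite3` command-line tool is
installed.

```shell
# Illustration only: demo_people is a hypothetical table, not the real schema.
result=$(sqlite3 :memory: <<'SQL'
CREATE TABLE demo_people (player_id TEXT, birth_city TEXT);
INSERT INTO demo_people VALUES
  ('a01', 'Mobile'),
  ('a01', 'Mobile'),  -- exact duplicate of the row above
  ('b02', '');        -- empty cell, as a CSV import would leave it

-- Change empty cells to NULL.
UPDATE demo_people SET birth_city = NULL WHERE birth_city = '';

-- Delete duplicated rows, keeping the first copy of each.
DELETE FROM demo_people
WHERE rowid NOT IN (
  SELECT MIN(rowid) FROM demo_people GROUP BY player_id, birth_city
);

-- Show the cleaned rows, printing the text NULL where the value is missing.
SELECT player_id, IFNULL(birth_city, 'NULL')
FROM demo_people
ORDER BY player_id;
SQL
)
printf '%s\n' "$result"
# prints:
# a01|Mobile
# b02|NULL
```

The real statements differ per table, so the `sql` directory remains the
reference for what is actually changed.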