add readme
parent 3b9349d4e1
commit 9f21bbbaae
@@ -0,0 +1,27 @@
# lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics
from [the Lahman database/Baseball Databank](https://seanlahman.com).

A makefile and SQL scripts are provided that can create a database from
Lahman's CSV files. Ideally, this means that as long as new releases continue
(and the structure of the releases is maintained), an up-to-date database can
be created. I have also done my best to normalize the data, incorporate
constraints, and correct errors I've found.

## How to use
Using either `export` or `env`, set the `LAHMANLITE_CSV_DIR` environment
variable to the directory containing Lahman's CSV files, then run `make`. This
will generate two files:
* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key
  constraints, and additional schema modifications.

If you only want the raw data, run `make lahman-raw.db` instead.
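
To give a rough idea of what "a straight import of the CSV data" means, here is a minimal Python sketch. It is not how this project builds `lahman-raw.db` (that is done by the makefile and SQL scripts); the `import_csv` helper, the `batting_sample` table, and the sample rows are all made up for illustration. Every column is loaded as untyped `TEXT`, and empty cells stay empty strings.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_file):
    """Load one CSV file into a new table whose columns are all TEXT."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names from the header so they are valid SQL identifiers.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader)
    conn.commit()


# Hypothetical sample standing in for one CSV file; not real Lahman data.
sample = io.StringIO("playerID,yearID,HR\nexampl01,2001,12\nexampl02,2002,\n")
conn = sqlite3.connect(":memory:")
import_csv(conn, "batting_sample", sample)
rows = conn.execute("SELECT * FROM batting_sample ORDER BY playerID").fetchall()
```

Note that the missing `HR` value in the second row is stored as `''` rather than `NULL`; a raw import leaves cleanup like that to a later step.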

### Corrections
Many of the corrections are simple, such as:
* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data

See the `sql` directory to view the exact SQL statements run for each table.
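
As a rough sketch of what two of these corrections can look like in SQLite (the real statements live in the `sql` directory and may differ), the following Python snippet runs them against an in-memory database. The `awards_sample` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; not one of the project's real tables.
conn.execute("CREATE TABLE awards_sample (playerID TEXT, yearID TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO awards_sample VALUES (?, ?, ?)",
    [
        ("exampl01", "2001", ""),     # empty cell that should become NULL
        ("exampl02", "2002", "tie"),
        ("exampl02", "2002", "tie"),  # exact duplicate of the previous row
    ],
)

# Change empty cells to NULL: NULLIF returns NULL when its arguments are equal.
conn.execute("UPDATE awards_sample SET notes = NULLIF(notes, '')")

# Delete duplicated data: keep only the lowest rowid among identical rows.
conn.execute(
    """
    DELETE FROM awards_sample
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM awards_sample GROUP BY playerID, yearID, notes
    )
    """
)
conn.commit()

cleaned = conn.execute(
    "SELECT playerID, yearID, notes FROM awards_sample ORDER BY playerID"
).fetchall()
```

Typo fixes are usually plain `UPDATE ... WHERE` statements targeting a specific row, so they are not shown here.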