add readme
parent
3b9349d4e1
commit
9f21bbbaae
@@ -0,0 +1,27 @@
# lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics
from [the Lahman database/Baseball Databank](https://seanlahman.com).

A makefile and SQL scripts are provided that can create a database from
Lahman's CSV files. Ideally, this means that as long as new releases continue
(and the structure of the releases is maintained), an up-to-date database can
be created. I have also done my best to normalize the data, incorporate
constraints, and correct errors I've found.

## How to use
Using either `export` or `env`, set the `LAHMANLITE_CSV_DIR` environment
variable to the directory containing Lahman data, then run `make`. This will
generate two files:
* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key
constraints, and additional schema modifications.

If you only want the raw data, run `make lahman-raw.db` instead.
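
Once the build finishes, either file can be opened with any SQLite client. As a minimal sketch, assuming Python's standard `sqlite3` module, the following lists the tables in a database file. The helper name `list_tables` is hypothetical and not part of this project. The file is opened read-only so that a missing database raises an error instead of being silently created as an empty file.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of the user tables in the SQLite file at db_path."""
    # Build a read-only file: URI so a missing file is an error, not a new empty DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip internal tables
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("lahman.db")` returns the table names of the corrected database, assuming `make` was run in the current directory.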

### Corrections
Many of the corrections are simple in nature, like:
* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data

See the `sql` directory to view the exact SQL statements run for each table.
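
To give a flavor of the simpler corrections listed above, here is a minimal sketch using Python's standard `sqlite3` module against an in-memory database. The table `demo` and its columns are hypothetical stand-ins, not this project's schema, and the statements are illustrative rather than copied from the `sql` directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for a raw CSV import, where every
# value arrives as text and missing cells arrive as empty strings.
conn.execute("CREATE TABLE demo (player_id TEXT, year TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        ("p1", "1999", "BOS"),
        ("p1", "1999", "BOS"),  # exact duplicate row
        ("p2", "2000", ""),     # empty cell
    ],
)

# Change empty cells to NULL.
conn.execute("UPDATE demo SET team = NULL WHERE team = ''")

# Delete duplicated rows, keeping the first copy (lowest rowid) of each.
conn.execute(
    "DELETE FROM demo WHERE rowid NOT IN ("
    "SELECT MIN(rowid) FROM demo GROUP BY player_id, year, team)"
)

rows = conn.execute(
    "SELECT player_id, year, team FROM demo ORDER BY player_id"
).fetchall()
conn.close()
```

After these statements, `rows` holds two rows, and the empty `team` cell is stored as NULL (`None` in Python).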