# lahmanlite

`lahmanlite` is a project for creating a SQLite database of baseball statistics from [the Lahman database/Baseball Databank](https://seanlahman.com). The provided makefile and SQL scripts build the database directly from Lahman's CSV files. Ideally, this means that as long as new releases continue (and keep the same structure), an up-to-date database can always be created.

I have also done my best to normalize the data, add constraints, and correct the errors I've found.

## How to use

Set the `LAHMANLITE_CSV_DIR` environment variable to the directory containing the Lahman CSV files, using either `export` or `env`, then run `make`. This generates two files:

* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key constraints, and additional schema changes.

If you only want the raw data, run `make lahman-raw.db` instead.

### Corrections

Many of the corrections are simple, such as:

* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data

See the `sql` directory for the exact SQL statements run against each table.
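To make the three kinds of corrections listed above concrete, here is a minimal sketch using Python's built-in `sqlite3` module on an in-memory database. It is not taken from the `sql` directory: the table `demo_batting`, its columns, and the player IDs are hypothetical and exist only for this illustration.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table with the kinds of problems a raw CSV import can have."""
    conn.execute("CREATE TABLE demo_batting (player_id TEXT, year_id INTEGER, hr TEXT)")
    rows = [
        ("smithjo01", 1920, "12"),
        ("smtihjo01", 1921, "15"),  # hypothetical typo in the player ID
        ("doeja01", 1920, ""),      # empty cell left over from the CSV
        ("doeja01", 1920, ""),      # exact duplicate of the previous row
    ]
    conn.executemany("INSERT INTO demo_batting VALUES (?, ?, ?)", rows)
    conn.commit()


def apply_example_corrections(conn: sqlite3.Connection) -> None:
    """Apply illustrative corrections to the hypothetical demo_batting table."""
    cur = conn.cursor()
    # 1. Correct an obvious typo (the IDs here are invented for the example).
    cur.execute(
        "UPDATE demo_batting SET player_id = 'smithjo01' WHERE player_id = 'smtihjo01'"
    )
    # 2. Change empty cells to NULL.
    cur.execute("UPDATE demo_batting SET hr = NULL WHERE hr = ''")
    # 3. Delete duplicated rows, keeping the copy with the lowest rowid.
    #    GROUP BY treats NULLs as equal, so rows that are identical after
    #    step 2 collapse into a single group.
    cur.execute(
        """
        DELETE FROM demo_batting
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM demo_batting
            GROUP BY player_id, year_id, hr
        )
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    apply_example_corrections(conn)
    for row in conn.execute("SELECT player_id, year_id, hr FROM demo_batting ORDER BY rowid"):
        print(row)
```

The empty-to-NULL step runs before de-duplication so that rows differing only in how a missing value was written end up identical and are removed together.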