Create a SQLite database from Lahman data
filifa 9f21bbbaae add readme 2024-05-04 22:22:38 -05:00

lahmanlite

lahmanlite is a project for creating a SQLite database of baseball statistics from the Lahman database/Baseball Databank.

A Makefile and SQL scripts are provided that create a database from Lahman's CSV files. Ideally, as long as new releases continue (and keep the same structure), an up-to-date database can be built from them. I have also done my best to normalize the data, incorporate constraints, and correct errors I've found.

How to use

Using either export or env, set the LAHMANLITE_CSV_DIR environment variable to the directory containing Lahman data, then run make. This will generate two files:

  • lahman-raw.db, a straight import of the CSV data into SQLite.
  • lahman.db, a modified version of lahman-raw.db with data corrections, key constraints, and additional schema modifications.

If you only want the raw data, run make lahman-raw.db instead.
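
If you would rather drive the build from a script, the same steps can be sketched in Python. This is only an illustration: the helper names (build_env, make_command, build) are hypothetical and not part of lahmanlite, and the sketch assumes it is run from the repository root, where the Makefile lives.

```python
import os
import subprocess


def build_env(csv_dir):
    """Return a copy of the current environment with LAHMANLITE_CSV_DIR set."""
    env = dict(os.environ)
    env["LAHMANLITE_CSV_DIR"] = str(csv_dir)
    return env


def make_command(target=None):
    """Return the make invocation; with no target, make builds its default goal."""
    return ["make"] if target is None else ["make", target]


def build(csv_dir, target=None):
    """Run make in the current directory (assumed to be the repository root)."""
    subprocess.run(make_command(target), env=build_env(csv_dir), check=True)
```

Calling build("/path/to/lahman/csv") is then roughly the same as running env LAHMANLITE_CSV_DIR=/path/to/lahman/csv make in a shell, and build("/path/to/lahman/csv", "lahman-raw.db") builds only the raw import. The path is a placeholder for wherever you unpacked the Lahman CSV files.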

Corrections

Many of the corrections are simple, such as:

  • correcting obvious typos
  • changing empty cells to NULL
  • deleting duplicated data

See the sql directory to view the exact SQL statements run for each table.
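
To give a feel for what these three kinds of correction look like in SQLite, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table, columns, and rows are made up for the example; they are not the project's actual schema or statements, which are in the sql directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table with made-up rows, standing in for a CSV import.
conn.executescript("""
    CREATE TABLE people_raw (playerID TEXT, nameLast TEXT, birthCountry TEXT);
    INSERT INTO people_raw VALUES
        ('doejo01',   'Doe',   'USA'),
        ('doejo01',   'Doe',   'USA'),  -- exact duplicate of the row above
        ('roeja01',   'Roe',   ''),     -- empty cell that should be NULL
        ('smithal01', 'Smith', 'UAS');  -- obvious typo for 'USA'
""")

with conn:  # commit all three corrections together
    # 1. Correct an obvious typo.
    conn.execute(
        "UPDATE people_raw SET birthCountry = 'USA' WHERE birthCountry = 'UAS'"
    )
    # 2. Change empty cells to NULL.
    conn.execute(
        "UPDATE people_raw SET birthCountry = NULL WHERE birthCountry = ''"
    )
    # 3. Delete duplicated rows, keeping the first copy (lowest rowid) of each.
    conn.execute("""
        DELETE FROM people_raw
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM people_raw
            GROUP BY playerID, nameLast, birthCountry
        )
    """)

rows = conn.execute(
    "SELECT playerID, nameLast, birthCountry FROM people_raw ORDER BY playerID"
).fetchall()
print(rows)
```

After the three statements run, the table holds one row per player, the typo reads 'USA', and the empty cell comes back as None (SQL NULL) in Python.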