Create a SQLite database from Lahman data
filifa 557e016751 fix small typo 2024-05-05 19:12:55 -05:00
sql fix angels team id 2024-05-05 00:18:57 -05:00
.gitignore remove old data dir 2024-05-04 21:47:09 -05:00
COPYING add license 2024-05-04 22:45:47 -05:00
Makefile remove old file from build steps 2024-05-04 23:01:53 -05:00
README.md fix small typo 2024-05-05 19:12:55 -05:00

lahmanlite

lahmanlite is a project for creating a SQLite database of baseball statistics from the Lahman database/Baseball Databank.

The provided Makefile and SQL scripts build the database from Lahman's CSV files. As long as new releases keep appearing and keep the same structure, the same scripts should produce an up-to-date database. I have also done my best to normalize the data, add constraints, and correct the errors I've found.

How to use

Using either export or env, set the LAHMANLITE_CSV_DIR environment variable to the directory containing Lahman data, then run make. This will generate two files:

  • lahman-raw.db, a straight import of the CSV data into SQLite.
  • lahman.db, a modified version of lahman-raw.db with data corrections, key constraints, and additional schema modifications.

If you only want the raw data, run make lahman-raw.db instead.
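
For scripted builds, the same steps can be driven from Python. This is only a rough sketch, not part of the project: the helper names build_env, make_command, and run_make are hypothetical, and it assumes make is on the PATH and that the script runs from the repository root.

```python
import os
import subprocess


def build_env(csv_dir, base_env=None):
    """Return a copy of the environment with LAHMANLITE_CSV_DIR set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LAHMANLITE_CSV_DIR"] = str(csv_dir)
    return env


def make_command(target=None):
    """Build the argv for make; with no target, make builds its default goal."""
    return ["make"] if target is None else ["make", target]


def run_make(csv_dir, target=None):
    """Run make with LAHMANLITE_CSV_DIR pointing at the Lahman CSV directory."""
    subprocess.run(make_command(target), env=build_env(csv_dir), check=True)
```

With these helpers, run_make("/path/to/lahman/csv") corresponds to a plain make, and run_make("/path/to/lahman/csv", "lahman-raw.db") corresponds to make lahman-raw.db. The path is a placeholder.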

Corrections

Many of the corrections are simple, such as:

  • correcting obvious typos
  • changing empty cells to NULL
  • deleting duplicated data

See the sql directory to view the exact SQL statements run for each table.
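
As a rough illustration of two of these kinds of correction (empty cells and duplicated rows), here is a small sketch using Python's sqlite3 module. It is not taken from the project's scripts: the people table, its columns, and its rows are made up for the example, and the real statements may look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE people (player_id TEXT, birth_city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("a01", "Springfield"), ("b02", ""), ("a01", "Springfield")],
)

# Change empty cells to NULL.
conn.execute("UPDATE people SET birth_city = NULL WHERE birth_city = ''")

# Delete duplicated rows, keeping the first copy (lowest rowid) of each.
conn.execute(
    """
    DELETE FROM people
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM people GROUP BY player_id, birth_city
    )
    """
)

rows = conn.execute(
    "SELECT player_id, birth_city FROM people ORDER BY player_id"
).fetchall()
print(rows)  # [('a01', 'Springfield'), ('b02', None)]
```

The duplicate check groups on every column, so only rows that match exactly are treated as duplicates.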