add readme
parent 3b9349d4e1
commit 9f21bbbaae
			
# lahmanlite
`lahmanlite` is a project for creating a SQLite database of baseball statistics
from [the Lahman database/Baseball Databank](https://seanlahman.com).
			
A makefile and SQL scripts are provided that can create a database from
Lahman's CSV files. Ideally, this means that as long as new releases continue
(and the structure of the releases is maintained), an up-to-date database can
be created. I have also done my best to normalize the data, incorporate
constraints, and correct errors I've found.
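
To illustrate what a straight CSV-to-SQLite import involves, here is a minimal
Python sketch. It is not how this project does it (the project uses a makefile
and SQL scripts); the `import_csv` helper, the table name, and the sample
columns and rows are all made up for illustration.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_file):
    """Load every row of csv_file into `table`, storing all columns as TEXT."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Stand-in for one CSV file; the column names and rows are hypothetical.
sample = io.StringIO("playerID,yearID,HR\nexample01,1901,3\nexample02,1901,\n")
conn = sqlite3.connect(":memory:")
import_csv(conn, "Batting", sample)
raw_rows = conn.execute('SELECT * FROM "Batting" ORDER BY "playerID"').fetchall()
```

In this sketch every value arrives as text and an empty cell arrives as an
empty string rather than NULL, which is the kind of thing a cleanup pass has to
deal with afterwards.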
			
## How to use
Using either `export` or `env`, set the `LAHMANLITE_CSV_DIR` environment
variable to the directory containing Lahman's CSV files, then run `make`. This
will generate two files:
* `lahman-raw.db`, a straight import of the CSV data into SQLite.
* `lahman.db`, a modified version of `lahman-raw.db` with data corrections, key
  constraints, and additional schema modifications.
			
If you only want the raw data, run `make lahman-raw.db` instead.
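
If you would rather drive the build from a script, the steps above can be
sketched in Python roughly as follows. The helper names `build_databases` and
`list_tables` are hypothetical and not part of this project, and the sketch
assumes `make` is on your `PATH` and that it is run from the repository root.

```python
import os
import sqlite3
import subprocess


def build_databases(csv_dir, repo_dir="."):
    """Run `make` in repo_dir with LAHMANLITE_CSV_DIR pointing at csv_dir."""
    # Copy the current environment and add the variable the makefile reads.
    env = dict(os.environ, LAHMANLITE_CSV_DIR=csv_dir)
    subprocess.run(["make"], cwd=repo_dir, env=env, check=True)


def list_tables(db_path):
    """Return the sorted table names found in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `build_databases("/path/to/lahman/csv")` followed by
`list_tables("lahman.db")` would show which tables ended up in the corrected
database; the path here is a placeholder.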
			
### Corrections
Many of the corrections are simple, such as:
* correcting obvious typos
* changing empty cells to NULL
* deleting duplicated data
			
See the `sql` directory to view the exact SQL statements run for each table.
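
As a rough idea of what such statements can look like, the sketch below applies
two of the simple corrections to a toy table using Python's `sqlite3` module.
The `players` table, its columns, and its rows are hypothetical; the statements
actually used by this project are the ones in the `sql` directory.

```python
import sqlite3


def apply_simple_corrections(conn):
    """Apply two illustrative corrections to a hypothetical `players` table."""
    # Change empty cells to NULL.
    conn.execute("UPDATE players SET birth_city = NULL WHERE birth_city = ''")
    # Delete duplicated rows, keeping the first copy (lowest rowid) of each.
    conn.execute(
        """
        DELETE FROM players
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM players GROUP BY player_id, birth_city
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_id TEXT, birth_city TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("p1", "Boston"), ("p1", "Boston"), ("p2", "")],
)
apply_simple_corrections(conn)
rows = conn.execute(
    "SELECT player_id, birth_city FROM players ORDER BY player_id"
).fetchall()
```

After the call, the duplicate `p1` row is gone and the empty `birth_city` for
`p2` is stored as NULL (`None` in Python).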