Seamheads.com Ballparks Database

The Seamheads.com Parkfactors Database (KJOK Parkfactors 2017_12_13.accdb)

Release Date:  December 14, 2017

 

----------------------------------------------------------------------

README CONTENTS

  0.1 Copyright Notice

  0.2 Contact Information

  1.0 Release Contents

  1.1 Introduction

  1.2 What's New

  1.3 Acknowledgements

  1.4 Using this Database

  2.0 Data Tables

  2.1 Home Main Data Tables

  2.2 Visitor Main Data Table

  2.3 ParkConfig Table

  2.4 RH_LH_Data Tables

  2.5 Parks Table

  2.6 Teams Table

  2.7 Leagues Table

  2.8 LeagueTeams Table

  2.9 x-Reference Table

  3.0 Data Issues

  4.0 Online Version of Database

 

----------------------------------------------------------------------

0.1 Copyright Notice & Limited Use License

 

This database is copyright 2017 by Kevin D. Johnson. A license is granted

for individual use for research purposes. It may not be re-distributed

without permission. Any commercial use, or other dissemination of the

database in part or in whole is prohibited. Use of this database

constitutes acceptance of these terms. 

 

For licensing information or further information, contact me at

kjokbaseball@yahaoo.com.

 

----------------------------------------------------------------------

0.2 Contact Information

 

Yahoo egroup: kjokbaseball

 E-Mail : kjokbaseball@yahoo.com

 

 

----------------------------------------------------------------------

1.0  Release Contents

 

 

 

MS Access Versions:

      KJOK Parkfactors 2017_12_14.accdb

      KJOK Parkfactors 2017_12_14.txt Documentation.txt

 

Comma Delimited Version:

      Seamheads Ballpark Database 2017_12_14 Documentation.txt

      Active Park Events.csv

      Home Main Data.csv 

      Home Main Data With Parks Breakout.csv

      Leagues.csv

      LeaguesTeams.csv

      Park Events.csv

      ParkConfig.csv

      Parks.csv

      Retroshet_BBDB_Team_XREF.csv

      RH_LH_ALL_HR.csv

      RH_LH_Boxscore.csv

      RH_LH_Data.csv

      Teams.csv

      Visitor_Main_Data.csv

 

 

----------------------------------------------------------------------

1.1 Introduction

 

This database contains batting statistics by ballpark for

Major League Baseball from 1871 through 2017.  It includes data from

the two current leagues (American and National), the four other "major"

leagues (American Association, Union Association, Players League, and

Federal League), and the National Association of 1871-1875.

 

This database also contains park configuration data by year for each

major league ballpark used.   This data, however, should be understood

to be based on many reported measurements which may be unreliable and may

conflict with other reported measurements.  Some reported data has been

corrected via the usage of maps and geometric approximations to make

the data as accurate as possible.

 

If you have any problems or find any errors, please let me know.  Any

feedback is appreciated

 

----------------------------------------------------------------------

1.2 What's New

 

2017 December Version data changes (12/14/2017)

 

The 2017 version now includes 2017, 1908, 1909 and 1910 home and away, and LH and RH splits based on

Retrosheet play-by-play and Boxscore data. (note sometimes bathand is unknown for switch-hitters

and for some other batters pre-1941).

 

Tables deleted:

 

 

RH_LH_Boxscore.csv - Retrosheet 'boxscore' data.  In the RH_LH_Data.csv, boxscore data is used for any seasons prior to 1941,

even though there are 16 seasons (1925-1940) that have both some play by play and boxscore data.

Since RH_LH_Data data now has this boxscore data for 1908-1940, plus contains data for 1941-2017 seasons that

have Event File data, the RH_LH_Boxscore data is no longer provided separately.

 

 

Added numerous configuration data updates.

 

 

 

----------------------------------------------------------------------

1.3 Acknowledgements

 

 

This database has been built based on the data in many other sources,

and help from many people over the years, including:

 

 

The MacMillan Encyclopedia - Original source of Home and Road Runs for 20th century thru 1987

Retrosheet.org - Game Logs source of Home and Road data.  Event Files source of LH/RH H/A by Park Data

The SABR Home Run Log (David Vincent) - source of Home and Road Home Runs

Dan Hirsch - LH/RH H/A breakdowns of Retrosheet.org Event and Boxscore file data

Charles Saeger - Original source of 19th Century Home and Road Home Runs

The Lahman Database - Source of Team/League Data

Green Cathedrals by Phillip Lowry - primary source of Park Configuration data

Ballparks of the Deadball Era by Ron Selter - primary source of Deadball era Park Configuration data

Ballparks.com - Secondary source of Park Configuration data

Ballparksofbaseball.com - Secondary source of Park Configuration data

Tom M. Tango (TangoTiger) - Database Design Assistance

Sean Forman - Database Design Assistance

Brian Cartwright – LH/RH H/A breakdowns of Retrosheet.org Event File data

Mark Miller - LH/RH H/A breakdowns of Retrosheet.org Event File data

Howard Johnson - LH/RH H/A breakdowns of Retrosheet.org Event and Boxscore File data

Clem Comley provided breakouts of Retrosheet Boxscore files for RH/LH H/A by Park data

Victor Wilson - LH/RH H/A HR breakdowns

Paul Wendt - Various 19th century ballpark usage issues

Eric Jones - Detailed 1914 & 1915 Federal League Home and Road Data

 

     The information used here was obtained free of

     charge from and is copyrighted by Retrosheet.  Interested

     parties may contact Retrosheet at "www.retrosheet.org".

 

For the online version of this database at www.seamheads.com, special thanks to Mike Lynch for his leadership and persistence and

Dan Hirsch for his technical expertise in setting up the user interface.

 

Thanks to all, and if I missed anyone, please let me know.

 

----------------------------------------------------------------------

1.4 Using this Database

 

This version of the database is available in Microsoft Access

format or in a generic, comma delimited format. Because this is a

relational database, you will not be able to use the data in a

flat-database application.

 

 

Please note that this is not a stand alone application.  It requires

a database application or some other application designed specifically

to interact with the database.

 

 

------------------------------------------------------------------------------

2.0 Data Tables

 

The design follows these general principles.  Each park is assigned a

unique number (ParkID).  All of the information relating to that park

is tagged with its ParkID.  The ParkIDs are linked to Teams and

Years in the main data tables.

 

The database is comprised of the following main tables:

 

Home Main Data W Park BreakOUT - Runs and Home Runs for Home and Opponent teams for all seasons by Park by Team By Season.

               Other offensive events (doubles, triples, etc) for the seasons:

               1908-2017

               1871, 1872, 1874 (NA)

 

Home Main Data - Runs and Home Runs for Home and Opponent teams for all seasons by Team by Season.

               Other offensive events (doubles, triples, etc) for the seasons:

               1908-2017

               1871, 1872, 1874 (NA)

 

Visitor Main Data - Runs and Home Runs for Visitor and Opponent teams for all seasons by Team By Season.

               Other offensive events (doubles, triples, etc) for the seasons:

               1908-2017

               1871, 1872, 1874 (NA)

 

ParkConfig - Data on descriptive park items such as foul line distances, fence heights, capacities, etc.

 

It is supplemented by these tables:

 

RH_LH_Data - Subset of Main Data Tables broken down by LH/RH for most offensive events within H/A for

 years where available (note some years before 1974 incomplete). Box Score Data was used to supplement the data for seasons prior to 1941.

 

RH_LH_ALL_HR - Home Runs broken down by LH/RH within H/A for all seasons by Team by Season

(may not match Home Runs in  RH_LH_Data for any years where Retrosheet play by play data is incomplete). 

              

Parks - Park ID Master Table

Teams - Team ID Master Table

Leagues  - League ID Master Table

LeagueTeams - League/Team Valid Combinations Master Table

Retrosheet_BBDB_Team_Xref - Cross reference of team IDs between Retrosheet and Baseball Databank where different

Park Events - Firsts and Lasts, special events per park for selected historical parks (under construction)

Active Park Events - Firsts and lasts, special events per park for active parks (under construction)

 

Sections 2.1 through 2.9 of this document describe each of the tables in

detail and the fields that each contains.

 

---------------------------------------------------------------------------

 

2.1 Home Main Data With Parks Breakout

 

Year                      Season

TeamID                Lahman Database Team Identifier

ParkID                  Park Code based on Retrosheet Park IDs (NOT in Home_Main_Data_WO_Parks)

LgID                      League (NL, AL, AA, etc.)

SEQ                       Numeric Code which helps link a team's "main" home park data with the road data for that same year (NOT in Home Main Data W/O Parks)

GP_H                    Games Played as Home Team

R_Off_H              Runs      Scored as Home Team

R_Def_H              Runs      Allowed as Home Team

HR_Off_H            Home Runs Hit as Home Team

HR_Def_H           Home Runs Allowed as Home Team

AB_Off_H            At Bats as Home Team

H_Off_H                             Base Hits as Home Team

D_Off_H                             Doubles as Home Team

T_Off_H                              Triples as Home Team

RBI_Off_H           RBI's as Home Team

SH_Off_H            Sacrifices as Home Team

SF_Off_H             Sacrifice Flies as Home Team

HBP_OFF_H        Hit By Pitches as Home Team

BB_Off_H            Base on Balls as Home Team

IW_Off_H            Intentional Walks as Home Team

K_Off_H                              Strikeouts as Home Team

SB_Off_H            Stolen Bases as Home Team

CS_Off_H            Caught Stealing as Home Team

GDP_Off_H         Grounded Into Double Plays as Home Team

AB_Def_H           At Bats for Opposition when Home Team

H_Def_H                             Base Hits Allowed as Home Team

D_Def_H                             Doubles Allowed as Home Team

T_Def_H                             Triples Allowed as Home Team

RBI_Def_H          RBI's Allowed as Home Team

SH_Def_H            Sacrifices Allowed as Home Team

SF_Def_H            Sacrifice Flies Allowed as Home Team

HBP_Def_H         Hit By Pitches Allowed as Home Team

BB_Def_H           Base on Balls Allowed as Home Team

IW_Def_H           Intentional Walks Given as Home Team

K_Def_H                             Strikeouts Made as Home Team

SB_Def_H            Stolen Bases Allowed as Home Team

CS_Def_H            Caught Stealing Made as Home Team

GDP_Def_H         Grounded Into Double Plays Made as Home Team

------------------------------------------------------------------------------

 

2.11 Home Main Data

 

Year                      Season

TeamID                Lahman Database Team Identifier

LgID                      League (NL, AL, AA, etc.)

GP_H                    Games Played as Home Team

R_Off_H                             Runs Scored as Home Team

R_Def_H                             Runs Allowed as Home Team

HR_Off_H            Home Runs Hit as Home Team

HR_Def_H           Home Runs Allowed as Home Team

AB_Off_H            At Bats as Home Team

H_Off_H                             Base Hits as Home Team

D_Off_H                             Doubles as Home Team

T_Off_H                              Triples as Home Team

RBI_Off_H           RBI's as Home Team

SH_Off_H            Sacrifices as Home Team

SF_Off_H             Sacrifice Flies as Home Team

HBP_OFF_H        Hit By Pitches as Home Team

BB_Off_H            Base on Balls as Home Team

IW_Off_H            Intentional Walks as Home Team

K_Off_H                              Strikeouts as Home Team

SB_Off_H            Stolen Bases as Home Team

CS_Off_H            Caught Stealing as Home Team

GDP_Off_H         Grounded Into Double Plays as Home Team

AB_Def_H           At Bats for Opposition when Home Team

H_Def_H                             Base Hits Allowed as Home Team

D_Def_H                             Doubles Allowed as Home Team

T_Def_H                             Triples Allowed as Home Team

RBI_Def_H          RBI's Allowed as Home Team

SH_Def_H            Sacrifices Allowed as Home Team

SF_Def_H            Sacrifice Flies Allowed as Home Team

HBP_Def_H         Hit By Pitches Allowed as Home Team

BB_Def_H           Base on Balls Allowed as Home Team

IW_Def_H           Intentional Walks Given as Home Team

K_Def_H                             Strikeouts Made as Home Team

SB_Def_H            Stolen Bases Allowed as Home Team

CS_Def_H            Caught Stealing Made as Home Team

GDP_Def_H         Grounded Into Double Plays Made as Home Team

-----------------------------------------------------------------------------

 

2.2 Visitor Main Data

 

Year                      Season

TeamID                Lahman Database Team Identifier

ParkID                  Park Code based on Retrosheet Park IDs

LgID                      League (NL, AL, AA, etc.)

SEQ                       Numeric Code which helps link a team's "main" home park data with the road data for that same year

GP_A                    Games Played as Visiting Team

R_Off_A                              Runs Scored as Visiting Team

R_Def_A                             Runs Allowed as Visiting Team

HR_Off_A            Home Runs Hit as Visiting Team

HR_Def_A           Home Runs Allowed as Visiting Team

AB_Off_A            At Bats as Visiting Team

H_Off_A                             Base Hits as Visiting Team

D_Off_A                             Doubles as Visiting Team

T_Off_A                              Triples as Visiting Team

RBI_Off_A           RBI's as Visiting Team

SH_Off_A            Sacrifices as Visiting Team

SF_Off_A             Sacrifice Flies as Visiting Team

HBP_OFF_A         Hit By Pitches as Visiting Team

BB_Off_A            Base on Balls as Visiting Team

IW_Off_A            Intentional Walks as Visiting Team

K_Off_A                              Strikeouts as Visiting Team

SB_Off_A             Stolen Bases as Visiting Team

CS_Off_A             Caught Stealing as Visiting Team

GDP_Off_A         Grounded Into Double Plays as Visiting Team

AB_Def_A            At Bats for Opposition when Visiting Team

H_Def_A                             Base Hits Allowed as Visiting Team

D_Def_A                             Doubles Allowed as Visiting Team

T_Def_A                             Triples Allowed as Visiting Team

RBI_Def_A          RBI's Allowed as Visiting Team

SH_Def_A            Sacrifices Allowed as Visiting Team

SF_Def_A             Sacrifice Flies Allowed as Visiting Team

HBP_Def_A         Hit By Pitches Allowed as Visiting Team

BB_Def_A            Base on Balls Allowed as Visiting Team

IW_Def_A           Intentional Walks Given as Visiting Team

K_Def_A                             Strikeouts Made as Visiting Team

SB_Def_A            Stolen Bases Allowed as Visiting Team

CS_Def_A            Caught Stealing Made as Visiting Team

GDP_Def_A         Grounded Into Double Plays Made as Visiting Team

------------------------------------------------------------------------------

2.3 ParkConfig table

 

ParkID                  Park Code based on Retrosheet Park IDs

Name                   Most common name used for park IN THAT SEASON

Year                      Season

Capacity              Estimated Normal Maximum Capacity

Surface                Type of Surface (N=Natural, T=Turf)

Area_Fair             Square Feet of Fair Territory estimated in thousands of Square Feet

Cover                   Type of Roof (O=Open, D=Dome, R=Retractable)

LF_Dim                 Left Field Line Fence Distance in Feet at the Foul Pole

SLF_Dim                              Straightaway Left Field Distance in Feet approx. 15 degrees in from foul line

LFA_Dim                             Left Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line

LC_Dim                Left Center Field Distance in Feet approx. 30 degrees in from foul line

RCC_Dim                            Right Centerfield Corner Distance in Feet between Left Center and CF

CF_Dim                Centerfield (straightway) Fence Distance in feet 45 degrees in from foul lines

LCC_Dim                             Left Centerfield Corner Distance in Feet between Right Center and CF

RC_Dim                Right Center Field Distance in Feet approx. 30 degrees in from foul line

RFA_Dim                             Right Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line

SRF_Dim                             Straightaway Right Field Distance in Feet approx. 15 degrees in from foul line

RF_Dim                Right Field Line Fence Distance in Feet at the Foul Pole

Backstop             Distance from Home Plate to Stands

Foul                      Square Feet of Foul Territory estimated in thousands of Square Feet OR (L=Large, N=Normal, S=Small)

LF_W                    Left Field Wall Height in Feet

LC_W                   Left Center Field Wall Height in Feet

CF_W                   Center Field Wall Height in Feet

RC_W                   Right Center Field Wall Height in Feet

RF_W                   Right Field Wall Height in Feet

Comments          Comments about remodeling, fires, special features, etc.

------------------------------------------------------------------------------

2.4 RH_LH_Data

 

Year                      Season

TeamID                Lahman Database Team Identifier

Park_ID                Park Code based on Retrosheet Park IDs

Off_Def                Indicator of team being on offense or defense

H_A                       Indicator of team being Home or Visitors

Bathand                              R=Right, L=Left, B=Switch-hitter, bathand unknown, U=bathand unknown

AB                         At Bats

1B                         Singles

2B                         Doubles

3B                         Triples

HR                         Home Runs

RBI                        Runs Batted In

BB                         Base on Balls

IBB                        Intentional Walks

K                            Strikeouts

HBP                       Hit By Pitches

SF                          Sacrifice Flies

SH                         Sacrifices

GDP                      Ground into Double Plays

PA                         Plate Appearances

R                            Runs

G                           Games

SB                          Stolen Bases (based on batter handedness)

CS                          Caught Stealing (based on batter handedness)

ROE                      Reached on Error

 

------------------------------------------------------------------------------

2.41 RH_LH_ALL_HR

 

Year                      Season

TeamID                Lahman Database Team Identifier

Off_Def               Indicator of team being on offense or defense

H_A                       Indicator of team being Home or Visitors

Bathand                              R=Right and L=Left

HR                         Home Runs

------------------------------------------------------------------------------

 

2.5  Parks

 

PARKID                 Park Code based on Retrosheet Park IDs

NAME                   Most Common Name for Park DURING IT's LIFETIME

CITY                      City Location of Park

STATE                   STATE or Province Location of Park

START                   Date of first major league game at Park

END                      Date of last major league game at Park

LEAGUE               League that Park was most often used in

NOTES                  Various Notes about Park

AKA                       Other Names Park may have been known as

EXACT                  Latitude and Longitude are exact known coordinates

Latitude               Latitude location of park in degrees decimal

Longitude            Longitude location of park in degrees longitude

Altitude Altitude of park in thousands of feet

 

NOTE: The EXACT locations of all parks ever used in MLB games are now the Parks table, except for two parks:

Monumental Park - Baltimore, MD

Ludlow Park - Ludlow, KY

------------------------------------------------------------------------------

 

2.6  Teams

 

TeamID                Lahman Database Team Identifier

League                 League

Start Year            First Year of Team

ENd Year             Last Year of Team

City                       City Name only of Team

Nickname            Nickname only of team

------------------------------------------------------------------------------

 

2.7  Leagues

 

LgID                      League (NL, AL, AA, etc.)

LgName                              Name of League

Start_Year           First Major League Season of League

End_Year             Last Major League Season of League

Comments          Comments about league history

------------------------------------------------------------------------------

 

2.8 LeagueTeams table

 

LgID                      League (NL, AL, AA, etc.)

TeamID                Lahman Database Team Identifier

Year                      Season

 

------------------------------------------------------------------------------

 

2.9 Retrosheet_BBDB_Team_Xref

 

Year                      Season

RetroID                Retrosheet Team ID

BBDBID                Baseball-Databank Team ID

 

------------------------------------------------------------------------------

 

2.10 Park Events

 

ParkID                                

First_Game                       

Score_Winner                  

First_Attendance             

First_Starting_Pitchers   

First_Batter

First_Game_Result

First_Hit

First_Result_Inning

First_Run

First_RBI

FIRST_HR

Date_Game_No

First K_Batter

First_Win

First_Loss

First_Grand_Slam

First_Inside_the_Park_HR

First_No_Hitter

Last_Game

Score_Winner

Last-Attendance

Last_Starting_Pitchers   

Last_Batter

Last_Game_Result

Last_Hit

Last_Result_Inning

Last_Run

Last_RBI

Last_HR

Last_K_Batter

Last_Win

Last_Loss

Last_Grand_Slam

Last_Inside_the_Park_HR

Last_Most_Recent_No_Hitter

Trivia

 

2.11 Active Park Events

 

ParkID                                

Name   

City

Startdate

Enddate

League

Notes

AKA

FirstGame

Score    

FirstAttendance

FirstStarting_Pitchers     

FirstBatter

Result

FirstHit

ResultInning

FirstRun

FirstRBI

FIRSTHR

Date (GameNo)

First K/Batter

FirstWin

FirstLoss

FirstGrandSlam

FirstInsidetheParkHR

FirstNo_Hitter

LastGame

ScoreWinner

LastAttendance

LastStartingPitchers        

LastBatter

LastGameResult

LastHit

LastResultInning

LastRun

LastRBI

LastHR

LastK/Batter

LastWin

LastLoss

LastGrandSlam

LastInsidetheParkHR

Last/MostRecentNoHitter

Trivia

 

3.0 Data Issues

 

RH_LH_Data Table MAY not tie exactly to Home_Main_Data Tables and Visitor_Main_Data in some seasons for all data due to incomplete play by play data in Retrosheet Event Files.

 

RH_LH_ALL_HR_Only Table home runs MAY not tie exactly to RH_LH_Data table home runs due to incomplete play by play data in Retrosheet Event Files.

 

LH/RH HR data for 1908 thru 1940 seasons have records marked as "U" for "Unknown" as the handedness of some batters is not known.

 

LH/RH data for 1908 thru 1940 seasons have records marked as "B" for "switch-hitter, handedness unknown".

 

If anyone knows where to find the exact numbers for these items, please let me know.

 

------------------------------------------------------------------------------

 

 

 

4.0 Online Version of Database

 

The online database version includes the following additional data:

 

Map of park location

Percentage calculations by season of Turf and Roof types.

Averages by season of field dimensions, wall heights, fair territory, and backstop distances.

Totals of games by team and by city.

 

 

1 Year Park Factors:

 

The 1 year park factors are based on UNREGRESSED observed data. There is an 'other parks corrector' calculation

made due to the other road parks' total difference from the league average being offset by the park rating

of the park that is being rated. In other words, if you're calculating factors for Coors Field, then Coors itself

is not part of the 'road' set of parks it is being compared against, so that road set of parks is actually

slightly pitcher friendly (assuming Coors is hitter friendly that year) instead of being 100% neutral, so the

'other parks corrector' makes an adjustment for that fact.  Except for the other parks corrector calculation,

the 1-year park factors are simple rates of components per At Bat at home divided by rate of components per AB on the road.

 

 

3 Year Park Factors:

 

The 3 year park factors are REGRESSED, and meant as an estimate of the park's 'true' impact on batting components.

We calculate this factor slightly different from Total Baseball/Baseball-Reference.com (see http://www.baseball-reference.com/about/parkadjust.shtml

for an explanation of that calculation).  For a given park/season, we use the 1-year factor for that park/season plus

the 1-year factor for the PREVIOUS season plus the 1-year factor for the FOLLOWING season.  We weight each based on the

number of home games for each season, but otherwise we weight them equally (we don't add weight to the current season).

Then we regress that number 25% towards that parks' long-term historical factor for that component.  The long-term

historical factor is the sum of all 1-year factors for the history of the park weighted by home games each season.

We believe this gives us a closest possible approximation of the 'true' park factor without adding more complicating

variables such as modifications to park characteristics, new parks in the league, weighting the long term

factor by number of seasons, etc.

 

For the very first and very last year of a park, since there is no PREVIOUS or FOLLOWING season in those cases, we

chose to use an additional following season (season +2) for the first park year and an additional previous season

(season - 2) in those calculations.  This results in the first TWO seasons and the last TWO seasons of the 3 year

calculations being the same!  What we're saying is that lacking an adjacent season our best guess of the first

and last seasons of a parks existence is the same guess we use for the 2nd season and the next to last season.

If anyone wants to prove that there is a more accurate way to handle these 'end' seasons, we are very open to ideas.