Monday, August 31, 2015

Seamheads.com Ballparks Database

2014 Stats Now Included!

Ballparks / Years / Teams / Cities / About

The Seamheads.com Parkfactors Database (KJOK Parkfactors 2015_02_23.accdb)
Release Date: February 23, 2015

----------------------------------------------------------------------
README CONTENTS
0.1 Copyright Notice
0.2 Contact Information
1.0 Release Contents
1.1 Introduction
1.2 What's New
1.3 Acknowledgements
1.4 Using this Database
2.0 Data Tables
2.1 Home Main Data Tables
2.2 Visitor Main Data Table
2.3 ParkConfig Table
2.4 RH_LH_Data Tables
2.5 Parks Table
2.6 Teams Table
2.7 Leagues Table
2.8 LeagueTeams Table
2.9 x-Reference Table
3.0 Data Issues
4.0 Online Version of Database

----------------------------------------------------------------------
0.1 Copyright Notice & Limited Use License

This database is copyright 2015 by Kevin D. Johnson. A license is granted
for individual use for research purposes. It may not be re-distributed
without permission. Any commercial use, or other dissemination of the
database in part or in whole is prohibited. Use of this database
constitutes acceptance of these terms.

For licensing information or further information, contact me at
kjokbaseball@yahaoo.com.

----------------------------------------------------------------------
0.2 Contact Information

Yahoo egroup: kjokbaseball
E-Mail : kjokbaseball@yahoo.com

----------------------------------------------------------------------
1.0 Release Contents

 

MS Access Versions:
KJOK Parkfactors 2015_02_23.accdb
KJOK Parkfactors 2015_02_23.txt Documentation.txt

Comma Delimited Version:
KJOK Parkfactors 2015_02_23 Documentation.txt
Home Main Data.csv
Home Main Data W Parks Breakout.csv
Leagues.csv
LeaguesTeams.csv
Park Events.csv
ParkConfig.csv
Parks.csv
RH_LH_ALL_HR.csv
RH_LH_Data.csv
Retroshet_BBDB_Team_XREF.csv
Teams.csv
Visitor_Main_Data.csv

----------------------------------------------------------------------
1.1 Introduction

This database contains batting statistics by ballpark for
Major League Baseball from 1871 through 2015. It includes data from
the two current leagues (American and National), the four other "major"
leagues (American Association, Union Association, Players League, and
Federal League), and the National Association of 1871-1875.

This database also contains park configuration data by year for each
major league ballpark used. This data, however, should be understood
to be based on many reported measurements which may be unreliable and may
conflict with other reported measurements. Some reported data has been
corrected via the usage of maps and geometric approximations to make
the data as accurate as possible.

If you have any problems or find any errors, please let me know. Any
feedback is appreciated

----------------------------------------------------------------------
1.2 What's New

2015 February Version data changes (02/22/2015)

The 2015 version now includes 2015, 1936 and 1925 home and away splits, and LH and RH splits based on
Retrosheet play-by-play-years. (note sometimes bathand is unknown for switch-hitters
and for some other batters).

Added numerous configuration data updates from Ron Selter for 1890-1919 Configurations.

The field Foul in the Park Config Table now includes foul territory size (per thousand square feet) for selected parks.
Most parks still have a letter code (L=Large, N=Normal, S=Small)

----------------------------------------------------------------------
1.3 Acknowledgements

This database has been built based on the data in many other sources,
and help from many people over the years, including:

The MacMillan Encyclopedia - Original source of Home and Road Runs for 20th century thru 1987
Retrosheet.org - Game Logs source of Home and Road data. Event Files source of LH/RH H/A by Park Data
The SABR Home Run Log (David Vincent) - source of Home and Road Home Runs
The Lahman Database - Source of Team/League Data
Green Cathedrals by Phillip Lowry - primary source of Park Configuration data
Ballparks of the Deadball Era by Ron Selter - primary source of Deadball era Park Configuration data
Ballparks.com - Secondary source of Park Configuration data
Ballparksofbaseball.com - Secondary source of Park Configuration data
Tom M. Tango (TangoTiger) - Database Design
Sean Forman - Database Design
Dan Hirsch - LH/RH H/A breakdowns of Retrosheet.org Event file data
Brian Cartwright – LH/RH H/A breakdowns of Retrosheet.org Event File data
Mark Miller - LH/RH H/A breakdowns of Retrosheet.org Event File data
Clem Comley provided breakouts of Retrosheet Boxscore files for 1919-1949 RH/LH H/A by Park data
Victor Wilson - LH/RH H/A HR breakdowns
Charles Saeger - Original source of 19th Century Home and Road Home Runs
Paul Wendt - Various 19th century ballpark usage issues
Eric Jones - Detailed 1914 & 1915 Federal League Home and Road Data

The information used here was obtained free of
charge from and is copyrighted by Retrosheet. Interested
parties may contact Retrosheet at "www.retrosheet.org".

For the online version of this database at www.seamheads.com, special thanks to Mike Lynch for his leadership and persistence and
Dan Hirsch for his technical expertise in setting up the user interface.

Thanks to all, and if I missed anyone, please let me know.

----------------------------------------------------------------------
1.4 Using this Database

This version of the database is available in Microsoft Access
format or in a generic, comma delimited format. Because this is a
relational database, you will not be able to use the data in a
flat-database application.

Please note that this is not a stand alone application. It requires
a database application or some other application designed specifically
to interact with the database.

------------------------------------------------------------------------------
2.0 Data Tables

The design follows these general principles. Each park is assigned a
unique number (ParkID). All of the information relating to that park
is tagged with its ParkID. The ParkIDs are linked to Teams and
Years in the main data tables.

The database is comprised of the following main tables:

Home Main Data W Park BreakOUT - Runs and Home Runs for Home and Opponent teams for all seasons by Park by Team By Season.
Other offensive events (doubles, triples, etc) for the seasons:
1914-2014
1911 (NL),
1871, 1872, 1874 (NA)

Home Main Data - Runs and Home Runs for Home and Opponent teams for all seasons by Team by Season.
Other offensive events (doubles, triples, etc) for the seasons:
1914-2014
1911 (NL),
1871, 1872, 1874 (NA)

Visitor Main Data - Runs and Home Runs for Visitor and Opponent teams for all seasons by Team By Season.
Other offensive events (doubles, triples, etc) for the seasons:
1914-2014
1911 (NL),
1871, 1872, 1874 (NA)

ParkConfig - Data on descriptive park items such as foul line distances, fence heights, capacities, etc.

It is supplemented by these tables:

RH_LH_Data - Subset of Main Data Tables broken down by LH/RH for most offensive events within H/A for
years where available (note some years before 1974 incomplete):

RH_LH_ALL_HR - Home Runs broken down by LH/RH within H/A for all seasons by Team by Season
(may not match Home Runs in RH_LH_Data for any years where Retrosheet play-by-play data is complete).

Parks - Park ID Master Table
Teams - Team ID Master Table
Leagues - League ID Master Table
LeagueTeams - League/Team Valid Combinations Master Table
Retrosheet_BBDB_Team_Xref - Cross reference of team IDs between Retrosheet and Baseball Databank where different
Park Events - Firsts and Lasts special events per park for selected parks

Sections 2.1 through 2.9 of this document describe each of the tables in
detail and the fields that each contains.

---------------------------------------------------------------------------

2.1 Home Main Data W Parks Breakout

Year Season
TeamID Lahman Database Team Identifier
ParkID Park Code based on Retrosheet Park IDs (NOT in Home_Main_Data_WO_Parks)
LgID League (NL, AL, AA, etc.)
SEQ Numeric Code which helps link a team's "main" home park data with the road data for that same year (NOT in Home Main Data W/O Parks)
GP_H Games Played as Home Team
R_Off_H Runs Scored as Home Team
R_Def_H Runs Allowed as Home Team
HR_Off_H Home Runs Hit as Home Team
HR_Def_H Home Runs Allowed as Home Team
AB_Off_H At Bats as Home Team
H_Off_H Base Hits as Home Team
D_Off_H Doubles as Home Team
T_Off_H Triples as Home Team
RBI_Off_H RBI's as Home Team
SH_Off_H Sacrifices as Home Team
SF_Off_H Sacrifice Flies as Home Team
HBP_OFF_H Hit By Pitches as Home Team
BB_Off_H Base on Balls as Home Team
IW_Off_H Intentional Walks as Home Team
K_Off_H Strikeouts as Home Team
SB_Off_H Stolen Bases as Home Team
CS_Off_H Caught Stealing as Home Team
GDP_Off_H Grounded Into Double Plays as Home Team
AB_Def_H At Bats for Opposition when Home Team
H_Def_H Base Hits Allowed as Home Team
D_Def_H Doubles Allowed as Home Team
T_Def_H Triples Allowed as Home Team
RBI_Def_H RBI's Allowed as Home Team
SH_Def_H Sacrifices Allowed as Home Team
SF_Def_H Sacrifice Flies Allowed as Home Team
HBP_Def_H Hit By Pitches Allowed as Home Team
BB_Def_H Base on Balls Allowed as Home Team
IW_Def_H Intentional Walks Given as Home Team
K_Def_H Strikeouts Made as Home Team
SB_Def_H Stolen Bases Allowed as Home Team
CS_Def_H Caught Stealing Made as Home Team
GDP_Def_H Grounded Into Double Plays Made as Home Team
------------------------------------------------------------------------------

2.11 Home Main Data

Year Season
TeamID Lahman Database Team Identifier
LgID League (NL, AL, AA, etc.)
GP_H Games Played as Home Team
R_Off_H Runs Scored as Home Team
R_Def_H Runs Allowed as Home Team
HR_Off_H Home Runs Hit as Home Team
HR_Def_H Home Runs Allowed as Home Team
AB_Off_H At Bats as Home Team
H_Off_H Base Hits as Home Team
D_Off_H Doubles as Home Team
T_Off_H Triples as Home Team
RBI_Off_H RBI's as Home Team
SH_Off_H Sacrifices as Home Team
SF_Off_H Sacrifice Flies as Home Team
HBP_OFF_H Hit By Pitches as Home Team
BB_Off_H Base on Balls as Home Team
IW_Off_H Intentional Walks as Home Team
K_Off_H Strikeouts as Home Team
SB_Off_H Stolen Bases as Home Team
CS_Off_H Caught Stealing as Home Team
GDP_Off_H Grounded Into Double Plays as Home Team
AB_Def_H At Bats for Opposition when Home Team
H_Def_H Base Hits Allowed as Home Team
D_Def_H Doubles Allowed as Home Team
T_Def_H Triples Allowed as Home Team
RBI_Def_H RBI's Allowed as Home Team
SH_Def_H Sacrifices Allowed as Home Team
SF_Def_H Sacrifice Flies Allowed as Home Team
HBP_Def_H Hit By Pitches Allowed as Home Team
BB_Def_H Base on Balls Allowed as Home Team
IW_Def_H Intentional Walks Given as Home Team
K_Def_H Strikeouts Made as Home Team
SB_Def_H Stolen Bases Allowed as Home Team
CS_Def_H Caught Stealing Made as Home Team
GDP_Def_H Grounded Into Double Plays Made as Home Team
-----------------------------------------------------------------------------

2.2 Visitor Main Data

Year Season
TeamID Lahman Database Team Identifier
ParkID Park Code based on Retrosheet Park IDs
LgID League (NL, AL, AA, etc.)
SEQ Numeric Code which helps link a team's "main" home park data with the road data for that same year
GP_A Games Played as Visiting Team
R_Off_A Runs Scored as Visiting Team
R_Def_A Runs Allowed as Visiting Team
HR_Off_A Home Runs Hit as Visiting Team
HR_Def_A Home Runs Allowed as Visiting Team
AB_Off_A At Bats as Visiting Team
H_Off_A Base Hits as Visiting Team
D_Off_A Doubles as Visiting Team
T_Off_A Triples as Visiting Team
RBI_Off_A RBI's as Visiting Team
SH_Off_A Sacrifices as Visiting Team
SF_Off_A Sacrifice Flies as Visiting Team
HBP_OFF_A Hit By Pitches as Visiting Team
BB_Off_A Base on Balls as Visiting Team
IW_Off_A Intentional Walks as Visiting Team
K_Off_A Strikeouts as Visiting Team
SB_Off_A Stolen Bases as Visiting Team
CS_Off_A Caught Stealing as Visiting Team
GDP_Off_A Grounded Into Double Plays as Visiting Team
AB_Def_A At Bats for Opposition when Visiting Team
H_Def_A Base Hits Allowed as Visiting Team
D_Def_A Doubles Allowed as Visiting Team
T_Def_A Triples Allowed as Visiting Team
RBI_Def_A RBI's Allowed as Visiting Team
SH_Def_A Sacrifices Allowed as Visiting Team
SF_Def_A Sacrifice Flies Allowed as Visiting Team
HBP_Def_A Hit By Pitches Allowed as Visiting Team
BB_Def_A Base on Balls Allowed as Visiting Team
IW_Def_A Intentional Walks Given as Visiting Team
K_Def_A Strikeouts Made as Visiting Team
SB_Def_A Stolen Bases Allowed as Visiting Team
CS_Def_A Caught Stealing Made as Visiting Team
GDP_Def_A Grounded Into Double Plays Made as Visiting Team
------------------------------------------------------------------------------
2.3 ParkConfig table

ParkID Park Code based on Retrosheet Park IDs
Name Most common name used for park IN THAT SEASON
Year Season
Capacity Estimated Normal Maximum Capacity
Surface Type of Surface (N=Natural, T=Turf)
Area_Fair Square Feet of Fair Territory estimated in thousands of Square Feet.
Cover Type of Roof (O=Open, D=Dome, R=Retractable)
LF_Dim Left Field Line Fence Distance in Feet at the Foul Pole
SLF_Dim Straightaway Left Field Distance in Feet approx. 15 degrees in from foul line
LFA_Dim Left Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
LC_Dim Left Center Field Distance in Feet approx. 30 degrees in from foul line
RCC_Dim Right Centerfield Corner Distance in Feet between Left Center and CF
CF_Dim Centerfield (straightway) Fence Distance in feet 45 degrees in from foul lines
LCC_Dim Left Centerfield Corner Distance in Feet between Right Center and CF
RC_Dim Right Center Field Distance in Feet approx. 30 degrees in from foul line
RFA_Dim Right Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
SRF_Dim Straightaway Right Field Distance in Feet approx. 15 degrees in from foul line
RF_Dim Right Field Line Fence Distance in Feet at the Foul Pole
Backstop Distance from Home Plate to Stands
Foul Square Feet of Foulr Territory estimated in thousands of Square Feet OR (L=Large, N=Normal, S=Small)
LF_W Left Field Wall Height in Feet
LC_W Left Center Field Wall Height in Feet
CF_W Center Field Wall Height in Feet
RC_W Right Center Field Wall Height in Feet
RF_W Right Field Wall Height in Feet
Comments Comments about remodeling, fires, special features, etc.
------------------------------------------------------------------------------
2.4 RH_LH_Data

Year Season
TeamID Lahman Database Team Identifier
Park_ID Park Code based on Retrosheet Park IDs
Off_Def Indicator of team being on offense or defense
H_A Indicator of team being Home or Visitors
Bathand R=Right, L=Left, B=Switch-hitter, bathand unknown, X=bathand unknown
AB At Bats
1B Singles
2B Doubles
3B Triples
HR Home Runs
RBI Runs Batted In
BB Base on Balls
IBB Intentional Walks
K Strikeouts
HBP Hit By Pitches
SF Sacrifice Flies
SH Sacrifices
GDP Ground into Double Plays
PA Plate Appearances
R Runs
G Games

------------------------------------------------------------------------------
2.41 RH_LH_ALL_HR

Year Season
TeamID Lahman Database Team Identifier
Off_Def Indicator of team being on offense or defense
H_A Indicator of team being Home or Visitors
Bathand R=Right and L=Left
HR Home Runs
------------------------------------------------------------------------------

2.5 Parks

PARKID Park Code based on Retrosheet Park IDs
NAME Most Common Name for Park DURING IT's LIFETIME
CITY City Location of Park
STATE STATE or Province Location of Park
START Date of first major league game at Park
END Date of last major league game at Park
LEAGUE League that Park was most often used in
NOTES Various Notes about Park
AKA Other Names Park may have been known as
EXACT Latitude and Longitude are exact known coordinates
Latitude Latitude location of park in degrees decimal
Longitude Longitude location of park in degrees longitude
Altitude Altitude of park in thousands of feet
------------------------------------------------------------------------------

2.6 Teams

TeamID Lahman Database Team Identifier
League League
Start Year First Year of Team
ENd Year Last Year of Team
City City Name only of Team
Nickname Nickname only of team
------------------------------------------------------------------------------

2.7 Leagues

LgID League (NL, AL, AA, etc.)
LgName Name of League
Start_Year First Major League Season of League
End_Year Last Major League Season of League
Comments Comments about league history
------------------------------------------------------------------------------

2.8 LeagueTeams table

LgID League (NL, AL, AA, etc.)
TeamID Lahman Database Team Identifier
Year Season

------------------------------------------------------------------------------

2.9 Retrosheet_BBDB_Team_Xref

Year Season
RetroID Retrosheet Team ID
BBDBID Baseball-Databank Team ID

------------------------------------------------------------------------------

2.10 Park Events

ParkID
First_Game
Score_Winner
First_Attendance
First_Starting_Pitchers
First_Batter
First_Game_Result
First_Hit
First_Result_Inning
First_Run
First_RBI
FIRST_HR
Date_Game_No
First K_Batter
First_Win
First_Loss
First_Grand_Slam
First_Inside_the_Park_HR
First_No_Hitter
Last_Game
Score_Winner
Last-Attendance
Last_Starting_Pitchers
Last_Batter
Last_Game_Result
Last_Hit
Last_Result_Inning
Last_Run
Last_RBI
Last_HR
Last_K_Batter
Last_Win
Last_Loss
Last_Grand_Slam
Last_Inside_the_Park_HR
Last_Most_Recent_No_Hitter
Trivia

3.0 Data Issues

RH_LH_Data Table MAY not tie exactly to Home_Main_Data Tables and Visitor_Main_Data in some seasons for all data due to incomplete play by play data in Retrosheet Event Files.

RH_LH_ALL_HR_Only Table home runs MAY not tie exactly to RH_LH_Data table home runs due to incomplete play by play data in Retrosheet Event Files.

LH/RH HR data for 1914 thru 1918 seasons have records marked as "U" for "Unknown" as the handedness of some batters is not known.

LH/RH data for 1919 thru 1933 seasons have records marked as "X" for "unknown" and "B" for "switch-hitter, handedness unknown".

If anyone knows where to find the exact numbers for these items, please let me know.

------------------------------------------------------------------------------

 

4.0 Online Version of Database

The online database version includes the following additional data:

Map of park location
Percentage calculations by season of Turf and Roof types.
Averages by season of field dimensions, wall heights, fair territory, and backstop distances.
Totals of games by team and by city.

1 Year Park Factors:

The 1 year park factors are based on UNREGRESSED observed data. There is an 'other parks corrector' calculation
made due to the other road parks' total difference from the league average being offset by the park rating
of the park that is being rated. In other words, if you're calculating factors for Coors Field, then Coors itself
is not part of the 'road' set of parks it is being compared against, so that road set of parks is actually
slightly pitcher friendly (assuming Coors is hitter friendly that year) instead of being 100% neutral, so the
'other parks corrector' makes an adjustment for that fact. Except for the other parks corrector calculation,
the 1-year park factors are simple rates of components per At Bat at home divided by rate of components per AB on the road.

3 Year Park Factors:

The 3 year park factors are REGRESSED, and meant as an estimate of the park's 'true' impact on batting components.
We calculate this factor slightly different from Total Baseball/Baseball-Reference.com (see http://www.baseball-reference.com/about/parkadjust.shtml
for an explanation of that calculation). For a given park/season, we use the 1-year factor for that park/season plus
the 1-year factor for the PREVIOUS season plus the 1-year factor for the FOLLOWING season. We weight each based on the
number of home games for each season, but otherwise we weight them equally (we don't add weight to the current season).
Then we regress that number 25% towards that parks' long-term historical factor for that component. The long-term
historical factor is the sum of all 1-year factors for the history of the park weighted by home games each season.
We believe this gives us a closest possible approximation of the 'true' park factor without adding more complicating
variables such as modifications to park characteristics, new parks in the league, weighting the long term
factor by number of seasons, etc.

For the very first and very last year of a park, since there is no PREVIOUS or FOLLOWING season in those cases, we
chose to use an additional following season (season +2) for the first park year and an additional previous season
(season - 2) in those calculations. This results in the first TWO seasons and the last TWO seasons of the 3 year
calculations being the same! What we're saying is that lacking an adjacent season our best guess of the first
and last seasons of a parks existence is the same guess we use for the 2nd season and the next to last season.
If anyone wants to prove that there is a more accurate way to handle these 'end' seasons, we are very open to ideas.

<end of file>


Ballparks Database created by Kevin Johnson. Internet portion designed by Dan Hirsch, creator of The Baseball Gauge.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Download Ballparks Database