Friday, October 31, 2014

Seamheads.com Ballparks Database

2013 Stats Now Included!


The Seamheads.com Parkfactors Database (KJOK Parkfactors 2011_03_07.accdb)
Release Date: March 7, 2011

----------------------------------------------------------------------
README CONTENTS
0.1 Copyright Notice
0.2 Contact Information
1.0 Release Contents
1.1 Introduction
1.2 What's New
1.3 Acknowledgements
1.4 Using this Database
2.0 Data Tables
2.1 Home Main Data Tables
2.2 Visitor Main Data Table
2.3 ParkConfig Table
2.4 RH_LH_Data Tables
2.5 Parks Table
2.6 Teams Table
2.7 Leagues Table
2.8 LeagueTeams Table
2.9 x-Reference Table
3.0 Data Issues
4.0 Online Version of Database

----------------------------------------------------------------------
0.1 Copyright Notice & Limited Use License

This database is copyright 2011 by Kevin D. Johnson. A license is granted
for individual use for research purposes. It may not be re-distributed
without permission. Any commercial use, or other dissemination of the
database in part or in whole is prohibited. Use of this database
constitutes acceptance of these terms.

For licensing information or further information, contact me at
kjokbaseball@yahoo.com.

----------------------------------------------------------------------
0.2 Contact Information

Yahoo egroup: kjokbaseball
E-Mail : kjokbaseball@yahoo.com

----------------------------------------------------------------------
1.0 Release Contents

MS Access Versions:
KJOK Parkfactors 2011_03_07.accdb
KJOK Parkfactors 2011_03_07 Documentation.txt

Comma Delimited Version:
KJOK Parkfactors 2011_03_07 Documentation.txt
Home Main Data W_O Parks.csv
Home Main Data With Parks Break.csv
Leagues.csv
LeaguesTeams.csv
ParkConfig.csv
Parks.csv
RH_LH_ALL_HR_ONLY.csv
RH_LH_Data.csv
Retrosheet_BBDB_Team_XREF.csv
Teams.csv
Visitor_Main_Data.csv

----------------------------------------------------------------------
1.1 Introduction

This database contains batting statistics by ballpark for
Major League Baseball from 1871 through 2010. It includes data from
the two current leagues (American and National), the four other "major"
leagues (American Association, Union Association, Players League, and
Federal League), and the National Association of 1871-1875.

This database also contains park configuration data by year for each
major league ballpark used. This data, however, is based on many
reported measurements that may be unreliable or may conflict with other
reported measurements, and it generally needs to be researched more
thoroughly.

If you have any problems or find any errors, please let me know. Any
feedback is appreciated.

----------------------------------------------------------------------
1.2 What's New

2011 February Version data changes (02/17/2011)

The 2011 version includes 2010, 2009, 1953, 1952, 1951 and 1950 home and away splits, and LH and RH splits based on
Retrosheet play-by-play years. (Note that some of this data prior to 1974 is incomplete.)

Also added are home and away splits for Retrosheet 'boxscore' years from 1919-1949. (Note that some of this data is incomplete.)

2011 February Version table changes (02/17/2011):

a) Deleted RH_LH_Data_1871_1953A as HRs were the only component that had splits during those years, so that data was
moved to RH_LH_ALL_HR_Only.

b) Changed RH_LH_Data_1953N_2008 to RH_LH_Data_1950_2010 to reflect additional RH LH split season data.

c) Changed RH_LH_ALL_HR_Only_1953N_2008 to RH_LH_ALL_HR_Only as it now includes all HR split data for all seasons.

d) Added Retrosheet_BBDB_Team_XREF to assist in building the database with data from both Retrosheet
and Baseball Databank combined, which use different team IDs.

e) Removed Field LgID from all RH_LH Tables as data was redundant with other tables.

2011 March Version Data Changes (03/07/2011)

Added RH/LH splits for Retrosheet 'boxscore' years from 1919-1949 (note that bathand is sometimes unknown for switch-hitters
and for some other batters).

Added complete data from Ron Selter for 1901-1919 Configurations.

----------------------------------------------------------------------
1.3 Acknowledgements

This database has been built based on the data in many other sources,
and help from many people, including:

The MacMillan Encyclopedia - Original source of Home and Road Runs for 20th century thru 1987
The Lahman Database - Source of Team/League Data
The SABR Home Run Log (David Vincent) - source of Home and Road Home Runs
Victor Wilson - LH/RH H/A HR breakdowns
Brian Cartwright - LH/RH H/A breakdowns of Retrosheet.org Event File data
Mark Miller - LH/RH H/A breakdowns of Retrosheet.org Event File data
Tom M. Tango (TangoTiger) - Database Design
Sean Forman - Database Design
Paul Wendt - Various 19th century ballpark usage issues
Charles Saeger - Original source of 19th Century Home and Road Home Runs
Eric Jones - Detailed 1914 & 1915 Federal League Home and Road Data
Green Cathedrals by Philip Lowry - primary source of Park Configuration data
Ballparks of the Deadball Era by Ron Selter - primary source of Deadball era Park Configuration data
Ballparks.com - Secondary source of Park Configuration data
Ballparksofbaseball.com - Secondary source of Park Configuration data
Retrosheet.org - Game Logs source of Home and Road data. Event Files source of LH/RH H/A by Park Data
Clem Comley provided breakouts of Retrosheet Boxscore files for 1919-1949 RH/LH H/A by Park data

The information used here was obtained free of
charge from and is copyrighted by Retrosheet. Interested
parties may contact Retrosheet at "www.retrosheet.org".

For the online version of this database at www.seamheads.com, special thanks to Mike Lynch for his leadership and persistence and
Dan Hirsch for his technical expertise in setting up the user interface.

Thanks to all, and if I missed anyone, please let me know.

----------------------------------------------------------------------
1.4 Using this Database

This version of the database is available in Microsoft Access
format or in a generic, comma delimited format. Because this is a
relational database, you will not be able to use the data in a
flat-database application.

Please note that this is not a stand-alone application. It requires
a database application or some other application designed specifically
to interact with the database.
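
As one possible way to work with the comma delimited version, the sketch below loads a CSV file into an in-memory SQLite database using only the Python standard library. It assumes the first row of each CSV file holds the column names, which this document does not state. The helper name load_csv and the sample row are made up for illustration.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, text):
    """Create `table` from CSV text, assuming row 1 holds the column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)

# Made-up sample standing in for the contents of Teams.csv
sample = (
    "TeamID,FullName,City,Nickname\n"
    "XXX,Example City Examples,Example City,Examples\n"
)

conn = sqlite3.connect(":memory:")
load_csv(conn, "Teams", sample)
full_name = conn.execute(
    'SELECT FullName FROM "Teams" WHERE TeamID = ?', ("XXX",)
).fetchone()[0]
print(full_name)
```

With the real files, the text argument could be read with open("Teams.csv", newline="").read(). All values load as text in this sketch, so numeric columns would need casting before arithmetic.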

------------------------------------------------------------------------------
2.0 Data Tables

The design follows these general principles. Each park is assigned a
unique number (ParkID). All of the information relating to that park
is tagged with its ParkID. The ParkIDs are linked to Teams and
Years in the main data tables.

The database consists of the following main tables:

Home Main Data With Parks Break - Runs and Home Runs for Home and Opponent teams for all seasons by Park by Team by Season.
Other offensive events (doubles, triples, etc) for the seasons:
1919-2010
1914 (FL),
1915 (FL),
1911 (NL),
1871 (NA)

Home Main Data W/O Parks - Runs and Home Runs for Home and Opponent teams for all seasons by Team by Season.
Other offensive events (doubles, triples, etc) for the seasons:
1919-2010
1914 (FL),
1915 (FL),
1911 (NL),
1871 (NA)

Visitor Main Data - Runs and Home Runs for Visitor and Opponent teams for all seasons by Team By Season.
Other offensive events (doubles, triples, etc) for the seasons:
1919-2010
1914 (FL),
1915 (FL),
1911 (NL),
1871 (NA)

ParkConfig - Data on descriptive park items such as foul line distances, fence heights, capacities, etc.

It is supplemented by these tables:

RH_LH_Data - Subset of Main Data Tables broken down by LH/RH for most offensive events within H/A for
years where available (note some years before 1974 are incomplete).

RH_LH_ALL_HR_ONLY - Home Runs broken down by LH/RH within H/A for all seasons by Team by Season
(may not match home runs in RH_LH_Data_1950_2010 for any years where Retrosheet play-by-play data is incomplete).

Parks - Park ID Master Table
Teams - Team ID Master Table
Leagues - League ID Master Table
LeagueTeams - League/Team Valid Combinations Master Table
Retrosheet_BBDB_Team_XREF - Retrosheet/Baseball Databank Team ID Cross-Reference Table

Sections 2.1 through 2.9 of this document describe each of the tables in
detail and the fields that each contains.
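
To illustrate how the ParkID and TeamID keys tie the tables together, here is a minimal sketch using SQLite from the Python standard library. The table name HomeMain is a shortened stand-in for Home Main Data With Parks Break, only a few of its fields are included, and every row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parks (PARKID TEXT, NAME TEXT);
CREATE TABLE Teams (TeamID TEXT, FullName TEXT);
CREATE TABLE HomeMain (Year INTEGER, TeamID TEXT, ParkID TEXT,
                       GP_H INTEGER, R_Off_H INTEGER, R_Def_H INTEGER);
-- Made-up rows for illustration only
INSERT INTO Parks VALUES ('XXX01', 'Example Park');
INSERT INTO Teams VALUES ('XXX', 'Example City Examples');
INSERT INTO HomeMain VALUES (2010, 'XXX', 'XXX01', 81, 400, 380);
""")

# Resolve the ParkID and TeamID codes to names and total the runs
# scored by both teams in that team's home games at that park.
query = """
SELECT h.Year, t.FullName, p.NAME, h.GP_H, h.R_Off_H + h.R_Def_H AS Runs
FROM HomeMain AS h
JOIN Parks AS p ON p.PARKID = h.ParkID
JOIN Teams AS t ON t.TeamID = h.TeamID
"""
rows = conn.execute(query).fetchall()
print(rows)
```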

---------------------------------------------------------------------------

2.1 Home Main Data With Parks Break

Year        Season
TeamID      Lahman Database Team Identifier
ParkID      Park Code based on Retrosheet Park IDs (NOT in Home Main Data W_O Parks)
LgID        League (NL, AL, AA, etc.)
SEQ         Numeric Code which helps link a team's "main" home park data with the
            road data for that same year (NOT in Home Main Data W_O Parks)
GP_H        Games Played as Home Team
R_Off_H     Runs Scored as Home Team
R_Def_H     Runs Allowed as Home Team
HR_Off_H    Home Runs Hit as Home Team
HR_Def_H    Home Runs Allowed as Home Team

AB_Off_H    At Bats as Home Team
H_Off_H     Base Hits as Home Team
D_Off_H     Doubles as Home Team
T_Off_H     Triples as Home Team
RBI_Off_H   RBIs as Home Team
SH_Off_H    Sacrifices as Home Team
SF_Off_H    Sacrifice Flies as Home Team
HBP_OFF_H   Hit By Pitches as Home Team
BB_Off_H    Base on Balls as Home Team
IW_Off_H    Intentional Walks as Home Team
K_Off_H     Strikeouts as Home Team
SB_Off_H    Stolen Bases as Home Team
CS_Off_H    Caught Stealing as Home Team
GDP_Off_H   Grounded Into Double Plays as Home Team

AB_Def_H    At Bats for Opposition when Home Team
H_Def_H     Base Hits Allowed as Home Team
D_Def_H     Doubles Allowed as Home Team
T_Def_H     Triples Allowed as Home Team
RBI_Def_H   RBIs Allowed as Home Team
SH_Def_H    Sacrifices Allowed as Home Team
SF_Def_H    Sacrifice Flies Allowed as Home Team
HBP_Def_H   Hit By Pitches Allowed as Home Team
BB_Def_H    Base on Balls Allowed as Home Team
IW_Def_H    Intentional Walks Given as Home Team
K_Def_H     Strikeouts Made as Home Team
SB_Def_H    Stolen Bases Allowed as Home Team
CS_Def_H    Caught Stealing Made as Home Team
GDP_Def_H   Grounded Into Double Plays Made as Home Team
------------------------------------------------------------------------------

2.11 Home Main Data W_O Parks

Year        Season
TeamID      Lahman Database Team Identifier
LgID        League (NL, AL, AA, etc.)
GP_H        Games Played as Home Team
R_Off_H     Runs Scored as Home Team
R_Def_H     Runs Allowed as Home Team
HR_Off_H    Home Runs Hit as Home Team
HR_Def_H    Home Runs Allowed as Home Team

AB_Off_H    At Bats as Home Team
H_Off_H     Base Hits as Home Team
D_Off_H     Doubles as Home Team
T_Off_H     Triples as Home Team
RBI_Off_H   RBIs as Home Team
SH_Off_H    Sacrifices as Home Team
SF_Off_H    Sacrifice Flies as Home Team
HBP_OFF_H   Hit By Pitches as Home Team
BB_Off_H    Base on Balls as Home Team
IW_Off_H    Intentional Walks as Home Team
K_Off_H     Strikeouts as Home Team
SB_Off_H    Stolen Bases as Home Team
CS_Off_H    Caught Stealing as Home Team
GDP_Off_H   Grounded Into Double Plays as Home Team

AB_Def_H    At Bats for Opposition when Home Team
H_Def_H     Base Hits Allowed as Home Team
D_Def_H     Doubles Allowed as Home Team
T_Def_H     Triples Allowed as Home Team
RBI_Def_H   RBIs Allowed as Home Team
SH_Def_H    Sacrifices Allowed as Home Team
SF_Def_H    Sacrifice Flies Allowed as Home Team
HBP_Def_H   Hit By Pitches Allowed as Home Team
BB_Def_H    Base on Balls Allowed as Home Team
IW_Def_H    Intentional Walks Given as Home Team
K_Def_H     Strikeouts Made as Home Team
SB_Def_H    Stolen Bases Allowed as Home Team
CS_Def_H    Caught Stealing Made as Home Team
GDP_Def_H   Grounded Into Double Plays Made as Home Team
-----------------------------------------------------------------------------

2.2 Visitor Main Data

Year        Season
TeamID      Lahman Database Team Identifier
ParkID      Park Code based on Retrosheet Park IDs
LgID        League (NL, AL, AA, etc.)
SEQ         Numeric Code which helps link a team's "main" home park data with the
            road data for that same year
GP_A        Games Played as Visiting Team
R_Off_A     Runs Scored as Visiting Team
R_Def_A     Runs Allowed as Visiting Team
HR_Off_A    Home Runs Hit as Visiting Team
HR_Def_A    Home Runs Allowed as Visiting Team

AB_Off_A    At Bats as Visiting Team
H_Off_A     Base Hits as Visiting Team
D_Off_A     Doubles as Visiting Team
T_Off_A     Triples as Visiting Team
RBI_Off_A   RBIs as Visiting Team
SH_Off_A    Sacrifices as Visiting Team
SF_Off_A    Sacrifice Flies as Visiting Team
HBP_OFF_A   Hit By Pitches as Visiting Team
BB_Off_A    Base on Balls as Visiting Team
IW_Off_A    Intentional Walks as Visiting Team
K_Off_A     Strikeouts as Visiting Team
SB_Off_A    Stolen Bases as Visiting Team
CS_Off_A    Caught Stealing as Visiting Team
GDP_Off_A   Grounded Into Double Plays as Visiting Team

AB_Def_A    At Bats for Opposition when Visiting Team
H_Def_A     Base Hits Allowed as Visiting Team
D_Def_A     Doubles Allowed as Visiting Team
T_Def_A     Triples Allowed as Visiting Team
RBI_Def_A   RBIs Allowed as Visiting Team
SH_Def_A    Sacrifices Allowed as Visiting Team
SF_Def_A    Sacrifice Flies Allowed as Visiting Team
HBP_Def_A   Hit By Pitches Allowed as Visiting Team
BB_Def_A    Base on Balls Allowed as Visiting Team
IW_Def_A    Intentional Walks Given as Visiting Team
K_Def_A     Strikeouts Made as Visiting Team
SB_Def_A    Stolen Bases Allowed as Visiting Team
CS_Def_A    Caught Stealing Made as Visiting Team
GDP_Def_A   Grounded Into Double Plays Made as Visiting Team
------------------------------------------------------------------------------
2.3 ParkConfig table

ParkID      Park Code based on Retrosheet Park IDs
Name        Most common name used for park IN THAT SEASON
Year        Season
Capacity    Estimated Normal Maximum Capacity
Surface     Type of Surface (N=Natural, T=Turf)
Area_Fair   Estimated Fair Territory in thousands of Square Feet
Cover       Type of Roof (O=Open, D=Dome, R=Retractable)
LF_Dim      Left Field Line Fence Distance in Feet at the Foul Pole
SLF_Dim     Straightaway Left Field Distance in Feet approx. 15 degrees in from foul line
LFA_Dim     Left Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
LC_Dim      Left Center Field Distance in Feet approx. 30 degrees in from foul line
RCC_Dim     Right Centerfield Corner Distance in Feet between Left Center and CF
CF_Dim      Centerfield (straightaway) Fence Distance in Feet 45 degrees in from foul lines
LCC_Dim     Left Centerfield Corner Distance in Feet between Right Center and CF
RC_Dim      Right Center Field Distance in Feet approx. 30 degrees in from foul line
RFA_Dim     Right Field Power Alley Distance in Feet approx. 22.5 degrees in from foul line
SRF_Dim     Straightaway Right Field Distance in Feet approx. 15 degrees in from foul line
RF_Dim      Right Field Line Fence Distance in Feet at the Foul Pole
Backstop    Distance from Home Plate to Stands
Foul        Foul Territory Area (L=Large, N=Normal, S=Small)
LF_W        Left Field Wall Height in Feet
LC_W        Left Center Field Wall Height in Feet
CF_W        Center Field Wall Height in Feet
RC_W        Right Center Field Wall Height in Feet
RF_W        Right Field Wall Height in Feet
Comments    Comments about remodeling, fires, special features, etc.
------------------------------------------------------------------------------
2.4 RH_LH_Data

Year        Season
TeamID      Lahman Database Team Identifier
Off_Def     Indicator of team being on offense or defense
H_A         Indicator of team being Home or Visitors
Bathand     R=Right, L=Left, B=Switch-hitter (bathand unknown), X=bathand unknown
AB          At Bats
H           Hits
2B          Doubles
3B          Triples
HR          Home Runs
RBI         Runs Batted In
BB          Base on Balls
IBB         Intentional Walks
K           Strikeouts
HBP         Hit By Pitches
SF          Sacrifice Flies
SH          Sacrifices
GDP         Grounded Into Double Plays

------------------------------------------------------------------------------
2.41 RH_LH_ALL_HR_Only

Year        Season
TeamID      Lahman Database Team Identifier
Off_Def     Indicator of team being on offense or defense
H_A         Indicator of team being Home or Visitors
Bathand     R=Right and L=Left
HR          Home Runs
------------------------------------------------------------------------------

2.5 Parks

PARKID      Park Code based on Retrosheet Park IDs
NAME        Most Common Name for Park DURING ITS LIFETIME
CITY        City Location of Park
STATE       State or Province Location of Park
START_DATE  Date of first major league game at Park
END_DATE    Date of last major league game at Park
LEAGUE      League that Park was most often used in
NOTES       Various Notes about Park
AKA         Other Names Park may have been known as
------------------------------------------------------------------------------

2.6 Teams

TeamID      Lahman Database Team Identifier
FullName    City and Nickname of Team
City        City Name only of Team
Nickname    Nickname only of Team
------------------------------------------------------------------------------

2.7 Leagues

LgID        League (NL, AL, AA, etc.)
LgName      Name of League
Start_Year  First Major League Season of League
End_Year    Last Major League Season of League
Comments    Comments about league history
------------------------------------------------------------------------------

2.8 LeagueTeams table

TeamID      Lahman Database Team Identifier
Year        Season
LgID        League (NL, AL, AA, etc.)
------------------------------------------------------------------------------

2.9 Retrosheet_BBDB_Team_Xref

Year        Season
RetroID     Retrosheet Team ID
BBDBID      Baseball-Databank Team ID
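
A minimal sketch of how this cross-reference might be used to translate a Retrosheet team ID into a Baseball-Databank team ID for a given season. The function names build_xref and to_bbdb are hypothetical, and the rows are placeholders, not values taken from the table.

```python
# Placeholder rows shaped like Retrosheet_BBDB_Team_XREF (not real data)
xref_rows = [
    {"Year": "2010", "RetroID": "AAA", "BBDBID": "AA1"},
    {"Year": "2010", "RetroID": "BBB", "BBDBID": "BBB"},
]

def build_xref(rows):
    """Index the cross-reference rows by (Year, RetroID)."""
    return {(int(r["Year"]), r["RetroID"]): r["BBDBID"] for r in rows}

def to_bbdb(xref, year, retro_id):
    """Return the Baseball-Databank ID, or None when the pair is not listed."""
    return xref.get((year, retro_id))

xref = build_xref(xref_rows)
print(to_bbdb(xref, 2010, "AAA"))
```

Keying on the season as well as the ID matters because the table carries a Year field, so the same Retrosheet ID may map differently in different seasons.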

------------------------------------------------------------------------------
3.0 Data Issues

RH_LH_Data Table MAY not tie exactly to the Home Main Data tables and Visitor_Main_Data in some seasons, due to incomplete play-by-play data in Retrosheet Event Files.

RH_LH_ALL_HR_Only Table home runs MAY not tie exactly to RH_LH_Data table home runs, due to incomplete play-by-play data in Retrosheet Event Files.
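
One way to see where the tables do not tie is to sum the split home runs over all Bathand values and compare the total with the main table. The sketch below does this for home runs hit at home. This document does not list the codes stored in Off_Def and H_A, so they are passed in by the caller; the codes "O" and "H" and all rows shown are made up, and hr_gap is a hypothetical helper name.

```python
def hr_gap(split_rows, main_row, off_code, home_code):
    """HR_Off_H from the main table minus the summed split HR.

    A result of 0 means the two tables tie for that team and season.
    """
    split_total = sum(
        int(r["HR"])
        for r in split_rows
        if r["Year"] == main_row["Year"]
        and r["TeamID"] == main_row["TeamID"]
        and r["Off_Def"] == off_code
        and r["H_A"] == home_code
    )
    return int(main_row["HR_Off_H"]) - split_total

# Made-up rows; "O" and "H" are assumed codes, not documented values
splits = [
    {"Year": "1955", "TeamID": "XXX", "Off_Def": "O", "H_A": "H", "Bathand": "R", "HR": "60"},
    {"Year": "1955", "TeamID": "XXX", "Off_Def": "O", "H_A": "H", "Bathand": "L", "HR": "35"},
    {"Year": "1955", "TeamID": "XXX", "Off_Def": "O", "H_A": "A", "Bathand": "R", "HR": "50"},
]
main = {"Year": "1955", "TeamID": "XXX", "HR_Off_H": "98"}

gap = hr_gap(splits, main, off_code="O", home_code="H")
print(gap)
```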

LH/RH HR data for 1914 and older seasons have records marked as "U" for "Unknown" as the handedness of some batters is not known.

LH/RH data for 1919 and older seasons have records marked as "X" for "unknown" and "B" for "switch-hitter, handedness unknown".

If anyone knows where to find the exact numbers for these items, please let me know.

------------------------------------------------------------------------------
4.0 Online Version of Database

The online database version includes the following additional data:

Latitude and Longitude of Park location.
Altitude of park location
Map of park location
Percentage calculations by season of Turf and Roof types.
Averages by season of field dimensions, wall heights, fair territory, and backstop distances.
Totals of games by team and by city.

1 Year Park Factors:

The 1 year park factors are based on UNREGRESSED observed data. An 'other parks corrector' calculation is applied
because the park being rated is missing from its own set of road parks, so that road set is not exactly league
average. In other words, if you're calculating factors for Coors Field, then Coors itself is not part of the 'road'
set of parks it is being compared against, so that road set of parks is actually slightly pitcher friendly (assuming
Coors is hitter friendly that year) instead of being 100% neutral, and the 'other parks corrector' adjusts for that
fact. Except for the other parks corrector calculation, the 1-year park factors are simple rates of components per
At Bat at home divided by the rate of components per AB on the road.
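
As a rough illustration of the simple rate described above, the sketch below computes an uncorrected 1-year factor for one component. It is not the site's exact calculation: the 'other parks corrector' is left out because its formula is not given here, and pooling both teams' batting (the Off and Def fields) at home and on the road is an assumption. The function name and sample numbers are made up.

```python
def raw_one_year_factor(home, road, component="HR"):
    """Component-per-AB rate at home divided by the same rate on the road.

    Assumes both teams' batting is pooled; omits the other parks corrector.
    """
    home_events = home[f"{component}_Off_H"] + home[f"{component}_Def_H"]
    home_ab = home["AB_Off_H"] + home["AB_Def_H"]
    road_events = road[f"{component}_Off_A"] + road[f"{component}_Def_A"]
    road_ab = road["AB_Off_A"] + road["AB_Def_A"]
    return (home_events / home_ab) / (road_events / road_ab)

# Made-up season totals for one team
home = {"HR_Off_H": 100, "HR_Def_H": 100, "AB_Off_H": 2700, "AB_Def_H": 2800}
road = {"HR_Off_A": 80, "HR_Def_A": 80, "AB_Off_A": 2750, "AB_Def_A": 2750}

factor = raw_one_year_factor(home, road)
print(round(factor, 3))
```

A value above 1 means the component occurred more often per At Bat in the team's home games than in its road games.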

3 Year Park Factors:

The 3 year park factors are REGRESSED, and meant as an estimate of the park's 'true' impact on batting components.
We calculate this factor slightly differently from Total Baseball/Baseball-Reference.com (see http://www.baseball-reference.com/about/parkadjust.shtml
for an explanation of that calculation). For a given park/season, we use the 1-year factor for that park/season plus
the 1-year factor for the PREVIOUS season plus the 1-year factor for the FOLLOWING season. We weight each based on the
number of home games for each season, but otherwise we weight them equally (we don't add weight to the current season).
Then we regress that number 25% towards that park's long-term historical factor for that component. The long-term
historical factor is the average of all 1-year factors over the history of the park, weighted by home games each season.
We believe this gives us the closest possible approximation of the 'true' park factor without adding more complicating
variables such as modifications to park characteristics, new parks in the league, weighting the long-term
factor by number of seasons, etc.
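
The steps above can be sketched as follows for a single park/season. This is an illustration under stated assumptions, not the site's code: the function names are hypothetical, the sample factors are made up, and "regress 25%" is read as 75% of the 3-season figure plus 25% of the long-term figure.

```python
def games_weighted_mean(seasons):
    """Mean of 1-year factors, weighted by home games.

    `seasons` is a list of (one_year_factor, home_games) pairs.
    """
    total_games = sum(games for _, games in seasons)
    return sum(factor * games for factor, games in seasons) / total_games

def regressed_three_year(window, history, regress=0.25):
    """Regress the 3-season figure toward the park's long-term figure."""
    raw = games_weighted_mean(window)
    long_term = games_weighted_mean(history)
    return (1 - regress) * raw + regress * long_term

# Made-up 1-year factors for a park's four seasons, 81 home games each
history = [(1.10, 81), (1.20, 81), (1.00, 81), (0.90, 81)]
window = history[0:3]  # previous, current and following season

three_year = regressed_three_year(window, history)
print(round(three_year, 4))
```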

For the very first and very last year of a park, since there is no PREVIOUS or FOLLOWING season in those cases, we
chose to use an additional following season (season + 2) for the first park year and an additional previous season
(season - 2) for the last park year. This results in the first TWO seasons and the last TWO seasons of the 3 year
calculations being the same! What we're saying is that, lacking an adjacent season, our best guess for the first
and last seasons of a park's existence is the same guess we use for the 2nd season and the next-to-last season.
If anyone wants to prove that there is a more accurate way to handle these 'end' seasons, we are very open to ideas.
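
The choice of seasons at either end of a park's life can be sketched as an index calculation. The function name window_indices is hypothetical, and the handling of parks with fewer than three seasons (use whatever seasons exist) is an assumption, since that case is not covered above.

```python
def window_indices(i, n):
    """Indices of the seasons used for season i of a park with n seasons.

    Interior seasons use (i - 1, i, i + 1). The first season shifts the
    window forward and the last season shifts it back, so the first two
    seasons share one window and the last two share another.
    """
    start = min(max(i - 1, 0), max(n - 3, 0))
    return list(range(start, min(start + 3, n)))

# A park with five seasons, indexed 0 through 4
for season in range(5):
    print(season, window_indices(season, 5))
```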


Ballparks Database created by Kevin Johnson. Internet portion designed by Dan Hirsch, creator of The Baseball Gauge.
