X

Track changes made to this page

If you find this page useful and would like to be notified of changes made to this page, start by inputting your email below.



Privacy policy
Close this window

powered by ChangeDetection

Pennsylvania Technical Information


General

CSVs are copied to the server. Their group and ownership are changed to 'mysql' using the chgrp and chown commands. They are then moved to /var/lib/mysql/wellsdb_dev/ such that they can be read by mysql.

CSVs should be checked for " \n \ and other special characters prior to being uploaded to the server. These special characters can cause the files to be processed incorrectly, which leads to data going into the wrong columns or not being uploaded at all.

Always make backups prior to destructive operations.

Well

A list of well APIs with some information from production / waste was compiled and added to the well table. Only wells that do not already exist are added. Additional fields in well are to be updated using the well update script.

LOAD DATA INFILE 'wells.csv' IGNORE

INTO TABLE well FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n' IGNORE 1 ROWS (WELL_API, WELL_API_COUNTY_ID, OPERATOR_NAME, OPERATOR_ID, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL)

[Added 676 rows]


Production

All previous Pennsylvania production information was deleted. All production data was downloaded from the PADEP site with date range from 01/01/1800 to 05/26/2015.

An excel macro has been written and stored on google drive in order to expedite pre-processing of the production CSV from the PADEP website. Well APIs were created by prepending 37- to Well Permit # values.


LOAD DATA INFILE 'mergedProduction.csv' IGNORE

INTO TABLE production FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n' IGNORE 1 ROWS (WELL_API, WELL_API_COUNTY_ID, PERIOD_ID, PRODUCTION_INDICATOR, WELL_STATUS, FARM_NAME, @SPUD_DATE_TMP, GAS_QUANTITY_MCF, GAS_PRODUCTION_DAYS, CONDENSATE_QUANTITY_BBL, CONDENSATE_PRODUCTION_DAYS, OIL_QUANTITY_BBL, OIL_PRODUCTION_DAYS, AVERAGED, OPERATOR_NAME, OPERATOR_ID, WELL_COUNTY, WELL_MUNICIPALITY, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL, CONFIGURATION, HOME_USE, REPORTING_PERIOD, COMMENT_REASON, COMMENT_TEXT,DB_COMMENTS, WELL_COUNTRY, WELL_STATE_CODE)

SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');

1143915 rows deleted.

1485575 rows inserted. ( Query took 125.4058 sec )

Waste

Like production, all previous waste information was removed to be replaced with the most recent information provided by PADEP, with date range from 01/01/1800 to 05/26/2015. Preprocessing necessary:

Insert WELL_API in leftmost column Append "37-" to Well Permit # values to obtain WELL_API values. Insert WELL_STATE_CODE after column P Insert COUNTRY after WELL_STATE_CODE No DB comments No SPUD_DATE_TEXT


LOAD DATA INFILE 'WasteExport2014-2.csv'

INTO TABLE waste

FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(WELL_API, WELL_API_COUNTY_ID, PERIOD_ID, WELL_STATUS, FARM_NAME, WELL_ID, @SPUD_DATE_TMP, WASTE_TYPE, WASTE_QUANTITY, UNITS, DISPOSAL_METHOD, AVERAGED, OPERATOR_NAME, OPERATOR_OGO, WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL, CONFIGURATION, HOME_USE, WASTE_FACILITY_PERMIT_ID, WASTE_FACILITY_NAME, FACILITY_ADDRESS_1, FACILITY_ADDRESS_2, FACILITY_CITY, FACILITY_STATE, FACILITY_ZIP_CODE, FACILITY_PHONE, FACILITY_LATITUDE, FACILITY_LONGITUDE, REPORTING_PERIOD, COMMENT_REASON, COMMENT_TEXT) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');


CSVs for Compliance, Permits, and SPUD dates were obtained from http://www.portal.state.pa.us/portal/server.pt/community/oil_and_gas_reports/20297 using their respectively linked utilities. Default settings for all categories were appropriate except for "inspections with violations only" for compliance, which had to be set to "no" in order to properly obtain all compliance records.


SPUD

Only records past 2014/10/13 were used to avoid data duplication.


Need to prepend "37-" to all entries in WELL_API

WELL_API can be cloned without "37-" for WELL_API_COUNTY_ID

Delete REGION column

Change WELL_CODE_DESC to WELL_TYPE

Delete WELL_STATUS

Add State Code and Country Column before FARM_NAME

No Farm Well NO


LOAD DATA INFILE 'cleanedSPUD.csv'

INTO TABLE spud FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(@SPUD_DATE_TMP, WELL_API, WELL_API_COUNTY_ID, OPERATOR_OGO, OPERATOR_NAME, WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, FARM_NAME, WELL_TYPE, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, CONFIGURATION, UNCONVENTIONAL) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');


PERMITS

Grabbed all permits from 2014-10-09 to 2015-05-27

delete REGION

Insert column before WELL_API, prepend with "37-" with WELL_API entries. Old

Insert state code and country columns after municipality

Delete LATITUDE_DEGREES, LONGITUDE_DEGREES

No PERMIT_ISSUED_DATE, DB_COMMENTS, SPUD_DATE_TEXT, permit_application_date, permit_issue_date, permit_status, proposed_total_depth


LOAD DATA INFILE 'perms.csv'

INTO TABLE permit FIELDS TERMINATED BY ','

ENCLOSED BY '"'

LINES TERMINATED BY '\n'

IGNORE 1 ROWS

(WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, PERMIT_ISSUED_DATE, OPERATOR_NAME, APPLICATION_TYPE, AUTHORIZATION_DESCRIPTION, WELL_API, WELL_API_COUNTY_ID, UNCONVENTIONAL, CONFIGURATION, WELL_TYPE, FARM_NAME, @SPUD_DATE_TMP, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, OPERATOR_OGO, OPERATOR_ADDRESS, OPERATOR_CITY, OPERATOR_STATE, OPERATOR_ZIP, AUTHORIZATION_ID, CLIENT_ID, PRIMARY_FACILITY_ID) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');


Compliance

Use dates from 2014-9-10 to 2015-05-27

WELL_API created by appending 37- to API_PERMIT values

Creating entries for wells not in well table: Delete columns: INSPECTION_DATE, INSPECTION_TYPE, INSPECTION_ID, FARM_NAME, SITE_NAME, FACILITY_TYPE, INSPECTION_CATEGORY, REGION, COUNTY, MUNICIPALITY, INSPECTION_RESULT_DESCRIPTION, INSPECTION_CATEGORY, VIOLATION*, RESOLVED_DATE, RESOLUTION_REASON_CODE_DESCRIPTION, ENFORCEMENT_ID, , ENFORCEMENT_DESCRIPTION, DATE_EXECUTED, ENFORCEMENT_FINAL_DATE, ENFORCEMENT_STATUS, PENALTY*, TOTAL_AMOUNT_COLLECTED

Insert prepended 37-XXX column for WELL_API.

Perhaps data isn't sanitized properly? Will give another try with new data.

Table should be modified for Penalty / Fines.


Compliance has a autogenerated primary key INSP_ID_PK This identifies the WELL_API with Violation (INSP_ID_FK) and has its own primary key. We can export a list of all wells with the corresponding INSP_ID_PK and do a lookup to replace keys appropriately in excel.

Data from: http://www.depweb.state.pa.us/portal/server.pt/community/oil_and_gas_reports/20297


Ran ComplianceClean macro. Needed to edit file in notepad++ to manually remove all instances of " character (and some other special characters resembling quotations)


LOAD DATA INFILE 'complianceClean.csv' INTO TABLE compliance FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (WELL_ID_FK, INSP_ID, @INSPECTION_DATE_TMP, INSPECTION_TYPE, @dummy, INSP_CATEGORY, INSPECTION_RESULT_DESC, INSP_COMMENT) SET INSPECTION_DATE = STR_TO_DATE(@INSPECTION_DATE_TMP, '%m/%d/%Y');

Made INSP_CATAGORY a 30 varchar instead of 10 varchar Dropped foreign key on Compliance.

20406 - 17654 = 2752 wells without WELL_API - Deleted from records.

Violation

Violation and Penalty data for PA come from the compliance downloads, and as such the original, unedited compliance CSV must be manipulated to provide violation and penalty data.

Run the following and export results to CSV such that excel index+match functions can give proper insp_id_fk values for violation SELECT insp_id_pk, well_id_fk FROM compliance WHERE insp_id_pk >1132437

Where 1132437 is a value less than your most recently updated column. Taking the last insp_id_pk - #rows inserted - 1 works.

Violaton: Delete everything except violation columns, date_resolved and well API prepend well_api with 37-, use index+match with previous query results to fill out insp_id_fk

Remove all entries from compliance file that do not have violation_ids.

1658 - 1118 = 540 violation entries with no associated well_API, deleted.

Prior to insertion, temporarily disable global foreign key checks with: SET GLOBAL FOREIGN_KEY_CHECKS=0; After insertion, restore checks with SET GLOBAL FOREIGN_KEY_CHECKS=1;

LOAD DATA INFILE 'violation.csv' INTO TABLE violation FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (INSP_ID_FK, VIOLATION_ID, @DATE_VIOLATION_TMP, VIOLATION_CODE, VIOLATION_TYPE, VIOLATION_COMMENT, @DATE_RESOLVED_TMP) SET DATE_VIOLATION = STR_TO_DATE(@DATE_VIOLATION_TMP, '%m/%d/%Y'), DATE_RESOLVED = STR_TO_DATE(@DATE_RESOLVED_TMP, '%m/%d/%Y')

Penalty

For index+match lookup (will need to be nested with previous index/match lookups)

SELECT violation_id_pk, insp_id_fk FROM violation WHERE violation_id_pk >419014

Remove all entries from original compliance file that do not have enforcement_ids. Some violation entries that had no well_API were removed.

LOAD DATA INFILE 'penalty.csv' INTO TABLE penalty FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (VIOLATION_ID_FK, ENF_ID, ENFORCEMENT_CODE, @DATE_EXECUTED_TMP, @ENF_DATE_FINAL_TMP, PEN_DATE_FINAL, PENALTY_FINAL_STATUS_DESC, PENALTY_AMOUNT, TOTAL_AMOUNT_COLLECTED) SET DATE_EXECUTED = STR_TO_DATE(@DATE_EXECUTED_TMP, '%m/%d/%Y'), ENF_DATE_FINAL = STR_TO_DATE(@ENF_DATE_FINAL_TMP, '%m/%d/%Y')

Chemical Data

After the creation / updating of Pennsylvania pages, chemical data from the fracfocus registry should be re-added to select pages. This is done by executing python chemicals.py US from the wikimanager folder. This will generated modified pages within staged_sources. Due to a shell bug, existing code within chemicals.py does not commit the newly generated pages to the wiki in their entirety (instead, they are truncated to chunks in multiples of 4096 bytes). As such, the standard procedure of calling commit_staged_sources.py on the staged_sources folder is required.

In order to expediate the process, the sql statement us_well_sql can be modified to only select wells from PA.