If you find this page useful and would like to be notified of changes made to this page, start by inputting your email below.
powered by ChangeDetection
CSVs are copied to the server. Their group and ownership are changed to 'mysql' using the chgrp and chown commands. They are then moved to /var/lib/mysql/wellsdb_dev/ such that they can be read by mysql.
CSVs should be checked for " \n \ and other special characters prior to being uploaded to the server. These special characters can cause the files to be processed incorrectly, which leads to data going into the wrong columns or not being uploaded at all.
Always make backups prior to destructive operations.
A list of well APIs with some information from production / waste was compiled and added to the well table. Only wells that do not already exist are added. Additional fields in well are to be updated using the well update script.
LOAD DATA INFILE 'wells.csv' IGNORE
INTO TABLE well FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 ROWS (WELL_API, WELL_API_COUNTY_ID, OPERATOR_NAME, OPERATOR_ID, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL)
[Added 676 rows]
All previous Pennsylvania production information was deleted. All production data was downloaded from the PADEP site with date range from 01/01/1800 to 05/26/2015.
An excel macro has been written and stored on google drive in order to expedite pre-processing of the production CSV from the PADEP website. Well APIs were created by prepending 37- to Well Permit # values.
LOAD DATA INFILE 'mergedProduction.csv' IGNORE
INTO TABLE production FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 ROWS (WELL_API, WELL_API_COUNTY_ID, PERIOD_ID, PRODUCTION_INDICATOR, WELL_STATUS, FARM_NAME, @SPUD_DATE_TMP, GAS_QUANTITY_MCF, GAS_PRODUCTION_DAYS, CONDENSATE_QUANTITY_BBL, CONDENSATE_PRODUCTION_DAYS, OIL_QUANTITY_BBL, OIL_PRODUCTION_DAYS, AVERAGED, OPERATOR_NAME, OPERATOR_ID, WELL_COUNTY, WELL_MUNICIPALITY, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL, CONFIGURATION, HOME_USE, REPORTING_PERIOD, COMMENT_REASON, COMMENT_TEXT,DB_COMMENTS, WELL_COUNTRY, WELL_STATE_CODE)
SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');
1143915 rows deleted.
1485575 rows inserted. ( Query took 125.4058 sec )
Like production, all previous waste information was removed to be replaced with the most recent information provided by PADEP, with date range from 01/01/1800 to 05/26/2015. Preprocessing necessary:
Insert WELL_API in leftmost column Append "37-" to Well Permit # values to obtain WELL_API values. Insert WELL_STATE_CODE after column P Insert COUNTRY after WELL_STATE_CODE No DB comments No SPUD_DATE_TEXT
LOAD DATA INFILE 'WasteExport2014-2.csv'
INTO TABLE waste
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(WELL_API, WELL_API_COUNTY_ID, PERIOD_ID, WELL_STATUS, FARM_NAME, WELL_ID, @SPUD_DATE_TMP, WASTE_TYPE, WASTE_QUANTITY, UNITS, DISPOSAL_METHOD, AVERAGED, OPERATOR_NAME, OPERATOR_OGO, WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, UNCONVENTIONAL, CONFIGURATION, HOME_USE, WASTE_FACILITY_PERMIT_ID, WASTE_FACILITY_NAME, FACILITY_ADDRESS_1, FACILITY_ADDRESS_2, FACILITY_CITY, FACILITY_STATE, FACILITY_ZIP_CODE, FACILITY_PHONE, FACILITY_LATITUDE, FACILITY_LONGITUDE, REPORTING_PERIOD, COMMENT_REASON, COMMENT_TEXT) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');
CSVs for Compliance, Permits, and SPUD dates were obtained from http://www.portal.state.pa.us/portal/server.pt/community/oil_and_gas_reports/20297 using their respectively linked utilities. Default settings for all categories were appropriate except for "inspections with violations only" for compliance, which had to be set to "no" in order to properly obtain all compliance records.
Only records past 2014/10/13 were used to avoid data duplication.
Need to prepend "37-" to all entries in WELL_API
WELL_API can be cloned without "37-" for WELL_API_COUNTY_ID
Delete REGION column
Change WELL_CODE_DESC to WELL_TYPE
Delete WELL_STATUS
Add State Code and Country Column before FARM_NAME
No Farm Well NO
LOAD DATA INFILE 'cleanedSPUD.csv'
INTO TABLE spud FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@SPUD_DATE_TMP, WELL_API, WELL_API_COUNTY_ID, OPERATOR_OGO, OPERATOR_NAME, WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, FARM_NAME, WELL_TYPE, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, CONFIGURATION, UNCONVENTIONAL) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');
Grabbed all permits from 2014-10-09 to 2015-05-27
delete REGION
Insert column before WELL_API, prepend with "37-" with WELL_API entries. Old
Insert state code and country columns after municipality
Delete LATITUDE_DEGREES, LONGITUDE_DEGREES
No PERMIT_ISSUED_DATE, DB_COMMENTS, SPUD_DATE_TEXT, permit_application_date, permit_issue_date, permit_status, proposed_total_depth
LOAD DATA INFILE 'perms.csv'
INTO TABLE permit FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(WELL_COUNTY, WELL_MUNICIPALITY, WELL_STATE_CODE, WELL_COUNTRY, PERMIT_ISSUED_DATE, OPERATOR_NAME, APPLICATION_TYPE, AUTHORIZATION_DESCRIPTION, WELL_API, WELL_API_COUNTY_ID, UNCONVENTIONAL, CONFIGURATION, WELL_TYPE, FARM_NAME, @SPUD_DATE_TMP, LATITUDE_DECIMAL, LONGITUDE_DECIMAL, OPERATOR_OGO, OPERATOR_ADDRESS, OPERATOR_CITY, OPERATOR_STATE, OPERATOR_ZIP, AUTHORIZATION_ID, CLIENT_ID, PRIMARY_FACILITY_ID) SET SPUD_DATE = STR_TO_DATE(@SPUD_DATE_TMP, '%m/%d/%Y');
Use dates from 2014-9-10 to 2015-05-27
WELL_API created by appending 37- to API_PERMIT values
Creating entries for wells not in well table: Delete columns: INSPECTION_DATE, INSPECTION_TYPE, INSPECTION_ID, FARM_NAME, SITE_NAME, FACILITY_TYPE, INSPECTION_CATEGORY, REGION, COUNTY, MUNICIPALITY, INSPECTION_RESULT_DESCRIPTION, INSPECTION_CATEGORY, VIOLATION*, RESOLVED_DATE, RESOLUTION_REASON_CODE_DESCRIPTION, ENFORCEMENT_ID, , ENFORCEMENT_DESCRIPTION, DATE_EXECUTED, ENFORCEMENT_FINAL_DATE, ENFORCEMENT_STATUS, PENALTY*, TOTAL_AMOUNT_COLLECTED
Insert prepended 37-XXX column for WELL_API.
Perhaps data isn't sanitized properly? Will give another try with new data.
Table should be modified for Penalty / Fines.
Compliance has a autogenerated primary key INSP_ID_PK
This identifies the WELL_API with Violation (INSP_ID_FK) and has its own primary key.
We can export a list of all wells with the corresponding INSP_ID_PK and do a lookup to replace keys appropriately in excel.
Data from: http://www.depweb.state.pa.us/portal/server.pt/community/oil_and_gas_reports/20297
Ran ComplianceClean macro. Needed to edit file in notepad++ to manually remove all instances of " character (and some other special characters resembling quotations)
LOAD DATA INFILE 'complianceClean.csv'
INTO TABLE compliance
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(WELL_ID_FK, INSP_ID, @INSPECTION_DATE_TMP, INSPECTION_TYPE, @dummy, INSP_CATEGORY, INSPECTION_RESULT_DESC, INSP_COMMENT)
SET INSPECTION_DATE = STR_TO_DATE(@INSPECTION_DATE_TMP, '%m/%d/%Y');
Made INSP_CATAGORY a 30 varchar instead of 10 varchar Dropped foreign key on Compliance.
20406 - 17654 = 2752 wells without WELL_API - Deleted from records.
Violation and Penalty data for PA come from the compliance downloads, and as such the original, unedited compliance CSV must be manipulated to provide violation and penalty data.
Run the following and export results to CSV such that excel index+match functions can give proper insp_id_fk values for violation SELECT insp_id_pk, well_id_fk FROM compliance WHERE insp_id_pk >1132437
Where 1132437 is a value less than your most recently updated column. Taking the last insp_id_pk - #rows inserted - 1 works.
Violaton: Delete everything except violation columns, date_resolved and well API prepend well_api with 37-, use index+match with previous query results to fill out insp_id_fk
Remove all entries from compliance file that do not have violation_ids.
1658 - 1118 = 540 violation entries with no associated well_API, deleted.
Prior to insertion, temporarily disable global foreign key checks with: SET GLOBAL FOREIGN_KEY_CHECKS=0; After insertion, restore checks with SET GLOBAL FOREIGN_KEY_CHECKS=1;
LOAD DATA INFILE 'violation.csv' INTO TABLE violation FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (INSP_ID_FK, VIOLATION_ID, @DATE_VIOLATION_TMP, VIOLATION_CODE, VIOLATION_TYPE, VIOLATION_COMMENT, @DATE_RESOLVED_TMP) SET DATE_VIOLATION = STR_TO_DATE(@DATE_VIOLATION_TMP, '%m/%d/%Y'), DATE_RESOLVED = STR_TO_DATE(@DATE_RESOLVED_TMP, '%m/%d/%Y')
For index+match lookup (will need to be nested with previous index/match lookups)
SELECT violation_id_pk, insp_id_fk FROM violation WHERE violation_id_pk >419014
Remove all entries from original compliance file that do not have enforcement_ids. Some violation entries that had no well_API were removed.
LOAD DATA INFILE 'penalty.csv' INTO TABLE penalty FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (VIOLATION_ID_FK, ENF_ID, ENFORCEMENT_CODE, @DATE_EXECUTED_TMP, @ENF_DATE_FINAL_TMP, PEN_DATE_FINAL, PENALTY_FINAL_STATUS_DESC, PENALTY_AMOUNT, TOTAL_AMOUNT_COLLECTED) SET DATE_EXECUTED = STR_TO_DATE(@DATE_EXECUTED_TMP, '%m/%d/%Y'), ENF_DATE_FINAL = STR_TO_DATE(@ENF_DATE_FINAL_TMP, '%m/%d/%Y')
After the creation / updating of Pennsylvania pages, chemical data from the fracfocus registry should be re-added to select pages. This is done by executing python chemicals.py US from the wikimanager folder. This will generated modified pages within staged_sources. Due to a shell bug, existing code within chemicals.py does not commit the newly generated pages to the wiki in their entirety (instead, they are truncated to chunks in multiples of 4096 bytes). As such, the standard procedure of calling commit_staged_sources.py on the staged_sources folder is required.
In order to expediate the process, the sql statement us_well_sql can be modified to only select wells from PA.