作者:mobiledu2502890657 | 来源:互联网 | 2023-05-21 11:55
我已经设置了一个sqlite3数据库.我填写了一些数据(约400万条记录~1.2Gb的数据).然后我做一些查询(选择/删除/更新).问题是有时在插入后脚本会停止而不会出错.有时它会一直运行到最后.这些是我运行的查询类型:
from __future__ import print_function
import sqlite3
import csv
import os
import glob
import sys
import time
db = 'test.db'
cOnn= sqlite3.connect(db)
conn.text_factory = str # allows utf-8 data to be stored
c = conn.cursor()
i = 0
### traverse the directory and process each .csv file
##print("debug")
csvfile =('/home/Desktop/Untitled Folder/Crimes_-_2001_to_present.csv')
with open(csvfile, "rb") as f:
reader = csv.reader(f)
t = time.time()
header = True
for row in reader:
if header:
# gather column names from the first row of the csv
header = False
sql = "DROP TABLE IF EXISTS test_table"
c.execute(sql)
#print("debug 1")
sql = "CREATE TABLE test_table (ID INTEGER,FBI_Code INTEGER,Updated_On TEXT,District TEXT,Beat INTEGER,Primary_Type TEXT,Location BLOB,Latitude REAL,Arrest INTEGER,Domestic INTEGER,Longitude REAL,Community_Area INTEGER,Case_Number INTEGER,Block TEXT,Location_Description TEXT,Ward INTEGER,IUCR INTEGER,Year INTEGER, Date TEXT,Y_Coordinate INTEGER,Description TEXT,X_Coordinate INTEGER);"
c.execute(sql)
#print("debug 2")
insertsql = "INSERT INTO test_table VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
rowlen = len(row)
#print("debug 3")
else:
# skip lines that don't have the right number of columns
#print("debug 4")
#if len(row) == rowlen:
#print("debug 5")
try:
c.execute(insertsql, row)
except:
print("problem in row %d" % i)
print(row)
continue
# print("debug 6")
i +=1
if i == 1000:
conn.commit()
#### if i == 4000000:
#### break
## #print (row)
conn.commit()
print('\nTime for Insertions TOTAL~> \n')
print(float( time.time() -t ))
print('\nTime for Insertions per Query~> \n')
print(float( time.time() -t )/i)
del rows
rows = list()
print('\nTime for Selections ~> Domestic\n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE Domestic == 'false'")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
rows = list()
print('\nTime for Selections ~> Arrests\n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE Arrest == 'false'")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
rows = list()
print('\nTime for Selections ID~> \n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE ID <9938614")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
rows = list()
print('\nTime for Selections ~> Primary_Type\n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE Primary_Type == 'BATTERY'")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
rows = list()
print('\nTime for Selections Year~> \n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE Year <= 2014")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
rows = []
print('\nTime for Updates ~> YEAR\n')
t = time.time()
c.execute("UPDATE test_table SET Year = '2016' WHERE Year == '2014'")
print(float( time.time() -t ))
print('\nTime for Selections Year~> \n')
t = time.time()
c.execute("SELECT * FROM test_table WHERE Year <= 2014")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
print('\nTime for DELETIONS ~> Domestic\n')
t = time.time()
c.execute("DELETE FROM test_table WHERE Domestic == 'false'")
rows = c.fetchall()
print(float( time.time() -t ))
print(len(rows))
del rows
c.close()
conn.close()
每次我重新分配行列表,因为在一些查询后我的内存不足.但我不认为这是问题(以防万一我使用del行并重新分配它,这样就慢了).在其中一些查询后,脚本停止没有错误,我无法弄清楚为什么,因为有时它运行正常.
编辑
我已经包含了上面的代码.问题是在我执行查询后插入部分后脚本终止而没有任何错误.
例如它直到这里:
......选择时间〜>逮捕
123.231
3928182
然后它终止.在第一种方法中,我没有删除列表,当我尝试重新声明列表时,Cython产生了核心转储错误.现在我删除然后声明列表Cython运行正常.我的问题是为什么python没有捕获任何异常?
在重新分配列表之后,垃圾收集器会清除垃圾数据(正如我从linux监视器中看到的那样)但它崩溃没有错误.而最令人讨厌的是,有时它会一直运行到最后.