
LDA Topic Models in Practice: Commonly Used Techniques

Author: 卖女孩的小方子 | Source: Internet | 2023-09-15 16:21


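The 2019 Stata & Python Empirical Econometrics and Web Scraping Summer Workshop starts in a few days. I have shared LDA topic models on my WeChat public account several times before, but the problems covered were fairly simple. In this notebook I work through the following problems hands-on:
Extracting the keywords of each topic
Using grid search to find the best model parameters
Visualizing the topic model
Predicting the topic of newly entered text
Viewing the feature words of each topic
Getting the n most important feature words of each topic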

1. Import the data

Here we use the 20newsgroups dataset.

import pandas as pd

df = pd.read_json('newsgroups.json')
df.head()


Check which categories target_names contains

df.target_names.unique()

Run

array(['rec.autos', 'comp.sys.mac.hardware', 'rec.motorcycles',
       'misc.forsale', 'comp.os.ms-windows.misc', 'alt.atheism',
       'comp.graphics', 'rec.sport.baseball', 'rec.sport.hockey',
       'sci.electronics', 'sci.space', 'talk.politics.misc', 'sci.med',
       'talk.politics.mideast', 'soc.religion.christian',
       'comp.windows.x', 'comp.sys.ibm.pc.hardware', 'talk.politics.guns',
       'talk.religion.misc', 'sci.crypt'], dtype=object)

2. Clean the English text

Use regular expressions to remove email addresses and redundant whitespace such as newlines
Tokenize with simple_preprocess from the gensim library to get a list of words
Keep only words with certain parts of speech https://www.guru99.com/pos-tagging-chunking-nltk.html

Note:

nltk and spacy are somewhat troublesome to install and configure; see this article.

"Installing and configuring the NLP libraries nltk and spacy". The nltk corpora and spacy's English model are already included in the tutorial folder.

import re

import gensim
import nltk
import spacy
from nltk import pos_tag
from nltk.corpus import stopwords

# Load the spacy model
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

def clean_text(text, allowed_postags=('NOUN', 'ADJ', 'VERB', 'ADV')):
    text = re.sub(r'\S*@\S*\s?', '', text)  # remove email addresses
    text = re.sub(r'\s+', ' ', text)  # collapse runs of spaces, newlines and tabs into one space
    # deacc=True converts certain accented letters to plain English letters, e.g.
    # "Šéf chomutovských komunistů dostal poštou bílý prášek" becomes
    # u'Sef chomutovskych komunistu dostal postou bily prasek'
    words = gensim.utils.simple_preprocess(text, deacc=True)
    # Stop-word removal could be added here (the list is loaded but not applied)
    stpwords = stopwords.words('english')
    # Keep only words tagged 'NOUN', 'ADJ', 'VERB' or 'ADV'
    doc = nlp(' '.join(words))
    text = " ".join([token.lemma_ if token.lemma_ not in ['-PRON-'] else ''
                     for token in doc if token.pos_ in allowed_postags])
    return text

test = "From: lerxst@wam.umd.edu (where's my thing)\nSubject: WHAT car is this!?\nNntp-Posting-Host: rac3.wam.umd.edu\nOrganization: University of Maryland, College Park\nLines: 15\n\n I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on this funky looking car, please e-mail.\n\nThanks,\n- IL\n ---- brought to you by your neighborhood Lerxst ----\n\n\n\n\n"
clean_text(test)

Run

'where thing subject car be nntp post host rac wam umd edu organization university maryland college park line be wonder anyone out there could enlighten car see other day be door sport car look be late early be call bricklin door be really small addition front bumper be separate rest body be know anyone can tellme model name engine spec year production where car be make history info have funky look car mail thank bring neighborhood lerxst'
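The clean_text function loads a stop-word list but never applies it, which is why words such as "be" and "where" survive in the output. A minimal sketch of the missing filtering step, assuming a small hand-written stop-word set (STOP_WORDS and remove_stop_words are illustrative names, not part of the notebook; in practice the set would come from stopwords.words('english')):

```python
# Illustrative stop-word set; the notebook would use stopwords.words('english').
STOP_WORDS = {"be", "have", "the", "a", "an", "this", "that", "there", "where"}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, preserving their order."""
    return [w for w in words if w not in stop_words]

tokens = "where thing subject car be nntp post host".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Filtering against a set keeps each lookup cheap, which matters when the function is applied to every document in the corpus.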

Apply the clean_text cleaning function to the whole content column

df.content = df.content.apply(clean_text)
df.head()


3. Build the document-word matrix

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

# vectorizer = TfidfVectorizer(min_df=10)  # a word must appear in at least 10 documents
vectorizer = CountVectorizer(analyzer='word',
                             min_df=10,                        # ignore words that appear in fewer than 10 documents
                             lowercase=True,                   # convert all words to lowercase
                             token_pattern='[a-zA-Z0-9]{3,}',  # tokens of at least 3 characters
                             # max_features=50000,             # max number of unique words
                             )

data_vectorized = vectorizer.fit_transform(df.content)
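To make the roles of token_pattern and min_df concrete, here is a pure-Python sketch of the same idea on three toy documents. It is only an approximation of what CountVectorizer does, not its implementation; build_count_matrix and the toy documents are made up for illustration.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-zA-Z0-9]{3,}")  # same pattern: 3 or more characters

def build_count_matrix(docs, min_df=1):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    # Document frequency: the number of documents each word appears in
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vocabulary = sorted(w for w, df in doc_freq.items() if df >= min_df)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[w] for w in vocabulary])
    return vocabulary, matrix

docs = ["car engine car", "engine spec of car", "god say it"]
vocabulary, matrix = build_count_matrix(docs, min_df=2)
print(vocabulary)  # words found in at least 2 documents
print(matrix)      # one row per document, one column per vocabulary word
```

Note that a document can end up as an all-zero row when none of its words pass the min_df filter, as the third toy document does here.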

Check how sparse the data is

data_dense = data_vectorized.todense()

# "Sparsicity" here is the percentage of non-zero cells
print("Sparsicity: ", ((data_dense > 0).sum()/data_dense.size)*100, '%')

Run

Sparsicity: 0.9138563473570427 %
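The figure above is simply the share of cells that are non-zero. A tiny sketch of that calculation on a hand-made matrix (percent_nonzero and the toy matrix are illustrative only). For a SciPy sparse matrix the same number can be derived from its nnz attribute and shape, which avoids building the dense copy.

```python
def percent_nonzero(matrix):
    """Percentage of cells in a list-of-lists matrix that are non-zero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return 100.0 * nonzero / total

toy = [[2, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 3, 0, 0]]
result = percent_nonzero(toy)  # 3 of the 12 cells are non-zero
print(result, '%')
```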

4. Build the LDA model

Use LatentDirichletAllocation from the sklearn library

from sklearn.decomposition import LatentDirichletAllocation

# Build the LDA topic model
lda_model = LatentDirichletAllocation(n_components=20)  # number of topics
lda_output = lda_model.fit_transform(data_vectorized)

Model performance

# Approximate log-likelihood: higher is better
print(lda_model.score(data_vectorized))

# Parameters of the trained model
print(lda_model.get_params())

Run

-11868684.751381714
{'batch_size': 128, 'doc_topic_prior': None, 'evaluate_every': -1, 'learning_decay': 0.7, 'learning_method': 'batch', 'learning_offset': 10.0, 'max_doc_update_iter': 100, 'max_iter': 10, 'mean_change_tol': 0.001, 'n_components': 20, 'n_jobs': None, 'perp_tol': 0.1, 'random_state': None, 'topic_word_prior': None, 'total_samples': 1000000.0, 'verbose': 0}
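The score is an approximate log-likelihood summed over the whole corpus, so its size depends on how many words the corpus has. Perplexity normalizes it per word, which makes models easier to compare; lower is better. The sketch below shows the relationship with made-up numbers (the function name and the toy values are assumptions for illustration). On the fitted model, sklearn exposes this as lda_model.perplexity(data_vectorized).

```python
import math

def perplexity_from_log_likelihood(log_likelihood, total_word_count):
    """Perplexity = exp(-log-likelihood per word); lower is better."""
    return math.exp(-log_likelihood / total_word_count)

# Toy numbers for illustration only (not taken from the model above).
toy_log_likelihood = -6000.0
toy_word_count = 1000
toy_perplexity = perplexity_from_log_likelihood(toy_log_likelihood, toy_word_count)
print(toy_perplexity)
```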

5. How to find the best number of topics

LatentDirichletAllocation has many parameters, and changing them changes the result. To train a better model, we use n_components and learning_decay as examples here and set a range of candidate values for each.

This takes about half an hour to run.

from sklearn.model_selection import GridSearchCV

# Ranges for the parameter search
search_params = {'n_components': [10, 15, 20, 25, 30], 'learning_decay': [.5, .7, .9]}

# Initialize the LDA model
lda = LatentDirichletAllocation()

# Initialize GridSearchCV
model = GridSearchCV(lda, param_grid=search_params)

# Train the LDA models
model.fit(data_vectorized)
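Part of the reason the search is slow is that every combination in the grid is a separate model, and each one is fitted once per cross-validation fold. A rough sketch of how the grid expands, using only the standard library (expand_grid is an illustrative helper, not a sklearn function):

```python
from itertools import product

search_params = {'n_components': [10, 15, 20, 25, 30], 'learning_decay': [.5, .7, .9]}

def expand_grid(param_grid):
    """Return every parameter combination as a list of dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = expand_grid(search_params)
print(len(combinations))  # 5 topic counts x 3 decay values
print(combinations[0])
```

Adding one more value to either list adds a whole row or column of models to train, so it pays to keep the grid small at first.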

Inspect the grid search results

model.cv_results_

Run

{'mean_fit_time': array([76.23844155, 78.47619971, 75.65877469, 92.04278994, 92.47375035, 70.50102162, 77.17208759, 77.42245611, 78.51173854, 80.36060111, 64.35273759, 80.74369526, 78.33191927, 97.60522366, 91.52556197]),
 'std_fit_time': array([ 1.90773724, 6.00546298, 2.90480388, 10.82104708, 2.15837996, 0.91492716, 1.78299082, 0.99124146, 0.88202007, 2.52887488, 1.42895102, 3.4966494 , 4.10921772, 8.57965772, 2.97772162]),
 'mean_score_time': array([3.03948617, 3.12327973, 3.17385236, 4.1181256 , 4.14796472, 2.80464379, 3.00497603, 3.18396346, 3.29176935, 3.34573205, 2.60685007, 3.05136299, 3.39874609, 3.77345729, 4.19327569]),
 'std_score_time': array([0.29957093, 0.0616576 , 0.13170509, 0.4152717 , 0.58759639, 0.05777709, 0.17347846, 0.06664403, 0.13021069, 0.12982755, 0.06256295, 0.13255927, 0.43057235, 0.29978059, 0.44248399]),
 'param_learning_decay': masked_array(data=[0.5, 0.5, 0.5, 0.5, 0.5, 0.7, 0.7, 0.7, 0.7, 0.7, 0.9, 0.9, 0.9, 0.9, 0.9], mask=[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], fill_value='?', dtype=object),
 'param_n_components': masked_array(data=[10, 15, 20, 25, 30, 10, 15, 20, 25, 30, 10, 15, 20, 25, 30], mask=[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False], fill_value='?', dtype=object),
 'params': [{'learning_decay': 0.5, 'n_components': 10}, {'learning_decay': 0.5, 'n_components': 15}, {'learning_decay': 0.5, 'n_components': 20}, {'learning_decay': 0.5, 'n_components': 25}, {'learning_decay': 0.5, 'n_components': 30}, {'learning_decay': 0.7, 'n_components': 10}, {'learning_decay': 0.7, 'n_components': 15}, {'learning_decay': 0.7, 'n_components': 20}, {'learning_decay': 0.7, 'n_components': 25}, {'learning_decay': 0.7, 'n_components': 30}, {'learning_decay': 0.9, 'n_components': 10}, {'learning_decay': 0.9, 'n_components': 15}, {'learning_decay': 0.9, 'n_components': 20}, {'learning_decay': 0.9, 'n_components': 25}, {'learning_decay': 0.9, 'n_components': 30}],
 'split0_test_score': array([-3874856.42190824, -3881092.28265286, -3905854.25463761, -3933237.60526826, -3945083.8541135 , -3873412.75021688, -3873882.90565526, -3911751.31895979, -3921171.68942096, -3949413.2598192 , -3876577.95159756, -3886340.65539402, -3896362.39547871, -3926181.21965185, -3950533.84046263]),
 'split1_test_score': array([-4272638.34477011, -4294980.87988645, -4310841.4440567 , -4336244.55854965, -4341014.91687451, -4279229.66282939, -4302326.23456232, -4317599.83998105, -4325020.1483235 , -4338663.90026249, -4284095.2173055 , -4294941.56802545, -4299746.08581904, -4331262.03558289, -4338027.82208097]),
 'split2_test_score': array([-4200870.80494405, -4219318.82663835, -4222122.82436968, -4237003.85511169, -4258352.71194228, -4192824.54480934, -4200329.40329793, -4231613.93138699, -4258255.99302186, -4270014.58888107, -4199499.64459735, -4209918.86599275, -4230265.99859102, -4247913.06952193, -4256046.3237088 ]),
 'mean_test_score': array([-4116100.53270373, -4131775.17089196, -4146251.59136724, -4168807.85000785, -4181462.93317874, -4115134.28591336, -4125490.60725673, -4153633.64919084, -4168127.44754368, -4186009.66931221, -4120036.0842904 , -4130378.79165891, -4142103.10465406, -4168430.69488042, -4181515.57804474]),
 'std_test_score': array([173105.26046897, 179953.68165447, 173824.10245002, 171450.68036995, 170539.38663682, 174546.8275931 , 182743.94823856, 174623.71594324, 176761.14575071, 169651.81366214, 175603.01769822, 176039.50084949, 176087.37700361, 174665.17839821, 166743.56843518]),
 'rank_test_score': array([ 2, 6, 8, 12, 13, 1, 4, 9, 10, 15, 3, 5, 7, 11, 14], dtype=int32)}
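The dictionary is hard to read by eye, but the winner is just the entry with the highest mean_test_score. A small sketch that recovers it with plain Python, using the mean_test_score values copied from the output above (the variable names are mine; GridSearchCV already exposes the same answer as best_params_ and best_score_):

```python
learning_decays = [0.5, 0.7, 0.9]
n_topics = [10, 15, 20, 25, 30]

# Same ordering as the 'params' list above: decay varies slowest
params = [{'learning_decay': d, 'n_components': n}
          for d in learning_decays for n in n_topics]

# mean_test_score copied from the cv_results_ output
mean_test_score = [
    -4116100.53270373, -4131775.17089196, -4146251.59136724, -4168807.85000785, -4181462.93317874,
    -4115134.28591336, -4125490.60725673, -4153633.64919084, -4168127.44754368, -4186009.66931221,
    -4120036.0842904, -4130378.79165891, -4142103.10465406, -4168430.69488042, -4181515.57804474,
]

# The best candidate is the one with the highest (least negative) mean score
best_index = max(range(len(mean_test_score)), key=mean_test_score.__getitem__)
print(params[best_index], mean_test_score[best_index])
```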

Plot the scores of the models tried by the parameter search

import matplotlib.pyplot as plt

# Get log-likelihoods from the grid search output
n_topics = [10, 15, 20, 25, 30]
log_likelyhoods_5 = model.cv_results_['mean_test_score'][model.cv_results_['param_learning_decay']==0.5]
log_likelyhoods_7 = model.cv_results_['mean_test_score'][model.cv_results_['param_learning_decay']==0.7]
log_likelyhoods_9 = model.cv_results_['mean_test_score'][model.cv_results_['param_learning_decay']==0.9]

# Show graph
plt.figure(figsize=(12, 8))
plt.plot(n_topics, log_likelyhoods_5, label='0.5')
plt.plot(n_topics, log_likelyhoods_7, label='0.7')
plt.plot(n_topics, log_likelyhoods_9, label='0.9')
plt.title("Choosing Optimal LDA Model")
plt.xlabel("Num Topics")
plt.ylabel("Log Likelihood Scores")
plt.legend(title='Learning decay', loc='best')
plt.show()


# Best topic model
best_lda_model = model.best_estimator_

print("Best Model's Params: ", model.best_params_)
print("Best Log Likelihood Score: ", model.best_score_)

Run

Best Model's Params:  {'learning_decay': 0.7, 'n_components': 10}
Best Log Likelihood Score:  -4115134.285913357

6. How to view the topic information of each document

LDA assigns each document a distribution over topics; the topic with the highest probability is the one that best represents the document.

import numpy as np

# Build the document-topic matrix
lda_output = best_lda_model.transform(data_vectorized)

# Column names
topicnames = ["Topic" + str(i) for i in range(best_lda_model.n_components)]

# Row index names
docnames = ["Doc" + str(i) for i in range(len(df.content))]

# Convert to a pd.DataFrame
df_document_topic = pd.DataFrame(np.round(lda_output, 2), columns=topicnames, index=docnames)

# Get dominant topic for each document
dominant_topic = np.argmax(df_document_topic.values, axis=1)
df_document_topic['dominant_topic'] = dominant_topic

# Styling
def color_green(val):
    color = 'green' if val > .1 else 'black'
    return 'color: {col}'.format(col=color)

def make_bold(val):
    weight = 700 if val > .1 else 400
    return 'font-weight: {weight}'.format(weight=weight)

# Apply Style
df_document_topics = df_document_topic.sample(10).style.applymap(color_green).applymap(make_bold)
df_document_topics
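The "dominant topic" is nothing more than the position of the largest value in each document's row. A toy sketch without numpy, with made-up probabilities for three documents and three topics (dominant_topic_of is an illustrative name):

```python
def dominant_topic_of(topic_probs):
    """Index of the most probable topic in one document's topic distribution."""
    return max(range(len(topic_probs)), key=lambda i: topic_probs[i])

# Made-up document-topic probabilities: one row per document
doc_topic = [
    [0.10, 0.70, 0.20],
    [0.55, 0.05, 0.40],
    [0.30, 0.30, 0.40],
]
dominant = [dominant_topic_of(row) for row in doc_topic]
print(dominant)
```

The second document shows why the full distribution is still worth looking at: its dominant topic wins with 0.55, but another topic is close behind at 0.40.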


View how the documents are distributed across topics

df_topic_distribution = df_document_topic['dominant_topic'].value_counts().reset_index(name="Num Documents")
df_topic_distribution.columns = ['Topic Num', 'Num Documents']
df_topic_distribution
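The same tally can be sketched with the standard library, which may help when checking the pandas result by hand. The list of dominant topics below is invented for illustration.

```python
from collections import Counter

# Made-up dominant topic of six documents
dominant_topics = [5, 2, 5, 0, 2, 5]

# most_common() sorts by count, largest first, like value_counts()
topic_counts = Counter(dominant_topics).most_common()
print(topic_counts)
```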


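7. How to visualize LDA

Visualize the topics with pyLDAvis

import pyLDAvis
import pyLDAvis.sklearn  # note: pyLDAvis 3.4 and later ship this module as pyLDAvis.lda_model

# Display inside the notebook
pyLDAvis.enable_notebook()

panel = pyLDAvis.sklearn.prepare(best_lda_model,   # the trained LDA model
                                 data_vectorized,  # document-word matrix of the training corpus
                                 vectorizer)       # the fitted vectorizer (TfidfVectorizer or CountVectorizer)
panel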


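Because of network problems I cannot insert the animated GIF here. To see what the visualization looks like, refer to my earlier article "A hands-on guide to visualizing LDA topic models with the pyLDAvis library".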

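8. How to view the feature words of each topic

Each topic is represented by a set of weighted words, so all topics together form a two-dimensional matrix of topics by words.

# Topic-Keyword Matrix
df_topic_keywords = pd.DataFrame(best_lda_model.components_)

# Reassign the column names and row index names of the dataframe
# Vocabulary of the training set (scikit-learn versions before 1.0 use get_feature_names() instead)
df_topic_keywords.columns = vectorizer.get_feature_names_out()
df_topic_keywords.index = topicnames

df_topic_keywords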
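The values in components_ are unnormalized weights, so they are hard to compare across topics. Dividing each row by its sum turns it into a word distribution for that topic. A toy sketch of the normalization (normalize_rows and the toy weights are illustrative; with numpy the equivalent is components_ / components_.sum(axis=1)[:, np.newaxis]):

```python
def normalize_rows(weights):
    """Scale each row so it sums to 1, turning weights into probabilities."""
    return [[w / sum(row) for w in row] for row in weights]

# Made-up topic-word weights: 2 topics x 3 words
toy_components = [[2.0, 1.0, 1.0],
                  [1.0, 3.0, 0.0]]
topic_word_probs = normalize_rows(toy_components)
print(topic_word_probs)
```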


9. How to get the n most important feature words of each topic

# Show the n most important words of each topic
def show_topics(vectorizer=vectorizer, lda_model=lda_model, top_n=20):
    # scikit-learn versions before 1.0 use get_feature_names() instead
    keywords = np.array(vectorizer.get_feature_names_out())
    topic_keywords = []
    # Iterate over the rows of the topic-word weight matrix
    for topic_weights in lda_model.components_:
        # Positions of the top_n largest weights
        top_keyword_locs = (-topic_weights).argsort()[:top_n]
        # Look up the corresponding words in keywords
        topic_keywords.append(keywords.take(top_keyword_locs))
    return topic_keywords

topic_keywords = show_topics(vectorizer=vectorizer,
                             lda_model=best_lda_model,
                             top_n=10)  # the 10 most important words

df_topic_keywords = pd.DataFrame(topic_keywords)
df_topic_keywords.columns = ['Word '+str(i) for i in range(df_topic_keywords.shape[1])]
df_topic_keywords.index = ['Topic '+str(i) for i in range(df_topic_keywords.shape[0])]
df_topic_keywords
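The core of show_topics is "sort the word positions by weight, largest first, and keep the first n". The same step on a toy vocabulary, without numpy (top_words, the vocabulary and the weights are all made up for illustration):

```python
def top_words(weights, vocabulary, top_n=2):
    """Return the top_n words of one topic, ordered by descending weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [vocabulary[i] for i in order[:top_n]]

vocab = ['car', 'engine', 'god', 'say']
# Made-up topic-word weights: one row per topic, one column per word
topic_word_weights = [
    [5.0, 3.0, 0.1, 0.2],
    [0.1, 0.3, 4.0, 2.5],
]
toy_topic_keywords = [top_words(row, vocab) for row in topic_word_weights]
print(toy_topic_keywords)
```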


10. How to predict the topic of new text

Feed new text to the trained model to predict its topic

# Define function to predict topic for a given text document.
# nlp = spacy.load('en', disable=['parser', 'ner'])
def predict_topic(texts, nlp=nlp):
    # Clean the data: strip whitespace and email addresses, drop uninformative
    # words, and keep the more informative parts of speech
    cleaned_texts = []
    for text in texts:
        cleaned_texts.append(clean_text(text))

    doc_term_matrix = vectorizer.transform(cleaned_texts)

    # LDA transform: despite its name, this holds the topic probabilities of each text
    topic_term_prob_matrix = best_lda_model.transform(doc_term_matrix)

    # Topic with the highest probability (assumes texts contains a single document)
    topic_index = np.argmax(topic_term_prob_matrix)
    topic_word = df_topic_keywords.iloc[topic_index, :].values.tolist()
    return topic_index, topic_word, topic_term_prob_matrix

# Predict
mytext = ["Some text about christianity and bible"]
topic_index, topic_word, topic_term_prob_matrix = predict_topic(mytext)

print("This text belongs to Topic", topic_index)
print("Feature words of this topic: ", topic_word)
print("Topic probability distribution: ", topic_term_prob_matrix)

Run

This text belongs to Topic 5
Feature words of this topic:  ['not', 'have', 'max', 'god', 'say', 'can', 'there', 'write', 'christian', 'would']
Topic probability distribution:  [[0.02500225 0.025 0.02500547 0.02500543 0.02500001 0.7749855 0.02500082 0.02500052 0.025 0.025 ]]
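One caveat: np.argmax without an axis returns a position in the flattened matrix, so predict_topic only gives a valid topic number when it receives a single text. For several texts the maximum has to be taken row by row (np.argmax(matrix, axis=1) in numpy). A pure-Python sketch of the per-row version, where the first row is the distribution printed above and the second row is invented:

```python
def predict_topics(doc_topic_matrix):
    """Return the most probable topic index for every row (one row per text)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic_matrix]

probs = [
    # Row printed above for "Some text about christianity and bible"
    [0.02500225, 0.025, 0.02500547, 0.02500543, 0.02500001,
     0.7749855, 0.02500082, 0.02500052, 0.025, 0.025],
    # Made-up distribution for a hypothetical second text
    [0.7, 0.1, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01],
]
predicted = predict_topics(probs)
print(predicted)
```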
