Tensorflow VocabularyProcessor - vocabulary
I am following the WildML blog post on text classification using TensorFlow. I am not able to understand the purpose of max_document_length in this code statement:
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
Also, how can I extract the vocabulary from vocab_processor?
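On the first question: max_document_length is the fixed length of the id sequence the processor produces for each document. As far as I know, the contrib implementation fills a zero array of that length with word ids, so shorter documents are padded with 0 (the id it reserves for unknown words) and longer ones are cut off after max_document_length tokens. Setting it to the length of the longest document, as the code in this post does, means nothing gets truncated. Since tensorflow.contrib no longer exists in TensorFlow 2.x, here is a minimal plain-Python sketch of that behavior; fit_vocabulary and transform are hypothetical helpers, not the TensorFlow API, and whitespace splitting is a simplification of the real regex tokenizer.

```python
# Hypothetical pure-Python sketch of what VocabularyProcessor does with
# max_document_length; this is NOT the tensorflow.contrib API.

UNK_ID = 0  # id 0 is reserved for unknown words and is also used as padding


def fit_vocabulary(documents):
    """Assign ids 1, 2, 3, ... to words in order of first appearance."""
    mapping = {"<UNK>": UNK_ID}
    for doc in documents:
        for token in doc.split():  # assumption: whitespace tokenization
            if token not in mapping:
                mapping[token] = len(mapping)
    return mapping


def transform(documents, mapping, max_document_length):
    """Turn each document into a fixed-length list of word ids.

    Shorter documents are right-padded with 0; longer ones are truncated
    to their first max_document_length tokens.
    """
    rows = []
    for doc in documents:
        ids = [mapping.get(tok, UNK_ID) for tok in doc.split()]
        ids = ids[:max_document_length]                      # truncate
        ids += [UNK_ID] * (max_document_length - len(ids))   # pad
        rows.append(ids)
    return rows


docs = ["this cat", "this must boy", "this a dog"]
vocab = fit_vocabulary(docs)
print(transform(docs, vocab, max_document_length=3))
# [[1, 2, 0], [1, 3, 4], [1, 5, 6]]
print(transform(docs, vocab, max_document_length=2))
# [[1, 2], [1, 3], [1, 5]]
```

Every row comes out the same length, which is what allows the result to be stacked into one 2-D array and fed to a model that expects a fixed input width.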
I have figured out how to extract the vocabulary from the VocabularyProcessor object. This worked for me:
    import numpy as np
    from tensorflow.contrib import learn

    x_text = ['this cat', 'this must boy', 'this a dog']
    max_document_length = max([len(x.split(" ")) for x in x_text])

    ## Create the VocabularyProcessor object, setting the max length of the documents.
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

    ## Transform the documents using the vocabulary.
    x = np.array(list(vocab_processor.fit_transform(x_text)))

    ## Extract the word:id mapping from the object.
    vocab_dict = vocab_processor.vocabulary_._mapping

    ## Sort the vocabulary dictionary on the basis of the values (ids).
    ## Both statements perform the same task (the first one needs `import operator`).
    #sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
    sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

    ## Treat the ids as list indices and create a list of words in ascending
    ## order of id: the word with id i goes at index i of the list.
    vocabulary = list(list(zip(*sorted_vocab))[0])

    print(vocabulary)
    print(x)
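The sorting step can also be written more directly. Below is a minimal sketch in plain Python, assuming the mapping is an ordinary dict of word to integer id with "<UNK>" at 0; the dict literal is a hypothetical stand-in for vocab_processor.vocabulary_._mapping (a private attribute), and ids_to_words is a hypothetical helper showing how the vocabulary list turns a transformed row back into words.

```python
# Hypothetical stand-in for vocab_processor.vocabulary_._mapping:
# assumed shape is word -> integer id, with "<UNK>" mapped to 0.
vocab_dict = {"<UNK>": 0, "this": 1, "cat": 2, "must": 3,
              "boy": 4, "a": 5, "dog": 6}

# Sorting the keys by their id gives the same list as
# list(list(zip(*sorted_vocab))[0]): the word with id i lands at index i.
vocabulary = sorted(vocab_dict, key=vocab_dict.get)
print(vocabulary)
# ['<UNK>', 'this', 'cat', 'must', 'boy', 'a', 'dog']


def ids_to_words(row, vocabulary):
    """Decode one transformed row back to words.

    Id 0 is skipped because it marks padding; note that this also drops
    genuinely unknown words, since they share id 0.
    """
    return [vocabulary[i] for i in row if i != 0]


print(ids_to_words([1, 3, 4], vocabulary))
# ['this', 'must', 'boy']
print(ids_to_words([1, 2, 0], vocabulary))
# ['this', 'cat']
```

Using the dict's own get method as the sort key avoids building the intermediate list of (word, id) pairs and unzipping it.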