
Python WordCloud library: initialization parameters


  • Environment: the WordCloud library as installed under Python 3.10.0
  • Source: the code of WordCloud's __init__ method

The relevant part of __init__ is as follows:

 def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling='auto', regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True, contour_width=0,
                 contour_color='black', repeat=False,
                 include_numbers=False, min_word_length=0, collocation_threshold=30):

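Where:

  • font_path: string. Path to the font to use, in OTF or TTF format. On Linux it defaults to DroidSansMono.ttf, located in the same directory as wordcloud.py; on another OS, or if that font is missing, set the path yourself.
  • width: int, default 400. Width of the canvas.
  • height: int, default 200. Height of the canvas.
  • prefer_horizontal: float, default 0.9. The ratio of attempts to fit a word horizontally rather than vertically, so roughly 0.1 of the attempts are vertical. If prefer_horizontal < 1, the algorithm tries rotating a word when it does not fit.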
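  • mask: nd-array or None, default None. If not None, it is a binary mask that determines where words may be drawn; width and height are then ignored and the shape of the mask is used instead. All pure-white (#FF or #FFFFFF) entries are masked out and receive no words, while every other entry is free to draw on. For example, with bg_pic = imread('your_picture.png'), the background of the picture must be pure white (#FFFFFF) and the shape to fill must be in any non-white color. One way to prepare such a picture is to paste the desired shape onto a pure-white canvas in an image editor such as Photoshop and save the result.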
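  • contour_width: float, default 0. If mask is not None and contour_width > 0, the mask contour is drawn with this width.
  • contour_color: color value, default 'black'. Color of the mask contour.
  • scale: float, default 1. Scaling between computation and drawing; for example, with 1.5 the output is 1.5 times the canvas width and height. For large word clouds, using scale instead of a larger canvas is significantly faster, but may give a coarser fit for the words.
  • min_font_size: int, default 4. Smallest font size to use; placement stops when no more words fit at this size.
  • font_step: int, default 1. Step size for the font. A value greater than 1 may speed up computation but gives a worse fit.
  • max_words: number, default 200. Maximum number of words to display.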
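  • stopwords: set of strings or None, default None. Words to exclude. If None, the built-in STOPWORDS set is used. Ignored when using generate_from_frequencies.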
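  • background_color: color value, default 'black'. Background color of the word cloud image.
  • max_font_size: int or None, default None. Maximum font size for the largest word. If None, the height of the image is used.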
  • mode: string, default 'RGB'. When mode is 'RGBA' and background_color is None, a transparent background is generated.
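  • relative_scaling: float, default 'auto'. Importance of relative word frequencies for font size. With relative_scaling=0, only word ranks are considered. With relative_scaling=1, a word that is twice as frequent is drawn twice as large. To take frequencies into account and not only ranks, a value around 0.5 often looks good. With 'auto', it is set to 0.5 unless repeat is true, in which case it is set to 0.
  • color_func: callable, default None. A function that returns a color for each word. It overrides colormap.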
  • regexp: string or None (optional). Regular expression used to split the input text into tokens in process_text. If None, r"\w[\w']+" is used. Ignored when using generate_from_frequencies.
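  • collocations: bool, default True. Whether to include collocations (bigrams) of two words. If words show up repeated in the cloud, try setting it to False. Ignored when using generate_from_frequencies.
  • colormap: string or matplotlib colormap, default 'viridis'. Matplotlib colormap from which a color is drawn at random for each word. Ignored if color_func is specified.
  • normalize_plurals: bool, default True. Whether to remove a trailing 's' from words. If True and a word appears both with and without a trailing 's', the form with the 's' is removed and its count is added to the form without it, unless the word ends with 'ss'. Ignored when using generate_from_frequencies.
  • repeat: bool, default False. Whether to repeat words and phrases until max_words or min_font_size is reached.
  • include_numbers: bool, default False. Whether to include numbers as phrases.
  • min_word_length: int, default 0. Minimum number of letters a word must have to be included.
  • collocation_threshold: int, default 30. A bigram must have a Dunning likelihood collocation score greater than this value to be counted as a bigram. The default of 30 is arbitrary.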
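
In addition to __init__, the WordCloud class provides the following methods:

  • fit_words(frequencies): create a word cloud from word frequencies
  • generate(text): create a word cloud from text
  • generate_from_frequencies(frequencies[, …]): create a word cloud from word frequencies
  • generate_from_text(text): create a word cloud from text
  • process_text(text): split a long text into words and remove stopwords (this applies to English; Chinese text has to be segmented first with another library, and the result passed to fit_words(frequencies) above)
  • recolor([random_state, color_func, colormap]): recolor the existing output. Recoloring is much faster than regenerating the whole word cloud
  • to_array(): convert to a numpy array
  • to_file(filename): export to a file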
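
As a rough illustration of the text-side parameters described above (regexp, stopwords, include_numbers, min_word_length, normalize_plurals), the sketch below builds a frequency dictionary of the kind that fit_words or generate_from_frequencies accepts. It uses only the standard library. The names count_words and DEMO_STOPWORDS are hypothetical, and the logic is a simplification rather than the library's own process_text: for instance, it lowercases everything and does not detect collocations.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; the library ships its own STOPWORDS.
DEMO_STOPWORDS = {"the", "a", "of", "and", "is"}


def count_words(text, stopwords=None, include_numbers=False,
                min_word_length=0, normalize_plurals=True):
    """Simplified sketch: turn English text into a {word: count} dict."""
    if stopwords is None:
        stopwords = DEMO_STOPWORDS
    # Token pattern given as the default in the regexp description.
    tokens = re.findall(r"\w[\w']+", text.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if not include_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    if min_word_length:
        tokens = [t for t in tokens if len(t) >= min_word_length]
    counts = Counter(tokens)
    if normalize_plurals:
        for word in list(counts):
            singular = word[:-1]
            # Fold "cats" into "cat" only when both forms occur and the
            # word does not end in "ss" (so "class" is left alone).
            if (word.endswith("s") and not word.endswith("ss")
                    and singular in counts):
                counts[singular] += counts.pop(word)
    return dict(counts)


text = "The cat and the cats chase a class of 30 mice; cats win."
freqs = count_words(text)
print(freqs)
```

A dictionary like freqs could then be handed to fit_words(frequencies). For Chinese text, the re.findall step would be replaced by a dedicated word segmenter, as noted for process_text.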

Appendix: the full __init__ code, together with the class docstring that precedes it:

     r"""Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is not
        None, width and height will be ignored and the shape of mask will be
        used instead. All white (#FF or #FFFFFF) entries will be considered
        "masked out" while other entries will be free to draw on. [This
        changed in the most recent version!]

    contour_width: float (default=0)
        If mask is not None and contour_width > 0, draw the mask contour.

    contour_color: color value (default="black")
        Mask contour color.

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud images,
        using scale instead of larger canvas size is significantly faster, but
        might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in this
        size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used. Ignored if using generate_from_frequencies.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image is
        used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default='auto')
        importance of relative word frequencies for font-size.  With
        relative_scaling=0, only word-ranks are considered.  With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size.  If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.
        If 'auto' it will be set to 0.5 unless repeat is true, in which
        case it will be set to 0.

        .. versionchanged: 2.0
            Default is now 'auto'.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.
        To create a word cloud with a single color, use
        ``color_func=lambda *args, **kwargs: "white"``.
        The single color can also be specified using RGB code. For example
        ``color_func=lambda *args, **kwargs: (255,0,0)`` sets color to red.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in process_text.
        If None is specified, ``r"\w[\w']+"`` is used. Ignored if using
        generate_from_frequencies.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words. Ignored if using
        generate_from_frequencies.


        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'. Ignored if using
        generate_from_frequencies.

    repeat : bool, default=False
        Whether to repeat words and phrases until max_words or min_font_size
        is reached.

    include_numbers : bool, default=False
        Whether to include numbers as phrases or not.

    min_word_length : int, default=0
        Minimum number of letters a word must have to be included.

    collocation_threshold: int, default=30
        Bigrams must have a Dunning likelihood collocation score greater than this
        parameter to be counted as bigrams. Default of 30 is arbitrary.

        See Manning, C.D., Manning, C.D. and Schütze, H., 1999. Foundations of
        Statistical Natural Language Processing. MIT press, p. 162
        https://nlp.stanford.edu/fsnlp/promo/colloc.pdf#page=22

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples (string, int, (int, int), int, color)
        Encodes the fitted word cloud. Encodes for each word the string, font
        size, position, orientation and color.

    Notes
    -----
    Larger canvases make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and the
    scaling heuristic.
    """
    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling='auto', regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True, contour_width=0,
                 contour_color='black', repeat=False,
                 include_numbers=False, min_word_length=0, collocation_threshold=30):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.contour_color = contour_color
        self.contour_width = contour_width
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode

        if relative_scaling == "auto":
            if repeat:
                relative_scaling = 0
            else:
                relative_scaling = .5

        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError("relative_scaling needs to be "
                             "between 0 and 1, got %f." % relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals
        self.repeat = repeat
        self.include_numbers = include_numbers
        self.min_word_length = min_word_length
        self.collocation_threshold = collocation_threshold
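
The handling of relative_scaling in the code above can be isolated as a small standalone sketch. The function name resolve_relative_scaling is hypothetical and is not part of the wordcloud library; it only mirrors the 'auto' resolution and the range check performed in __init__.

```python
def resolve_relative_scaling(relative_scaling="auto", repeat=False):
    """Hypothetical helper mirroring the relative_scaling logic of __init__."""
    if relative_scaling == "auto":
        # 'auto' means rank-only sizing when words repeat, otherwise 0.5.
        relative_scaling = 0 if repeat else 0.5
    if relative_scaling < 0 or relative_scaling > 1:
        raise ValueError("relative_scaling needs to be "
                         "between 0 and 1, got %f." % relative_scaling)
    return relative_scaling


print(resolve_relative_scaling())             # 'auto' without repeat
print(resolve_relative_scaling(repeat=True))  # 'auto' with repeat
try:
    resolve_relative_scaling(1.5)             # outside the allowed range
except ValueError as exc:
    print(exc)
```

Note that an explicit value always wins: repeat only changes the result when relative_scaling is left at 'auto'.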