Utils

java.lang.Object
- affective.core.Utils

```
public class Utils
extends java.lang.Object
```
Provides static functions for String processing.

Version:

$Revision: 2 $

Author:

Felipe Bravo-Marquez (fbravoma@waikato.ac.nz)

Constructor Summary

Constructors
Constructor	Description
`Utils()`

Method Summary

All Methods Static Methods Concrete Methods
Modifier and Type	Method	Description
`static it.unimi.dsi.fastutil.objects.Object2IntMap<java.lang.String>`	`calculateTermFreq(java.util.List<java.lang.String> tokens, java.lang.String prefix, boolean freqWeights)`	Calculates a vector of attributes from a list of tokens
`static java.util.List<java.lang.String>`	`calculateTokenNgram(java.util.List<java.lang.String> tokens, int n)`	Calculates token n-grams from a sequence of tokens.
`static java.util.List<java.lang.String>`	`clustList(java.util.List<java.lang.String> tokens, java.util.Map<java.lang.String,java.lang.String> dict)`	Calculates a sequence of word-clusters from a list of tokens and a dictionary.
`static java.util.List<java.lang.String>`	`extractCharNgram(java.lang.String content, int n)`	Calculates character n-grams from a String.
`static java.util.List<java.lang.String>`	`negateTokens(java.util.List<java.lang.String> tokens, java.util.Set<java.lang.String> set)`	Adds a negation prefix to the tokens that follow a negation word until the next punctuation mark.
`static java.util.List<java.lang.String>`	`tokenize(java.lang.String content, boolean toLowerCase, boolean standarizeUrlsUsers, boolean reduceRepeatedLetters, Tokenizer tokenizer, Stemmer stemmer, StopwordsHandler stop)`	Tokenizes a String

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
Utils
```
public Utils()
```

Method Detail

negateTokens
```
public static java.util.List<java.lang.String> negateTokens(java.util.List<java.lang.String> tokens,
                                                            java.util.Set<java.lang.String> set)
```
Adds a negation prefix to the tokens that follow a negation word until the next punctuation mark.

Parameters:

tokens - the list of tokens to negate

set - the set of negation words that trigger negation

Returns:

the negated tokens
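
The negation scope described above can be illustrated with a minimal standalone sketch. It does not call `Utils.negateTokens`; the class name `NegationSketch`, the `NEG-` prefix, and the punctuation pattern are assumptions made for illustration, since this page does not state the prefix or the punctuation set the library uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class NegationSketch {
    // Hypothetical marker: the prefix actually produced by Utils.negateTokens is not stated on this page.
    static final String NEG_PREFIX = "NEG-";

    // Prefixes every token that follows a negation word, up to the next punctuation token.
    static List<String> negate(List<String> tokens, Set<String> negationWords) {
        List<String> out = new ArrayList<>(tokens.size());
        boolean inScope = false;
        for (String token : tokens) {
            if (inScope && isPunctuation(token)) {
                inScope = false;          // punctuation closes the negation scope
                out.add(token);
            } else if (inScope) {
                out.add(NEG_PREFIX + token);
            } else {
                if (negationWords.contains(token)) {
                    inScope = true;       // the negation word itself is kept unchanged
                }
                out.add(token);
            }
        }
        return out;
    }

    // Assumed punctuation set; the library may use a different one.
    static boolean isPunctuation(String token) {
        return token.matches("[.,:;!?]+");
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("i", "do", "not", "like", "it", ".", "great");
        // prints [i, do, not, NEG-like, NEG-it, ., great]
        System.out.println(negate(tokens, Set.of("not")));
    }
}
```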

clustList

```
public static java.util.List<java.lang.String> clustList(java.util.List<java.lang.String> tokens,
                                                         java.util.Map<java.lang.String,java.lang.String> dict)
```

Calculates a sequence of word-clusters from a list of tokens and a dictionary.

Parameters:

tokens - the input tokens

dict - the dictionary with the word clusters

Returns:

a list of word-clusters
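
As a rough illustration of the token-to-cluster mapping, the sketch below looks each token up in a dictionary. It is not the library code: the class name `ClusterSketch` and the cluster identifiers are hypothetical, and skipping tokens that have no dictionary entry is an assumption, since this page does not say how unknown tokens are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ClusterSketch {
    // Maps each token to its cluster identifier.
    // Assumption: tokens missing from the dictionary are skipped.
    static List<String> toClusters(List<String> tokens, Map<String, String> dict) {
        List<String> clusters = new ArrayList<>();
        for (String token : tokens) {
            String cluster = dict.get(token);
            if (cluster != null) {
                clusters.add(cluster);
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical cluster identifiers, for illustration only.
        Map<String, String> dict = Map.of("good", "0011", "great", "0011", "bad", "0100");
        // prints [0011, 0100, 0011]
        System.out.println(toClusters(List.of("good", "bad", "unknown", "great"), dict));
    }
}
```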

calculateTermFreq

```
public static it.unimi.dsi.fastutil.objects.Object2IntMap<java.lang.String> calculateTermFreq(java.util.List<java.lang.String> tokens,
                                                                                              java.lang.String prefix,
                                                                                              boolean freqWeights)
```

Calculates a vector of attributes from a list of tokens

Parameters:

tokens - the input tokens

prefix - the prefix of each vector attribute

freqWeights - true to use term-frequency weights (Boolean weights are used otherwise)

Returns:

an Object2IntMap object mapping the attributes to their values
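
The difference between term-frequency and Boolean weights can be sketched as follows. To stay self-contained, this sketch uses a plain `java.util.Map<String, Integer>` in place of fastutil's `Object2IntMap`; the class name `TermFreqSketch` and the `WORD-` prefix are hypothetical, and using 1 as the Boolean weight is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TermFreqSketch {
    // Builds attribute -> weight, where each attribute name is prefix + token.
    // freqWeights == true: the weight is the number of occurrences of the token.
    // freqWeights == false: the weight is 1 for every token that occurs (assumed Boolean weight).
    static Map<String, Integer> termFreq(List<String> tokens, String prefix, boolean freqWeights) {
        Map<String, Integer> vector = new LinkedHashMap<>();
        for (String token : tokens) {
            String attribute = prefix + token;
            if (freqWeights) {
                vector.merge(attribute, 1, Integer::sum);
            } else {
                vector.put(attribute, 1);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("good", "very", "good");
        // prints {WORD-good=2, WORD-very=1}
        System.out.println(termFreq(tokens, "WORD-", true));
        // prints {WORD-good=1, WORD-very=1}
        System.out.println(termFreq(tokens, "WORD-", false));
    }
}
```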

calculateTokenNgram
```
public static java.util.List<java.lang.String> calculateTokenNgram(java.util.List<java.lang.String> tokens,
                                                                   int n)
```
Calculates token n-grams from a sequence of tokens.

Parameters:

tokens - the input tokens from which the word n-grams will be calculated

n - the size of the word n-gram

Returns:

a list with the word n-grams
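
A token n-gram is a run of n consecutive tokens. The sketch below shows one way to build them with a sliding window. It is not the library implementation: the class name `TokenNgramSketch` is hypothetical, and joining the tokens with `-` is an assumption, since the separator is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenNgramSketch {
    // Slides a window of n tokens over the list and joins each window into one string.
    // Assumption: tokens are joined with "-"; the separator used by the library may differ.
    static List<String> tokenNgrams(List<String> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> ngrams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            ngrams.add(String.join("-", tokens.subList(i, i + n)));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [i-love, love-this, this-movie]
        System.out.println(tokenNgrams(List.of("i", "love", "this", "movie"), 2));
    }
}
```

With this approach a list shorter than n yields an empty result.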

extractCharNgram
```
public static java.util.List<java.lang.String> extractCharNgram(java.lang.String content,
                                                                int n)
```
Calculates character n-grams from a String.

Parameters:

content - the input String

n - the size of the character n-gram

Returns:

a list with the character n-grams
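
Character n-grams are the substrings of length n obtained by sliding a window over the input one character at a time. A minimal standalone sketch, with the hypothetical class name `CharNgramSketch` and no claim to match the library's handling of edge cases such as whitespace or short inputs:

```java
import java.util.ArrayList;
import java.util.List;

final class CharNgramSketch {
    // Collects every substring of length n, moving the window one character at a time.
    static List<String> charNgrams(String content, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> ngrams = new ArrayList<>();
        for (int i = 0; i + n <= content.length(); i++) {
            ngrams.add(content.substring(i, i + n));
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [hap, app, ppy]
        System.out.println(charNgrams("happy", 3));
    }
}
```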

tokenize

```
public static java.util.List<java.lang.String> tokenize(java.lang.String content,
                                                        boolean toLowerCase,
                                                        boolean standarizeUrlsUsers,
                                                        boolean reduceRepeatedLetters,
                                                        Tokenizer tokenizer,
                                                        Stemmer stemmer,
                                                        StopwordsHandler stop)
```

Tokenizes a String

Parameters:

content - the content to tokenize

toLowerCase - true to lowercase the content

standarizeUrlsUsers - true to standardize URLs and user mentions

reduceRepeatedLetters - true to reduce repeated letters

tokenizer - the tokenizer

stemmer - the stemmer

stop - the stopwords handler

Returns:

a list of tokens
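
The three Boolean flags describe text normalization steps that can be sketched without the `Tokenizer`, `Stemmer`, and `StopwordsHandler` arguments. Everything in the sketch below is an assumption made for illustration: the class name `TokenizeSketch`, the placeholders `URL` and `@USER`, the rule that runs of three or more identical letters shrink to two, the order of the steps, and the whitespace split that stands in for the tokenizer. Stemming and stopword removal are left out.

```java
import java.util.Arrays;
import java.util.List;

final class TokenizeSketch {
    // Applies the optional normalization steps, then splits on whitespace.
    // All patterns and placeholders are hypothetical stand-ins, not the library's.
    static List<String> preprocess(String content, boolean toLowerCase,
                                   boolean standardizeUrlsUsers, boolean reduceRepeatedLetters) {
        String text = content;
        if (toLowerCase) {
            text = text.toLowerCase();
        }
        if (standardizeUrlsUsers) {
            text = text.replaceAll("https?://\\S+", "URL");  // any URL becomes one placeholder
            text = text.replaceAll("@\\w+", "@USER");        // any user mention becomes one placeholder
        }
        if (reduceRepeatedLetters) {
            // Runs of three or more identical letters are shortened to two.
            text = text.replaceAll("(\\p{L})\\1{2,}", "$1$1");
        }
        text = text.trim();
        if (text.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(text.split("\\s+"));
    }

    public static void main(String[] args) {
        // prints [soo, cool, @USER, URL]
        System.out.println(preprocess("Sooooo COOL @john http://t.co/xyz", true, true, true));
    }
}
```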
