Anahita Project's Topics

Rastin Mehr

May 20 2014

Testing the hashtag extraction script

Hello Anahita tribe,

We now have a new type of node in Anahita called a Hashtag, and all the other nodes that can be associated with a #hashtag are #hashtagables.

I have created the foundations for com_hashtags and the hashtagable behavior, so actors, mediums, and comments are now hashtagable. A hashtag has to appear in the body of a node in order to be extracted.
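
To give a rough idea of what extracting hashtag terms from a node body involves, here is a minimal Python sketch. It is only an illustration and not the actual Anahita code (which is PHP); the pattern, the lowercasing, and the function name extract_hashtags are all assumptions made for this example.

```python
import re

# Assumed pattern: a '#' followed by word characters. The lookbehind skips a '#'
# that sits inside a word (a#b) or inside an HTML entity such as &#39;.
HASHTAG_PATTERN = re.compile(r"(?<![\w&])#(\w+)")


def extract_hashtags(body):
    """Return the unique hashtag terms found in body, lowercased, in first-seen order."""
    terms = []
    for term in HASHTAG_PATTERN.findall(body):
        term = term.lower()
        if term not in terms:
            terms.append(term)
    return terms


if __name__ == "__main__":
    print(extract_hashtags("Hello #Anahita tribe, testing #hashtag and #anahita again"))
```

Each extracted term would then become one hashtag node, with an edge linking it to the node whose body it was found in.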

I have also developed a script that looks up all the instances of hashtag terms in the body of hashtagable nodes and creates actual hashtag records in the database. For now you can clone the Anahita fork from my account:

https://github.com/rmdstudio/anahita

and run the migration:

php anahita db:migrate:up

to migrate up. This script takes a while to run: for the Anahitapolis data it took 6 hours. Depending on the size of your data, you will need to allocate a lot of memory to PHP. To do that, increase memory_limit in your php.ini to 512M or more. So far I have tested the Anahitapolis data with a 1024M memory limit and I am currently testing with 512M. Let's see how much lower I can go.

Once the script has finished, a lot of hashtag nodes and graphs will have been added to your Anahita installation. These can be used for all kinds of fun stuff, and that is what I am going to work on next.

To see the hashtag data you can use the following queries:

Number of hashtag nodes:

SELECT COUNT(*) FROM jos_anahita_nodes WHERE TYPE LIKE '%hashtag%';

Number of hashtagable nodes:

SELECT COUNT(*) FROM jos_anahita_nodes WHERE hashtag_ids != '';

Number of edges connecting hashtags and hashtagables:

SELECT COUNT(*) FROM jos_anahita_edges WHERE TYPE LIKE '%hashtag%';

And to get a list of top 10 hashtags in your Anahita installation:

SELECT id, name, alias, hashtagable_count FROM jos_anahita_nodes WHERE TYPE LIKE '%hashtag%' ORDER BY hashtagable_count DESC LIMIT 10;

That's it for now. Please create a local Anahita instance, load the data from your live Anahita into it, and then run the migration. Please let me know about any errors, bugs, or issues.

If you are using a table prefix other than jos then adjust the example queries accordingly.
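
If you would rather not edit each query by hand, here is a small Python sketch that prints the four example queries for a given table prefix. The helper name hashtag_queries and its parameters are made up for this example; it assumes the prefix includes the trailing underscore (for example jos_) and that the table names are otherwise the same as in the queries above.

```python
import re


def hashtag_queries(prefix="jos_", top=10):
    """Build the example queries for a given table prefix (hypothetical helper)."""
    # The prefix is interpolated into SQL, so only accept simple identifiers.
    if not re.fullmatch(r"[A-Za-z0-9_]+", prefix):
        raise ValueError("unexpected table prefix: %r" % prefix)
    nodes = prefix + "anahita_nodes"
    edges = prefix + "anahita_edges"
    return {
        "hashtag_nodes": f"SELECT COUNT(*) FROM {nodes} WHERE TYPE LIKE '%hashtag%';",
        "hashtagable_nodes": f"SELECT COUNT(*) FROM {nodes} WHERE hashtag_ids != '';",
        "edges": f"SELECT COUNT(*) FROM {edges} WHERE TYPE LIKE '%hashtag%';",
        "top_hashtags": (
            f"SELECT id, name, alias, hashtagable_count FROM {nodes} "
            f"WHERE TYPE LIKE '%hashtag%' ORDER BY hashtagable_count DESC LIMIT {int(top)};"
        ),
    }


if __name__ == "__main__":
    # Print the queries for an installation that uses the prefix "abc_".
    for name, sql in hashtag_queries("abc_").items():
        print(name + ": " + sql)
```

The output can be pasted straight into your MySQL client.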
I will do the pull request today!
Andy Nash
May 20 2014
Cool stuff!

"all the instances of hashtag terms in the body of hashtagable nodes"

In this context, does "hashtag term" mean something that is already tagged with a hash? In other words, in a network where no one has been encouraged to do this yet, will there be few if any tags?

Another question: in anticipation of this feature, and given that some of the nodes in my network were created from other sources running software that already included tags, I added those tags, where they existed, to a DB field in the node record in case I could use them later.

Would it be possible to modify the code in your script that finds the hashtags in the body so that it looks in this field instead, and then run the rest of the script as usual? Or is there more to it than that?
@Andy yes, it means something with a hash already. We told people to use hashtag terms in the body of their posts because we knew that someday we could convert them to real hashtag nodes and graphs.

Any app that extends the actor, media, and comment classes should become hashtagable, because those base classes have the hashtagable behavior. 

You need to write custom migration scripts if you have integrated with other technologies.