Big Data Playground for Engineers: Dump Twitter Stream into Kafka topic

Git:

A fully functional code base with use-case examples is up and running.

Example Illustration
  • The data is textual, which opens up quite a few use cases
  • Geo-locations are available
  • User information (masked to some level!) is available and can be used for building a social graph
  • Last but not least, the infamous #hashtags (a quick parsing sketch follows this list)
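As a rough illustration of where those attributes live, here is a minimal parsing sketch; the field names follow the standard v1.1 tweet JSON payload, while the function and variable names are just placeholders for this example.

import json

def parse_tweet(raw_data):
    # raw_data is the JSON string delivered for each tweet on the stream
    tweet = json.loads(raw_data)
    text = tweet.get("text", "")                             # tweet body
    coordinates = tweet.get("coordinates")                   # geo location, often None
    screen_name = tweet.get("user", {}).get("screen_name")   # (partially masked) user info for the social graph
    hashtags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
    return text, coordinates, screen_name, hashtags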

Creating Your Own Credentials for Twitter APIs

Twitter exposes REST endpoints that can be accessed with approved credentials through client libraries available for most programming languages.
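Once an app is created on the Twitter developer portal and its consumer key/secret and access token/secret are generated, the OAuth handshake with Tweepy takes only a few lines. The sketch below uses placeholder values; the auth object it produces is what the streaming code later expects.

import tweepy

# Placeholders: replace with the credentials generated for your app on developer.twitter.com
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

# OAuth 1a handshake; pass the resulting auth object to the Stream
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)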

Twitter Stream with Python

For Python, there is no better option than Tweepy.

from tweepy import Stream

# TweetsListener (covered below) pushes every matching tweet into the Kafka topic
twitter_stream = Stream(auth, TweetsListener(kafka_addr=self._kafka_addr, topic=kafka_topic, is_ai=is_ai))
# https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter
# https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters
# Track the given keywords, restricted to English-language tweets
twitter_stream.filter(track=keywords, languages=["en"])
The TweetsListener has two responsibilities (a fuller sketch follows the producer snippet below):
  • Filter out tweets that are generic or false positives, i.e. tweets that resemble AI/ML/DL content but are not
  • Continuously listen for tweets, reconnecting whenever there is a network issue
from kafka import KafkaProducer
self._kafka_producer = KafkaProducer(bootstrap_servers='localhost:9092')
# send() is asynchronous; .get(timeout=10) blocks so delivery errors surface immediately
self._kafka_producer.send(kafka_topic, data.encode('utf-8')).get(timeout=10)
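Putting the pieces together, a TweetsListener along the lines of the sketch below would cover both points. The keyword check for AI/ML/DL false positives is a naive stand-in for whatever filtering the repo actually does, and the class and attribute names are assumptions for illustration.

import json
from kafka import KafkaProducer
from tweepy.streaming import StreamListener

class TweetsListener(StreamListener):
    def __init__(self, kafka_addr, topic, is_ai):
        super().__init__()
        self._topic = topic
        self._is_ai = is_ai
        self._kafka_producer = KafkaProducer(bootstrap_servers=kafka_addr)

    def on_data(self, data):
        # Naive guard against false positives: drop tweets that never mention AI/ML/DL terms
        if self._is_ai:
            text = json.loads(data).get("text", "").lower()
            if not any(k in text for k in ("artificial intelligence", "machine learning", "deep learning", "#ai")):
                return True
        # Forward the raw tweet JSON to Kafka
        self._kafka_producer.send(self._topic, data.encode('utf-8')).get(timeout=10)
        return True

    def on_error(self, status_code):
        # Returning True keeps the stream alive so Tweepy reconnects with back-off
        return True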
